Commit Graph

267 Commits

Author SHA1 Message Date
Rakesh Pandit 32c662c58a lightnvm: include NVM Express driver if OCSSD is selected for build
Because NVM needs BLK_DEV_NVME, select it automatically if we mark NVM
in config file before building kernel.  Also append PCI to depends as
select doesn't automatically add dependencies.

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
Rakesh Pandit c9d84b350f lightnvm: pblk: fix error path in pblk_lines_alloc_metadata
Use appropriate memory free calls based on allocation type used and
also fix number of times free is called if kmalloc fails.

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
Rakesh Pandit a96d50fa0c lightnvm: remove already calculated nr_chnls
Remove repeated calculation for number of channels while creating a
target device.

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
Rakesh Pandit 88d31ea267 lightnvm: protect target type list with correct locks
nvm_tgt_types list was protected by wrong lock for NVM_INFO ioctl call
and can race with addition or removal of target types.  Also
unregistering target type was not protected correctly.

Fixes: 5cd907853 ("lightnvm: remove nested lock conflict with mm")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
Rakesh Pandit bb6aa6f082 lightnvm: prevent bd removal if busy
When a virtual block device is formatted and mounted after creating
with "nvme lnvm create... -t pblk", a removal from "nvm lnvm remove"
would result in this:

446416.309757] bdi-block not registered
[446416.309773] ------------[ cut here ]------------
[446416.309780] WARNING: CPU: 3 PID: 4319 at fs/fs-writeback.c:2159
  __mark_inode_dirty+0x268/0x340

Ideally removal should return -EBUSY as block device is mounted after
formatting.  This patch tries to address this checking if whole device
or any partition of it already mounted or not before removal.

Whole device is checked using "bd_super" member of block device.  This
member is always set once block device has been mounted using a
filesystem.  Another member "bd_part_count" takes care of checking any
if any partitions are under use.  "bd_part_count" is only updated
under locks when partitions are opened or closed (first open and last
release).  This at least does take care sending -EBUSY if removal is
being attempted while whole block device or any partition is mounted.

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
Rakesh Pandit 900148296b lightnvm: prevent target type module removal when in use
If target type module e.g. pblk here is unloaded (rmmod) while module
is in use (after creating target) system crashes.  We fix this by
using module API refcnt.

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
Javier González 75cb8e939c lightnvm: pblk: advance bio according to lba index
When a lba either hits the cache or corresponds to an empty entry in the
L2P table, we need to advance the bio according to the position in which
the lba is located. Otherwise, we will copy data in the wrong page, thus
causing data corruption for the application.

In case of a cache hit, we assumed that bio->bi_iter.bi_idx would
contain the correct index, but this is no necessarily true. Instead, use
the local bio advance counter and iterator. This guarantees that lbas
hitting the cache are copied into the right bv_page.

In case of an empty L2P entry, we omitted to advance the bio. In the
cases when the same I/O also contains a cache hit, data corresponding
to this lba will be copied to the wrong bv_page. Fix this by advancing
the bio as we do in the case of a cache hit.

Fixes: a4bd217b43 lightnvm: physical block device (pblk) target

Signed-off-by: Javier González <javier@javigon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-28 08:06:00 -06:00
Javier González 56c76417ad lightnvm: pblk: remove unnecessary checks
Remove unnecessary checks when freeing dma memory in the completion
path.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-07 13:17:36 -06:00
Javier González 3eaa11e278 lightnvm: pblk: control I/O flow also on tear down
When removing a pblk instance, control the write I/O flow to the
controller as we do in the fast path.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-07 13:17:34 -06:00
Javier González a84ebb837b lightnvm: pblk: set line bitmap check under debug
Do bitmap checks only when debug mode is enable. The line bitmap used
for mapping to physical addresses is fairly large (~512KB) and it is
expensive to do this checks on the fast path.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Javier González 076984669d lightnvm: pblk: verify that cache read is still valid
When a read is directed to the cache, we risk that the lba has been
updated during the time we made the L2P table lookup and the time we are
actually reading form the cache. We intentionally not hold the L2P lock
not to block other threads.

While strict ordering is not a guarantee at this level (unless REQ_FLUSH
has been previously issued), we have experience that some databases that
have recently implemented direct I/O support, issue metadata reads very
close to the writes, without issuing a fsync in the middle. An easy way
to support them while they is to make an extra effort and check the L2P
map right before reading the cache.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Javier González b5e063a286 lightnvm: pblk: add initialization check
Add a sanity check to the pblk initialization sequence in order to
ensure that enough LUNs have been allocated to store the line metadata.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Javier González ee8d5c1ad5 lightnvm: pblk: remove target using async. I/Os
When removing a pblk instance, pad the current line using asynchronous
I/O. This reduces the removal time from ~1 minute in the worst case to a
couple of seconds.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Javier González de54e703a4 lightnvm: pblk: use vmalloc for GC data buffer
For now, we allocate a per I/O buffer for GC data. Since the potential
size of the buffer is 256KB and GC is not in the fast path, do this
allocation with vmalloc. This puts lets pressure on the memory
allocator at no performance cost.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Javier González 8224cbd80b lightnvm: pblk: use right metadata buffer for recovery
Fix bad metadata buffer assignations introduced when refactoring the
medatada write path.

Fixes: dd2a434373 lightnvm: pblk: sched. metadata on write thread
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Javier González 1088812978 lightnvm: pblk: schedule if data is not ready
When user threads place data into the write buffer, they reserve space
and do the memory copy out of the lock. As a consequence, when the write
thread starts persisting data, there is a chance that it is not copied
yet. In this case, avoid polling, and schedule before retrying.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Javier González 653cbb8472 lightnvm: pblk: remove unused return variable
Remove unused variable.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Javier González 2950e7e610 lightnvm: pblk: fix double-free on pblk init
Prevent pblk->lines being double freed in case of an error during pblk
initialization.

Fixes: dd2a43437337: "lightnvm: pblk: sched. metadata on write thread"
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Javier González f417aa0bd8 lightnvm: pblk: fix bad le64 assignations
Use the right types and conversions on le64 variables. Reported by
sparse.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30 11:08:18 -06:00
Rakesh Pandit 12e9a6d622 lightnvm: if LUNs are already allocated fix return
While creating new device with NVM_DEV_CREATE if LUNs are already
allocated ioctl would return -ENOMEM which is wrong.  This patch
propagates -EBUSY from nvm_reserve_luns which is correct response.

Fixes: ade69e243 ("lightnvm: merge gennvm with core")
Reviewed-by: Frans Klaver <fransklaver@gmail.com>
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-27 08:22:09 -06:00
Javier González 588726d3ec lightnvm: pblk: fail gracefully on irrec. error
Due to user writes being decoupled from media writes because of the need
of an intermediate write buffer, irrecoverable media write errors lead
to pblk stalling; user writes fill up the buffer and end up in an
infinite retry loop.

In order to let user writes fail gracefully, it is necessary for pblk to
keep track of its own internal state and prevent further writes from
being placed into the write buffer.

This patch implements a state machine to keep track of internal errors
and, in case of failure, fail further user writes in an standard way.
Depending on the type of error, pblk will do its best to persist
buffered writes (which are already acknowledged) and close down on a
graceful manner. This way, data might be recovered by re-instantiating
pblk. Such state machine paves out the way for a state-based FTL log.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González ef5764946b lightnvm: pblk: set mempool and workqueue params.
Make constants to define sizes for internal mempools and workqueues. In
this process, adjust the values to be more meaningful given the internal
constrains of the FTL. In order to do this for workqueues, separate the
current auxiliary workqueue into two dedicated workqueues to manage
lines being closed and bad blocks.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González b20ba1bc74 lightnvm: pblk: redesign GC algorithm
At the moment, in order to get enough read parallelism, we have recycled
several lines at the same time. This approach has proven not to work
well when reaching capacity, since we end up mixing valid data from all
lines, thus not maintaining a sustainable free/recycled line ratio.

The new design, relies on a two level workqueue mechanism. In the first
level, we read the metadata for a number of lines based on the GC list
they reside on (this is governed by the number of valid sectors in each
line). In the second level, we recycle a single line at a time. Here, we
issue reads in parallel, while a single GC write thread places data in
the write buffer. This design allows to (i) only move data from one line
at a time, thus maintaining a sane free/recycled ration and (ii)
maintain the GC writer busy with recycled data.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González 476118c981 lightnvm: pblk: add lock assertions on helpers
Add lockdep assertions on helper functions.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González 0c0ea8817e lightnvm: pblk: cleanup unnecessary code
Cleanup unnecessary headers and code lines.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González 63e3809cf7 lightnvm: pblk: set metadata list for all I/Os
Set a dma area for all I/Os in order to read/write from/to the metadata
stored on the per-sector out-of-bound area.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González d45ebd470b lightnvm: pblk: choose optimal victim GC line
At the moment, we separate the closed lines on three different list
based on their number of valid sectors. GC recycles lines from each list
based on capacity. Lines from each list are taken in a FIFO fashion.

Since the number of lines is limited (it corresponds to the number of
blocks in a LUN, which is somewhere between 1000-2000), we can afford
scanning the lists to choose the optimal line to be recycled. This helps
specially in lines with a high number of valid sectors.

If the number of blocks per LUN increases, we will consider a more
efficient policy.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González dffdd960ee lightnvm: pblk: decouple bad block from line alloc
Decouple bad block discovery from line allocation logic. This allows to
return meaningful error codes in case of bad block discovery failure.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González f680f19aa6 lightnvm: pblk: simplify meta. memory allocation
smeta size will always be suitable for a kmalloc allocation. Simplify
the code and leave the vmalloc fallback only for emeta, where the pblk
configuration has an impact.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González f9c101523d lightnvm: pblk: issue multiplane reads if possible
If a read request is sequential and its size aligns with a
multi-plane page size, use the multi-plane hint to process the I/O in
parallel in the controller.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González 0880a9aa2d lightnvm: pblk: delete redundant buffer pointer
After refactoring the metadata path, the backpointer controlling
synced I/Os in a line becomes unnecessary; metadata is scheduled
on the write thread, thus we know when the end of the line is reached
and act on it directly.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González fd1b0158f5 lightnvm: pblk: delete redundant debug line stat
Remove a legacy variable that helped verifying the consistency of the
run-time metadata for the free line list. With the new metadata layout,
this check is no longer necessary.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González dd2a434373 lightnvm: pblk: sched. metadata on write thread
At the moment, line metadata is persisted on a separate work queue, that
is kicked each time that a line is closed. The assumption when designing
this was that freeing the write thread from creating a new write request
was better than the potential impact of writes colliding on the media
(user I/O and metadata I/O). Experimentation has proven that this
assumption is wrong; collision can cause up to 25% of bandwidth and
introduce long tail latencies on the write thread, which potentially
cause user write threads to spend more time spinning to get a free entry
on the write buffer.

This patch moves the metadata logic to the write thread. When a line is
closed, remaining metadata is written in memory and is placed on a
metadata queue. The write thread then takes the metadata corresponding
to the previous line, creates the write request and schedules it to
minimize collisions on the media. Using this approach, we see that we
can saturate the media's bandwidth, which helps reducing both write
latencies and the spinning time for user writer threads.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:39 -06:00
Javier González 084ec9ba07 lightnvm: pblk: rename read request pool
Read requests allocate some extra memory to store its per I/O context.
Instead of requiring yet another memory pool for other type of requests,
generalize this context allocation (and change naming accordingly).

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:27:13 -06:00
Javier González d624f371d5 lightnvm: pblk: generalize erase path
Erase I/Os are scheduled with the following goals in mind: (i) minimize
LUNs collisions with write I/Os, and (ii) even out the price of erasing
on every write, instead of putting all the burden on when garbage
collection runs. This works well on the current design, but is specific
to the default mapping algorithm.

This patch generalizes the erase path so that other mapping algorithms
can select an arbitrary line to be erased instead. It also gets rid of
the erase semaphore since it creates jittering for user writes.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:24:53 -06:00
Javier González c2e9f5d457 lightnvm: pblk: expose max sec per write on sysfs
Allow to configure the number of maximum sectors per write command
through sysfs. This makes it easier to tune write command sizes for
different controller configurations.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:24:53 -06:00
Javier González db7ada33cd lightnvm: pblk: add debug stat for read cache hits
Add a new debug counter to measure cache hits on the read path

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:24:53 -06:00
Javier González caa69fa560 lightnvm: pblk: spare double cpu_to_le64 calc.
Spare a double calculation on the fast write path.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:24:53 -06:00
Javier González 3e505afb45 lightnvm: re-convert ppa format on I/O failure
In case of a failure when submitting a request, convert the ppa_list
addresses to the target format so that it can interpret ppas for
recovery

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-26 16:24:53 -06:00
NeilBrown b25d52379a lightnvm/pblk-read: use bio_clone_fast()
pblk_submit_read() uses bio_clone_bioset() but doesn't change the
io_vec, so bio_clone_fast() is a better choice.

It also uses fs_bio_set which is intended for filesystems.  Using it
in a device driver can deadlock.
So allocate a new bioset, and and use bio_clone_fast().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 12:40:59 -06:00
NeilBrown af67c31fba blk: remove bio_set arg from blk_queue_split()
blk_queue_split() is always called with the last arg being q->bio_split,
where 'q' is the first arg.

Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses
q->bio_split.

This is inconsistent and unnecessary.  Remove the last arg and always use
q->bio_split inside blk_queue_split()

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed)
Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 12:40:59 -06:00
Christoph Hellwig 4e4cbee93d block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Javier González 507f7d68fe lightnvm: fix bad back free on error path
Free memory correctly when an allocation fails on a loop and we free
backwards previously successful allocations.

Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-04 07:53:04 -06:00
Wei Yongjun 5136a4fd58 lightnvm: fix possible memory leak in pblk_bb_discovery()
'blks' is malloced in pblk_bb_discovery() and should be freed
before leaving from the nvm_get_tgt_bb_tbl() error handling cases,
otherwise it will cause memory leak. Also skip assign blks to
rlun->bb_list when error.

Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-25 10:44:29 -06:00
Javier González a44f53faf4 lightnvm: pblk: fix erase counters on error fail
When block erases fail, these blocks are marked bad. The number of valid
blocks in the line was not updated, which could cause an infinite loop
on the erase path.

Fix this atomic counter and, in order to avoid taking an irq lock on the
interrupt context, make the erase counters atomic too.

Also, in the case that a significant number of blocks become bad in a
line, the result is the double shared metadata buffer (emeta) to stop
the pipeline until all metadata is flushed to the media. Increase the
number of metadata lines from 2 to 4 to avoid this case.

Fixes: a4bd217b43 "lightnvm: physical block device (pblk) target"

Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-23 16:57:52 -06:00
Javier González be388d9fbd lightnvm: pblk: free metadata on line alloc failure
When a line allocation fails, for example, due to having too many bad
blocks, free its metadata correctly.

Fixes: a4bd217b43 "lightnvm: physical block device (pblk) target"

Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-23 16:57:52 -06:00
Javier González 33db9fd46e lightnvm: pblk: fix memory leak on error path
When write recovery fails, Free memory for the recovery structure.

Fixes: a4bd217b43 "lightnvm: physical block device (pblk) target"

Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-23 16:57:52 -06:00
Javier González f3236cef5a lightnvm: pblk: fix bad error check
Fix bad error check

Fixes: a4bd217b43 "lightnvm: physical block device (pblk) target"

Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-23 16:57:52 -06:00
Javier González 3dc001f343 lightnvm: pblk: fix race condition on line retry
When a pblk line fails (or is recovered), make sure to take the line
management lock.

Fixes: a4bd217b43 "lightnvm: physical block device (pblk) target"

Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-23 16:57:52 -06:00
Dan Carpenter 659226eb63 lightnvm: don't print a warning for ADDR_EMPTY
Reading from ADDR_EMPTY is out of bounds.  The current code generates a
static checker warning because we check for out of bounds "lba" before
we check for ADDR_EMPTY, so the second check is always false.  It looks
like we intended ADDR_EMPTY to be a no-op without printing a warning.

Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-21 16:49:56 -06:00
Dan Carpenter 5bf1e1ee62 lightnvm: potential underflow in pblk_read_rq()
This is a static checker fix, and perhaps not a real bug.  The static
checker thinks that nr_secs could be negative.  It would result in
zeroing more memory than intended.  Anyway, even if it's not a bug,
changing this variable to unsigned makes the code easier to audit.

Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-21 16:48:40 -06:00
Rakesh Pandit 8d77bb8276 lightnvm: propagate pblk_init return to userspace
From userspace calling ioctl(NVM_DEV_CREATE) was returning ENOMEM for
invalid arguments even though pblk (pblk_init) was returning correctly
-EINVAL to nvm_create_tgt inside core.  This patch propagates the
correct return value to userspace.

Because pblk was introduced recently this only needs to go in 4.12.

Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-21 12:40:41 -06:00
Rakesh Pandit 75ba4ada82 ligtnvm: fix double blk_put_queue on same queue
On an error path in NVM_DEV_CREATE ioctl blk_put_queue is being called
twice: one via blk_cleanup_queue and another via put_disk.  Straight fix
seems to remove queue pointer so that disk_release never ends up caling
blk_put_queue again.

  [  391.808827] WARNING: CPU: 1 PID: 1250 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
  [  391.808830] refcount_t: underflow; use-after-free.
  [ 391.808832] Modules linked in: nf_conntrack_netbios_ns............
  [  391.809052] CPU: 1 PID: 1250 Comm: nvme Not tainted.........
  [  391.809057] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
             BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
  [  391.809060] Call Trace:
  [  391.809079]  dump_stack+0x63/0x86
  [  391.809094]  __warn+0xcb/0xf0
  [  391.809103]  warn_slowpath_fmt+0x5f/0x80
  [  391.809118]  refcount_sub_and_test+0x70/0x80
  [  391.809125]  refcount_dec_and_test+0x11/0x20
  [  391.809136]  kobject_put+0x1f/0x60
  [  391.809149]  blk_put_queue+0x15/0x20
  [  391.809159]  disk_release+0xae/0xf0
  [  391.809172]  device_release+0x32/0x90
  [  391.809184]  kobject_release+0x6a/0x170
  [  391.809196]  kobject_put+0x2f/0x60
  [  391.809206]  put_disk+0x17/0x20
  [  391.809219]  nvm_ioctl_dev_create.isra.16+0x897/0xa30
  [  391.809236]  nvm_ctl_ioctl+0x23c/0x4c0
  [  391.809248]  do_vfs_ioctl+0xa3/0x5f0
  [  391.809258]  SyS_ioctl+0x79/0x90
  [  391.809271]  entry_SYSCALL_64_fastpath+0x1a/0xa9
  [  391.809280] RIP: 0033:0x7f5d3ef363c7
  [  391.809286] RSP: 002b:00007ffc72ed8d78 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
  [  391.809296] RAX: ffffffffffffffda RBX: 00007ffc72edb552 RCX: 00007f5d3ef363c7
  [  391.809301] RDX: 00007ffc72ed8d90 RSI: 0000000040804c22 RDI: 0000000000000003
  [  391.809306] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000001
  [  391.809311] R10: 000000000000053f R11: 0000000000000206 R12: 0000000000000000
  [  391.809316] R13: 0000000000000000 R14: 00007ffc72edb58d R15: 00007ffc72edb581

Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Fixes: 7d1ef2f408 "lightnvm: fix cleanup order of disk on init error"
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 08:17:47 -06:00
Arnd Bergmann ef697902a1 lightnvm: assume 64-bit lba numbers
The driver uses both u64 and sector_t to refer to offsets, and assigns between the
two. This causes one harmless warning when sector_t is 32-bit:

drivers/lightnvm/pblk-rb.c: In function 'pblk_rb_write_entry_gc':
include/linux/lightnvm.h:215:20: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
drivers/lightnvm/pblk-rb.c:324:22: note: in expansion of macro 'ADDR_EMPTY'

As the driver is already doing this inconsistently, changing the type
won't make it worse and is an easy way to avoid the warning.

Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19 12:07:28 -06:00
Dan Carpenter 1c6286f263 lightnvm: fix some error code in pblk-init.c
There were a bunch of places in pblk_lines_init() where we didn't set an
error code.  And in pblk_writer_init() we accidentally return 1 instead
of a correct error code, which would result in a Oops later.

Fixes: 11a5d6fdf919 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:34 -06:00
Dan Carpenter 2a79efd833 lightnvm: fix some WARN() messages
WARN_ON() takes a condition, not an error message.  I slightly tweaked
some conditions so hopefully it's more clear.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:34 -06:00
Dan Carpenter 503ec94eca lightnvm: pblk-gc: fix an error pointer dereference in init
These labels are reversed so we could end up dereferencing an error
pointer or leaking.

Fixes: 7f347ba6bb3a ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:34 -06:00
Javier González a4bd217b43 lightnvm: physical block device (pblk) target
This patch introduces pblk, a host-side translation layer for
Open-Channel SSDs to expose them like block devices. The translation
layer allows data placement decisions, and I/O scheduling to be
managed by the host, enabling users to optimize the SSD for their
specific workloads.

An open-channel SSD has a set of LUNs (parallel units) and a
collection of blocks. Each block can be read in any order, but
writes must be sequential. Writes may also fail, and if a block
requires it, must also be reset before new writes can be
applied.

To manage the constraints, pblk maintains a logical to
physical address (L2P) table,  write cache, garbage
collection logic, recovery scheme, and logic to rate-limit
user I/Os versus garbage collection I/Os.

The L2P table is fully-associative and manages sectors at a
4KB granularity. Pblk stores the L2P table in two places, in
the out-of-band area of the media and on the last page of a
line. In the cause of a power failure, pblk will perform a
scan to recover the L2P table.

The user data is organized into lines. A line is data
striped across blocks and LUNs. The lines enable the host to
reduce the amount of metadata to maintain besides the user
data and makes it easier to implement RAID or erasure coding
in the future.

pblk implements multi-tenant support and can be instantiated
multiple times on the same drive. Each instance owns a
portion of the SSD - both regarding I/O bandwidth and
capacity - providing I/O isolation for each case.

Finally, pblk also exposes a sysfs interface that allows
user-space to peek into the internals of pblk. The interface
is available at /dev/block/*/pblk/ where * is the block
device name exposed.

This work also contains contributions from:
  Matias Bjørling <matias@cnexlabs.com>
  Simon A. F. Lund <slund@cnexlabs.com>
  Young Tack Jin <youngtack.jin@gmail.com>
  Huaicheng Li <huaicheng@cs.uchicago.edu>

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:33 -06:00
Javier González 6eb082452d lightnvm: convert sprintf into strlcpy
Convert sprintf calls to strlcpy in order to make possible buffer
overflow more obvious.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González d788c59be5 lightnvm: fix type checks on rrpc
sector_t is always unsigned, therefore avoid < 0 checks on it.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González 5d30f3bd55 lightnvm: clean unused variable
Clean unused variable on lightnvm core.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González 46b160ceda lightnvm: make nvm_free static
Prefix the nvm_free static function with a missing static keyword.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González 4af3f75d79 lightnvm: allow to init targets on factory mode
Target initialization has two responsibilities: creating the target
partition and instantiating the target. This patch enables to create a
factory partition (e.g., do not trigger recovery on the given target).
This is useful for target development and for being able to restore the
device state at any moment in time without requiring a full-device
erase.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González 7d1ef2f408 lightnvm: fix cleanup order of disk on init error
Reorder disk allocation such that the disk structure can be put
safely.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González edee1bdd66 lightnvm: double-clear of dev->lun_map on target init error
The dev->lun_map bits are cleared twice if an target init error occurs.
First in the target clean routine, and then next in the nvm_tgt_create
error function. Make sure that it is only cleared once by extending
nvm_remove_tgt_devi() with a clear bit, such that clearing of bits can
ignored when cleaning up a successful initialized target.

Signed-off-by: Javier González <javier@cnexlabs.com>
Fix style.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
NeilBrown b0e0306ce1 lightnvm: don't check for failure from mempool_alloc()
mempool_alloc() cannot fail if the gfp flags allow it to
sleep, and both GFP_KERNEL and GFP_NOIO allows for sleeping.

So rrpc_move_valid_pages() and rrpc_make_rq() don't need to
test the return value.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González 7a3de2b33f lightnvm: free reverse device map
Free the reverse mapping table correctly on target tear down

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Javier González 17912c49ed lightnvm: submit erases using the I/O path
Until now erases have been submitted as synchronous commands through a
dedicated erase function. In order to enable targets implementing
asynchronous erases, refactor the erase path so that it uses the normal
async I/O submission functions. If a target requires sync I/O, it can
implement it internally. Also, adapt rrpc to use the new erase path.

Signed-off-by: Javier González <javier@cnexlabs.com>
Fixed spelling error.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Christophe JAILLET 654a01b788 lightnvm: Fix error handling
According to error handling in this function, it is likely that going to
'out' was expected here.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-16 10:06:25 -06:00
Matias Bjørling 6732c74010 lightnvm: set default lun range when no luns are specified
The create target ioctl takes a lun begin and lun end parameter, which
defines the range of luns to initialize a target with. If the user does
not set the parameters, it default to only using lun 0. Instead,
defaults to use all luns in the OCSSD, as it is the usual behaviour
users want.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-15 08:27:21 -07:00
Matias Bjørling 0e5ffd1cb5 lightnvm: fix off-by-one error on target initialization
If one specifies the end lun id to be the absolute number of luns,
without taking zero indexing into account, the lightnvm core will pass
the off-by-one end lun id to target creation, which then panics during
nvm_ioctl_dev_create.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-15 08:27:19 -07:00
Javier González 9a69b0ed62 lightnvm: allow targets to use sysfs
In order to register through the sysfs interface, a driver needs to know
its kobject. On a disk structure, this happens when the partition
information is added (device_add_disk), which for lightnvm takes place
after the target has been initialized. This means that on target
initialization, the kboject has not been created yet.

This patch adds a target function to let targets initialize their own
kboject as a child of the disk kobject.

Signed-off-by: Javier González <javier@cnexlabs.com>
Added exit typedef and passed gendisk instead of void pointer for exit.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Javier González deccf5a52e lightnvm: free properly on target creation error
Fix a memory leak when target creation fails. More specifically, free
the entire device structure given to the target (tgt_dev).

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Matias Bjørling 06894efea7 lightnvm: use end_io callback instead of instance
When the lightnvm core had the "gennvm" layer between the device and the
target, there was a need for the core to be able to figure out which
target it should send an end_io callback to. Leading to a "double"
end_io, first for the media manager instance, and then for the target
instance. Now that core and gennvm is merged, there is no longer a need
for this, and a single end_io callback will do.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Matias Bjørling 19bd6fe73c lightnvm: reduce number of nvm_id groups to one
The number of configuration groups has been limited to one in current
code, even if there is support for up to four. With the introduction
of the open-channel SSD 1.3 specification, only a single
group is exposed onwards. Reflect this in the nvm_id structure.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Matias Bjørling dab8ee9e8a lightnvm: cleanup nvm transformation functions
Going from target specific ppa addresses to device was accomplished by
first converting target to generic ppa addresses and generic to device
addresses. The conversion was either open-coded or used the built-in
nvm_trans_* and nvm_map_* functions for conversion. Simplify the
interface and cleanup the calls to provide clean functions that now
either take a list of ppas or a nvm_rq, and is exposed through:

 void nvm_ppa_* - target to/from device with a list of PPAs,
 void nvm_rq_* - target to/from device with a nvm_rq.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Matias Bjørling 61a561d8d7 lightnvm: make nvm_map_* return void
The only check there was done was a debugging check. Remove it and
replace the return value with void to reduce error checking.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Matias Bjørling 8f4fe008fb lightnvm: remove nvm_get_bb_tbl and nvm_set_bb_tbl
Since the merge of gennvm and core, there is no longer a need for the
device specific bad block functions.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Matias Bjørling 583b7058b2 lightnvm: remove nvm_submit_ppa* functions
The nvm_submit_ppa* functions are no longer needed after gennvm and core
have been merged.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Matias Bjørling 10995c3dc9 lightnvm: collapse nvm_erase_ppa and nvm_erase_blk
After gennvm and core have been merged, there are no more callers to
nvm_erase_ppa. Therefore collapse the device specific and target
specific erase functions.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Matias Bjørling ade69e2432 lightnvm: merge gennvm with core
For the first iteration of Open-Channel SSDs, it was anticipated that
there could be various media managers on top of an open-channel SSD,
such to allow vendors to plug in their own host-side FTLs, without the
media manager in between.

Now that an Open-Channel SSD is exposed as a traditional block device,
there is no longer a need for this. Therefore lets merge the gennvm code
with core and simplify the stack.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
Linus Torvalds b78b499a67 Char/Misc driver patches for 4.10-rc1
Here's the big char/misc driver patches for 4.10-rc1.  Lots of tiny
 changes over lots of "minor" driver subsystems, the largest being some
 new FPGA drivers.  Other than that, a few other new drivers, but no new
 driver subsystems added for this kernel cycle, a nice change.
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWFAtwA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykyCgCeJn36u1AsBi7qZ3u/1hwD8k56s2IAnRo6U31r
 WW65YcNTK7qYXqNbfgIa
 =/t/V
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here's the big char/misc driver patches for 4.10-rc1. Lots of tiny
  changes over lots of "minor" driver subsystems, the largest being some
  new FPGA drivers. Other than that, a few other new drivers, but no new
  driver subsystems added for this kernel cycle, a nice change.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (107 commits)
  uio-hv-generic: store physical addresses instead of virtual
  Tools: hv: kvp: configurable external scripts path
  uio-hv-generic: new userspace i/o driver for VMBus
  vmbus: add support for dynamic device id's
  hv: change clockevents unbind tactics
  hv: acquire vmbus_connection.channel_mutex in vmbus_free_channels()
  hyperv: Fix spelling of HV_UNKOWN
  mei: bus: enable non-blocking RX
  mei: fix the back to back interrupt handling
  mei: synchronize irq before initiating a reset.
  VME: Remove shutdown entry from vme_driver
  auxdisplay: ht16k33: select framebuffer helper modules
  MAINTAINERS: add git url for fpga
  fpga: Clarify how write_init works streaming modes
  fpga zynq: Fix incorrect ISR state on bootup
  fpga zynq: Remove priv->dev
  fpga zynq: Add missing \n to messages
  fpga: Add COMPILE_TEST to all drivers
  uio: pruss: add clk_disable()
  char/pcmcia: add some error checking in scr24x_read()
  ...
2016-12-13 12:11:01 -08:00
Javier González 333ba053d1 lightnvm: transform target get/set bad block
Since targets are given a virtual target device, it is necessary to
translate all communication between targets and the backend device.
Implement the translation layer for get/set bad block table.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González da2d7cb828 lightnvm: use target nvm on target-specific ops.
On target-specific operations pass on nvm_tgt_dev instead of the generic
nvm device.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González a279006afa lightnvm: introduce max_phys_sects helper function
Target devices do not have access to the device driver operations.
Introduce a helper function that exposes the max. number of physical
sectors supported by the underlying device.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 959e911b31 lightnvm: introduce helpers for generic ops in rrpc
Avoid calling media manager and device-specific operations directly from
rrpc. Create helper functions on lightnvm's core instead.

Signed-off-by: Javier González <javier@cnexlabs.com>

Made it work with null_blk as well.
Signed-off-by: Matias Bjørling <m@bjorling.me>

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 8e53624d44 lightnvm: eliminate nvm_lun abstraction in mm
In order to naturally support multi-target instances on an Open-Channel
SSD, targets should own the LUNs they get blocks from and manage
provisioning internally. This is done in several steps.

Since targets own the LUNs the are instantiated on top of and manage the
free block list internally, there is no need for a LUN abstraction in
the media manager. LUNs are intrinsically managed as in the physical
layout (ch:0,lun:0, ..., ch:0,lun:n, ch:1,lun:0, ch:1,lun:n, ...,
ch:m,lun:0, ch:m,lun:n) and given to the targets based on the target
creation ioctl. This simplifies LUN management and clears the path for a
partition manager to sit directly underneath LightNVM targets.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 2a02e627c2 lightnvm: eliminate nvm_block abstraction on mm
In order to naturally support multi-target instances on an Open-Channel
SSD, targets should own the LUNs they get blocks from and manage
provisioning internally. This is done in several steps.

A part of this transformation is that targets manage their blocks
internally. This patch eliminates the nvm_block abstraction and moves
block management to the target logic. The rrpc target is transformed.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González eec44565e9 lightnvm: remove debug lun statistics from gennvm
Since LUNs are managed internally on targets, the media manager has no
access to the free LUN lists. Thus, debug functions that show LUN
information on the device should not be implemented on the media
manager, but rather on the target in itself.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 0ac4072eb1 lightnvm: remove get_lun operation on gennvm
Since LUNs are managed internally on the target, there is no need for
the media manager to implement a get_lun operation.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 8e79b5cb1d lightnvm: move block provisioning to targets
In order to naturally support multi-target instances on an Open-Channel
SSD, targets should own the LUNs they get blocks from and manage
provisioning internally. This is done in several steps.

This patch moves the block provisioning inside of the target and removes
the get/put block interface from the media manager.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 8176117b82 lightnvm: manage lun partitions internally in mm
LUNs are exclusively owned by targets implementing a block device FTL.
Doing this reservation requires at the moment a 2-way callback gennvm
<-> target. The reason behind this is that LUNs were not assumed to
always be exclusively owned by targets. However, this design decision
goes against I/O determinism QoS (two targets would mix I/O on the same
parallel unit in the device).

This patch makes LUN reservation as part of the target creation on the
media manager. This makes that LUNs are always exclusively owned by the
target instantiated on top of them. LUN stripping and/or sharing should
be implemented on the target itself or the layers on top.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González de93434fcf lightnvm: remove gen_lun abstraction
The gen_lun abstraction in the generic media manager was conceived on
the assumption that a single target would instantiated on top of it.
This has complicated target design to implement multi-instances. Remove
this abstraction and move its logic to nvm_lun, which manages physical
lun geometry and operations.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 98379a12c5 lightnvm: use constant name instead of value
There is a constant to refer to free blocks. Use it when marking bad
blocks instead of using a constant value

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González eb00352b52 lightnvm: remove unnecessary variables in rrpc
Before vectored I/Os were supported on rrpc, the physical address was
stored as part of the nvm_rqd request. This variable become obsolete
when the ppa_list was introduced. Cleanup this variable.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 0e5c3246db lightnvm: make address conversion functions global
Targets are assumed to used the same generic ppa format, where the
address is partitioned on ch:lun:block:pg:pl:sec. Thus, make the
function in charge of transforming the ppa address from a linear format
to the generic one available to all targets.

This function will be needed by the media manager in order to do target
mapping translations when targets are divided on different physical
partitions.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 17b25cfc87 lightnvm: remove sysfs configuration interface
LightNVM used to be managed and configured through sysfs. Since the
introduction of management ioctls this interface is redundant and
outdated. Get rid of it.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González f0b01b6a61 lightnvm: rrpc: split bios of size > 256kb
rrpc cannot handle bios of size > 256kb due to NVMe using a 64 bit
bitmap to signal I/O completion. If a larger bio comes, split it
explicitly.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González a24ba4644b lightnvm: export set bad block table
Bad blocks should be managed by block owners. This would be either
targets for data blocks or sysblk for system blocks.

In order to support this, export two functions: One to mark a block as
an specific type (e.g., bad block) and another to update the bad block
table on the device.

Move bad block management to rrpc.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00
Javier González 8a3c95ab38 lightnvm: do not protect block 0
Device blocks should be marked by the device and considered as bad
blocks by the media manager. Thus, do not make assumptions on which
blocks are going to be used by the device. In doing so we might lose
valid blocks from the free list.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-29 12:12:51 -07:00