Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block

Pull core block layer updates from Jens Axboe:
 "This is the main pull request for block storage for 4.15-rc1.

  Nothing out of the ordinary in here, and no API changes or anything
  like that. Just various new features for drivers, core changes, etc.
  In particular, this pull request contains:

   - A patch series from Bart, closing the hole on blk/scsi-mq queue
     quiescing.

   - A series from Christoph, building towards hidden gendisks (for
     multipath) and ability to move bio chains around.

   - NVMe
        - Support for native multipath for NVMe (Christoph).
        - Userspace notifications for AENs (Keith).
        - Command side-effects support (Keith).
        - SGL support (Chaitanya Kulkarni)
        - FC fixes and improvements (James Smart)
        - Lots of fixes and tweaks (Various)

   - bcache
        - New maintainer (Michael Lyle)
        - Writeback control improvements (Michael)
        - Various fixes (Coly, Elena, Eric, Liang, et al)

   - lightnvm updates, mostly centered around the pblk interface
     (Javier, Hans, and Rakesh).

   - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)

   - Writeback series that fix the much discussed hundreds of millions
     of sync-all units. This goes all the way, as discussed previously
     (me).

   - Fix for missing wakeup on writeback timer adjustments (Yafang
     Shao).

   - Fix laptop mode on blk-mq (me).

   - {mq,name} tuple lookup for IO schedulers, allowing us to have
     alias names. This means you can use 'deadline' on both !mq and on
     mq (where it's called mq-deadline). (me).

   - blktrace race fix, oopsing on sg load (me).

   - blk-mq optimizations (me).

   - Obscure waitqueue race fix for kyber (Omar).

   - NBD fixes (Josef).

   - Disable writeback throttling by default on bfq, like we do on cfq
     (Luca Miccio).

   - Series from Ming that enable us to treat flush requests on blk-mq
     like any other request. This is a really nice cleanup.

   - Series from Ming that improves merging on blk-mq with schedulers,
     getting us closer to flipping the switch on scsi-mq again.

   - BFQ updates (Paolo).

   - blk-mq atomic flags memory ordering fixes (Peter Z).

   - Loop cgroup support (Shaohua).

   - Lots of minor fixes from lots of different folks, both for core and
     driver code"
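
The {mq,name} lookup mentioned in the list above can be pictured with a minimal userspace sketch. It is not the kernel's elevator code: the struct, the table contents and the function name sched_find() are all hypothetical, chosen only to show how matching on the pair (queue type, name or alias) lets 'deadline' resolve to a different scheduler on mq and !mq queues.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an I/O scheduler registration. */
struct sched_entry {
    const char *name;   /* canonical name, e.g. "mq-deadline" */
    const char *alias;  /* optional alias, e.g. "deadline"; NULL if none */
    bool uses_mq;       /* true if the scheduler serves blk-mq queues */
};

/* Illustrative table only; the real set of schedulers is not modelled. */
static const struct sched_entry sched_table[] = {
    { "deadline",    NULL,       false },
    { "cfq",         NULL,       false },
    { "mq-deadline", "deadline", true  },
    { "kyber",       NULL,       true  },
};

/*
 * Match on the {mq, name} tuple: the queue type must agree, and the
 * requested string may be either the canonical name or the alias.
 */
static const struct sched_entry *sched_find(bool mq, const char *name)
{
    for (size_t i = 0; i < sizeof(sched_table) / sizeof(sched_table[0]); i++) {
        const struct sched_entry *e = &sched_table[i];

        if (e->uses_mq != mq)
            continue;
        if (strcmp(e->name, name) == 0 ||
            (e->alias && strcmp(e->alias, name) == 0))
            return e;
    }
    return NULL;
}
```

With this table, sched_find(true, "deadline") returns the "mq-deadline" entry, while sched_find(false, "deadline") returns the legacy one.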

* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
  nvme: fix visibility of "uuid" ns attribute
  blk-mq: fixup some comment typos and lengths
  ide: ide-atapi: fix compile error with defining macro DEBUG
  blk-mq: improve tag waiting setup for non-shared tags
  brd: remove unused brd_mutex
  blk-mq: only run the hardware queue if IO is pending
  block: avoid null pointer dereference on null disk
  fs: guard_bio_eod() needs to consider partitions
  xtensa/simdisk: fix compile error
  nvme: expose subsys attribute to sysfs
  nvme: create 'slaves' and 'holders' entries for hidden controllers
  block: create 'slaves' and 'holders' entries for hidden gendisks
  nvme: also expose the namespace identification sysfs files for mpath nodes
  nvme: implement multipath access to nvme subsystems
  nvme: track shared namespaces
  nvme: introduce a nvme_ns_ids structure
  nvme: track subsystems
  block, nvme: Introduce blk_mq_req_flags_t
  block, scsi: Make SCSI quiesce and resume work reliably
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  ...
Linus Torvalds 2017-11-14 15:32:19 -08:00
commit e2c5923c34
131 changed files with 5485 additions and 3104 deletions


@ -1,5 +0,0 @@
What: /proc/sys/vm/nr_pdflush_threads
Date: June 2012
Contact: Wanpeng Li <liwp@linux.vnet.ibm.com>
Description: Since pdflush is replaced by per-BDI flusher, the interface of old pdflush
exported in /proc/sys/vm/ should be removed.


@ -216,10 +216,9 @@ may need to abort DMA operations and revert to PIO for the transfer, in
which case a virtual mapping of the page is required. For SCSI it is also which case a virtual mapping of the page is required. For SCSI it is also
done in some scenarios where the low level driver cannot be trusted to done in some scenarios where the low level driver cannot be trusted to
handle a single sg entry correctly. The driver is expected to perform the handle a single sg entry correctly. The driver is expected to perform the
kmaps as needed on such occasions using the __bio_kmap_atomic and bio_kmap_irq kmaps as needed on such occasions as appropriate. A driver could also use
routines as appropriate. A driver could also use the blk_queue_bounce() the blk_queue_bounce() routine on its own to bounce highmem i/o to low
routine on its own to bounce highmem i/o to low memory for specific requests memory for specific requests if so desired.
if so desired.
iii. The i/o scheduler algorithm itself can be replaced/set as appropriate iii. The i/o scheduler algorithm itself can be replaced/set as appropriate
@ -1137,8 +1136,8 @@ use dma_map_sg for scatter gather) to be able to ship it to the driver. For
PIO drivers (or drivers that need to revert to PIO transfer once in a PIO drivers (or drivers that need to revert to PIO transfer once in a
while (IDE for example)), where the CPU is doing the actual data while (IDE for example)), where the CPU is doing the actual data
transfer a virtual mapping is needed. If the driver supports highmem I/O, transfer a virtual mapping is needed. If the driver supports highmem I/O,
(Sec 1.1, (ii) ) it needs to use __bio_kmap_atomic and bio_kmap_irq to (Sec 1.1, (ii) ) it needs to use kmap_atomic or similar to temporarily map
temporarily map a bio into the virtual address space. a bio into the virtual address space.
8. Prior/Related/Impacted patches 8. Prior/Related/Impacted patches


@ -38,7 +38,7 @@ gb=[Size in GB]: Default: 250GB
bs=[Block size (in bytes)]: Default: 512 bytes bs=[Block size (in bytes)]: Default: 512 bytes
The block size reported to the system. The block size reported to the system.
nr_devices=[Number of devices]: Default: 2 nr_devices=[Number of devices]: Default: 1
Number of block devices instantiated. They are instantiated as /dev/nullb0, Number of block devices instantiated. They are instantiated as /dev/nullb0,
etc. etc.
@ -52,13 +52,13 @@ irqmode=[0-2]: Default: 1-Soft-irq
2: Timer: Waits a specific period (completion_nsec) for each IO before 2: Timer: Waits a specific period (completion_nsec) for each IO before
completion. completion.
completion_nsec=[ns]: Default: 10.000ns completion_nsec=[ns]: Default: 10,000ns
Combined with irqmode=2 (timer). The time each completion event must wait. Combined with irqmode=2 (timer). The time each completion event must wait.
submit_queues=[0..nr_cpus]: submit_queues=[1..nr_cpus]:
The number of submission queues attached to the device driver. If unset, it The number of submission queues attached to the device driver. If unset, it
defaults to 1 on single-queue and bio-based instances. For multi-queue, defaults to 1. For multi-queue, it is ignored when use_per_node_hctx module
it is ignored when use_per_node_hctx module parameter is 1. parameter is 1.
hw_queue_depth=[0..qdepth]: Default: 64 hw_queue_depth=[0..qdepth]: Default: 64
The hardware queue depth of the device. The hardware queue depth of the device.
@ -73,3 +73,12 @@ use_per_node_hctx=[0/1]: Default: 0
use_lightnvm=[0/1]: Default: 0 use_lightnvm=[0/1]: Default: 0
Register device with LightNVM. Requires blk-mq and CONFIG_NVM to be enabled. Register device with LightNVM. Requires blk-mq and CONFIG_NVM to be enabled.
no_sched=[0/1]: Default: 0
0: nullb* use default blk-mq io scheduler.
1: nullb* doesn't use io scheduler.
shared_tags=[0/1]: Default: 0
0: Tag set is not shared.
1: Tag set shared between devices for blk-mq. Only makes sense with
nr_devices > 1, otherwise there's no tag set to share.


@ -2562,10 +2562,12 @@ S: Maintained
F: drivers/net/hamradio/baycom* F: drivers/net/hamradio/baycom*
BCACHE (BLOCK LAYER CACHE) BCACHE (BLOCK LAYER CACHE)
M: Michael Lyle <mlyle@lyle.org>
M: Kent Overstreet <kent.overstreet@gmail.com> M: Kent Overstreet <kent.overstreet@gmail.com>
L: linux-bcache@vger.kernel.org L: linux-bcache@vger.kernel.org
W: http://bcache.evilpiepirate.org W: http://bcache.evilpiepirate.org
S: Orphan C: irc://irc.oftc.net/bcache
S: Maintained
F: drivers/md/bcache/ F: drivers/md/bcache/
BDISP ST MEDIA DRIVER BDISP ST MEDIA DRIVER
@ -12085,7 +12087,6 @@ F: drivers/mmc/host/sdhci-omap.c
SECURE ENCRYPTING DEVICE (SED) OPAL DRIVER SECURE ENCRYPTING DEVICE (SED) OPAL DRIVER
M: Scott Bauer <scott.bauer@intel.com> M: Scott Bauer <scott.bauer@intel.com>
M: Jonathan Derrick <jonathan.derrick@intel.com> M: Jonathan Derrick <jonathan.derrick@intel.com>
M: Rafael Antognolli <rafael.antognolli@intel.com>
L: linux-block@vger.kernel.org L: linux-block@vger.kernel.org
S: Supported S: Supported
F: block/sed* F: block/sed*


@ -110,13 +110,13 @@ static blk_qc_t simdisk_make_request(struct request_queue *q, struct bio *bio)
sector_t sector = bio->bi_iter.bi_sector; sector_t sector = bio->bi_iter.bi_sector;
bio_for_each_segment(bvec, bio, iter) { bio_for_each_segment(bvec, bio, iter) {
char *buffer = __bio_kmap_atomic(bio, iter); char *buffer = kmap_atomic(bvec.bv_page) + bvec.bv_offset;
unsigned len = bvec.bv_len >> SECTOR_SHIFT; unsigned len = bvec.bv_len >> SECTOR_SHIFT;
simdisk_transfer(dev, sector, len, buffer, simdisk_transfer(dev, sector, len, buffer,
bio_data_dir(bio) == WRITE); bio_data_dir(bio) == WRITE);
sector += len; sector += len;
__bio_kunmap_atomic(buffer); kunmap_atomic(buffer);
} }
bio_endio(bio); bio_endio(bio);


@ -108,6 +108,7 @@
#include "blk-mq-tag.h" #include "blk-mq-tag.h"
#include "blk-mq-sched.h" #include "blk-mq-sched.h"
#include "bfq-iosched.h" #include "bfq-iosched.h"
#include "blk-wbt.h"
#define BFQ_BFQQ_FNS(name) \ #define BFQ_BFQQ_FNS(name) \
void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \ void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \
@ -724,6 +725,44 @@ static void bfq_updated_next_req(struct bfq_data *bfqd,
} }
} }
static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
{
u64 dur;
if (bfqd->bfq_wr_max_time > 0)
return bfqd->bfq_wr_max_time;
dur = bfqd->RT_prod;
do_div(dur, bfqd->peak_rate);
/*
* Limit duration between 3 and 13 seconds. Tests show that
* higher values than 13 seconds often yield the opposite of
* the desired result, i.e., worsen responsiveness by letting
* non-interactive and non-soft-real-time applications
* preserve weight raising for a too long time interval.
*
* On the other end, lower values than 3 seconds make it
* difficult for most interactive tasks to complete their jobs
* before weight-raising finishes.
*/
if (dur > msecs_to_jiffies(13000))
dur = msecs_to_jiffies(13000);
else if (dur < msecs_to_jiffies(3000))
dur = msecs_to_jiffies(3000);
return dur;
}
/* switch back from soft real-time to interactive weight raising */
static void switch_back_to_interactive_wr(struct bfq_queue *bfqq,
struct bfq_data *bfqd)
{
bfqq->wr_coeff = bfqd->bfq_wr_coeff;
bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
bfqq->last_wr_start_finish = bfqq->wr_start_at_switch_to_srt;
}
static void static void
bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd, bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
struct bfq_io_cq *bic, bool bfq_already_existing) struct bfq_io_cq *bic, bool bfq_already_existing)
@ -750,10 +789,16 @@ bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
if (bfqq->wr_coeff > 1 && (bfq_bfqq_in_large_burst(bfqq) || if (bfqq->wr_coeff > 1 && (bfq_bfqq_in_large_burst(bfqq) ||
time_is_before_jiffies(bfqq->last_wr_start_finish + time_is_before_jiffies(bfqq->last_wr_start_finish +
bfqq->wr_cur_max_time))) { bfqq->wr_cur_max_time))) {
bfq_log_bfqq(bfqq->bfqd, bfqq, if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time &&
"resume state: switching off wr"); !bfq_bfqq_in_large_burst(bfqq) &&
time_is_after_eq_jiffies(bfqq->wr_start_at_switch_to_srt +
bfqq->wr_coeff = 1; bfq_wr_duration(bfqd))) {
switch_back_to_interactive_wr(bfqq, bfqd);
} else {
bfqq->wr_coeff = 1;
bfq_log_bfqq(bfqq->bfqd, bfqq,
"resume state: switching off wr");
}
} }
/* make sure weight will be updated, however we got here */ /* make sure weight will be updated, however we got here */
@ -1173,33 +1218,22 @@ static bool bfq_bfqq_update_budg_for_activation(struct bfq_data *bfqd,
return wr_or_deserves_wr; return wr_or_deserves_wr;
} }
static unsigned int bfq_wr_duration(struct bfq_data *bfqd) /*
* Return the farthest future time instant according to jiffies
* macros.
*/
static unsigned long bfq_greatest_from_now(void)
{ {
u64 dur; return jiffies + MAX_JIFFY_OFFSET;
}
if (bfqd->bfq_wr_max_time > 0) /*
return bfqd->bfq_wr_max_time; * Return the farthest past time instant according to jiffies
* macros.
dur = bfqd->RT_prod; */
do_div(dur, bfqd->peak_rate); static unsigned long bfq_smallest_from_now(void)
{
/* return jiffies - MAX_JIFFY_OFFSET;
* Limit duration between 3 and 13 seconds. Tests show that
* higher values than 13 seconds often yield the opposite of
* the desired result, i.e., worsen responsiveness by letting
* non-interactive and non-soft-real-time applications
* preserve weight raising for a too long time interval.
*
* On the other end, lower values than 3 seconds make it
* difficult for most interactive tasks to complete their jobs
* before weight-raising finishes.
*/
if (dur > msecs_to_jiffies(13000))
dur = msecs_to_jiffies(13000);
else if (dur < msecs_to_jiffies(3000))
dur = msecs_to_jiffies(3000);
return dur;
} }
static void bfq_update_bfqq_wr_on_rq_arrival(struct bfq_data *bfqd, static void bfq_update_bfqq_wr_on_rq_arrival(struct bfq_data *bfqd,
@ -1216,7 +1250,19 @@ static void bfq_update_bfqq_wr_on_rq_arrival(struct bfq_data *bfqd,
bfqq->wr_coeff = bfqd->bfq_wr_coeff; bfqq->wr_coeff = bfqd->bfq_wr_coeff;
bfqq->wr_cur_max_time = bfq_wr_duration(bfqd); bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
} else { } else {
bfqq->wr_start_at_switch_to_srt = jiffies; /*
* No interactive weight raising in progress
* here: assign minus infinity to
* wr_start_at_switch_to_srt, to make sure
* that, at the end of the soft-real-time
* weight raising period that is starting
* now, no interactive weight-raising period
* may be wrongly considered as still in
* progress (and thus actually started by
* mistake).
*/
bfqq->wr_start_at_switch_to_srt =
bfq_smallest_from_now();
bfqq->wr_coeff = bfqd->bfq_wr_coeff * bfqq->wr_coeff = bfqd->bfq_wr_coeff *
BFQ_SOFTRT_WEIGHT_FACTOR; BFQ_SOFTRT_WEIGHT_FACTOR;
bfqq->wr_cur_max_time = bfqq->wr_cur_max_time =
@ -2016,10 +2062,27 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq); bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq); bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node); bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
bic->saved_wr_coeff = bfqq->wr_coeff; if (unlikely(bfq_bfqq_just_created(bfqq) &&
bic->saved_wr_start_at_switch_to_srt = bfqq->wr_start_at_switch_to_srt; !bfq_bfqq_in_large_burst(bfqq))) {
bic->saved_last_wr_start_finish = bfqq->last_wr_start_finish; /*
bic->saved_wr_cur_max_time = bfqq->wr_cur_max_time; * bfqq being merged right after being created: bfqq
* would have deserved interactive weight raising, but
* did not make it to be set in a weight-raised state,
* because of this early merge. Store directly the
* weight-raising state that would have been assigned
* to bfqq, so that bfqq does not unjustly fail
* to enjoy weight raising if split soon.
*/
bic->saved_wr_coeff = bfqq->bfqd->bfq_wr_coeff;
bic->saved_wr_cur_max_time = bfq_wr_duration(bfqq->bfqd);
bic->saved_last_wr_start_finish = jiffies;
} else {
bic->saved_wr_coeff = bfqq->wr_coeff;
bic->saved_wr_start_at_switch_to_srt =
bfqq->wr_start_at_switch_to_srt;
bic->saved_last_wr_start_finish = bfqq->last_wr_start_finish;
bic->saved_wr_cur_max_time = bfqq->wr_cur_max_time;
}
} }
static void static void
@ -2897,24 +2960,6 @@ static unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
jiffies + nsecs_to_jiffies(bfqq->bfqd->bfq_slice_idle) + 4); jiffies + nsecs_to_jiffies(bfqq->bfqd->bfq_slice_idle) + 4);
} }
/*
* Return the farthest future time instant according to jiffies
* macros.
*/
static unsigned long bfq_greatest_from_now(void)
{
return jiffies + MAX_JIFFY_OFFSET;
}
/*
* Return the farthest past time instant according to jiffies
* macros.
*/
static unsigned long bfq_smallest_from_now(void)
{
return jiffies - MAX_JIFFY_OFFSET;
}
/** /**
* bfq_bfqq_expire - expire a queue. * bfq_bfqq_expire - expire a queue.
* @bfqd: device owning the queue. * @bfqd: device owning the queue.
@ -3489,11 +3534,7 @@ static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
bfq_wr_duration(bfqd))) bfq_wr_duration(bfqd)))
bfq_bfqq_end_wr(bfqq); bfq_bfqq_end_wr(bfqq);
else { else {
/* switch back to interactive wr */ switch_back_to_interactive_wr(bfqq, bfqd);
bfqq->wr_coeff = bfqd->bfq_wr_coeff;
bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
bfqq->last_wr_start_finish =
bfqq->wr_start_at_switch_to_srt;
bfqq->entity.prio_changed = 1; bfqq->entity.prio_changed = 1;
} }
} }
@ -3685,16 +3726,37 @@ void bfq_put_queue(struct bfq_queue *bfqq)
if (bfqq->ref) if (bfqq->ref)
return; return;
if (bfq_bfqq_sync(bfqq)) if (!hlist_unhashed(&bfqq->burst_list_node)) {
/*
* The fact that this queue is being destroyed does not
* invalidate the fact that this queue may have been
* activated during the current burst. As a consequence,
* although the queue does not exist anymore, and hence
* needs to be removed from the burst list if there,
* the burst size has not to be decremented.
*/
hlist_del_init(&bfqq->burst_list_node); hlist_del_init(&bfqq->burst_list_node);
/*
* Decrement also burst size after the removal, if the
* process associated with bfqq is exiting, and thus
* does not contribute to the burst any longer. This
* decrement helps filter out false positives of large
* bursts, when some short-lived process (often due to
* the execution of commands by some service) happens
* to start and exit while a complex application is
* starting, and thus spawning several processes that
* do I/O (and that *must not* be treated as a large
* burst, see comments on bfq_handle_burst).
*
* In particular, the decrement is performed only if:
* 1) bfqq is not a merged queue, because, if it is,
* then this free of bfqq is not triggered by the exit
* of the process bfqq is associated with, but exactly
* by the fact that bfqq has just been merged.
* 2) burst_size is greater than 0, to handle
* unbalanced decrements. Unbalanced decrements may
* happen in the following case: bfqq is inserted into
* the current burst list--without incrementing
* burst_size--because of a split, but the current
* burst list is not the burst list bfqq belonged to
* (see comments on the case of a split in
* bfq_set_request).
*/
if (bfqq->bic && bfqq->bfqd->burst_size > 0)
bfqq->bfqd->burst_size--;
}
kmem_cache_free(bfq_pool, bfqq); kmem_cache_free(bfq_pool, bfqq);
#ifdef CONFIG_BFQ_GROUP_IOSCHED #ifdef CONFIG_BFQ_GROUP_IOSCHED
@ -4127,7 +4189,6 @@ static void __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
new_bfqq->allocated++; new_bfqq->allocated++;
bfqq->allocated--; bfqq->allocated--;
new_bfqq->ref++; new_bfqq->ref++;
bfq_clear_bfqq_just_created(bfqq);
/* /*
* If the bic associated with the process * If the bic associated with the process
* issuing this request still points to bfqq * issuing this request still points to bfqq
@ -4139,6 +4200,8 @@ static void __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq) if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
bfq_merge_bfqqs(bfqd, RQ_BIC(rq), bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
bfqq, new_bfqq); bfqq, new_bfqq);
bfq_clear_bfqq_just_created(bfqq);
/* /*
* rq is about to be enqueued into new_bfqq, * rq is about to be enqueued into new_bfqq,
* release rq reference on bfqq * release rq reference on bfqq
@ -4424,6 +4487,34 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
else { else {
bfq_clear_bfqq_in_large_burst(bfqq); bfq_clear_bfqq_in_large_burst(bfqq);
if (bic->was_in_burst_list) if (bic->was_in_burst_list)
/*
* If bfqq was in the current
* burst list before being
* merged, then we have to add
* it back. And we do not need
* to increase burst_size, as
* we did not decrement
* burst_size when we removed
* bfqq from the burst list as
* a consequence of a merge
* (see comments in
* bfq_put_queue). In this
* respect, it would be rather
* costly to know whether the
* current burst list is still
* the same burst list from
* which bfqq was removed on
* the merge. To avoid this
* cost, if bfqq was in a
* burst list, then we add
* bfqq to the current burst
* list without any further
* check. This can cause
* inappropriate insertions,
* but rarely enough to not
* harm the detection of large
* bursts significantly.
*/
hlist_add_head(&bfqq->burst_list_node, hlist_add_head(&bfqq->burst_list_node,
&bfqd->burst_list); &bfqd->burst_list);
} }
@ -4775,7 +4866,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
bfq_init_root_group(bfqd->root_group, bfqd); bfq_init_root_group(bfqd->root_group, bfqd);
bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group); bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
wbt_disable_default(q);
return 0; return 0;
out_free: out_free:
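
The clamping done by bfq_wr_duration() in the hunks above can be tried out in userspace with the sketch below. It is an approximation under stated assumptions, not the kernel code: SKETCH_HZ is a made-up tick rate standing in for HZ, sketch_msecs_to_ticks() is a simplified replacement for msecs_to_jiffies(), and the parameters replace the bfqd fields bfq_wr_max_time, RT_prod and peak_rate.

```c
#include <stdint.h>

/* Hypothetical tick rate standing in for the kernel's HZ. */
#define SKETCH_HZ 250u

/* Simplified userspace analogue of msecs_to_jiffies(); rounds up. */
static unsigned int sketch_msecs_to_ticks(unsigned int ms)
{
    return (unsigned int)(((uint64_t)ms * SKETCH_HZ + 999u) / 1000u);
}

/*
 * Weight-raising duration in ticks. A non-zero forced_max overrides the
 * computation; otherwise rt_prod / peak_rate is clamped to the range
 * [3 s, 13 s]. The caller must pass a non-zero peak_rate.
 */
static unsigned int sketch_wr_duration(unsigned int forced_max,
                                       uint64_t rt_prod, uint64_t peak_rate)
{
    const unsigned int lo = sketch_msecs_to_ticks(3000);
    const unsigned int hi = sketch_msecs_to_ticks(13000);
    uint64_t dur;

    if (forced_max > 0)
        return forced_max;

    dur = rt_prod / peak_rate;

    if (dur > hi)
        dur = hi;
    else if (dur < lo)
        dur = lo;

    return (unsigned int)dur;
}
```

At the assumed 250 ticks per second the bounds are 750 and 3250 ticks, so a very slow device saturates at the upper bound and a very fast one at the lower bound.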


@ -485,11 +485,8 @@ EXPORT_SYMBOL(bioset_integrity_create);
void bioset_integrity_free(struct bio_set *bs) void bioset_integrity_free(struct bio_set *bs)
{ {
if (bs->bio_integrity_pool) mempool_destroy(bs->bio_integrity_pool);
mempool_destroy(bs->bio_integrity_pool); mempool_destroy(bs->bvec_integrity_pool);
if (bs->bvec_integrity_pool)
mempool_destroy(bs->bvec_integrity_pool);
} }
EXPORT_SYMBOL(bioset_integrity_free); EXPORT_SYMBOL(bioset_integrity_free);
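
Dropping the NULL checks in the hunk above is safe because mempool_destroy() itself returns early on a NULL pool, in the same way free() accepts NULL. The userspace sketch below shows that pattern. The sketch_pool type and both functions are hypothetical stand-ins, and the boolean return value exists only to make the behaviour observable (the kernel function returns void).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a kernel mempool. */
struct sketch_pool {
    void *elements;
};

static struct sketch_pool *sketch_pool_create(size_t bytes)
{
    struct sketch_pool *pool = malloc(sizeof(*pool));

    if (!pool)
        return NULL;
    pool->elements = malloc(bytes);
    if (!pool->elements) {
        free(pool);
        return NULL;
    }
    return pool;
}

/*
 * NULL-tolerant destructor: callers need no "if (pool)" guard.
 * Returns true only if something was actually freed.
 */
static bool sketch_pool_destroy(struct sketch_pool *pool)
{
    if (!pool)
        return false;
    free(pool->elements);
    free(pool);
    return true;
}
```

Putting the check inside the destructor keeps every teardown path to a single unconditional call.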


@ -400,7 +400,7 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
/** /**
* bio_alloc_bioset - allocate a bio for I/O * bio_alloc_bioset - allocate a bio for I/O
* @gfp_mask: the GFP_ mask given to the slab allocator * @gfp_mask: the GFP_* mask given to the slab allocator
* @nr_iovecs: number of iovecs to pre-allocate * @nr_iovecs: number of iovecs to pre-allocate
* @bs: the bio_set to allocate from. * @bs: the bio_set to allocate from.
* *
@ -1931,11 +1931,8 @@ void bioset_free(struct bio_set *bs)
if (bs->rescue_workqueue) if (bs->rescue_workqueue)
destroy_workqueue(bs->rescue_workqueue); destroy_workqueue(bs->rescue_workqueue);
if (bs->bio_pool) mempool_destroy(bs->bio_pool);
mempool_destroy(bs->bio_pool); mempool_destroy(bs->bvec_pool);
if (bs->bvec_pool)
mempool_destroy(bs->bvec_pool);
bioset_integrity_free(bs); bioset_integrity_free(bs);
bio_put_slab(bs); bio_put_slab(bs);
@ -2035,37 +2032,6 @@ int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
} }
EXPORT_SYMBOL_GPL(bio_associate_blkcg); EXPORT_SYMBOL_GPL(bio_associate_blkcg);
/**
* bio_associate_current - associate a bio with %current
* @bio: target bio
*
* Associate @bio with %current if it hasn't been associated yet. Block
* layer will treat @bio as if it were issued by %current no matter which
* task actually issues it.
*
* This function takes an extra reference of @task's io_context and blkcg
* which will be put when @bio is released. The caller must own @bio,
* ensure %current->io_context exists, and is responsible for synchronizing
* calls to this function.
*/
int bio_associate_current(struct bio *bio)
{
struct io_context *ioc;
if (bio->bi_css)
return -EBUSY;
ioc = current->io_context;
if (!ioc)
return -ENOENT;
get_io_context_active(ioc);
bio->bi_ioc = ioc;
bio->bi_css = task_get_css(current, io_cgrp_id);
return 0;
}
EXPORT_SYMBOL_GPL(bio_associate_current);
/** /**
* bio_disassociate_task - undo bio_associate_current() * bio_disassociate_task - undo bio_associate_current()
* @bio: target bio * @bio: target bio


@ -1419,6 +1419,11 @@ int blkcg_policy_register(struct blkcg_policy *pol)
if (i >= BLKCG_MAX_POLS) if (i >= BLKCG_MAX_POLS)
goto err_unlock; goto err_unlock;
/* Make sure cpd/pd_alloc_fn and cpd/pd_free_fn in pairs */
if ((!pol->cpd_alloc_fn ^ !pol->cpd_free_fn) ||
(!pol->pd_alloc_fn ^ !pol->pd_free_fn))
goto err_unlock;
/* register @pol */ /* register @pol */
pol->plid = i; pol->plid = i;
blkcg_policy[pol->plid] = pol; blkcg_policy[pol->plid] = pol;
@ -1452,7 +1457,7 @@ int blkcg_policy_register(struct blkcg_policy *pol)
return 0; return 0;
err_free_cpds: err_free_cpds:
if (pol->cpd_alloc_fn) { if (pol->cpd_free_fn) {
list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) { list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) {
if (blkcg->cpd[pol->plid]) { if (blkcg->cpd[pol->plid]) {
pol->cpd_free_fn(blkcg->cpd[pol->plid]); pol->cpd_free_fn(blkcg->cpd[pol->plid]);
@ -1492,7 +1497,7 @@ void blkcg_policy_unregister(struct blkcg_policy *pol)
/* remove cpds and unregister */ /* remove cpds and unregister */
mutex_lock(&blkcg_pol_mutex); mutex_lock(&blkcg_pol_mutex);
if (pol->cpd_alloc_fn) { if (pol->cpd_free_fn) {
list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) { list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) {
if (blkcg->cpd[pol->plid]) { if (blkcg->cpd[pol->plid]) {
pol->cpd_free_fn(blkcg->cpd[pol->plid]); pol->cpd_free_fn(blkcg->cpd[pol->plid]);
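
The pairing test added to blkcg_policy_register() relies on "!a ^ !b" being non-zero exactly when one of two pointers is NULL and the other is not. The sketch below isolates that idiom in userspace. struct sketch_policy, the placeholder callbacks and the helper name are assumptions made for illustration; only the boolean expression mirrors the hunk above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced policy: only the four callbacks being paired. */
struct sketch_policy {
    void *(*cpd_alloc_fn)(void);
    void (*cpd_free_fn)(void *);
    void *(*pd_alloc_fn)(void);
    void (*pd_free_fn)(void *);
};

/* Placeholder callbacks so that a policy can be populated. */
static void *sketch_alloc(void)
{
    return NULL;
}

static void sketch_free(void *p)
{
    (void)p;
}

/*
 * "!x" maps a pointer to 1 if it is NULL and 0 otherwise, so "!a ^ !b"
 * is 1 only when exactly one of the two callbacks is missing.
 */
static bool sketch_policy_callbacks_paired(const struct sketch_policy *pol)
{
    if ((!pol->cpd_alloc_fn ^ !pol->cpd_free_fn) ||
        (!pol->pd_alloc_fn ^ !pol->pd_free_fn))
        return false;
    return true;
}
```

A policy with both callbacks of a pair or neither passes, while an alloc callback without its matching free callback is rejected.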


@ -333,11 +333,13 @@ EXPORT_SYMBOL(blk_stop_queue);
void blk_sync_queue(struct request_queue *q) void blk_sync_queue(struct request_queue *q)
{ {
del_timer_sync(&q->timeout); del_timer_sync(&q->timeout);
cancel_work_sync(&q->timeout_work);
if (q->mq_ops) { if (q->mq_ops) {
struct blk_mq_hw_ctx *hctx; struct blk_mq_hw_ctx *hctx;
int i; int i;
cancel_delayed_work_sync(&q->requeue_work);
queue_for_each_hw_ctx(q, hctx, i) queue_for_each_hw_ctx(q, hctx, i)
cancel_delayed_work_sync(&hctx->run_work); cancel_delayed_work_sync(&hctx->run_work);
} else { } else {
@ -346,6 +348,37 @@ void blk_sync_queue(struct request_queue *q)
} }
EXPORT_SYMBOL(blk_sync_queue); EXPORT_SYMBOL(blk_sync_queue);
/**
* blk_set_preempt_only - set QUEUE_FLAG_PREEMPT_ONLY
* @q: request queue pointer
*
* Returns the previous value of the PREEMPT_ONLY flag - 0 if the flag was not
* set and 1 if the flag was already set.
*/
int blk_set_preempt_only(struct request_queue *q)
{
unsigned long flags;
int res;
spin_lock_irqsave(q->queue_lock, flags);
res = queue_flag_test_and_set(QUEUE_FLAG_PREEMPT_ONLY, q);
spin_unlock_irqrestore(q->queue_lock, flags);
return res;
}
EXPORT_SYMBOL_GPL(blk_set_preempt_only);
void blk_clear_preempt_only(struct request_queue *q)
{
unsigned long flags;
spin_lock_irqsave(q->queue_lock, flags);
queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
wake_up_all(&q->mq_freeze_wq);
spin_unlock_irqrestore(q->queue_lock, flags);
}
EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
/** /**
* __blk_run_queue_uncond - run a queue whether or not it has been stopped * __blk_run_queue_uncond - run a queue whether or not it has been stopped
* @q: The queue to run * @q: The queue to run
@ -610,6 +643,9 @@ void blk_set_queue_dying(struct request_queue *q)
} }
spin_unlock_irq(q->queue_lock); spin_unlock_irq(q->queue_lock);
} }
/* Make blk_queue_enter() reexamine the DYING flag. */
wake_up_all(&q->mq_freeze_wq);
} }
EXPORT_SYMBOL_GPL(blk_set_queue_dying); EXPORT_SYMBOL_GPL(blk_set_queue_dying);
@ -718,7 +754,7 @@ static void free_request_size(void *element, void *data)
int blk_init_rl(struct request_list *rl, struct request_queue *q, int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask) gfp_t gfp_mask)
{ {
if (unlikely(rl->rq_pool)) if (unlikely(rl->rq_pool) || q->mq_ops)
return 0; return 0;
rl->q = q; rl->q = q;
@ -760,15 +796,38 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
} }
EXPORT_SYMBOL(blk_alloc_queue); EXPORT_SYMBOL(blk_alloc_queue);
int blk_queue_enter(struct request_queue *q, bool nowait) /**
* blk_queue_enter() - try to increase q->q_usage_counter
* @q: request queue pointer
* @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
*/
int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{ {
const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
while (true) { while (true) {
bool success = false;
int ret; int ret;
if (percpu_ref_tryget_live(&q->q_usage_counter)) rcu_read_lock_sched();
if (percpu_ref_tryget_live(&q->q_usage_counter)) {
/*
* The code that sets the PREEMPT_ONLY flag is
* responsible for ensuring that that flag is globally
* visible before the queue is unfrozen.
*/
if (preempt || !blk_queue_preempt_only(q)) {
success = true;
} else {
percpu_ref_put(&q->q_usage_counter);
}
}
rcu_read_unlock_sched();
if (success)
return 0; return 0;
if (nowait) if (flags & BLK_MQ_REQ_NOWAIT)
return -EBUSY; return -EBUSY;
/* /*
@ -781,7 +840,8 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
smp_rmb(); smp_rmb();
ret = wait_event_interruptible(q->mq_freeze_wq, ret = wait_event_interruptible(q->mq_freeze_wq,
!atomic_read(&q->mq_freeze_depth) || (atomic_read(&q->mq_freeze_depth) == 0 &&
(preempt || !blk_queue_preempt_only(q))) ||
blk_queue_dying(q)); blk_queue_dying(q));
if (blk_queue_dying(q)) if (blk_queue_dying(q))
return -ENODEV; return -ENODEV;
@ -844,6 +904,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
setup_timer(&q->backing_dev_info->laptop_mode_wb_timer, setup_timer(&q->backing_dev_info->laptop_mode_wb_timer,
laptop_mode_timer_fn, (unsigned long) q); laptop_mode_timer_fn, (unsigned long) q);
setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q); setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
INIT_WORK(&q->timeout_work, NULL);
INIT_LIST_HEAD(&q->queue_head); INIT_LIST_HEAD(&q->queue_head);
INIT_LIST_HEAD(&q->timeout_list); INIT_LIST_HEAD(&q->timeout_list);
INIT_LIST_HEAD(&q->icq_list); INIT_LIST_HEAD(&q->icq_list);
@ -1154,7 +1215,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
* @rl: request list to allocate from * @rl: request list to allocate from
* @op: operation and flags * @op: operation and flags
* @bio: bio to allocate request for (can be %NULL) * @bio: bio to allocate request for (can be %NULL)
* @gfp_mask: allocation mask * @flags: BLK_MQ_REQ_* flags
* *
* Get a free request from @q. This function may fail under memory * Get a free request from @q. This function may fail under memory
* pressure or if @q is dead. * pressure or if @q is dead.
@ -1164,7 +1225,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
* Returns request pointer on success, with @q->queue_lock *not held*. * Returns request pointer on success, with @q->queue_lock *not held*.
*/ */
static struct request *__get_request(struct request_list *rl, unsigned int op, static struct request *__get_request(struct request_list *rl, unsigned int op,
struct bio *bio, gfp_t gfp_mask) struct bio *bio, blk_mq_req_flags_t flags)
{ {
struct request_queue *q = rl->q; struct request_queue *q = rl->q;
struct request *rq; struct request *rq;
@ -1173,6 +1234,8 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
struct io_cq *icq = NULL; struct io_cq *icq = NULL;
const bool is_sync = op_is_sync(op); const bool is_sync = op_is_sync(op);
int may_queue; int may_queue;
gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
__GFP_DIRECT_RECLAIM;
req_flags_t rq_flags = RQF_ALLOCED; req_flags_t rq_flags = RQF_ALLOCED;
lockdep_assert_held(q->queue_lock); lockdep_assert_held(q->queue_lock);
@ -1255,6 +1318,8 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
blk_rq_set_rl(rq, rl); blk_rq_set_rl(rq, rl);
rq->cmd_flags = op; rq->cmd_flags = op;
rq->rq_flags = rq_flags; rq->rq_flags = rq_flags;
if (flags & BLK_MQ_REQ_PREEMPT)
rq->rq_flags |= RQF_PREEMPT;
/* init elvpriv */ /* init elvpriv */
if (rq_flags & RQF_ELVPRIV) { if (rq_flags & RQF_ELVPRIV) {
@ -1333,7 +1398,7 @@ rq_starved:
* @q: request_queue to allocate request from * @q: request_queue to allocate request from
* @op: operation and flags * @op: operation and flags
* @bio: bio to allocate request for (can be %NULL) * @bio: bio to allocate request for (can be %NULL)
* @gfp_mask: allocation mask * @flags: BLK_MQ_REQ_* flags.
* *
* Get a free request from @q. If %__GFP_DIRECT_RECLAIM is set in @gfp_mask, * Get a free request from @q. If %__GFP_DIRECT_RECLAIM is set in @gfp_mask,
* this function keeps retrying under memory pressure and fails iff @q is dead. * this function keeps retrying under memory pressure and fails iff @q is dead.
@ -1343,7 +1408,7 @@ rq_starved:
* Returns request pointer on success, with @q->queue_lock *not held*. * Returns request pointer on success, with @q->queue_lock *not held*.
*/ */
static struct request *get_request(struct request_queue *q, unsigned int op, static struct request *get_request(struct request_queue *q, unsigned int op,
struct bio *bio, gfp_t gfp_mask) struct bio *bio, blk_mq_req_flags_t flags)
{ {
const bool is_sync = op_is_sync(op); const bool is_sync = op_is_sync(op);
DEFINE_WAIT(wait); DEFINE_WAIT(wait);
@ -1355,7 +1420,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
rl = blk_get_rl(q, bio); /* transferred to @rq on success */ rl = blk_get_rl(q, bio); /* transferred to @rq on success */
retry: retry:
rq = __get_request(rl, op, bio, gfp_mask); rq = __get_request(rl, op, bio, flags);
if (!IS_ERR(rq)) if (!IS_ERR(rq))
return rq; return rq;
@ -1364,7 +1429,7 @@ retry:
return ERR_PTR(-EAGAIN); return ERR_PTR(-EAGAIN);
} }
if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) { if ((flags & BLK_MQ_REQ_NOWAIT) || unlikely(blk_queue_dying(q))) {
blk_put_rl(rl); blk_put_rl(rl);
return rq; return rq;
} }
@ -1391,20 +1456,28 @@ retry:
goto retry; goto retry;
} }
/* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */
static struct request *blk_old_get_request(struct request_queue *q, static struct request *blk_old_get_request(struct request_queue *q,
unsigned int op, gfp_t gfp_mask) unsigned int op, blk_mq_req_flags_t flags)
{ {
struct request *rq; struct request *rq;
gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
__GFP_DIRECT_RECLAIM;
int ret = 0;
WARN_ON_ONCE(q->mq_ops); WARN_ON_ONCE(q->mq_ops);
/* create ioc upfront */ /* create ioc upfront */
create_io_context(gfp_mask, q->node); create_io_context(gfp_mask, q->node);
ret = blk_queue_enter(q, flags);
if (ret)
return ERR_PTR(ret);
spin_lock_irq(q->queue_lock); spin_lock_irq(q->queue_lock);
rq = get_request(q, op, NULL, gfp_mask); rq = get_request(q, op, NULL, flags);
if (IS_ERR(rq)) { if (IS_ERR(rq)) {
spin_unlock_irq(q->queue_lock); spin_unlock_irq(q->queue_lock);
blk_queue_exit(q);
return rq; return rq;
} }
@ -1415,25 +1488,40 @@ static struct request *blk_old_get_request(struct request_queue *q,
return rq; return rq;
} }
struct request *blk_get_request(struct request_queue *q, unsigned int op, /**
gfp_t gfp_mask) * blk_get_request_flags - allocate a request
* @q: request queue to allocate a request for
* @op: operation (REQ_OP_*) and REQ_* flags, e.g. REQ_SYNC.
* @flags: BLK_MQ_REQ_* flags, e.g. BLK_MQ_REQ_NOWAIT.
*/
struct request *blk_get_request_flags(struct request_queue *q, unsigned int op,
blk_mq_req_flags_t flags)
{ {
struct request *req; struct request *req;
WARN_ON_ONCE(op & REQ_NOWAIT);
WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT));
if (q->mq_ops) { if (q->mq_ops) {
req = blk_mq_alloc_request(q, op, req = blk_mq_alloc_request(q, op, flags);
(gfp_mask & __GFP_DIRECT_RECLAIM) ?
0 : BLK_MQ_REQ_NOWAIT);
if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn) if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
q->mq_ops->initialize_rq_fn(req); q->mq_ops->initialize_rq_fn(req);
} else { } else {
req = blk_old_get_request(q, op, gfp_mask); req = blk_old_get_request(q, op, flags);
if (!IS_ERR(req) && q->initialize_rq_fn) if (!IS_ERR(req) && q->initialize_rq_fn)
q->initialize_rq_fn(req); q->initialize_rq_fn(req);
} }
return req; return req;
} }
EXPORT_SYMBOL(blk_get_request_flags);
struct request *blk_get_request(struct request_queue *q, unsigned int op,
gfp_t gfp_mask)
{
return blk_get_request_flags(q, op, gfp_mask & __GFP_DIRECT_RECLAIM ?
0 : BLK_MQ_REQ_NOWAIT);
}
EXPORT_SYMBOL(blk_get_request); EXPORT_SYMBOL(blk_get_request);
/** /**
@ -1576,6 +1664,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
blk_free_request(rl, req); blk_free_request(rl, req);
freed_request(rl, sync, rq_flags); freed_request(rl, sync, rq_flags);
blk_put_rl(rl); blk_put_rl(rl);
blk_queue_exit(q);
} }
} }
EXPORT_SYMBOL_GPL(__blk_put_request); EXPORT_SYMBOL_GPL(__blk_put_request);
@ -1857,8 +1946,10 @@ get_rq:
* Grab a free request. This might sleep but cannot fail. * Grab a free request. This might sleep but cannot fail.
* Returns with the queue unlocked. * Returns with the queue unlocked.
*/ */
req = get_request(q, bio->bi_opf, bio, GFP_NOIO); blk_queue_enter_live(q);
req = get_request(q, bio->bi_opf, bio, 0);
if (IS_ERR(req)) { if (IS_ERR(req)) {
blk_queue_exit(q);
__wbt_done(q->rq_wb, wb_acct); __wbt_done(q->rq_wb, wb_acct);
if (PTR_ERR(req) == -ENOMEM) if (PTR_ERR(req) == -ENOMEM)
bio->bi_status = BLK_STS_RESOURCE; bio->bi_status = BLK_STS_RESOURCE;
@ -2200,8 +2291,10 @@ blk_qc_t generic_make_request(struct bio *bio)
current->bio_list = bio_list_on_stack; current->bio_list = bio_list_on_stack;
do { do {
struct request_queue *q = bio->bi_disk->queue; struct request_queue *q = bio->bi_disk->queue;
blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
BLK_MQ_REQ_NOWAIT : 0;
if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) { if (likely(blk_queue_enter(q, flags) == 0)) {
struct bio_list lower, same; struct bio_list lower, same;
/* Create a fresh bio_list for all subordinate requests */ /* Create a fresh bio_list for all subordinate requests */
@ -2241,6 +2334,40 @@ out:
} }
EXPORT_SYMBOL(generic_make_request); EXPORT_SYMBOL(generic_make_request);
/**
* direct_make_request - hand a buffer directly to its device driver for I/O
* @bio: The bio describing the location in memory and on the device.
*
* This function behaves like generic_make_request(), but does not protect
* against recursion. Must only be used if the called driver is known
* to not call generic_make_request (or direct_make_request) again from
* its make_request function. (Calling direct_make_request again from
* a workqueue is perfectly fine as that doesn't recurse).
*/
blk_qc_t direct_make_request(struct bio *bio)
{
struct request_queue *q = bio->bi_disk->queue;
bool nowait = bio->bi_opf & REQ_NOWAIT;
blk_qc_t ret;
if (!generic_make_request_checks(bio))
return BLK_QC_T_NONE;
if (unlikely(blk_queue_enter(q, nowait ? BLK_MQ_REQ_NOWAIT : 0))) {
if (nowait && !blk_queue_dying(q))
bio->bi_status = BLK_STS_AGAIN;
else
bio->bi_status = BLK_STS_IOERR;
bio_endio(bio);
return BLK_QC_T_NONE;
}
ret = q->make_request_fn(q, bio);
blk_queue_exit(q);
return ret;
}
EXPORT_SYMBOL_GPL(direct_make_request);
/** /**
* submit_bio - submit a bio to the block device layer for I/O * submit_bio - submit a bio to the block device layer for I/O
* @bio: The &struct bio which describes the I/O * @bio: The &struct bio which describes the I/O
@ -2285,6 +2412,17 @@ blk_qc_t submit_bio(struct bio *bio)
} }
EXPORT_SYMBOL(submit_bio); EXPORT_SYMBOL(submit_bio);
bool blk_poll(struct request_queue *q, blk_qc_t cookie)
{
if (!q->poll_fn || !blk_qc_t_valid(cookie))
return false;
if (current->plug)
blk_flush_plug_list(current->plug, false);
return q->poll_fn(q, cookie);
}
EXPORT_SYMBOL_GPL(blk_poll);
/** /**
* blk_cloned_rq_check_limits - Helper function to check a cloned request * blk_cloned_rq_check_limits - Helper function to check a cloned request
* for the new queue limits * for the new queue limits
@ -2350,7 +2488,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
* bypass a potential scheduler on the bottom device for * bypass a potential scheduler on the bottom device for
* insert. * insert.
*/ */
blk_mq_request_bypass_insert(rq); blk_mq_request_bypass_insert(rq, true);
return BLK_STS_OK; return BLK_STS_OK;
} }
@ -2464,20 +2602,22 @@ void blk_account_io_done(struct request *req)
* Don't process normal requests when queue is suspended * Don't process normal requests when queue is suspended
* or in the process of suspending/resuming * or in the process of suspending/resuming
*/ */
static struct request *blk_pm_peek_request(struct request_queue *q, static bool blk_pm_allow_request(struct request *rq)
struct request *rq)
{ {
if (q->dev && (q->rpm_status == RPM_SUSPENDED || switch (rq->q->rpm_status) {
(q->rpm_status != RPM_ACTIVE && !(rq->rq_flags & RQF_PM)))) case RPM_RESUMING:
return NULL; case RPM_SUSPENDING:
else return rq->rq_flags & RQF_PM;
return rq; case RPM_SUSPENDED:
return false;
}
return true;
} }
#else #else
static inline struct request *blk_pm_peek_request(struct request_queue *q, static bool blk_pm_allow_request(struct request *rq)
struct request *rq)
{ {
return rq; return true;
} }
#endif #endif
@ -2517,6 +2657,48 @@ void blk_account_io_start(struct request *rq, bool new_io)
part_stat_unlock(); part_stat_unlock();
} }
static struct request *elv_next_request(struct request_queue *q)
{
struct request *rq;
struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
WARN_ON_ONCE(q->mq_ops);
while (1) {
list_for_each_entry(rq, &q->queue_head, queuelist) {
if (blk_pm_allow_request(rq))
return rq;
if (rq->rq_flags & RQF_SOFTBARRIER)
break;
}
/*
* Flush request is running and flush request isn't queueable
* in the drive, we can hold the queue till flush request is
* finished. Even we don't do this, driver can't dispatch next
* requests and will requeue them. And this can improve
* throughput too. For example, we have request flush1, write1,
* flush 2. flush1 is dispatched, then queue is hold, write1
* isn't inserted to queue. After flush1 is finished, flush2
* will be dispatched. Since disk cache is already clean,
* flush2 will be finished very soon, so looks like flush2 is
* folded to flush1.
* Since the queue is hold, a flag is set to indicate the queue
* should be restarted later. Please see flush_end_io() for
* details.
*/
if (fq->flush_pending_idx != fq->flush_running_idx &&
!queue_flush_queueable(q)) {
fq->flush_queue_delayed = 1;
return NULL;
}
if (unlikely(blk_queue_bypass(q)) ||
!q->elevator->type->ops.sq.elevator_dispatch_fn(q, 0))
return NULL;
}
}
/** /**
* blk_peek_request - peek at the top of a request queue * blk_peek_request - peek at the top of a request queue
* @q: request queue to peek at * @q: request queue to peek at
@ -2538,12 +2720,7 @@ struct request *blk_peek_request(struct request_queue *q)
lockdep_assert_held(q->queue_lock); lockdep_assert_held(q->queue_lock);
WARN_ON_ONCE(q->mq_ops); WARN_ON_ONCE(q->mq_ops);
while ((rq = __elv_next_request(q)) != NULL) { while ((rq = elv_next_request(q)) != NULL) {
rq = blk_pm_peek_request(q, rq);
if (!rq)
break;
if (!(rq->rq_flags & RQF_STARTED)) { if (!(rq->rq_flags & RQF_STARTED)) {
/* /*
* This is the first time the device driver * This is the first time the device driver
@ -2695,6 +2872,27 @@ struct request *blk_fetch_request(struct request_queue *q)
} }
EXPORT_SYMBOL(blk_fetch_request); EXPORT_SYMBOL(blk_fetch_request);
/*
* Steal bios from a request and add them to a bio list.
* The request must not have been partially completed before.
*/
void blk_steal_bios(struct bio_list *list, struct request *rq)
{
if (rq->bio) {
if (list->tail)
list->tail->bi_next = rq->bio;
else
list->head = rq->bio;
list->tail = rq->biotail;
rq->bio = NULL;
rq->biotail = NULL;
}
rq->__data_len = 0;
}
EXPORT_SYMBOL_GPL(blk_steal_bios);
/** /**
* blk_update_request - Special helper function for request stacking drivers * blk_update_request - Special helper function for request stacking drivers
* @req: the request being processed * @req: the request being processed
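
Note on blk_steal_bios() above: the helper splices a request's bio chain onto the tail of a bio_list and leaves the request empty. The following is a minimal user-space sketch of that splice, not kernel code. The struct definitions, the `id` and `data_len` fields, and the names steal_bios() and bio_list_len() are simplified, hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct bio { struct bio *bi_next; int id; };
struct bio_list { struct bio *head, *tail; };
struct request { struct bio *bio, *biotail; unsigned int data_len; };

/* Append rq's whole bio chain to list, then leave rq with no bios. */
static void steal_bios(struct bio_list *list, struct request *rq)
{
	assert(list && rq);
	if (rq->bio) {
		if (list->tail)
			list->tail->bi_next = rq->bio;	/* non-empty list: link at tail */
		else
			list->head = rq->bio;		/* empty list: chain becomes head */
		list->tail = rq->biotail;
		rq->bio = NULL;
		rq->biotail = NULL;
	}
	rq->data_len = 0;
}

/* Count the bios reachable from the list head. */
static int bio_list_len(const struct bio_list *list)
{
	int n = 0;

	for (const struct bio *b = list->head; b; b = b->bi_next)
		n++;
	return n;
}
```

Because only head and tail pointers are touched, the splice is O(1) regardless of chain length. This is presumably what makes it suitable for moving bio chains between requests, as the pull message describes.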


@ -231,8 +231,13 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
/* release the tag's ownership to the req cloned from */ /* release the tag's ownership to the req cloned from */
spin_lock_irqsave(&fq->mq_flush_lock, flags); spin_lock_irqsave(&fq->mq_flush_lock, flags);
hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu); hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq); if (!q->elevator) {
flush_rq->tag = -1; blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
flush_rq->tag = -1;
} else {
blk_mq_put_driver_tag_hctx(hctx, flush_rq);
flush_rq->internal_tag = -1;
}
} }
running = &fq->flush_queue[fq->flush_running_idx]; running = &fq->flush_queue[fq->flush_running_idx];
@ -318,19 +323,26 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
blk_rq_init(q, flush_rq); blk_rq_init(q, flush_rq);
/* /*
* Borrow tag from the first request since they can't * In case of none scheduler, borrow tag from the first request
* be in flight at the same time. And acquire the tag's * since they can't be in flight at the same time. And acquire
* ownership for flush req. * the tag's ownership for flush req.
*
* In case of IO scheduler, flush rq need to borrow scheduler tag
* just for cheating put/get driver tag.
*/ */
if (q->mq_ops) { if (q->mq_ops) {
struct blk_mq_hw_ctx *hctx; struct blk_mq_hw_ctx *hctx;
flush_rq->mq_ctx = first_rq->mq_ctx; flush_rq->mq_ctx = first_rq->mq_ctx;
flush_rq->tag = first_rq->tag;
fq->orig_rq = first_rq;
hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu); if (!q->elevator) {
blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq); fq->orig_rq = first_rq;
flush_rq->tag = first_rq->tag;
hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
} else {
flush_rq->internal_tag = first_rq->internal_tag;
}
} }
flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH; flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
@ -394,6 +406,11 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
hctx = blk_mq_map_queue(q, ctx->cpu); hctx = blk_mq_map_queue(q, ctx->cpu);
if (q->elevator) {
WARN_ON(rq->tag < 0);
blk_mq_put_driver_tag_hctx(hctx, rq);
}
/* /*
* After populating an empty queue, kick it to avoid stall. Read * After populating an empty queue, kick it to avoid stall. Read
* the comment in flush_end_io(). * the comment in flush_end_io().
@ -463,7 +480,7 @@ void blk_insert_flush(struct request *rq)
if ((policy & REQ_FSEQ_DATA) && if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) { !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops) if (q->mq_ops)
blk_mq_sched_insert_request(rq, false, true, false, false); blk_mq_request_bypass_insert(rq, false);
else else
list_add_tail(&rq->queuelist, &q->queue_head); list_add_tail(&rq->queuelist, &q->queue_head);
return; return;


@ -275,51 +275,18 @@ static unsigned int __blkdev_sectors_to_bio_pages(sector_t nr_sects)
return min(pages, (sector_t)BIO_MAX_PAGES); return min(pages, (sector_t)BIO_MAX_PAGES);
} }
/** static int __blkdev_issue_zero_pages(struct block_device *bdev,
* __blkdev_issue_zeroout - generate number of zero filed write bios sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
* @bdev: blockdev to issue struct bio **biop)
* @sector: start sector
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
* @biop: pointer to anchor bio
* @flags: controls detailed behavior
*
* Description:
* Zero-fill a block range, either using hardware offload or by explicitly
* writing zeroes to the device.
*
* Note that this function may fail with -EOPNOTSUPP if the driver signals
* zeroing offload support, but the device fails to process the command (for
* some devices there is no non-destructive way to verify whether this
* operation is actually supported). In this case the caller should call
* retry the call to blkdev_issue_zeroout() and the fallback path will be used.
*
* If a device is using logical block provisioning, the underlying space will
* not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
*
* If %flags contains BLKDEV_ZERO_NOFALLBACK, the function will return
* -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.
*/
int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
unsigned flags)
{ {
int ret; struct request_queue *q = bdev_get_queue(bdev);
int bi_size = 0;
struct bio *bio = *biop; struct bio *bio = *biop;
int bi_size = 0;
unsigned int sz; unsigned int sz;
sector_t bs_mask;
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1; if (!q)
if ((sector | nr_sects) & bs_mask) return -ENXIO;
return -EINVAL;
ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
biop, flags);
if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK))
goto out;
ret = 0;
while (nr_sects != 0) { while (nr_sects != 0) {
bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects), bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
gfp_mask); gfp_mask);
@ -339,8 +306,46 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
} }
*biop = bio; *biop = bio;
out: return 0;
return ret; }
/**
* __blkdev_issue_zeroout - generate number of zero filed write bios
* @bdev: blockdev to issue
* @sector: start sector
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
* @biop: pointer to anchor bio
* @flags: controls detailed behavior
*
* Description:
* Zero-fill a block range, either using hardware offload or by explicitly
* writing zeroes to the device.
*
* If a device is using logical block provisioning, the underlying space will
* not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
*
* If %flags contains BLKDEV_ZERO_NOFALLBACK, the function will return
* -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.
*/
int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
unsigned flags)
{
int ret;
sector_t bs_mask;
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
if ((sector | nr_sects) & bs_mask)
return -EINVAL;
ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
biop, flags);
if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK))
return ret;
return __blkdev_issue_zero_pages(bdev, sector, nr_sects, gfp_mask,
biop);
} }
EXPORT_SYMBOL(__blkdev_issue_zeroout); EXPORT_SYMBOL(__blkdev_issue_zeroout);
@ -360,18 +365,49 @@ EXPORT_SYMBOL(__blkdev_issue_zeroout);
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned flags) sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
{ {
int ret; int ret = 0;
struct bio *bio = NULL; sector_t bs_mask;
struct bio *bio;
struct blk_plug plug; struct blk_plug plug;
bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
if ((sector | nr_sects) & bs_mask)
return -EINVAL;
retry:
bio = NULL;
blk_start_plug(&plug); blk_start_plug(&plug);
ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, if (try_write_zeroes) {
&bio, flags); ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects,
gfp_mask, &bio, flags);
} else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects,
gfp_mask, &bio);
} else {
/* No zeroing offload support */
ret = -EOPNOTSUPP;
}
if (ret == 0 && bio) { if (ret == 0 && bio) {
ret = submit_bio_wait(bio); ret = submit_bio_wait(bio);
bio_put(bio); bio_put(bio);
} }
blk_finish_plug(&plug); blk_finish_plug(&plug);
if (ret && try_write_zeroes) {
if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
try_write_zeroes = false;
goto retry;
}
if (!bdev_write_zeroes_sectors(bdev)) {
/*
* Zeroing offload support was indicated, but the
* device reported ILLEGAL REQUEST (for some devices
* there is no non-destructive way to verify whether
* WRITE ZEROES is actually supported).
*/
ret = -EOPNOTSUPP;
}
}
return ret; return ret;
} }
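
Note on the reworked blkdev_issue_zeroout() above: it tries the WRITE ZEROES offload first and retries with explicitly written zero pages unless the caller passed BLKDEV_ZERO_NOFALLBACK. The following user-space sketch models only that control flow. struct fake_dev, try_offload(), write_zero_pages() and ZERO_NOFALLBACK are hypothetical stand-ins. The sketch also assumes that a failed offload makes the driver withdraw the advertised capability, which is how the comment in the hunk is read here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define ZERO_NOFALLBACK 0x1	/* stand-in for BLKDEV_ZERO_NOFALLBACK */

/* Hypothetical device model; not a kernel structure. */
struct fake_dev {
	bool advertises_write_zeroes;	/* models bdev_write_zeroes_sectors() != 0 */
	bool offload_works;		/* whether the command really succeeds */
	int offload_calls, fallback_calls;
};

static int try_offload(struct fake_dev *d)
{
	d->offload_calls++;
	if (!d->advertises_write_zeroes)
		return -EOPNOTSUPP;
	if (!d->offload_works) {
		/* Assumption: the driver clears the capability on failure. */
		d->advertises_write_zeroes = false;
		return -EIO;
	}
	return 0;
}

static int write_zero_pages(struct fake_dev *d)
{
	d->fallback_calls++;
	return 0;
}

static int issue_zeroout(struct fake_dev *d, unsigned int flags)
{
	bool try_write_zeroes;
	int ret;

	assert(d);
	try_write_zeroes = d->advertises_write_zeroes;
retry:
	if (try_write_zeroes)
		ret = try_offload(d);
	else if (!(flags & ZERO_NOFALLBACK))
		ret = write_zero_pages(d);
	else
		ret = -EOPNOTSUPP;	/* no offload and fallback forbidden */

	if (ret && try_write_zeroes) {
		if (!(flags & ZERO_NOFALLBACK)) {
			try_write_zeroes = false;
			goto retry;	/* second pass writes zero pages */
		}
		if (!d->advertises_write_zeroes)
			ret = -EOPNOTSUPP;	/* offload was advertised but unusable */
	}
	return ret;
}
```

In this model a device that advertises the offload but rejects it still ends up zeroed through the fallback path, while a ZERO_NOFALLBACK caller gets -EOPNOTSUPP instead of the raw I/O error.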


@ -54,7 +54,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(NOMERGES), QUEUE_FLAG_NAME(NOMERGES),
QUEUE_FLAG_NAME(SAME_COMP), QUEUE_FLAG_NAME(SAME_COMP),
QUEUE_FLAG_NAME(FAIL_IO), QUEUE_FLAG_NAME(FAIL_IO),
QUEUE_FLAG_NAME(STACKABLE),
QUEUE_FLAG_NAME(NONROT), QUEUE_FLAG_NAME(NONROT),
QUEUE_FLAG_NAME(IO_STAT), QUEUE_FLAG_NAME(IO_STAT),
QUEUE_FLAG_NAME(DISCARD), QUEUE_FLAG_NAME(DISCARD),
@ -75,6 +74,7 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(REGISTERED), QUEUE_FLAG_NAME(REGISTERED),
QUEUE_FLAG_NAME(SCSI_PASSTHROUGH), QUEUE_FLAG_NAME(SCSI_PASSTHROUGH),
QUEUE_FLAG_NAME(QUIESCED), QUEUE_FLAG_NAME(QUIESCED),
QUEUE_FLAG_NAME(PREEMPT_ONLY),
}; };
#undef QUEUE_FLAG_NAME #undef QUEUE_FLAG_NAME
@ -180,7 +180,6 @@ static const char *const hctx_state_name[] = {
HCTX_STATE_NAME(STOPPED), HCTX_STATE_NAME(STOPPED),
HCTX_STATE_NAME(TAG_ACTIVE), HCTX_STATE_NAME(TAG_ACTIVE),
HCTX_STATE_NAME(SCHED_RESTART), HCTX_STATE_NAME(SCHED_RESTART),
HCTX_STATE_NAME(TAG_WAITING),
HCTX_STATE_NAME(START_ON_RUN), HCTX_STATE_NAME(START_ON_RUN),
}; };
#undef HCTX_STATE_NAME #undef HCTX_STATE_NAME


@ -81,20 +81,103 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
} else } else
clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state); clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
if (blk_mq_hctx_has_pending(hctx)) { return blk_mq_run_hw_queue(hctx, true);
blk_mq_run_hw_queue(hctx, true);
return true;
}
return false;
} }
/*
* Only SCSI implements .get_budget and .put_budget, and SCSI restarts
* its queue by itself in its completion handler, so we don't need to
* restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
*/
static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
LIST_HEAD(rq_list);
do {
struct request *rq;
if (e->type->ops.mq.has_work &&
!e->type->ops.mq.has_work(hctx))
break;
if (!blk_mq_get_dispatch_budget(hctx))
break;
rq = e->type->ops.mq.dispatch_request(hctx);
if (!rq) {
blk_mq_put_dispatch_budget(hctx);
break;
}
/*
* Now this rq owns the budget which has to be released
* if this rq won't be queued to driver via .queue_rq()
* in blk_mq_dispatch_rq_list().
*/
list_add(&rq->queuelist, &rq_list);
} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
}
static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx)
{
unsigned idx = ctx->index_hw;
if (++idx == hctx->nr_ctx)
idx = 0;
return hctx->ctxs[idx];
}
/*
* Only SCSI implements .get_budget and .put_budget, and SCSI restarts
* its queue by itself in its completion handler, so we don't need to
* restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
*/
static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
LIST_HEAD(rq_list);
struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
do {
struct request *rq;
if (!sbitmap_any_bit_set(&hctx->ctx_map))
break;
if (!blk_mq_get_dispatch_budget(hctx))
break;
rq = blk_mq_dequeue_from_ctx(hctx, ctx);
if (!rq) {
blk_mq_put_dispatch_budget(hctx);
break;
}
/*
* Now this rq owns the budget which has to be released
* if this rq won't be queued to driver via .queue_rq()
* in blk_mq_dispatch_rq_list().
*/
list_add(&rq->queuelist, &rq_list);
/* round robin for fair dispatch */
ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
WRITE_ONCE(hctx->dispatch_from, ctx);
}
/* return true if hw queue need to be run again */
void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{ {
struct request_queue *q = hctx->queue; struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator; struct elevator_queue *e = q->elevator;
const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request; const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
bool did_work = false;
LIST_HEAD(rq_list); LIST_HEAD(rq_list);
/* RCU or SRCU read lock is needed before checking quiesced flag */ /* RCU or SRCU read lock is needed before checking quiesced flag */
@ -122,29 +205,34 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
* scheduler, we can no longer merge or sort them. So it's best to * scheduler, we can no longer merge or sort them. So it's best to
* leave them there for as long as we can. Mark the hw queue as * leave them there for as long as we can. Mark the hw queue as
* needing a restart in that case. * needing a restart in that case.
*
* We want to dispatch from the scheduler if there was nothing
* on the dispatch list or we were able to dispatch from the
* dispatch list.
*/ */
if (!list_empty(&rq_list)) { if (!list_empty(&rq_list)) {
blk_mq_sched_mark_restart_hctx(hctx); blk_mq_sched_mark_restart_hctx(hctx);
did_work = blk_mq_dispatch_rq_list(q, &rq_list); if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
} else if (!has_sched_dispatch) { if (has_sched_dispatch)
blk_mq_do_dispatch_sched(hctx);
else
blk_mq_do_dispatch_ctx(hctx);
}
} else if (has_sched_dispatch) {
blk_mq_do_dispatch_sched(hctx);
} else if (q->mq_ops->get_budget) {
/*
* If we need to get budget before queuing request, we
* dequeue request one by one from sw queue for avoiding
* to mess up I/O merge when dispatch runs out of resource.
*
* TODO: get more budgets, and dequeue more requests in
* one time.
*/
blk_mq_do_dispatch_ctx(hctx);
} else {
blk_mq_flush_busy_ctxs(hctx, &rq_list); blk_mq_flush_busy_ctxs(hctx, &rq_list);
blk_mq_dispatch_rq_list(q, &rq_list); blk_mq_dispatch_rq_list(q, &rq_list, false);
}
/*
* We want to dispatch from the scheduler if we had no work left
* on the dispatch list, OR if we did have work but weren't able
* to make progress.
*/
if (!did_work && has_sched_dispatch) {
do {
struct request *rq;
rq = e->type->ops.mq.dispatch_request(hctx);
if (!rq)
break;
list_add(&rq->queuelist, &rq_list);
} while (blk_mq_dispatch_rq_list(q, &rq_list));
} }
} }
@ -260,21 +348,21 @@ void blk_mq_sched_request_inserted(struct request *rq)
EXPORT_SYMBOL_GPL(blk_mq_sched_request_inserted); EXPORT_SYMBOL_GPL(blk_mq_sched_request_inserted);
static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx, static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
bool has_sched,
struct request *rq) struct request *rq)
{ {
if (rq->tag == -1) { /* dispatch flush rq directly */
rq->rq_flags |= RQF_SORTED; if (rq->rq_flags & RQF_FLUSH_SEQ) {
return false; spin_lock(&hctx->lock);
list_add(&rq->queuelist, &hctx->dispatch);
spin_unlock(&hctx->lock);
return true;
} }
/* if (has_sched)
* If we already have a real request tag, send directly to rq->rq_flags |= RQF_SORTED;
* the dispatch list.
*/ return false;
spin_lock(&hctx->lock);
list_add(&rq->queuelist, &hctx->dispatch);
spin_unlock(&hctx->lock);
return true;
} }
/** /**
@ -339,21 +427,6 @@ done:
} }
} }
/*
* Add flush/fua to the queue. If we fail getting a driver tag, then
* punt to the requeue list. Requeue will re-invoke us from a context
* that's safe to block from.
*/
static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
struct request *rq, bool can_block)
{
if (blk_mq_get_driver_tag(rq, &hctx, can_block)) {
blk_insert_flush(rq);
blk_mq_run_hw_queue(hctx, true);
} else
blk_mq_add_to_requeue_list(rq, false, true);
}
void blk_mq_sched_insert_request(struct request *rq, bool at_head, void blk_mq_sched_insert_request(struct request *rq, bool at_head,
bool run_queue, bool async, bool can_block) bool run_queue, bool async, bool can_block)
{ {
@ -362,12 +435,15 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
struct blk_mq_ctx *ctx = rq->mq_ctx; struct blk_mq_ctx *ctx = rq->mq_ctx;
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu); struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) { /* flush rq in flush machinery need to be dispatched directly */
blk_mq_sched_insert_flush(hctx, rq, can_block); if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
return; blk_insert_flush(rq);
goto run;
} }
if (e && blk_mq_sched_bypass_insert(hctx, rq)) WARN_ON(e && (rq->tag != -1));
if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
goto run; goto run;
if (e && e->type->ops.mq.insert_requests) { if (e && e->type->ops.mq.insert_requests) {
@ -393,23 +469,6 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu); struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
struct elevator_queue *e = hctx->queue->elevator; struct elevator_queue *e = hctx->queue->elevator;
if (e) {
struct request *rq, *next;
/*
* We bypass requests that already have a driver tag assigned,
* which should only be flushes. Flushes are only ever inserted
* as single requests, so we shouldn't ever hit the
* WARN_ON_ONCE() below (but let's handle it just in case).
*/
list_for_each_entry_safe(rq, next, list, queuelist) {
if (WARN_ON_ONCE(rq->tag != -1)) {
list_del_init(&rq->queuelist);
blk_mq_sched_bypass_insert(hctx, rq);
}
}
}
if (e && e->type->ops.mq.insert_requests) if (e && e->type->ops.mq.insert_requests)
e->type->ops.mq.insert_requests(hctx, list, false); e->type->ops.mq.insert_requests(hctx, list, false);
else else
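
Note on blk_mq_next_ctx() above: blk_mq_do_dispatch_ctx() uses it to rotate through the software queues of a hardware queue so that no single ctx monopolizes dispatch. The following is a minimal user-space sketch of that wrap-around step. struct sw_ctx, struct hw_ctx and next_ctx() are hypothetical, reduced stand-ins for blk_mq_ctx, blk_mq_hw_ctx and the kernel helper.

```c
#include <assert.h>

/* Hypothetical reduced stand-ins for blk_mq_ctx and blk_mq_hw_ctx. */
struct sw_ctx { unsigned int index_hw; };
struct hw_ctx { unsigned int nr_ctx; struct sw_ctx **ctxs; };

/* Return the software queue after c, wrapping to slot 0 at the end. */
static struct sw_ctx *next_ctx(const struct hw_ctx *h, const struct sw_ctx *c)
{
	unsigned int idx = c->index_hw;

	assert(idx < h->nr_ctx);
	if (++idx == h->nr_ctx)
		idx = 0;
	return h->ctxs[idx];
}
```

In the hunk, the position reached by this rotation is saved in hctx->dispatch_from with WRITE_ONCE(), so the next dispatch run resumes from that ctx rather than restarting at the first one.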


@ -298,12 +298,12 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
} }
EXPORT_SYMBOL(blk_mq_tagset_busy_iter); EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
int blk_mq_reinit_tagset(struct blk_mq_tag_set *set, int blk_mq_tagset_iter(struct blk_mq_tag_set *set, void *data,
int (reinit_request)(void *, struct request *)) int (fn)(void *, struct request *))
{ {
int i, j, ret = 0; int i, j, ret = 0;
if (WARN_ON_ONCE(!reinit_request)) if (WARN_ON_ONCE(!fn))
goto out; goto out;
for (i = 0; i < set->nr_hw_queues; i++) { for (i = 0; i < set->nr_hw_queues; i++) {
@ -316,8 +316,7 @@ int blk_mq_reinit_tagset(struct blk_mq_tag_set *set,
if (!tags->static_rqs[j]) if (!tags->static_rqs[j])
continue; continue;
ret = reinit_request(set->driver_data, ret = fn(data, tags->static_rqs[j]);
tags->static_rqs[j]);
if (ret) if (ret)
goto out; goto out;
} }
@ -326,7 +325,7 @@ int blk_mq_reinit_tagset(struct blk_mq_tag_set *set,
out: out:
return ret; return ret;
} }
EXPORT_SYMBOL_GPL(blk_mq_reinit_tagset); EXPORT_SYMBOL_GPL(blk_mq_tagset_iter);
void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn, void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
void *priv) void *priv)


@ -44,14 +44,9 @@ static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
return sbq_wait_ptr(bt, &hctx->wait_index); return sbq_wait_ptr(bt, &hctx->wait_index);
} }
enum {
BLK_MQ_TAG_CACHE_MIN = 1,
BLK_MQ_TAG_CACHE_MAX = 64,
};
enum { enum {
BLK_MQ_TAG_FAIL = -1U, BLK_MQ_TAG_FAIL = -1U,
BLK_MQ_TAG_MIN = BLK_MQ_TAG_CACHE_MIN, BLK_MQ_TAG_MIN = 1,
BLK_MQ_TAG_MAX = BLK_MQ_TAG_FAIL - 1, BLK_MQ_TAG_MAX = BLK_MQ_TAG_FAIL - 1,
}; };

View File

@ -37,6 +37,7 @@
#include "blk-wbt.h" #include "blk-wbt.h"
#include "blk-mq-sched.h" #include "blk-mq-sched.h"
static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
static void blk_mq_poll_stats_start(struct request_queue *q); static void blk_mq_poll_stats_start(struct request_queue *q);
static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb); static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@ -60,10 +61,10 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
/* /*
* Check if any of the ctx's have pending work in this hardware queue * Check if any of the ctx's have pending work in this hardware queue
*/ */
bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx) static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
{ {
return sbitmap_any_bit_set(&hctx->ctx_map) || return !list_empty_careful(&hctx->dispatch) ||
!list_empty_careful(&hctx->dispatch) || sbitmap_any_bit_set(&hctx->ctx_map) ||
blk_mq_sched_has_work(hctx); blk_mq_sched_has_work(hctx);
} }
@ -125,7 +126,8 @@ void blk_freeze_queue_start(struct request_queue *q)
freeze_depth = atomic_inc_return(&q->mq_freeze_depth); freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
if (freeze_depth == 1) { if (freeze_depth == 1) {
percpu_ref_kill(&q->q_usage_counter); percpu_ref_kill(&q->q_usage_counter);
blk_mq_run_hw_queues(q, false); if (q->mq_ops)
blk_mq_run_hw_queues(q, false);
} }
} }
EXPORT_SYMBOL_GPL(blk_freeze_queue_start); EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
@ -255,13 +257,6 @@ void blk_mq_wake_waiters(struct request_queue *q)
queue_for_each_hw_ctx(q, hctx, i) queue_for_each_hw_ctx(q, hctx, i)
if (blk_mq_hw_queue_mapped(hctx)) if (blk_mq_hw_queue_mapped(hctx))
blk_mq_tag_wakeup_all(hctx->tags, true); blk_mq_tag_wakeup_all(hctx->tags, true);
/*
* If we are called because the queue has now been marked as
* dying, we need to ensure that processes currently waiting on
* the queue are notified as well.
*/
wake_up_all(&q->mq_freeze_wq);
} }
bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx) bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
@ -296,6 +291,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
rq->q = data->q; rq->q = data->q;
rq->mq_ctx = data->ctx; rq->mq_ctx = data->ctx;
rq->cmd_flags = op; rq->cmd_flags = op;
if (data->flags & BLK_MQ_REQ_PREEMPT)
rq->rq_flags |= RQF_PREEMPT;
if (blk_queue_io_stat(data->q)) if (blk_queue_io_stat(data->q))
rq->rq_flags |= RQF_IO_STAT; rq->rq_flags |= RQF_IO_STAT;
/* do not touch atomic flags, it needs atomic ops against the timer */ /* do not touch atomic flags, it needs atomic ops against the timer */
@ -336,12 +333,14 @@ static struct request *blk_mq_get_request(struct request_queue *q,
struct elevator_queue *e = q->elevator; struct elevator_queue *e = q->elevator;
struct request *rq; struct request *rq;
unsigned int tag; unsigned int tag;
struct blk_mq_ctx *local_ctx = NULL; bool put_ctx_on_error = false;
blk_queue_enter_live(q); blk_queue_enter_live(q);
data->q = q; data->q = q;
if (likely(!data->ctx)) if (likely(!data->ctx)) {
data->ctx = local_ctx = blk_mq_get_ctx(q); data->ctx = blk_mq_get_ctx(q);
put_ctx_on_error = true;
}
if (likely(!data->hctx)) if (likely(!data->hctx))
data->hctx = blk_mq_map_queue(q, data->ctx->cpu); data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
if (op & REQ_NOWAIT) if (op & REQ_NOWAIT)
@ -360,8 +359,8 @@ static struct request *blk_mq_get_request(struct request_queue *q,
tag = blk_mq_get_tag(data); tag = blk_mq_get_tag(data);
if (tag == BLK_MQ_TAG_FAIL) { if (tag == BLK_MQ_TAG_FAIL) {
if (local_ctx) { if (put_ctx_on_error) {
blk_mq_put_ctx(local_ctx); blk_mq_put_ctx(data->ctx);
data->ctx = NULL; data->ctx = NULL;
} }
blk_queue_exit(q); blk_queue_exit(q);
@ -384,13 +383,13 @@ static struct request *blk_mq_get_request(struct request_queue *q,
} }
struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
unsigned int flags) blk_mq_req_flags_t flags)
{ {
struct blk_mq_alloc_data alloc_data = { .flags = flags }; struct blk_mq_alloc_data alloc_data = { .flags = flags };
struct request *rq; struct request *rq;
int ret; int ret;
ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT); ret = blk_queue_enter(q, flags);
if (ret) if (ret)
return ERR_PTR(ret); return ERR_PTR(ret);
@ -410,7 +409,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
EXPORT_SYMBOL(blk_mq_alloc_request); EXPORT_SYMBOL(blk_mq_alloc_request);
struct request *blk_mq_alloc_request_hctx(struct request_queue *q, struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
unsigned int op, unsigned int flags, unsigned int hctx_idx) unsigned int op, blk_mq_req_flags_t flags, unsigned int hctx_idx)
{ {
struct blk_mq_alloc_data alloc_data = { .flags = flags }; struct blk_mq_alloc_data alloc_data = { .flags = flags };
struct request *rq; struct request *rq;
@ -429,7 +428,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
if (hctx_idx >= q->nr_hw_queues) if (hctx_idx >= q->nr_hw_queues)
return ERR_PTR(-EIO); return ERR_PTR(-EIO);
ret = blk_queue_enter(q, true); ret = blk_queue_enter(q, flags);
if (ret) if (ret)
return ERR_PTR(ret); return ERR_PTR(ret);
@ -476,8 +475,14 @@ void blk_mq_free_request(struct request *rq)
if (rq->rq_flags & RQF_MQ_INFLIGHT) if (rq->rq_flags & RQF_MQ_INFLIGHT)
atomic_dec(&hctx->nr_active); atomic_dec(&hctx->nr_active);
if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
laptop_io_completion(q->backing_dev_info);
wbt_done(q->rq_wb, &rq->issue_stat); wbt_done(q->rq_wb, &rq->issue_stat);
if (blk_rq_rl(rq))
blk_put_rl(blk_rq_rl(rq));
clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags); clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
clear_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags); clear_bit(REQ_ATOM_POLL_SLEPT, &rq->atomic_flags);
if (rq->tag != -1) if (rq->tag != -1)
@ -593,22 +598,32 @@ void blk_mq_start_request(struct request *rq)
blk_add_timer(rq); blk_add_timer(rq);
/* WARN_ON_ONCE(test_bit(REQ_ATOM_STARTED, &rq->atomic_flags));
* Ensure that ->deadline is visible before set the started
* flag and clear the completed flag.
*/
smp_mb__before_atomic();
/* /*
* Mark us as started and clear complete. Complete might have been * Mark us as started and clear complete. Complete might have been
* set if requeue raced with timeout, which then marked it as * set if requeue raced with timeout, which then marked it as
* complete. So be sure to clear complete again when we start * complete. So be sure to clear complete again when we start
* the request, otherwise we'll ignore the completion event. * the request, otherwise we'll ignore the completion event.
*
* Ensure that ->deadline is visible before we set STARTED, such that
* blk_mq_check_expired() is guaranteed to observe our ->deadline when
* it observes STARTED.
*/ */
if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) smp_wmb();
set_bit(REQ_ATOM_STARTED, &rq->atomic_flags); set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) {
/*
* Coherence order guarantees these consecutive stores to a
* single variable propagate in the specified order. Thus the
* clear_bit() is ordered _after_ the set bit. See
* blk_mq_check_expired().
*
* (the bits must be part of the same byte for this to be
* true).
*/
clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags); clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
}
if (q->dma_drain_size && blk_rq_bytes(rq)) { if (q->dma_drain_size && blk_rq_bytes(rq)) {
/* /*
@ -634,6 +649,8 @@ static void __blk_mq_requeue_request(struct request *rq)
{ {
struct request_queue *q = rq->q; struct request_queue *q = rq->q;
blk_mq_put_driver_tag(rq);
trace_block_rq_requeue(q, rq); trace_block_rq_requeue(q, rq);
wbt_requeue(q->rq_wb, &rq->issue_stat); wbt_requeue(q->rq_wb, &rq->issue_stat);
blk_mq_sched_requeue_request(rq); blk_mq_sched_requeue_request(rq);
@ -690,7 +707,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
/* /*
* We abuse this flag that is otherwise used by the I/O scheduler to * We abuse this flag that is otherwise used by the I/O scheduler to
* request head insertation from the workqueue. * request head insertion from the workqueue.
*/ */
BUG_ON(rq->rq_flags & RQF_SOFTBARRIER); BUG_ON(rq->rq_flags & RQF_SOFTBARRIER);
@ -778,10 +795,19 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
struct request *rq, void *priv, bool reserved) struct request *rq, void *priv, bool reserved)
{ {
struct blk_mq_timeout_data *data = priv; struct blk_mq_timeout_data *data = priv;
unsigned long deadline;
if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
return; return;
/*
* Ensures that if we see STARTED we must also see our
* up-to-date deadline, see blk_mq_start_request().
*/
smp_rmb();
deadline = READ_ONCE(rq->deadline);
/* /*
* The rq being checked may have been freed and reallocated * The rq being checked may have been freed and reallocated
* out already here, we avoid this race by checking rq->deadline * out already here, we avoid this race by checking rq->deadline
@ -795,11 +821,20 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
* and clearing the flag in blk_mq_start_request(), so * and clearing the flag in blk_mq_start_request(), so
* this rq won't be timed out too. * this rq won't be timed out too.
*/ */
if (time_after_eq(jiffies, rq->deadline)) { if (time_after_eq(jiffies, deadline)) {
if (!blk_mark_rq_complete(rq)) if (!blk_mark_rq_complete(rq)) {
/*
* Again coherence order ensures that consecutive reads
* from the same variable must be in that order. This
* ensures that if we see COMPLETE clear, we must then
* see STARTED set and we'll ignore this timeout.
*
* (There's also the MB implied by the test_and_clear())
*/
blk_mq_rq_timed_out(rq, reserved); blk_mq_rq_timed_out(rq, reserved);
} else if (!data->next_set || time_after(data->next, rq->deadline)) { }
data->next = rq->deadline; } else if (!data->next_set || time_after(data->next, deadline)) {
data->next = deadline;
data->next_set = 1; data->next_set = 1;
} }
} }
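The barrier pairing described in the comments above (a write barrier before setting STARTED in blk_mq_start_request(), a read barrier after observing STARTED in blk_mq_check_expired()) can be illustrated in userspace with C11 atomics. This is only a sketch of the publish/observe pattern, not kernel code: `struct demo_rq`, `DEMO_STARTED`, `demo_start()` and `demo_check()` are hypothetical names, and the C11 fences stand in for smp_wmb()/smp_rmb() under the assumption that release/acquire fences are a reasonable userspace analogue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request; not the kernel type. */
struct demo_rq {
	unsigned long deadline;	/* plain data published by the writer */
	atomic_uint flags;	/* bit 0 plays the role of STARTED */
};

#define DEMO_STARTED 0x1u

/* Writer side: store the deadline first, then publish STARTED. */
static void demo_start(struct demo_rq *rq, unsigned long deadline)
{
	rq->deadline = deadline;
	/* Userspace analogue of smp_wmb(): deadline store before flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_or_explicit(&rq->flags, DEMO_STARTED,
				 memory_order_relaxed);
}

/* Reader side: returns true and fills *deadline only if STARTED was seen. */
static bool demo_check(struct demo_rq *rq, unsigned long *deadline)
{
	assert(deadline != NULL);
	if (!(atomic_load_explicit(&rq->flags, memory_order_relaxed) &
	      DEMO_STARTED))
		return false;
	/* Userspace analogue of smp_rmb(): flag load before deadline load. */
	atomic_thread_fence(memory_order_acquire);
	*deadline = rq->deadline;
	return true;
}
```

A reader that sees the flag set is then guaranteed to read the deadline written before it, which is the property the timeout check relies on.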
@ -880,6 +915,45 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list)
} }
EXPORT_SYMBOL_GPL(blk_mq_flush_busy_ctxs); EXPORT_SYMBOL_GPL(blk_mq_flush_busy_ctxs);
struct dispatch_rq_data {
struct blk_mq_hw_ctx *hctx;
struct request *rq;
};
static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr,
void *data)
{
struct dispatch_rq_data *dispatch_data = data;
struct blk_mq_hw_ctx *hctx = dispatch_data->hctx;
struct blk_mq_ctx *ctx = hctx->ctxs[bitnr];
spin_lock(&ctx->lock);
if (unlikely(!list_empty(&ctx->rq_list))) {
dispatch_data->rq = list_entry_rq(ctx->rq_list.next);
list_del_init(&dispatch_data->rq->queuelist);
if (list_empty(&ctx->rq_list))
sbitmap_clear_bit(sb, bitnr);
}
spin_unlock(&ctx->lock);
return !dispatch_data->rq;
}
struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *start)
{
unsigned off = start ? start->index_hw : 0;
struct dispatch_rq_data data = {
.hctx = hctx,
.rq = NULL,
};
__sbitmap_for_each_set(&hctx->ctx_map, off,
dispatch_rq_from_ctx, &data);
return data.rq;
}
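The dequeue helper above walks the set bits of hctx->ctx_map starting at a given offset and stops as soon as the callback has found a request (the callback returns `!dispatch_data->rq`). A minimal userspace sketch of that "iterate set bits from an offset, stop when the callback returns false" shape follows. All names here (`demo_for_each_set()`, `demo_pick`, `demo_dequeue_from()`) are hypothetical, the bitmap is a plain `unsigned int`, and the wrap-around behaviour is an assumption about how such an iterator would typically work, not a statement about the sbitmap implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CTX 8u

typedef bool (*demo_bit_fn)(unsigned int bitnr, void *data);

/*
 * Visit the set bits of 'mask' starting at 'start' and wrapping around.
 * Iteration stops as soon as fn() returns false.
 */
static void demo_for_each_set(unsigned int mask, unsigned int start,
			      demo_bit_fn fn, void *data)
{
	assert(start < DEMO_NR_CTX);
	for (unsigned int i = 0; i < DEMO_NR_CTX; i++) {
		unsigned int bit = (start + i) % DEMO_NR_CTX;

		if ((mask & (1u << bit)) && !fn(bit, data))
			return;
	}
}

struct demo_pick {
	const int *pending;	/* pending[i]: queued items for context i */
	int found;		/* index of the context picked, or -1 */
};

/* Callback: remember the first context with work, then stop iterating. */
static bool demo_pick_first(unsigned int bitnr, void *data)
{
	struct demo_pick *p = data;

	if (p->pending[bitnr] > 0)
		p->found = (int)bitnr;
	return p->found < 0;	/* continue only while nothing was found */
}

static int demo_dequeue_from(unsigned int mask, const int *pending,
			     unsigned int start)
{
	struct demo_pick p = { .pending = pending, .found = -1 };

	demo_for_each_set(mask, start, demo_pick_first, &p);
	return p.found;
}
```

Passing a different `start` on each call is what lets a caller take requests from the software queues in round-robin fashion instead of always draining the lowest-numbered one first.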
static inline unsigned int queued_to_index(unsigned int queued) static inline unsigned int queued_to_index(unsigned int queued)
{ {
if (!queued) if (!queued)
@ -920,109 +994,95 @@ done:
return rq->tag != -1; return rq->tag != -1;
} }
static void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
struct request *rq) int flags, void *key)
{
blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
rq->tag = -1;
if (rq->rq_flags & RQF_MQ_INFLIGHT) {
rq->rq_flags &= ~RQF_MQ_INFLIGHT;
atomic_dec(&hctx->nr_active);
}
}
static void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
if (rq->tag == -1 || rq->internal_tag == -1)
return;
__blk_mq_put_driver_tag(hctx, rq);
}
static void blk_mq_put_driver_tag(struct request *rq)
{
struct blk_mq_hw_ctx *hctx;
if (rq->tag == -1 || rq->internal_tag == -1)
return;
hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
__blk_mq_put_driver_tag(hctx, rq);
}
/*
* If we fail getting a driver tag because all the driver tags are already
* assigned and on the dispatch list, BUT the first entry does not have a
* tag, then we could deadlock. For that case, move entries with assigned
* driver tags to the front, leaving the set of tagged requests in the
* same order, and the untagged set in the same order.
*/
static bool reorder_tags_to_front(struct list_head *list)
{
struct request *rq, *tmp, *first = NULL;
list_for_each_entry_safe_reverse(rq, tmp, list, queuelist) {
if (rq == first)
break;
if (rq->tag != -1) {
list_move(&rq->queuelist, list);
if (!first)
first = rq;
}
}
return first != NULL;
}
static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
void *key)
{ {
struct blk_mq_hw_ctx *hctx; struct blk_mq_hw_ctx *hctx;
hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait); hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait);
list_del(&wait->entry); list_del_init(&wait->entry);
clear_bit_unlock(BLK_MQ_S_TAG_WAITING, &hctx->state);
blk_mq_run_hw_queue(hctx, true); blk_mq_run_hw_queue(hctx, true);
return 1; return 1;
} }
static bool blk_mq_dispatch_wait_add(struct blk_mq_hw_ctx *hctx) /*
* Mark us waiting for a tag. For shared tags, this involves hooking us into
* the tag wakeups. For non-shared tags, we can simply mark us needing a
* restart. For both cases, take care to check the condition again after
* marking us as waiting.
*/
static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
struct request *rq)
{ {
struct blk_mq_hw_ctx *this_hctx = *hctx;
bool shared_tags = (this_hctx->flags & BLK_MQ_F_TAG_SHARED) != 0;
struct sbq_wait_state *ws; struct sbq_wait_state *ws;
wait_queue_entry_t *wait;
bool ret;
if (!shared_tags) {
if (!test_bit(BLK_MQ_S_SCHED_RESTART, &this_hctx->state))
set_bit(BLK_MQ_S_SCHED_RESTART, &this_hctx->state);
} else {
wait = &this_hctx->dispatch_wait;
if (!list_empty_careful(&wait->entry))
return false;
spin_lock(&this_hctx->lock);
if (!list_empty(&wait->entry)) {
spin_unlock(&this_hctx->lock);
return false;
}
ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
add_wait_queue(&ws->wait, wait);
}
/* /*
* The TAG_WAITING bit serves as a lock protecting hctx->dispatch_wait. * It's possible that a tag was freed in the window between the
* The thread which wins the race to grab this bit adds the hardware * allocation failure and adding the hardware queue to the wait
* queue to the wait queue. * queue.
*/ */
if (test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state) || ret = blk_mq_get_driver_tag(rq, hctx, false);
test_and_set_bit_lock(BLK_MQ_S_TAG_WAITING, &hctx->state))
return false;
init_waitqueue_func_entry(&hctx->dispatch_wait, blk_mq_dispatch_wake); if (!shared_tags) {
ws = bt_wait_ptr(&hctx->tags->bitmap_tags, hctx); /*
* Don't clear RESTART here, someone else could have set it.
* At most this will cost an extra queue run.
*/
return ret;
} else {
if (!ret) {
spin_unlock(&this_hctx->lock);
return false;
}
/* /*
* As soon as this returns, it's no longer safe to fiddle with * We got a tag, remove ourselves from the wait queue to ensure
* hctx->dispatch_wait, since a completion can wake up the wait queue * someone else gets the wakeup.
* and unlock the bit. */
*/ spin_lock_irq(&ws->wait.lock);
add_wait_queue(&ws->wait, &hctx->dispatch_wait); list_del_init(&wait->entry);
return true; spin_unlock_irq(&ws->wait.lock);
spin_unlock(&this_hctx->lock);
return true;
}
} }
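The ordering that blk_mq_mark_tag_wait() follows (register as a waiter first, then retry the allocation, and unregister again if the retry succeeds) is what closes the window in which a tag could be freed between the failed allocation and the registration. The following single-threaded model only illustrates that ordering; it has no locking and no real wait queue, and `demo_pool`, `demo_try_get()` and `demo_mark_wait_and_retry()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag pool with a single waiter slot. */
struct demo_pool {
	int free_tags;
	bool waiter_registered;
};

static bool demo_try_get(struct demo_pool *p)
{
	assert(p != NULL);
	if (p->free_tags <= 0)
		return false;
	p->free_tags--;
	return true;
}

/*
 * Called after a first demo_try_get() has failed. Returns true if a tag
 * was obtained after all; false means the caller stays registered and
 * relies on a later wakeup.
 */
static bool demo_mark_wait_and_retry(struct demo_pool *p)
{
	if (p->waiter_registered)
		return false;		/* already waiting for a wakeup */

	p->waiter_registered = true;	/* 1. mark ourselves as waiting */

	if (!demo_try_get(p))		/* 2. re-check after marking */
		return false;

	p->waiter_registered = false;	/* 3. got a tag: stop waiting */
	return true;
}
```

If the retry in step 2 were done before step 1, a tag freed in between would produce no wakeup for this waiter, which is the lost-wakeup case the comment in the diff warns about.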
bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
bool got_budget)
{ {
struct blk_mq_hw_ctx *hctx; struct blk_mq_hw_ctx *hctx;
struct request *rq; struct request *rq, *nxt;
bool no_tag = false;
int errors, queued; int errors, queued;
if (list_empty(list)) if (list_empty(list))
return false; return false;
WARN_ON(!list_is_singular(list) && got_budget);
/* /*
* Now process all the entries, sending them to the driver. * Now process all the entries, sending them to the driver.
*/ */
@ -1033,23 +1093,29 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
rq = list_first_entry(list, struct request, queuelist); rq = list_first_entry(list, struct request, queuelist);
if (!blk_mq_get_driver_tag(rq, &hctx, false)) { if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
if (!queued && reorder_tags_to_front(list))
continue;
/* /*
* The initial allocation attempt failed, so we need to * The initial allocation attempt failed, so we need to
* rerun the hardware queue when a tag is freed. * rerun the hardware queue when a tag is freed. The
* waitqueue takes care of that. If the queue is run
* before we add this entry back on the dispatch list,
* we'll re-run it below.
*/ */
if (!blk_mq_dispatch_wait_add(hctx)) if (!blk_mq_mark_tag_wait(&hctx, rq)) {
if (got_budget)
blk_mq_put_dispatch_budget(hctx);
/*
* For non-shared tags, the RESTART check
* will suffice.
*/
if (hctx->flags & BLK_MQ_F_TAG_SHARED)
no_tag = true;
break; break;
}
}
/* if (!got_budget && !blk_mq_get_dispatch_budget(hctx)) {
* It's possible that a tag was freed in the window blk_mq_put_driver_tag(rq);
* between the allocation failure and adding the break;
* hardware queue to the wait queue.
*/
if (!blk_mq_get_driver_tag(rq, &hctx, false))
break;
} }
list_del_init(&rq->queuelist); list_del_init(&rq->queuelist);
@ -1063,15 +1129,21 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
if (list_empty(list)) if (list_empty(list))
bd.last = true; bd.last = true;
else { else {
struct request *nxt;
nxt = list_first_entry(list, struct request, queuelist); nxt = list_first_entry(list, struct request, queuelist);
bd.last = !blk_mq_get_driver_tag(nxt, NULL, false); bd.last = !blk_mq_get_driver_tag(nxt, NULL, false);
} }
ret = q->mq_ops->queue_rq(hctx, &bd); ret = q->mq_ops->queue_rq(hctx, &bd);
if (ret == BLK_STS_RESOURCE) { if (ret == BLK_STS_RESOURCE) {
blk_mq_put_driver_tag_hctx(hctx, rq); /*
* If an I/O scheduler has been configured and we got a
* driver tag for the next request already, free it
* again.
*/
if (!list_empty(list)) {
nxt = list_first_entry(list, struct request, queuelist);
blk_mq_put_driver_tag(nxt);
}
list_add(&rq->queuelist, list); list_add(&rq->queuelist, list);
__blk_mq_requeue_request(rq); __blk_mq_requeue_request(rq);
break; break;
@ -1093,13 +1165,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
* that is where we will continue on next queue run. * that is where we will continue on next queue run.
*/ */
if (!list_empty(list)) { if (!list_empty(list)) {
/*
* If an I/O scheduler has been configured and we got a driver
* tag for the next request already, free it again.
*/
rq = list_first_entry(list, struct request, queuelist);
blk_mq_put_driver_tag(rq);
spin_lock(&hctx->lock); spin_lock(&hctx->lock);
list_splice_init(list, &hctx->dispatch); list_splice_init(list, &hctx->dispatch);
spin_unlock(&hctx->lock); spin_unlock(&hctx->lock);
@ -1109,10 +1174,10 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
* it is no longer set that means that it was cleared by another * it is no longer set that means that it was cleared by another
* thread and hence that a queue rerun is needed. * thread and hence that a queue rerun is needed.
* *
* If TAG_WAITING is set that means that an I/O scheduler has * If 'no_tag' is set, that means that we failed getting
* been configured and another thread is waiting for a driver * a driver tag with an I/O scheduler attached. If our dispatch
* tag. To guarantee fairness, do not rerun this hardware queue * waitqueue is no longer active, ensure that we run the queue
* but let the other thread grab the driver tag. * AFTER adding our entries back to the list.
* *
* If no I/O scheduler has been configured it is possible that * If no I/O scheduler has been configured it is possible that
* the hardware queue got stopped and restarted before requests * the hardware queue got stopped and restarted before requests
@ -1124,8 +1189,8 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
* returning BLK_STS_RESOURCE. Two exceptions are scsi-mq * returning BLK_STS_RESOURCE. Two exceptions are scsi-mq
* and dm-rq. * and dm-rq.
*/ */
if (!blk_mq_sched_needs_restart(hctx) && if (!blk_mq_sched_needs_restart(hctx) ||
!test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state)) (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
blk_mq_run_hw_queue(hctx, true); blk_mq_run_hw_queue(hctx, true);
} }
@ -1218,9 +1283,14 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
} }
EXPORT_SYMBOL(blk_mq_delay_run_hw_queue); EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async) bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
{ {
__blk_mq_delay_run_hw_queue(hctx, async, 0); if (blk_mq_hctx_has_pending(hctx)) {
__blk_mq_delay_run_hw_queue(hctx, async, 0);
return true;
}
return false;
} }
EXPORT_SYMBOL(blk_mq_run_hw_queue); EXPORT_SYMBOL(blk_mq_run_hw_queue);
@ -1230,8 +1300,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
int i; int i;
queue_for_each_hw_ctx(q, hctx, i) { queue_for_each_hw_ctx(q, hctx, i) {
if (!blk_mq_hctx_has_pending(hctx) || if (blk_mq_hctx_stopped(hctx))
blk_mq_hctx_stopped(hctx))
continue; continue;
blk_mq_run_hw_queue(hctx, async); blk_mq_run_hw_queue(hctx, async);
@ -1405,7 +1474,7 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
* Should only be used carefully, when the caller knows we want to * Should only be used carefully, when the caller knows we want to
* bypass a potential IO scheduler on the target device. * bypass a potential IO scheduler on the target device.
*/ */
void blk_mq_request_bypass_insert(struct request *rq) void blk_mq_request_bypass_insert(struct request *rq, bool run_queue)
{ {
struct blk_mq_ctx *ctx = rq->mq_ctx; struct blk_mq_ctx *ctx = rq->mq_ctx;
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu); struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
@ -1414,7 +1483,8 @@ void blk_mq_request_bypass_insert(struct request *rq)
list_add_tail(&rq->queuelist, &hctx->dispatch); list_add_tail(&rq->queuelist, &hctx->dispatch);
spin_unlock(&hctx->lock); spin_unlock(&hctx->lock);
blk_mq_run_hw_queue(hctx, false); if (run_queue)
blk_mq_run_hw_queue(hctx, false);
} }
void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
@ -1501,13 +1571,9 @@ static void blk_mq_bio_to_request(struct request *rq, struct bio *bio)
{ {
blk_init_request_from_bio(rq, bio); blk_init_request_from_bio(rq, bio);
blk_account_io_start(rq, true); blk_rq_set_rl(rq, blk_get_rl(rq->q, bio));
}
static inline bool hctx_allow_merges(struct blk_mq_hw_ctx *hctx) blk_account_io_start(rq, true);
{
return (hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
!blk_queue_nomerges(hctx->queue);
} }
static inline void blk_mq_queue_io(struct blk_mq_hw_ctx *hctx, static inline void blk_mq_queue_io(struct blk_mq_hw_ctx *hctx,
@ -1552,6 +1618,11 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
if (!blk_mq_get_driver_tag(rq, NULL, false)) if (!blk_mq_get_driver_tag(rq, NULL, false))
goto insert; goto insert;
if (!blk_mq_get_dispatch_budget(hctx)) {
blk_mq_put_driver_tag(rq);
goto insert;
}
new_cookie = request_to_qc_t(hctx, rq); new_cookie = request_to_qc_t(hctx, rq);
/* /*
@ -1641,13 +1712,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
if (unlikely(is_flush_fua)) { if (unlikely(is_flush_fua)) {
blk_mq_put_ctx(data.ctx); blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio); blk_mq_bio_to_request(rq, bio);
if (q->elevator) {
blk_mq_sched_insert_request(rq, false, true, true, /* bypass scheduler for flush rq */
true); blk_insert_flush(rq);
} else { blk_mq_run_hw_queue(data.hctx, true);
blk_insert_flush(rq);
blk_mq_run_hw_queue(data.hctx, true);
}
} else if (plug && q->nr_hw_queues == 1) { } else if (plug && q->nr_hw_queues == 1) {
struct request *last = NULL; struct request *last = NULL;
@ -1990,6 +2058,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
hctx->nr_ctx = 0; hctx->nr_ctx = 0;
init_waitqueue_func_entry(&hctx->dispatch_wait, blk_mq_dispatch_wake);
INIT_LIST_HEAD(&hctx->dispatch_wait.entry);
if (set->ops->init_hctx && if (set->ops->init_hctx &&
set->ops->init_hctx(hctx, set->driver_data, hctx_idx)) set->ops->init_hctx(hctx, set->driver_data, hctx_idx))
goto free_bitmap; goto free_bitmap;
@ -2229,8 +2300,11 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
mutex_lock(&set->tag_list_lock); mutex_lock(&set->tag_list_lock);
/* Check to see if we're transitioning to shared (from 1 to 2 queues). */ /*
if (!list_empty(&set->tag_list) && !(set->flags & BLK_MQ_F_TAG_SHARED)) { * Check to see if we're transitioning to shared (from 1 to 2 queues).
*/
if (!list_empty(&set->tag_list) &&
!(set->flags & BLK_MQ_F_TAG_SHARED)) {
set->flags |= BLK_MQ_F_TAG_SHARED; set->flags |= BLK_MQ_F_TAG_SHARED;
/* update existing queue */ /* update existing queue */
blk_mq_update_tag_set_depth(set, true); blk_mq_update_tag_set_depth(set, true);
@ -2404,6 +2478,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
spin_lock_init(&q->requeue_lock); spin_lock_init(&q->requeue_lock);
blk_queue_make_request(q, blk_mq_make_request); blk_queue_make_request(q, blk_mq_make_request);
if (q->mq_ops->poll)
q->poll_fn = blk_mq_poll;
/* /*
* Do this after blk_queue_make_request() overrides it... * Do this after blk_queue_make_request() overrides it...
@ -2460,10 +2536,9 @@ static void blk_mq_queue_reinit(struct request_queue *q)
/* /*
* redo blk_mq_init_cpu_queues and blk_mq_init_hw_queues. FIXME: maybe * redo blk_mq_init_cpu_queues and blk_mq_init_hw_queues. FIXME: maybe
* we should change hctx numa_node according to new topology (this * we should change hctx numa_node according to the new topology (this
* involves free and re-allocate memory, worthy doing?) * involves freeing and re-allocating memory, worth doing?)
*/ */
blk_mq_map_swqueue(q); blk_mq_map_swqueue(q);
blk_mq_sysfs_register(q); blk_mq_sysfs_register(q);
@ -2552,6 +2627,9 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
if (!set->ops->queue_rq) if (!set->ops->queue_rq)
return -EINVAL; return -EINVAL;
if (!set->ops->get_budget ^ !set->ops->put_budget)
return -EINVAL;
if (set->queue_depth > BLK_MQ_MAX_DEPTH) { if (set->queue_depth > BLK_MQ_MAX_DEPTH) {
pr_info("blk-mq: reduced tag depth to %u\n", pr_info("blk-mq: reduced tag depth to %u\n",
BLK_MQ_MAX_DEPTH); BLK_MQ_MAX_DEPTH);
@ -2642,8 +2720,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
* queue depth. This is similar to what the old code would do. * queue depth. This is similar to what the old code would do.
*/ */
if (!hctx->sched_tags) { if (!hctx->sched_tags) {
ret = blk_mq_tag_update_depth(hctx, &hctx->tags, ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
min(nr, set->queue_depth),
false); false);
} else { } else {
ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags, ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
@ -2863,20 +2940,14 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
return false; return false;
} }
bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie) static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
{ {
struct blk_mq_hw_ctx *hctx; struct blk_mq_hw_ctx *hctx;
struct blk_plug *plug;
struct request *rq; struct request *rq;
if (!q->mq_ops || !q->mq_ops->poll || !blk_qc_t_valid(cookie) || if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
return false; return false;
plug = current->plug;
if (plug)
blk_flush_plug_list(plug, false);
hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)]; hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
if (!blk_qc_t_is_internal(cookie)) if (!blk_qc_t_is_internal(cookie))
rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie)); rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
@ -2894,10 +2965,15 @@ bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
return __blk_mq_poll(hctx, rq); return __blk_mq_poll(hctx, rq);
} }
EXPORT_SYMBOL_GPL(blk_mq_poll);
static int __init blk_mq_init(void) static int __init blk_mq_init(void)
{ {
/*
* See comment in block/blk.h rq_atomic_flags enum
*/
BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=
(REQ_ATOM_COMPLETE / BITS_PER_BYTE));
cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL, cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
blk_mq_hctx_notify_dead); blk_mq_hctx_notify_dead);
return 0; return 0;
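The BUILD_BUG_ON() added to blk_mq_init() enforces at compile time that the STARTED and COMPLETE bits live in the same byte, which the coherence-order argument in blk_mq_start_request() depends on. A userspace sketch of the same check using C11 `static_assert` is shown below. The enum values are hypothetical placeholders (the real ones are defined in block/blk.h and are not shown in this diff), and `demo_same_byte()` is an illustrative helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real values live in block/blk.h. */
enum {
	DEMO_ATOM_COMPLETE = 0,
	DEMO_ATOM_STARTED  = 1,
};

/* Compile-time check, similar in spirit to BUILD_BUG_ON(). */
static_assert(DEMO_ATOM_STARTED / CHAR_BIT == DEMO_ATOM_COMPLETE / CHAR_BIT,
	      "STARTED and COMPLETE must share a byte");

/* Runtime version of the same predicate, for illustration only. */
static bool demo_same_byte(unsigned int bit_a, unsigned int bit_b)
{
	return bit_a / CHAR_BIT == bit_b / CHAR_BIT;
}
```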


@ -3,6 +3,7 @@
#define INT_BLK_MQ_H #define INT_BLK_MQ_H
#include "blk-stat.h" #include "blk-stat.h"
#include "blk-mq-tag.h"
struct blk_mq_tag_set; struct blk_mq_tag_set;
@ -26,16 +27,16 @@ struct blk_mq_ctx {
struct kobject kobj; struct kobject kobj;
} ____cacheline_aligned_in_smp; } ____cacheline_aligned_in_smp;
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
void blk_mq_freeze_queue(struct request_queue *q); void blk_mq_freeze_queue(struct request_queue *q);
void blk_mq_free_queue(struct request_queue *q); void blk_mq_free_queue(struct request_queue *q);
int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr); int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
void blk_mq_wake_waiters(struct request_queue *q); void blk_mq_wake_waiters(struct request_queue *q);
bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *); bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list); void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx, bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
bool wait); bool wait);
struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *start);
/* /*
* Internal helpers for allocating/freeing the request map * Internal helpers for allocating/freeing the request map
@ -55,7 +56,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
*/ */
void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
bool at_head); bool at_head);
void blk_mq_request_bypass_insert(struct request *rq); void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
struct list_head *list); struct list_head *list);
@ -109,7 +110,7 @@ static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
struct blk_mq_alloc_data { struct blk_mq_alloc_data {
/* input parameter */ /* input parameter */
struct request_queue *q; struct request_queue *q;
unsigned int flags; blk_mq_req_flags_t flags;
unsigned int shallow_depth; unsigned int shallow_depth;
/* input & output parameter */ /* input & output parameter */
@ -138,4 +139,53 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part, void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
unsigned int inflight[2]); unsigned int inflight[2]);
static inline void blk_mq_put_dispatch_budget(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
if (q->mq_ops->put_budget)
q->mq_ops->put_budget(hctx);
}
static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
if (q->mq_ops->get_budget)
return q->mq_ops->get_budget(hctx);
return true;
}
static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
rq->tag = -1;
if (rq->rq_flags & RQF_MQ_INFLIGHT) {
rq->rq_flags &= ~RQF_MQ_INFLIGHT;
atomic_dec(&hctx->nr_active);
}
}
static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
if (rq->tag == -1 || rq->internal_tag == -1)
return;
__blk_mq_put_driver_tag(hctx, rq);
}
static inline void blk_mq_put_driver_tag(struct request *rq)
{
struct blk_mq_hw_ctx *hctx;
if (rq->tag == -1 || rq->internal_tag == -1)
return;
hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
__blk_mq_put_driver_tag(hctx, rq);
}
#endif #endif
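
The two budget helpers added in this hunk use an optional-callback pattern: when the driver's ops table has no `get_budget`, dispatch is always allowed. Below is a minimal userspace sketch of that pattern. All `toy_*` and `counted_*` names are hypothetical stand-ins, not kernel types, and the counter-based callbacks are only an assumption about what a driver might do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for blk_mq_ops / blk_mq_hw_ctx; not kernel types. */
struct toy_hctx;

struct toy_ops {
	bool (*get_budget)(struct toy_hctx *hctx);	/* optional, may be NULL */
	void (*put_budget)(struct toy_hctx *hctx);	/* optional, may be NULL */
};

struct toy_hctx {
	const struct toy_ops *ops;
	int budget;	/* remaining dispatch slots, used by the example callbacks */
};

/* A missing callback means "no budget limit": always succeed. */
static bool toy_get_dispatch_budget(struct toy_hctx *hctx)
{
	if (hctx->ops->get_budget)
		return hctx->ops->get_budget(hctx);
	return true;
}

/* A missing callback means there is nothing to give back. */
static void toy_put_dispatch_budget(struct toy_hctx *hctx)
{
	if (hctx->ops->put_budget)
		hctx->ops->put_budget(hctx);
}

/* Example callbacks (an assumption): enforce a simple counter. */
static bool counted_get(struct toy_hctx *hctx)
{
	if (hctx->budget == 0)
		return false;
	hctx->budget--;
	return true;
}

static void counted_put(struct toy_hctx *hctx)
{
	hctx->budget++;
}

static const struct toy_ops toy_counted_ops = { counted_get, counted_put };
static const struct toy_ops toy_no_ops = { NULL, NULL };
```

A caller would pair each successful `toy_get_dispatch_budget()` with one `toy_put_dispatch_budget()` when the request is not dispatched after all.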


@ -157,7 +157,7 @@ EXPORT_SYMBOL(blk_set_stacking_limits);
* Caveat: * Caveat:
* The driver that does this *must* be able to deal appropriately * The driver that does this *must* be able to deal appropriately
* with buffers in "highmemory". This can be accomplished by either calling * with buffers in "highmemory". This can be accomplished by either calling
* __bio_kmap_atomic() to get a temporary kernel mapping, or by calling * kmap_atomic() to get a temporary kernel mapping, or by calling
* blk_queue_bounce() to create a buffer in normal memory. * blk_queue_bounce() to create a buffer in normal memory.
**/ **/
void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)


@ -11,8 +11,6 @@
#include "blk-mq.h" #include "blk-mq.h"
#include "blk.h" #include "blk.h"
#define BLK_RQ_STAT_BATCH 64
struct blk_queue_stats { struct blk_queue_stats {
struct list_head callbacks; struct list_head callbacks;
spinlock_t lock; spinlock_t lock;
@ -23,45 +21,21 @@ static void blk_stat_init(struct blk_rq_stat *stat)
{ {
stat->min = -1ULL; stat->min = -1ULL;
stat->max = stat->nr_samples = stat->mean = 0; stat->max = stat->nr_samples = stat->mean = 0;
stat->batch = stat->nr_batch = 0; stat->batch = 0;
}
static void blk_stat_flush_batch(struct blk_rq_stat *stat)
{
const s32 nr_batch = READ_ONCE(stat->nr_batch);
const s32 nr_samples = READ_ONCE(stat->nr_samples);
if (!nr_batch)
return;
if (!nr_samples)
stat->mean = div64_s64(stat->batch, nr_batch);
else {
stat->mean = div64_s64((stat->mean * nr_samples) +
stat->batch,
nr_batch + nr_samples);
}
stat->nr_samples += nr_batch;
stat->nr_batch = stat->batch = 0;
} }
/* src is a per-cpu stat, mean isn't initialized */
static void blk_stat_sum(struct blk_rq_stat *dst, struct blk_rq_stat *src) static void blk_stat_sum(struct blk_rq_stat *dst, struct blk_rq_stat *src)
{ {
blk_stat_flush_batch(src);
if (!src->nr_samples) if (!src->nr_samples)
return; return;
dst->min = min(dst->min, src->min); dst->min = min(dst->min, src->min);
dst->max = max(dst->max, src->max); dst->max = max(dst->max, src->max);
if (!dst->nr_samples) dst->mean = div_u64(src->batch + dst->mean * dst->nr_samples,
dst->mean = src->mean; dst->nr_samples + src->nr_samples);
else {
dst->mean = div64_s64((src->mean * src->nr_samples) +
(dst->mean * dst->nr_samples),
dst->nr_samples + src->nr_samples);
}
dst->nr_samples += src->nr_samples; dst->nr_samples += src->nr_samples;
} }
@ -69,13 +43,8 @@ static void __blk_stat_add(struct blk_rq_stat *stat, u64 value)
{ {
stat->min = min(stat->min, value); stat->min = min(stat->min, value);
stat->max = max(stat->max, value); stat->max = max(stat->max, value);
if (stat->batch + value < stat->batch ||
stat->nr_batch + 1 == BLK_RQ_STAT_BATCH)
blk_stat_flush_batch(stat);
stat->batch += value; stat->batch += value;
stat->nr_batch++; stat->nr_samples++;
} }
void blk_stat_add(struct request *rq) void blk_stat_add(struct request *rq)
@ -84,7 +53,7 @@ void blk_stat_add(struct request *rq)
struct blk_stat_callback *cb; struct blk_stat_callback *cb;
struct blk_rq_stat *stat; struct blk_rq_stat *stat;
int bucket; int bucket;
s64 now, value; u64 now, value;
now = __blk_stat_time(ktime_to_ns(ktime_get())); now = __blk_stat_time(ktime_to_ns(ktime_get()));
if (now < blk_stat_time(&rq->issue_stat)) if (now < blk_stat_time(&rq->issue_stat))
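
The blk-stat hunks above drop the intermediate batching. Each per-CPU stat now accumulates a plain sum in `batch` plus a sample count, and the mean is computed only when per-CPU stats are folded into the destination. The sketch below is a userspace approximation of that arithmetic, with hypothetical `toy_*` names and plain 64-bit division standing in for `div_u64()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace analogue of struct blk_rq_stat. */
struct toy_stat {
	uint64_t min, max, mean, batch;
	uint32_t nr_samples;
};

static void toy_stat_init(struct toy_stat *s)
{
	s->min = UINT64_MAX;	/* same idea as -1ULL in the hunk */
	s->max = 0;
	s->mean = 0;
	s->batch = 0;
	s->nr_samples = 0;
}

/* Per-CPU side: track min/max, the running sum and the sample count. */
static void toy_stat_add(struct toy_stat *s, uint64_t value)
{
	if (value < s->min)
		s->min = value;
	if (value > s->max)
		s->max = value;
	s->batch += value;
	s->nr_samples++;
}

/* Fold a per-CPU stat (src->mean is unused) into the aggregate dst. */
static void toy_stat_sum(struct toy_stat *dst, const struct toy_stat *src)
{
	if (!src->nr_samples)
		return;

	if (src->min < dst->min)
		dst->min = src->min;
	if (src->max > dst->max)
		dst->max = src->max;

	/* Weighted mean: old mean times old count, plus the new raw sum. */
	dst->mean = (src->batch + dst->mean * dst->nr_samples) /
		    (dst->nr_samples + src->nr_samples);
	dst->nr_samples += src->nr_samples;
}
```

For example, folding a CPU with samples 10 and 20 and then a CPU with a single sample 60 gives means of 15 and then (60 + 15 * 2) / 3 = 30. Integer division truncates, as it does in the kernel helper.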


@ -2113,8 +2113,12 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio) static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
{ {
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
if (bio->bi_css) if (bio->bi_css) {
if (bio->bi_cg_private)
blkg_put(tg_to_blkg(bio->bi_cg_private));
bio->bi_cg_private = tg; bio->bi_cg_private = tg;
blkg_get(tg_to_blkg(tg));
}
blk_stat_set_issue(&bio->bi_issue_stat, bio_sectors(bio)); blk_stat_set_issue(&bio->bi_issue_stat, bio_sectors(bio));
#endif #endif
} }
@ -2284,8 +2288,10 @@ void blk_throtl_bio_endio(struct bio *bio)
start_time = blk_stat_time(&bio->bi_issue_stat) >> 10; start_time = blk_stat_time(&bio->bi_issue_stat) >> 10;
finish_time = __blk_stat_time(finish_time_ns) >> 10; finish_time = __blk_stat_time(finish_time_ns) >> 10;
if (!start_time || finish_time <= start_time) if (!start_time || finish_time <= start_time) {
blkg_put(tg_to_blkg(tg));
return; return;
}
lat = finish_time - start_time; lat = finish_time - start_time;
/* this is only for bio based driver */ /* this is only for bio based driver */
@ -2315,6 +2321,8 @@ void blk_throtl_bio_endio(struct bio *bio)
tg->bio_cnt /= 2; tg->bio_cnt /= 2;
tg->bad_bio_cnt /= 2; tg->bad_bio_cnt /= 2;
} }
blkg_put(tg_to_blkg(tg));
} }
#endif #endif
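
The blk-throttle hunks above pin the throttle group for as long as a bio points at it: association takes a reference (dropping the one held for any previously associated group), and every exit path of the end-io handler drops it. The sketch below models only that get/put discipline. The `toy_*` types and the plain integer refcount are assumptions for illustration and do not reflect how `blkg_get()` / `blkg_put()` are implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted group and bio; not the kernel structures. */
struct toy_grp {
	int refcnt;
};

struct toy_bio {
	struct toy_grp *cg_private;
	unsigned long start_time;	/* 0 means "no issue time recorded" */
};

static void toy_get(struct toy_grp *g) { g->refcnt++; }
static void toy_put(struct toy_grp *g) { g->refcnt--; }

/* Associate a bio with a group, releasing any earlier association. */
static void toy_assoc_bio(struct toy_bio *bio, struct toy_grp *tg)
{
	if (bio->cg_private)
		toy_put(bio->cg_private);
	bio->cg_private = tg;
	toy_get(tg);
}

/* Completion: the reference is dropped on the early exit and at the end. */
static void toy_bio_endio(struct toy_bio *bio, unsigned long finish_time)
{
	struct toy_grp *tg = bio->cg_private;

	if (!tg)
		return;

	if (!bio->start_time || finish_time <= bio->start_time) {
		toy_put(tg);	/* early return must not leak the reference */
		return;
	}

	/* ... latency accounting would go here ... */

	toy_put(tg);
}
```

The design point is symmetry: one get per association, and exactly one put on each path out of the completion handler.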


@ -134,8 +134,6 @@ void blk_timeout_work(struct work_struct *work)
struct request *rq, *tmp; struct request *rq, *tmp;
int next_set = 0; int next_set = 0;
if (blk_queue_enter(q, true))
return;
spin_lock_irqsave(q->queue_lock, flags); spin_lock_irqsave(q->queue_lock, flags);
list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list) list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
@ -145,7 +143,6 @@ void blk_timeout_work(struct work_struct *work)
mod_timer(&q->timeout, round_jiffies_up(next)); mod_timer(&q->timeout, round_jiffies_up(next));
spin_unlock_irqrestore(q->queue_lock, flags); spin_unlock_irqrestore(q->queue_lock, flags);
blk_queue_exit(q);
} }
/** /**
@ -211,7 +208,7 @@ void blk_add_timer(struct request *req)
if (!req->timeout) if (!req->timeout)
req->timeout = q->rq_timeout; req->timeout = q->rq_timeout;
req->deadline = jiffies + req->timeout; WRITE_ONCE(req->deadline, jiffies + req->timeout);
/* /*
* Only the non-mq case needs to add the request to a protected list. * Only the non-mq case needs to add the request to a protected list.


@ -654,7 +654,7 @@ void wbt_set_write_cache(struct rq_wb *rwb, bool write_cache_on)
} }
/* /*
* Disable wbt, if enabled by default. Only called from CFQ. * Disable wbt, if enabled by default.
*/ */
void wbt_disable_default(struct request_queue *q) void wbt_disable_default(struct request_queue *q)
{ {


@ -123,8 +123,15 @@ void blk_account_io_done(struct request *req);
* Internal atomic flags for request handling * Internal atomic flags for request handling
*/ */
enum rq_atomic_flags { enum rq_atomic_flags {
/*
* Keep these two bits first - not because we depend on the
* value of them, but we do depend on them being in the same
* byte of storage to ensure ordering on writes. Keeping them
* first will achieve that nicely.
*/
REQ_ATOM_COMPLETE = 0, REQ_ATOM_COMPLETE = 0,
REQ_ATOM_STARTED, REQ_ATOM_STARTED,
REQ_ATOM_POLL_SLEPT, REQ_ATOM_POLL_SLEPT,
}; };
@ -149,45 +156,6 @@ static inline void blk_clear_rq_complete(struct request *rq)
void blk_insert_flush(struct request *rq); void blk_insert_flush(struct request *rq);
static inline struct request *__elv_next_request(struct request_queue *q)
{
struct request *rq;
struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
WARN_ON_ONCE(q->mq_ops);
while (1) {
if (!list_empty(&q->queue_head)) {
rq = list_entry_rq(q->queue_head.next);
return rq;
}
/*
* Flush request is running and flush request isn't queueable
* in the drive, we can hold the queue till flush request is
* finished. Even we don't do this, driver can't dispatch next
* requests and will requeue them. And this can improve
* throughput too. For example, we have request flush1, write1,
* flush 2. flush1 is dispatched, then queue is hold, write1
* isn't inserted to queue. After flush1 is finished, flush2
* will be dispatched. Since disk cache is already clean,
* flush2 will be finished very soon, so looks like flush2 is
* folded to flush1.
* Since the queue is hold, a flag is set to indicate the queue
* should be restarted later. Please see flush_end_io() for
* details.
*/
if (fq->flush_pending_idx != fq->flush_running_idx &&
!queue_flush_queueable(q)) {
fq->flush_queue_delayed = 1;
return NULL;
}
if (unlikely(blk_queue_bypass(q)) ||
!q->elevator->type->ops.sq.elevator_dispatch_fn(q, 0))
return NULL;
}
}
static inline void elv_activate_rq(struct request_queue *q, struct request *rq) static inline void elv_activate_rq(struct request_queue *q, struct request *rq)
{ {
struct elevator_queue *e = q->elevator; struct elevator_queue *e = q->elevator;


@ -137,7 +137,7 @@ static inline struct hlist_head *bsg_dev_idx_hash(int index)
static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq, static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
struct sg_io_v4 *hdr, struct bsg_device *bd, struct sg_io_v4 *hdr, struct bsg_device *bd,
fmode_t has_write_perm) fmode_t mode)
{ {
struct scsi_request *req = scsi_req(rq); struct scsi_request *req = scsi_req(rq);
@ -152,7 +152,7 @@ static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
return -EFAULT; return -EFAULT;
if (hdr->subprotocol == BSG_SUB_PROTOCOL_SCSI_CMD) { if (hdr->subprotocol == BSG_SUB_PROTOCOL_SCSI_CMD) {
if (blk_verify_command(req->cmd, has_write_perm)) if (blk_verify_command(req->cmd, mode))
return -EPERM; return -EPERM;
} else if (!capable(CAP_SYS_RAWIO)) } else if (!capable(CAP_SYS_RAWIO))
return -EPERM; return -EPERM;
@ -206,7 +206,7 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
* map sg_io_v4 to a request. * map sg_io_v4 to a request.
*/ */
static struct request * static struct request *
bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t has_write_perm) bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t mode)
{ {
struct request_queue *q = bd->queue; struct request_queue *q = bd->queue;
struct request *rq, *next_rq = NULL; struct request *rq, *next_rq = NULL;
@ -237,7 +237,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t has_write_perm)
if (IS_ERR(rq)) if (IS_ERR(rq))
return rq; return rq;
ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, has_write_perm); ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, mode);
if (ret) if (ret)
goto out; goto out;
@ -587,8 +587,7 @@ bsg_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
} }
static int __bsg_write(struct bsg_device *bd, const char __user *buf, static int __bsg_write(struct bsg_device *bd, const char __user *buf,
size_t count, ssize_t *bytes_written, size_t count, ssize_t *bytes_written, fmode_t mode)
fmode_t has_write_perm)
{ {
struct bsg_command *bc; struct bsg_command *bc;
struct request *rq; struct request *rq;
@ -619,7 +618,7 @@ static int __bsg_write(struct bsg_device *bd, const char __user *buf,
/* /*
* get a request, fill in the blanks, and add to request queue * get a request, fill in the blanks, and add to request queue
*/ */
rq = bsg_map_hdr(bd, &bc->hdr, has_write_perm); rq = bsg_map_hdr(bd, &bc->hdr, mode);
if (IS_ERR(rq)) { if (IS_ERR(rq)) {
ret = PTR_ERR(rq); ret = PTR_ERR(rq);
rq = NULL; rq = NULL;
@ -655,8 +654,7 @@ bsg_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
bsg_set_block(bd, file); bsg_set_block(bd, file);
bytes_written = 0; bytes_written = 0;
ret = __bsg_write(bd, buf, count, &bytes_written, ret = __bsg_write(bd, buf, count, &bytes_written, file->f_mode);
file->f_mode & FMODE_WRITE);
*ppos = bytes_written; *ppos = bytes_written;
@ -915,7 +913,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
if (copy_from_user(&hdr, uarg, sizeof(hdr))) if (copy_from_user(&hdr, uarg, sizeof(hdr)))
return -EFAULT; return -EFAULT;
rq = bsg_map_hdr(bd, &hdr, file->f_mode & FMODE_WRITE); rq = bsg_map_hdr(bd, &hdr, file->f_mode);
if (IS_ERR(rq)) if (IS_ERR(rq))
return PTR_ERR(rq); return PTR_ERR(rq);


@ -83,12 +83,25 @@ bool elv_bio_merge_ok(struct request *rq, struct bio *bio)
} }
EXPORT_SYMBOL(elv_bio_merge_ok); EXPORT_SYMBOL(elv_bio_merge_ok);
static struct elevator_type *elevator_find(const char *name) static bool elevator_match(const struct elevator_type *e, const char *name)
{
if (!strcmp(e->elevator_name, name))
return true;
if (e->elevator_alias && !strcmp(e->elevator_alias, name))
return true;
return false;
}
/*
* Return scheduler with name 'name' and with matching 'mq capability
*/
static struct elevator_type *elevator_find(const char *name, bool mq)
{ {
struct elevator_type *e; struct elevator_type *e;
list_for_each_entry(e, &elv_list, list) { list_for_each_entry(e, &elv_list, list) {
if (!strcmp(e->elevator_name, name)) if (elevator_match(e, name) && (mq == e->uses_mq))
return e; return e;
} }
@ -100,25 +113,25 @@ static void elevator_put(struct elevator_type *e)
module_put(e->elevator_owner); module_put(e->elevator_owner);
} }
static struct elevator_type *elevator_get(const char *name, bool try_loading) static struct elevator_type *elevator_get(struct request_queue *q,
const char *name, bool try_loading)
{ {
struct elevator_type *e; struct elevator_type *e;
spin_lock(&elv_list_lock); spin_lock(&elv_list_lock);
e = elevator_find(name); e = elevator_find(name, q->mq_ops != NULL);
if (!e && try_loading) { if (!e && try_loading) {
spin_unlock(&elv_list_lock); spin_unlock(&elv_list_lock);
request_module("%s-iosched", name); request_module("%s-iosched", name);
spin_lock(&elv_list_lock); spin_lock(&elv_list_lock);
e = elevator_find(name); e = elevator_find(name, q->mq_ops != NULL);
} }
if (e && !try_module_get(e->elevator_owner)) if (e && !try_module_get(e->elevator_owner))
e = NULL; e = NULL;
spin_unlock(&elv_list_lock); spin_unlock(&elv_list_lock);
return e; return e;
} }
@ -144,8 +157,12 @@ void __init load_default_elevator_module(void)
if (!chosen_elevator[0]) if (!chosen_elevator[0])
return; return;
/*
* Boot parameter is deprecated, we haven't supported that for MQ.
* Only look for non-mq schedulers from here.
*/
spin_lock(&elv_list_lock); spin_lock(&elv_list_lock);
e = elevator_find(chosen_elevator); e = elevator_find(chosen_elevator, false);
spin_unlock(&elv_list_lock); spin_unlock(&elv_list_lock);
if (!e) if (!e)
@ -202,7 +219,7 @@ int elevator_init(struct request_queue *q, char *name)
q->boundary_rq = NULL; q->boundary_rq = NULL;
if (name) { if (name) {
e = elevator_get(name, true); e = elevator_get(q, name, true);
if (!e) if (!e)
return -EINVAL; return -EINVAL;
} }
@ -214,7 +231,7 @@ int elevator_init(struct request_queue *q, char *name)
* allowed from async. * allowed from async.
*/ */
if (!e && !q->mq_ops && *chosen_elevator) { if (!e && !q->mq_ops && *chosen_elevator) {
e = elevator_get(chosen_elevator, false); e = elevator_get(q, chosen_elevator, false);
if (!e) if (!e)
printk(KERN_ERR "I/O scheduler %s not found\n", printk(KERN_ERR "I/O scheduler %s not found\n",
chosen_elevator); chosen_elevator);
@ -229,17 +246,17 @@ int elevator_init(struct request_queue *q, char *name)
*/ */
if (q->mq_ops) { if (q->mq_ops) {
if (q->nr_hw_queues == 1) if (q->nr_hw_queues == 1)
e = elevator_get("mq-deadline", false); e = elevator_get(q, "mq-deadline", false);
if (!e) if (!e)
return 0; return 0;
} else } else
e = elevator_get(CONFIG_DEFAULT_IOSCHED, false); e = elevator_get(q, CONFIG_DEFAULT_IOSCHED, false);
if (!e) { if (!e) {
printk(KERN_ERR printk(KERN_ERR
"Default I/O scheduler not found. " \ "Default I/O scheduler not found. " \
"Using noop.\n"); "Using noop.\n");
e = elevator_get("noop", false); e = elevator_get(q, "noop", false);
} }
} }
@ -905,7 +922,7 @@ int elv_register(struct elevator_type *e)
/* register, don't allow duplicate names */ /* register, don't allow duplicate names */
spin_lock(&elv_list_lock); spin_lock(&elv_list_lock);
if (elevator_find(e->elevator_name)) { if (elevator_find(e->elevator_name, e->uses_mq)) {
spin_unlock(&elv_list_lock); spin_unlock(&elv_list_lock);
if (e->icq_cache) if (e->icq_cache)
kmem_cache_destroy(e->icq_cache); kmem_cache_destroy(e->icq_cache);
@ -915,9 +932,9 @@ int elv_register(struct elevator_type *e)
spin_unlock(&elv_list_lock); spin_unlock(&elv_list_lock);
/* print pretty message */ /* print pretty message */
if (!strcmp(e->elevator_name, chosen_elevator) || if (elevator_match(e, chosen_elevator) ||
(!*chosen_elevator && (!*chosen_elevator &&
!strcmp(e->elevator_name, CONFIG_DEFAULT_IOSCHED))) elevator_match(e, CONFIG_DEFAULT_IOSCHED)))
def = " (default)"; def = " (default)";
printk(KERN_INFO "io scheduler %s registered%s\n", e->elevator_name, printk(KERN_INFO "io scheduler %s registered%s\n", e->elevator_name,
@ -1066,25 +1083,15 @@ static int __elevator_change(struct request_queue *q, const char *name)
return elevator_switch(q, NULL); return elevator_switch(q, NULL);
strlcpy(elevator_name, name, sizeof(elevator_name)); strlcpy(elevator_name, name, sizeof(elevator_name));
e = elevator_get(strstrip(elevator_name), true); e = elevator_get(q, strstrip(elevator_name), true);
if (!e) if (!e)
return -EINVAL; return -EINVAL;
if (q->elevator && if (q->elevator && elevator_match(q->elevator->type, elevator_name)) {
!strcmp(elevator_name, q->elevator->type->elevator_name)) {
elevator_put(e); elevator_put(e);
return 0; return 0;
} }
if (!e->uses_mq && q->mq_ops) {
elevator_put(e);
return -EINVAL;
}
if (e->uses_mq && !q->mq_ops) {
elevator_put(e);
return -EINVAL;
}
return elevator_switch(q, e); return elevator_switch(q, e);
} }
@ -1116,9 +1123,10 @@ ssize_t elv_iosched_show(struct request_queue *q, char *name)
struct elevator_queue *e = q->elevator; struct elevator_queue *e = q->elevator;
struct elevator_type *elv = NULL; struct elevator_type *elv = NULL;
struct elevator_type *__e; struct elevator_type *__e;
bool uses_mq = q->mq_ops != NULL;
int len = 0; int len = 0;
if (!blk_queue_stackable(q)) if (!queue_is_rq_based(q))
return sprintf(name, "none\n"); return sprintf(name, "none\n");
if (!q->elevator) if (!q->elevator)
@ -1128,7 +1136,8 @@ ssize_t elv_iosched_show(struct request_queue *q, char *name)
spin_lock(&elv_list_lock); spin_lock(&elv_list_lock);
list_for_each_entry(__e, &elv_list, list) { list_for_each_entry(__e, &elv_list, list) {
if (elv && !strcmp(elv->elevator_name, __e->elevator_name)) { if (elv && elevator_match(elv, __e->elevator_name) &&
(__e->uses_mq == uses_mq)) {
len += sprintf(name+len, "[%s] ", elv->elevator_name); len += sprintf(name+len, "[%s] ", elv->elevator_name);
continue; continue;
} }
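
The elevator changes above turn scheduler lookup into a {mq, name} tuple match, where a scheduler may also answer to an alias. The sketch below approximates that lookup in userspace. The table and the `toy_*` names are hypothetical, and only the `deadline` alias for `mq-deadline` is taken from this merge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down analogue of struct elevator_type. */
struct toy_elevator {
	const char *name;
	const char *alias;	/* may be NULL */
	bool uses_mq;
};

/* A scheduler matches on its primary name or, if set, its alias. */
static bool toy_elevator_match(const struct toy_elevator *e, const char *name)
{
	if (!strcmp(e->name, name))
		return true;
	if (e->alias && !strcmp(e->alias, name))
		return true;
	return false;
}

/* Return the entry with a matching name and matching mq capability. */
static const struct toy_elevator *
toy_elevator_find(const struct toy_elevator *tbl, size_t n,
		  const char *name, bool mq)
{
	for (size_t i = 0; i < n; i++) {
		if (toy_elevator_match(&tbl[i], name) && tbl[i].uses_mq == mq)
			return &tbl[i];
	}
	return NULL;
}

/* Illustrative registry (an assumption, not the real elv_list). */
static const struct toy_elevator toy_table[] = {
	{ "deadline",    NULL,       false },
	{ "mq-deadline", "deadline", true  },
	{ "noop",        NULL,       false },
};
```

With this table, asking for `deadline` on an mq queue resolves to `mq-deadline`, while the same name on a non-mq queue resolves to the legacy `deadline` entry. Because the lookup itself filters on mq capability, the explicit `uses_mq` mismatch checks in `__elevator_change()` become redundant, which is why the diff removes them.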


@ -588,6 +588,11 @@ static void register_disk(struct device *parent, struct gendisk *disk)
disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj); disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj); disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);
if (disk->flags & GENHD_FL_HIDDEN) {
dev_set_uevent_suppress(ddev, 0);
return;
}
/* No minors to use for partitions */ /* No minors to use for partitions */
if (!disk_part_scan_enabled(disk)) if (!disk_part_scan_enabled(disk))
goto exit; goto exit;
@ -616,6 +621,11 @@ exit:
while ((part = disk_part_iter_next(&piter))) while ((part = disk_part_iter_next(&piter)))
kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD); kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
disk_part_iter_exit(&piter); disk_part_iter_exit(&piter);
err = sysfs_create_link(&ddev->kobj,
&disk->queue->backing_dev_info->dev->kobj,
"bdi");
WARN_ON(err);
} }
/** /**
@ -630,7 +640,6 @@ exit:
*/ */
void device_add_disk(struct device *parent, struct gendisk *disk) void device_add_disk(struct device *parent, struct gendisk *disk)
{ {
struct backing_dev_info *bdi;
dev_t devt; dev_t devt;
int retval; int retval;
@ -639,7 +648,8 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
* parameters make sense. * parameters make sense.
*/ */
WARN_ON(disk->minors && !(disk->major || disk->first_minor)); WARN_ON(disk->minors && !(disk->major || disk->first_minor));
WARN_ON(!disk->minors && !(disk->flags & GENHD_FL_EXT_DEVT)); WARN_ON(!disk->minors &&
!(disk->flags & (GENHD_FL_EXT_DEVT | GENHD_FL_HIDDEN)));
disk->flags |= GENHD_FL_UP; disk->flags |= GENHD_FL_UP;
@ -648,22 +658,26 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
WARN_ON(1); WARN_ON(1);
return; return;
} }
disk_to_dev(disk)->devt = devt;
/* ->major and ->first_minor aren't supposed to be
* dereferenced from here on, but set them just in case.
*/
disk->major = MAJOR(devt); disk->major = MAJOR(devt);
disk->first_minor = MINOR(devt); disk->first_minor = MINOR(devt);
disk_alloc_events(disk); disk_alloc_events(disk);
/* Register BDI before referencing it from bdev */ if (disk->flags & GENHD_FL_HIDDEN) {
bdi = disk->queue->backing_dev_info; /*
bdi_register_owner(bdi, disk_to_dev(disk)); * Don't let hidden disks show up in /proc/partitions,
* and don't bother scanning for partitions either.
blk_register_region(disk_devt(disk), disk->minors, NULL, */
exact_match, exact_lock, disk); disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
disk->flags |= GENHD_FL_NO_PART_SCAN;
} else {
/* Register BDI before referencing it from bdev */
disk_to_dev(disk)->devt = devt;
bdi_register_owner(disk->queue->backing_dev_info,
disk_to_dev(disk));
blk_register_region(disk_devt(disk), disk->minors, NULL,
exact_match, exact_lock, disk);
}
register_disk(parent, disk); register_disk(parent, disk);
blk_register_queue(disk); blk_register_queue(disk);
@ -673,10 +687,6 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
*/ */
WARN_ON_ONCE(!blk_get_queue(disk->queue)); WARN_ON_ONCE(!blk_get_queue(disk->queue));
retval = sysfs_create_link(&disk_to_dev(disk)->kobj, &bdi->dev->kobj,
"bdi");
WARN_ON(retval);
disk_add_events(disk); disk_add_events(disk);
blk_integrity_add(disk); blk_integrity_add(disk);
} }
@ -705,7 +715,8 @@ void del_gendisk(struct gendisk *disk)
set_capacity(disk, 0); set_capacity(disk, 0);
disk->flags &= ~GENHD_FL_UP; disk->flags &= ~GENHD_FL_UP;
sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi"); if (!(disk->flags & GENHD_FL_HIDDEN))
sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
if (disk->queue) { if (disk->queue) {
/* /*
* Unregister bdi before releasing device numbers (as they can * Unregister bdi before releasing device numbers (as they can
@ -716,13 +727,15 @@ void del_gendisk(struct gendisk *disk)
} else { } else {
WARN_ON(1); WARN_ON(1);
} }
blk_unregister_region(disk_devt(disk), disk->minors);
part_stat_set_all(&disk->part0, 0); if (!(disk->flags & GENHD_FL_HIDDEN))
disk->part0.stamp = 0; blk_unregister_region(disk_devt(disk), disk->minors);
kobject_put(disk->part0.holder_dir); kobject_put(disk->part0.holder_dir);
kobject_put(disk->slave_dir); kobject_put(disk->slave_dir);
part_stat_set_all(&disk->part0, 0);
disk->part0.stamp = 0;
if (!sysfs_deprecated) if (!sysfs_deprecated)
sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk))); sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
pm_runtime_set_memalloc_noio(disk_to_dev(disk), false); pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
@ -785,6 +798,10 @@ struct gendisk *get_gendisk(dev_t devt, int *partno)
spin_unlock_bh(&ext_devt_lock); spin_unlock_bh(&ext_devt_lock);
} }
if (disk && unlikely(disk->flags & GENHD_FL_HIDDEN)) {
put_disk(disk);
disk = NULL;
}
return disk; return disk;
} }
EXPORT_SYMBOL(get_gendisk); EXPORT_SYMBOL(get_gendisk);
@ -1028,6 +1045,15 @@ static ssize_t disk_removable_show(struct device *dev,
(disk->flags & GENHD_FL_REMOVABLE ? 1 : 0)); (disk->flags & GENHD_FL_REMOVABLE ? 1 : 0));
} }
static ssize_t disk_hidden_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct gendisk *disk = dev_to_disk(dev);
return sprintf(buf, "%d\n",
(disk->flags & GENHD_FL_HIDDEN ? 1 : 0));
}
static ssize_t disk_ro_show(struct device *dev, static ssize_t disk_ro_show(struct device *dev,
struct device_attribute *attr, char *buf) struct device_attribute *attr, char *buf)
{ {
@ -1065,6 +1091,7 @@ static ssize_t disk_discard_alignment_show(struct device *dev,
static DEVICE_ATTR(range, S_IRUGO, disk_range_show, NULL); static DEVICE_ATTR(range, S_IRUGO, disk_range_show, NULL);
static DEVICE_ATTR(ext_range, S_IRUGO, disk_ext_range_show, NULL); static DEVICE_ATTR(ext_range, S_IRUGO, disk_ext_range_show, NULL);
static DEVICE_ATTR(removable, S_IRUGO, disk_removable_show, NULL); static DEVICE_ATTR(removable, S_IRUGO, disk_removable_show, NULL);
static DEVICE_ATTR(hidden, S_IRUGO, disk_hidden_show, NULL);
static DEVICE_ATTR(ro, S_IRUGO, disk_ro_show, NULL); static DEVICE_ATTR(ro, S_IRUGO, disk_ro_show, NULL);
static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL); static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL);
static DEVICE_ATTR(alignment_offset, S_IRUGO, disk_alignment_offset_show, NULL); static DEVICE_ATTR(alignment_offset, S_IRUGO, disk_alignment_offset_show, NULL);
@ -1089,6 +1116,7 @@ static struct attribute *disk_attrs[] = {
&dev_attr_range.attr, &dev_attr_range.attr,
&dev_attr_ext_range.attr, &dev_attr_ext_range.attr,
&dev_attr_removable.attr, &dev_attr_removable.attr,
&dev_attr_hidden.attr,
&dev_attr_ro.attr, &dev_attr_ro.attr,
&dev_attr_size.attr, &dev_attr_size.attr,
&dev_attr_alignment_offset.attr, &dev_attr_alignment_offset.attr,


@ -202,10 +202,16 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
{ {
uint64_t range[2]; uint64_t range[2];
uint64_t start, len; uint64_t start, len;
struct request_queue *q = bdev_get_queue(bdev);
struct address_space *mapping = bdev->bd_inode->i_mapping;
if (!(mode & FMODE_WRITE)) if (!(mode & FMODE_WRITE))
return -EBADF; return -EBADF;
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
if (copy_from_user(range, (void __user *)arg, sizeof(range))) if (copy_from_user(range, (void __user *)arg, sizeof(range)))
return -EFAULT; return -EFAULT;
@ -216,12 +222,12 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
return -EINVAL; return -EINVAL;
if (len & 511) if (len & 511)
return -EINVAL; return -EINVAL;
start >>= 9;
len >>= 9;
if (start + len > (i_size_read(bdev->bd_inode) >> 9)) if (start + len > i_size_read(bdev->bd_inode))
return -EINVAL; return -EINVAL;
return blkdev_issue_discard(bdev, start, len, GFP_KERNEL, flags); truncate_inode_pages_range(mapping, start, start + len);
return blkdev_issue_discard(bdev, start >> 9, len >> 9,
GFP_KERNEL, flags);
} }
static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode, static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
@ -437,11 +443,12 @@ static int blkdev_roset(struct block_device *bdev, fmode_t mode,
{ {
int ret, n; int ret, n;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg); ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg);
if (!is_unrecognized_ioctl(ret)) if (!is_unrecognized_ioctl(ret))
return ret; return ret;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
if (get_user(n, (int __user *)arg)) if (get_user(n, (int __user *)arg))
return -EFAULT; return -EFAULT;
set_device_ro(bdev, n); set_device_ro(bdev, n);
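
In the discard hunk above, `start` and `len` now stay in bytes through validation, so the page cache range can be truncated with byte offsets, and they are shifted to 512-byte sectors only when the discard is issued. The sketch below shows just the checks visible in the hunk plus the conversion. `toy_check_discard` and its parameters are hypothetical, and like the hunk it makes no attempt to guard `start + len` against 64-bit wraparound.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Validate a byte range against a device of dev_bytes bytes and convert
 * it to 512-byte sectors. Returns 0 on success or -EINVAL.
 */
static int toy_check_discard(uint64_t start, uint64_t len, uint64_t dev_bytes,
			     uint64_t *sector, uint64_t *nr_sects)
{
	if (start & 511)	/* start must be sector aligned */
		return -EINVAL;
	if (len & 511)		/* length must be a whole number of sectors */
		return -EINVAL;
	if (start + len > dev_bytes)	/* compared in bytes, not sectors */
		return -EINVAL;

	*sector = start >> 9;
	*nr_sects = len >> 9;
	return 0;
}
```

For a 1 MiB device, the range (4096, 8192) maps to sector 8 for 16 sectors, and an unaligned start such as 100 is rejected.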


@ -541,9 +541,17 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
/* /*
* Try again in case a token was freed before we got on the wait * Try again in case a token was freed before we got on the wait
* queue. * queue. The waker may have already removed the entry from the
* wait queue, but list_del_init() is okay with that.
*/ */
nr = __sbitmap_queue_get(domain_tokens); nr = __sbitmap_queue_get(domain_tokens);
if (nr >= 0) {
unsigned long flags;
spin_lock_irqsave(&ws->wait.lock, flags);
list_del_init(&wait->entry);
spin_unlock_irqrestore(&ws->wait.lock, flags);
}
} }
return nr; return nr;
} }
@ -641,7 +649,7 @@ static bool kyber_has_work(struct blk_mq_hw_ctx *hctx)
if (!list_empty_careful(&khd->rqs[i])) if (!list_empty_careful(&khd->rqs[i]))
return true; return true;
} }
return false; return sbitmap_any_bit_set(&hctx->ctx_map);
} }
#define KYBER_LAT_SHOW_STORE(op) \ #define KYBER_LAT_SHOW_STORE(op) \


@ -657,6 +657,7 @@ static struct elevator_type mq_deadline = {
#endif #endif
.elevator_attrs = deadline_attrs, .elevator_attrs = deadline_attrs,
.elevator_name = "mq-deadline", .elevator_name = "mq-deadline",
.elevator_alias = "deadline",
.elevator_owner = THIS_MODULE, .elevator_owner = THIS_MODULE,
}; };
MODULE_ALIAS("mq-deadline-iosched"); MODULE_ALIAS("mq-deadline-iosched");


@ -207,7 +207,7 @@ static void blk_set_cmd_filter_defaults(struct blk_cmd_filter *filter)
__set_bit(GPCMD_SET_READ_AHEAD, filter->write_ok); __set_bit(GPCMD_SET_READ_AHEAD, filter->write_ok);
} }
int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm) int blk_verify_command(unsigned char *cmd, fmode_t mode)
{ {
struct blk_cmd_filter *filter = &blk_default_cmd_filter; struct blk_cmd_filter *filter = &blk_default_cmd_filter;
@ -220,7 +220,7 @@ int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm)
return 0; return 0;
/* Write-safe commands require a writable open */ /* Write-safe commands require a writable open */
if (test_bit(cmd[0], filter->write_ok) && has_write_perm) if (test_bit(cmd[0], filter->write_ok) && (mode & FMODE_WRITE))
return 0; return 0;
return -EPERM; return -EPERM;
@ -234,7 +234,7 @@ static int blk_fill_sghdr_rq(struct request_queue *q, struct request *rq,
if (copy_from_user(req->cmd, hdr->cmdp, hdr->cmd_len)) if (copy_from_user(req->cmd, hdr->cmdp, hdr->cmd_len))
return -EFAULT; return -EFAULT;
if (blk_verify_command(req->cmd, mode & FMODE_WRITE)) if (blk_verify_command(req->cmd, mode))
return -EPERM; return -EPERM;
/* /*
@ -469,7 +469,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
if (in_len && copy_from_user(buffer, sic->data + cmdlen, in_len)) if (in_len && copy_from_user(buffer, sic->data + cmdlen, in_len))
goto error; goto error;
err = blk_verify_command(req->cmd, mode & FMODE_WRITE); err = blk_verify_command(req->cmd, mode);
if (err) if (err)
goto error; goto error;
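
The `blk_verify_command()` change above moves the `FMODE_WRITE` test into the callee: callers pass the full open mode rather than a pre-masked value stored in an `fmode_t`. The sketch below models that interface in userspace. The `TOY_FMODE_*` constants, the bool-array filter, and the read-safe branch (which the hunk shows only as a context `return 0;`) are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mode bits standing in for FMODE_READ / FMODE_WRITE. */
#define TOY_FMODE_READ	0x1u
#define TOY_FMODE_WRITE	0x2u

/* Hypothetical filter: one flag per opcode instead of kernel bitmaps. */
struct toy_filter {
	bool read_ok[256];
	bool write_ok[256];
};

/* Takes the whole mode; the write-permission test lives in here. */
static int toy_verify_command(const struct toy_filter *f,
			      const unsigned char *cmd, unsigned int mode)
{
	/* Assumed: read-safe commands are allowed for any open mode. */
	if (f->read_ok[cmd[0]])
		return 0;

	/* Write-safe commands require a writable open. */
	if (f->write_ok[cmd[0]] && (mode & TOY_FMODE_WRITE))
		return 0;

	return -EPERM;
}
```

Passing the unmasked mode keeps the parameter's type honest and leaves a single place that decides what "writable" means.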


@ -68,9 +68,13 @@ config AMIGA_Z2RAM
To compile this driver as a module, choose M here: the To compile this driver as a module, choose M here: the
module will be called z2ram. module will be called z2ram.
config CDROM
tristate
config GDROM config GDROM
tristate "SEGA Dreamcast GD-ROM drive" tristate "SEGA Dreamcast GD-ROM drive"
depends on SH_DREAMCAST depends on SH_DREAMCAST
select CDROM
select BLK_SCSI_REQUEST # only for the generic cdrom code select BLK_SCSI_REQUEST # only for the generic cdrom code
help help
A standard SEGA Dreamcast comes with a modified CD ROM drive called a A standard SEGA Dreamcast comes with a modified CD ROM drive called a
@ -348,6 +352,7 @@ config BLK_DEV_RAM_DAX
config CDROM_PKTCDVD config CDROM_PKTCDVD
tristate "Packet writing on CD/DVD media (DEPRECATED)" tristate "Packet writing on CD/DVD media (DEPRECATED)"
depends on !UML depends on !UML
select CDROM
select BLK_SCSI_REQUEST select BLK_SCSI_REQUEST
help help
Note: This driver is deprecated and will be removed from the Note: This driver is deprecated and will be removed from the


@ -60,7 +60,6 @@ struct brd_device {
/* /*
* Look up and return a brd's page for a given sector. * Look up and return a brd's page for a given sector.
*/ */
static DEFINE_MUTEX(brd_mutex);
static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector) static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
{ {
pgoff_t idx; pgoff_t idx;


@@ -43,7 +43,6 @@ cryptoloop_init(struct loop_device *lo, const struct loop_info64 *info)
 	int cipher_len;
 	int mode_len;
 	char cms[LO_NAME_SIZE]; /* cipher-mode string */
-	char *cipher;
 	char *mode;
 	char *cmsp = cms; /* c-m string pointer */
 	struct crypto_skcipher *tfm;
@@ -56,7 +55,6 @@ cryptoloop_init(struct loop_device *lo, const struct loop_info64 *info)
 	strncpy(cms, info->lo_crypt_name, LO_NAME_SIZE);
 	cms[LO_NAME_SIZE - 1] = 0;
-	cipher = cmsp;
 	cipher_len = strcspn(cmsp, "-");
 	mode = cmsp + cipher_len;

View File

@@ -476,6 +476,8 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
 {
 	struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
+	if (cmd->css)
+		css_put(cmd->css);
 	cmd->ret = ret;
 	lo_rw_aio_do_completion(cmd);
 }
@@ -535,6 +537,8 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	cmd->iocb.ki_filp = file;
 	cmd->iocb.ki_complete = lo_rw_aio_complete;
 	cmd->iocb.ki_flags = IOCB_DIRECT;
+	if (cmd->css)
+		kthread_associate_blkcg(cmd->css);
 	if (rw == WRITE)
 		ret = call_write_iter(file, &cmd->iocb, &iter);
@@ -542,6 +546,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 		ret = call_read_iter(file, &cmd->iocb, &iter);
 	lo_rw_aio_do_completion(cmd);
+	kthread_associate_blkcg(NULL);
 	if (ret != -EIOCBQUEUED)
 		cmd->iocb.ki_complete(&cmd->iocb, ret, 0);
@@ -1686,6 +1691,14 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 		break;
 	}
+	/* always use the first bio's css */
+#ifdef CONFIG_BLK_CGROUP
+	if (cmd->use_aio && cmd->rq->bio && cmd->rq->bio->bi_css) {
+		cmd->css = cmd->rq->bio->bi_css;
+		css_get(cmd->css);
+	} else
+#endif
+		cmd->css = NULL;
 	kthread_queue_work(&lo->worker, &cmd->work);
 	return BLK_STS_OK;

View File

@@ -72,6 +72,7 @@ struct loop_cmd {
 	long ret;
 	struct kiocb iocb;
 	struct bio_vec *bvec;
+	struct cgroup_subsys_state *css;
 };
 /* Support for loadable transfer modules */

View File

@@ -887,12 +887,9 @@ static void mtip_issue_non_ncq_command(struct mtip_port *port, int tag)
 static bool mtip_pause_ncq(struct mtip_port *port,
 				struct host_to_dev_fis *fis)
 {
-	struct host_to_dev_fis *reply;
 	unsigned long task_file_data;
-	reply = port->rxfis + RX_FIS_D2H_REG;
 	task_file_data = readl(port->mmio+PORT_TFDATA);
 	if ((task_file_data & 1))
 		return false;
@@ -1020,7 +1017,6 @@ static int mtip_exec_internal_command(struct mtip_port *port,
 		.opts = opts
 	};
 	int rv = 0;
-	unsigned long start;
 	/* Make sure the buffer is 8 byte aligned. This is asic specific. */
 	if (buffer & 0x00000007) {
@@ -1057,7 +1053,6 @@ static int mtip_exec_internal_command(struct mtip_port *port,
 	/* Copy the command to the command table */
 	memcpy(int_cmd->command, fis, fis_len*4);
-	start = jiffies;
 	rq->timeout = timeout;
 	/* insert request and run queue */
@@ -3015,7 +3010,6 @@ static int mtip_hw_init(struct driver_data *dd)
 {
 	int i;
 	int rv;
-	unsigned int num_command_slots;
 	unsigned long timeout, timetaken;
 	dd->mmio = pcim_iomap_table(dd->pdev)[MTIP_ABAR];
@@ -3025,7 +3019,6 @@ static int mtip_hw_init(struct driver_data *dd)
 		rv = -EIO;
 		goto out1;
 	}
-	num_command_slots = dd->slot_groups * 32;
 	hba_setup(dd);

View File

@@ -288,15 +288,6 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 		cmd->status = BLK_STS_TIMEOUT;
 		return BLK_EH_HANDLED;
 	}
-	/* If we are waiting on our dead timer then we could get timeout
-	 * callbacks for our request. For this we just want to reset the timer
-	 * and let the queue side take care of everything.
-	 */
-	if (!completion_done(&cmd->send_complete)) {
-		nbd_config_put(nbd);
-		return BLK_EH_RESET_TIMER;
-	}
 	config = nbd->config;
 	if (config->num_connections > 1) {
@@ -723,9 +714,9 @@ static int wait_for_reconnect(struct nbd_device *nbd)
 		return 0;
 	if (test_bit(NBD_DISCONNECTED, &config->runtime_flags))
 		return 0;
-	wait_event_interruptible_timeout(config->conn_wait,
-					 atomic_read(&config->live_connections),
-					 config->dead_conn_timeout);
+	wait_event_timeout(config->conn_wait,
+			   atomic_read(&config->live_connections),
+			   config->dead_conn_timeout);
 	return atomic_read(&config->live_connections);
 }
@@ -740,6 +731,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 	if (!refcount_inc_not_zero(&nbd->config_refs)) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Socks array is empty\n");
+		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	config = nbd->config;
@@ -748,6 +740,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
 				    "Attempted send on invalid socket\n");
 		nbd_config_put(nbd);
+		blk_mq_start_request(req);
 		return -EINVAL;
 	}
 	cmd->status = BLK_STS_OK;
@@ -771,6 +764,7 @@ again:
 			 */
 			sock_shutdown(nbd);
 			nbd_config_put(nbd);
+			blk_mq_start_request(req);
 			return -EIO;
 		}
 		goto again;
@@ -781,6 +775,7 @@ again:
 	 * here so that it gets put _after_ the request that is already on the
 	 * dispatch list.
 	 */
+	blk_mq_start_request(req);
 	if (unlikely(nsock->pending && nsock->pending != req)) {
 		blk_mq_requeue_request(req, true);
 		ret = 0;
@@ -793,10 +788,10 @@ again:
 	ret = nbd_send_cmd(nbd, cmd, index);
 	if (ret == -EAGAIN) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
-				    "Request send failed trying another connection\n");
+				    "Request send failed, requeueing\n");
 		nbd_mark_nsock_dead(nbd, nsock, 1);
-		mutex_unlock(&nsock->tx_lock);
-		goto again;
+		blk_mq_requeue_request(req, true);
+		ret = 0;
 	}
 out:
 	mutex_unlock(&nsock->tx_lock);
@@ -820,7 +815,6 @@ static blk_status_t nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
 	 * done sending everything over the wire.
 	 */
 	init_completion(&cmd->send_complete);
-	blk_mq_start_request(bd->rq);
 	/* We can be called directly from the user space process, which means we
 	 * could possibly have signals pending so our sendmsg will fail. In

View File

@@ -154,6 +154,10 @@ enum {
 	NULL_Q_MQ = 2,
 };
+static int g_no_sched;
+module_param_named(no_sched, g_no_sched, int, S_IRUGO);
+MODULE_PARM_DESC(no_sched, "No io scheduler");
 static int g_submit_queues = 1;
 module_param_named(submit_queues, g_submit_queues, int, S_IRUGO);
 MODULE_PARM_DESC(submit_queues, "Number of submission queues");
@@ -1754,6 +1758,8 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
 	set->numa_node = nullb ? nullb->dev->home_node : g_home_node;
 	set->cmd_size = sizeof(struct nullb_cmd);
 	set->flags = BLK_MQ_F_SHOULD_MERGE;
+	if (g_no_sched)
+		set->flags |= BLK_MQ_F_NO_SCHED;
 	set->driver_data = NULL;
 	if ((nullb && nullb->dev->blocking) || g_blocking)
@@ -1985,8 +1991,10 @@ static int __init null_init(void)
 	for (i = 0; i < nr_devices; i++) {
 		dev = null_alloc_dev();
-		if (!dev)
+		if (!dev) {
+			ret = -ENOMEM;
 			goto err_dev;
+		}
 		ret = null_add_dev(dev);
 		if (ret) {
 			null_free_dev(dev);

View File

@@ -26,6 +26,7 @@ config PARIDE_PD
 config PARIDE_PCD
 	tristate "Parallel port ATAPI CD-ROMs"
 	depends on PARIDE
+	select CDROM
 	select BLK_SCSI_REQUEST # only for the generic cdrom code
 	---help---
 	  This option enables the high-level driver for ATAPI CD-ROM devices

View File

@@ -1967,7 +1967,8 @@ static void skd_isr_msg_from_dev(struct skd_device *skdev)
 		break;
 	case FIT_MTD_CMD_LOG_HOST_ID:
-		skdev->connect_time_stamp = get_seconds();
+		/* hardware interface overflows in y2106 */
+		skdev->connect_time_stamp = (u32)ktime_get_real_seconds();
 		data = skdev->connect_time_stamp & 0xFFFF;
 		mtd = FIT_MXD_CONS(FIT_MTD_CMD_LOG_TIME_STAMP_LO, 0, data);
 		SKD_WRITEL(skdev, mtd, FIT_MSG_TO_DEVICE);

View File

@@ -1,14 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
-# Makefile for the kernel cdrom device drivers.
-#
-# 30 Jan 1998, Michael Elizabeth Chastain, <mailto:mec@shout.net>
-# Rewritten to use lists instead of if-statements.
-# Each configuration option enables a list of files.
-obj-$(CONFIG_BLK_DEV_IDECD) += cdrom.o
-obj-$(CONFIG_BLK_DEV_SR) += cdrom.o
-obj-$(CONFIG_PARIDE_PCD) += cdrom.o
-obj-$(CONFIG_CDROM_PKTCDVD) += cdrom.o
-obj-$(CONFIG_GDROM) += gdrom.o cdrom.o
+obj-$(CONFIG_CDROM) += cdrom.o
+obj-$(CONFIG_GDROM) += gdrom.o

View File

@@ -117,7 +117,9 @@ config BLK_DEV_DELKIN
 config BLK_DEV_IDECD
 	tristate "Include IDE/ATAPI CDROM support"
+	depends on BLK_DEV
 	select IDE_ATAPI
+	select CDROM
 	---help---
 	  If you have a CD-ROM drive using the ATAPI protocol, say Y. ATAPI is
 	  a newer protocol used by IDE CD-ROM and TAPE drives, similar to the

View File

@@ -282,7 +282,7 @@ int ide_cd_expiry(ide_drive_t *drive)
 	struct request *rq = drive->hwif->rq;
 	unsigned long wait = 0;
-	debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, rq->cmd[0]);
+	debug_log("%s: scsi_req(rq)->cmd[0]: 0x%x\n", __func__, scsi_req(rq)->cmd[0]);
 	/*
 	 * Some commands are *slow* and normally take a long time to complete.
@@ -463,7 +463,7 @@ static ide_startstop_t ide_pc_intr(ide_drive_t *drive)
 				return ide_do_reset(drive);
 			}
-			debug_log("[cmd %x]: check condition\n", rq->cmd[0]);
+			debug_log("[cmd %x]: check condition\n", scsi_req(rq)->cmd[0]);
 			/* Retry operation */
 			ide_retry_pc(drive);
@@ -531,7 +531,7 @@ static ide_startstop_t ide_pc_intr(ide_drive_t *drive)
 	ide_pad_transfer(drive, write, bcount);
 	debug_log("[cmd %x] transferred %d bytes, padded %d bytes, resid: %u\n",
-		  rq->cmd[0], done, bcount, scsi_req(rq)->resid_len);
+		  scsi_req(rq)->cmd[0], done, bcount, scsi_req(rq)->resid_len);
 	/* And set the interrupt handler again */
 	ide_set_handler(drive, ide_pc_intr, timeout);

View File

@@ -90,9 +90,9 @@ int generic_ide_resume(struct device *dev)
 	}
 	memset(&rqpm, 0, sizeof(rqpm));
-	rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+	rq = blk_get_request_flags(drive->queue, REQ_OP_DRV_IN,
+				   BLK_MQ_REQ_PREEMPT);
 	ide_req(rq)->type = ATA_PRIV_PM_RESUME;
-	rq->rq_flags |= RQF_PREEMPT;
 	rq->special = &rqpm;
 	rqpm.pm_step = IDE_PM_START_RESUME;
 	rqpm.pm_state = PM_EVENT_ON;

View File

@@ -4,7 +4,8 @@
 menuconfig NVM
 	bool "Open-Channel SSD target support"
-	depends on BLOCK && HAS_DMA
+	depends on BLOCK && HAS_DMA && PCI
+	select BLK_DEV_NVME
 	help
 	  Say Y here to get to enable Open-channel SSDs.

View File

@@ -22,6 +22,7 @@
 #include <linux/types.h>
 #include <linux/sem.h>
 #include <linux/bitmap.h>
+#include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/miscdevice.h>
 #include <linux/lightnvm.h>
@@ -138,7 +139,6 @@ static struct nvm_tgt_dev *nvm_create_tgt_dev(struct nvm_dev *dev,
 	int prev_nr_luns;
 	int i, j;
-	nr_chnls = nr_luns / dev->geo.luns_per_chnl;
 	nr_chnls = (nr_chnls_mod == 0) ? nr_chnls : nr_chnls + 1;
 	dev_map = kmalloc(sizeof(struct nvm_dev_map), GFP_KERNEL);
@@ -226,6 +226,24 @@ static const struct block_device_operations nvm_fops = {
 	.owner = THIS_MODULE,
 };
+static struct nvm_tgt_type *nvm_find_target_type(const char *name, int lock)
+{
+	struct nvm_tgt_type *tmp, *tt = NULL;
+	if (lock)
+		down_write(&nvm_tgtt_lock);
+	list_for_each_entry(tmp, &nvm_tgt_types, list)
+		if (!strcmp(name, tmp->name)) {
+			tt = tmp;
+			break;
+		}
+	if (lock)
+		up_write(&nvm_tgtt_lock);
+	return tt;
+}
 static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 {
 	struct nvm_ioctl_create_simple *s = &create->conf.s;
@@ -316,6 +334,8 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
 	list_add_tail(&t->list, &dev->targets);
 	mutex_unlock(&dev->mlock);
+	__module_get(tt->owner);
 	return 0;
 err_sysfs:
 	if (tt->exit)
@@ -351,6 +371,7 @@ static void __nvm_remove_target(struct nvm_target *t)
 	nvm_remove_tgt_dev(t->dev, 1);
 	put_disk(tdisk);
+	module_put(t->type->owner);
 	list_del(&t->list);
 	kfree(t);
@@ -532,25 +553,6 @@ void nvm_part_to_tgt(struct nvm_dev *dev, sector_t *entries,
 }
 EXPORT_SYMBOL(nvm_part_to_tgt);
-struct nvm_tgt_type *nvm_find_target_type(const char *name, int lock)
-{
-	struct nvm_tgt_type *tmp, *tt = NULL;
-	if (lock)
-		down_write(&nvm_tgtt_lock);
-	list_for_each_entry(tmp, &nvm_tgt_types, list)
-		if (!strcmp(name, tmp->name)) {
-			tt = tmp;
-			break;
-		}
-	if (lock)
-		up_write(&nvm_tgtt_lock);
-	return tt;
-}
-EXPORT_SYMBOL(nvm_find_target_type);
 int nvm_register_tgt_type(struct nvm_tgt_type *tt)
 {
 	int ret = 0;
@@ -571,9 +573,9 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt)
 	if (!tt)
 		return;
-	down_write(&nvm_lock);
+	down_write(&nvm_tgtt_lock);
 	list_del(&tt->list);
-	up_write(&nvm_lock);
+	up_write(&nvm_tgtt_lock);
 }
 EXPORT_SYMBOL(nvm_unregister_tgt_type);
@@ -602,6 +604,52 @@ static struct nvm_dev *nvm_find_nvm_dev(const char *name)
 	return NULL;
 }
+static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
+			const struct ppa_addr *ppas, int nr_ppas)
+{
+	struct nvm_dev *dev = tgt_dev->parent;
+	struct nvm_geo *geo = &tgt_dev->geo;
+	int i, plane_cnt, pl_idx;
+	struct ppa_addr ppa;
+	if (geo->plane_mode == NVM_PLANE_SINGLE && nr_ppas == 1) {
+		rqd->nr_ppas = nr_ppas;
+		rqd->ppa_addr = ppas[0];
+		return 0;
+	}
+	rqd->nr_ppas = nr_ppas;
+	rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, &rqd->dma_ppa_list);
+	if (!rqd->ppa_list) {
+		pr_err("nvm: failed to allocate dma memory\n");
+		return -ENOMEM;
+	}
+	plane_cnt = geo->plane_mode;
+	rqd->nr_ppas *= plane_cnt;
+	for (i = 0; i < nr_ppas; i++) {
+		for (pl_idx = 0; pl_idx < plane_cnt; pl_idx++) {
+			ppa = ppas[i];
+			ppa.g.pl = pl_idx;
+			rqd->ppa_list[(pl_idx * nr_ppas) + i] = ppa;
+		}
+	}
+	return 0;
+}
+static void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev,
+			struct nvm_rq *rqd)
+{
+	if (!rqd->ppa_list)
+		return;
+	nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
+}
 int nvm_set_tgt_bb_tbl(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *ppas,
 			int nr_ppas, int type)
 {
@@ -616,7 +664,7 @@ int nvm_set_tgt_bb_tbl(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *ppas,
 	memset(&rqd, 0, sizeof(struct nvm_rq));
-	nvm_set_rqd_ppalist(tgt_dev, &rqd, ppas, nr_ppas, 1);
+	nvm_set_rqd_ppalist(tgt_dev, &rqd, ppas, nr_ppas);
 	nvm_rq_tgt_to_dev(tgt_dev, &rqd);
 	ret = dev->ops->set_bb_tbl(dev, &rqd.ppa_addr, rqd.nr_ppas, type);
@@ -658,12 +706,25 @@ int nvm_submit_io(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd)
 }
 EXPORT_SYMBOL(nvm_submit_io);
-static void nvm_end_io_sync(struct nvm_rq *rqd)
+int nvm_submit_io_sync(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd)
 {
-	struct completion *waiting = rqd->private;
+	struct nvm_dev *dev = tgt_dev->parent;
+	int ret;
-	complete(waiting);
+	if (!dev->ops->submit_io_sync)
+		return -ENODEV;
+	nvm_rq_tgt_to_dev(tgt_dev, rqd);
+	rqd->dev = tgt_dev;
+	/* In case of error, fail with right address format */
+	ret = dev->ops->submit_io_sync(dev, rqd);
+	nvm_rq_dev_to_tgt(tgt_dev, rqd);
+	return ret;
 }
+EXPORT_SYMBOL(nvm_submit_io_sync);
 int nvm_erase_sync(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *ppas,
 			int nr_ppas)
@@ -671,25 +732,21 @@ int nvm_erase_sync(struct nvm_tgt_dev *tgt_dev, struct ppa_addr *ppas,
 	struct nvm_geo *geo = &tgt_dev->geo;
 	struct nvm_rq rqd;
 	int ret;
-	DECLARE_COMPLETION_ONSTACK(wait);
 	memset(&rqd, 0, sizeof(struct nvm_rq));
 	rqd.opcode = NVM_OP_ERASE;
-	rqd.end_io = nvm_end_io_sync;
-	rqd.private = &wait;
 	rqd.flags = geo->plane_mode >> 1;
-	ret = nvm_set_rqd_ppalist(tgt_dev, &rqd, ppas, nr_ppas, 1);
+	ret = nvm_set_rqd_ppalist(tgt_dev, &rqd, ppas, nr_ppas);
 	if (ret)
 		return ret;
-	ret = nvm_submit_io(tgt_dev, &rqd);
+	ret = nvm_submit_io_sync(tgt_dev, &rqd);
 	if (ret) {
 		pr_err("rrpr: erase I/O submission failed: %d\n", ret);
 		goto free_ppa_list;
 	}
-	wait_for_completion_io(&wait);
 free_ppa_list:
 	nvm_free_rqd_ppalist(tgt_dev, &rqd);
@@ -775,57 +832,6 @@ void nvm_put_area(struct nvm_tgt_dev *tgt_dev, sector_t begin)
 }
 EXPORT_SYMBOL(nvm_put_area);
-int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
-			const struct ppa_addr *ppas, int nr_ppas, int vblk)
-{
-	struct nvm_dev *dev = tgt_dev->parent;
-	struct nvm_geo *geo = &tgt_dev->geo;
-	int i, plane_cnt, pl_idx;
-	struct ppa_addr ppa;
-	if ((!vblk || geo->plane_mode == NVM_PLANE_SINGLE) && nr_ppas == 1) {
-		rqd->nr_ppas = nr_ppas;
-		rqd->ppa_addr = ppas[0];
-		return 0;
-	}
-	rqd->nr_ppas = nr_ppas;
-	rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, &rqd->dma_ppa_list);
-	if (!rqd->ppa_list) {
-		pr_err("nvm: failed to allocate dma memory\n");
-		return -ENOMEM;
-	}
-	if (!vblk) {
-		for (i = 0; i < nr_ppas; i++)
-			rqd->ppa_list[i] = ppas[i];
-	} else {
-		plane_cnt = geo->plane_mode;
-		rqd->nr_ppas *= plane_cnt;
-		for (i = 0; i < nr_ppas; i++) {
-			for (pl_idx = 0; pl_idx < plane_cnt; pl_idx++) {
-				ppa = ppas[i];
-				ppa.g.pl = pl_idx;
-				rqd->ppa_list[(pl_idx * nr_ppas) + i] = ppa;
-			}
-		}
-	}
-	return 0;
-}
-EXPORT_SYMBOL(nvm_set_rqd_ppalist);
-void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd)
-{
-	if (!rqd->ppa_list)
-		return;
-	nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
-}
-EXPORT_SYMBOL(nvm_free_rqd_ppalist);
 void nvm_end_io(struct nvm_rq *rqd)
 {
 	struct nvm_tgt_dev *tgt_dev = rqd->dev;
@@ -1177,7 +1183,7 @@ static long nvm_ioctl_info(struct file *file, void __user *arg)
 	info->version[1] = NVM_VERSION_MINOR;
 	info->version[2] = NVM_VERSION_PATCH;
-	down_write(&nvm_lock);
+	down_write(&nvm_tgtt_lock);
 	list_for_each_entry(tt, &nvm_tgt_types, list) {
 		struct nvm_ioctl_info_tgt *tgt = &info->tgts[tgt_iter];
@@ -1190,7 +1196,7 @@ static long nvm_ioctl_info(struct file *file, void __user *arg)
 	}
 	info->tgtsize = tgt_iter;
-	up_write(&nvm_lock);
+	up_write(&nvm_tgtt_lock);
 	if (copy_to_user(arg, info, sizeof(struct nvm_ioctl_info))) {
 		kfree(info);

View File

@@ -43,8 +43,10 @@ retry:
 	if (unlikely(!bio_has_data(bio)))
 		goto out;
-	w_ctx.flags = flags;
 	pblk_ppa_set_empty(&w_ctx.ppa);
+	w_ctx.flags = flags;
+	if (bio->bi_opf & REQ_PREFLUSH)
+		w_ctx.flags |= PBLK_FLUSH_ENTRY;
 	for (i = 0; i < nr_entries; i++) {
 		void *data = bio_data(bio);
@@ -73,12 +75,11 @@ out:
  * On GC the incoming lbas are not necessarily sequential. Also, some of the
  * lbas might not be valid entries, which are marked as empty by the GC thread
  */
-int pblk_write_gc_to_cache(struct pblk *pblk, void *data, u64 *lba_list,
-			   unsigned int nr_entries, unsigned int nr_rec_entries,
-			   struct pblk_line *gc_line, unsigned long flags)
+int pblk_write_gc_to_cache(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
 {
 	struct pblk_w_ctx w_ctx;
 	unsigned int bpos, pos;
+	void *data = gc_rq->data;
 	int i, valid_entries;
 	/* Update the write buffer head (mem) with the entries that we can
@@ -86,28 +87,29 @@ int pblk_write_gc_to_cache(struct pblk *pblk, void *data, u64 *lba_list,
 	 * rollback from here on.
 	 */
 retry:
-	if (!pblk_rb_may_write_gc(&pblk->rwb, nr_rec_entries, &bpos)) {
+	if (!pblk_rb_may_write_gc(&pblk->rwb, gc_rq->secs_to_gc, &bpos)) {
 		io_schedule();
 		goto retry;
 	}
-	w_ctx.flags = flags;
+	w_ctx.flags = PBLK_IOTYPE_GC;
 	pblk_ppa_set_empty(&w_ctx.ppa);
-	for (i = 0, valid_entries = 0; i < nr_entries; i++) {
-		if (lba_list[i] == ADDR_EMPTY)
+	for (i = 0, valid_entries = 0; i < gc_rq->nr_secs; i++) {
+		if (gc_rq->lba_list[i] == ADDR_EMPTY)
 			continue;
-		w_ctx.lba = lba_list[i];
+		w_ctx.lba = gc_rq->lba_list[i];
 		pos = pblk_rb_wrap_pos(&pblk->rwb, bpos + valid_entries);
-		pblk_rb_write_entry_gc(&pblk->rwb, data, w_ctx, gc_line, pos);
+		pblk_rb_write_entry_gc(&pblk->rwb, data, w_ctx, gc_rq->line,
+						gc_rq->paddr_list[i], pos);
 		data += PBLK_EXPOSED_PAGE_SIZE;
 		valid_entries++;
 	}
-	WARN_ONCE(nr_rec_entries != valid_entries,
+	WARN_ONCE(gc_rq->secs_to_gc != valid_entries,
 					"pblk: inconsistent GC write\n");
 #ifdef CONFIG_NVM_DEBUG

View File

@@ -18,6 +18,31 @@
 #include "pblk.h"
+static void pblk_line_mark_bb(struct work_struct *work)
+{
+	struct pblk_line_ws *line_ws = container_of(work, struct pblk_line_ws,
+									ws);
+	struct pblk *pblk = line_ws->pblk;
+	struct nvm_tgt_dev *dev = pblk->dev;
+	struct ppa_addr *ppa = line_ws->priv;
+	int ret;
+	ret = nvm_set_tgt_bb_tbl(dev, ppa, 1, NVM_BLK_T_GRWN_BAD);
+	if (ret) {
+		struct pblk_line *line;
+		int pos;
+		line = &pblk->lines[pblk_dev_ppa_to_line(*ppa)];
+		pos = pblk_dev_ppa_to_pos(&dev->geo, *ppa);
+		pr_err("pblk: failed to mark bb, line:%d, pos:%d\n",
+				line->id, pos);
+	}
+	kfree(ppa);
+	mempool_free(line_ws, pblk->gen_ws_pool);
+}
 static void pblk_mark_bb(struct pblk *pblk, struct pblk_line *line,
 			 struct ppa_addr *ppa)
 {
@@ -33,7 +58,8 @@ static void pblk_mark_bb(struct pblk *pblk, struct pblk_line *line,
 	pr_err("pblk: attempted to erase bb: line:%d, pos:%d\n",
 							line->id, pos);
-	pblk_line_run_ws(pblk, NULL, ppa, pblk_line_mark_bb, pblk->bb_wq);
+	pblk_gen_run_ws(pblk, NULL, ppa, pblk_line_mark_bb,
+						GFP_ATOMIC, pblk->bb_wq);
 }
 static void __pblk_end_io_erase(struct pblk *pblk, struct nvm_rq *rqd)
@@ -63,7 +89,7 @@ static void pblk_end_io_erase(struct nvm_rq *rqd)
 	struct pblk *pblk = rqd->private;
 	__pblk_end_io_erase(pblk, rqd);
-	mempool_free(rqd, pblk->g_rq_pool);
+	mempool_free(rqd, pblk->e_rq_pool);
 }
 void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
@@ -77,11 +103,7 @@ void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
 	 * that newer updates are not overwritten.
 	 */
 	spin_lock(&line->lock);
-	if (line->state == PBLK_LINESTATE_GC ||
-				line->state == PBLK_LINESTATE_FREE) {
-		spin_unlock(&line->lock);
-		return;
-	}
+	WARN_ON(line->state == PBLK_LINESTATE_FREE);
 	if (test_and_set_bit(paddr, line->invalid_bitmap)) {
 		WARN_ONCE(1, "pblk: double invalidate\n");
@@ -98,8 +120,7 @@ void __pblk_map_invalidate(struct pblk *pblk, struct pblk_line *line,
 	spin_lock(&l_mg->gc_lock);
 	spin_lock(&line->lock);
 	/* Prevent moving a line that has just been chosen for GC */
-	if (line->state == PBLK_LINESTATE_GC ||
-				line->state == PBLK_LINESTATE_FREE) {
+	if (line->state == PBLK_LINESTATE_GC) {
 		spin_unlock(&line->lock);
 		spin_unlock(&l_mg->gc_lock);
 		return;
@@ -150,17 +171,25 @@ static void pblk_invalidate_range(struct pblk *pblk, sector_t slba,
 	spin_unlock(&pblk->trans_lock);
 }
-struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int rw)
+/* Caller must guarantee that the request is a valid type */
+struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int type)
 {
 	mempool_t *pool;
 	struct nvm_rq *rqd;
 	int rq_size;
-	if (rw == WRITE) {
+	switch (type) {
+	case PBLK_WRITE:
+	case PBLK_WRITE_INT:
 		pool = pblk->w_rq_pool;
 		rq_size = pblk_w_rq_size;
-	} else {
-		pool = pblk->g_rq_pool;
+		break;
+	case PBLK_READ:
+		pool = pblk->r_rq_pool;
+		rq_size = pblk_g_rq_size;
+		break;
+	default:
+		pool = pblk->e_rq_pool;
 		rq_size = pblk_g_rq_size;
 	}
@@ -170,15 +199,30 @@ struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int rw)
 	return rqd;
 }
-void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int rw)
+/* Typically used on completion path. Cannot guarantee request consistency */
+void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type)
 {
+	struct nvm_tgt_dev *dev = pblk->dev;
 	mempool_t *pool;
-	if (rw == WRITE)
+	switch (type) {
+	case PBLK_WRITE:
+		kfree(((struct pblk_c_ctx *)nvm_rq_to_pdu(rqd))->lun_bitmap);
+	case PBLK_WRITE_INT:
 		pool = pblk->w_rq_pool;
-	else
-		pool = pblk->g_rq_pool;
+		break;
+	case PBLK_READ:
+		pool = pblk->r_rq_pool;
+		break;
+	case PBLK_ERASE:
+		pool = pblk->e_rq_pool;
+		break;
+	default:
+		pr_err("pblk: trying to free unknown rqd type\n");
+		return;
+	}
+	nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
 	mempool_free(rqd, pool);
 }
@@ -190,10 +234,9 @@ void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off,
 	WARN_ON(off + nr_pages != bio->bi_vcnt);
-	bio_advance(bio, off * PBLK_EXPOSED_PAGE_SIZE);
 	for (i = off; i < nr_pages + off; i++) {
 		bv = bio->bi_io_vec[i];
-		mempool_free(bv.bv_page, pblk->page_pool);
+		mempool_free(bv.bv_page, pblk->page_bio_pool);
 	}
 }
@@ -205,14 +248,12 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags,
 	int i, ret;
 	for (i = 0; i < nr_pages; i++) {
-		page = mempool_alloc(pblk->page_pool, flags);
-		if (!page)
-			goto err;
+		page = mempool_alloc(pblk->page_bio_pool, flags);
 		ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
 		if (ret != PBLK_EXPOSED_PAGE_SIZE) {
 			pr_err("pblk: could not add page to bio\n");
-			mempool_free(page, pblk->page_pool);
+			mempool_free(page, pblk->page_bio_pool);
 			goto err;
 		}
 	}
@@ -245,13 +286,6 @@ void pblk_write_should_kick(struct pblk *pblk)
 	pblk_write_kick(pblk);
 }
-void pblk_end_bio_sync(struct bio *bio)
-{
-	struct completion *waiting = bio->bi_private;
-	complete(waiting);
-}
 void pblk_end_io_sync(struct nvm_rq *rqd)
 {
 	struct completion *waiting = rqd->private;
@@ -259,7 +293,7 @@ void pblk_end_io_sync(struct nvm_rq *rqd)
 	complete(waiting);
 }
-void pblk_wait_for_meta(struct pblk *pblk)
+static void pblk_wait_for_meta(struct pblk *pblk)
 {
 	do {
 		if (!atomic_read(&pblk->inflight_io))
@@ -336,17 +370,6 @@ void pblk_discard(struct pblk *pblk, struct bio *bio)
 	pblk_invalidate_range(pblk, slba, nr_secs);
 }
-struct ppa_addr pblk_get_lba_map(struct pblk *pblk, sector_t lba)
-{
-	struct ppa_addr ppa;
-	spin_lock(&pblk->trans_lock);
-	ppa = pblk_trans_map_get(pblk, lba);
-	spin_unlock(&pblk->trans_lock);
-	return ppa;
-}
 void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd)
 {
 	atomic_long_inc(&pblk->write_failed);
@@ -389,34 +412,11 @@ int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd)
 	struct nvm_tgt_dev *dev = pblk->dev;
 #ifdef CONFIG_NVM_DEBUG
-	struct ppa_addr *ppa_list;
+	int ret;
-	ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
-	if (pblk_boundary_ppa_checks(dev, ppa_list, rqd->nr_ppas)) {
-		WARN_ON(1);
-		return -EINVAL;
-	}
-	if (rqd->opcode == NVM_OP_PWRITE) {
-		struct pblk_line *line;
-		struct ppa_addr ppa;
-		int i;
-		for (i = 0; i < rqd->nr_ppas; i++) {
-			ppa = ppa_list[i];
-			line = &pblk->lines[pblk_dev_ppa_to_line(ppa)];
-			spin_lock(&line->lock);
-			if (line->state != PBLK_LINESTATE_OPEN) {
-				pr_err("pblk: bad ppa: line:%d,state:%d\n",
-							line->id, line->state);
-				WARN_ON(1);
-				spin_unlock(&line->lock);
-				return -EINVAL;
-			}
-			spin_unlock(&line->lock);
-		}
-	}
+	ret = pblk_check_io(pblk, rqd);
+	if (ret)
+		return ret;
 #endif
 	atomic_inc(&pblk->inflight_io);
@@ -424,6 +424,28 @@ int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd)
 	return nvm_submit_io(dev, rqd);
 }
+int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd)
+{
+	struct nvm_tgt_dev *dev = pblk->dev;
+#ifdef CONFIG_NVM_DEBUG
+	int ret;
+	ret = pblk_check_io(pblk, rqd);
+	if (ret)
+		return ret;
+#endif
+	atomic_inc(&pblk->inflight_io);
+	return nvm_submit_io_sync(dev, rqd);
+}
+static void pblk_bio_map_addr_endio(struct bio *bio)
+{
+	bio_put(bio);
+}
 struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
 			      unsigned int nr_secs, unsigned int len,
 			      int alloc_type, gfp_t gfp_mask)
@@ -460,6 +482,8 @@ struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
kaddr += PAGE_SIZE; kaddr += PAGE_SIZE;
} }
bio->bi_end_io = pblk_bio_map_addr_endio;
out: out:
return bio; return bio;
} }
@ -486,12 +510,14 @@ void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs)
u64 addr; u64 addr;
int i; int i;
spin_lock(&line->lock);
addr = find_next_zero_bit(line->map_bitmap, addr = find_next_zero_bit(line->map_bitmap,
pblk->lm.sec_per_line, line->cur_sec); pblk->lm.sec_per_line, line->cur_sec);
line->cur_sec = addr - nr_secs; line->cur_sec = addr - nr_secs;
for (i = 0; i < nr_secs; i++, line->cur_sec--) for (i = 0; i < nr_secs; i++, line->cur_sec--)
WARN_ON(!test_and_clear_bit(line->cur_sec, line->map_bitmap)); WARN_ON(!test_and_clear_bit(line->cur_sec, line->map_bitmap));
spin_unlock(&line->lock);
} }
u64 __pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs) u64 __pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs)
@ -565,12 +591,11 @@ static int pblk_line_submit_emeta_io(struct pblk *pblk, struct pblk_line *line,
int cmd_op, bio_op; int cmd_op, bio_op;
int i, j; int i, j;
int ret; int ret;
DECLARE_COMPLETION_ONSTACK(wait);
if (dir == WRITE) { if (dir == PBLK_WRITE) {
bio_op = REQ_OP_WRITE; bio_op = REQ_OP_WRITE;
cmd_op = NVM_OP_PWRITE; cmd_op = NVM_OP_PWRITE;
} else if (dir == READ) { } else if (dir == PBLK_READ) {
bio_op = REQ_OP_READ; bio_op = REQ_OP_READ;
cmd_op = NVM_OP_PREAD; cmd_op = NVM_OP_PREAD;
} else } else
@ -607,13 +632,11 @@ next_rq:
rqd.dma_ppa_list = dma_ppa_list; rqd.dma_ppa_list = dma_ppa_list;
rqd.opcode = cmd_op; rqd.opcode = cmd_op;
rqd.nr_ppas = rq_ppas; rqd.nr_ppas = rq_ppas;
rqd.end_io = pblk_end_io_sync;
rqd.private = &wait;
if (dir == WRITE) { if (dir == PBLK_WRITE) {
struct pblk_sec_meta *meta_list = rqd.meta_list; struct pblk_sec_meta *meta_list = rqd.meta_list;
rqd.flags = pblk_set_progr_mode(pblk, WRITE); rqd.flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
for (i = 0; i < rqd.nr_ppas; ) { for (i = 0; i < rqd.nr_ppas; ) {
spin_lock(&line->lock); spin_lock(&line->lock);
paddr = __pblk_alloc_page(pblk, line, min); paddr = __pblk_alloc_page(pblk, line, min);
@ -662,25 +685,17 @@ next_rq:
} }
} }
ret = pblk_submit_io(pblk, &rqd); ret = pblk_submit_io_sync(pblk, &rqd);
if (ret) { if (ret) {
pr_err("pblk: emeta I/O submission failed: %d\n", ret); pr_err("pblk: emeta I/O submission failed: %d\n", ret);
bio_put(bio); bio_put(bio);
goto free_rqd_dma; goto free_rqd_dma;
} }
if (!wait_for_completion_io_timeout(&wait,
msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
pr_err("pblk: emeta I/O timed out\n");
}
atomic_dec(&pblk->inflight_io); atomic_dec(&pblk->inflight_io);
reinit_completion(&wait);
if (likely(pblk->l_mg.emeta_alloc_type == PBLK_VMALLOC_META))
bio_put(bio);
if (rqd.error) { if (rqd.error) {
if (dir == WRITE) if (dir == PBLK_WRITE)
pblk_log_write_err(pblk, &rqd); pblk_log_write_err(pblk, &rqd);
else else
pblk_log_read_err(pblk, &rqd); pblk_log_read_err(pblk, &rqd);
@ -721,14 +736,13 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
int i, ret; int i, ret;
int cmd_op, bio_op; int cmd_op, bio_op;
int flags; int flags;
DECLARE_COMPLETION_ONSTACK(wait);
if (dir == WRITE) { if (dir == PBLK_WRITE) {
bio_op = REQ_OP_WRITE; bio_op = REQ_OP_WRITE;
cmd_op = NVM_OP_PWRITE; cmd_op = NVM_OP_PWRITE;
flags = pblk_set_progr_mode(pblk, WRITE); flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
lba_list = emeta_to_lbas(pblk, line->emeta->buf); lba_list = emeta_to_lbas(pblk, line->emeta->buf);
} else if (dir == READ) { } else if (dir == PBLK_READ) {
bio_op = REQ_OP_READ; bio_op = REQ_OP_READ;
cmd_op = NVM_OP_PREAD; cmd_op = NVM_OP_PREAD;
flags = pblk_set_read_mode(pblk, PBLK_READ_SEQUENTIAL); flags = pblk_set_read_mode(pblk, PBLK_READ_SEQUENTIAL);
@ -758,15 +772,13 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
rqd.opcode = cmd_op; rqd.opcode = cmd_op;
rqd.flags = flags; rqd.flags = flags;
rqd.nr_ppas = lm->smeta_sec; rqd.nr_ppas = lm->smeta_sec;
rqd.end_io = pblk_end_io_sync;
rqd.private = &wait;
for (i = 0; i < lm->smeta_sec; i++, paddr++) { for (i = 0; i < lm->smeta_sec; i++, paddr++) {
struct pblk_sec_meta *meta_list = rqd.meta_list; struct pblk_sec_meta *meta_list = rqd.meta_list;
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
if (dir == WRITE) { if (dir == PBLK_WRITE) {
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY); __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
meta_list[i].lba = lba_list[paddr] = addr_empty; meta_list[i].lba = lba_list[paddr] = addr_empty;
@ -778,21 +790,17 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, struct pblk_line *line,
* the write thread is the only one sending write and erase commands, * the write thread is the only one sending write and erase commands,
* there is no need to take the LUN semaphore. * there is no need to take the LUN semaphore.
*/ */
ret = pblk_submit_io(pblk, &rqd); ret = pblk_submit_io_sync(pblk, &rqd);
if (ret) { if (ret) {
pr_err("pblk: smeta I/O submission failed: %d\n", ret); pr_err("pblk: smeta I/O submission failed: %d\n", ret);
bio_put(bio); bio_put(bio);
goto free_ppa_list; goto free_ppa_list;
} }
if (!wait_for_completion_io_timeout(&wait,
msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
pr_err("pblk: smeta I/O timed out\n");
}
atomic_dec(&pblk->inflight_io); atomic_dec(&pblk->inflight_io);
if (rqd.error) { if (rqd.error) {
if (dir == WRITE) if (dir == PBLK_WRITE)
pblk_log_write_err(pblk, &rqd); pblk_log_write_err(pblk, &rqd);
else else
pblk_log_read_err(pblk, &rqd); pblk_log_read_err(pblk, &rqd);
@ -808,14 +816,14 @@ int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line)
{ {
u64 bpaddr = pblk_line_smeta_start(pblk, line); u64 bpaddr = pblk_line_smeta_start(pblk, line);
return pblk_line_submit_smeta_io(pblk, line, bpaddr, READ); return pblk_line_submit_smeta_io(pblk, line, bpaddr, PBLK_READ);
} }
int pblk_line_read_emeta(struct pblk *pblk, struct pblk_line *line, int pblk_line_read_emeta(struct pblk *pblk, struct pblk_line *line,
void *emeta_buf) void *emeta_buf)
{ {
return pblk_line_submit_emeta_io(pblk, line, emeta_buf, return pblk_line_submit_emeta_io(pblk, line, emeta_buf,
line->emeta_ssec, READ); line->emeta_ssec, PBLK_READ);
} }
static void pblk_setup_e_rq(struct pblk *pblk, struct nvm_rq *rqd, static void pblk_setup_e_rq(struct pblk *pblk, struct nvm_rq *rqd,
@ -824,7 +832,7 @@ static void pblk_setup_e_rq(struct pblk *pblk, struct nvm_rq *rqd,
rqd->opcode = NVM_OP_ERASE; rqd->opcode = NVM_OP_ERASE;
rqd->ppa_addr = ppa; rqd->ppa_addr = ppa;
rqd->nr_ppas = 1; rqd->nr_ppas = 1;
rqd->flags = pblk_set_progr_mode(pblk, ERASE); rqd->flags = pblk_set_progr_mode(pblk, PBLK_ERASE);
rqd->bio = NULL; rqd->bio = NULL;
} }
@ -832,19 +840,15 @@ static int pblk_blk_erase_sync(struct pblk *pblk, struct ppa_addr ppa)
{ {
struct nvm_rq rqd; struct nvm_rq rqd;
int ret = 0; int ret = 0;
DECLARE_COMPLETION_ONSTACK(wait);
memset(&rqd, 0, sizeof(struct nvm_rq)); memset(&rqd, 0, sizeof(struct nvm_rq));
pblk_setup_e_rq(pblk, &rqd, ppa); pblk_setup_e_rq(pblk, &rqd, ppa);
rqd.end_io = pblk_end_io_sync;
rqd.private = &wait;
/* The write thread schedules erases so that it minimizes disturbances /* The write thread schedules erases so that it minimizes disturbances
* with writes. Thus, there is no need to take the LUN semaphore. * with writes. Thus, there is no need to take the LUN semaphore.
*/ */
ret = pblk_submit_io(pblk, &rqd); ret = pblk_submit_io_sync(pblk, &rqd);
if (ret) { if (ret) {
struct nvm_tgt_dev *dev = pblk->dev; struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo; struct nvm_geo *geo = &dev->geo;
@ -857,11 +861,6 @@ static int pblk_blk_erase_sync(struct pblk *pblk, struct ppa_addr ppa)
goto out; goto out;
} }
if (!wait_for_completion_io_timeout(&wait,
msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
pr_err("pblk: sync erase timed out\n");
}
out: out:
rqd.private = pblk; rqd.private = pblk;
__pblk_end_io_erase(pblk, &rqd); __pblk_end_io_erase(pblk, &rqd);
@ -976,7 +975,7 @@ static int pblk_line_init_metadata(struct pblk *pblk, struct pblk_line *line,
memcpy(smeta_buf->header.uuid, pblk->instance_uuid, 16); memcpy(smeta_buf->header.uuid, pblk->instance_uuid, 16);
smeta_buf->header.id = cpu_to_le32(line->id); smeta_buf->header.id = cpu_to_le32(line->id);
smeta_buf->header.type = cpu_to_le16(line->type); smeta_buf->header.type = cpu_to_le16(line->type);
smeta_buf->header.version = cpu_to_le16(1); smeta_buf->header.version = SMETA_VERSION;
/* Start metadata */ /* Start metadata */
smeta_buf->seq_nr = cpu_to_le64(line->seq_nr); smeta_buf->seq_nr = cpu_to_le64(line->seq_nr);
@ -1046,7 +1045,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
line->smeta_ssec = off; line->smeta_ssec = off;
line->cur_sec = off + lm->smeta_sec; line->cur_sec = off + lm->smeta_sec;
if (init && pblk_line_submit_smeta_io(pblk, line, off, WRITE)) { if (init && pblk_line_submit_smeta_io(pblk, line, off, PBLK_WRITE)) {
pr_debug("pblk: line smeta I/O failed. Retry\n"); pr_debug("pblk: line smeta I/O failed. Retry\n");
return 1; return 1;
} }
@ -1056,7 +1055,6 @@ static int pblk_line_init_bb(struct pblk *pblk, struct pblk_line *line,
/* Mark emeta metadata sectors as bad sectors. We need to consider bad /* Mark emeta metadata sectors as bad sectors. We need to consider bad
* blocks to make sure that there are enough sectors to store emeta * blocks to make sure that there are enough sectors to store emeta
*/ */
bit = lm->sec_per_line;
off = lm->sec_per_line - lm->emeta_sec[0]; off = lm->sec_per_line - lm->emeta_sec[0];
bitmap_set(line->invalid_bitmap, off, lm->emeta_sec[0]); bitmap_set(line->invalid_bitmap, off, lm->emeta_sec[0]);
while (nr_bb) { while (nr_bb) {
@ -1093,25 +1091,21 @@ static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
struct pblk_line_meta *lm = &pblk->lm; struct pblk_line_meta *lm = &pblk->lm;
int blk_in_line = atomic_read(&line->blk_in_line); int blk_in_line = atomic_read(&line->blk_in_line);
line->map_bitmap = mempool_alloc(pblk->line_meta_pool, GFP_ATOMIC); line->map_bitmap = kzalloc(lm->sec_bitmap_len, GFP_ATOMIC);
if (!line->map_bitmap) if (!line->map_bitmap)
return -ENOMEM; return -ENOMEM;
memset(line->map_bitmap, 0, lm->sec_bitmap_len);
/* invalid_bitmap is special since it is used when line is closed. No /* will be initialized using bb info from map_bitmap */
* need to zeroized; it will be initialized using bb info form line->invalid_bitmap = kmalloc(lm->sec_bitmap_len, GFP_ATOMIC);
* map_bitmap
*/
line->invalid_bitmap = mempool_alloc(pblk->line_meta_pool, GFP_ATOMIC);
if (!line->invalid_bitmap) { if (!line->invalid_bitmap) {
mempool_free(line->map_bitmap, pblk->line_meta_pool); kfree(line->map_bitmap);
return -ENOMEM; return -ENOMEM;
} }
spin_lock(&line->lock); spin_lock(&line->lock);
if (line->state != PBLK_LINESTATE_FREE) { if (line->state != PBLK_LINESTATE_FREE) {
mempool_free(line->invalid_bitmap, pblk->line_meta_pool); kfree(line->map_bitmap);
mempool_free(line->map_bitmap, pblk->line_meta_pool); kfree(line->invalid_bitmap);
spin_unlock(&line->lock); spin_unlock(&line->lock);
WARN(1, "pblk: corrupted line %d, state %d\n", WARN(1, "pblk: corrupted line %d, state %d\n",
line->id, line->state); line->id, line->state);
@ -1163,7 +1157,7 @@ int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line)
void pblk_line_recov_close(struct pblk *pblk, struct pblk_line *line) void pblk_line_recov_close(struct pblk *pblk, struct pblk_line *line)
{ {
mempool_free(line->map_bitmap, pblk->line_meta_pool); kfree(line->map_bitmap);
line->map_bitmap = NULL; line->map_bitmap = NULL;
line->smeta = NULL; line->smeta = NULL;
line->emeta = NULL; line->emeta = NULL;
@ -1328,6 +1322,41 @@ static void pblk_stop_writes(struct pblk *pblk, struct pblk_line *line)
pblk->state = PBLK_STATE_STOPPING; pblk->state = PBLK_STATE_STOPPING;
} }
static void pblk_line_close_meta_sync(struct pblk *pblk)
{
struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_line_meta *lm = &pblk->lm;
struct pblk_line *line, *tline;
LIST_HEAD(list);
spin_lock(&l_mg->close_lock);
if (list_empty(&l_mg->emeta_list)) {
spin_unlock(&l_mg->close_lock);
return;
}
list_cut_position(&list, &l_mg->emeta_list, l_mg->emeta_list.prev);
spin_unlock(&l_mg->close_lock);
list_for_each_entry_safe(line, tline, &list, list) {
struct pblk_emeta *emeta = line->emeta;
while (emeta->mem < lm->emeta_len[0]) {
int ret;
ret = pblk_submit_meta_io(pblk, line);
if (ret) {
pr_err("pblk: sync meta line %d failed (%d)\n",
line->id, ret);
return;
}
}
}
pblk_wait_for_meta(pblk);
flush_workqueue(pblk->close_wq);
}
void pblk_pipeline_stop(struct pblk *pblk) void pblk_pipeline_stop(struct pblk *pblk)
{ {
struct pblk_line_mgmt *l_mg = &pblk->l_mg; struct pblk_line_mgmt *l_mg = &pblk->l_mg;
@ -1361,17 +1390,17 @@ void pblk_pipeline_stop(struct pblk *pblk)
spin_unlock(&l_mg->free_lock); spin_unlock(&l_mg->free_lock);
} }
void pblk_line_replace_data(struct pblk *pblk) struct pblk_line *pblk_line_replace_data(struct pblk *pblk)
{ {
struct pblk_line_mgmt *l_mg = &pblk->l_mg; struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_line *cur, *new; struct pblk_line *cur, *new = NULL;
unsigned int left_seblks; unsigned int left_seblks;
int is_next = 0; int is_next = 0;
cur = l_mg->data_line; cur = l_mg->data_line;
new = l_mg->data_next; new = l_mg->data_next;
if (!new) if (!new)
return; goto out;
l_mg->data_line = new; l_mg->data_line = new;
spin_lock(&l_mg->free_lock); spin_lock(&l_mg->free_lock);
@ -1379,7 +1408,7 @@ void pblk_line_replace_data(struct pblk *pblk)
l_mg->data_line = NULL; l_mg->data_line = NULL;
l_mg->data_next = NULL; l_mg->data_next = NULL;
spin_unlock(&l_mg->free_lock); spin_unlock(&l_mg->free_lock);
return; goto out;
} }
pblk_line_setup_metadata(new, l_mg, &pblk->lm); pblk_line_setup_metadata(new, l_mg, &pblk->lm);
@ -1391,7 +1420,7 @@ retry_erase:
/* If line is not fully erased, erase it */ /* If line is not fully erased, erase it */
if (atomic_read(&new->left_eblks)) { if (atomic_read(&new->left_eblks)) {
if (pblk_line_erase(pblk, new)) if (pblk_line_erase(pblk, new))
return; goto out;
} else { } else {
io_schedule(); io_schedule();
} }
@ -1402,7 +1431,7 @@ retry_setup:
if (!pblk_line_init_metadata(pblk, new, cur)) { if (!pblk_line_init_metadata(pblk, new, cur)) {
new = pblk_line_retry(pblk, new); new = pblk_line_retry(pblk, new);
if (!new) if (!new)
return; goto out;
goto retry_setup; goto retry_setup;
} }
@ -1410,7 +1439,7 @@ retry_setup:
if (!pblk_line_init_bb(pblk, new, 1)) { if (!pblk_line_init_bb(pblk, new, 1)) {
new = pblk_line_retry(pblk, new); new = pblk_line_retry(pblk, new);
if (!new) if (!new)
return; goto out;
goto retry_setup; goto retry_setup;
} }
@ -1434,14 +1463,15 @@ retry_setup:
if (is_next) if (is_next)
pblk_rl_free_lines_dec(&pblk->rl, l_mg->data_next); pblk_rl_free_lines_dec(&pblk->rl, l_mg->data_next);
out:
return new;
} }
void pblk_line_free(struct pblk *pblk, struct pblk_line *line) void pblk_line_free(struct pblk *pblk, struct pblk_line *line)
{ {
if (line->map_bitmap) kfree(line->map_bitmap);
mempool_free(line->map_bitmap, pblk->line_meta_pool); kfree(line->invalid_bitmap);
if (line->invalid_bitmap)
mempool_free(line->invalid_bitmap, pblk->line_meta_pool);
*line->vsc = cpu_to_le32(EMPTY_ENTRY); *line->vsc = cpu_to_le32(EMPTY_ENTRY);
@ -1451,11 +1481,10 @@ void pblk_line_free(struct pblk *pblk, struct pblk_line *line)
line->emeta = NULL; line->emeta = NULL;
} }
void pblk_line_put(struct kref *ref) static void __pblk_line_put(struct pblk *pblk, struct pblk_line *line)
{ {
struct pblk_line *line = container_of(ref, struct pblk_line, ref);
struct pblk *pblk = line->pblk;
struct pblk_line_mgmt *l_mg = &pblk->l_mg; struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_gc *gc = &pblk->gc;
spin_lock(&line->lock); spin_lock(&line->lock);
WARN_ON(line->state != PBLK_LINESTATE_GC); WARN_ON(line->state != PBLK_LINESTATE_GC);
@ -1464,6 +1493,8 @@ void pblk_line_put(struct kref *ref)
pblk_line_free(pblk, line); pblk_line_free(pblk, line);
spin_unlock(&line->lock); spin_unlock(&line->lock);
atomic_dec(&gc->pipeline_gc);
spin_lock(&l_mg->free_lock); spin_lock(&l_mg->free_lock);
list_add_tail(&line->list, &l_mg->free_list); list_add_tail(&line->list, &l_mg->free_list);
l_mg->nr_free_lines++; l_mg->nr_free_lines++;
@ -1472,13 +1503,49 @@ void pblk_line_put(struct kref *ref)
pblk_rl_free_lines_inc(&pblk->rl, line); pblk_rl_free_lines_inc(&pblk->rl, line);
} }
static void pblk_line_put_ws(struct work_struct *work)
{
struct pblk_line_ws *line_put_ws = container_of(work,
struct pblk_line_ws, ws);
struct pblk *pblk = line_put_ws->pblk;
struct pblk_line *line = line_put_ws->line;
__pblk_line_put(pblk, line);
mempool_free(line_put_ws, pblk->gen_ws_pool);
}
void pblk_line_put(struct kref *ref)
{
struct pblk_line *line = container_of(ref, struct pblk_line, ref);
struct pblk *pblk = line->pblk;
__pblk_line_put(pblk, line);
}
void pblk_line_put_wq(struct kref *ref)
{
struct pblk_line *line = container_of(ref, struct pblk_line, ref);
struct pblk *pblk = line->pblk;
struct pblk_line_ws *line_put_ws;
line_put_ws = mempool_alloc(pblk->gen_ws_pool, GFP_ATOMIC);
if (!line_put_ws)
return;
line_put_ws->pblk = pblk;
line_put_ws->line = line;
line_put_ws->priv = NULL;
INIT_WORK(&line_put_ws->ws, pblk_line_put_ws);
queue_work(pblk->r_end_wq, &line_put_ws->ws);
}
int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr ppa) int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr ppa)
{ {
struct nvm_rq *rqd; struct nvm_rq *rqd;
int err; int err;
rqd = mempool_alloc(pblk->g_rq_pool, GFP_KERNEL); rqd = pblk_alloc_rqd(pblk, PBLK_ERASE);
memset(rqd, 0, pblk_g_rq_size);
pblk_setup_e_rq(pblk, rqd, ppa); pblk_setup_e_rq(pblk, rqd, ppa);
@ -1517,41 +1584,6 @@ int pblk_line_is_full(struct pblk_line *line)
return (line->left_msecs == 0); return (line->left_msecs == 0);
} }
void pblk_line_close_meta_sync(struct pblk *pblk)
{
struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_line_meta *lm = &pblk->lm;
struct pblk_line *line, *tline;
LIST_HEAD(list);
spin_lock(&l_mg->close_lock);
if (list_empty(&l_mg->emeta_list)) {
spin_unlock(&l_mg->close_lock);
return;
}
list_cut_position(&list, &l_mg->emeta_list, l_mg->emeta_list.prev);
spin_unlock(&l_mg->close_lock);
list_for_each_entry_safe(line, tline, &list, list) {
struct pblk_emeta *emeta = line->emeta;
while (emeta->mem < lm->emeta_len[0]) {
int ret;
ret = pblk_submit_meta_io(pblk, line);
if (ret) {
pr_err("pblk: sync meta line %d failed (%d)\n",
line->id, ret);
return;
}
}
}
pblk_wait_for_meta(pblk);
flush_workqueue(pblk->close_wq);
}
static void pblk_line_should_sync_meta(struct pblk *pblk) static void pblk_line_should_sync_meta(struct pblk *pblk)
{ {
if (pblk_rl_is_limit(&pblk->rl)) if (pblk_rl_is_limit(&pblk->rl))
@ -1582,15 +1614,13 @@ void pblk_line_close(struct pblk *pblk, struct pblk_line *line)
list_add_tail(&line->list, move_list); list_add_tail(&line->list, move_list);
mempool_free(line->map_bitmap, pblk->line_meta_pool); kfree(line->map_bitmap);
line->map_bitmap = NULL; line->map_bitmap = NULL;
line->smeta = NULL; line->smeta = NULL;
line->emeta = NULL; line->emeta = NULL;
spin_unlock(&line->lock); spin_unlock(&line->lock);
spin_unlock(&l_mg->gc_lock); spin_unlock(&l_mg->gc_lock);
pblk_gc_should_kick(pblk);
} }
void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line) void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line)
@ -1624,43 +1654,16 @@ void pblk_line_close_ws(struct work_struct *work)
struct pblk_line *line = line_ws->line; struct pblk_line *line = line_ws->line;
pblk_line_close(pblk, line); pblk_line_close(pblk, line);
mempool_free(line_ws, pblk->line_ws_pool); mempool_free(line_ws, pblk->gen_ws_pool);
} }
void pblk_line_mark_bb(struct work_struct *work) void pblk_gen_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv,
{ void (*work)(struct work_struct *), gfp_t gfp_mask,
struct pblk_line_ws *line_ws = container_of(work, struct pblk_line_ws,
ws);
struct pblk *pblk = line_ws->pblk;
struct nvm_tgt_dev *dev = pblk->dev;
struct ppa_addr *ppa = line_ws->priv;
int ret;
ret = nvm_set_tgt_bb_tbl(dev, ppa, 1, NVM_BLK_T_GRWN_BAD);
if (ret) {
struct pblk_line *line;
int pos;
line = &pblk->lines[pblk_dev_ppa_to_line(*ppa)];
pos = pblk_dev_ppa_to_pos(&dev->geo, *ppa);
pr_err("pblk: failed to mark bb, line:%d, pos:%d\n",
line->id, pos);
}
kfree(ppa);
mempool_free(line_ws, pblk->line_ws_pool);
}
void pblk_line_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv,
void (*work)(struct work_struct *),
struct workqueue_struct *wq) struct workqueue_struct *wq)
{ {
struct pblk_line_ws *line_ws; struct pblk_line_ws *line_ws;
line_ws = mempool_alloc(pblk->line_ws_pool, GFP_ATOMIC); line_ws = mempool_alloc(pblk->gen_ws_pool, gfp_mask);
if (!line_ws)
return;
line_ws->pblk = pblk; line_ws->pblk = pblk;
line_ws->line = line; line_ws->line = line;
@ -1689,16 +1692,8 @@ static void __pblk_down_page(struct pblk *pblk, struct ppa_addr *ppa_list,
#endif #endif
ret = down_timeout(&rlun->wr_sem, msecs_to_jiffies(30000)); ret = down_timeout(&rlun->wr_sem, msecs_to_jiffies(30000));
if (ret) { if (ret == -ETIME || ret == -EINTR)
switch (ret) { pr_err("pblk: taking lun semaphore timed out: err %d\n", -ret);
case -ETIME:
pr_err("pblk: lun semaphore timed out\n");
break;
case -EINTR:
pr_err("pblk: lun semaphore timed out\n");
break;
}
}
} }
void pblk_down_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas) void pblk_down_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas)
@ -1758,13 +1753,11 @@ void pblk_up_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
rlun = &pblk->luns[bit]; rlun = &pblk->luns[bit];
up(&rlun->wr_sem); up(&rlun->wr_sem);
} }
kfree(lun_bitmap);
} }
void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa) void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
{ {
struct ppa_addr l2p_ppa; struct ppa_addr ppa_l2p;
/* logic error: lba out-of-bounds. Ignore update */ /* logic error: lba out-of-bounds. Ignore update */
if (!(lba < pblk->rl.nr_secs)) { if (!(lba < pblk->rl.nr_secs)) {
@ -1773,10 +1766,10 @@ void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
} }
spin_lock(&pblk->trans_lock); spin_lock(&pblk->trans_lock);
l2p_ppa = pblk_trans_map_get(pblk, lba); ppa_l2p = pblk_trans_map_get(pblk, lba);
if (!pblk_addr_in_cache(l2p_ppa) && !pblk_ppa_empty(l2p_ppa)) if (!pblk_addr_in_cache(ppa_l2p) && !pblk_ppa_empty(ppa_l2p))
pblk_map_invalidate(pblk, l2p_ppa); pblk_map_invalidate(pblk, ppa_l2p);
pblk_trans_map_set(pblk, lba, ppa); pblk_trans_map_set(pblk, lba, ppa);
spin_unlock(&pblk->trans_lock); spin_unlock(&pblk->trans_lock);
@ -1784,6 +1777,7 @@ void pblk_update_map(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
void pblk_update_map_cache(struct pblk *pblk, sector_t lba, struct ppa_addr ppa) void pblk_update_map_cache(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
{ {
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
/* Callers must ensure that the ppa points to a cache address */ /* Callers must ensure that the ppa points to a cache address */
BUG_ON(!pblk_addr_in_cache(ppa)); BUG_ON(!pblk_addr_in_cache(ppa));
@ -1793,16 +1787,16 @@ void pblk_update_map_cache(struct pblk *pblk, sector_t lba, struct ppa_addr ppa)
pblk_update_map(pblk, lba, ppa); pblk_update_map(pblk, lba, ppa);
} }
int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa, int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa_new,
struct pblk_line *gc_line) struct pblk_line *gc_line, u64 paddr_gc)
{ {
struct ppa_addr l2p_ppa; struct ppa_addr ppa_l2p, ppa_gc;
int ret = 1; int ret = 1;
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
/* Callers must ensure that the ppa points to a cache address */ /* Callers must ensure that the ppa points to a cache address */
BUG_ON(!pblk_addr_in_cache(ppa)); BUG_ON(!pblk_addr_in_cache(ppa_new));
BUG_ON(pblk_rb_pos_oob(&pblk->rwb, pblk_addr_to_cacheline(ppa))); BUG_ON(pblk_rb_pos_oob(&pblk->rwb, pblk_addr_to_cacheline(ppa_new)));
#endif #endif
/* logic error: lba out-of-bounds. Ignore update */ /* logic error: lba out-of-bounds. Ignore update */
@ -1812,36 +1806,41 @@ int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
} }
spin_lock(&pblk->trans_lock); spin_lock(&pblk->trans_lock);
l2p_ppa = pblk_trans_map_get(pblk, lba); ppa_l2p = pblk_trans_map_get(pblk, lba);
ppa_gc = addr_to_gen_ppa(pblk, paddr_gc, gc_line->id);
if (!pblk_ppa_comp(ppa_l2p, ppa_gc)) {
spin_lock(&gc_line->lock);
WARN(!test_bit(paddr_gc, gc_line->invalid_bitmap),
"pblk: corrupted GC update");
spin_unlock(&gc_line->lock);
/* Prevent updated entries to be overwritten by GC */
if (pblk_addr_in_cache(l2p_ppa) || pblk_ppa_empty(l2p_ppa) ||
pblk_tgt_ppa_to_line(l2p_ppa) != gc_line->id) {
ret = 0; ret = 0;
goto out; goto out;
} }
pblk_trans_map_set(pblk, lba, ppa); pblk_trans_map_set(pblk, lba, ppa_new);
out: out:
spin_unlock(&pblk->trans_lock); spin_unlock(&pblk->trans_lock);
return ret; return ret;
} }
void pblk_update_map_dev(struct pblk *pblk, sector_t lba, struct ppa_addr ppa, void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
struct ppa_addr entry_line) struct ppa_addr ppa_mapped, struct ppa_addr ppa_cache)
{ {
struct ppa_addr l2p_line; struct ppa_addr ppa_l2p;
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
/* Callers must ensure that the ppa points to a device address */ /* Callers must ensure that the ppa points to a device address */
BUG_ON(pblk_addr_in_cache(ppa)); BUG_ON(pblk_addr_in_cache(ppa_mapped));
#endif #endif
/* Invalidate and discard padded entries */ /* Invalidate and discard padded entries */
if (lba == ADDR_EMPTY) { if (lba == ADDR_EMPTY) {
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
atomic_long_inc(&pblk->padded_wb); atomic_long_inc(&pblk->padded_wb);
#endif #endif
pblk_map_invalidate(pblk, ppa); if (!pblk_ppa_empty(ppa_mapped))
pblk_map_invalidate(pblk, ppa_mapped);
return; return;
} }
@ -1852,22 +1851,22 @@ void pblk_update_map_dev(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
} }
spin_lock(&pblk->trans_lock); spin_lock(&pblk->trans_lock);
l2p_line = pblk_trans_map_get(pblk, lba); ppa_l2p = pblk_trans_map_get(pblk, lba);
/* Do not update L2P if the cacheline has been updated. In this case, /* Do not update L2P if the cacheline has been updated. In this case,
* the mapped ppa must be invalidated * the mapped ppa must be invalidated
*/ */
if (l2p_line.ppa != entry_line.ppa) { if (!pblk_ppa_comp(ppa_l2p, ppa_cache)) {
if (!pblk_ppa_empty(ppa)) if (!pblk_ppa_empty(ppa_mapped))
pblk_map_invalidate(pblk, ppa); pblk_map_invalidate(pblk, ppa_mapped);
goto out; goto out;
} }
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
WARN_ON(!pblk_addr_in_cache(l2p_line) && !pblk_ppa_empty(l2p_line)); WARN_ON(!pblk_addr_in_cache(ppa_l2p) && !pblk_ppa_empty(ppa_l2p));
#endif #endif
pblk_trans_map_set(pblk, lba, ppa); pblk_trans_map_set(pblk, lba, ppa_mapped);
out: out:
spin_unlock(&pblk->trans_lock); spin_unlock(&pblk->trans_lock);
} }
@ -1878,23 +1877,32 @@ void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
int i; int i;
spin_lock(&pblk->trans_lock); spin_lock(&pblk->trans_lock);
for (i = 0; i < nr_secs; i++) for (i = 0; i < nr_secs; i++) {
ppas[i] = pblk_trans_map_get(pblk, blba + i); struct ppa_addr ppa;
ppa = ppas[i] = pblk_trans_map_get(pblk, blba + i);
/* If the L2P entry maps to a line, the reference is valid */
if (!pblk_ppa_empty(ppa) && !pblk_addr_in_cache(ppa)) {
int line_id = pblk_dev_ppa_to_line(ppa);
struct pblk_line *line = &pblk->lines[line_id];
kref_get(&line->ref);
}
}
spin_unlock(&pblk->trans_lock); spin_unlock(&pblk->trans_lock);
} }
void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas, void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
u64 *lba_list, int nr_secs) u64 *lba_list, int nr_secs)
{ {
sector_t lba; u64 lba;
int i; int i;
spin_lock(&pblk->trans_lock); spin_lock(&pblk->trans_lock);
for (i = 0; i < nr_secs; i++) { for (i = 0; i < nr_secs; i++) {
lba = lba_list[i]; lba = lba_list[i];
if (lba == ADDR_EMPTY) { if (lba != ADDR_EMPTY) {
ppas[i].ppa = ADDR_EMPTY;
} else {
/* logic error: lba out-of-bounds. Ignore update */ /* logic error: lba out-of-bounds. Ignore update */
if (!(lba < pblk->rl.nr_secs)) { if (!(lba < pblk->rl.nr_secs)) {
WARN(1, "pblk: corrupted L2P map request\n"); WARN(1, "pblk: corrupted L2P map request\n");


@ -20,7 +20,8 @@
static void pblk_gc_free_gc_rq(struct pblk_gc_rq *gc_rq) static void pblk_gc_free_gc_rq(struct pblk_gc_rq *gc_rq)
{ {
vfree(gc_rq->data); if (gc_rq->data)
vfree(gc_rq->data);
kfree(gc_rq); kfree(gc_rq);
} }
@ -41,10 +42,7 @@ static int pblk_gc_write(struct pblk *pblk)
spin_unlock(&gc->w_lock); spin_unlock(&gc->w_lock);
list_for_each_entry_safe(gc_rq, tgc_rq, &w_list, list) { list_for_each_entry_safe(gc_rq, tgc_rq, &w_list, list) {
pblk_write_gc_to_cache(pblk, gc_rq->data, gc_rq->lba_list, pblk_write_gc_to_cache(pblk, gc_rq);
gc_rq->nr_secs, gc_rq->secs_to_gc,
gc_rq->line, PBLK_IOTYPE_GC);
list_del(&gc_rq->list); list_del(&gc_rq->list);
kref_put(&gc_rq->line->ref, pblk_line_put); kref_put(&gc_rq->line->ref, pblk_line_put);
pblk_gc_free_gc_rq(gc_rq); pblk_gc_free_gc_rq(gc_rq);
@ -58,64 +56,6 @@ static void pblk_gc_writer_kick(struct pblk_gc *gc)
wake_up_process(gc->gc_writer_ts); wake_up_process(gc->gc_writer_ts);
} }
/*
* Responsible for managing all memory related to a gc request. Also in case of
* failure
*/
static int pblk_gc_move_valid_secs(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
{
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
struct pblk_gc *gc = &pblk->gc;
struct pblk_line *line = gc_rq->line;
void *data;
unsigned int secs_to_gc;
int ret = 0;
data = vmalloc(gc_rq->nr_secs * geo->sec_size);
if (!data) {
ret = -ENOMEM;
goto out;
}
/* Read from GC victim block */
if (pblk_submit_read_gc(pblk, gc_rq->lba_list, data, gc_rq->nr_secs,
&secs_to_gc, line)) {
ret = -EFAULT;
goto free_data;
}
if (!secs_to_gc)
goto free_rq;
gc_rq->data = data;
gc_rq->secs_to_gc = secs_to_gc;
retry:
spin_lock(&gc->w_lock);
if (gc->w_entries >= PBLK_GC_W_QD) {
spin_unlock(&gc->w_lock);
pblk_gc_writer_kick(&pblk->gc);
usleep_range(128, 256);
goto retry;
}
gc->w_entries++;
list_add_tail(&gc_rq->list, &gc->w_list);
spin_unlock(&gc->w_lock);
pblk_gc_writer_kick(&pblk->gc);
return 0;
free_rq:
kfree(gc_rq);
free_data:
vfree(data);
out:
kref_put(&line->ref, pblk_line_put);
return ret;
}
static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line) static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
{ {
struct pblk_line_mgmt *l_mg = &pblk->l_mg; struct pblk_line_mgmt *l_mg = &pblk->l_mg;
@ -136,22 +76,57 @@ static void pblk_put_line_back(struct pblk *pblk, struct pblk_line *line)
static void pblk_gc_line_ws(struct work_struct *work) static void pblk_gc_line_ws(struct work_struct *work)
{ {
struct pblk_line_ws *line_rq_ws = container_of(work, struct pblk_line_ws *gc_rq_ws = container_of(work,
struct pblk_line_ws, ws); struct pblk_line_ws, ws);
struct pblk *pblk = line_rq_ws->pblk; struct pblk *pblk = gc_rq_ws->pblk;
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
struct pblk_gc *gc = &pblk->gc; struct pblk_gc *gc = &pblk->gc;
struct pblk_line *line = line_rq_ws->line; struct pblk_line *line = gc_rq_ws->line;
struct pblk_gc_rq *gc_rq = line_rq_ws->priv; struct pblk_gc_rq *gc_rq = gc_rq_ws->priv;
int ret;
up(&gc->gc_sem); up(&gc->gc_sem);
if (pblk_gc_move_valid_secs(pblk, gc_rq)) { gc_rq->data = vmalloc(gc_rq->nr_secs * geo->sec_size);
pr_err("pblk: could not GC all sectors: line:%d (%d/%d)\n", if (!gc_rq->data) {
line->id, *line->vsc, pr_err("pblk: could not GC line:%d (%d/%d)\n",
gc_rq->nr_secs); line->id, *line->vsc, gc_rq->nr_secs);
goto out;
} }
mempool_free(line_rq_ws, pblk->line_ws_pool); /* Read from GC victim block */
ret = pblk_submit_read_gc(pblk, gc_rq);
if (ret) {
pr_err("pblk: failed GC read in line:%d (err:%d)\n",
line->id, ret);
goto out;
}
if (!gc_rq->secs_to_gc)
goto out;
retry:
spin_lock(&gc->w_lock);
if (gc->w_entries >= PBLK_GC_RQ_QD) {
spin_unlock(&gc->w_lock);
pblk_gc_writer_kick(&pblk->gc);
usleep_range(128, 256);
goto retry;
}
gc->w_entries++;
list_add_tail(&gc_rq->list, &gc->w_list);
spin_unlock(&gc->w_lock);
pblk_gc_writer_kick(&pblk->gc);
kfree(gc_rq_ws);
return;
out:
pblk_gc_free_gc_rq(gc_rq);
kref_put(&line->ref, pblk_line_put);
kfree(gc_rq_ws);
} }
static void pblk_gc_line_prepare_ws(struct work_struct *work) static void pblk_gc_line_prepare_ws(struct work_struct *work)
@ -164,17 +139,24 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
struct pblk_line_meta *lm = &pblk->lm; struct pblk_line_meta *lm = &pblk->lm;
struct pblk_gc *gc = &pblk->gc; struct pblk_gc *gc = &pblk->gc;
struct line_emeta *emeta_buf; struct line_emeta *emeta_buf;
struct pblk_line_ws *line_rq_ws; struct pblk_line_ws *gc_rq_ws;
struct pblk_gc_rq *gc_rq; struct pblk_gc_rq *gc_rq;
__le64 *lba_list; __le64 *lba_list;
unsigned long *invalid_bitmap;
int sec_left, nr_secs, bit; int sec_left, nr_secs, bit;
int ret; int ret;
invalid_bitmap = kmalloc(lm->sec_bitmap_len, GFP_KERNEL);
if (!invalid_bitmap) {
pr_err("pblk: could not allocate GC invalid bitmap\n");
goto fail_free_ws;
}
emeta_buf = pblk_malloc(lm->emeta_len[0], l_mg->emeta_alloc_type, emeta_buf = pblk_malloc(lm->emeta_len[0], l_mg->emeta_alloc_type,
GFP_KERNEL); GFP_KERNEL);
if (!emeta_buf) { if (!emeta_buf) {
pr_err("pblk: cannot use GC emeta\n"); pr_err("pblk: cannot use GC emeta\n");
return; goto fail_free_bitmap;
} }
ret = pblk_line_read_emeta(pblk, line, emeta_buf); ret = pblk_line_read_emeta(pblk, line, emeta_buf);
@ -193,7 +175,11 @@ static void pblk_gc_line_prepare_ws(struct work_struct *work)
goto fail_free_emeta; goto fail_free_emeta;
} }
spin_lock(&line->lock);
bitmap_copy(invalid_bitmap, line->invalid_bitmap, lm->sec_per_line);
sec_left = pblk_line_vsc(line); sec_left = pblk_line_vsc(line);
spin_unlock(&line->lock);
if (sec_left < 0) { if (sec_left < 0) {
pr_err("pblk: corrupted GC line (%d)\n", line->id); pr_err("pblk: corrupted GC line (%d)\n", line->id);
goto fail_free_emeta; goto fail_free_emeta;
@ -207,11 +193,12 @@ next_rq:
nr_secs = 0; nr_secs = 0;
do { do {
bit = find_next_zero_bit(line->invalid_bitmap, lm->sec_per_line, bit = find_next_zero_bit(invalid_bitmap, lm->sec_per_line,
bit + 1); bit + 1);
if (bit > line->emeta_ssec) if (bit > line->emeta_ssec)
break; break;
gc_rq->paddr_list[nr_secs] = bit;
gc_rq->lba_list[nr_secs++] = le64_to_cpu(lba_list[bit]); gc_rq->lba_list[nr_secs++] = le64_to_cpu(lba_list[bit]);
} while (nr_secs < pblk->max_write_pgs); } while (nr_secs < pblk->max_write_pgs);
@ -223,19 +210,25 @@ next_rq:
gc_rq->nr_secs = nr_secs; gc_rq->nr_secs = nr_secs;
gc_rq->line = line; gc_rq->line = line;
line_rq_ws = mempool_alloc(pblk->line_ws_pool, GFP_KERNEL); gc_rq_ws = kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL);
if (!line_rq_ws) if (!gc_rq_ws)
goto fail_free_gc_rq; goto fail_free_gc_rq;
line_rq_ws->pblk = pblk; gc_rq_ws->pblk = pblk;
line_rq_ws->line = line; gc_rq_ws->line = line;
line_rq_ws->priv = gc_rq; gc_rq_ws->priv = gc_rq;
/* The write GC path can be much slower than the read GC one due to
* the budget imposed by the rate-limiter. Balance in case that we get
* back pressure from the write GC path.
*/
while (down_timeout(&gc->gc_sem, msecs_to_jiffies(30000)))
io_schedule();
down(&gc->gc_sem);
kref_get(&line->ref); kref_get(&line->ref);
INIT_WORK(&line_rq_ws->ws, pblk_gc_line_ws); INIT_WORK(&gc_rq_ws->ws, pblk_gc_line_ws);
queue_work(gc->gc_line_reader_wq, &line_rq_ws->ws); queue_work(gc->gc_line_reader_wq, &gc_rq_ws->ws);
sec_left -= nr_secs; sec_left -= nr_secs;
if (sec_left > 0) if (sec_left > 0)
@ -243,10 +236,11 @@ next_rq:
out: out:
pblk_mfree(emeta_buf, l_mg->emeta_alloc_type); pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
mempool_free(line_ws, pblk->line_ws_pool); kfree(line_ws);
kfree(invalid_bitmap);
kref_put(&line->ref, pblk_line_put); kref_put(&line->ref, pblk_line_put);
atomic_dec(&gc->inflight_gc); atomic_dec(&gc->read_inflight_gc);
return; return;
@ -254,10 +248,14 @@ fail_free_gc_rq:
kfree(gc_rq); kfree(gc_rq);
fail_free_emeta: fail_free_emeta:
pblk_mfree(emeta_buf, l_mg->emeta_alloc_type); pblk_mfree(emeta_buf, l_mg->emeta_alloc_type);
fail_free_bitmap:
kfree(invalid_bitmap);
fail_free_ws:
kfree(line_ws);
pblk_put_line_back(pblk, line); pblk_put_line_back(pblk, line);
kref_put(&line->ref, pblk_line_put); kref_put(&line->ref, pblk_line_put);
mempool_free(line_ws, pblk->line_ws_pool); atomic_dec(&gc->read_inflight_gc);
atomic_dec(&gc->inflight_gc);
pr_err("pblk: Failed to GC line %d\n", line->id); pr_err("pblk: Failed to GC line %d\n", line->id);
} }
@ -269,19 +267,40 @@ static int pblk_gc_line(struct pblk *pblk, struct pblk_line *line)
pr_debug("pblk: line '%d' being reclaimed for GC\n", line->id); pr_debug("pblk: line '%d' being reclaimed for GC\n", line->id);
line_ws = mempool_alloc(pblk->line_ws_pool, GFP_KERNEL); line_ws = kmalloc(sizeof(struct pblk_line_ws), GFP_KERNEL);
if (!line_ws) if (!line_ws)
return -ENOMEM; return -ENOMEM;
line_ws->pblk = pblk; line_ws->pblk = pblk;
line_ws->line = line; line_ws->line = line;
atomic_inc(&gc->pipeline_gc);
INIT_WORK(&line_ws->ws, pblk_gc_line_prepare_ws); INIT_WORK(&line_ws->ws, pblk_gc_line_prepare_ws);
queue_work(gc->gc_reader_wq, &line_ws->ws); queue_work(gc->gc_reader_wq, &line_ws->ws);
return 0; return 0;
} }
static void pblk_gc_reader_kick(struct pblk_gc *gc)
{
wake_up_process(gc->gc_reader_ts);
}
static void pblk_gc_kick(struct pblk *pblk)
{
struct pblk_gc *gc = &pblk->gc;
pblk_gc_writer_kick(gc);
pblk_gc_reader_kick(gc);
/* If we're shutting down GC, let's not start it up again */
if (gc->gc_enabled) {
wake_up_process(gc->gc_ts);
mod_timer(&gc->gc_timer,
jiffies + msecs_to_jiffies(GC_TIME_MSECS));
}
}
static int pblk_gc_read(struct pblk *pblk) static int pblk_gc_read(struct pblk *pblk)
{ {
struct pblk_gc *gc = &pblk->gc; struct pblk_gc *gc = &pblk->gc;
@ -305,11 +324,6 @@ static int pblk_gc_read(struct pblk *pblk)
return 0; return 0;
} }
static void pblk_gc_reader_kick(struct pblk_gc *gc)
{
wake_up_process(gc->gc_reader_ts);
}
static struct pblk_line *pblk_gc_get_victim_line(struct pblk *pblk, static struct pblk_line *pblk_gc_get_victim_line(struct pblk *pblk,
struct list_head *group_list) struct list_head *group_list)
{ {
@ -338,26 +352,17 @@ static bool pblk_gc_should_run(struct pblk_gc *gc, struct pblk_rl *rl)
return ((gc->gc_active) && (nr_blocks_need > nr_blocks_free)); return ((gc->gc_active) && (nr_blocks_need > nr_blocks_free));
} }
/* void pblk_gc_free_full_lines(struct pblk *pblk)
* Lines with no valid sectors will be returned to the free list immediately. If
* GC is activated - either because the free block count is under the determined
* threshold, or because it is being forced from user space - only lines with a
* high count of invalid sectors will be recycled.
*/
static void pblk_gc_run(struct pblk *pblk)
{ {
struct pblk_line_mgmt *l_mg = &pblk->l_mg; struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_gc *gc = &pblk->gc; struct pblk_gc *gc = &pblk->gc;
struct pblk_line *line; struct pblk_line *line;
struct list_head *group_list;
bool run_gc;
int inflight_gc, gc_group = 0, prev_group = 0;
do { do {
spin_lock(&l_mg->gc_lock); spin_lock(&l_mg->gc_lock);
if (list_empty(&l_mg->gc_full_list)) { if (list_empty(&l_mg->gc_full_list)) {
spin_unlock(&l_mg->gc_lock); spin_unlock(&l_mg->gc_lock);
break; return;
} }
line = list_first_entry(&l_mg->gc_full_list, line = list_first_entry(&l_mg->gc_full_list,
@ -371,11 +376,30 @@ static void pblk_gc_run(struct pblk *pblk)
list_del(&line->list); list_del(&line->list);
spin_unlock(&l_mg->gc_lock); spin_unlock(&l_mg->gc_lock);
atomic_inc(&gc->pipeline_gc);
kref_put(&line->ref, pblk_line_put); kref_put(&line->ref, pblk_line_put);
} while (1); } while (1);
}
/*
* Lines with no valid sectors will be returned to the free list immediately. If
* GC is activated - either because the free block count is under the determined
* threshold, or because it is being forced from user space - only lines with a
* high count of invalid sectors will be recycled.
*/
static void pblk_gc_run(struct pblk *pblk)
{
struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_gc *gc = &pblk->gc;
struct pblk_line *line;
struct list_head *group_list;
bool run_gc;
int read_inflight_gc, gc_group = 0, prev_group = 0;
pblk_gc_free_full_lines(pblk);
run_gc = pblk_gc_should_run(&pblk->gc, &pblk->rl); run_gc = pblk_gc_should_run(&pblk->gc, &pblk->rl);
if (!run_gc || (atomic_read(&gc->inflight_gc) >= PBLK_GC_L_QD)) if (!run_gc || (atomic_read(&gc->read_inflight_gc) >= PBLK_GC_L_QD))
return; return;
next_gc_group: next_gc_group:
@ -402,14 +426,14 @@ next_gc_group:
list_add_tail(&line->list, &gc->r_list); list_add_tail(&line->list, &gc->r_list);
spin_unlock(&gc->r_lock); spin_unlock(&gc->r_lock);
inflight_gc = atomic_inc_return(&gc->inflight_gc); read_inflight_gc = atomic_inc_return(&gc->read_inflight_gc);
pblk_gc_reader_kick(gc); pblk_gc_reader_kick(gc);
prev_group = 1; prev_group = 1;
/* No need to queue up more GC lines than we can handle */ /* No need to queue up more GC lines than we can handle */
run_gc = pblk_gc_should_run(&pblk->gc, &pblk->rl); run_gc = pblk_gc_should_run(&pblk->gc, &pblk->rl);
if (!run_gc || inflight_gc >= PBLK_GC_L_QD) if (!run_gc || read_inflight_gc >= PBLK_GC_L_QD)
break; break;
} while (1); } while (1);
@ -418,16 +442,6 @@ next_gc_group:
goto next_gc_group; goto next_gc_group;
} }
void pblk_gc_kick(struct pblk *pblk)
{
struct pblk_gc *gc = &pblk->gc;
wake_up_process(gc->gc_ts);
pblk_gc_writer_kick(gc);
pblk_gc_reader_kick(gc);
mod_timer(&gc->gc_timer, jiffies + msecs_to_jiffies(GC_TIME_MSECS));
}
static void pblk_gc_timer(unsigned long data) static void pblk_gc_timer(unsigned long data)
{ {
struct pblk *pblk = (struct pblk *)data; struct pblk *pblk = (struct pblk *)data;
@ -465,6 +479,7 @@ static int pblk_gc_writer_ts(void *data)
static int pblk_gc_reader_ts(void *data) static int pblk_gc_reader_ts(void *data)
{ {
struct pblk *pblk = data; struct pblk *pblk = data;
struct pblk_gc *gc = &pblk->gc;
while (!kthread_should_stop()) { while (!kthread_should_stop()) {
if (!pblk_gc_read(pblk)) if (!pblk_gc_read(pblk))
@ -473,6 +488,18 @@ static int pblk_gc_reader_ts(void *data)
io_schedule(); io_schedule();
} }
#ifdef CONFIG_NVM_DEBUG
pr_info("pblk: flushing gc pipeline, %d lines left\n",
atomic_read(&gc->pipeline_gc));
#endif
do {
if (!atomic_read(&gc->pipeline_gc))
break;
schedule();
} while (1);
return 0; return 0;
} }
@ -486,10 +513,10 @@ void pblk_gc_should_start(struct pblk *pblk)
{ {
struct pblk_gc *gc = &pblk->gc; struct pblk_gc *gc = &pblk->gc;
if (gc->gc_enabled && !gc->gc_active) if (gc->gc_enabled && !gc->gc_active) {
pblk_gc_start(pblk); pblk_gc_start(pblk);
pblk_gc_kick(pblk);
pblk_gc_kick(pblk); }
} }
/* /*
@ -510,6 +537,11 @@ void pblk_gc_should_stop(struct pblk *pblk)
pblk_gc_stop(pblk, 0); pblk_gc_stop(pblk, 0);
} }
void pblk_gc_should_kick(struct pblk *pblk)
{
pblk_rl_update_rates(&pblk->rl);
}
void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled, void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled,
int *gc_active) int *gc_active)
{ {
@ -576,7 +608,8 @@ int pblk_gc_init(struct pblk *pblk)
gc->gc_forced = 0; gc->gc_forced = 0;
gc->gc_enabled = 1; gc->gc_enabled = 1;
gc->w_entries = 0; gc->w_entries = 0;
atomic_set(&gc->inflight_gc, 0); atomic_set(&gc->read_inflight_gc, 0);
atomic_set(&gc->pipeline_gc, 0);
/* Workqueue that reads valid sectors from a line and submit them to the /* Workqueue that reads valid sectors from a line and submit them to the
* GC writer to be recycled. * GC writer to be recycled.
@ -602,7 +635,7 @@ int pblk_gc_init(struct pblk *pblk)
spin_lock_init(&gc->w_lock); spin_lock_init(&gc->w_lock);
spin_lock_init(&gc->r_lock); spin_lock_init(&gc->r_lock);
sema_init(&gc->gc_sem, 128); sema_init(&gc->gc_sem, PBLK_GC_RQ_QD);
INIT_LIST_HEAD(&gc->w_list); INIT_LIST_HEAD(&gc->w_list);
INIT_LIST_HEAD(&gc->r_list); INIT_LIST_HEAD(&gc->r_list);
@ -625,24 +658,24 @@ void pblk_gc_exit(struct pblk *pblk)
{ {
struct pblk_gc *gc = &pblk->gc; struct pblk_gc *gc = &pblk->gc;
flush_workqueue(gc->gc_reader_wq); gc->gc_enabled = 0;
flush_workqueue(gc->gc_line_reader_wq); del_timer_sync(&gc->gc_timer);
del_timer(&gc->gc_timer);
pblk_gc_stop(pblk, 1); pblk_gc_stop(pblk, 1);
if (gc->gc_ts) if (gc->gc_ts)
kthread_stop(gc->gc_ts); kthread_stop(gc->gc_ts);
if (gc->gc_reader_ts)
kthread_stop(gc->gc_reader_ts);
flush_workqueue(gc->gc_reader_wq);
if (gc->gc_reader_wq) if (gc->gc_reader_wq)
destroy_workqueue(gc->gc_reader_wq); destroy_workqueue(gc->gc_reader_wq);
flush_workqueue(gc->gc_line_reader_wq);
if (gc->gc_line_reader_wq) if (gc->gc_line_reader_wq)
destroy_workqueue(gc->gc_line_reader_wq); destroy_workqueue(gc->gc_line_reader_wq);
if (gc->gc_writer_ts) if (gc->gc_writer_ts)
kthread_stop(gc->gc_writer_ts); kthread_stop(gc->gc_writer_ts);
if (gc->gc_reader_ts)
kthread_stop(gc->gc_reader_ts);
} }
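
The hunks above change pblk_gc_line_prepare_ws() to copy the line's invalid bitmap while holding line->lock and then scan the private copy instead of the live bitmap. A minimal userspace sketch of that snapshot-then-scan idea follows. The names next_zero_bit() and collect_valid_secs() are hypothetical stand-ins for the kernel's find_next_zero_bit() and the do/while loop in the diff, the locking is only indicated by a comment, and the kernel's emeta_ssec cut-off is left out, so this is an illustration under those assumptions and not the driver's code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Index of the first zero bit at or after 'start', or 'nbits' if there is none. */
static size_t next_zero_bit(const unsigned long *map, size_t nbits, size_t start)
{
	size_t bit;

	for (bit = start; bit < nbits; bit++) {
		if (!(map[bit / BITS_PER_WORD] & (1UL << (bit % BITS_PER_WORD))))
			return bit;
	}
	return nbits;
}

/*
 * Copy 'live' into 'snap' (the kernel does this copy under line->lock),
 * then collect up to 'max' offsets whose bit is zero, i.e. still-valid
 * sectors, from the snapshot. Returns the number of offsets stored.
 */
static size_t collect_valid_secs(const unsigned long *live, unsigned long *snap,
				 size_t nbits, size_t *paddr_list, size_t max)
{
	size_t words = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
	size_t n = 0, bit;

	assert(max > 0);

	/* Snapshot: later changes to 'live' no longer affect the scan. */
	memcpy(snap, live, words * sizeof(unsigned long));

	for (bit = next_zero_bit(snap, nbits, 0);
	     bit < nbits && n < max;
	     bit = next_zero_bit(snap, nbits, bit + 1))
		paddr_list[n++] = bit;

	return n;
}
```

Scanning a copy means the set of sectors picked for a GC request is consistent with the valid-sector count read under the same lock, at the cost of one extra bitmap allocation per line.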

@ -20,8 +20,8 @@
#include "pblk.h" #include "pblk.h"
static struct kmem_cache *pblk_blk_ws_cache, *pblk_rec_cache, *pblk_g_rq_cache, static struct kmem_cache *pblk_ws_cache, *pblk_rec_cache, *pblk_g_rq_cache,
*pblk_w_rq_cache, *pblk_line_meta_cache; *pblk_w_rq_cache;
static DECLARE_RWSEM(pblk_lock); static DECLARE_RWSEM(pblk_lock);
struct bio_set *pblk_bio_set; struct bio_set *pblk_bio_set;
@ -46,7 +46,7 @@ static int pblk_rw_io(struct request_queue *q, struct pblk *pblk,
* user I/Os. Unless stalled, the rate limiter leaves at least 256KB * user I/Os. Unless stalled, the rate limiter leaves at least 256KB
* available for user I/O. * available for user I/O.
*/ */
if (unlikely(pblk_get_secs(bio) >= pblk_rl_sysfs_rate_show(&pblk->rl))) if (pblk_get_secs(bio) > pblk_rl_max_io(&pblk->rl))
blk_queue_split(q, &bio); blk_queue_split(q, &bio);
return pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER); return pblk_write_to_cache(pblk, bio, PBLK_IOTYPE_USER);
@ -76,6 +76,28 @@ static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
return BLK_QC_T_NONE; return BLK_QC_T_NONE;
} }
static size_t pblk_trans_map_size(struct pblk *pblk)
{
int entry_size = 8;
if (pblk->ppaf_bitsize < 32)
entry_size = 4;
return entry_size * pblk->rl.nr_secs;
}
#ifdef CONFIG_NVM_DEBUG
static u32 pblk_l2p_crc(struct pblk *pblk)
{
size_t map_size;
u32 crc = ~(u32)0;
map_size = pblk_trans_map_size(pblk);
crc = crc32_le(crc, pblk->trans_map, map_size);
return crc;
}
#endif
static void pblk_l2p_free(struct pblk *pblk) static void pblk_l2p_free(struct pblk *pblk)
{ {
vfree(pblk->trans_map); vfree(pblk->trans_map);
@ -85,12 +107,10 @@ static int pblk_l2p_init(struct pblk *pblk)
{ {
sector_t i; sector_t i;
struct ppa_addr ppa; struct ppa_addr ppa;
int entry_size = 8; size_t map_size;
if (pblk->ppaf_bitsize < 32) map_size = pblk_trans_map_size(pblk);
entry_size = 4; pblk->trans_map = vmalloc(map_size);
pblk->trans_map = vmalloc(entry_size * pblk->rl.nr_secs);
if (!pblk->trans_map) if (!pblk->trans_map)
return -ENOMEM; return -ENOMEM;
@ -132,7 +152,6 @@ static int pblk_rwb_init(struct pblk *pblk)
} }
/* Minimum pages needed within a lun */ /* Minimum pages needed within a lun */
#define PAGE_POOL_SIZE 16
#define ADDR_POOL_SIZE 64 #define ADDR_POOL_SIZE 64
static int pblk_set_ppaf(struct pblk *pblk) static int pblk_set_ppaf(struct pblk *pblk)
@ -182,12 +201,10 @@ static int pblk_set_ppaf(struct pblk *pblk)
static int pblk_init_global_caches(struct pblk *pblk) static int pblk_init_global_caches(struct pblk *pblk)
{ {
char cache_name[PBLK_CACHE_NAME_LEN];
down_write(&pblk_lock); down_write(&pblk_lock);
pblk_blk_ws_cache = kmem_cache_create("pblk_blk_ws", pblk_ws_cache = kmem_cache_create("pblk_blk_ws",
sizeof(struct pblk_line_ws), 0, 0, NULL); sizeof(struct pblk_line_ws), 0, 0, NULL);
if (!pblk_blk_ws_cache) { if (!pblk_ws_cache) {
up_write(&pblk_lock); up_write(&pblk_lock);
return -ENOMEM; return -ENOMEM;
} }
@ -195,7 +212,7 @@ static int pblk_init_global_caches(struct pblk *pblk)
pblk_rec_cache = kmem_cache_create("pblk_rec", pblk_rec_cache = kmem_cache_create("pblk_rec",
sizeof(struct pblk_rec_ctx), 0, 0, NULL); sizeof(struct pblk_rec_ctx), 0, 0, NULL);
if (!pblk_rec_cache) { if (!pblk_rec_cache) {
kmem_cache_destroy(pblk_blk_ws_cache); kmem_cache_destroy(pblk_ws_cache);
up_write(&pblk_lock); up_write(&pblk_lock);
return -ENOMEM; return -ENOMEM;
} }
@ -203,7 +220,7 @@ static int pblk_init_global_caches(struct pblk *pblk)
pblk_g_rq_cache = kmem_cache_create("pblk_g_rq", pblk_g_rq_size, pblk_g_rq_cache = kmem_cache_create("pblk_g_rq", pblk_g_rq_size,
0, 0, NULL); 0, 0, NULL);
if (!pblk_g_rq_cache) { if (!pblk_g_rq_cache) {
kmem_cache_destroy(pblk_blk_ws_cache); kmem_cache_destroy(pblk_ws_cache);
kmem_cache_destroy(pblk_rec_cache); kmem_cache_destroy(pblk_rec_cache);
up_write(&pblk_lock); up_write(&pblk_lock);
return -ENOMEM; return -ENOMEM;
@ -212,30 +229,25 @@ static int pblk_init_global_caches(struct pblk *pblk)
pblk_w_rq_cache = kmem_cache_create("pblk_w_rq", pblk_w_rq_size, pblk_w_rq_cache = kmem_cache_create("pblk_w_rq", pblk_w_rq_size,
0, 0, NULL); 0, 0, NULL);
if (!pblk_w_rq_cache) { if (!pblk_w_rq_cache) {
kmem_cache_destroy(pblk_blk_ws_cache); kmem_cache_destroy(pblk_ws_cache);
kmem_cache_destroy(pblk_rec_cache); kmem_cache_destroy(pblk_rec_cache);
kmem_cache_destroy(pblk_g_rq_cache); kmem_cache_destroy(pblk_g_rq_cache);
up_write(&pblk_lock); up_write(&pblk_lock);
return -ENOMEM; return -ENOMEM;
} }
snprintf(cache_name, sizeof(cache_name), "pblk_line_m_%s",
pblk->disk->disk_name);
pblk_line_meta_cache = kmem_cache_create(cache_name,
pblk->lm.sec_bitmap_len, 0, 0, NULL);
if (!pblk_line_meta_cache) {
kmem_cache_destroy(pblk_blk_ws_cache);
kmem_cache_destroy(pblk_rec_cache);
kmem_cache_destroy(pblk_g_rq_cache);
kmem_cache_destroy(pblk_w_rq_cache);
up_write(&pblk_lock);
return -ENOMEM;
}
up_write(&pblk_lock); up_write(&pblk_lock);
return 0; return 0;
} }
static void pblk_free_global_caches(struct pblk *pblk)
{
kmem_cache_destroy(pblk_ws_cache);
kmem_cache_destroy(pblk_rec_cache);
kmem_cache_destroy(pblk_g_rq_cache);
kmem_cache_destroy(pblk_w_rq_cache);
}
static int pblk_core_init(struct pblk *pblk) static int pblk_core_init(struct pblk *pblk)
{ {
struct nvm_tgt_dev *dev = pblk->dev; struct nvm_tgt_dev *dev = pblk->dev;
@ -247,70 +259,80 @@ static int pblk_core_init(struct pblk *pblk)
if (pblk_init_global_caches(pblk)) if (pblk_init_global_caches(pblk))
return -ENOMEM; return -ENOMEM;
pblk->page_pool = mempool_create_page_pool(PAGE_POOL_SIZE, 0); /* Internal bios can be at most the sectors signaled by the device. */
if (!pblk->page_pool) pblk->page_bio_pool = mempool_create_page_pool(nvm_max_phys_sects(dev),
return -ENOMEM; 0);
if (!pblk->page_bio_pool)
goto free_global_caches;
pblk->line_ws_pool = mempool_create_slab_pool(PBLK_WS_POOL_SIZE, pblk->gen_ws_pool = mempool_create_slab_pool(PBLK_GEN_WS_POOL_SIZE,
pblk_blk_ws_cache); pblk_ws_cache);
if (!pblk->line_ws_pool) if (!pblk->gen_ws_pool)
goto free_page_pool; goto free_page_bio_pool;
pblk->rec_pool = mempool_create_slab_pool(geo->nr_luns, pblk_rec_cache); pblk->rec_pool = mempool_create_slab_pool(geo->nr_luns, pblk_rec_cache);
if (!pblk->rec_pool) if (!pblk->rec_pool)
goto free_blk_ws_pool; goto free_gen_ws_pool;
pblk->g_rq_pool = mempool_create_slab_pool(PBLK_READ_REQ_POOL_SIZE, pblk->r_rq_pool = mempool_create_slab_pool(geo->nr_luns,
pblk_g_rq_cache); pblk_g_rq_cache);
if (!pblk->g_rq_pool) if (!pblk->r_rq_pool)
goto free_rec_pool; goto free_rec_pool;
pblk->w_rq_pool = mempool_create_slab_pool(geo->nr_luns * 2, pblk->e_rq_pool = mempool_create_slab_pool(geo->nr_luns,
pblk_g_rq_cache);
if (!pblk->e_rq_pool)
goto free_r_rq_pool;
pblk->w_rq_pool = mempool_create_slab_pool(geo->nr_luns,
pblk_w_rq_cache); pblk_w_rq_cache);
if (!pblk->w_rq_pool) if (!pblk->w_rq_pool)
goto free_g_rq_pool; goto free_e_rq_pool;
pblk->line_meta_pool =
mempool_create_slab_pool(PBLK_META_POOL_SIZE,
pblk_line_meta_cache);
if (!pblk->line_meta_pool)
goto free_w_rq_pool;
pblk->close_wq = alloc_workqueue("pblk-close-wq", pblk->close_wq = alloc_workqueue("pblk-close-wq",
WQ_MEM_RECLAIM | WQ_UNBOUND, PBLK_NR_CLOSE_JOBS); WQ_MEM_RECLAIM | WQ_UNBOUND, PBLK_NR_CLOSE_JOBS);
if (!pblk->close_wq) if (!pblk->close_wq)
goto free_line_meta_pool; goto free_w_rq_pool;
pblk->bb_wq = alloc_workqueue("pblk-bb-wq", pblk->bb_wq = alloc_workqueue("pblk-bb-wq",
WQ_MEM_RECLAIM | WQ_UNBOUND, 0); WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
if (!pblk->bb_wq) if (!pblk->bb_wq)
goto free_close_wq; goto free_close_wq;
if (pblk_set_ppaf(pblk)) pblk->r_end_wq = alloc_workqueue("pblk-read-end-wq",
WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
if (!pblk->r_end_wq)
goto free_bb_wq; goto free_bb_wq;
if (pblk_set_ppaf(pblk))
goto free_r_end_wq;
if (pblk_rwb_init(pblk)) if (pblk_rwb_init(pblk))
goto free_bb_wq; goto free_r_end_wq;
INIT_LIST_HEAD(&pblk->compl_list); INIT_LIST_HEAD(&pblk->compl_list);
return 0; return 0;
free_r_end_wq:
destroy_workqueue(pblk->r_end_wq);
free_bb_wq: free_bb_wq:
destroy_workqueue(pblk->bb_wq); destroy_workqueue(pblk->bb_wq);
free_close_wq: free_close_wq:
destroy_workqueue(pblk->close_wq); destroy_workqueue(pblk->close_wq);
free_line_meta_pool:
mempool_destroy(pblk->line_meta_pool);
free_w_rq_pool: free_w_rq_pool:
mempool_destroy(pblk->w_rq_pool); mempool_destroy(pblk->w_rq_pool);
free_g_rq_pool: free_e_rq_pool:
mempool_destroy(pblk->g_rq_pool); mempool_destroy(pblk->e_rq_pool);
free_r_rq_pool:
mempool_destroy(pblk->r_rq_pool);
free_rec_pool: free_rec_pool:
mempool_destroy(pblk->rec_pool); mempool_destroy(pblk->rec_pool);
free_blk_ws_pool: free_gen_ws_pool:
mempool_destroy(pblk->line_ws_pool); mempool_destroy(pblk->gen_ws_pool);
free_page_pool: free_page_bio_pool:
mempool_destroy(pblk->page_pool); mempool_destroy(pblk->page_bio_pool);
free_global_caches:
pblk_free_global_caches(pblk);
return -ENOMEM; return -ENOMEM;
} }
@ -319,21 +341,20 @@ static void pblk_core_free(struct pblk *pblk)
if (pblk->close_wq) if (pblk->close_wq)
destroy_workqueue(pblk->close_wq); destroy_workqueue(pblk->close_wq);
if (pblk->r_end_wq)
destroy_workqueue(pblk->r_end_wq);
if (pblk->bb_wq) if (pblk->bb_wq)
destroy_workqueue(pblk->bb_wq); destroy_workqueue(pblk->bb_wq);
mempool_destroy(pblk->page_pool); mempool_destroy(pblk->page_bio_pool);
mempool_destroy(pblk->line_ws_pool); mempool_destroy(pblk->gen_ws_pool);
mempool_destroy(pblk->rec_pool); mempool_destroy(pblk->rec_pool);
mempool_destroy(pblk->g_rq_pool); mempool_destroy(pblk->r_rq_pool);
mempool_destroy(pblk->e_rq_pool);
mempool_destroy(pblk->w_rq_pool); mempool_destroy(pblk->w_rq_pool);
mempool_destroy(pblk->line_meta_pool);
kmem_cache_destroy(pblk_blk_ws_cache); pblk_free_global_caches(pblk);
kmem_cache_destroy(pblk_rec_cache);
kmem_cache_destroy(pblk_g_rq_cache);
kmem_cache_destroy(pblk_w_rq_cache);
kmem_cache_destroy(pblk_line_meta_cache);
} }
static void pblk_luns_free(struct pblk *pblk) static void pblk_luns_free(struct pblk *pblk)
@ -372,13 +393,11 @@ static void pblk_line_meta_free(struct pblk *pblk)
kfree(l_mg->bb_aux); kfree(l_mg->bb_aux);
kfree(l_mg->vsc_list); kfree(l_mg->vsc_list);
spin_lock(&l_mg->free_lock);
for (i = 0; i < PBLK_DATA_LINES; i++) { for (i = 0; i < PBLK_DATA_LINES; i++) {
kfree(l_mg->sline_meta[i]); kfree(l_mg->sline_meta[i]);
pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type); pblk_mfree(l_mg->eline_meta[i]->buf, l_mg->emeta_alloc_type);
kfree(l_mg->eline_meta[i]); kfree(l_mg->eline_meta[i]);
} }
spin_unlock(&l_mg->free_lock);
kfree(pblk->lines); kfree(pblk->lines);
} }
@ -507,6 +526,13 @@ static int pblk_lines_configure(struct pblk *pblk, int flags)
} }
} }
#ifdef CONFIG_NVM_DEBUG
pr_info("pblk init: L2P CRC: %x\n", pblk_l2p_crc(pblk));
#endif
/* Free full lines directly as GC has not been started yet */
pblk_gc_free_full_lines(pblk);
if (!line) { if (!line) {
/* Configure next line for user data */ /* Configure next line for user data */
line = pblk_line_get_first_data(pblk); line = pblk_line_get_first_data(pblk);
@ -630,7 +656,10 @@ static int pblk_lines_alloc_metadata(struct pblk *pblk)
fail_free_emeta: fail_free_emeta:
while (--i >= 0) { while (--i >= 0) {
vfree(l_mg->eline_meta[i]->buf); if (l_mg->emeta_alloc_type == PBLK_VMALLOC_META)
vfree(l_mg->eline_meta[i]->buf);
else
kfree(l_mg->eline_meta[i]->buf);
kfree(l_mg->eline_meta[i]); kfree(l_mg->eline_meta[i]);
} }
@ -681,8 +710,8 @@ static int pblk_lines_init(struct pblk *pblk)
lm->blk_bitmap_len = BITS_TO_LONGS(geo->nr_luns) * sizeof(long); lm->blk_bitmap_len = BITS_TO_LONGS(geo->nr_luns) * sizeof(long);
lm->sec_bitmap_len = BITS_TO_LONGS(lm->sec_per_line) * sizeof(long); lm->sec_bitmap_len = BITS_TO_LONGS(lm->sec_per_line) * sizeof(long);
lm->lun_bitmap_len = BITS_TO_LONGS(geo->nr_luns) * sizeof(long); lm->lun_bitmap_len = BITS_TO_LONGS(geo->nr_luns) * sizeof(long);
lm->high_thrs = lm->sec_per_line / 2; lm->mid_thrs = lm->sec_per_line / 2;
lm->mid_thrs = lm->sec_per_line / 4; lm->high_thrs = lm->sec_per_line / 4;
lm->meta_distance = (geo->nr_luns / 2) * pblk->min_write_pgs; lm->meta_distance = (geo->nr_luns / 2) * pblk->min_write_pgs;
/* Calculate necessary pages for smeta. See comment over struct /* Calculate necessary pages for smeta. See comment over struct
@ -713,9 +742,13 @@ add_emeta_page:
goto add_emeta_page; goto add_emeta_page;
} }
lm->emeta_bb = geo->nr_luns - i; lm->emeta_bb = geo->nr_luns > i ? geo->nr_luns - i : 0;
lm->min_blk_line = 1 + DIV_ROUND_UP(lm->smeta_sec + lm->emeta_sec[0],
geo->sec_per_blk); lm->min_blk_line = 1;
if (geo->nr_luns > 1)
lm->min_blk_line += DIV_ROUND_UP(lm->smeta_sec +
lm->emeta_sec[0], geo->sec_per_blk);
if (lm->min_blk_line > lm->blk_per_line) { if (lm->min_blk_line > lm->blk_per_line) {
pr_err("pblk: config. not supported. Min. LUN in line:%d\n", pr_err("pblk: config. not supported. Min. LUN in line:%d\n",
lm->blk_per_line); lm->blk_per_line);
@ -890,6 +923,11 @@ static void pblk_exit(void *private)
down_write(&pblk_lock); down_write(&pblk_lock);
pblk_gc_exit(pblk); pblk_gc_exit(pblk);
pblk_tear_down(pblk); pblk_tear_down(pblk);
#ifdef CONFIG_NVM_DEBUG
pr_info("pblk exit: L2P CRC: %x\n", pblk_l2p_crc(pblk));
#endif
pblk_free(pblk); pblk_free(pblk);
up_write(&pblk_lock); up_write(&pblk_lock);
} }
@ -911,7 +949,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
int ret; int ret;
if (dev->identity.dom & NVM_RSP_L2P) { if (dev->identity.dom & NVM_RSP_L2P) {
pr_err("pblk: device-side L2P table not supported. (%x)\n", pr_err("pblk: host-side L2P table not supported. (%x)\n",
dev->identity.dom); dev->identity.dom);
return ERR_PTR(-EINVAL); return ERR_PTR(-EINVAL);
} }
@ -923,6 +961,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
pblk->dev = dev; pblk->dev = dev;
pblk->disk = tdisk; pblk->disk = tdisk;
pblk->state = PBLK_STATE_RUNNING; pblk->state = PBLK_STATE_RUNNING;
pblk->gc.gc_enabled = 0;
spin_lock_init(&pblk->trans_lock); spin_lock_init(&pblk->trans_lock);
spin_lock_init(&pblk->lock); spin_lock_init(&pblk->lock);
@ -944,6 +983,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
atomic_long_set(&pblk->recov_writes, 0); atomic_long_set(&pblk->recov_writes, 0);
atomic_long_set(&pblk->recov_writes, 0); atomic_long_set(&pblk->recov_writes, 0);
atomic_long_set(&pblk->recov_gc_writes, 0); atomic_long_set(&pblk->recov_gc_writes, 0);
atomic_long_set(&pblk->recov_gc_reads, 0);
#endif #endif
atomic_long_set(&pblk->read_failed, 0); atomic_long_set(&pblk->read_failed, 0);
@ -1012,6 +1052,10 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
pblk->rwb.nr_entries); pblk->rwb.nr_entries);
wake_up_process(pblk->writer_ts); wake_up_process(pblk->writer_ts);
/* Check if we need to start GC */
pblk_gc_should_kick(pblk);
return pblk; return pblk;
fail_stop_writer: fail_stop_writer:
@ -1044,6 +1088,7 @@ static struct nvm_tgt_type tt_pblk = {
.sysfs_init = pblk_sysfs_init, .sysfs_init = pblk_sysfs_init,
.sysfs_exit = pblk_sysfs_exit, .sysfs_exit = pblk_sysfs_exit,
.owner = THIS_MODULE,
}; };
static int __init pblk_module_init(void) static int __init pblk_module_init(void)
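
The new pblk_trans_map_size() helper in this file factors out the L2P table sizing so that both pblk_l2p_init() and the debug-only CRC use the same value: 4 bytes per entry when the physical address format fits in fewer than 32 bits, 8 bytes otherwise, multiplied by the number of sectors. A small standalone sketch of that arithmetic, with hypothetical function names and plain integer parameters in place of struct pblk, and with an overflow check that the diff itself does not contain:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per L2P entry: 4 when addresses fit in under 32 bits, else 8. */
static size_t l2p_entry_size(unsigned int ppaf_bitsize)
{
	return ppaf_bitsize < 32 ? 4 : 8;
}

/* Total bytes needed for a table with one entry per sector. */
static size_t l2p_map_size(unsigned int ppaf_bitsize, uint64_t nr_secs)
{
	size_t entry = l2p_entry_size(ppaf_bitsize);

	/* Guard added for this sketch only: refuse sizes that would overflow. */
	assert(nr_secs <= SIZE_MAX / entry);

	return entry * (size_t)nr_secs;
}
```

For example, 1024 sectors need 4096 bytes with a 31-bit address format and 8192 bytes with a 32-bit one.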

@ -25,16 +25,28 @@ static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
unsigned int valid_secs) unsigned int valid_secs)
{ {
struct pblk_line *line = pblk_line_get_data(pblk); struct pblk_line *line = pblk_line_get_data(pblk);
struct pblk_emeta *emeta = line->emeta; struct pblk_emeta *emeta;
struct pblk_w_ctx *w_ctx; struct pblk_w_ctx *w_ctx;
__le64 *lba_list = emeta_to_lbas(pblk, emeta->buf); __le64 *lba_list;
u64 paddr; u64 paddr;
int nr_secs = pblk->min_write_pgs; int nr_secs = pblk->min_write_pgs;
int i; int i;
if (pblk_line_is_full(line)) {
struct pblk_line *prev_line = line;
line = pblk_line_replace_data(pblk);
pblk_line_close_meta(pblk, prev_line);
}
emeta = line->emeta;
lba_list = emeta_to_lbas(pblk, emeta->buf);
paddr = pblk_alloc_page(pblk, line, nr_secs); paddr = pblk_alloc_page(pblk, line, nr_secs);
for (i = 0; i < nr_secs; i++, paddr++) { for (i = 0; i < nr_secs; i++, paddr++) {
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
/* ppa to be sent to the device */ /* ppa to be sent to the device */
ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
@ -51,22 +63,14 @@ static void pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
w_ctx->ppa = ppa_list[i]; w_ctx->ppa = ppa_list[i];
meta_list[i].lba = cpu_to_le64(w_ctx->lba); meta_list[i].lba = cpu_to_le64(w_ctx->lba);
lba_list[paddr] = cpu_to_le64(w_ctx->lba); lba_list[paddr] = cpu_to_le64(w_ctx->lba);
line->nr_valid_lbas++; if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
} else { } else {
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
lba_list[paddr] = meta_list[i].lba = addr_empty; lba_list[paddr] = meta_list[i].lba = addr_empty;
__pblk_map_invalidate(pblk, line, paddr); __pblk_map_invalidate(pblk, line, paddr);
} }
} }
if (pblk_line_is_full(line)) {
struct pblk_line *prev_line = line;
pblk_line_replace_data(pblk);
pblk_line_close_meta(pblk, prev_line);
}
pblk_down_rq(pblk, ppa_list, nr_secs, lun_bitmap); pblk_down_rq(pblk, ppa_list, nr_secs, lun_bitmap);
} }


@ -201,8 +201,7 @@ unsigned int pblk_rb_read_commit(struct pblk_rb *rb, unsigned int nr_entries)
return subm; return subm;
} }
static int __pblk_rb_update_l2p(struct pblk_rb *rb, unsigned int *l2p_upd, static int __pblk_rb_update_l2p(struct pblk_rb *rb, unsigned int to_update)
unsigned int to_update)
{ {
struct pblk *pblk = container_of(rb, struct pblk, rwb); struct pblk *pblk = container_of(rb, struct pblk, rwb);
struct pblk_line *line; struct pblk_line *line;
@ -213,7 +212,7 @@ static int __pblk_rb_update_l2p(struct pblk_rb *rb, unsigned int *l2p_upd,
int flags; int flags;
for (i = 0; i < to_update; i++) { for (i = 0; i < to_update; i++) {
entry = &rb->entries[*l2p_upd]; entry = &rb->entries[rb->l2p_update];
w_ctx = &entry->w_ctx; w_ctx = &entry->w_ctx;
flags = READ_ONCE(entry->w_ctx.flags); flags = READ_ONCE(entry->w_ctx.flags);
@ -230,7 +229,7 @@ static int __pblk_rb_update_l2p(struct pblk_rb *rb, unsigned int *l2p_upd,
line = &pblk->lines[pblk_tgt_ppa_to_line(w_ctx->ppa)]; line = &pblk->lines[pblk_tgt_ppa_to_line(w_ctx->ppa)];
kref_put(&line->ref, pblk_line_put); kref_put(&line->ref, pblk_line_put);
clean_wctx(w_ctx); clean_wctx(w_ctx);
*l2p_upd = (*l2p_upd + 1) & (rb->nr_entries - 1); rb->l2p_update = (rb->l2p_update + 1) & (rb->nr_entries - 1);
} }
pblk_rl_out(&pblk->rl, user_io, gc_io); pblk_rl_out(&pblk->rl, user_io, gc_io);
@ -258,7 +257,7 @@ static int pblk_rb_update_l2p(struct pblk_rb *rb, unsigned int nr_entries,
count = nr_entries - space; count = nr_entries - space;
/* l2p_update used exclusively under rb->w_lock */ /* l2p_update used exclusively under rb->w_lock */
ret = __pblk_rb_update_l2p(rb, &rb->l2p_update, count); ret = __pblk_rb_update_l2p(rb, count);
out: out:
return ret; return ret;
@ -280,7 +279,7 @@ void pblk_rb_sync_l2p(struct pblk_rb *rb)
sync = smp_load_acquire(&rb->sync); sync = smp_load_acquire(&rb->sync);
to_update = pblk_rb_ring_count(sync, rb->l2p_update, rb->nr_entries); to_update = pblk_rb_ring_count(sync, rb->l2p_update, rb->nr_entries);
__pblk_rb_update_l2p(rb, &rb->l2p_update, to_update); __pblk_rb_update_l2p(rb, to_update);
spin_unlock(&rb->w_lock); spin_unlock(&rb->w_lock);
} }
@ -325,8 +324,8 @@ void pblk_rb_write_entry_user(struct pblk_rb *rb, void *data,
} }
void pblk_rb_write_entry_gc(struct pblk_rb *rb, void *data, void pblk_rb_write_entry_gc(struct pblk_rb *rb, void *data,
struct pblk_w_ctx w_ctx, struct pblk_line *gc_line, struct pblk_w_ctx w_ctx, struct pblk_line *line,
unsigned int ring_pos) u64 paddr, unsigned int ring_pos)
{ {
struct pblk *pblk = container_of(rb, struct pblk, rwb); struct pblk *pblk = container_of(rb, struct pblk, rwb);
struct pblk_rb_entry *entry; struct pblk_rb_entry *entry;
@ -341,7 +340,7 @@ void pblk_rb_write_entry_gc(struct pblk_rb *rb, void *data,
__pblk_rb_write_entry(rb, data, w_ctx, entry); __pblk_rb_write_entry(rb, data, w_ctx, entry);
if (!pblk_update_map_gc(pblk, w_ctx.lba, entry->cacheline, gc_line)) if (!pblk_update_map_gc(pblk, w_ctx.lba, entry->cacheline, line, paddr))
entry->w_ctx.lba = ADDR_EMPTY; entry->w_ctx.lba = ADDR_EMPTY;
flags = w_ctx.flags | PBLK_WRITTEN_DATA; flags = w_ctx.flags | PBLK_WRITTEN_DATA;
@ -355,7 +354,6 @@ static int pblk_rb_sync_point_set(struct pblk_rb *rb, struct bio *bio,
{ {
struct pblk_rb_entry *entry; struct pblk_rb_entry *entry;
unsigned int subm, sync_point; unsigned int subm, sync_point;
int flags;
subm = READ_ONCE(rb->subm); subm = READ_ONCE(rb->subm);
@ -369,12 +367,6 @@ static int pblk_rb_sync_point_set(struct pblk_rb *rb, struct bio *bio,
sync_point = (pos == 0) ? (rb->nr_entries - 1) : (pos - 1); sync_point = (pos == 0) ? (rb->nr_entries - 1) : (pos - 1);
entry = &rb->entries[sync_point]; entry = &rb->entries[sync_point];
flags = READ_ONCE(entry->w_ctx.flags);
flags |= PBLK_FLUSH_ENTRY;
/* Release flags on context. Protect from writes */
smp_store_release(&entry->w_ctx.flags, flags);
/* Protect syncs */ /* Protect syncs */
smp_store_release(&rb->sync_point, sync_point); smp_store_release(&rb->sync_point, sync_point);
@ -454,6 +446,7 @@ static int pblk_rb_may_write_flush(struct pblk_rb *rb, unsigned int nr_entries,
/* Protect from read count */ /* Protect from read count */
smp_store_release(&rb->mem, mem); smp_store_release(&rb->mem, mem);
return 1; return 1;
} }
@ -558,12 +551,13 @@ out:
* persist data on the write buffer to the media. * persist data on the write buffer to the media.
*/ */
unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd, unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
struct bio *bio, unsigned int pos, unsigned int pos, unsigned int nr_entries,
unsigned int nr_entries, unsigned int count) unsigned int count)
{ {
struct pblk *pblk = container_of(rb, struct pblk, rwb); struct pblk *pblk = container_of(rb, struct pblk, rwb);
struct request_queue *q = pblk->dev->q; struct request_queue *q = pblk->dev->q;
struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd); struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
struct bio *bio = rqd->bio;
struct pblk_rb_entry *entry; struct pblk_rb_entry *entry;
struct page *page; struct page *page;
unsigned int pad = 0, to_read = nr_entries; unsigned int pad = 0, to_read = nr_entries;


@ -39,21 +39,15 @@ static int pblk_read_from_cache(struct pblk *pblk, struct bio *bio,
} }
static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd, static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
unsigned long *read_bitmap) sector_t blba, unsigned long *read_bitmap)
{ {
struct pblk_sec_meta *meta_list = rqd->meta_list;
struct bio *bio = rqd->bio; struct bio *bio = rqd->bio;
struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS]; struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS];
sector_t blba = pblk_get_lba(bio);
int nr_secs = rqd->nr_ppas; int nr_secs = rqd->nr_ppas;
bool advanced_bio = false; bool advanced_bio = false;
int i, j = 0; int i, j = 0;
/* logic error: lba out-of-bounds. Ignore read request */
if (blba + nr_secs >= pblk->rl.nr_secs) {
WARN(1, "pblk: read lbas out of bounds\n");
return;
}
pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs); pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
for (i = 0; i < nr_secs; i++) { for (i = 0; i < nr_secs; i++) {
@ -63,6 +57,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
retry: retry:
if (pblk_ppa_empty(p)) { if (pblk_ppa_empty(p)) {
WARN_ON(test_and_set_bit(i, read_bitmap)); WARN_ON(test_and_set_bit(i, read_bitmap));
meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
if (unlikely(!advanced_bio)) { if (unlikely(!advanced_bio)) {
bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE); bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
@ -82,6 +77,7 @@ retry:
goto retry; goto retry;
} }
WARN_ON(test_and_set_bit(i, read_bitmap)); WARN_ON(test_and_set_bit(i, read_bitmap));
meta_list[i].lba = cpu_to_le64(lba);
advanced_bio = true; advanced_bio = true;
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
atomic_long_inc(&pblk->cache_reads); atomic_long_inc(&pblk->cache_reads);
@ -117,10 +113,51 @@ static int pblk_submit_read_io(struct pblk *pblk, struct nvm_rq *rqd)
return NVM_IO_OK; return NVM_IO_OK;
} }
static void pblk_end_io_read(struct nvm_rq *rqd) static void pblk_read_check(struct pblk *pblk, struct nvm_rq *rqd,
sector_t blba)
{
struct pblk_sec_meta *meta_list = rqd->meta_list;
int nr_lbas = rqd->nr_ppas;
int i;
for (i = 0; i < nr_lbas; i++) {
u64 lba = le64_to_cpu(meta_list[i].lba);
if (lba == ADDR_EMPTY)
continue;
WARN(lba != blba + i, "pblk: corrupted read LBA\n");
}
}
static void pblk_read_put_rqd_kref(struct pblk *pblk, struct nvm_rq *rqd)
{
struct ppa_addr *ppa_list;
int i;
ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
for (i = 0; i < rqd->nr_ppas; i++) {
struct ppa_addr ppa = ppa_list[i];
struct pblk_line *line;
line = &pblk->lines[pblk_dev_ppa_to_line(ppa)];
kref_put(&line->ref, pblk_line_put_wq);
}
}
static void pblk_end_user_read(struct bio *bio)
{
#ifdef CONFIG_NVM_DEBUG
WARN_ONCE(bio->bi_status, "pblk: corrupted read bio\n");
#endif
bio_endio(bio);
bio_put(bio);
}
static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
bool put_line)
{ {
struct pblk *pblk = rqd->private;
struct nvm_tgt_dev *dev = pblk->dev;
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd); struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
struct bio *bio = rqd->bio; struct bio *bio = rqd->bio;
@ -131,47 +168,51 @@ static void pblk_end_io_read(struct nvm_rq *rqd)
WARN_ONCE(bio->bi_status, "pblk: corrupted read error\n"); WARN_ONCE(bio->bi_status, "pblk: corrupted read error\n");
#endif #endif
nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list); pblk_read_check(pblk, rqd, r_ctx->lba);
bio_put(bio); bio_put(bio);
if (r_ctx->private) { if (r_ctx->private)
struct bio *orig_bio = r_ctx->private; pblk_end_user_read((struct bio *)r_ctx->private);
#ifdef CONFIG_NVM_DEBUG if (put_line)
WARN_ONCE(orig_bio->bi_status, "pblk: corrupted read bio\n"); pblk_read_put_rqd_kref(pblk, rqd);
#endif
bio_endio(orig_bio);
bio_put(orig_bio);
}
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
atomic_long_add(rqd->nr_ppas, &pblk->sync_reads); atomic_long_add(rqd->nr_ppas, &pblk->sync_reads);
atomic_long_sub(rqd->nr_ppas, &pblk->inflight_reads); atomic_long_sub(rqd->nr_ppas, &pblk->inflight_reads);
#endif #endif
pblk_free_rqd(pblk, rqd, READ); pblk_free_rqd(pblk, rqd, PBLK_READ);
atomic_dec(&pblk->inflight_io); atomic_dec(&pblk->inflight_io);
} }
static void pblk_end_io_read(struct nvm_rq *rqd)
{
struct pblk *pblk = rqd->private;
__pblk_end_io_read(pblk, rqd, true);
}
static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd, static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
unsigned int bio_init_idx, unsigned int bio_init_idx,
unsigned long *read_bitmap) unsigned long *read_bitmap)
{ {
struct bio *new_bio, *bio = rqd->bio; struct bio *new_bio, *bio = rqd->bio;
struct pblk_sec_meta *meta_list = rqd->meta_list;
struct bio_vec src_bv, dst_bv; struct bio_vec src_bv, dst_bv;
void *ppa_ptr = NULL; void *ppa_ptr = NULL;
void *src_p, *dst_p; void *src_p, *dst_p;
dma_addr_t dma_ppa_list = 0; dma_addr_t dma_ppa_list = 0;
__le64 *lba_list_mem, *lba_list_media;
int nr_secs = rqd->nr_ppas; int nr_secs = rqd->nr_ppas;
int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs); int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
int i, ret, hole; int i, ret, hole;
DECLARE_COMPLETION_ONSTACK(wait);
/* Re-use allocated memory for intermediate lbas */
lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
new_bio = bio_alloc(GFP_KERNEL, nr_holes); new_bio = bio_alloc(GFP_KERNEL, nr_holes);
if (!new_bio) {
pr_err("pblk: could not alloc read bio\n");
return NVM_IO_ERR;
}
if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes)) if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
goto err; goto err;
@ -181,34 +222,29 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
goto err; goto err;
} }
for (i = 0; i < nr_secs; i++)
lba_list_mem[i] = meta_list[i].lba;
new_bio->bi_iter.bi_sector = 0; /* internal bio */ new_bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(new_bio, REQ_OP_READ, 0); bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
new_bio->bi_private = &wait;
new_bio->bi_end_io = pblk_end_bio_sync;
rqd->bio = new_bio; rqd->bio = new_bio;
rqd->nr_ppas = nr_holes; rqd->nr_ppas = nr_holes;
rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM); rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM);
rqd->end_io = NULL;
if (unlikely(nr_secs > 1 && nr_holes == 1)) { if (unlikely(nr_holes == 1)) {
ppa_ptr = rqd->ppa_list; ppa_ptr = rqd->ppa_list;
dma_ppa_list = rqd->dma_ppa_list; dma_ppa_list = rqd->dma_ppa_list;
rqd->ppa_addr = rqd->ppa_list[0]; rqd->ppa_addr = rqd->ppa_list[0];
} }
ret = pblk_submit_read_io(pblk, rqd); ret = pblk_submit_io_sync(pblk, rqd);
if (ret) { if (ret) {
bio_put(rqd->bio); bio_put(rqd->bio);
pr_err("pblk: read IO submission failed\n"); pr_err("pblk: sync read IO submission failed\n");
goto err; goto err;
} }
if (!wait_for_completion_io_timeout(&wait,
msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
pr_err("pblk: partial read I/O timed out\n");
}
if (rqd->error) { if (rqd->error) {
atomic_long_inc(&pblk->read_failed); atomic_long_inc(&pblk->read_failed);
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
@ -216,15 +252,31 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
#endif #endif
} }
if (unlikely(nr_secs > 1 && nr_holes == 1)) { if (unlikely(nr_holes == 1)) {
struct ppa_addr ppa;
ppa = rqd->ppa_addr;
rqd->ppa_list = ppa_ptr; rqd->ppa_list = ppa_ptr;
rqd->dma_ppa_list = dma_ppa_list; rqd->dma_ppa_list = dma_ppa_list;
rqd->ppa_list[0] = ppa;
}
for (i = 0; i < nr_secs; i++) {
lba_list_media[i] = meta_list[i].lba;
meta_list[i].lba = lba_list_mem[i];
} }
/* Fill the holes in the original bio */ /* Fill the holes in the original bio */
i = 0; i = 0;
hole = find_first_zero_bit(read_bitmap, nr_secs); hole = find_first_zero_bit(read_bitmap, nr_secs);
do { do {
int line_id = pblk_dev_ppa_to_line(rqd->ppa_list[i]);
struct pblk_line *line = &pblk->lines[line_id];
kref_put(&line->ref, pblk_line_put);
meta_list[hole].lba = lba_list_media[i];
src_bv = new_bio->bi_io_vec[i++]; src_bv = new_bio->bi_io_vec[i++];
dst_bv = bio->bi_io_vec[bio_init_idx + hole]; dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@ -238,7 +290,7 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
kunmap_atomic(src_p); kunmap_atomic(src_p);
kunmap_atomic(dst_p); kunmap_atomic(dst_p);
mempool_free(src_bv.bv_page, pblk->page_pool); mempool_free(src_bv.bv_page, pblk->page_bio_pool);
hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1); hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
} while (hole < nr_secs); } while (hole < nr_secs);
@ -246,34 +298,26 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
bio_put(new_bio); bio_put(new_bio);
/* Complete the original bio and associated request */ /* Complete the original bio and associated request */
bio_endio(bio);
rqd->bio = bio; rqd->bio = bio;
rqd->nr_ppas = nr_secs; rqd->nr_ppas = nr_secs;
rqd->private = pblk;
bio_endio(bio); __pblk_end_io_read(pblk, rqd, false);
pblk_end_io_read(rqd);
return NVM_IO_OK; return NVM_IO_OK;
err: err:
/* Free allocated pages in new bio */ /* Free allocated pages in new bio */
pblk_bio_free_pages(pblk, bio, 0, new_bio->bi_vcnt); pblk_bio_free_pages(pblk, bio, 0, new_bio->bi_vcnt);
rqd->private = pblk; __pblk_end_io_read(pblk, rqd, false);
pblk_end_io_read(rqd);
return NVM_IO_ERR; return NVM_IO_ERR;
} }
static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd,
unsigned long *read_bitmap) sector_t lba, unsigned long *read_bitmap)
{ {
struct pblk_sec_meta *meta_list = rqd->meta_list;
struct bio *bio = rqd->bio; struct bio *bio = rqd->bio;
struct ppa_addr ppa; struct ppa_addr ppa;
sector_t lba = pblk_get_lba(bio);
/* logic error: lba out-of-bounds. Ignore read request */
if (lba >= pblk->rl.nr_secs) {
WARN(1, "pblk: read lba out of bounds\n");
return;
}
pblk_lookup_l2p_seq(pblk, &ppa, lba, 1); pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
@ -284,6 +328,7 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd,
retry: retry:
if (pblk_ppa_empty(ppa)) { if (pblk_ppa_empty(ppa)) {
WARN_ON(test_and_set_bit(0, read_bitmap)); WARN_ON(test_and_set_bit(0, read_bitmap));
meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
return; return;
} }
@ -295,9 +340,12 @@ retry:
pblk_lookup_l2p_seq(pblk, &ppa, lba, 1); pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
goto retry; goto retry;
} }
WARN_ON(test_and_set_bit(0, read_bitmap)); WARN_ON(test_and_set_bit(0, read_bitmap));
meta_list[0].lba = cpu_to_le64(lba);
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
atomic_long_inc(&pblk->cache_reads); atomic_long_inc(&pblk->cache_reads);
#endif #endif
} else { } else {
rqd->ppa_addr = ppa; rqd->ppa_addr = ppa;
@ -309,22 +357,24 @@ retry:
int pblk_submit_read(struct pblk *pblk, struct bio *bio) int pblk_submit_read(struct pblk *pblk, struct bio *bio)
{ {
struct nvm_tgt_dev *dev = pblk->dev; struct nvm_tgt_dev *dev = pblk->dev;
sector_t blba = pblk_get_lba(bio);
unsigned int nr_secs = pblk_get_secs(bio); unsigned int nr_secs = pblk_get_secs(bio);
struct pblk_g_ctx *r_ctx;
struct nvm_rq *rqd; struct nvm_rq *rqd;
unsigned long read_bitmap; /* Max 64 ppas per request */
unsigned int bio_init_idx; unsigned int bio_init_idx;
unsigned long read_bitmap; /* Max 64 ppas per request */
int ret = NVM_IO_ERR; int ret = NVM_IO_ERR;
if (nr_secs > PBLK_MAX_REQ_ADDRS) /* logic error: lba out-of-bounds. Ignore read request */
if (blba >= pblk->rl.nr_secs || nr_secs > PBLK_MAX_REQ_ADDRS) {
WARN(1, "pblk: read lba out of bounds (lba:%llu, nr:%d)\n",
(unsigned long long)blba, nr_secs);
return NVM_IO_ERR; return NVM_IO_ERR;
}
bitmap_zero(&read_bitmap, nr_secs); bitmap_zero(&read_bitmap, nr_secs);
rqd = pblk_alloc_rqd(pblk, READ); rqd = pblk_alloc_rqd(pblk, PBLK_READ);
if (IS_ERR(rqd)) {
pr_err_ratelimited("pblk: not able to alloc rqd");
return NVM_IO_ERR;
}
rqd->opcode = NVM_OP_PREAD; rqd->opcode = NVM_OP_PREAD;
rqd->bio = bio; rqd->bio = bio;
@ -332,6 +382,9 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
rqd->private = pblk; rqd->private = pblk;
rqd->end_io = pblk_end_io_read; rqd->end_io = pblk_end_io_read;
r_ctx = nvm_rq_to_pdu(rqd);
r_ctx->lba = blba;
/* Save the index for this bio's start. This is needed in case /* Save the index for this bio's start. This is needed in case
* we need to fill a partial read. * we need to fill a partial read.
*/ */
@ -348,23 +401,22 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size; rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size; rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
pblk_read_ppalist_rq(pblk, rqd, &read_bitmap); pblk_read_ppalist_rq(pblk, rqd, blba, &read_bitmap);
} else { } else {
pblk_read_rq(pblk, rqd, &read_bitmap); pblk_read_rq(pblk, rqd, blba, &read_bitmap);
} }
bio_get(bio); bio_get(bio);
if (bitmap_full(&read_bitmap, nr_secs)) { if (bitmap_full(&read_bitmap, nr_secs)) {
bio_endio(bio); bio_endio(bio);
atomic_inc(&pblk->inflight_io); atomic_inc(&pblk->inflight_io);
pblk_end_io_read(rqd); __pblk_end_io_read(pblk, rqd, false);
return NVM_IO_OK; return NVM_IO_OK;
} }
/* All sectors are to be read from the device */ /* All sectors are to be read from the device */
if (bitmap_empty(&read_bitmap, rqd->nr_ppas)) { if (bitmap_empty(&read_bitmap, rqd->nr_ppas)) {
struct bio *int_bio = NULL; struct bio *int_bio = NULL;
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
/* Clone read bio to deal with read errors internally */ /* Clone read bio to deal with read errors internally */
int_bio = bio_clone_fast(bio, GFP_KERNEL, pblk_bio_set); int_bio = bio_clone_fast(bio, GFP_KERNEL, pblk_bio_set);
@ -399,40 +451,46 @@ int pblk_submit_read(struct pblk *pblk, struct bio *bio)
return NVM_IO_OK; return NVM_IO_OK;
fail_rqd_free: fail_rqd_free:
pblk_free_rqd(pblk, rqd, READ); pblk_free_rqd(pblk, rqd, PBLK_READ);
return ret; return ret;
} }
static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd, static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
struct pblk_line *line, u64 *lba_list, struct pblk_line *line, u64 *lba_list,
unsigned int nr_secs) u64 *paddr_list_gc, unsigned int nr_secs)
{ {
struct ppa_addr ppas[PBLK_MAX_REQ_ADDRS]; struct ppa_addr ppa_list_l2p[PBLK_MAX_REQ_ADDRS];
struct ppa_addr ppa_gc;
int valid_secs = 0; int valid_secs = 0;
int i; int i;
pblk_lookup_l2p_rand(pblk, ppas, lba_list, nr_secs); pblk_lookup_l2p_rand(pblk, ppa_list_l2p, lba_list, nr_secs);
for (i = 0; i < nr_secs; i++) { for (i = 0; i < nr_secs; i++) {
if (pblk_addr_in_cache(ppas[i]) || ppas[i].g.blk != line->id || if (lba_list[i] == ADDR_EMPTY)
pblk_ppa_empty(ppas[i])) { continue;
lba_list[i] = ADDR_EMPTY;
ppa_gc = addr_to_gen_ppa(pblk, paddr_list_gc[i], line->id);
if (!pblk_ppa_comp(ppa_list_l2p[i], ppa_gc)) {
paddr_list_gc[i] = lba_list[i] = ADDR_EMPTY;
continue; continue;
} }
rqd->ppa_list[valid_secs++] = ppas[i]; rqd->ppa_list[valid_secs++] = ppa_list_l2p[i];
} }
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
atomic_long_add(valid_secs, &pblk->inflight_reads); atomic_long_add(valid_secs, &pblk->inflight_reads);
#endif #endif
return valid_secs; return valid_secs;
} }
static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd, static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
struct pblk_line *line, sector_t lba) struct pblk_line *line, sector_t lba,
u64 paddr_gc)
{ {
struct ppa_addr ppa; struct ppa_addr ppa_l2p, ppa_gc;
int valid_secs = 0; int valid_secs = 0;
if (lba == ADDR_EMPTY) if (lba == ADDR_EMPTY)
@ -445,15 +503,14 @@ static int read_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
} }
spin_lock(&pblk->trans_lock); spin_lock(&pblk->trans_lock);
ppa = pblk_trans_map_get(pblk, lba); ppa_l2p = pblk_trans_map_get(pblk, lba);
spin_unlock(&pblk->trans_lock); spin_unlock(&pblk->trans_lock);
/* Ignore updated values until the moment */ ppa_gc = addr_to_gen_ppa(pblk, paddr_gc, line->id);
if (pblk_addr_in_cache(ppa) || ppa.g.blk != line->id || if (!pblk_ppa_comp(ppa_l2p, ppa_gc))
pblk_ppa_empty(ppa))
goto out; goto out;
rqd->ppa_addr = ppa; rqd->ppa_addr = ppa_l2p;
valid_secs = 1; valid_secs = 1;
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
@ -464,42 +521,44 @@ out:
return valid_secs; return valid_secs;
} }
int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data, int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq)
unsigned int nr_secs, unsigned int *secs_to_gc,
struct pblk_line *line)
{ {
struct nvm_tgt_dev *dev = pblk->dev; struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo; struct nvm_geo *geo = &dev->geo;
struct bio *bio; struct bio *bio;
struct nvm_rq rqd; struct nvm_rq rqd;
int ret, data_len; int data_len;
DECLARE_COMPLETION_ONSTACK(wait); int ret = NVM_IO_OK;
memset(&rqd, 0, sizeof(struct nvm_rq)); memset(&rqd, 0, sizeof(struct nvm_rq));
rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, rqd.meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
&rqd.dma_meta_list); &rqd.dma_meta_list);
if (!rqd.meta_list) if (!rqd.meta_list)
return NVM_IO_ERR; return -ENOMEM;
if (nr_secs > 1) { if (gc_rq->nr_secs > 1) {
rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size; rqd.ppa_list = rqd.meta_list + pblk_dma_meta_size;
rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size; rqd.dma_ppa_list = rqd.dma_meta_list + pblk_dma_meta_size;
*secs_to_gc = read_ppalist_rq_gc(pblk, &rqd, line, lba_list, gc_rq->secs_to_gc = read_ppalist_rq_gc(pblk, &rqd, gc_rq->line,
nr_secs); gc_rq->lba_list,
if (*secs_to_gc == 1) gc_rq->paddr_list,
gc_rq->nr_secs);
if (gc_rq->secs_to_gc == 1)
rqd.ppa_addr = rqd.ppa_list[0]; rqd.ppa_addr = rqd.ppa_list[0];
} else { } else {
*secs_to_gc = read_rq_gc(pblk, &rqd, line, lba_list[0]); gc_rq->secs_to_gc = read_rq_gc(pblk, &rqd, gc_rq->line,
gc_rq->lba_list[0],
gc_rq->paddr_list[0]);
} }
if (!(*secs_to_gc)) if (!(gc_rq->secs_to_gc))
goto out; goto out;
data_len = (*secs_to_gc) * geo->sec_size; data_len = (gc_rq->secs_to_gc) * geo->sec_size;
bio = pblk_bio_map_addr(pblk, data, *secs_to_gc, data_len, bio = pblk_bio_map_addr(pblk, gc_rq->data, gc_rq->secs_to_gc, data_len,
PBLK_KMALLOC_META, GFP_KERNEL); PBLK_VMALLOC_META, GFP_KERNEL);
if (IS_ERR(bio)) { if (IS_ERR(bio)) {
pr_err("pblk: could not allocate GC bio (%lu)\n", PTR_ERR(bio)); pr_err("pblk: could not allocate GC bio (%lu)\n", PTR_ERR(bio));
goto err_free_dma; goto err_free_dma;
@ -509,23 +568,16 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
bio_set_op_attrs(bio, REQ_OP_READ, 0); bio_set_op_attrs(bio, REQ_OP_READ, 0);
rqd.opcode = NVM_OP_PREAD; rqd.opcode = NVM_OP_PREAD;
rqd.end_io = pblk_end_io_sync; rqd.nr_ppas = gc_rq->secs_to_gc;
rqd.private = &wait;
rqd.nr_ppas = *secs_to_gc;
rqd.flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM); rqd.flags = pblk_set_read_mode(pblk, PBLK_READ_RANDOM);
rqd.bio = bio; rqd.bio = bio;
ret = pblk_submit_read_io(pblk, &rqd); if (pblk_submit_io_sync(pblk, &rqd)) {
if (ret) { ret = -EIO;
bio_endio(bio);
pr_err("pblk: GC read request failed\n"); pr_err("pblk: GC read request failed\n");
goto err_free_dma; goto err_free_bio;
} }
if (!wait_for_completion_io_timeout(&wait,
msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
pr_err("pblk: GC read I/O timed out\n");
}
atomic_dec(&pblk->inflight_io); atomic_dec(&pblk->inflight_io);
if (rqd.error) { if (rqd.error) {
@ -536,16 +588,18 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data,
} }
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
atomic_long_add(*secs_to_gc, &pblk->sync_reads); atomic_long_add(gc_rq->secs_to_gc, &pblk->sync_reads);
atomic_long_add(*secs_to_gc, &pblk->recov_gc_reads); atomic_long_add(gc_rq->secs_to_gc, &pblk->recov_gc_reads);
atomic_long_sub(*secs_to_gc, &pblk->inflight_reads); atomic_long_sub(gc_rq->secs_to_gc, &pblk->inflight_reads);
#endif #endif
out: out:
nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list); nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
return NVM_IO_OK; return ret;
err_free_bio:
bio_put(bio);
err_free_dma: err_free_dma:
nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list); nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
return NVM_IO_ERR; return ret;
} }


@ -34,10 +34,6 @@ void pblk_submit_rec(struct work_struct *work)
max_secs); max_secs);
bio = bio_alloc(GFP_KERNEL, nr_rec_secs); bio = bio_alloc(GFP_KERNEL, nr_rec_secs);
if (!bio) {
pr_err("pblk: not able to create recovery bio\n");
return;
}
bio->bi_iter.bi_sector = 0; bio->bi_iter.bi_sector = 0;
bio_set_op_attrs(bio, REQ_OP_WRITE, 0); bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
@ -71,7 +67,7 @@ void pblk_submit_rec(struct work_struct *work)
err: err:
bio_put(bio); bio_put(bio);
pblk_free_rqd(pblk, rqd, WRITE); pblk_free_rqd(pblk, rqd, PBLK_WRITE);
} }
int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx, int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
@ -84,12 +80,7 @@ int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
struct pblk_c_ctx *rec_ctx; struct pblk_c_ctx *rec_ctx;
int nr_entries = c_ctx->nr_valid + c_ctx->nr_padded; int nr_entries = c_ctx->nr_valid + c_ctx->nr_padded;
rec_rqd = pblk_alloc_rqd(pblk, WRITE); rec_rqd = pblk_alloc_rqd(pblk, PBLK_WRITE);
if (IS_ERR(rec_rqd)) {
pr_err("pblk: could not create recovery req.\n");
return -ENOMEM;
}
rec_ctx = nvm_rq_to_pdu(rec_rqd); rec_ctx = nvm_rq_to_pdu(rec_rqd);
/* Copy completion bitmap, but exclude the first X completed entries */ /* Copy completion bitmap, but exclude the first X completed entries */
@ -142,19 +133,19 @@ static int pblk_recov_l2p_from_emeta(struct pblk *pblk, struct pblk_line *line)
struct pblk_emeta *emeta = line->emeta; struct pblk_emeta *emeta = line->emeta;
struct line_emeta *emeta_buf = emeta->buf; struct line_emeta *emeta_buf = emeta->buf;
__le64 *lba_list; __le64 *lba_list;
int data_start; u64 data_start, data_end;
int nr_data_lbas, nr_valid_lbas, nr_lbas = 0; u64 nr_valid_lbas, nr_lbas = 0;
int i; u64 i;
lba_list = pblk_recov_get_lba_list(pblk, emeta_buf); lba_list = pblk_recov_get_lba_list(pblk, emeta_buf);
if (!lba_list) if (!lba_list)
return 1; return 1;
data_start = pblk_line_smeta_start(pblk, line) + lm->smeta_sec; data_start = pblk_line_smeta_start(pblk, line) + lm->smeta_sec;
nr_data_lbas = lm->sec_per_line - lm->emeta_sec[0]; data_end = line->emeta_ssec;
nr_valid_lbas = le64_to_cpu(emeta_buf->nr_valid_lbas); nr_valid_lbas = le64_to_cpu(emeta_buf->nr_valid_lbas);
for (i = data_start; i < nr_data_lbas && nr_lbas < nr_valid_lbas; i++) { for (i = data_start; i < data_end; i++) {
struct ppa_addr ppa; struct ppa_addr ppa;
int pos; int pos;
@ -181,8 +172,8 @@ static int pblk_recov_l2p_from_emeta(struct pblk *pblk, struct pblk_line *line)
} }
if (nr_valid_lbas != nr_lbas) if (nr_valid_lbas != nr_lbas)
pr_err("pblk: line %d - inconsistent lba list(%llu/%d)\n", pr_err("pblk: line %d - inconsistent lba list(%llu/%llu)\n",
line->id, emeta_buf->nr_valid_lbas, nr_lbas); line->id, nr_valid_lbas, nr_lbas);
line->left_msecs = 0; line->left_msecs = 0;
@ -225,7 +216,6 @@ static int pblk_recov_read_oob(struct pblk *pblk, struct pblk_line *line,
int rq_ppas, rq_len; int rq_ppas, rq_len;
int i, j; int i, j;
int ret = 0; int ret = 0;
DECLARE_COMPLETION_ONSTACK(wait);
ppa_list = p.ppa_list; ppa_list = p.ppa_list;
meta_list = p.meta_list; meta_list = p.meta_list;
@ -262,8 +252,6 @@ next_read_rq:
rqd->ppa_list = ppa_list; rqd->ppa_list = ppa_list;
rqd->dma_ppa_list = dma_ppa_list; rqd->dma_ppa_list = dma_ppa_list;
rqd->dma_meta_list = dma_meta_list; rqd->dma_meta_list = dma_meta_list;
rqd->end_io = pblk_end_io_sync;
rqd->private = &wait;
if (pblk_io_aligned(pblk, rq_ppas)) if (pblk_io_aligned(pblk, rq_ppas))
rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_SEQUENTIAL); rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_SEQUENTIAL);
@ -289,19 +277,13 @@ next_read_rq:
} }
/* If read fails, more padding is needed */ /* If read fails, more padding is needed */
ret = pblk_submit_io(pblk, rqd); ret = pblk_submit_io_sync(pblk, rqd);
if (ret) { if (ret) {
pr_err("pblk: I/O submission failed: %d\n", ret); pr_err("pblk: I/O submission failed: %d\n", ret);
return ret; return ret;
} }
if (!wait_for_completion_io_timeout(&wait,
msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
pr_err("pblk: L2P recovery read timed out\n");
return -EINTR;
}
atomic_dec(&pblk->inflight_io); atomic_dec(&pblk->inflight_io);
reinit_completion(&wait);
/* At this point, the read should not fail. If it does, it is a problem /* At this point, the read should not fail. If it does, it is a problem
* we cannot recover from here. Need FTL log. * we cannot recover from here. Need FTL log.
@ -338,13 +320,10 @@ static void pblk_end_io_recov(struct nvm_rq *rqd)
{ {
struct pblk_pad_rq *pad_rq = rqd->private; struct pblk_pad_rq *pad_rq = rqd->private;
struct pblk *pblk = pad_rq->pblk; struct pblk *pblk = pad_rq->pblk;
struct nvm_tgt_dev *dev = pblk->dev;
pblk_up_page(pblk, rqd->ppa_list, rqd->nr_ppas); pblk_up_page(pblk, rqd->ppa_list, rqd->nr_ppas);
bio_put(rqd->bio); pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
pblk_free_rqd(pblk, rqd, WRITE);
atomic_dec(&pblk->inflight_io); atomic_dec(&pblk->inflight_io);
kref_put(&pad_rq->ref, pblk_recov_complete); kref_put(&pad_rq->ref, pblk_recov_complete);
@ -404,25 +383,21 @@ next_pad_rq:
ppa_list = (void *)(meta_list) + pblk_dma_meta_size; ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
dma_ppa_list = dma_meta_list + pblk_dma_meta_size; dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
rqd = pblk_alloc_rqd(pblk, WRITE);
if (IS_ERR(rqd)) {
ret = PTR_ERR(rqd);
goto fail_free_meta;
}
bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len, bio = pblk_bio_map_addr(pblk, data, rq_ppas, rq_len,
PBLK_VMALLOC_META, GFP_KERNEL); PBLK_VMALLOC_META, GFP_KERNEL);
if (IS_ERR(bio)) { if (IS_ERR(bio)) {
ret = PTR_ERR(bio); ret = PTR_ERR(bio);
goto fail_free_rqd; goto fail_free_meta;
} }
bio->bi_iter.bi_sector = 0; /* internal bio */ bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(bio, REQ_OP_WRITE, 0); bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
rqd = pblk_alloc_rqd(pblk, PBLK_WRITE_INT);
rqd->bio = bio; rqd->bio = bio;
rqd->opcode = NVM_OP_PWRITE; rqd->opcode = NVM_OP_PWRITE;
rqd->flags = pblk_set_progr_mode(pblk, WRITE); rqd->flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
rqd->meta_list = meta_list; rqd->meta_list = meta_list;
rqd->nr_ppas = rq_ppas; rqd->nr_ppas = rq_ppas;
rqd->ppa_list = ppa_list; rqd->ppa_list = ppa_list;
@ -490,8 +465,6 @@ free_rq:
fail_free_bio: fail_free_bio:
bio_put(bio); bio_put(bio);
fail_free_rqd:
pblk_free_rqd(pblk, rqd, WRITE);
fail_free_meta: fail_free_meta:
nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list); nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
fail_free_pad: fail_free_pad:
@ -522,7 +495,6 @@ static int pblk_recov_scan_all_oob(struct pblk *pblk, struct pblk_line *line,
int ret = 0; int ret = 0;
int rec_round; int rec_round;
int left_ppas = pblk_calc_sec_in_line(pblk, line) - line->cur_sec; int left_ppas = pblk_calc_sec_in_line(pblk, line) - line->cur_sec;
DECLARE_COMPLETION_ONSTACK(wait);
ppa_list = p.ppa_list; ppa_list = p.ppa_list;
meta_list = p.meta_list; meta_list = p.meta_list;
@ -557,8 +529,6 @@ next_rq:
rqd->ppa_list = ppa_list; rqd->ppa_list = ppa_list;
rqd->dma_ppa_list = dma_ppa_list; rqd->dma_ppa_list = dma_ppa_list;
rqd->dma_meta_list = dma_meta_list; rqd->dma_meta_list = dma_meta_list;
rqd->end_io = pblk_end_io_sync;
rqd->private = &wait;
if (pblk_io_aligned(pblk, rq_ppas)) if (pblk_io_aligned(pblk, rq_ppas))
rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_SEQUENTIAL); rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_SEQUENTIAL);
@ -584,18 +554,13 @@ next_rq:
addr_to_gen_ppa(pblk, w_ptr, line->id); addr_to_gen_ppa(pblk, w_ptr, line->id);
} }
ret = pblk_submit_io(pblk, rqd); ret = pblk_submit_io_sync(pblk, rqd);
if (ret) { if (ret) {
pr_err("pblk: I/O submission failed: %d\n", ret); pr_err("pblk: I/O submission failed: %d\n", ret);
return ret; return ret;
} }
if (!wait_for_completion_io_timeout(&wait,
msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
pr_err("pblk: L2P recovery read timed out\n");
}
atomic_dec(&pblk->inflight_io); atomic_dec(&pblk->inflight_io);
reinit_completion(&wait);
/* This should not happen since the read failed during normal recovery, /* This should not happen since the read failed during normal recovery,
* but the media works funny sometimes... * but the media works funny sometimes...
@ -663,7 +628,6 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
int i, j; int i, j;
int ret = 0; int ret = 0;
int left_ppas = pblk_calc_sec_in_line(pblk, line); int left_ppas = pblk_calc_sec_in_line(pblk, line);
DECLARE_COMPLETION_ONSTACK(wait);
ppa_list = p.ppa_list; ppa_list = p.ppa_list;
meta_list = p.meta_list; meta_list = p.meta_list;
@ -696,8 +660,6 @@ next_rq:
rqd->ppa_list = ppa_list; rqd->ppa_list = ppa_list;
rqd->dma_ppa_list = dma_ppa_list; rqd->dma_ppa_list = dma_ppa_list;
rqd->dma_meta_list = dma_meta_list; rqd->dma_meta_list = dma_meta_list;
rqd->end_io = pblk_end_io_sync;
rqd->private = &wait;
if (pblk_io_aligned(pblk, rq_ppas)) if (pblk_io_aligned(pblk, rq_ppas))
rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_SEQUENTIAL); rqd->flags = pblk_set_read_mode(pblk, PBLK_READ_SEQUENTIAL);
@ -723,19 +685,14 @@ next_rq:
addr_to_gen_ppa(pblk, paddr, line->id); addr_to_gen_ppa(pblk, paddr, line->id);
} }
ret = pblk_submit_io(pblk, rqd); ret = pblk_submit_io_sync(pblk, rqd);
if (ret) { if (ret) {
pr_err("pblk: I/O submission failed: %d\n", ret); pr_err("pblk: I/O submission failed: %d\n", ret);
bio_put(bio); bio_put(bio);
return ret; return ret;
} }
if (!wait_for_completion_io_timeout(&wait,
msecs_to_jiffies(PBLK_COMMAND_TIMEOUT_MS))) {
pr_err("pblk: L2P recovery read timed out\n");
}
atomic_dec(&pblk->inflight_io); atomic_dec(&pblk->inflight_io);
reinit_completion(&wait);
/* Reached the end of the written line */ /* Reached the end of the written line */
if (rqd->error) { if (rqd->error) {
@ -785,15 +742,9 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
dma_addr_t dma_ppa_list, dma_meta_list; dma_addr_t dma_ppa_list, dma_meta_list;
int done, ret = 0; int done, ret = 0;
rqd = pblk_alloc_rqd(pblk, READ);
if (IS_ERR(rqd))
return PTR_ERR(rqd);
meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list); meta_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, &dma_meta_list);
if (!meta_list) { if (!meta_list)
ret = -ENOMEM; return -ENOMEM;
goto free_rqd;
}
ppa_list = (void *)(meta_list) + pblk_dma_meta_size; ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
dma_ppa_list = dma_meta_list + pblk_dma_meta_size; dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
@ -804,6 +755,8 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
goto free_meta_list; goto free_meta_list;
} }
rqd = pblk_alloc_rqd(pblk, PBLK_READ);
p.ppa_list = ppa_list; p.ppa_list = ppa_list;
p.meta_list = meta_list; p.meta_list = meta_list;
p.rqd = rqd; p.rqd = rqd;
@ -832,8 +785,6 @@ out:
kfree(data); kfree(data);
free_meta_list: free_meta_list:
nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list); nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
free_rqd:
pblk_free_rqd(pblk, rqd, READ);
return ret; return ret;
} }
@ -851,10 +802,32 @@ static void pblk_recov_line_add_ordered(struct list_head *head,
__list_add(&line->list, t->list.prev, &t->list); __list_add(&line->list, t->list.prev, &t->list);
} }
struct pblk_line *pblk_recov_l2p(struct pblk *pblk) static u64 pblk_line_emeta_start(struct pblk *pblk, struct pblk_line *line)
{ {
struct nvm_tgt_dev *dev = pblk->dev; struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo; struct nvm_geo *geo = &dev->geo;
struct pblk_line_meta *lm = &pblk->lm;
unsigned int emeta_secs;
u64 emeta_start;
struct ppa_addr ppa;
int pos;
emeta_secs = lm->emeta_sec[0];
emeta_start = lm->sec_per_line;
while (emeta_secs) {
emeta_start--;
ppa = addr_to_pblk_ppa(pblk, emeta_start, line->id);
pos = pblk_ppa_to_pos(geo, ppa);
if (!test_bit(pos, line->blk_bitmap))
emeta_secs--;
}
return emeta_start;
}
struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
{
struct pblk_line_meta *lm = &pblk->lm; struct pblk_line_meta *lm = &pblk->lm;
struct pblk_line_mgmt *l_mg = &pblk->l_mg; struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_line *line, *tline, *data_line = NULL; struct pblk_line *line, *tline, *data_line = NULL;
@ -900,9 +873,9 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
if (le32_to_cpu(smeta_buf->header.identifier) != PBLK_MAGIC) if (le32_to_cpu(smeta_buf->header.identifier) != PBLK_MAGIC)
continue; continue;
if (le16_to_cpu(smeta_buf->header.version) != 1) { if (smeta_buf->header.version != SMETA_VERSION) {
pr_err("pblk: found incompatible line version %u\n", pr_err("pblk: found incompatible line version %u\n",
smeta_buf->header.version); le16_to_cpu(smeta_buf->header.version));
return ERR_PTR(-EINVAL); return ERR_PTR(-EINVAL);
} }
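The version check in the hunk above now compares the on-media `__le16` field directly against a constant that is already little-endian (`SMETA_VERSION`, defined as `cpu_to_le16(1)` in the header), and converts to CPU order only for the error message. A minimal user-space sketch of that pattern is given here. The `toy_*` helpers are hypothetical stand-ins for the kernel's `cpu_to_le16()`/`le16_to_cpu()`, written for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cpu_to_le16(): lay the value out LSB first. */
static uint16_t toy_cpu_to_le16(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Hypothetical stand-in for le16_to_cpu(): read the bytes back LSB first. */
static uint16_t toy_le16_to_cpu(uint16_t v)
{
	uint8_t b[2];

	memcpy(b, &v, sizeof(b));
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Constant kept in on-media (little-endian) form, like SMETA_VERSION. */
#define TOY_SMETA_VERSION toy_cpu_to_le16(1)

/* Compare in the little-endian domain; no conversion of the stored field. */
static bool toy_version_ok(uint16_t on_media_version)
{
	return on_media_version == TOY_SMETA_VERSION;
}
```

Because equality is preserved by the byte swap, comparing two little-endian values gives the same answer as converting one side and comparing against `1`, on either host byte order.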
@ -954,15 +927,9 @@ struct pblk_line *pblk_recov_l2p(struct pblk *pblk)
/* Verify closed blocks and recover this portion of L2P table*/ /* Verify closed blocks and recover this portion of L2P table*/
list_for_each_entry_safe(line, tline, &recov_list, list) { list_for_each_entry_safe(line, tline, &recov_list, list) {
int off, nr_bb;
recovered_lines++; recovered_lines++;
/* Calculate where emeta starts based on the line bb */
off = lm->sec_per_line - lm->emeta_sec[0];
nr_bb = bitmap_weight(line->blk_bitmap, lm->blk_per_line);
off -= nr_bb * geo->sec_per_pl;
line->emeta_ssec = off; line->emeta_ssec = pblk_line_emeta_start(pblk, line);
line->emeta = emeta; line->emeta = emeta;
memset(line->emeta->buf, 0, lm->emeta_len[0]); memset(line->emeta->buf, 0, lm->emeta_len[0]);
@ -987,7 +954,7 @@ next:
list_move_tail(&line->list, move_list); list_move_tail(&line->list, move_list);
spin_unlock(&l_mg->gc_lock); spin_unlock(&l_mg->gc_lock);
mempool_free(line->map_bitmap, pblk->line_meta_pool); kfree(line->map_bitmap);
line->map_bitmap = NULL; line->map_bitmap = NULL;
line->smeta = NULL; line->smeta = NULL;
line->emeta = NULL; line->emeta = NULL;


@ -96,9 +96,11 @@ unsigned long pblk_rl_nr_free_blks(struct pblk_rl *rl)
* *
* Only the total number of free blocks is used to configure the rate limiter. * Only the total number of free blocks is used to configure the rate limiter.
*/ */
static int pblk_rl_update_rates(struct pblk_rl *rl, unsigned long max) void pblk_rl_update_rates(struct pblk_rl *rl)
{ {
struct pblk *pblk = container_of(rl, struct pblk, rl);
unsigned long free_blocks = pblk_rl_nr_free_blks(rl); unsigned long free_blocks = pblk_rl_nr_free_blks(rl);
int max = rl->rb_budget;
if (free_blocks >= rl->high) { if (free_blocks >= rl->high) {
rl->rb_user_max = max; rl->rb_user_max = max;
@ -124,23 +126,18 @@ static int pblk_rl_update_rates(struct pblk_rl *rl, unsigned long max)
rl->rb_state = PBLK_RL_LOW; rl->rb_state = PBLK_RL_LOW;
} }
return rl->rb_state; if (rl->rb_state == (PBLK_RL_MID | PBLK_RL_LOW))
pblk_gc_should_start(pblk);
else
pblk_gc_should_stop(pblk);
} }
void pblk_rl_free_lines_inc(struct pblk_rl *rl, struct pblk_line *line) void pblk_rl_free_lines_inc(struct pblk_rl *rl, struct pblk_line *line)
{ {
struct pblk *pblk = container_of(rl, struct pblk, rl);
int blk_in_line = atomic_read(&line->blk_in_line); int blk_in_line = atomic_read(&line->blk_in_line);
int ret;
atomic_add(blk_in_line, &rl->free_blocks); atomic_add(blk_in_line, &rl->free_blocks);
/* Rates will not change that often - no need to lock update */ pblk_rl_update_rates(rl);
ret = pblk_rl_update_rates(rl, rl->rb_budget);
if (ret == (PBLK_RL_MID | PBLK_RL_LOW))
pblk_gc_should_start(pblk);
else
pblk_gc_should_stop(pblk);
} }
void pblk_rl_free_lines_dec(struct pblk_rl *rl, struct pblk_line *line) void pblk_rl_free_lines_dec(struct pblk_rl *rl, struct pblk_line *line)
@ -148,19 +145,7 @@ void pblk_rl_free_lines_dec(struct pblk_rl *rl, struct pblk_line *line)
int blk_in_line = atomic_read(&line->blk_in_line); int blk_in_line = atomic_read(&line->blk_in_line);
atomic_sub(blk_in_line, &rl->free_blocks); atomic_sub(blk_in_line, &rl->free_blocks);
} pblk_rl_update_rates(rl);
void pblk_gc_should_kick(struct pblk *pblk)
{
struct pblk_rl *rl = &pblk->rl;
int ret;
/* Rates will not change that often - no need to lock update */
ret = pblk_rl_update_rates(rl, rl->rb_budget);
if (ret == (PBLK_RL_MID | PBLK_RL_LOW))
pblk_gc_should_start(pblk);
else
pblk_gc_should_stop(pblk);
} }
int pblk_rl_high_thrs(struct pblk_rl *rl) int pblk_rl_high_thrs(struct pblk_rl *rl)
@ -168,14 +153,9 @@ int pblk_rl_high_thrs(struct pblk_rl *rl)
return rl->high; return rl->high;
} }
int pblk_rl_low_thrs(struct pblk_rl *rl) int pblk_rl_max_io(struct pblk_rl *rl)
{ {
return rl->low; return rl->rb_max_io;
}
int pblk_rl_sysfs_rate_show(struct pblk_rl *rl)
{
return rl->rb_user_max;
} }
static void pblk_rl_u_timer(unsigned long data) static void pblk_rl_u_timer(unsigned long data)
@ -214,6 +194,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
/* To start with, all buffer is available to user I/O writers */ /* To start with, all buffer is available to user I/O writers */
rl->rb_budget = budget; rl->rb_budget = budget;
rl->rb_user_max = budget; rl->rb_user_max = budget;
rl->rb_max_io = budget >> 1;
rl->rb_gc_max = 0; rl->rb_gc_max = 0;
rl->rb_state = PBLK_RL_HIGH; rl->rb_state = PBLK_RL_HIGH;


@ -253,7 +253,7 @@ static ssize_t pblk_sysfs_lines(struct pblk *pblk, char *page)
sz += snprintf(page + sz, PAGE_SIZE - sz, sz += snprintf(page + sz, PAGE_SIZE - sz,
"GC: full:%d, high:%d, mid:%d, low:%d, empty:%d, queue:%d\n", "GC: full:%d, high:%d, mid:%d, low:%d, empty:%d, queue:%d\n",
gc_full, gc_high, gc_mid, gc_low, gc_empty, gc_full, gc_high, gc_mid, gc_low, gc_empty,
atomic_read(&pblk->gc.inflight_gc)); atomic_read(&pblk->gc.read_inflight_gc));
sz += snprintf(page + sz, PAGE_SIZE - sz, sz += snprintf(page + sz, PAGE_SIZE - sz,
"data (%d) cur:%d, left:%d, vsc:%d, s:%d, map:%d/%d (%d)\n", "data (%d) cur:%d, left:%d, vsc:%d, s:%d, map:%d/%d (%d)\n",


@ -20,7 +20,6 @@
static unsigned long pblk_end_w_bio(struct pblk *pblk, struct nvm_rq *rqd, static unsigned long pblk_end_w_bio(struct pblk *pblk, struct nvm_rq *rqd,
struct pblk_c_ctx *c_ctx) struct pblk_c_ctx *c_ctx)
{ {
struct nvm_tgt_dev *dev = pblk->dev;
struct bio *original_bio; struct bio *original_bio;
unsigned long ret; unsigned long ret;
int i; int i;
@ -33,16 +32,18 @@ static unsigned long pblk_end_w_bio(struct pblk *pblk, struct nvm_rq *rqd,
bio_endio(original_bio); bio_endio(original_bio);
} }
if (c_ctx->nr_padded)
pblk_bio_free_pages(pblk, rqd->bio, c_ctx->nr_valid,
c_ctx->nr_padded);
#ifdef CONFIG_NVM_DEBUG #ifdef CONFIG_NVM_DEBUG
atomic_long_add(c_ctx->nr_valid, &pblk->sync_writes); atomic_long_add(rqd->nr_ppas, &pblk->sync_writes);
#endif #endif
ret = pblk_rb_sync_advance(&pblk->rwb, c_ctx->nr_valid); ret = pblk_rb_sync_advance(&pblk->rwb, c_ctx->nr_valid);
nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
bio_put(rqd->bio); bio_put(rqd->bio);
pblk_free_rqd(pblk, rqd, WRITE); pblk_free_rqd(pblk, rqd, PBLK_WRITE);
return ret; return ret;
} }
@ -107,10 +108,7 @@ static void pblk_end_w_fail(struct pblk *pblk, struct nvm_rq *rqd)
ppa_list = &rqd->ppa_addr; ppa_list = &rqd->ppa_addr;
recovery = mempool_alloc(pblk->rec_pool, GFP_ATOMIC); recovery = mempool_alloc(pblk->rec_pool, GFP_ATOMIC);
if (!recovery) {
pr_err("pblk: could not allocate recovery context\n");
return;
}
INIT_LIST_HEAD(&recovery->failed); INIT_LIST_HEAD(&recovery->failed);
bit = -1; bit = -1;
@ -175,7 +173,6 @@ static void pblk_end_io_write(struct nvm_rq *rqd)
static void pblk_end_io_write_meta(struct nvm_rq *rqd) static void pblk_end_io_write_meta(struct nvm_rq *rqd)
{ {
struct pblk *pblk = rqd->private; struct pblk *pblk = rqd->private;
struct nvm_tgt_dev *dev = pblk->dev;
struct pblk_g_ctx *m_ctx = nvm_rq_to_pdu(rqd); struct pblk_g_ctx *m_ctx = nvm_rq_to_pdu(rqd);
struct pblk_line *line = m_ctx->private; struct pblk_line *line = m_ctx->private;
struct pblk_emeta *emeta = line->emeta; struct pblk_emeta *emeta = line->emeta;
@ -187,19 +184,13 @@ static void pblk_end_io_write_meta(struct nvm_rq *rqd)
pblk_log_write_err(pblk, rqd); pblk_log_write_err(pblk, rqd);
pr_err("pblk: metadata I/O failed. Line %d\n", line->id); pr_err("pblk: metadata I/O failed. Line %d\n", line->id);
} }
#ifdef CONFIG_NVM_DEBUG
else
WARN_ONCE(rqd->bio->bi_status, "pblk: corrupted write error\n");
#endif
sync = atomic_add_return(rqd->nr_ppas, &emeta->sync); sync = atomic_add_return(rqd->nr_ppas, &emeta->sync);
if (sync == emeta->nr_entries) if (sync == emeta->nr_entries)
pblk_line_run_ws(pblk, line, NULL, pblk_line_close_ws, pblk_gen_run_ws(pblk, line, NULL, pblk_line_close_ws,
pblk->close_wq); GFP_ATOMIC, pblk->close_wq);
bio_put(rqd->bio); pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
pblk_free_rqd(pblk, rqd, READ);
atomic_dec(&pblk->inflight_io); atomic_dec(&pblk->inflight_io);
} }
@ -213,7 +204,7 @@ static int pblk_alloc_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
/* Setup write request */ /* Setup write request */
rqd->opcode = NVM_OP_PWRITE; rqd->opcode = NVM_OP_PWRITE;
rqd->nr_ppas = nr_secs; rqd->nr_ppas = nr_secs;
rqd->flags = pblk_set_progr_mode(pblk, WRITE); rqd->flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
rqd->private = pblk; rqd->private = pblk;
rqd->end_io = end_io; rqd->end_io = end_io;
@ -229,15 +220,16 @@ static int pblk_alloc_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
} }
static int pblk_setup_w_rq(struct pblk *pblk, struct nvm_rq *rqd, static int pblk_setup_w_rq(struct pblk *pblk, struct nvm_rq *rqd,
struct pblk_c_ctx *c_ctx, struct ppa_addr *erase_ppa) struct ppa_addr *erase_ppa)
{ {
struct pblk_line_meta *lm = &pblk->lm; struct pblk_line_meta *lm = &pblk->lm;
struct pblk_line *e_line = pblk_line_get_erase(pblk); struct pblk_line *e_line = pblk_line_get_erase(pblk);
struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
unsigned int valid = c_ctx->nr_valid; unsigned int valid = c_ctx->nr_valid;
unsigned int padded = c_ctx->nr_padded; unsigned int padded = c_ctx->nr_padded;
unsigned int nr_secs = valid + padded; unsigned int nr_secs = valid + padded;
unsigned long *lun_bitmap; unsigned long *lun_bitmap;
int ret = 0; int ret;
lun_bitmap = kzalloc(lm->lun_bitmap_len, GFP_KERNEL); lun_bitmap = kzalloc(lm->lun_bitmap_len, GFP_KERNEL);
if (!lun_bitmap) if (!lun_bitmap)
@ -279,7 +271,7 @@ int pblk_setup_w_rec_rq(struct pblk *pblk, struct nvm_rq *rqd,
pblk_map_rq(pblk, rqd, c_ctx->sentry, lun_bitmap, c_ctx->nr_valid, 0); pblk_map_rq(pblk, rqd, c_ctx->sentry, lun_bitmap, c_ctx->nr_valid, 0);
rqd->ppa_status = (u64)0; rqd->ppa_status = (u64)0;
rqd->flags = pblk_set_progr_mode(pblk, WRITE); rqd->flags = pblk_set_progr_mode(pblk, PBLK_WRITE);
return ret; return ret;
} }
@ -303,55 +295,6 @@ static int pblk_calc_secs_to_sync(struct pblk *pblk, unsigned int secs_avail,
return secs_to_sync; return secs_to_sync;
} }
static inline int pblk_valid_meta_ppa(struct pblk *pblk,
struct pblk_line *meta_line,
struct ppa_addr *ppa_list, int nr_ppas)
{
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
struct pblk_line *data_line;
struct ppa_addr ppa, ppa_opt;
u64 paddr;
int i;
data_line = &pblk->lines[pblk_dev_ppa_to_line(ppa_list[0])];
paddr = pblk_lookup_page(pblk, meta_line);
ppa = addr_to_gen_ppa(pblk, paddr, 0);
if (test_bit(pblk_ppa_to_pos(geo, ppa), data_line->blk_bitmap))
return 1;
/* Schedule a metadata I/O that is half the distance from the data I/O
* with regards to the number of LUNs forming the pblk instance. This
* balances LUN conflicts across every I/O.
*
* When the LUN configuration changes (e.g., due to GC), this distance
* can align, which would result on a LUN deadlock. In this case, modify
* the distance to not be optimal, but allow metadata I/Os to succeed.
*/
ppa_opt = addr_to_gen_ppa(pblk, paddr + data_line->meta_distance, 0);
if (unlikely(ppa_opt.ppa == ppa.ppa)) {
data_line->meta_distance--;
return 0;
}
for (i = 0; i < nr_ppas; i += pblk->min_write_pgs)
if (ppa_list[i].g.ch == ppa_opt.g.ch &&
ppa_list[i].g.lun == ppa_opt.g.lun)
return 1;
if (test_bit(pblk_ppa_to_pos(geo, ppa_opt), data_line->blk_bitmap)) {
for (i = 0; i < nr_ppas; i += pblk->min_write_pgs)
if (ppa_list[i].g.ch == ppa.g.ch &&
ppa_list[i].g.lun == ppa.g.lun)
return 0;
return 1;
}
return 0;
}
int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line) int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line)
{ {
struct nvm_tgt_dev *dev = pblk->dev; struct nvm_tgt_dev *dev = pblk->dev;
@ -370,11 +313,8 @@ int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line)
int i, j; int i, j;
int ret; int ret;
rqd = pblk_alloc_rqd(pblk, READ); rqd = pblk_alloc_rqd(pblk, PBLK_WRITE_INT);
if (IS_ERR(rqd)) {
pr_err("pblk: cannot allocate write req.\n");
return PTR_ERR(rqd);
}
m_ctx = nvm_rq_to_pdu(rqd); m_ctx = nvm_rq_to_pdu(rqd);
m_ctx->private = meta_line; m_ctx->private = meta_line;
@ -407,8 +347,6 @@ int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line)
if (emeta->mem >= lm->emeta_len[0]) { if (emeta->mem >= lm->emeta_len[0]) {
spin_lock(&l_mg->close_lock); spin_lock(&l_mg->close_lock);
list_del(&meta_line->list); list_del(&meta_line->list);
WARN(!bitmap_full(meta_line->map_bitmap, lm->sec_per_line),
"pblk: corrupt meta line %d\n", meta_line->id);
spin_unlock(&l_mg->close_lock); spin_unlock(&l_mg->close_lock);
} }
@ -428,18 +366,51 @@ fail_rollback:
pblk_dealloc_page(pblk, meta_line, rq_ppas); pblk_dealloc_page(pblk, meta_line, rq_ppas);
list_add(&meta_line->list, &meta_line->list); list_add(&meta_line->list, &meta_line->list);
spin_unlock(&l_mg->close_lock); spin_unlock(&l_mg->close_lock);
nvm_dev_dma_free(dev->parent, rqd->meta_list, rqd->dma_meta_list);
fail_free_bio: fail_free_bio:
if (likely(l_mg->emeta_alloc_type == PBLK_VMALLOC_META)) bio_put(bio);
bio_put(bio);
fail_free_rqd: fail_free_rqd:
pblk_free_rqd(pblk, rqd, READ); pblk_free_rqd(pblk, rqd, PBLK_WRITE_INT);
return ret; return ret;
} }
static int pblk_sched_meta_io(struct pblk *pblk, struct ppa_addr *prev_list, static inline bool pblk_valid_meta_ppa(struct pblk *pblk,
int prev_n) struct pblk_line *meta_line,
struct nvm_rq *data_rqd)
{
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
struct pblk_c_ctx *data_c_ctx = nvm_rq_to_pdu(data_rqd);
struct pblk_line *data_line = pblk_line_get_data(pblk);
struct ppa_addr ppa, ppa_opt;
u64 paddr;
int pos_opt;
/* Schedule a metadata I/O that is half the distance from the data I/O
* with regards to the number of LUNs forming the pblk instance. This
* balances LUN conflicts across every I/O.
*
* When the LUN configuration changes (e.g., due to GC), this distance
* can align, which would result on metadata and data I/Os colliding. In
* this case, modify the distance to not be optimal, but move the
* optimal in the right direction.
*/
paddr = pblk_lookup_page(pblk, meta_line);
ppa = addr_to_gen_ppa(pblk, paddr, 0);
ppa_opt = addr_to_gen_ppa(pblk, paddr + data_line->meta_distance, 0);
pos_opt = pblk_ppa_to_pos(geo, ppa_opt);
if (test_bit(pos_opt, data_c_ctx->lun_bitmap) ||
test_bit(pos_opt, data_line->blk_bitmap))
return true;
if (unlikely(pblk_ppa_comp(ppa_opt, ppa)))
data_line->meta_distance--;
return false;
}
static struct pblk_line *pblk_should_submit_meta_io(struct pblk *pblk,
struct nvm_rq *data_rqd)
{ {
struct pblk_line_meta *lm = &pblk->lm; struct pblk_line_meta *lm = &pblk->lm;
struct pblk_line_mgmt *l_mg = &pblk->l_mg; struct pblk_line_mgmt *l_mg = &pblk->l_mg;
@ -449,57 +420,45 @@ static int pblk_sched_meta_io(struct pblk *pblk, struct ppa_addr *prev_list,
retry: retry:
if (list_empty(&l_mg->emeta_list)) { if (list_empty(&l_mg->emeta_list)) {
spin_unlock(&l_mg->close_lock); spin_unlock(&l_mg->close_lock);
return 0; return NULL;
} }
meta_line = list_first_entry(&l_mg->emeta_list, struct pblk_line, list); meta_line = list_first_entry(&l_mg->emeta_list, struct pblk_line, list);
if (bitmap_full(meta_line->map_bitmap, lm->sec_per_line)) if (meta_line->emeta->mem >= lm->emeta_len[0])
goto retry; goto retry;
spin_unlock(&l_mg->close_lock); spin_unlock(&l_mg->close_lock);
if (!pblk_valid_meta_ppa(pblk, meta_line, prev_list, prev_n)) if (!pblk_valid_meta_ppa(pblk, meta_line, data_rqd))
return 0; return NULL;
return pblk_submit_meta_io(pblk, meta_line); return meta_line;
} }
static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd) static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
{ {
struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
struct ppa_addr erase_ppa; struct ppa_addr erase_ppa;
struct pblk_line *meta_line;
int err; int err;
ppa_set_empty(&erase_ppa); ppa_set_empty(&erase_ppa);
/* Assign lbas to ppas and populate request structure */ /* Assign lbas to ppas and populate request structure */
err = pblk_setup_w_rq(pblk, rqd, c_ctx, &erase_ppa); err = pblk_setup_w_rq(pblk, rqd, &erase_ppa);
if (err) { if (err) {
pr_err("pblk: could not setup write request: %d\n", err); pr_err("pblk: could not setup write request: %d\n", err);
return NVM_IO_ERR; return NVM_IO_ERR;
} }
if (likely(ppa_empty(erase_ppa))) { meta_line = pblk_should_submit_meta_io(pblk, rqd);
/* Submit metadata write for previous data line */
err = pblk_sched_meta_io(pblk, rqd->ppa_list, rqd->nr_ppas);
if (err) {
pr_err("pblk: metadata I/O submission failed: %d", err);
return NVM_IO_ERR;
}
/* Submit data write for current data line */ /* Submit data write for current data line */
err = pblk_submit_io(pblk, rqd); err = pblk_submit_io(pblk, rqd);
if (err) { if (err) {
pr_err("pblk: data I/O submission failed: %d\n", err); pr_err("pblk: data I/O submission failed: %d\n", err);
return NVM_IO_ERR; return NVM_IO_ERR;
} }
} else {
/* Submit data write for current data line */
err = pblk_submit_io(pblk, rqd);
if (err) {
pr_err("pblk: data I/O submission failed: %d\n", err);
return NVM_IO_ERR;
}
/* Submit available erase for next data line */ if (!ppa_empty(erase_ppa)) {
/* Submit erase for next data line */
if (pblk_blk_erase_async(pblk, erase_ppa)) { if (pblk_blk_erase_async(pblk, erase_ppa)) {
struct pblk_line *e_line = pblk_line_get_erase(pblk); struct pblk_line *e_line = pblk_line_get_erase(pblk);
struct nvm_tgt_dev *dev = pblk->dev; struct nvm_tgt_dev *dev = pblk->dev;
@ -512,6 +471,15 @@ static int pblk_submit_io_set(struct pblk *pblk, struct nvm_rq *rqd)
} }
} }
if (meta_line) {
/* Submit metadata write for previous data line */
err = pblk_submit_meta_io(pblk, meta_line);
if (err) {
pr_err("pblk: metadata I/O submission failed: %d", err);
return NVM_IO_ERR;
}
}
return NVM_IO_OK; return NVM_IO_OK;
} }
@ -521,7 +489,8 @@ static void pblk_free_write_rqd(struct pblk *pblk, struct nvm_rq *rqd)
struct bio *bio = rqd->bio; struct bio *bio = rqd->bio;
if (c_ctx->nr_padded) if (c_ctx->nr_padded)
pblk_bio_free_pages(pblk, bio, rqd->nr_ppas, c_ctx->nr_padded); pblk_bio_free_pages(pblk, bio, c_ctx->nr_valid,
c_ctx->nr_padded);
} }
static int pblk_submit_write(struct pblk *pblk) static int pblk_submit_write(struct pblk *pblk)
@ -543,31 +512,24 @@ static int pblk_submit_write(struct pblk *pblk)
if (!secs_to_flush && secs_avail < pblk->min_write_pgs) if (!secs_to_flush && secs_avail < pblk->min_write_pgs)
return 1; return 1;
rqd = pblk_alloc_rqd(pblk, WRITE);
if (IS_ERR(rqd)) {
pr_err("pblk: cannot allocate write req.\n");
return 1;
}
bio = bio_alloc(GFP_KERNEL, pblk->max_write_pgs);
if (!bio) {
pr_err("pblk: cannot allocate write bio\n");
goto fail_free_rqd;
}
bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
rqd->bio = bio;
secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail, secs_to_flush); secs_to_sync = pblk_calc_secs_to_sync(pblk, secs_avail, secs_to_flush);
if (secs_to_sync > pblk->max_write_pgs) { if (secs_to_sync > pblk->max_write_pgs) {
pr_err("pblk: bad buffer sync calculation\n"); pr_err("pblk: bad buffer sync calculation\n");
goto fail_put_bio; return 1;
} }
secs_to_com = (secs_to_sync > secs_avail) ? secs_avail : secs_to_sync; secs_to_com = (secs_to_sync > secs_avail) ? secs_avail : secs_to_sync;
pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com); pos = pblk_rb_read_commit(&pblk->rwb, secs_to_com);
if (pblk_rb_read_to_bio(&pblk->rwb, rqd, bio, pos, secs_to_sync, bio = bio_alloc(GFP_KERNEL, secs_to_sync);
bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
rqd = pblk_alloc_rqd(pblk, PBLK_WRITE);
rqd->bio = bio;
if (pblk_rb_read_to_bio(&pblk->rwb, rqd, pos, secs_to_sync,
secs_avail)) { secs_avail)) {
pr_err("pblk: corrupted write bio\n"); pr_err("pblk: corrupted write bio\n");
goto fail_put_bio; goto fail_put_bio;
@ -586,8 +548,7 @@ fail_free_bio:
pblk_free_write_rqd(pblk, rqd); pblk_free_write_rqd(pblk, rqd);
fail_put_bio: fail_put_bio:
bio_put(bio); bio_put(bio);
fail_free_rqd: pblk_free_rqd(pblk, rqd, PBLK_WRITE);
pblk_free_rqd(pblk, rqd, WRITE);
return 1; return 1;
} }


@ -40,10 +40,6 @@
#define PBLK_MAX_REQ_ADDRS (64) #define PBLK_MAX_REQ_ADDRS (64)
#define PBLK_MAX_REQ_ADDRS_PW (6) #define PBLK_MAX_REQ_ADDRS_PW (6)
#define PBLK_WS_POOL_SIZE (128)
#define PBLK_META_POOL_SIZE (128)
#define PBLK_READ_REQ_POOL_SIZE (1024)
#define PBLK_NR_CLOSE_JOBS (4) #define PBLK_NR_CLOSE_JOBS (4)
#define PBLK_CACHE_NAME_LEN (DISK_NAME_LEN + 16) #define PBLK_CACHE_NAME_LEN (DISK_NAME_LEN + 16)
@ -59,7 +55,15 @@
for ((i) = 0, rlun = &(pblk)->luns[0]; \ for ((i) = 0, rlun = &(pblk)->luns[0]; \
(i) < (pblk)->nr_luns; (i)++, rlun = &(pblk)->luns[(i)]) (i) < (pblk)->nr_luns; (i)++, rlun = &(pblk)->luns[(i)])
#define ERASE 2 /* READ = 0, WRITE = 1 */ /* Static pool sizes */
#define PBLK_GEN_WS_POOL_SIZE (2)
enum {
PBLK_READ = READ,
PBLK_WRITE = WRITE,/* Write from write buffer */
PBLK_WRITE_INT, /* Internal write - no write buffer */
PBLK_ERASE,
};
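The request-type enum above relies on implicit numbering after `PBLK_WRITE`. Taking `READ = 0` and `WRITE = 1` from the comment on the removed `ERASE` define, the sketch below shows the resulting values: the internal write type takes 2 (the value the old `ERASE` define had) and erase moves to 3. The `TOY_*` names and the name-lookup helper are hypothetical and exist only to make the numbering checkable outside the kernel.

```c
#include <stddef.h>

/* Values taken from the removed comment: READ = 0, WRITE = 1. */
#define TOY_READ	0
#define TOY_WRITE	1

enum toy_rq_type {
	TOY_PBLK_READ = TOY_READ,
	TOY_PBLK_WRITE = TOY_WRITE,	/* write from write buffer */
	TOY_PBLK_WRITE_INT,		/* internal write: implicitly 2 */
	TOY_PBLK_ERASE,			/* implicitly 3 */
};

/* Hypothetical helper: map a request type to a printable name. */
static const char *toy_rq_type_name(int type)
{
	switch (type) {
	case TOY_PBLK_READ:
		return "read";
	case TOY_PBLK_WRITE:
		return "write";
	case TOY_PBLK_WRITE_INT:
		return "write_int";
	case TOY_PBLK_ERASE:
		return "erase";
	default:
		return NULL;
	}
}
```

Keeping `PBLK_READ` and `PBLK_WRITE` equal to the block layer's values means call sites that previously passed `READ` or `WRITE` keep the same numeric argument after the rename.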
enum { enum {
/* IO Types */ /* IO Types */
@ -95,6 +99,7 @@ enum {
}; };
#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * PBLK_MAX_REQ_ADDRS) #define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * PBLK_MAX_REQ_ADDRS)
#define pblk_dma_ppa_size (sizeof(u64) * PBLK_MAX_REQ_ADDRS)
/* write buffer completion context */ /* write buffer completion context */
struct pblk_c_ctx { struct pblk_c_ctx {
@ -106,9 +111,10 @@ struct pblk_c_ctx {
unsigned int nr_padded; unsigned int nr_padded;
}; };
/* generic context */ /* read context */
struct pblk_g_ctx { struct pblk_g_ctx {
void *private; void *private;
u64 lba;
}; };
/* Pad context */ /* Pad context */
@ -207,6 +213,7 @@ struct pblk_lun {
struct pblk_gc_rq { struct pblk_gc_rq {
struct pblk_line *line; struct pblk_line *line;
void *data; void *data;
u64 paddr_list[PBLK_MAX_REQ_ADDRS];
u64 lba_list[PBLK_MAX_REQ_ADDRS]; u64 lba_list[PBLK_MAX_REQ_ADDRS];
int nr_secs; int nr_secs;
int secs_to_gc; int secs_to_gc;
@ -231,7 +238,10 @@ struct pblk_gc {
struct timer_list gc_timer; struct timer_list gc_timer;
struct semaphore gc_sem; struct semaphore gc_sem;
atomic_t inflight_gc; atomic_t read_inflight_gc; /* Number of lines with inflight GC reads */
atomic_t pipeline_gc; /* Number of lines in the GC pipeline -
* started reads to finished writes
*/
int w_entries; int w_entries;
struct list_head w_list; struct list_head w_list;
@ -267,6 +277,7 @@ struct pblk_rl {
int rb_gc_max; /* Max buffer entries available for GC I/O */ int rb_gc_max; /* Max buffer entries available for GC I/O */
int rb_gc_rsv; /* Reserved buffer entries for GC I/O */ int rb_gc_rsv; /* Reserved buffer entries for GC I/O */
int rb_state; /* Rate-limiter current state */ int rb_state; /* Rate-limiter current state */
int rb_max_io; /* Maximum size for an I/O giving the config */
atomic_t rb_user_cnt; /* User I/O buffer counter */ atomic_t rb_user_cnt; /* User I/O buffer counter */
atomic_t rb_gc_cnt; /* GC I/O buffer counter */ atomic_t rb_gc_cnt; /* GC I/O buffer counter */
@ -310,6 +321,7 @@ enum {
}; };
#define PBLK_MAGIC 0x70626c6b /*pblk*/ #define PBLK_MAGIC 0x70626c6b /*pblk*/
#define SMETA_VERSION cpu_to_le16(1)
struct line_header { struct line_header {
__le32 crc; __le32 crc;
@ -618,15 +630,16 @@ struct pblk {
struct list_head compl_list; struct list_head compl_list;
mempool_t *page_pool; mempool_t *page_bio_pool;
mempool_t *line_ws_pool; mempool_t *gen_ws_pool;
mempool_t *rec_pool; mempool_t *rec_pool;
mempool_t *g_rq_pool; mempool_t *r_rq_pool;
mempool_t *w_rq_pool; mempool_t *w_rq_pool;
mempool_t *line_meta_pool; mempool_t *e_rq_pool;
struct workqueue_struct *close_wq; struct workqueue_struct *close_wq;
struct workqueue_struct *bb_wq; struct workqueue_struct *bb_wq;
struct workqueue_struct *r_end_wq;
struct timer_list wtimer; struct timer_list wtimer;
@ -657,15 +670,15 @@ int pblk_rb_may_write_gc(struct pblk_rb *rb, unsigned int nr_entries,
void pblk_rb_write_entry_user(struct pblk_rb *rb, void *data, void pblk_rb_write_entry_user(struct pblk_rb *rb, void *data,
struct pblk_w_ctx w_ctx, unsigned int pos); struct pblk_w_ctx w_ctx, unsigned int pos);
void pblk_rb_write_entry_gc(struct pblk_rb *rb, void *data, void pblk_rb_write_entry_gc(struct pblk_rb *rb, void *data,
struct pblk_w_ctx w_ctx, struct pblk_line *gc_line, struct pblk_w_ctx w_ctx, struct pblk_line *line,
unsigned int pos); u64 paddr, unsigned int pos);
struct pblk_w_ctx *pblk_rb_w_ctx(struct pblk_rb *rb, unsigned int pos); struct pblk_w_ctx *pblk_rb_w_ctx(struct pblk_rb *rb, unsigned int pos);
void pblk_rb_flush(struct pblk_rb *rb); void pblk_rb_flush(struct pblk_rb *rb);
void pblk_rb_sync_l2p(struct pblk_rb *rb); void pblk_rb_sync_l2p(struct pblk_rb *rb);
unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd, unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
struct bio *bio, unsigned int pos, unsigned int pos, unsigned int nr_entries,
unsigned int nr_entries, unsigned int count); unsigned int count);
unsigned int pblk_rb_read_to_bio_list(struct pblk_rb *rb, struct bio *bio, unsigned int pblk_rb_read_to_bio_list(struct pblk_rb *rb, struct bio *bio,
struct list_head *list, struct list_head *list,
unsigned int max); unsigned int max);
@ -692,24 +705,23 @@ ssize_t pblk_rb_sysfs(struct pblk_rb *rb, char *buf);
/* /*
* pblk core * pblk core
*/ */
struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int rw); struct nvm_rq *pblk_alloc_rqd(struct pblk *pblk, int type);
void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int type);
void pblk_set_sec_per_write(struct pblk *pblk, int sec_per_write); void pblk_set_sec_per_write(struct pblk *pblk, int sec_per_write);
int pblk_setup_w_rec_rq(struct pblk *pblk, struct nvm_rq *rqd, int pblk_setup_w_rec_rq(struct pblk *pblk, struct nvm_rq *rqd,
struct pblk_c_ctx *c_ctx); struct pblk_c_ctx *c_ctx);
void pblk_free_rqd(struct pblk *pblk, struct nvm_rq *rqd, int rw);
void pblk_wait_for_meta(struct pblk *pblk);
struct ppa_addr pblk_get_lba_map(struct pblk *pblk, sector_t lba);
void pblk_discard(struct pblk *pblk, struct bio *bio); void pblk_discard(struct pblk *pblk, struct bio *bio);
void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd); void pblk_log_write_err(struct pblk *pblk, struct nvm_rq *rqd);
void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd); void pblk_log_read_err(struct pblk *pblk, struct nvm_rq *rqd);
int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd); int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);
int pblk_submit_io_sync(struct pblk *pblk, struct nvm_rq *rqd);
int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line); int pblk_submit_meta_io(struct pblk *pblk, struct pblk_line *meta_line);
struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data, struct bio *pblk_bio_map_addr(struct pblk *pblk, void *data,
unsigned int nr_secs, unsigned int len, unsigned int nr_secs, unsigned int len,
int alloc_type, gfp_t gfp_mask); int alloc_type, gfp_t gfp_mask);
struct pblk_line *pblk_line_get(struct pblk *pblk); struct pblk_line *pblk_line_get(struct pblk *pblk);
struct pblk_line *pblk_line_get_first_data(struct pblk *pblk); struct pblk_line *pblk_line_get_first_data(struct pblk *pblk);
void pblk_line_replace_data(struct pblk *pblk); struct pblk_line *pblk_line_replace_data(struct pblk *pblk);
int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line); int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line);
void pblk_line_recov_close(struct pblk *pblk, struct pblk_line *line); void pblk_line_recov_close(struct pblk *pblk, struct pblk_line *line);
struct pblk_line *pblk_line_get_data(struct pblk *pblk); struct pblk_line *pblk_line_get_data(struct pblk *pblk);
@ -719,19 +731,18 @@ int pblk_line_is_full(struct pblk_line *line);
void pblk_line_free(struct pblk *pblk, struct pblk_line *line); void pblk_line_free(struct pblk *pblk, struct pblk_line *line);
void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line); void pblk_line_close_meta(struct pblk *pblk, struct pblk_line *line);
void pblk_line_close(struct pblk *pblk, struct pblk_line *line); void pblk_line_close(struct pblk *pblk, struct pblk_line *line);
void pblk_line_close_meta_sync(struct pblk *pblk);
void pblk_line_close_ws(struct work_struct *work); void pblk_line_close_ws(struct work_struct *work);
void pblk_pipeline_stop(struct pblk *pblk); void pblk_pipeline_stop(struct pblk *pblk);
void pblk_line_mark_bb(struct work_struct *work); void pblk_gen_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv,
void pblk_line_run_ws(struct pblk *pblk, struct pblk_line *line, void *priv, void (*work)(struct work_struct *), gfp_t gfp_mask,
void (*work)(struct work_struct *), struct workqueue_struct *wq);
struct workqueue_struct *wq);
u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line); u64 pblk_line_smeta_start(struct pblk *pblk, struct pblk_line *line);
int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line); int pblk_line_read_smeta(struct pblk *pblk, struct pblk_line *line);
int pblk_line_read_emeta(struct pblk *pblk, struct pblk_line *line, int pblk_line_read_emeta(struct pblk *pblk, struct pblk_line *line,
void *emeta_buf); void *emeta_buf);
int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr erase_ppa); int pblk_blk_erase_async(struct pblk *pblk, struct ppa_addr erase_ppa);
void pblk_line_put(struct kref *ref); void pblk_line_put(struct kref *ref);
void pblk_line_put_wq(struct kref *ref);
struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line); struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line);
u64 pblk_lookup_page(struct pblk *pblk, struct pblk_line *line); u64 pblk_lookup_page(struct pblk *pblk, struct pblk_line *line);
void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs); void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs);
@ -745,7 +756,6 @@ void pblk_down_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
void pblk_down_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas); void pblk_down_page(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas);
void pblk_up_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas, void pblk_up_rq(struct pblk *pblk, struct ppa_addr *ppa_list, int nr_ppas,
unsigned long *lun_bitmap); unsigned long *lun_bitmap);
void pblk_end_bio_sync(struct bio *bio);
void pblk_end_io_sync(struct nvm_rq *rqd); void pblk_end_io_sync(struct nvm_rq *rqd);
int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags, int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags,
int nr_pages); int nr_pages);
@ -760,7 +770,7 @@ void pblk_update_map_cache(struct pblk *pblk, sector_t lba,
void pblk_update_map_dev(struct pblk *pblk, sector_t lba, void pblk_update_map_dev(struct pblk *pblk, sector_t lba,
struct ppa_addr ppa, struct ppa_addr entry_line); struct ppa_addr ppa, struct ppa_addr entry_line);
int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa, int pblk_update_map_gc(struct pblk *pblk, sector_t lba, struct ppa_addr ppa,
struct pblk_line *gc_line); struct pblk_line *gc_line, u64 paddr);
void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas, void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
u64 *lba_list, int nr_secs); u64 *lba_list, int nr_secs);
void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas, void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
@ -771,9 +781,7 @@ void pblk_lookup_l2p_seq(struct pblk *pblk, struct ppa_addr *ppas,
*/ */
int pblk_write_to_cache(struct pblk *pblk, struct bio *bio, int pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
unsigned long flags); unsigned long flags);
int pblk_write_gc_to_cache(struct pblk *pblk, void *data, u64 *lba_list, int pblk_write_gc_to_cache(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
unsigned int nr_entries, unsigned int nr_rec_entries,
struct pblk_line *gc_line, unsigned long flags);
/* /*
* pblk map * pblk map
@ -797,9 +805,7 @@ void pblk_write_should_kick(struct pblk *pblk);
*/ */
extern struct bio_set *pblk_bio_set; extern struct bio_set *pblk_bio_set;
int pblk_submit_read(struct pblk *pblk, struct bio *bio); int pblk_submit_read(struct pblk *pblk, struct bio *bio);
int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data, int pblk_submit_read_gc(struct pblk *pblk, struct pblk_gc_rq *gc_rq);
unsigned int nr_secs, unsigned int *secs_to_gc,
struct pblk_line *line);
/* /*
* pblk recovery * pblk recovery
*/ */
@ -815,7 +821,7 @@ int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
* pblk gc * pblk gc
*/ */
#define PBLK_GC_MAX_READERS 8 /* Max number of outstanding GC reader jobs */ #define PBLK_GC_MAX_READERS 8 /* Max number of outstanding GC reader jobs */
#define PBLK_GC_W_QD 128 /* Queue depth for inflight GC write I/Os */ #define PBLK_GC_RQ_QD 128 /* Queue depth for inflight GC requests */
#define PBLK_GC_L_QD 4 /* Queue depth for inflight GC lines */ #define PBLK_GC_L_QD 4 /* Queue depth for inflight GC lines */
#define PBLK_GC_RSV_LINE 1 /* Reserved lines for GC */ #define PBLK_GC_RSV_LINE 1 /* Reserved lines for GC */
@ -824,7 +830,7 @@ void pblk_gc_exit(struct pblk *pblk);
void pblk_gc_should_start(struct pblk *pblk); void pblk_gc_should_start(struct pblk *pblk);
void pblk_gc_should_stop(struct pblk *pblk); void pblk_gc_should_stop(struct pblk *pblk);
void pblk_gc_should_kick(struct pblk *pblk); void pblk_gc_should_kick(struct pblk *pblk);
void pblk_gc_kick(struct pblk *pblk); void pblk_gc_free_full_lines(struct pblk *pblk);
void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled, void pblk_gc_sysfs_state_show(struct pblk *pblk, int *gc_enabled,
int *gc_active); int *gc_active);
int pblk_gc_sysfs_force(struct pblk *pblk, int force); int pblk_gc_sysfs_force(struct pblk *pblk, int force);
@ -834,8 +840,8 @@ int pblk_gc_sysfs_force(struct pblk *pblk, int force);
*/ */
void pblk_rl_init(struct pblk_rl *rl, int budget); void pblk_rl_init(struct pblk_rl *rl, int budget);
void pblk_rl_free(struct pblk_rl *rl); void pblk_rl_free(struct pblk_rl *rl);
void pblk_rl_update_rates(struct pblk_rl *rl);
int pblk_rl_high_thrs(struct pblk_rl *rl); int pblk_rl_high_thrs(struct pblk_rl *rl);
int pblk_rl_low_thrs(struct pblk_rl *rl);
unsigned long pblk_rl_nr_free_blks(struct pblk_rl *rl); unsigned long pblk_rl_nr_free_blks(struct pblk_rl *rl);
int pblk_rl_user_may_insert(struct pblk_rl *rl, int nr_entries); int pblk_rl_user_may_insert(struct pblk_rl *rl, int nr_entries);
void pblk_rl_inserted(struct pblk_rl *rl, int nr_entries); void pblk_rl_inserted(struct pblk_rl *rl, int nr_entries);
@ -843,10 +849,9 @@ void pblk_rl_user_in(struct pblk_rl *rl, int nr_entries);
int pblk_rl_gc_may_insert(struct pblk_rl *rl, int nr_entries); int pblk_rl_gc_may_insert(struct pblk_rl *rl, int nr_entries);
void pblk_rl_gc_in(struct pblk_rl *rl, int nr_entries); void pblk_rl_gc_in(struct pblk_rl *rl, int nr_entries);
void pblk_rl_out(struct pblk_rl *rl, int nr_user, int nr_gc); void pblk_rl_out(struct pblk_rl *rl, int nr_user, int nr_gc);
int pblk_rl_sysfs_rate_show(struct pblk_rl *rl); int pblk_rl_max_io(struct pblk_rl *rl);
void pblk_rl_free_lines_inc(struct pblk_rl *rl, struct pblk_line *line); void pblk_rl_free_lines_inc(struct pblk_rl *rl, struct pblk_line *line);
void pblk_rl_free_lines_dec(struct pblk_rl *rl, struct pblk_line *line); void pblk_rl_free_lines_dec(struct pblk_rl *rl, struct pblk_line *line);
void pblk_rl_set_space_limit(struct pblk_rl *rl, int entries_left);
int pblk_rl_is_limit(struct pblk_rl *rl); int pblk_rl_is_limit(struct pblk_rl *rl);
/* /*
@ -892,13 +897,7 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
static inline int pblk_line_vsc(struct pblk_line *line) static inline int pblk_line_vsc(struct pblk_line *line)
{ {
int vsc; return le32_to_cpu(*line->vsc);
spin_lock(&line->lock);
vsc = le32_to_cpu(*line->vsc);
spin_unlock(&line->lock);
return vsc;
} }
#define NVM_MEM_PAGE_WRITE (8) #define NVM_MEM_PAGE_WRITE (8)
@ -1140,7 +1139,7 @@ static inline int pblk_set_progr_mode(struct pblk *pblk, int type)
flags = geo->plane_mode >> 1; flags = geo->plane_mode >> 1;
if (type == WRITE) if (type == PBLK_WRITE)
flags |= NVM_IO_SCRAMBLE_ENABLE; flags |= NVM_IO_SCRAMBLE_ENABLE;
return flags; return flags;
@ -1200,7 +1199,6 @@ static inline void pblk_print_failed_rqd(struct pblk *pblk, struct nvm_rq *rqd,
pr_err("error:%d, ppa_status:%llx\n", error, rqd->ppa_status); pr_err("error:%d, ppa_status:%llx\n", error, rqd->ppa_status);
} }
#endif
static inline int pblk_boundary_ppa_checks(struct nvm_tgt_dev *tgt_dev, static inline int pblk_boundary_ppa_checks(struct nvm_tgt_dev *tgt_dev,
struct ppa_addr *ppas, int nr_ppas) struct ppa_addr *ppas, int nr_ppas)
@ -1221,14 +1219,50 @@ static inline int pblk_boundary_ppa_checks(struct nvm_tgt_dev *tgt_dev,
ppa->g.sec < geo->sec_per_pg) ppa->g.sec < geo->sec_per_pg)
continue; continue;
#ifdef CONFIG_NVM_DEBUG
print_ppa(ppa, "boundary", i); print_ppa(ppa, "boundary", i);
#endif
return 1; return 1;
} }
return 0; return 0;
} }
static inline int pblk_check_io(struct pblk *pblk, struct nvm_rq *rqd)
{
struct nvm_tgt_dev *dev = pblk->dev;
struct ppa_addr *ppa_list;
ppa_list = (rqd->nr_ppas > 1) ? rqd->ppa_list : &rqd->ppa_addr;
if (pblk_boundary_ppa_checks(dev, ppa_list, rqd->nr_ppas)) {
WARN_ON(1);
return -EINVAL;
}
if (rqd->opcode == NVM_OP_PWRITE) {
struct pblk_line *line;
struct ppa_addr ppa;
int i;
for (i = 0; i < rqd->nr_ppas; i++) {
ppa = ppa_list[i];
line = &pblk->lines[pblk_dev_ppa_to_line(ppa)];
spin_lock(&line->lock);
if (line->state != PBLK_LINESTATE_OPEN) {
pr_err("pblk: bad ppa: line:%d,state:%d\n",
line->id, line->state);
WARN_ON(1);
spin_unlock(&line->lock);
return -EINVAL;
}
spin_unlock(&line->lock);
}
}
return 0;
}
#endif
static inline int pblk_boundary_paddr_checks(struct pblk *pblk, u64 paddr) static inline int pblk_boundary_paddr_checks(struct pblk *pblk, u64 paddr)
{ {
struct pblk_line_meta *lm = &pblk->lm; struct pblk_line_meta *lm = &pblk->lm;


@ -407,7 +407,8 @@ long bch_bucket_alloc(struct cache *ca, unsigned reserve, bool wait)
finish_wait(&ca->set->bucket_wait, &w); finish_wait(&ca->set->bucket_wait, &w);
out: out:
wake_up_process(ca->alloc_thread); if (ca->alloc_thread)
wake_up_process(ca->alloc_thread);
trace_bcache_alloc(ca, reserve); trace_bcache_alloc(ca, reserve);
@ -442,6 +443,11 @@ out:
b->prio = INITIAL_PRIO; b->prio = INITIAL_PRIO;
} }
if (ca->set->avail_nbuckets > 0) {
ca->set->avail_nbuckets--;
bch_update_bucket_in_use(ca->set, &ca->set->gc_stats);
}
return r; return r;
} }
@ -449,6 +455,11 @@ void __bch_bucket_free(struct cache *ca, struct bucket *b)
{ {
SET_GC_MARK(b, 0); SET_GC_MARK(b, 0);
SET_GC_SECTORS_USED(b, 0); SET_GC_SECTORS_USED(b, 0);
if (ca->set->avail_nbuckets < ca->set->nbuckets) {
ca->set->avail_nbuckets++;
bch_update_bucket_in_use(ca->set, &ca->set->gc_stats);
}
} }
void bch_bucket_free(struct cache_set *c, struct bkey *k) void bch_bucket_free(struct cache_set *c, struct bkey *k)
@ -601,7 +612,7 @@ bool bch_alloc_sectors(struct cache_set *c, struct bkey *k, unsigned sectors,
/* /*
* If we had to allocate, we might race and not need to allocate the * If we had to allocate, we might race and not need to allocate the
* second time we call find_data_bucket(). If we allocated a bucket but * second time we call pick_data_bucket(). If we allocated a bucket but
* didn't use it, drop the refcount bch_bucket_alloc_set() took: * didn't use it, drop the refcount bch_bucket_alloc_set() took:
*/ */
if (KEY_PTRS(&alloc.key)) if (KEY_PTRS(&alloc.key))
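
Note on the accounting hunks above: the allocation path decrements avail_nbuckets only while it is positive, and the free path increments it only while it is below nbuckets. A minimal userspace sketch of that clamping follows. struct bucket_counts and both helper names are hypothetical stand-ins, not kernel code, and the locking the kernel relies on is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two counters kept in struct cache_set. */
struct bucket_counts {
	size_t nbuckets;       /* total number of buckets */
	size_t avail_nbuckets; /* buckets currently counted as available */
};

/* Allocation side: never let the counter wrap below zero. */
static void counts_on_alloc(struct bucket_counts *c)
{
	if (c->avail_nbuckets > 0)
		c->avail_nbuckets--;
}

/* Free side: never let the counter exceed the total. */
static void counts_on_free(struct bucket_counts *c)
{
	if (c->avail_nbuckets < c->nbuckets)
		c->avail_nbuckets++;
}
```

Because avail_nbuckets is a size_t, an unguarded decrement at zero would wrap to a very large value, which is presumably what the guards are there to prevent.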


@ -185,6 +185,7 @@
#include <linux/mutex.h> #include <linux/mutex.h>
#include <linux/rbtree.h> #include <linux/rbtree.h>
#include <linux/rwsem.h> #include <linux/rwsem.h>
#include <linux/refcount.h>
#include <linux/types.h> #include <linux/types.h>
#include <linux/workqueue.h> #include <linux/workqueue.h>
@ -266,9 +267,6 @@ struct bcache_device {
atomic_t *stripe_sectors_dirty; atomic_t *stripe_sectors_dirty;
unsigned long *full_dirty_stripes; unsigned long *full_dirty_stripes;
unsigned long sectors_dirty_last;
long sectors_dirty_derivative;
struct bio_set *bio_split; struct bio_set *bio_split;
unsigned data_csum:1; unsigned data_csum:1;
@ -300,7 +298,7 @@ struct cached_dev {
struct semaphore sb_write_mutex; struct semaphore sb_write_mutex;
/* Refcount on the cache set. Always nonzero when we're caching. */ /* Refcount on the cache set. Always nonzero when we're caching. */
atomic_t count; refcount_t count;
struct work_struct detach; struct work_struct detach;
/* /*
@ -363,12 +361,14 @@ struct cached_dev {
uint64_t writeback_rate_target; uint64_t writeback_rate_target;
int64_t writeback_rate_proportional; int64_t writeback_rate_proportional;
int64_t writeback_rate_derivative; int64_t writeback_rate_integral;
int64_t writeback_rate_change; int64_t writeback_rate_integral_scaled;
int32_t writeback_rate_change;
unsigned writeback_rate_update_seconds; unsigned writeback_rate_update_seconds;
unsigned writeback_rate_d_term; unsigned writeback_rate_i_term_inverse;
unsigned writeback_rate_p_term_inverse; unsigned writeback_rate_p_term_inverse;
unsigned writeback_rate_minimum;
}; };
enum alloc_reserve { enum alloc_reserve {
@ -582,6 +582,7 @@ struct cache_set {
uint8_t need_gc; uint8_t need_gc;
struct gc_stat gc_stats; struct gc_stat gc_stats;
size_t nbuckets; size_t nbuckets;
size_t avail_nbuckets;
struct task_struct *gc_thread; struct task_struct *gc_thread;
/* Where in the btree gc currently is */ /* Where in the btree gc currently is */
@ -807,13 +808,13 @@ do { \
static inline void cached_dev_put(struct cached_dev *dc) static inline void cached_dev_put(struct cached_dev *dc)
{ {
if (atomic_dec_and_test(&dc->count)) if (refcount_dec_and_test(&dc->count))
schedule_work(&dc->detach); schedule_work(&dc->detach);
} }
static inline bool cached_dev_get(struct cached_dev *dc) static inline bool cached_dev_get(struct cached_dev *dc)
{ {
if (!atomic_inc_not_zero(&dc->count)) if (!refcount_inc_not_zero(&dc->count))
return false; return false;
/* Paired with the mb in cached_dev_attach */ /* Paired with the mb in cached_dev_attach */


@ -1241,6 +1241,11 @@ void bch_initial_mark_key(struct cache_set *c, int level, struct bkey *k)
__bch_btree_mark_key(c, level, k); __bch_btree_mark_key(c, level, k);
} }
void bch_update_bucket_in_use(struct cache_set *c, struct gc_stat *stats)
{
stats->in_use = (c->nbuckets - c->avail_nbuckets) * 100 / c->nbuckets;
}
static bool btree_gc_mark_node(struct btree *b, struct gc_stat *gc) static bool btree_gc_mark_node(struct btree *b, struct gc_stat *gc)
{ {
uint8_t stale = 0; uint8_t stale = 0;
@ -1652,9 +1657,8 @@ static void btree_gc_start(struct cache_set *c)
mutex_unlock(&c->bucket_lock); mutex_unlock(&c->bucket_lock);
} }
static size_t bch_btree_gc_finish(struct cache_set *c) static void bch_btree_gc_finish(struct cache_set *c)
{ {
size_t available = 0;
struct bucket *b; struct bucket *b;
struct cache *ca; struct cache *ca;
unsigned i; unsigned i;
@ -1691,6 +1695,7 @@ static size_t bch_btree_gc_finish(struct cache_set *c)
} }
rcu_read_unlock(); rcu_read_unlock();
c->avail_nbuckets = 0;
for_each_cache(ca, c, i) { for_each_cache(ca, c, i) {
uint64_t *i; uint64_t *i;
@ -1712,18 +1717,16 @@ static size_t bch_btree_gc_finish(struct cache_set *c)
BUG_ON(!GC_MARK(b) && GC_SECTORS_USED(b)); BUG_ON(!GC_MARK(b) && GC_SECTORS_USED(b));
if (!GC_MARK(b) || GC_MARK(b) == GC_MARK_RECLAIMABLE) if (!GC_MARK(b) || GC_MARK(b) == GC_MARK_RECLAIMABLE)
available++; c->avail_nbuckets++;
} }
} }
mutex_unlock(&c->bucket_lock); mutex_unlock(&c->bucket_lock);
return available;
} }
static void bch_btree_gc(struct cache_set *c) static void bch_btree_gc(struct cache_set *c)
{ {
int ret; int ret;
unsigned long available;
struct gc_stat stats; struct gc_stat stats;
struct closure writes; struct closure writes;
struct btree_op op; struct btree_op op;
@ -1746,14 +1749,14 @@ static void bch_btree_gc(struct cache_set *c)
pr_warn("gc failed!"); pr_warn("gc failed!");
} while (ret); } while (ret);
available = bch_btree_gc_finish(c); bch_btree_gc_finish(c);
wake_up_allocators(c); wake_up_allocators(c);
bch_time_stats_update(&c->btree_gc_time, start_time); bch_time_stats_update(&c->btree_gc_time, start_time);
stats.key_bytes *= sizeof(uint64_t); stats.key_bytes *= sizeof(uint64_t);
stats.data <<= 9; stats.data <<= 9;
stats.in_use = (c->nbuckets - available) * 100 / c->nbuckets; bch_update_bucket_in_use(c, &stats);
memcpy(&c->gc_stats, &stats, sizeof(struct gc_stat)); memcpy(&c->gc_stats, &stats, sizeof(struct gc_stat));
trace_bcache_gc_end(c); trace_bcache_gc_end(c);
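
Note on bch_update_bucket_in_use() above: the percentage is computed with integer arithmetic, so the result is truncated toward zero. The sketch below repeats the same formula in a standalone function. in_use_percent is a hypothetical name, and the sketch assumes nbuckets is non-zero and avail does not exceed nbuckets.

```c
#include <stddef.h>

/*
 * Percentage of buckets that are not available, using the same
 * expression as the helper in the diff: (total - avail) * 100 / total.
 * Integer division truncates, so 2 of 3 buckets in use yields 66.
 */
static unsigned int in_use_percent(size_t nbuckets, size_t avail)
{
	return (unsigned int)((nbuckets - avail) * 100 / nbuckets);
}
```

For example, 1000 buckets with 250 available gives 75, and 3 buckets with 1 available gives 66 rather than 67.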


@ -306,5 +306,5 @@ void bch_keybuf_del(struct keybuf *, struct keybuf_key *);
struct keybuf_key *bch_keybuf_next(struct keybuf *); struct keybuf_key *bch_keybuf_next(struct keybuf *);
struct keybuf_key *bch_keybuf_next_rescan(struct cache_set *, struct keybuf *, struct keybuf_key *bch_keybuf_next_rescan(struct cache_set *, struct keybuf *,
struct bkey *, keybuf_pred_fn *); struct bkey *, keybuf_pred_fn *);
void bch_update_bucket_in_use(struct cache_set *c, struct gc_stat *stats);
#endif #endif


@ -252,6 +252,12 @@ static inline void set_closure_fn(struct closure *cl, closure_fn *fn,
static inline void closure_queue(struct closure *cl) static inline void closure_queue(struct closure *cl)
{ {
struct workqueue_struct *wq = cl->wq; struct workqueue_struct *wq = cl->wq;
/**
* Changes made to closure, work_struct, or a couple of other structs
* may cause work.func not pointing to the right location.
*/
BUILD_BUG_ON(offsetof(struct closure, fn)
!= offsetof(struct work_struct, func));
if (wq) { if (wq) {
INIT_WORK(&cl->work, cl->work.func); INIT_WORK(&cl->work, cl->work.func);
BUG_ON(!queue_work(wq, &cl->work)); BUG_ON(!queue_work(wq, &cl->work));
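
Note on the BUILD_BUG_ON above: it turns a layout assumption (closure's fn sits at the same offset as work_struct's func) into a compile-time failure. A rough userspace analogue using C11 _Static_assert is sketched below. struct demo_work and struct demo_closure are hypothetical layouts invented for illustration; they are not the kernel definitions.

```c
#include <stddef.h>

typedef void (*work_fn)(void *);

/* Hypothetical stand-in for struct work_struct. */
struct demo_work {
	long data;
	void *entry[2];
	work_fn func;
};

/* Hypothetical stand-in for struct closure: fn is meant to alias work.func. */
struct demo_closure {
	union {
		struct {
			void *wq;    /* overlays data */
			void *owner; /* overlays entry[0] */
			void *next;  /* overlays entry[1] */
			work_fn fn;  /* must overlay func */
		};
		struct demo_work work;
	};
};

/* Compilation fails here if either struct changes and the offsets drift. */
_Static_assert(offsetof(struct demo_closure, fn) ==
	       offsetof(struct demo_work, func),
	       "fn must sit at the same offset as work.func");

static void demo_handler(void *arg) { (void)arg; }

/* Runtime view of the same property: a store to fn is visible as work.func. */
static int fn_aliases_func(void)
{
	struct demo_closure cl = {0};

	cl.fn = demo_handler;
	return cl.work.func == demo_handler;
}
```

The compile-time form is the useful one: a layout change is caught at build time instead of surfacing as a wrong function pointer at run time.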


@ -27,12 +27,12 @@ struct kmem_cache *bch_search_cache;
static void bch_data_insert_start(struct closure *); static void bch_data_insert_start(struct closure *);
static unsigned cache_mode(struct cached_dev *dc, struct bio *bio) static unsigned cache_mode(struct cached_dev *dc)
{ {
return BDEV_CACHE_MODE(&dc->sb); return BDEV_CACHE_MODE(&dc->sb);
} }
static bool verify(struct cached_dev *dc, struct bio *bio) static bool verify(struct cached_dev *dc)
{ {
return dc->verify; return dc->verify;
} }
@ -370,7 +370,7 @@ static struct hlist_head *iohash(struct cached_dev *dc, uint64_t k)
static bool check_should_bypass(struct cached_dev *dc, struct bio *bio) static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
{ {
struct cache_set *c = dc->disk.c; struct cache_set *c = dc->disk.c;
unsigned mode = cache_mode(dc, bio); unsigned mode = cache_mode(dc);
unsigned sectors, congested = bch_get_congested(c); unsigned sectors, congested = bch_get_congested(c);
struct task_struct *task = current; struct task_struct *task = current;
struct io *i; struct io *i;
@ -385,6 +385,14 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
op_is_write(bio_op(bio)))) op_is_write(bio_op(bio))))
goto skip; goto skip;
/*
* Flag for bypass if the IO is for read-ahead or background,
* unless the read-ahead request is for metadata (eg, for gfs2).
*/
if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
!(bio->bi_opf & REQ_META))
goto skip;
if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) || if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) ||
bio_sectors(bio) & (c->sb.block_size - 1)) { bio_sectors(bio) & (c->sb.block_size - 1)) {
pr_debug("skipping unaligned io"); pr_debug("skipping unaligned io");
@ -463,6 +471,7 @@ struct search {
unsigned recoverable:1; unsigned recoverable:1;
unsigned write:1; unsigned write:1;
unsigned read_dirty_data:1; unsigned read_dirty_data:1;
unsigned cache_missed:1;
unsigned long start_time; unsigned long start_time;
@ -649,6 +658,7 @@ static inline struct search *search_alloc(struct bio *bio,
s->orig_bio = bio; s->orig_bio = bio;
s->cache_miss = NULL; s->cache_miss = NULL;
s->cache_missed = 0;
s->d = d; s->d = d;
s->recoverable = 1; s->recoverable = 1;
s->write = op_is_write(bio_op(bio)); s->write = op_is_write(bio_op(bio));
@ -698,8 +708,16 @@ static void cached_dev_read_error(struct closure *cl)
{ {
struct search *s = container_of(cl, struct search, cl); struct search *s = container_of(cl, struct search, cl);
struct bio *bio = &s->bio.bio; struct bio *bio = &s->bio.bio;
struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
if (s->recoverable) { /*
* If cache device is dirty (dc->has_dirty is non-zero), then
* recovery a failed read request from cached device may get a
* stale data back. So read failure recovery is only permitted
* when cache device is clean.
*/
if (s->recoverable &&
(dc && !atomic_read(&dc->has_dirty))) {
/* Retry from the backing device: */ /* Retry from the backing device: */
trace_bcache_read_retry(s->orig_bio); trace_bcache_read_retry(s->orig_bio);
@ -740,7 +758,7 @@ static void cached_dev_read_done(struct closure *cl)
s->cache_miss = NULL; s->cache_miss = NULL;
} }
if (verify(dc, &s->bio.bio) && s->recoverable && !s->read_dirty_data) if (verify(dc) && s->recoverable && !s->read_dirty_data)
bch_data_verify(dc, s->orig_bio); bch_data_verify(dc, s->orig_bio);
bio_complete(s); bio_complete(s);
@ -760,12 +778,12 @@ static void cached_dev_read_done_bh(struct closure *cl)
struct cached_dev *dc = container_of(s->d, struct cached_dev, disk); struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
bch_mark_cache_accounting(s->iop.c, s->d, bch_mark_cache_accounting(s->iop.c, s->d,
!s->cache_miss, s->iop.bypass); !s->cache_missed, s->iop.bypass);
trace_bcache_read(s->orig_bio, !s->cache_miss, s->iop.bypass); trace_bcache_read(s->orig_bio, !s->cache_miss, s->iop.bypass);
if (s->iop.status) if (s->iop.status)
continue_at_nobarrier(cl, cached_dev_read_error, bcache_wq); continue_at_nobarrier(cl, cached_dev_read_error, bcache_wq);
else if (s->iop.bio || verify(dc, &s->bio.bio)) else if (s->iop.bio || verify(dc))
continue_at_nobarrier(cl, cached_dev_read_done, bcache_wq); continue_at_nobarrier(cl, cached_dev_read_done, bcache_wq);
else else
continue_at_nobarrier(cl, cached_dev_bio_complete, NULL); continue_at_nobarrier(cl, cached_dev_bio_complete, NULL);
@ -779,6 +797,8 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
struct cached_dev *dc = container_of(s->d, struct cached_dev, disk); struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
struct bio *miss, *cache_bio; struct bio *miss, *cache_bio;
s->cache_missed = 1;
if (s->cache_miss || s->iop.bypass) { if (s->cache_miss || s->iop.bypass) {
miss = bio_next_split(bio, sectors, GFP_NOIO, s->d->bio_split); miss = bio_next_split(bio, sectors, GFP_NOIO, s->d->bio_split);
ret = miss == bio ? MAP_DONE : MAP_CONTINUE; ret = miss == bio ? MAP_DONE : MAP_CONTINUE;
@ -892,7 +912,7 @@ static void cached_dev_write(struct cached_dev *dc, struct search *s)
s->iop.bypass = true; s->iop.bypass = true;
if (should_writeback(dc, s->orig_bio, if (should_writeback(dc, s->orig_bio,
cache_mode(dc, bio), cache_mode(dc),
s->iop.bypass)) { s->iop.bypass)) {
s->iop.bypass = false; s->iop.bypass = false;
s->iop.writeback = true; s->iop.writeback = true;
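
Note on the bypass check added to check_should_bypass() above: the condition is a plain bitmask test. The sketch below reproduces it with hypothetical DEMO_REQ_* bit values (the kernel's REQ_* flags are defined elsewhere and their values are not assumed here). As written in the diff, the metadata exemption applies to background I/O as well as read-ahead.

```c
#include <stdbool.h>

/* Hypothetical flag bits chosen for illustration only. */
#define DEMO_REQ_RAHEAD     (1u << 0)
#define DEMO_REQ_BACKGROUND (1u << 1)
#define DEMO_REQ_META       (1u << 2)

/*
 * True when the request is read-ahead or background I/O and is not
 * flagged as metadata, mirroring the condition in the diff.
 */
static bool bypass_for_hint(unsigned int opf)
{
	return (opf & (DEMO_REQ_RAHEAD | DEMO_REQ_BACKGROUND)) &&
	       !(opf & DEMO_REQ_META);
}
```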


@ -53,12 +53,15 @@ LIST_HEAD(bch_cache_sets);
static LIST_HEAD(uncached_devices); static LIST_HEAD(uncached_devices);
static int bcache_major; static int bcache_major;
static DEFINE_IDA(bcache_minor); static DEFINE_IDA(bcache_device_idx);
static wait_queue_head_t unregister_wait; static wait_queue_head_t unregister_wait;
struct workqueue_struct *bcache_wq; struct workqueue_struct *bcache_wq;
#define BTREE_MAX_PAGES (256 * 1024 / PAGE_SIZE) #define BTREE_MAX_PAGES (256 * 1024 / PAGE_SIZE)
#define BCACHE_MINORS 16 /* partition support */ /* limitation of partitions number on single bcache device */
#define BCACHE_MINORS 128
/* limitation of bcache devices number on single system */
#define BCACHE_DEVICE_IDX_MAX ((1U << MINORBITS)/BCACHE_MINORS)
/* Superblock */ /* Superblock */
@ -721,6 +724,16 @@ static void bcache_device_attach(struct bcache_device *d, struct cache_set *c,
closure_get(&c->caching); closure_get(&c->caching);
} }
static inline int first_minor_to_idx(int first_minor)
{
return (first_minor/BCACHE_MINORS);
}
static inline int idx_to_first_minor(int idx)
{
return (idx * BCACHE_MINORS);
}
static void bcache_device_free(struct bcache_device *d) static void bcache_device_free(struct bcache_device *d)
{ {
lockdep_assert_held(&bch_register_lock); lockdep_assert_held(&bch_register_lock);
@ -734,7 +747,8 @@ static void bcache_device_free(struct bcache_device *d)
if (d->disk && d->disk->queue) if (d->disk && d->disk->queue)
blk_cleanup_queue(d->disk->queue); blk_cleanup_queue(d->disk->queue);
if (d->disk) { if (d->disk) {
ida_simple_remove(&bcache_minor, d->disk->first_minor); ida_simple_remove(&bcache_device_idx,
first_minor_to_idx(d->disk->first_minor));
put_disk(d->disk); put_disk(d->disk);
} }
@ -751,7 +765,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
{ {
struct request_queue *q; struct request_queue *q;
size_t n; size_t n;
int minor; int idx;
if (!d->stripe_size) if (!d->stripe_size)
d->stripe_size = 1 << 31; d->stripe_size = 1 << 31;
@ -776,25 +790,24 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
if (!d->full_dirty_stripes) if (!d->full_dirty_stripes)
return -ENOMEM; return -ENOMEM;
minor = ida_simple_get(&bcache_minor, 0, MINORMASK + 1, GFP_KERNEL); idx = ida_simple_get(&bcache_device_idx, 0,
if (minor < 0) BCACHE_DEVICE_IDX_MAX, GFP_KERNEL);
return minor; if (idx < 0)
return idx;
minor *= BCACHE_MINORS;
if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio), if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio),
BIOSET_NEED_BVECS | BIOSET_NEED_BVECS |
BIOSET_NEED_RESCUER)) || BIOSET_NEED_RESCUER)) ||
!(d->disk = alloc_disk(BCACHE_MINORS))) { !(d->disk = alloc_disk(BCACHE_MINORS))) {
ida_simple_remove(&bcache_minor, minor); ida_simple_remove(&bcache_device_idx, idx);
return -ENOMEM; return -ENOMEM;
} }
set_capacity(d->disk, sectors); set_capacity(d->disk, sectors);
snprintf(d->disk->disk_name, DISK_NAME_LEN, "bcache%i", minor); snprintf(d->disk->disk_name, DISK_NAME_LEN, "bcache%i", idx);
d->disk->major = bcache_major; d->disk->major = bcache_major;
d->disk->first_minor = minor; d->disk->first_minor = idx_to_first_minor(idx);
d->disk->fops = &bcache_ops; d->disk->fops = &bcache_ops;
d->disk->private_data = d; d->disk->private_data = d;
@ -889,7 +902,7 @@ static void cached_dev_detach_finish(struct work_struct *w)
closure_init_stack(&cl); closure_init_stack(&cl);
BUG_ON(!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)); BUG_ON(!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags));
BUG_ON(atomic_read(&dc->count)); BUG_ON(refcount_read(&dc->count));
mutex_lock(&bch_register_lock); mutex_lock(&bch_register_lock);
@ -1016,7 +1029,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
* dc->c must be set before dc->count != 0 - paired with the mb in * dc->c must be set before dc->count != 0 - paired with the mb in
* cached_dev_get() * cached_dev_get()
*/ */
atomic_set(&dc->count, 1); refcount_set(&dc->count, 1);
/* Block writeback thread, but spawn it */ /* Block writeback thread, but spawn it */
down_write(&dc->writeback_lock); down_write(&dc->writeback_lock);
@ -1028,7 +1041,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) { if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
bch_sectors_dirty_init(&dc->disk); bch_sectors_dirty_init(&dc->disk);
atomic_set(&dc->has_dirty, 1); atomic_set(&dc->has_dirty, 1);
atomic_inc(&dc->count); refcount_inc(&dc->count);
bch_writeback_queue(dc); bch_writeback_queue(dc);
} }
@ -1129,9 +1142,6 @@ static int cached_dev_init(struct cached_dev *dc, unsigned block_size)
if (ret) if (ret)
return ret; return ret;
set_capacity(dc->disk.disk,
dc->bdev->bd_part->nr_sects - dc->sb.data_offset);
dc->disk.disk->queue->backing_dev_info->ra_pages = dc->disk.disk->queue->backing_dev_info->ra_pages =
max(dc->disk.disk->queue->backing_dev_info->ra_pages, max(dc->disk.disk->queue->backing_dev_info->ra_pages,
q->backing_dev_info->ra_pages); q->backing_dev_info->ra_pages);
@ -2085,6 +2095,7 @@ static void bcache_exit(void)
if (bcache_major) if (bcache_major)
unregister_blkdev(bcache_major, "bcache"); unregister_blkdev(bcache_major, "bcache");
unregister_reboot_notifier(&reboot); unregister_reboot_notifier(&reboot);
mutex_destroy(&bch_register_lock);
} }
static int __init bcache_init(void) static int __init bcache_init(void)
@ -2103,14 +2114,15 @@ static int __init bcache_init(void)
bcache_major = register_blkdev(0, "bcache"); bcache_major = register_blkdev(0, "bcache");
if (bcache_major < 0) { if (bcache_major < 0) {
unregister_reboot_notifier(&reboot); unregister_reboot_notifier(&reboot);
mutex_destroy(&bch_register_lock);
return bcache_major; return bcache_major;
} }
if (!(bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM, 0)) || if (!(bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM, 0)) ||
!(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) || !(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) ||
sysfs_create_files(bcache_kobj, files) ||
bch_request_init() || bch_request_init() ||
bch_debug_init(bcache_kobj)) bch_debug_init(bcache_kobj) ||
sysfs_create_files(bcache_kobj, files))
goto err; goto err;
return 0; return 0;
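
Note on the device index hunks above: the IDA now hands out a device index, and the first minor number is derived from it by multiplying by BCACHE_MINORS. The sketch below works through that arithmetic in userspace. The DEMO_* names are hypothetical, and DEMO_MINORBITS is set to 20 on the assumption that it matches the kernel's MINORBITS.

```c
/* Assumption: mirrors the kernel's MINORBITS. */
#define DEMO_MINORBITS      20
/* Minor numbers reserved per bcache device, as in the diff. */
#define DEMO_BCACHE_MINORS  128
/* Upper bound on device indexes: (1 << 20) / 128 = 8192. */
#define DEMO_DEVICE_IDX_MAX ((1U << DEMO_MINORBITS) / DEMO_BCACHE_MINORS)

/* Same arithmetic as first_minor_to_idx() in the diff. */
static int demo_first_minor_to_idx(int first_minor)
{
	return first_minor / DEMO_BCACHE_MINORS;
}

/* Same arithmetic as idx_to_first_minor() in the diff. */
static int demo_idx_to_first_minor(int idx)
{
	return idx * DEMO_BCACHE_MINORS;
}
```

Under these assumptions index 3 maps to first minor 384, and the highest index, 8191, maps to first minor 1048448, which still fits below 1 << 20. The disk name in the diff is formatted from the index, not from the scaled minor.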


@ -82,8 +82,9 @@ rw_attribute(writeback_delay);
rw_attribute(writeback_rate); rw_attribute(writeback_rate);
rw_attribute(writeback_rate_update_seconds); rw_attribute(writeback_rate_update_seconds);
rw_attribute(writeback_rate_d_term); rw_attribute(writeback_rate_i_term_inverse);
rw_attribute(writeback_rate_p_term_inverse); rw_attribute(writeback_rate_p_term_inverse);
rw_attribute(writeback_rate_minimum);
read_attribute(writeback_rate_debug); read_attribute(writeback_rate_debug);
read_attribute(stripe_size); read_attribute(stripe_size);
@ -131,15 +132,16 @@ SHOW(__bch_cached_dev)
sysfs_hprint(writeback_rate, dc->writeback_rate.rate << 9); sysfs_hprint(writeback_rate, dc->writeback_rate.rate << 9);
var_print(writeback_rate_update_seconds); var_print(writeback_rate_update_seconds);
var_print(writeback_rate_d_term); var_print(writeback_rate_i_term_inverse);
var_print(writeback_rate_p_term_inverse); var_print(writeback_rate_p_term_inverse);
var_print(writeback_rate_minimum);
if (attr == &sysfs_writeback_rate_debug) { if (attr == &sysfs_writeback_rate_debug) {
char rate[20]; char rate[20];
char dirty[20]; char dirty[20];
char target[20]; char target[20];
char proportional[20]; char proportional[20];
char derivative[20]; char integral[20];
char change[20]; char change[20];
s64 next_io; s64 next_io;
@ -147,7 +149,7 @@ SHOW(__bch_cached_dev)
bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9); bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9);
bch_hprint(target, dc->writeback_rate_target << 9); bch_hprint(target, dc->writeback_rate_target << 9);
bch_hprint(proportional,dc->writeback_rate_proportional << 9); bch_hprint(proportional,dc->writeback_rate_proportional << 9);
bch_hprint(derivative, dc->writeback_rate_derivative << 9); bch_hprint(integral, dc->writeback_rate_integral_scaled << 9);
bch_hprint(change, dc->writeback_rate_change << 9); bch_hprint(change, dc->writeback_rate_change << 9);
next_io = div64_s64(dc->writeback_rate.next - local_clock(), next_io = div64_s64(dc->writeback_rate.next - local_clock(),
@ -158,11 +160,11 @@ SHOW(__bch_cached_dev)
"dirty:\t\t%s\n" "dirty:\t\t%s\n"
"target:\t\t%s\n" "target:\t\t%s\n"
"proportional:\t%s\n" "proportional:\t%s\n"
"derivative:\t%s\n" "integral:\t%s\n"
"change:\t\t%s/sec\n" "change:\t\t%s/sec\n"
"next io:\t%llims\n", "next io:\t%llims\n",
rate, dirty, target, proportional, rate, dirty, target, proportional,
derivative, change, next_io); integral, change, next_io);
} }
sysfs_hprint(dirty_data, sysfs_hprint(dirty_data,
@ -214,7 +216,7 @@ STORE(__cached_dev)
dc->writeback_rate.rate, 1, INT_MAX); dc->writeback_rate.rate, 1, INT_MAX);
d_strtoul_nonzero(writeback_rate_update_seconds); d_strtoul_nonzero(writeback_rate_update_seconds);
d_strtoul(writeback_rate_d_term); d_strtoul(writeback_rate_i_term_inverse);
d_strtoul_nonzero(writeback_rate_p_term_inverse); d_strtoul_nonzero(writeback_rate_p_term_inverse);
d_strtoi_h(sequential_cutoff); d_strtoi_h(sequential_cutoff);
@ -320,7 +322,7 @@ static struct attribute *bch_cached_dev_files[] = {
&sysfs_writeback_percent, &sysfs_writeback_percent,
&sysfs_writeback_rate, &sysfs_writeback_rate,
&sysfs_writeback_rate_update_seconds, &sysfs_writeback_rate_update_seconds,
&sysfs_writeback_rate_d_term, &sysfs_writeback_rate_i_term_inverse,
&sysfs_writeback_rate_p_term_inverse, &sysfs_writeback_rate_p_term_inverse,
&sysfs_writeback_rate_debug, &sysfs_writeback_rate_debug,
&sysfs_dirty_data, &sysfs_dirty_data,
@ -746,6 +748,11 @@ static struct attribute *bch_cache_set_internal_files[] = {
}; };
KTYPE(bch_cache_set_internal); KTYPE(bch_cache_set_internal);
static int __bch_cache_cmp(const void *l, const void *r)
{
return *((uint16_t *)r) - *((uint16_t *)l);
}
SHOW(__bch_cache) SHOW(__bch_cache)
{ {
struct cache *ca = container_of(kobj, struct cache, kobj); struct cache *ca = container_of(kobj, struct cache, kobj);
@ -770,9 +777,6 @@ SHOW(__bch_cache)
CACHE_REPLACEMENT(&ca->sb)); CACHE_REPLACEMENT(&ca->sb));
if (attr == &sysfs_priority_stats) { if (attr == &sysfs_priority_stats) {
int cmp(const void *l, const void *r)
{ return *((uint16_t *) r) - *((uint16_t *) l); }
struct bucket *b; struct bucket *b;
size_t n = ca->sb.nbuckets, i; size_t n = ca->sb.nbuckets, i;
size_t unused = 0, available = 0, dirty = 0, meta = 0; size_t unused = 0, available = 0, dirty = 0, meta = 0;
@ -801,7 +805,7 @@ SHOW(__bch_cache)
p[i] = ca->buckets[i].prio; p[i] = ca->buckets[i].prio;
mutex_unlock(&ca->set->bucket_lock); mutex_unlock(&ca->set->bucket_lock);
sort(p, n, sizeof(uint16_t), cmp, NULL); sort(p, n, sizeof(uint16_t), __bch_cache_cmp, NULL);
while (n && while (n &&
!cached[n - 1]) !cached[n - 1])
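The hunk above hoists a nested comparator (a GCC extension) to file scope as `__bch_cache_cmp`. A minimal userspace sketch of the same descending `uint16_t` comparison follows. It assumes the C library's `qsort()` in place of the kernel's `sort()`, and the names `prio_cmp_desc` and `sort_prios_desc` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * File-scope comparator, descending order. Both operands are promoted
 * to int before the subtraction, so the difference cannot overflow.
 */
static int prio_cmp_desc(const void *l, const void *r)
{
	return *((const uint16_t *)r) - *((const uint16_t *)l);
}

/* Hypothetical helper: sort an array of bucket priorities, highest first. */
static void sort_prios_desc(uint16_t *p, size_t n)
{
	qsort(p, n, sizeof(uint16_t), prio_cmp_desc);
}
```

A file-scope function compiles with any conforming C compiler, whereas a function defined inside another function needs GCC's nested-function extension.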


@ -232,8 +232,14 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
 	d->next += div_u64(done * NSEC_PER_SEC, d->rate);
-	if (time_before64(now + NSEC_PER_SEC, d->next))
-		d->next = now + NSEC_PER_SEC;
+	/* Bound the time. Don't let us fall further than 2 seconds behind
+	 * (this prevents unnecessary backlog that would make it impossible
+	 * to catch up). If we're ahead of the desired writeback rate,
+	 * don't let us sleep more than 2.5 seconds (so we can notice/respond
+	 * if the control system tells us to speed up!).
+	 */
+	if (time_before64(now + NSEC_PER_SEC * 5LLU / 2LLU, d->next))
+		d->next = now + NSEC_PER_SEC * 5LLU / 2LLU;
 	if (time_after64(now - NSEC_PER_SEC * 2, d->next))
 		d->next = now - NSEC_PER_SEC * 2;
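The bounding rule in this hunk can be tried outside the kernel. The sketch below is an illustration under assumptions, not the kernel code: `before64` mimics the signed-difference comparison of `time_before64()`, and `bound_next` is a hypothetical name for the two clamps applied to `d->next`.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Wrap-safe "a is earlier than b", in the spirit of time_before64(). */
static int before64(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Keep the next deadline within [now - 2 s, now + 2.5 s]. */
static uint64_t bound_next(uint64_t now, uint64_t next)
{
	const uint64_t max_ahead = NSEC_PER_SEC * 5ULL / 2ULL; /* 2.5 s */
	const uint64_t max_behind = NSEC_PER_SEC * 2ULL;        /* 2 s */

	if (before64(now + max_ahead, next))
		next = now + max_ahead;
	if (before64(next, now - max_behind))
		next = now - max_behind;
	return next;
}
```

With `now` at 10 s, a deadline of 20 s is pulled back to 12.5 s and a deadline of 1 s is pushed forward to 8 s. A deadline already inside the window is left alone.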


@ -442,10 +442,10 @@ struct bch_ratelimit {
 	uint64_t next;
 	/*
-	 * Rate at which we want to do work, in units per nanosecond
+	 * Rate at which we want to do work, in units per second
 	 * The units here correspond to the units passed to bch_next_delay()
 	 */
-	unsigned rate;
+	uint32_t rate;
 };
 static inline void bch_ratelimit_reset(struct bch_ratelimit *d)


@ -26,48 +26,63 @@ static void __update_writeback_rate(struct cached_dev *dc)
 				bcache_flash_devs_sectors_dirty(c);
 	uint64_t cache_dirty_target =
 		div_u64(cache_sectors * dc->writeback_percent, 100);
 	int64_t target = div64_u64(cache_dirty_target * bdev_sectors(dc->bdev),
 				   c->cached_dev_sectors);
-	/* PD controller */
+	/*
+	 * PI controller:
+	 * Figures out the amount that should be written per second.
+	 *
+	 * First, the error (number of sectors that are dirty beyond our
+	 * target) is calculated. The error is accumulated (numerically
+	 * integrated).
+	 *
+	 * Then, the proportional value and integral value are scaled
+	 * based on configured values. These are stored as inverses to
+	 * avoid fixed point math and to make configuration easy-- e.g.
+	 * the default value of 40 for writeback_rate_p_term_inverse
+	 * attempts to write at a rate that would retire all the dirty
+	 * blocks in 40 seconds.
+	 *
+	 * The writeback_rate_i_inverse value of 10000 means that 1/10000th
+	 * of the error is accumulated in the integral term per second.
+	 * This acts as a slow, long-term average that is not subject to
+	 * variations in usage like the p term.
+	 */
 	int64_t dirty = bcache_dev_sectors_dirty(&dc->disk);
-	int64_t derivative = dirty - dc->disk.sectors_dirty_last;
-	int64_t proportional = dirty - target;
-	int64_t change;
-	dc->disk.sectors_dirty_last = dirty;
-	/* Scale to sectors per second */
-	proportional *= dc->writeback_rate_update_seconds;
-	proportional = div_s64(proportional, dc->writeback_rate_p_term_inverse);
-	derivative = div_s64(derivative, dc->writeback_rate_update_seconds);
-	derivative = ewma_add(dc->disk.sectors_dirty_derivative, derivative,
-			      (dc->writeback_rate_d_term /
-			       dc->writeback_rate_update_seconds) ?: 1, 0);
-	derivative *= dc->writeback_rate_d_term;
-	derivative = div_s64(derivative, dc->writeback_rate_p_term_inverse);
-	change = proportional + derivative;
-	/* Don't increase writeback rate if the device isn't keeping up */
-	if (change > 0 &&
-	    time_after64(local_clock(),
-			 dc->writeback_rate.next + NSEC_PER_MSEC))
-		change = 0;
-	dc->writeback_rate.rate =
-		clamp_t(int64_t, (int64_t) dc->writeback_rate.rate + change,
-			1, NSEC_PER_MSEC);
-	dc->writeback_rate_proportional = proportional;
-	dc->writeback_rate_derivative = derivative;
-	dc->writeback_rate_change = change;
+	int64_t error = dirty - target;
+	int64_t proportional_scaled =
+		div_s64(error, dc->writeback_rate_p_term_inverse);
+	int64_t integral_scaled;
+	uint32_t new_rate;
+	if ((error < 0 && dc->writeback_rate_integral > 0) ||
+	    (error > 0 && time_before64(local_clock(),
+			 dc->writeback_rate.next + NSEC_PER_MSEC))) {
+		/*
+		 * Only decrease the integral term if it's more than
+		 * zero. Only increase the integral term if the device
+		 * is keeping up. (Don't wind up the integral
+		 * ineffectively in either case).
+		 *
+		 * It's necessary to scale this by
+		 * writeback_rate_update_seconds to keep the integral
+		 * term dimensioned properly.
+		 */
+		dc->writeback_rate_integral += error *
+			dc->writeback_rate_update_seconds;
+	}
+	integral_scaled = div_s64(dc->writeback_rate_integral,
+			dc->writeback_rate_i_term_inverse);
+	new_rate = clamp_t(int32_t, (proportional_scaled + integral_scaled),
+			dc->writeback_rate_minimum, NSEC_PER_SEC);
+	dc->writeback_rate_proportional = proportional_scaled;
+	dc->writeback_rate_integral_scaled = integral_scaled;
+	dc->writeback_rate_change = new_rate - dc->writeback_rate.rate;
+	dc->writeback_rate.rate = new_rate;
 	dc->writeback_rate_target = target;
 }
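To make the new controller easier to reason about, here is a userspace sketch of one PI update step. It is an illustration under stated assumptions, not the kernel implementation: `struct pi_state`, `pi_update`, `clamp64` and the `keeping_up` flag are hypothetical stand-ins (the flag replaces the `time_before64()` check against `writeback_rate.next`), plain integer division replaces `div_s64()`, and the clamp is done in 64 bits rather than with `clamp_t(int32_t, ...)`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached_dev fields the hunk touches. */
struct pi_state {
	int64_t  integral;        /* accumulated error, in sector-seconds */
	uint32_t rate;            /* current rate, sectors per second */
	uint32_t update_seconds;  /* controller period (5 in the patch) */
	uint32_t p_term_inverse;  /* 40 by default in the patch */
	uint32_t i_term_inverse;  /* 10000 by default in the patch */
	uint32_t rate_minimum;    /* 8 by default in the patch */
};

#define RATE_MAX 1000000000LL /* mirrors the NSEC_PER_SEC upper clamp */

static int64_t clamp64(int64_t v, int64_t lo, int64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* One controller step; returns the new rate in sectors per second. */
static uint32_t pi_update(struct pi_state *s, int64_t dirty, int64_t target,
			  int keeping_up)
{
	int64_t error = dirty - target;
	int64_t proportional_scaled;
	int64_t integral_scaled;

	assert(s->p_term_inverse != 0 && s->i_term_inverse != 0);
	proportional_scaled = error / (int64_t)s->p_term_inverse;

	/*
	 * Anti-windup: shrink the integral only while it is positive,
	 * grow it only while the device keeps up with the current rate.
	 */
	if ((error < 0 && s->integral > 0) || (error > 0 && keeping_up))
		s->integral += error * (int64_t)s->update_seconds;

	integral_scaled = s->integral / (int64_t)s->i_term_inverse;
	s->rate = (uint32_t)clamp64(proportional_scaled + integral_scaled,
				    s->rate_minimum, RATE_MAX);
	return s->rate;
}
```

With the patch defaults and 400000 dirty sectors above a target of 0, the proportional term contributes 400000 / 40 = 10000 sectors per second. The integral term contributes (400000 * 5) / 10000 = 200, so the first step yields a rate of 10200.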
@ -180,13 +195,21 @@ static void write_dirty(struct closure *cl)
 	struct dirty_io *io = container_of(cl, struct dirty_io, cl);
 	struct keybuf_key *w = io->bio.bi_private;
-	dirty_init(w);
-	bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
-	io->bio.bi_iter.bi_sector = KEY_START(&w->key);
-	bio_set_dev(&io->bio, io->dc->bdev);
-	io->bio.bi_end_io = dirty_endio;
-	closure_bio_submit(&io->bio, cl);
+	/*
+	 * IO errors are signalled using the dirty bit on the key.
+	 * If we failed to read, we should not attempt to write to the
+	 * backing device. Instead, immediately go to write_dirty_finish
+	 * to clean up.
+	 */
+	if (KEY_DIRTY(&w->key)) {
+		dirty_init(w);
+		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
+		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
+		bio_set_dev(&io->bio, io->dc->bdev);
+		io->bio.bi_end_io = dirty_endio;
+		closure_bio_submit(&io->bio, cl);
+	}
 	continue_at(cl, write_dirty_finish, io->dc->writeback_write_wq);
 }
@ -418,6 +441,8 @@ static int bch_writeback_thread(void *arg)
struct cached_dev *dc = arg; struct cached_dev *dc = arg;
bool searched_full_index; bool searched_full_index;
bch_ratelimit_reset(&dc->writeback_rate);
while (!kthread_should_stop()) { while (!kthread_should_stop()) {
down_write(&dc->writeback_lock); down_write(&dc->writeback_lock);
if (!atomic_read(&dc->has_dirty) || if (!atomic_read(&dc->has_dirty) ||
@ -445,7 +470,6 @@ static int bch_writeback_thread(void *arg)
up_write(&dc->writeback_lock); up_write(&dc->writeback_lock);
bch_ratelimit_reset(&dc->writeback_rate);
read_dirty(dc); read_dirty(dc);
if (searched_full_index) { if (searched_full_index) {
@ -455,6 +479,8 @@ static int bch_writeback_thread(void *arg)
!kthread_should_stop() && !kthread_should_stop() &&
!test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags))
delay = schedule_timeout_interruptible(delay); delay = schedule_timeout_interruptible(delay);
bch_ratelimit_reset(&dc->writeback_rate);
} }
} }
@ -492,8 +518,6 @@ void bch_sectors_dirty_init(struct bcache_device *d)
bch_btree_map_keys(&op.op, d->c, &KEY(op.inode, 0, 0), bch_btree_map_keys(&op.op, d->c, &KEY(op.inode, 0, 0),
sectors_dirty_init_fn, 0); sectors_dirty_init_fn, 0);
d->sectors_dirty_last = bcache_dev_sectors_dirty(d);
} }
void bch_cached_dev_writeback_init(struct cached_dev *dc) void bch_cached_dev_writeback_init(struct cached_dev *dc)
@ -507,10 +531,11 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
dc->writeback_percent = 10; dc->writeback_percent = 10;
dc->writeback_delay = 30; dc->writeback_delay = 30;
dc->writeback_rate.rate = 1024; dc->writeback_rate.rate = 1024;
dc->writeback_rate_minimum = 8;
dc->writeback_rate_update_seconds = 5; dc->writeback_rate_update_seconds = 5;
dc->writeback_rate_d_term = 30; dc->writeback_rate_p_term_inverse = 40;
dc->writeback_rate_p_term_inverse = 6000; dc->writeback_rate_i_term_inverse = 10000;
INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate); INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate);
} }


@ -77,7 +77,9 @@ static inline bool should_writeback(struct cached_dev *dc, struct bio *bio,
 	if (would_skip)
 		return false;
-	return op_is_sync(bio->bi_opf) || in_use <= CUTOFF_WRITEBACK;
+	return (op_is_sync(bio->bi_opf) ||
+		bio->bi_opf & (REQ_META|REQ_PRIO) ||
+		in_use <= CUTOFF_WRITEBACK);
 }
static inline void bch_writeback_queue(struct cached_dev *dc) static inline void bch_writeback_queue(struct cached_dev *dc)
@ -90,7 +92,7 @@ static inline void bch_writeback_add(struct cached_dev *dc)
{ {
if (!atomic_read(&dc->has_dirty) && if (!atomic_read(&dc->has_dirty) &&
!atomic_xchg(&dc->has_dirty, 1)) { !atomic_xchg(&dc->has_dirty, 1)) {
atomic_inc(&dc->count); refcount_inc(&dc->count);
if (BDEV_STATE(&dc->sb) != BDEV_STATE_DIRTY) { if (BDEV_STATE(&dc->sb) != BDEV_STATE_DIRTY) {
SET_BDEV_STATE(&dc->sb, BDEV_STATE_DIRTY); SET_BDEV_STATE(&dc->sb, BDEV_STATE_DIRTY);
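The `should_writeback()` change in this file adds metadata and priority I/O to the cases that go to the cache whatever its fill level. The sketch below models only that final predicate. The `X_REQ_*` bit values and `X_CUTOFF_WRITEBACK` are assumptions made for illustration and do not match the kernel's definitions, and a single sync bit stands in for `op_is_sync()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's REQ_* values differ. */
enum {
	X_REQ_SYNC = 1u << 0,
	X_REQ_META = 1u << 1,
	X_REQ_PRIO = 1u << 2,
};

/* Assumed threshold: percent of the cache in use. */
#define X_CUTOFF_WRITEBACK 40u

/* Writeback-cache the request if it is sync, metadata/priority, or the cache has room. */
static bool should_writeback_sketch(unsigned int opf, unsigned int in_use)
{
	return (opf & X_REQ_SYNC) ||
	       (opf & (X_REQ_META | X_REQ_PRIO)) ||
	       in_use <= X_CUTOFF_WRITEBACK;
}
```

Under these assumptions a plain asynchronous write at 90% cache usage bypasses writeback caching, while the same write flagged as metadata does not.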


@ -368,7 +368,7 @@ static int read_page(struct file *file, unsigned long index,
pr_debug("read bitmap file (%dB @ %llu)\n", (int)PAGE_SIZE, pr_debug("read bitmap file (%dB @ %llu)\n", (int)PAGE_SIZE,
(unsigned long long)index << PAGE_SHIFT); (unsigned long long)index << PAGE_SHIFT);
bh = alloc_page_buffers(page, 1<<inode->i_blkbits, 0); bh = alloc_page_buffers(page, 1<<inode->i_blkbits, false);
if (!bh) { if (!bh) {
ret = -ENOMEM; ret = -ENOMEM;
goto out; goto out;


@ -56,7 +56,7 @@ static unsigned dm_get_blk_mq_queue_depth(void)
int dm_request_based(struct mapped_device *md) int dm_request_based(struct mapped_device *md)
{ {
return blk_queue_stackable(md->queue); return queue_is_rq_based(md->queue);
} }
static void dm_old_start_queue(struct request_queue *q) static void dm_old_start_queue(struct request_queue *q)


@ -1000,7 +1000,7 @@ verify_rq_based:
list_for_each_entry(dd, devices, list) { list_for_each_entry(dd, devices, list) {
struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev); struct request_queue *q = bdev_get_queue(dd->dm_dev->bdev);
if (!blk_queue_stackable(q)) { if (!queue_is_rq_based(q)) {
DMERR("table load rejected: including" DMERR("table load rejected: including"
" non-request-stackable devices"); " non-request-stackable devices");
return -EINVAL; return -EINVAL;
@ -1847,19 +1847,6 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
*/ */
if (blk_queue_add_random(q) && dm_table_all_devices_attribute(t, device_is_not_random)) if (blk_queue_add_random(q) && dm_table_all_devices_attribute(t, device_is_not_random))
queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q); queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q);
/*
* QUEUE_FLAG_STACKABLE must be set after all queue settings are
* visible to other CPUs because, once the flag is set, incoming bios
* are processed by request-based dm, which refers to the queue
* settings.
* Until the flag set, bios are passed to bio-based dm and queued to
* md->deferred where queue settings are not needed yet.
* Those bios are passed to request-based dm at the resume time.
*/
smp_mb();
if (dm_table_request_based(t))
queue_flag_set_unlocked(QUEUE_FLAG_STACKABLE, q);
} }
unsigned int dm_table_get_num_targets(struct dm_table *t) unsigned int dm_table_get_num_targets(struct dm_table *t)


@ -1618,17 +1618,6 @@ static void dm_wq_work(struct work_struct *work);
void dm_init_md_queue(struct mapped_device *md) void dm_init_md_queue(struct mapped_device *md)
{ {
/*
* Request-based dm devices cannot be stacked on top of bio-based dm
* devices. The type of this dm device may not have been decided yet.
* The type is decided at the first table loading time.
* To prevent problematic device stacking, clear the queue flag
* for request stacking support until then.
*
* This queue is new, so no concurrency on the queue_flags.
*/
queue_flag_clear_unlocked(QUEUE_FLAG_STACKABLE, md->queue);
/* /*
* Initialize data that will only be used by a non-blk-mq DM queue * Initialize data that will only be used by a non-blk-mq DM queue
* - must do so here (in alloc_dev callchain) before queue is used * - must do so here (in alloc_dev callchain) before queue is used


@ -1,2 +1,6 @@
menu "NVME Support"
source "drivers/nvme/host/Kconfig" source "drivers/nvme/host/Kconfig"
source "drivers/nvme/target/Kconfig" source "drivers/nvme/target/Kconfig"
endmenu


@ -13,6 +13,15 @@ config BLK_DEV_NVME
To compile this driver as a module, choose M here: the To compile this driver as a module, choose M here: the
module will be called nvme. module will be called nvme.
config NVME_MULTIPATH
bool "NVMe multipath support"
depends on NVME_CORE
---help---
This option enables support for multipath access to NVMe
subsystems. If this option is enabled only a single
/dev/nvmeXnY device will show up for each NVMe namespaces,
even if it is accessible through multiple controllers.
config NVME_FABRICS config NVME_FABRICS
tristate tristate


@ -6,6 +6,7 @@ obj-$(CONFIG_NVME_RDMA) += nvme-rdma.o
obj-$(CONFIG_NVME_FC) += nvme-fc.o obj-$(CONFIG_NVME_FC) += nvme-fc.o
nvme-core-y := core.o nvme-core-y := core.o
nvme-core-$(CONFIG_NVME_MULTIPATH) += multipath.o
nvme-core-$(CONFIG_NVM) += lightnvm.o nvme-core-$(CONFIG_NVM) += lightnvm.o
nvme-y += pci.o nvme-y += pci.o

File diff suppressed because it is too large


@ -548,6 +548,7 @@ static const match_table_t opt_tokens = {
{ NVMF_OPT_HOSTNQN, "hostnqn=%s" }, { NVMF_OPT_HOSTNQN, "hostnqn=%s" },
{ NVMF_OPT_HOST_TRADDR, "host_traddr=%s" }, { NVMF_OPT_HOST_TRADDR, "host_traddr=%s" },
{ NVMF_OPT_HOST_ID, "hostid=%s" }, { NVMF_OPT_HOST_ID, "hostid=%s" },
{ NVMF_OPT_DUP_CONNECT, "duplicate_connect" },
{ NVMF_OPT_ERR, NULL } { NVMF_OPT_ERR, NULL }
}; };
@ -566,6 +567,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
opts->nr_io_queues = num_online_cpus(); opts->nr_io_queues = num_online_cpus();
opts->reconnect_delay = NVMF_DEF_RECONNECT_DELAY; opts->reconnect_delay = NVMF_DEF_RECONNECT_DELAY;
opts->kato = NVME_DEFAULT_KATO; opts->kato = NVME_DEFAULT_KATO;
opts->duplicate_connect = false;
options = o = kstrdup(buf, GFP_KERNEL); options = o = kstrdup(buf, GFP_KERNEL);
if (!options) if (!options)
@ -742,6 +744,9 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
goto out; goto out;
} }
break; break;
case NVMF_OPT_DUP_CONNECT:
opts->duplicate_connect = true;
break;
default: default:
pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n", pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
p); p);
@ -823,7 +828,7 @@ EXPORT_SYMBOL_GPL(nvmf_free_options);
#define NVMF_REQUIRED_OPTS (NVMF_OPT_TRANSPORT | NVMF_OPT_NQN) #define NVMF_REQUIRED_OPTS (NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
#define NVMF_ALLOWED_OPTS (NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \ #define NVMF_ALLOWED_OPTS (NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \ NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
NVMF_OPT_HOST_ID) NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT)
static struct nvme_ctrl * static struct nvme_ctrl *
nvmf_create_ctrl(struct device *dev, const char *buf, size_t count) nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
@ -841,6 +846,9 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
if (ret) if (ret)
goto out_free_opts; goto out_free_opts;
request_module("nvme-%s", opts->transport);
/* /*
* Check the generic options first as we need a valid transport for * Check the generic options first as we need a valid transport for
* the lookup below. Then clear the generic flags so that transport * the lookup below. Then clear the generic flags so that transport
@ -874,12 +882,12 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
goto out_unlock; goto out_unlock;
} }
if (strcmp(ctrl->subnqn, opts->subsysnqn)) { if (strcmp(ctrl->subsys->subnqn, opts->subsysnqn)) {
dev_warn(ctrl->device, dev_warn(ctrl->device,
"controller returned incorrect NQN: \"%s\".\n", "controller returned incorrect NQN: \"%s\".\n",
ctrl->subnqn); ctrl->subsys->subnqn);
up_read(&nvmf_transports_rwsem); up_read(&nvmf_transports_rwsem);
ctrl->ops->delete_ctrl(ctrl); nvme_delete_ctrl_sync(ctrl);
return ERR_PTR(-EINVAL); return ERR_PTR(-EINVAL);
} }
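The new `duplicate_connect` token takes no value, so its presence alone turns the option on. The userspace sketch below shows that behaviour on a comma-separated option string. It uses `strtok()` instead of the kernel's token matcher, and `has_duplicate_connect` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return true if the bare token "duplicate_connect" appears in buf. */
static bool has_duplicate_connect(const char *buf)
{
	bool found = false;
	char *copy = malloc(strlen(buf) + 1);
	char *tok;

	if (!copy)
		return false;
	strcpy(copy, buf); /* strtok() modifies its input, so work on a copy */

	for (tok = strtok(copy, ",\n"); tok; tok = strtok(NULL, ",\n")) {
		if (strcmp(tok, "duplicate_connect") == 0)
			found = true;
	}
	free(copy);
	return found;
}
```

In this sketch the option defaults to false and only an exact bare token enables it. A string such as `duplicate_connect=1` is not accepted.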


@ -57,6 +57,7 @@ enum {
NVMF_OPT_HOST_TRADDR = 1 << 10, NVMF_OPT_HOST_TRADDR = 1 << 10,
NVMF_OPT_CTRL_LOSS_TMO = 1 << 11, NVMF_OPT_CTRL_LOSS_TMO = 1 << 11,
NVMF_OPT_HOST_ID = 1 << 12, NVMF_OPT_HOST_ID = 1 << 12,
NVMF_OPT_DUP_CONNECT = 1 << 13,
}; };
/** /**
@ -96,6 +97,7 @@ struct nvmf_ctrl_options {
unsigned int nr_io_queues; unsigned int nr_io_queues;
unsigned int reconnect_delay; unsigned int reconnect_delay;
bool discovery_nqn; bool discovery_nqn;
bool duplicate_connect;
unsigned int kato; unsigned int kato;
struct nvmf_host *host; struct nvmf_host *host;
int max_reconnects; int max_reconnects;
@ -131,6 +133,18 @@ struct nvmf_transport_ops {
struct nvmf_ctrl_options *opts); struct nvmf_ctrl_options *opts);
}; };
static inline bool
nvmf_ctlr_matches_baseopts(struct nvme_ctrl *ctrl,
struct nvmf_ctrl_options *opts)
{
if (strcmp(opts->subsysnqn, ctrl->opts->subsysnqn) ||
strcmp(opts->host->nqn, ctrl->opts->host->nqn) ||
memcmp(&opts->host->id, &ctrl->opts->host->id, sizeof(uuid_t)))
return false;
return true;
}
int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val); int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val);
int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val); int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val);
int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val); int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val);
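`nvmf_ctlr_matches_baseopts()` treats two option sets as the same connection when the subsystem NQN, host NQN and host ID all agree. Presumably the transports use it together with `duplicate_connect` to refuse accidental duplicates, but those callers are not visible in this excerpt. The sketch below restates the comparison with hypothetical types: `host_sketch` and `opts_sketch` are not kernel structures, and a 16-byte array stands in for `uuid_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct host_sketch {
	const char *nqn;
	unsigned char id[16]; /* stand-in for uuid_t */
};

struct opts_sketch {
	const char *subsysnqn;
	const struct host_sketch *host;
};

/* True when both option sets name the same subsystem, host NQN and host ID. */
static bool opts_match(const struct opts_sketch *a, const struct opts_sketch *b)
{
	return strcmp(a->subsysnqn, b->subsysnqn) == 0 &&
	       strcmp(a->host->nqn, b->host->nqn) == 0 &&
	       memcmp(a->host->id, b->host->id, sizeof(a->host->id)) == 0;
}
```

A difference in any one of the three fields makes the option sets distinct.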

File diff suppressed because it is too large


@ -305,7 +305,7 @@ static int nvme_nvm_identity(struct nvm_dev *nvmdev, struct nvm_id *nvm_id)
int ret; int ret;
c.identity.opcode = nvme_nvm_admin_identity; c.identity.opcode = nvme_nvm_admin_identity;
c.identity.nsid = cpu_to_le32(ns->ns_id); c.identity.nsid = cpu_to_le32(ns->head->ns_id);
c.identity.chnl_off = 0; c.identity.chnl_off = 0;
nvme_nvm_id = kmalloc(sizeof(struct nvme_nvm_id), GFP_KERNEL); nvme_nvm_id = kmalloc(sizeof(struct nvme_nvm_id), GFP_KERNEL);
@ -344,7 +344,7 @@ static int nvme_nvm_get_l2p_tbl(struct nvm_dev *nvmdev, u64 slba, u32 nlb,
int ret = 0; int ret = 0;
c.l2p.opcode = nvme_nvm_admin_get_l2p_tbl; c.l2p.opcode = nvme_nvm_admin_get_l2p_tbl;
c.l2p.nsid = cpu_to_le32(ns->ns_id); c.l2p.nsid = cpu_to_le32(ns->head->ns_id);
entries = kmalloc(len, GFP_KERNEL); entries = kmalloc(len, GFP_KERNEL);
if (!entries) if (!entries)
return -ENOMEM; return -ENOMEM;
@ -402,7 +402,7 @@ static int nvme_nvm_get_bb_tbl(struct nvm_dev *nvmdev, struct ppa_addr ppa,
int ret = 0; int ret = 0;
c.get_bb.opcode = nvme_nvm_admin_get_bb_tbl; c.get_bb.opcode = nvme_nvm_admin_get_bb_tbl;
c.get_bb.nsid = cpu_to_le32(ns->ns_id); c.get_bb.nsid = cpu_to_le32(ns->head->ns_id);
c.get_bb.spba = cpu_to_le64(ppa.ppa); c.get_bb.spba = cpu_to_le64(ppa.ppa);
bb_tbl = kzalloc(tblsz, GFP_KERNEL); bb_tbl = kzalloc(tblsz, GFP_KERNEL);
@ -452,7 +452,7 @@ static int nvme_nvm_set_bb_tbl(struct nvm_dev *nvmdev, struct ppa_addr *ppas,
int ret = 0; int ret = 0;
c.set_bb.opcode = nvme_nvm_admin_set_bb_tbl; c.set_bb.opcode = nvme_nvm_admin_set_bb_tbl;
c.set_bb.nsid = cpu_to_le32(ns->ns_id); c.set_bb.nsid = cpu_to_le32(ns->head->ns_id);
c.set_bb.spba = cpu_to_le64(ppas->ppa); c.set_bb.spba = cpu_to_le64(ppas->ppa);
c.set_bb.nlb = cpu_to_le16(nr_ppas - 1); c.set_bb.nlb = cpu_to_le16(nr_ppas - 1);
c.set_bb.value = type; c.set_bb.value = type;
@ -469,7 +469,7 @@ static inline void nvme_nvm_rqtocmd(struct nvm_rq *rqd, struct nvme_ns *ns,
struct nvme_nvm_command *c) struct nvme_nvm_command *c)
{ {
c->ph_rw.opcode = rqd->opcode; c->ph_rw.opcode = rqd->opcode;
c->ph_rw.nsid = cpu_to_le32(ns->ns_id); c->ph_rw.nsid = cpu_to_le32(ns->head->ns_id);
c->ph_rw.spba = cpu_to_le64(rqd->ppa_addr.ppa); c->ph_rw.spba = cpu_to_le64(rqd->ppa_addr.ppa);
c->ph_rw.metadata = cpu_to_le64(rqd->dma_meta_list); c->ph_rw.metadata = cpu_to_le64(rqd->dma_meta_list);
c->ph_rw.control = cpu_to_le16(rqd->flags); c->ph_rw.control = cpu_to_le16(rqd->flags);
@ -492,33 +492,46 @@ static void nvme_nvm_end_io(struct request *rq, blk_status_t status)
 	blk_mq_free_request(rq);
 }
+static struct request *nvme_nvm_alloc_request(struct request_queue *q,
+					      struct nvm_rq *rqd,
+					      struct nvme_nvm_command *cmd)
+{
+	struct nvme_ns *ns = q->queuedata;
+	struct request *rq;
+	nvme_nvm_rqtocmd(rqd, ns, cmd);
+	rq = nvme_alloc_request(q, (struct nvme_command *)cmd, 0, NVME_QID_ANY);
+	if (IS_ERR(rq))
+		return rq;
+	rq->cmd_flags &= ~REQ_FAILFAST_DRIVER;
+	if (rqd->bio) {
+		blk_init_request_from_bio(rq, rqd->bio);
+	} else {
+		rq->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+		rq->__data_len = 0;
+	}
+	return rq;
+}
 static int nvme_nvm_submit_io(struct nvm_dev *dev, struct nvm_rq *rqd)
 {
 	struct request_queue *q = dev->q;
-	struct nvme_ns *ns = q->queuedata;
-	struct request *rq;
-	struct bio *bio = rqd->bio;
 	struct nvme_nvm_command *cmd;
+	struct request *rq;
 	cmd = kzalloc(sizeof(struct nvme_nvm_command), GFP_KERNEL);
 	if (!cmd)
 		return -ENOMEM;
-	nvme_nvm_rqtocmd(rqd, ns, cmd);
-	rq = nvme_alloc_request(q, (struct nvme_command *)cmd, 0, NVME_QID_ANY);
+	rq = nvme_nvm_alloc_request(q, rqd, cmd);
 	if (IS_ERR(rq)) {
 		kfree(cmd);
 		return PTR_ERR(rq);
 	}
-	rq->cmd_flags &= ~REQ_FAILFAST_DRIVER;
-	if (bio) {
-		blk_init_request_from_bio(rq, bio);
-	} else {
-		rq->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
-		rq->__data_len = 0;
-	}
 	rq->end_io_data = rqd;
@ -527,6 +540,34 @@ static int nvme_nvm_submit_io(struct nvm_dev *dev, struct nvm_rq *rqd)
return 0; return 0;
} }
static int nvme_nvm_submit_io_sync(struct nvm_dev *dev, struct nvm_rq *rqd)
{
struct request_queue *q = dev->q;
struct request *rq;
struct nvme_nvm_command cmd;
int ret = 0;
memset(&cmd, 0, sizeof(struct nvme_nvm_command));
rq = nvme_nvm_alloc_request(q, rqd, &cmd);
if (IS_ERR(rq))
return PTR_ERR(rq);
/* I/Os can fail and the error is signaled through rqd. Callers must
* handle the error accordingly.
*/
blk_execute_rq(q, NULL, rq, 0);
if (nvme_req(rq)->flags & NVME_REQ_CANCELLED)
ret = -EINTR;
rqd->ppa_status = le64_to_cpu(nvme_req(rq)->result.u64);
rqd->error = nvme_req(rq)->status;
blk_mq_free_request(rq);
return ret;
}
static void *nvme_nvm_create_dma_pool(struct nvm_dev *nvmdev, char *name) static void *nvme_nvm_create_dma_pool(struct nvm_dev *nvmdev, char *name)
{ {
struct nvme_ns *ns = nvmdev->q->queuedata; struct nvme_ns *ns = nvmdev->q->queuedata;
@ -562,6 +603,7 @@ static struct nvm_dev_ops nvme_nvm_dev_ops = {
.set_bb_tbl = nvme_nvm_set_bb_tbl, .set_bb_tbl = nvme_nvm_set_bb_tbl,
.submit_io = nvme_nvm_submit_io, .submit_io = nvme_nvm_submit_io,
.submit_io_sync = nvme_nvm_submit_io_sync,
.create_dma_pool = nvme_nvm_create_dma_pool, .create_dma_pool = nvme_nvm_create_dma_pool,
.destroy_dma_pool = nvme_nvm_destroy_dma_pool, .destroy_dma_pool = nvme_nvm_destroy_dma_pool,
@ -600,8 +642,6 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q,
rq->timeout = timeout ? timeout : ADMIN_TIMEOUT; rq->timeout = timeout ? timeout : ADMIN_TIMEOUT;
rq->cmd_flags &= ~REQ_FAILFAST_DRIVER;
if (ppa_buf && ppa_len) { if (ppa_buf && ppa_len) {
ppa_list = dma_pool_alloc(dev->dma_pool, GFP_KERNEL, &ppa_dma); ppa_list = dma_pool_alloc(dev->dma_pool, GFP_KERNEL, &ppa_dma);
if (!ppa_list) { if (!ppa_list) {
@ -691,7 +731,7 @@ static int nvme_nvm_submit_vio(struct nvme_ns *ns,
memset(&c, 0, sizeof(c)); memset(&c, 0, sizeof(c));
c.ph_rw.opcode = vio.opcode; c.ph_rw.opcode = vio.opcode;
c.ph_rw.nsid = cpu_to_le32(ns->ns_id); c.ph_rw.nsid = cpu_to_le32(ns->head->ns_id);
c.ph_rw.control = cpu_to_le16(vio.control); c.ph_rw.control = cpu_to_le16(vio.control);
c.ph_rw.length = cpu_to_le16(vio.nppas); c.ph_rw.length = cpu_to_le16(vio.nppas);
@ -728,7 +768,7 @@ static int nvme_nvm_user_vcmd(struct nvme_ns *ns, int admin,
memset(&c, 0, sizeof(c)); memset(&c, 0, sizeof(c));
c.common.opcode = vcmd.opcode; c.common.opcode = vcmd.opcode;
c.common.nsid = cpu_to_le32(ns->ns_id); c.common.nsid = cpu_to_le32(ns->head->ns_id);
c.common.cdw2[0] = cpu_to_le32(vcmd.cdw2); c.common.cdw2[0] = cpu_to_le32(vcmd.cdw2);
c.common.cdw2[1] = cpu_to_le32(vcmd.cdw3); c.common.cdw2[1] = cpu_to_le32(vcmd.cdw3);
/* cdw11-12 */ /* cdw11-12 */


@ -0,0 +1,291 @@
/*
* Copyright (c) 2017 Christoph Hellwig.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*/
#include <linux/moduleparam.h>
#include "nvme.h"
static bool multipath = true;
module_param(multipath, bool, 0644);
MODULE_PARM_DESC(multipath,
"turn on native support for multiple controllers per subsystem");
void nvme_failover_req(struct request *req)
{
struct nvme_ns *ns = req->q->queuedata;
unsigned long flags;
spin_lock_irqsave(&ns->head->requeue_lock, flags);
blk_steal_bios(&ns->head->requeue_list, req);
spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
blk_mq_end_request(req, 0);
nvme_reset_ctrl(ns->ctrl);
kblockd_schedule_work(&ns->head->requeue_work);
}
bool nvme_req_needs_failover(struct request *req)
{
if (!(req->cmd_flags & REQ_NVME_MPATH))
return false;
switch (nvme_req(req)->status & 0x7ff) {
/*
* Generic command status:
*/
case NVME_SC_INVALID_OPCODE:
case NVME_SC_INVALID_FIELD:
case NVME_SC_INVALID_NS:
case NVME_SC_LBA_RANGE:
case NVME_SC_CAP_EXCEEDED:
case NVME_SC_RESERVATION_CONFLICT:
return false;
/*
* I/O command set specific error. Unfortunately these values are
* reused for fabrics commands, but those should never get here.
*/
case NVME_SC_BAD_ATTRIBUTES:
case NVME_SC_INVALID_PI:
case NVME_SC_READ_ONLY:
case NVME_SC_ONCS_NOT_SUPPORTED:
WARN_ON_ONCE(nvme_req(req)->cmd->common.opcode ==
nvme_fabrics_command);
return false;
/*
* Media and Data Integrity Errors:
*/
case NVME_SC_WRITE_FAULT:
case NVME_SC_READ_ERROR:
case NVME_SC_GUARD_CHECK:
case NVME_SC_APPTAG_CHECK:
case NVME_SC_REFTAG_CHECK:
case NVME_SC_COMPARE_FAILED:
case NVME_SC_ACCESS_DENIED:
case NVME_SC_UNWRITTEN_BLOCK:
return false;
}
/* Everything else could be a path failure, so should be retried */
return true;
}
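`nvme_req_needs_failover()` masks the status with `0x7ff` and then uses a deny list: only statuses known to describe the command or the media stay on the current path, and everything else is treated as a possible path failure. The sketch below keeps that shape with a shortened list. The `SK_SC_*` names are hypothetical, and their numeric values are assumptions that should be checked against the kernel's `nvme.h` before being relied on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for a few status codes; illustrative only. */
enum {
	SK_SC_INVALID_OPCODE = 0x001,
	SK_SC_LBA_RANGE      = 0x080,
	SK_SC_WRITE_FAULT    = 0x280,
	SK_SC_DNR            = 0x4000, /* a high bit that the 0x7ff mask drops */
};

static bool needs_failover_sketch(bool is_mpath, uint16_t status)
{
	if (!is_mpath)
		return false;

	switch (status & 0x7ff) {
	case SK_SC_INVALID_OPCODE: /* command errors: another path won't help */
	case SK_SC_LBA_RANGE:
	case SK_SC_WRITE_FAULT:    /* media errors: same media behind every path */
		return false;
	}

	/* Anything not on the list could be a path failure. */
	return true;
}
```

Because of the mask, a listed status is recognised whether or not bits above `0x7ff` are set.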
void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
{
struct nvme_ns *ns;
mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list) {
if (ns->head->disk)
kblockd_schedule_work(&ns->head->requeue_work);
}
mutex_unlock(&ctrl->namespaces_mutex);
}
static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head)
{
struct nvme_ns *ns;
list_for_each_entry_rcu(ns, &head->list, siblings) {
if (ns->ctrl->state == NVME_CTRL_LIVE) {
rcu_assign_pointer(head->current_path, ns);
return ns;
}
}
return NULL;
}
inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
{
struct nvme_ns *ns = srcu_dereference(head->current_path, &head->srcu);
if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
ns = __nvme_find_path(head);
return ns;
}
static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
struct bio *bio)
{
struct nvme_ns_head *head = q->queuedata;
struct device *dev = disk_to_dev(head->disk);
struct nvme_ns *ns;
blk_qc_t ret = BLK_QC_T_NONE;
int srcu_idx;
srcu_idx = srcu_read_lock(&head->srcu);
ns = nvme_find_path(head);
if (likely(ns)) {
bio->bi_disk = ns->disk;
bio->bi_opf |= REQ_NVME_MPATH;
ret = direct_make_request(bio);
} else if (!list_empty_careful(&head->list)) {
dev_warn_ratelimited(dev, "no path available - requeuing I/O\n");
spin_lock_irq(&head->requeue_lock);
bio_list_add(&head->requeue_list, bio);
spin_unlock_irq(&head->requeue_lock);
} else {
dev_warn_ratelimited(dev, "no path - failing I/O\n");
bio->bi_status = BLK_STS_IOERR;
bio_endio(bio);
}
srcu_read_unlock(&head->srcu, srcu_idx);
return ret;
}
static bool nvme_ns_head_poll(struct request_queue *q, blk_qc_t qc)
{
struct nvme_ns_head *head = q->queuedata;
struct nvme_ns *ns;
bool found = false;
int srcu_idx;
srcu_idx = srcu_read_lock(&head->srcu);
ns = srcu_dereference(head->current_path, &head->srcu);
if (likely(ns && ns->ctrl->state == NVME_CTRL_LIVE))
found = ns->queue->poll_fn(q, qc);
srcu_read_unlock(&head->srcu, srcu_idx);
return found;
}
static void nvme_requeue_work(struct work_struct *work)
{
struct nvme_ns_head *head =
container_of(work, struct nvme_ns_head, requeue_work);
struct bio *bio, *next;
spin_lock_irq(&head->requeue_lock);
next = bio_list_get(&head->requeue_list);
spin_unlock_irq(&head->requeue_lock);
while ((bio = next) != NULL) {
next = bio->bi_next;
bio->bi_next = NULL;
/*
* Reset disk to the mpath node and resubmit to select a new
* path.
*/
bio->bi_disk = head->disk;
generic_make_request(bio);
}
}
int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
{
struct request_queue *q;
bool vwc = false;
bio_list_init(&head->requeue_list);
spin_lock_init(&head->requeue_lock);
INIT_WORK(&head->requeue_work, nvme_requeue_work);
/*
* Add a multipath node if the subsystem supports multiple controllers.
* We also do this for private namespaces as the namespace sharing data could
* change after a rescan.
*/
if (!(ctrl->subsys->cmic & (1 << 1)) || !multipath)
return 0;
q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
if (!q)
goto out;
q->queuedata = head;
blk_queue_make_request(q, nvme_ns_head_make_request);
q->poll_fn = nvme_ns_head_poll;
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
/* set to a default value for 512 until disk is validated */
blk_queue_logical_block_size(q, 512);
/* we need to propagate up the VWC settings */
if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
vwc = true;
blk_queue_write_cache(q, vwc, vwc);
head->disk = alloc_disk(0);
if (!head->disk)
goto out_cleanup_queue;
head->disk->fops = &nvme_ns_head_ops;
head->disk->private_data = head;
head->disk->queue = q;
head->disk->flags = GENHD_FL_EXT_DEVT;
sprintf(head->disk->disk_name, "nvme%dn%d",
ctrl->subsys->instance, head->instance);
return 0;
out_cleanup_queue:
blk_cleanup_queue(q);
out:
return -ENOMEM;
}
void nvme_mpath_add_disk(struct nvme_ns_head *head)
{
if (!head->disk)
return;
device_add_disk(&head->subsys->dev, head->disk);
if (sysfs_create_group(&disk_to_dev(head->disk)->kobj,
&nvme_ns_id_attr_group))
pr_warn("%s: failed to create sysfs group for identification\n",
head->disk->disk_name);
}
void nvme_mpath_add_disk_links(struct nvme_ns *ns)
{
struct kobject *slave_disk_kobj, *holder_disk_kobj;
if (!ns->head->disk)
return;
slave_disk_kobj = &disk_to_dev(ns->disk)->kobj;
if (sysfs_create_link(ns->head->disk->slave_dir, slave_disk_kobj,
kobject_name(slave_disk_kobj)))
return;
holder_disk_kobj = &disk_to_dev(ns->head->disk)->kobj;
if (sysfs_create_link(ns->disk->part0.holder_dir, holder_disk_kobj,
kobject_name(holder_disk_kobj)))
sysfs_remove_link(ns->head->disk->slave_dir,
kobject_name(slave_disk_kobj));
}
void nvme_mpath_remove_disk(struct nvme_ns_head *head)
{
if (!head->disk)
return;
sysfs_remove_group(&disk_to_dev(head->disk)->kobj,
&nvme_ns_id_attr_group);
del_gendisk(head->disk);
blk_set_queue_dying(head->disk->queue);
/* make sure all pending bios are cleaned up */
kblockd_schedule_work(&head->requeue_work);
flush_work(&head->requeue_work);
blk_cleanup_queue(head->disk->queue);
put_disk(head->disk);
}
void nvme_mpath_remove_disk_links(struct nvme_ns *ns)
{
if (!ns->head->disk)
return;
sysfs_remove_link(ns->disk->part0.holder_dir,
kobject_name(&disk_to_dev(ns->head->disk)->kobj));
sysfs_remove_link(ns->head->disk->slave_dir,
kobject_name(&disk_to_dev(ns->disk)->kobj));
}
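
The path selection and per-bio dispatch in the multipath code above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct path`, `struct ns_head`, `enum dispatch`, `find_path_slow()`, `find_path()` and `classify_bio()` are hypothetical stand-ins for the kernel structures and functions, and the SRCU protection, requeue lock and work item used by the real code are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING };

struct path {
	enum ctrl_state state;	/* state of the controller behind this path */
	struct path *next;	/* next sibling path to the same namespace */
};

struct ns_head {
	struct path *paths;		/* all known paths, NULL if none */
	struct path *current_path;	/* cached last-used path, may be NULL */
};

enum dispatch { DISPATCH_SUBMIT, DISPATCH_REQUEUE, DISPATCH_FAIL };

/* Slow path: scan the siblings for the first live controller and cache it. */
static struct path *find_path_slow(struct ns_head *head)
{
	for (struct path *p = head->paths; p; p = p->next) {
		if (p->state == CTRL_LIVE) {
			head->current_path = p;
			return p;
		}
	}
	return NULL;
}

/* Fast path: reuse the cached path while its controller is still live. */
static struct path *find_path(struct ns_head *head)
{
	struct path *p = head->current_path;

	if (!p || p->state != CTRL_LIVE)
		p = find_path_slow(head);
	return p;
}

/* The three-way decision taken for each incoming bio. */
static enum dispatch classify_bio(struct ns_head *head)
{
	assert(head != NULL);
	if (find_path(head))
		return DISPATCH_SUBMIT;		/* a live path exists */
	if (head->paths)
		return DISPATCH_REQUEUE;	/* paths exist, but none is live */
	return DISPATCH_FAIL;			/* no paths left at all */
}
```

With two paths where only the second is live, `classify_bio()` returns `DISPATCH_SUBMIT` and caches that path. Once every controller leaves the live state it returns `DISPATCH_REQUEUE`, and a head with an empty path list yields `DISPATCH_FAIL`. Caching the last good path keeps the common case to a single pointer read and state check.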


@ -15,16 +15,17 @@
#define _NVME_H #define _NVME_H
#include <linux/nvme.h> #include <linux/nvme.h>
#include <linux/cdev.h>
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/kref.h> #include <linux/kref.h>
#include <linux/blk-mq.h> #include <linux/blk-mq.h>
#include <linux/lightnvm.h> #include <linux/lightnvm.h>
#include <linux/sed-opal.h> #include <linux/sed-opal.h>
extern unsigned char nvme_io_timeout; extern unsigned int nvme_io_timeout;
#define NVME_IO_TIMEOUT (nvme_io_timeout * HZ) #define NVME_IO_TIMEOUT (nvme_io_timeout * HZ)
extern unsigned char admin_timeout; extern unsigned int admin_timeout;
#define ADMIN_TIMEOUT (admin_timeout * HZ) #define ADMIN_TIMEOUT (admin_timeout * HZ)
#define NVME_DEFAULT_KATO 5 #define NVME_DEFAULT_KATO 5
@ -94,6 +95,11 @@ struct nvme_request {
u16 status; u16 status;
}; };
/*
* Mark a bio as coming in through the mpath node.
*/
#define REQ_NVME_MPATH REQ_DRV
enum { enum {
NVME_REQ_CANCELLED = (1 << 0), NVME_REQ_CANCELLED = (1 << 0),
}; };
@ -127,24 +133,23 @@ struct nvme_ctrl {
struct request_queue *admin_q; struct request_queue *admin_q;
struct request_queue *connect_q; struct request_queue *connect_q;
struct device *dev; struct device *dev;
struct kref kref;
int instance; int instance;
struct blk_mq_tag_set *tagset; struct blk_mq_tag_set *tagset;
struct blk_mq_tag_set *admin_tagset; struct blk_mq_tag_set *admin_tagset;
struct list_head namespaces; struct list_head namespaces;
struct mutex namespaces_mutex; struct mutex namespaces_mutex;
struct device ctrl_device;
struct device *device; /* char device */ struct device *device; /* char device */
struct list_head node; struct cdev cdev;
struct ida ns_ida;
struct work_struct reset_work; struct work_struct reset_work;
struct work_struct delete_work;
struct nvme_subsystem *subsys;
struct list_head subsys_entry;
struct opal_dev *opal_dev; struct opal_dev *opal_dev;
char name[12]; char name[12];
char serial[20];
char model[40];
char firmware_rev[8];
char subnqn[NVMF_NQN_SIZE];
u16 cntlid; u16 cntlid;
u32 ctrl_config; u32 ctrl_config;
@ -155,23 +160,23 @@ struct nvme_ctrl {
u32 page_size; u32 page_size;
u32 max_hw_sectors; u32 max_hw_sectors;
u16 oncs; u16 oncs;
u16 vid;
u16 oacs; u16 oacs;
u16 nssa; u16 nssa;
u16 nr_streams; u16 nr_streams;
atomic_t abort_limit; atomic_t abort_limit;
u8 event_limit;
u8 vwc; u8 vwc;
u32 vs; u32 vs;
u32 sgls; u32 sgls;
u16 kas; u16 kas;
u8 npss; u8 npss;
u8 apsta; u8 apsta;
u32 aen_result;
unsigned int shutdown_timeout; unsigned int shutdown_timeout;
unsigned int kato; unsigned int kato;
bool subsystem; bool subsystem;
unsigned long quirks; unsigned long quirks;
struct nvme_id_power_state psd[32]; struct nvme_id_power_state psd[32];
struct nvme_effects_log *effects;
struct work_struct scan_work; struct work_struct scan_work;
struct work_struct async_event_work; struct work_struct async_event_work;
struct delayed_work ka_work; struct delayed_work ka_work;
@ -197,21 +202,72 @@ struct nvme_ctrl {
struct nvmf_ctrl_options *opts; struct nvmf_ctrl_options *opts;
}; };
struct nvme_subsystem {
int instance;
struct device dev;
/*
* Because we unregister the device on the last put we need
* a separate refcount.
*/
struct kref ref;
struct list_head entry;
struct mutex lock;
struct list_head ctrls;
struct list_head nsheads;
char subnqn[NVMF_NQN_SIZE];
char serial[20];
char model[40];
char firmware_rev[8];
u8 cmic;
u16 vendor_id;
struct ida ns_ida;
};
/*
* Container structure for unique namespace identifiers.
*/
struct nvme_ns_ids {
u8 eui64[8];
u8 nguid[16];
uuid_t uuid;
};
/*
* Anchor structure for namespaces. There is one for each namespace in a
* NVMe subsystem that any of our controllers can see, and the namespace
* structure for each controller is chained off it. For private namespaces
* there is a 1:1 relation to our namespace structures, that is ->list
* only ever has a single entry for private namespaces.
*/
struct nvme_ns_head {
#ifdef CONFIG_NVME_MULTIPATH
struct gendisk *disk;
struct nvme_ns __rcu *current_path;
struct bio_list requeue_list;
spinlock_t requeue_lock;
struct work_struct requeue_work;
#endif
struct list_head list;
struct srcu_struct srcu;
struct nvme_subsystem *subsys;
unsigned ns_id;
struct nvme_ns_ids ids;
struct list_head entry;
struct kref ref;
int instance;
};
struct nvme_ns { struct nvme_ns {
struct list_head list; struct list_head list;
struct nvme_ctrl *ctrl; struct nvme_ctrl *ctrl;
struct request_queue *queue; struct request_queue *queue;
struct gendisk *disk; struct gendisk *disk;
struct list_head siblings;
struct nvm_dev *ndev; struct nvm_dev *ndev;
struct kref kref; struct kref kref;
int instance; struct nvme_ns_head *head;
u8 eui[8];
u8 nguid[16];
uuid_t uuid;
unsigned ns_id;
int lba_shift; int lba_shift;
u16 ms; u16 ms;
u16 sgs; u16 sgs;
@ -234,9 +290,10 @@ struct nvme_ctrl_ops {
int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val); int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val); int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
void (*free_ctrl)(struct nvme_ctrl *ctrl); void (*free_ctrl)(struct nvme_ctrl *ctrl);
void (*submit_async_event)(struct nvme_ctrl *ctrl, int aer_idx); void (*submit_async_event)(struct nvme_ctrl *ctrl);
int (*delete_ctrl)(struct nvme_ctrl *ctrl); void (*delete_ctrl)(struct nvme_ctrl *ctrl);
int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size); int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
int (*reinit_request)(void *data, struct request *rq);
}; };
static inline bool nvme_ctrl_ready(struct nvme_ctrl *ctrl) static inline bool nvme_ctrl_ready(struct nvme_ctrl *ctrl)
@ -278,6 +335,16 @@ static inline void nvme_end_request(struct request *req, __le16 status,
blk_mq_complete_request(req); blk_mq_complete_request(req);
} }
static inline void nvme_get_ctrl(struct nvme_ctrl *ctrl)
{
get_device(ctrl->device);
}
static inline void nvme_put_ctrl(struct nvme_ctrl *ctrl)
{
put_device(ctrl->device);
}
void nvme_complete_rq(struct request *req); void nvme_complete_rq(struct request *req);
void nvme_cancel_request(struct request *req, void *data, bool reserved); void nvme_cancel_request(struct request *req, void *data, bool reserved);
bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl, bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
@ -299,10 +366,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len, int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
bool send); bool send);
#define NVME_NR_AERS 1
void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status, void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
union nvme_result *res); union nvme_result *res);
void nvme_queue_async_events(struct nvme_ctrl *ctrl);
void nvme_stop_queues(struct nvme_ctrl *ctrl); void nvme_stop_queues(struct nvme_ctrl *ctrl);
void nvme_start_queues(struct nvme_ctrl *ctrl); void nvme_start_queues(struct nvme_ctrl *ctrl);
@ -311,21 +376,79 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl);
void nvme_wait_freeze(struct nvme_ctrl *ctrl); void nvme_wait_freeze(struct nvme_ctrl *ctrl);
void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout); void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
void nvme_start_freeze(struct nvme_ctrl *ctrl); void nvme_start_freeze(struct nvme_ctrl *ctrl);
int nvme_reinit_tagset(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set);
#define NVME_QID_ANY -1 #define NVME_QID_ANY -1
struct request *nvme_alloc_request(struct request_queue *q, struct request *nvme_alloc_request(struct request_queue *q,
struct nvme_command *cmd, unsigned int flags, int qid); struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid);
blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req, blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
struct nvme_command *cmd); struct nvme_command *cmd);
int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd, int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
void *buf, unsigned bufflen); void *buf, unsigned bufflen);
int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd, int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
union nvme_result *result, void *buffer, unsigned bufflen, union nvme_result *result, void *buffer, unsigned bufflen,
unsigned timeout, int qid, int at_head, int flags); unsigned timeout, int qid, int at_head,
blk_mq_req_flags_t flags);
int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count); int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
void nvme_start_keep_alive(struct nvme_ctrl *ctrl); void nvme_start_keep_alive(struct nvme_ctrl *ctrl);
void nvme_stop_keep_alive(struct nvme_ctrl *ctrl); void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
int nvme_reset_ctrl(struct nvme_ctrl *ctrl); int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
int nvme_delete_ctrl(struct nvme_ctrl *ctrl);
int nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);
extern const struct attribute_group nvme_ns_id_attr_group;
extern const struct block_device_operations nvme_ns_head_ops;
#ifdef CONFIG_NVME_MULTIPATH
void nvme_failover_req(struct request *req);
bool nvme_req_needs_failover(struct request *req);
void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head);
void nvme_mpath_add_disk(struct nvme_ns_head *head);
void nvme_mpath_add_disk_links(struct nvme_ns *ns);
void nvme_mpath_remove_disk(struct nvme_ns_head *head);
void nvme_mpath_remove_disk_links(struct nvme_ns *ns);
static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
{
struct nvme_ns_head *head = ns->head;
if (head && ns == srcu_dereference(head->current_path, &head->srcu))
rcu_assign_pointer(head->current_path, NULL);
}
struct nvme_ns *nvme_find_path(struct nvme_ns_head *head);
#else
static inline void nvme_failover_req(struct request *req)
{
}
static inline bool nvme_req_needs_failover(struct request *req)
{
return false;
}
static inline void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
{
}
static inline int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,
struct nvme_ns_head *head)
{
return 0;
}
static inline void nvme_mpath_add_disk(struct nvme_ns_head *head)
{
}
static inline void nvme_mpath_remove_disk(struct nvme_ns_head *head)
{
}
static inline void nvme_mpath_add_disk_links(struct nvme_ns *ns)
{
}
static inline void nvme_mpath_remove_disk_links(struct nvme_ns *ns)
{
}
static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
{
}
#endif /* CONFIG_NVME_MULTIPATH */
#ifdef CONFIG_NVM #ifdef CONFIG_NVM
int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node); int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node);


@ -13,7 +13,6 @@
*/ */
#include <linux/aer.h> #include <linux/aer.h>
#include <linux/bitops.h>
#include <linux/blkdev.h> #include <linux/blkdev.h>
#include <linux/blk-mq.h> #include <linux/blk-mq.h>
#include <linux/blk-mq-pci.h> #include <linux/blk-mq-pci.h>
@ -26,12 +25,9 @@
#include <linux/mutex.h> #include <linux/mutex.h>
#include <linux/once.h> #include <linux/once.h>
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/poison.h>
#include <linux/t10-pi.h> #include <linux/t10-pi.h>
#include <linux/timer.h>
#include <linux/types.h> #include <linux/types.h>
#include <linux/io-64-nonatomic-lo-hi.h> #include <linux/io-64-nonatomic-lo-hi.h>
#include <asm/unaligned.h>
#include <linux/sed-opal.h> #include <linux/sed-opal.h>
#include "nvme.h" #include "nvme.h"
@ -39,11 +35,7 @@
#define SQ_SIZE(depth) (depth * sizeof(struct nvme_command)) #define SQ_SIZE(depth) (depth * sizeof(struct nvme_command))
#define CQ_SIZE(depth) (depth * sizeof(struct nvme_completion)) #define CQ_SIZE(depth) (depth * sizeof(struct nvme_completion))
/* #define SGES_PER_PAGE (PAGE_SIZE / sizeof(struct nvme_sgl_desc))
* We handle AEN commands ourselves and don't even let the
* block layer know about them.
*/
#define NVME_AQ_BLKMQ_DEPTH (NVME_AQ_DEPTH - NVME_NR_AERS)
static int use_threaded_interrupts; static int use_threaded_interrupts;
module_param(use_threaded_interrupts, int, 0); module_param(use_threaded_interrupts, int, 0);
@ -57,6 +49,12 @@ module_param(max_host_mem_size_mb, uint, 0444);
MODULE_PARM_DESC(max_host_mem_size_mb, MODULE_PARM_DESC(max_host_mem_size_mb,
"Maximum Host Memory Buffer (HMB) size per controller (in MiB)"); "Maximum Host Memory Buffer (HMB) size per controller (in MiB)");
static unsigned int sgl_threshold = SZ_32K;
module_param(sgl_threshold, uint, 0644);
MODULE_PARM_DESC(sgl_threshold,
"Use SGLs when average request segment size is larger or equal to "
"this size. Use 0 to disable SGLs.");
static int io_queue_depth_set(const char *val, const struct kernel_param *kp); static int io_queue_depth_set(const char *val, const struct kernel_param *kp);
static const struct kernel_param_ops io_queue_depth_ops = { static const struct kernel_param_ops io_queue_depth_ops = {
.set = io_queue_depth_set, .set = io_queue_depth_set,
@ -178,6 +176,7 @@ struct nvme_queue {
struct nvme_iod { struct nvme_iod {
struct nvme_request req; struct nvme_request req;
struct nvme_queue *nvmeq; struct nvme_queue *nvmeq;
bool use_sgl;
int aborted; int aborted;
int npages; /* In the PRP list. 0 means small pool in use */ int npages; /* In the PRP list. 0 means small pool in use */
int nents; /* Used in scatterlist */ int nents; /* Used in scatterlist */
@ -331,17 +330,35 @@ static int nvme_npages(unsigned size, struct nvme_dev *dev)
return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8);
} }
static unsigned int nvme_iod_alloc_size(struct nvme_dev *dev, /*
unsigned int size, unsigned int nseg) * Calculates the number of pages needed for the SGL segments. For example a 4k
* page can accommodate 256 SGL descriptors.
*/
static int nvme_pci_npages_sgl(unsigned int num_seg)
{ {
return sizeof(__le64 *) * nvme_npages(size, dev) + return DIV_ROUND_UP(num_seg * sizeof(struct nvme_sgl_desc), PAGE_SIZE);
sizeof(struct scatterlist) * nseg;
} }
static unsigned int nvme_cmd_size(struct nvme_dev *dev) static unsigned int nvme_pci_iod_alloc_size(struct nvme_dev *dev,
unsigned int size, unsigned int nseg, bool use_sgl)
{ {
return sizeof(struct nvme_iod) + size_t alloc_size;
nvme_iod_alloc_size(dev, NVME_INT_BYTES(dev), NVME_INT_PAGES);
if (use_sgl)
alloc_size = sizeof(__le64 *) * nvme_pci_npages_sgl(nseg);
else
alloc_size = sizeof(__le64 *) * nvme_npages(size, dev);
return alloc_size + sizeof(struct scatterlist) * nseg;
}
static unsigned int nvme_pci_cmd_size(struct nvme_dev *dev, bool use_sgl)
{
unsigned int alloc_size = nvme_pci_iod_alloc_size(dev,
NVME_INT_BYTES(dev), NVME_INT_PAGES,
use_sgl);
return sizeof(struct nvme_iod) + alloc_size;
} }
static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
@ -425,10 +442,10 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
nvmeq->sq_tail = tail; nvmeq->sq_tail = tail;
} }
static __le64 **iod_list(struct request *req) static void **nvme_pci_iod_list(struct request *req)
{ {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req); struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
return (__le64 **)(iod->sg + blk_rq_nr_phys_segments(req)); return (void **)(iod->sg + blk_rq_nr_phys_segments(req));
} }
static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev) static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
@ -438,7 +455,10 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
unsigned int size = blk_rq_payload_bytes(rq); unsigned int size = blk_rq_payload_bytes(rq);
if (nseg > NVME_INT_PAGES || size > NVME_INT_BYTES(dev)) { if (nseg > NVME_INT_PAGES || size > NVME_INT_BYTES(dev)) {
iod->sg = kmalloc(nvme_iod_alloc_size(dev, size, nseg), GFP_ATOMIC); size_t alloc_size = nvme_pci_iod_alloc_size(dev, size, nseg,
iod->use_sgl);
iod->sg = kmalloc(alloc_size, GFP_ATOMIC);
if (!iod->sg) if (!iod->sg)
return BLK_STS_RESOURCE; return BLK_STS_RESOURCE;
} else { } else {
@ -456,18 +476,31 @@ static blk_status_t nvme_init_iod(struct request *rq, struct nvme_dev *dev)
static void nvme_free_iod(struct nvme_dev *dev, struct request *req) static void nvme_free_iod(struct nvme_dev *dev, struct request *req)
{ {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req); struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
const int last_prp = dev->ctrl.page_size / 8 - 1; const int last_prp = dev->ctrl.page_size / sizeof(__le64) - 1;
dma_addr_t dma_addr = iod->first_dma, next_dma_addr;
int i; int i;
__le64 **list = iod_list(req);
dma_addr_t prp_dma = iod->first_dma;
if (iod->npages == 0) if (iod->npages == 0)
dma_pool_free(dev->prp_small_pool, list[0], prp_dma); dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
dma_addr);
for (i = 0; i < iod->npages; i++) { for (i = 0; i < iod->npages; i++) {
__le64 *prp_list = list[i]; void *addr = nvme_pci_iod_list(req)[i];
dma_addr_t next_prp_dma = le64_to_cpu(prp_list[last_prp]);
dma_pool_free(dev->prp_page_pool, prp_list, prp_dma); if (iod->use_sgl) {
prp_dma = next_prp_dma; struct nvme_sgl_desc *sg_list = addr;
next_dma_addr =
le64_to_cpu((sg_list[SGES_PER_PAGE - 1]).addr);
} else {
__le64 *prp_list = addr;
next_dma_addr = le64_to_cpu(prp_list[last_prp]);
}
dma_pool_free(dev->prp_page_pool, addr, dma_addr);
dma_addr = next_dma_addr;
} }
if (iod->sg != iod->inline_sg) if (iod->sg != iod->inline_sg)
@ -555,7 +588,8 @@ static void nvme_print_sgl(struct scatterlist *sgl, int nents)
} }
} }
static blk_status_t nvme_setup_prps(struct nvme_dev *dev, struct request *req) static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
struct request *req, struct nvme_rw_command *cmnd)
{ {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req); struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct dma_pool *pool; struct dma_pool *pool;
@ -566,14 +600,16 @@ static blk_status_t nvme_setup_prps(struct nvme_dev *dev, struct request *req)
u32 page_size = dev->ctrl.page_size; u32 page_size = dev->ctrl.page_size;
int offset = dma_addr & (page_size - 1); int offset = dma_addr & (page_size - 1);
__le64 *prp_list; __le64 *prp_list;
__le64 **list = iod_list(req); void **list = nvme_pci_iod_list(req);
dma_addr_t prp_dma; dma_addr_t prp_dma;
int nprps, i; int nprps, i;
iod->use_sgl = false;
length -= (page_size - offset); length -= (page_size - offset);
if (length <= 0) { if (length <= 0) {
iod->first_dma = 0; iod->first_dma = 0;
return BLK_STS_OK; goto done;
} }
dma_len -= (page_size - offset); dma_len -= (page_size - offset);
@ -587,7 +623,7 @@ static blk_status_t nvme_setup_prps(struct nvme_dev *dev, struct request *req)
if (length <= page_size) { if (length <= page_size) {
iod->first_dma = dma_addr; iod->first_dma = dma_addr;
return BLK_STS_OK; goto done;
} }
nprps = DIV_ROUND_UP(length, page_size); nprps = DIV_ROUND_UP(length, page_size);
@ -634,6 +670,10 @@ static blk_status_t nvme_setup_prps(struct nvme_dev *dev, struct request *req)
dma_len = sg_dma_len(sg); dma_len = sg_dma_len(sg);
} }
done:
cmnd->dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma);
return BLK_STS_OK; return BLK_STS_OK;
bad_sgl: bad_sgl:
@ -643,6 +683,110 @@ static blk_status_t nvme_setup_prps(struct nvme_dev *dev, struct request *req)
return BLK_STS_IOERR; return BLK_STS_IOERR;
} }
static void nvme_pci_sgl_set_data(struct nvme_sgl_desc *sge,
struct scatterlist *sg)
{
sge->addr = cpu_to_le64(sg_dma_address(sg));
sge->length = cpu_to_le32(sg_dma_len(sg));
sge->type = NVME_SGL_FMT_DATA_DESC << 4;
}
static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge,
dma_addr_t dma_addr, int entries)
{
sge->addr = cpu_to_le64(dma_addr);
if (entries < SGES_PER_PAGE) {
sge->length = cpu_to_le32(entries * sizeof(*sge));
sge->type = NVME_SGL_FMT_LAST_SEG_DESC << 4;
} else {
sge->length = cpu_to_le32(PAGE_SIZE);
sge->type = NVME_SGL_FMT_SEG_DESC << 4;
}
}
static blk_status_t nvme_pci_setup_sgls(struct nvme_dev *dev,
struct request *req, struct nvme_rw_command *cmd)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
int length = blk_rq_payload_bytes(req);
struct dma_pool *pool;
struct nvme_sgl_desc *sg_list;
struct scatterlist *sg = iod->sg;
int entries = iod->nents, i = 0;
dma_addr_t sgl_dma;
iod->use_sgl = true;
/* setting the transfer type as SGL */
cmd->flags = NVME_CMD_SGL_METABUF;
if (length == sg_dma_len(sg)) {
nvme_pci_sgl_set_data(&cmd->dptr.sgl, sg);
return BLK_STS_OK;
}
if (entries <= (256 / sizeof(struct nvme_sgl_desc))) {
pool = dev->prp_small_pool;
iod->npages = 0;
} else {
pool = dev->prp_page_pool;
iod->npages = 1;
}
sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
if (!sg_list) {
iod->npages = -1;
return BLK_STS_RESOURCE;
}
nvme_pci_iod_list(req)[0] = sg_list;
iod->first_dma = sgl_dma;
nvme_pci_sgl_set_seg(&cmd->dptr.sgl, sgl_dma, entries);
do {
if (i == SGES_PER_PAGE) {
struct nvme_sgl_desc *old_sg_desc = sg_list;
struct nvme_sgl_desc *link = &old_sg_desc[i - 1];
sg_list = dma_pool_alloc(pool, GFP_ATOMIC, &sgl_dma);
if (!sg_list)
return BLK_STS_RESOURCE;
i = 0;
nvme_pci_iod_list(req)[iod->npages++] = sg_list;
sg_list[i++] = *link;
nvme_pci_sgl_set_seg(link, sgl_dma, entries);
}
nvme_pci_sgl_set_data(&sg_list[i++], sg);
length -= sg_dma_len(sg);
sg = sg_next(sg);
entries--;
} while (length > 0);
WARN_ON(entries > 0);
return BLK_STS_OK;
}
static inline bool nvme_pci_use_sgls(struct nvme_dev *dev, struct request *req)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
unsigned int avg_seg_size;
avg_seg_size = DIV_ROUND_UP(blk_rq_payload_bytes(req),
blk_rq_nr_phys_segments(req));
if (!(dev->ctrl.sgls & ((1 << 0) | (1 << 1))))
return false;
if (!iod->nvmeq->qid)
return false;
if (!sgl_threshold || avg_seg_size < sgl_threshold)
return false;
return true;
}
static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req, static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
struct nvme_command *cmnd) struct nvme_command *cmnd)
{ {
@ -662,7 +806,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
DMA_ATTR_NO_WARN)) DMA_ATTR_NO_WARN))
goto out; goto out;
ret = nvme_setup_prps(dev, req); if (nvme_pci_use_sgls(dev, req))
ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw);
else
ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
if (ret != BLK_STS_OK) if (ret != BLK_STS_OK)
goto out_unmap; goto out_unmap;
@ -682,8 +830,6 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
goto out_unmap; goto out_unmap;
} }
cmnd->rw.dptr.prp1 = cpu_to_le64(sg_dma_address(iod->sg));
cmnd->rw.dptr.prp2 = cpu_to_le64(iod->first_dma);
if (blk_integrity_rq(req)) if (blk_integrity_rq(req))
cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg)); cmnd->rw.metadata = cpu_to_le64(sg_dma_address(&iod->meta_sg));
return BLK_STS_OK; return BLK_STS_OK;
@ -804,7 +950,7 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq,
* for them but rather special case them here. * for them but rather special case them here.
*/ */
if (unlikely(nvmeq->qid == 0 && if (unlikely(nvmeq->qid == 0 &&
cqe->command_id >= NVME_AQ_BLKMQ_DEPTH)) { cqe->command_id >= NVME_AQ_BLK_MQ_DEPTH)) {
nvme_complete_async_event(&nvmeq->dev->ctrl, nvme_complete_async_event(&nvmeq->dev->ctrl,
cqe->status, &cqe->result); cqe->status, &cqe->result);
return; return;
@ -897,7 +1043,7 @@ static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
return __nvme_poll(nvmeq, tag); return __nvme_poll(nvmeq, tag);
} }
static void nvme_pci_submit_async_event(struct nvme_ctrl *ctrl, int aer_idx) static void nvme_pci_submit_async_event(struct nvme_ctrl *ctrl)
{ {
struct nvme_dev *dev = to_nvme_dev(ctrl); struct nvme_dev *dev = to_nvme_dev(ctrl);
struct nvme_queue *nvmeq = dev->queues[0]; struct nvme_queue *nvmeq = dev->queues[0];
@ -905,7 +1051,7 @@ static void nvme_pci_submit_async_event(struct nvme_ctrl *ctrl, int aer_idx)
memset(&c, 0, sizeof(c)); memset(&c, 0, sizeof(c));
c.common.opcode = nvme_admin_async_event; c.common.opcode = nvme_admin_async_event;
c.common.command_id = NVME_AQ_BLKMQ_DEPTH + aer_idx; c.common.command_id = NVME_AQ_BLK_MQ_DEPTH;
spin_lock_irq(&nvmeq->q_lock); spin_lock_irq(&nvmeq->q_lock);
__nvme_submit_cmd(nvmeq, &c); __nvme_submit_cmd(nvmeq, &c);
@ -930,7 +1076,7 @@ static int adapter_alloc_cq(struct nvme_dev *dev, u16 qid,
int flags = NVME_QUEUE_PHYS_CONTIG | NVME_CQ_IRQ_ENABLED; int flags = NVME_QUEUE_PHYS_CONTIG | NVME_CQ_IRQ_ENABLED;
/* /*
* Note: we (ab)use the fact the the prp fields survive if no data * Note: we (ab)use the fact that the prp fields survive if no data
* is attached to the request. * is attached to the request.
*/ */
memset(&c, 0, sizeof(c)); memset(&c, 0, sizeof(c));
@ -951,7 +1097,7 @@ static int adapter_alloc_sq(struct nvme_dev *dev, u16 qid,
int flags = NVME_QUEUE_PHYS_CONTIG; int flags = NVME_QUEUE_PHYS_CONTIG;
/* /*
* Note: we (ab)use the fact the the prp fields survive if no data * Note: we (ab)use the fact that the prp fields survive if no data
* is attached to the request. * is attached to the request.
*/ */
memset(&c, 0, sizeof(c)); memset(&c, 0, sizeof(c));
@ -1372,14 +1518,10 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
dev->admin_tagset.ops = &nvme_mq_admin_ops; dev->admin_tagset.ops = &nvme_mq_admin_ops;
dev->admin_tagset.nr_hw_queues = 1; dev->admin_tagset.nr_hw_queues = 1;
/* dev->admin_tagset.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
* Subtract one to leave an empty queue entry for 'Full Queue'
* condition. See NVM-Express 1.2 specification, section 4.1.2.
*/
dev->admin_tagset.queue_depth = NVME_AQ_BLKMQ_DEPTH - 1;
dev->admin_tagset.timeout = ADMIN_TIMEOUT; dev->admin_tagset.timeout = ADMIN_TIMEOUT;
dev->admin_tagset.numa_node = dev_to_node(dev->dev); dev->admin_tagset.numa_node = dev_to_node(dev->dev);
dev->admin_tagset.cmd_size = nvme_cmd_size(dev); dev->admin_tagset.cmd_size = nvme_pci_cmd_size(dev, false);
dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED; dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
dev->admin_tagset.driver_data = dev; dev->admin_tagset.driver_data = dev;
@ -1906,7 +2048,11 @@ static int nvme_dev_add(struct nvme_dev *dev)
dev->tagset.numa_node = dev_to_node(dev->dev); dev->tagset.numa_node = dev_to_node(dev->dev);
dev->tagset.queue_depth = dev->tagset.queue_depth =
min_t(int, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1; min_t(int, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1;
dev->tagset.cmd_size = nvme_cmd_size(dev); dev->tagset.cmd_size = nvme_pci_cmd_size(dev, false);
if ((dev->ctrl.sgls & ((1 << 0) | (1 << 1))) && sgl_threshold) {
dev->tagset.cmd_size = max(dev->tagset.cmd_size,
nvme_pci_cmd_size(dev, true));
}
dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE; dev->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
dev->tagset.driver_data = dev; dev->tagset.driver_data = dev;
@ -2132,9 +2278,9 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
{ {
dev_warn(dev->ctrl.device, "Removing after probe failure status: %d\n", status); dev_warn(dev->ctrl.device, "Removing after probe failure status: %d\n", status);
kref_get(&dev->ctrl.kref); nvme_get_ctrl(&dev->ctrl);
nvme_dev_disable(dev, false); nvme_dev_disable(dev, false);
if (!schedule_work(&dev->remove_work)) if (!queue_work(nvme_wq, &dev->remove_work))
nvme_put_ctrl(&dev->ctrl); nvme_put_ctrl(&dev->ctrl);
} }
@ -2557,6 +2703,7 @@ static int __init nvme_init(void)
static void __exit nvme_exit(void) static void __exit nvme_exit(void)
{ {
pci_unregister_driver(&nvme_driver); pci_unregister_driver(&nvme_driver);
flush_workqueue(nvme_wq);
_nvme_check_size(); _nvme_check_size();
} }

@ -41,17 +41,9 @@
#define NVME_RDMA_MAX_INLINE_SEGMENTS 1 #define NVME_RDMA_MAX_INLINE_SEGMENTS 1
/*
* We handle AEN commands ourselves and don't even let the
* block layer know about them.
*/
#define NVME_RDMA_NR_AEN_COMMANDS 1
#define NVME_RDMA_AQ_BLKMQ_DEPTH \
(NVME_AQ_DEPTH - NVME_RDMA_NR_AEN_COMMANDS)
struct nvme_rdma_device { struct nvme_rdma_device {
struct ib_device *dev; struct ib_device *dev;
struct ib_pd *pd; struct ib_pd *pd;
struct kref ref; struct kref ref;
struct list_head entry; struct list_head entry;
}; };
@ -79,8 +71,8 @@ struct nvme_rdma_request {
}; };
enum nvme_rdma_queue_flags { enum nvme_rdma_queue_flags {
NVME_RDMA_Q_LIVE = 0, NVME_RDMA_Q_ALLOCATED = 0,
NVME_RDMA_Q_DELETING = 1, NVME_RDMA_Q_LIVE = 1,
}; };
struct nvme_rdma_queue { struct nvme_rdma_queue {
@ -105,7 +97,6 @@ struct nvme_rdma_ctrl {
/* other member variables */ /* other member variables */
struct blk_mq_tag_set tag_set; struct blk_mq_tag_set tag_set;
struct work_struct delete_work;
struct work_struct err_work; struct work_struct err_work;
struct nvme_rdma_qe async_event_sqe; struct nvme_rdma_qe async_event_sqe;
@ -274,6 +265,9 @@ static int nvme_rdma_reinit_request(void *data, struct request *rq)
struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq); struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
int ret = 0; int ret = 0;
if (WARN_ON_ONCE(!req->mr))
return 0;
ib_dereg_mr(req->mr); ib_dereg_mr(req->mr);
req->mr = ib_alloc_mr(dev->pd, IB_MR_TYPE_MEM_REG, req->mr = ib_alloc_mr(dev->pd, IB_MR_TYPE_MEM_REG,
@ -434,11 +428,9 @@ out_err:
static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue) static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
{ {
struct nvme_rdma_device *dev; struct nvme_rdma_device *dev = queue->device;
struct ib_device *ibdev; struct ib_device *ibdev = dev->dev;
dev = queue->device;
ibdev = dev->dev;
rdma_destroy_qp(queue->cm_id); rdma_destroy_qp(queue->cm_id);
ib_free_cq(queue->ib_cq); ib_free_cq(queue->ib_cq);
@ -493,7 +485,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
return 0; return 0;
out_destroy_qp: out_destroy_qp:
ib_destroy_qp(queue->qp); rdma_destroy_qp(queue->cm_id);
out_destroy_ib_cq: out_destroy_ib_cq:
ib_free_cq(queue->ib_cq); ib_free_cq(queue->ib_cq);
out_put_dev: out_put_dev:
@ -544,11 +536,11 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
ret = nvme_rdma_wait_for_cm(queue); ret = nvme_rdma_wait_for_cm(queue);
if (ret) { if (ret) {
dev_info(ctrl->ctrl.device, dev_info(ctrl->ctrl.device,
"rdma_resolve_addr wait failed (%d).\n", ret); "rdma connection establishment failed (%d)\n", ret);
goto out_destroy_cm_id; goto out_destroy_cm_id;
} }
clear_bit(NVME_RDMA_Q_DELETING, &queue->flags); set_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags);
return 0; return 0;
@ -568,7 +560,7 @@ static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue) static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
{ {
if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
return; return;
if (nvme_rdma_queue_idx(queue) == 0) { if (nvme_rdma_queue_idx(queue) == 0) {
@ -676,11 +668,10 @@ out_free_queues:
return ret; return ret;
} }
static void nvme_rdma_free_tagset(struct nvme_ctrl *nctrl, bool admin) static void nvme_rdma_free_tagset(struct nvme_ctrl *nctrl,
struct blk_mq_tag_set *set)
{ {
struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl); struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
struct blk_mq_tag_set *set = admin ?
&ctrl->admin_tag_set : &ctrl->tag_set;
blk_mq_free_tag_set(set); blk_mq_free_tag_set(set);
nvme_rdma_dev_put(ctrl->device); nvme_rdma_dev_put(ctrl->device);
@ -697,7 +688,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
set = &ctrl->admin_tag_set; set = &ctrl->admin_tag_set;
memset(set, 0, sizeof(*set)); memset(set, 0, sizeof(*set));
set->ops = &nvme_rdma_admin_mq_ops; set->ops = &nvme_rdma_admin_mq_ops;
set->queue_depth = NVME_RDMA_AQ_BLKMQ_DEPTH; set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
set->reserved_tags = 2; /* connect + keep-alive */ set->reserved_tags = 2; /* connect + keep-alive */
set->numa_node = NUMA_NO_NODE; set->numa_node = NUMA_NO_NODE;
set->cmd_size = sizeof(struct nvme_rdma_request) + set->cmd_size = sizeof(struct nvme_rdma_request) +
@ -705,6 +696,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
set->driver_data = ctrl; set->driver_data = ctrl;
set->nr_hw_queues = 1; set->nr_hw_queues = 1;
set->timeout = ADMIN_TIMEOUT; set->timeout = ADMIN_TIMEOUT;
set->flags = BLK_MQ_F_NO_SCHED;
} else { } else {
set = &ctrl->tag_set; set = &ctrl->tag_set;
memset(set, 0, sizeof(*set)); memset(set, 0, sizeof(*set));
@ -748,7 +740,7 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
nvme_rdma_stop_queue(&ctrl->queues[0]); nvme_rdma_stop_queue(&ctrl->queues[0]);
if (remove) { if (remove) {
blk_cleanup_queue(ctrl->ctrl.admin_q); blk_cleanup_queue(ctrl->ctrl.admin_q);
nvme_rdma_free_tagset(&ctrl->ctrl, true); nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.admin_tagset);
} }
nvme_rdma_free_queue(&ctrl->queues[0]); nvme_rdma_free_queue(&ctrl->queues[0]);
} }
@ -780,8 +772,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
goto out_free_tagset; goto out_free_tagset;
} }
} else { } else {
error = blk_mq_reinit_tagset(&ctrl->admin_tag_set, error = nvme_reinit_tagset(&ctrl->ctrl, ctrl->ctrl.admin_tagset);
nvme_rdma_reinit_request);
if (error) if (error)
goto out_free_queue; goto out_free_queue;
} }
@ -825,7 +816,7 @@ out_cleanup_queue:
blk_cleanup_queue(ctrl->ctrl.admin_q); blk_cleanup_queue(ctrl->ctrl.admin_q);
out_free_tagset: out_free_tagset:
if (new) if (new)
nvme_rdma_free_tagset(&ctrl->ctrl, true); nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.admin_tagset);
out_free_queue: out_free_queue:
nvme_rdma_free_queue(&ctrl->queues[0]); nvme_rdma_free_queue(&ctrl->queues[0]);
return error; return error;
@ -837,7 +828,7 @@ static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl,
nvme_rdma_stop_io_queues(ctrl); nvme_rdma_stop_io_queues(ctrl);
if (remove) { if (remove) {
blk_cleanup_queue(ctrl->ctrl.connect_q); blk_cleanup_queue(ctrl->ctrl.connect_q);
nvme_rdma_free_tagset(&ctrl->ctrl, false); nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
} }
nvme_rdma_free_io_queues(ctrl); nvme_rdma_free_io_queues(ctrl);
} }
@ -863,8 +854,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
goto out_free_tag_set; goto out_free_tag_set;
} }
} else { } else {
ret = blk_mq_reinit_tagset(&ctrl->tag_set, ret = nvme_reinit_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
nvme_rdma_reinit_request);
if (ret) if (ret)
goto out_free_io_queues; goto out_free_io_queues;
@ -883,7 +873,7 @@ out_cleanup_connect_q:
blk_cleanup_queue(ctrl->ctrl.connect_q); blk_cleanup_queue(ctrl->ctrl.connect_q);
out_free_tag_set: out_free_tag_set:
if (new) if (new)
nvme_rdma_free_tagset(&ctrl->ctrl, false); nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
out_free_io_queues: out_free_io_queues:
nvme_rdma_free_io_queues(ctrl); nvme_rdma_free_io_queues(ctrl);
return ret; return ret;
@ -922,7 +912,7 @@ static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl)
ctrl->ctrl.opts->reconnect_delay * HZ); ctrl->ctrl.opts->reconnect_delay * HZ);
} else { } else {
dev_info(ctrl->ctrl.device, "Removing controller...\n"); dev_info(ctrl->ctrl.device, "Removing controller...\n");
queue_work(nvme_wq, &ctrl->delete_work); nvme_delete_ctrl(&ctrl->ctrl);
} }
} }
@ -935,10 +925,6 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
++ctrl->ctrl.nr_reconnects; ++ctrl->ctrl.nr_reconnects;
if (ctrl->ctrl.queue_count > 1)
nvme_rdma_destroy_io_queues(ctrl, false);
nvme_rdma_destroy_admin_queue(ctrl, false);
ret = nvme_rdma_configure_admin_queue(ctrl, false); ret = nvme_rdma_configure_admin_queue(ctrl, false);
if (ret) if (ret)
goto requeue; goto requeue;
@ -946,7 +932,7 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
if (ctrl->ctrl.queue_count > 1) { if (ctrl->ctrl.queue_count > 1) {
ret = nvme_rdma_configure_io_queues(ctrl, false); ret = nvme_rdma_configure_io_queues(ctrl, false);
if (ret) if (ret)
goto requeue; goto destroy_admin;
} }
changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE); changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
@ -956,14 +942,17 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
return; return;
} }
ctrl->ctrl.nr_reconnects = 0;
nvme_start_ctrl(&ctrl->ctrl); nvme_start_ctrl(&ctrl->ctrl);
dev_info(ctrl->ctrl.device, "Successfully reconnected\n"); dev_info(ctrl->ctrl.device, "Successfully reconnected (%d attempts)\n",
ctrl->ctrl.nr_reconnects);
ctrl->ctrl.nr_reconnects = 0;
return; return;
destroy_admin:
nvme_rdma_destroy_admin_queue(ctrl, false);
requeue: requeue:
dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n", dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n",
ctrl->ctrl.nr_reconnects); ctrl->ctrl.nr_reconnects);
@ -979,17 +968,15 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
if (ctrl->ctrl.queue_count > 1) { if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl); nvme_stop_queues(&ctrl->ctrl);
nvme_rdma_stop_io_queues(ctrl);
}
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
nvme_rdma_stop_queue(&ctrl->queues[0]);
/* We must take care of fastfail/requeue all our inflight requests */
if (ctrl->ctrl.queue_count > 1)
blk_mq_tagset_busy_iter(&ctrl->tag_set, blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl); nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_io_queues(ctrl, false);
}
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set, blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_cancel_request, &ctrl->ctrl); nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_admin_queue(ctrl, false);
/* /*
* queues are not a live anymore, so restart the queues to fail fast * queues are not a live anymore, so restart the queues to fail fast
@ -1065,7 +1052,7 @@ static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
if (!blk_rq_bytes(rq)) if (!blk_rq_bytes(rq))
return; return;
if (req->mr->need_inval) { if (req->mr->need_inval && test_bit(NVME_RDMA_Q_LIVE, &req->queue->flags)) {
res = nvme_rdma_inv_rkey(queue, req); res = nvme_rdma_inv_rkey(queue, req);
if (unlikely(res < 0)) { if (unlikely(res < 0)) {
dev_err(ctrl->ctrl.device, dev_err(ctrl->ctrl.device,
@ -1314,7 +1301,7 @@ static struct blk_mq_tags *nvme_rdma_tagset(struct nvme_rdma_queue *queue)
return queue->ctrl->tag_set.tags[queue_idx - 1]; return queue->ctrl->tag_set.tags[queue_idx - 1];
} }
static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg, int aer_idx) static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg)
{ {
struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(arg); struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(arg);
struct nvme_rdma_queue *queue = &ctrl->queues[0]; struct nvme_rdma_queue *queue = &ctrl->queues[0];
@ -1324,14 +1311,11 @@ static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg, int aer_idx)
struct ib_sge sge; struct ib_sge sge;
int ret; int ret;
if (WARN_ON_ONCE(aer_idx != 0))
return;
ib_dma_sync_single_for_cpu(dev, sqe->dma, sizeof(*cmd), DMA_TO_DEVICE); ib_dma_sync_single_for_cpu(dev, sqe->dma, sizeof(*cmd), DMA_TO_DEVICE);
memset(cmd, 0, sizeof(*cmd)); memset(cmd, 0, sizeof(*cmd));
cmd->common.opcode = nvme_admin_async_event; cmd->common.opcode = nvme_admin_async_event;
cmd->common.command_id = NVME_RDMA_AQ_BLKMQ_DEPTH; cmd->common.command_id = NVME_AQ_BLK_MQ_DEPTH;
cmd->common.flags |= NVME_CMD_SGL_METABUF; cmd->common.flags |= NVME_CMD_SGL_METABUF;
nvme_rdma_set_sg_null(cmd); nvme_rdma_set_sg_null(cmd);
@ -1393,7 +1377,7 @@ static int __nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc, int tag)
* for them but rather special case them here. * for them but rather special case them here.
*/ */
if (unlikely(nvme_rdma_queue_idx(queue) == 0 && if (unlikely(nvme_rdma_queue_idx(queue) == 0 &&
cqe->command_id >= NVME_RDMA_AQ_BLKMQ_DEPTH)) cqe->command_id >= NVME_AQ_BLK_MQ_DEPTH))
nvme_complete_async_event(&queue->ctrl->ctrl, cqe->status, nvme_complete_async_event(&queue->ctrl->ctrl, cqe->status,
&cqe->result); &cqe->result);
else else
@ -1590,6 +1574,10 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
{ {
struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq); struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
dev_warn(req->queue->ctrl->ctrl.device,
"I/O %d QID %d timeout, reset controller\n",
rq->tag, nvme_rdma_queue_idx(req->queue));
/* queue error recovery */ /* queue error recovery */
nvme_rdma_error_recovery(req->queue->ctrl); nvme_rdma_error_recovery(req->queue->ctrl);
@ -1767,50 +1755,9 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
nvme_rdma_destroy_admin_queue(ctrl, shutdown); nvme_rdma_destroy_admin_queue(ctrl, shutdown);
} }
static void nvme_rdma_remove_ctrl(struct nvme_rdma_ctrl *ctrl) static void nvme_rdma_delete_ctrl(struct nvme_ctrl *ctrl)
{ {
nvme_remove_namespaces(&ctrl->ctrl); nvme_rdma_shutdown_ctrl(to_rdma_ctrl(ctrl), true);
nvme_rdma_shutdown_ctrl(ctrl, true);
nvme_uninit_ctrl(&ctrl->ctrl);
nvme_put_ctrl(&ctrl->ctrl);
}
static void nvme_rdma_del_ctrl_work(struct work_struct *work)
{
struct nvme_rdma_ctrl *ctrl = container_of(work,
struct nvme_rdma_ctrl, delete_work);
nvme_stop_ctrl(&ctrl->ctrl);
nvme_rdma_remove_ctrl(ctrl);
}
static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
{
if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
return -EBUSY;
if (!queue_work(nvme_wq, &ctrl->delete_work))
return -EBUSY;
return 0;
}
static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
{
struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
int ret = 0;
/*
* Keep a reference until all work is flushed since
* __nvme_rdma_del_ctrl can free the ctrl mem
*/
if (!kref_get_unless_zero(&ctrl->ctrl.kref))
return -EBUSY;
ret = __nvme_rdma_del_ctrl(ctrl);
if (!ret)
flush_work(&ctrl->delete_work);
nvme_put_ctrl(&ctrl->ctrl);
return ret;
} }
static void nvme_rdma_reset_ctrl_work(struct work_struct *work) static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
@ -1834,7 +1781,11 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
} }
changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE); changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
WARN_ON_ONCE(!changed); if (!changed) {
/* state change failure is ok if we're in DELETING state */
WARN_ON_ONCE(ctrl->ctrl.state != NVME_CTRL_DELETING);
return;
}
nvme_start_ctrl(&ctrl->ctrl); nvme_start_ctrl(&ctrl->ctrl);
@ -1842,7 +1793,10 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
out_fail: out_fail:
dev_warn(ctrl->ctrl.device, "Removing after reset failure\n"); dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
nvme_rdma_remove_ctrl(ctrl); nvme_remove_namespaces(&ctrl->ctrl);
nvme_rdma_shutdown_ctrl(ctrl, true);
nvme_uninit_ctrl(&ctrl->ctrl);
nvme_put_ctrl(&ctrl->ctrl);
} }
static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = { static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
@ -1854,10 +1808,88 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
.reg_write32 = nvmf_reg_write32, .reg_write32 = nvmf_reg_write32,
.free_ctrl = nvme_rdma_free_ctrl, .free_ctrl = nvme_rdma_free_ctrl,
.submit_async_event = nvme_rdma_submit_async_event, .submit_async_event = nvme_rdma_submit_async_event,
.delete_ctrl = nvme_rdma_del_ctrl, .delete_ctrl = nvme_rdma_delete_ctrl,
.get_address = nvmf_get_address, .get_address = nvmf_get_address,
.reinit_request = nvme_rdma_reinit_request,
}; };
static inline bool
__nvme_rdma_options_match(struct nvme_rdma_ctrl *ctrl,
struct nvmf_ctrl_options *opts)
{
char *stdport = __stringify(NVME_RDMA_IP_PORT);
if (!nvmf_ctlr_matches_baseopts(&ctrl->ctrl, opts) ||
strcmp(opts->traddr, ctrl->ctrl.opts->traddr))
return false;
if (opts->mask & NVMF_OPT_TRSVCID &&
ctrl->ctrl.opts->mask & NVMF_OPT_TRSVCID) {
if (strcmp(opts->trsvcid, ctrl->ctrl.opts->trsvcid))
return false;
} else if (opts->mask & NVMF_OPT_TRSVCID) {
if (strcmp(opts->trsvcid, stdport))
return false;
} else if (ctrl->ctrl.opts->mask & NVMF_OPT_TRSVCID) {
if (strcmp(stdport, ctrl->ctrl.opts->trsvcid))
return false;
}
/* else, it's a match as both have stdport. Fall to next checks */
/*
* checking the local address is rough. In most cases, one
* is not specified and the host port is selected by the stack.
*
* Assume no match if:
* local address is specified and address is not the same
* local address is not specified but remote is, or vice versa
* (admin using specific host_traddr when it matters).
*/
if (opts->mask & NVMF_OPT_HOST_TRADDR &&
ctrl->ctrl.opts->mask & NVMF_OPT_HOST_TRADDR) {
if (strcmp(opts->host_traddr, ctrl->ctrl.opts->host_traddr))
return false;
} else if (opts->mask & NVMF_OPT_HOST_TRADDR ||
ctrl->ctrl.opts->mask & NVMF_OPT_HOST_TRADDR)
return false;
/*
* if neither controller had an host port specified, assume it's
* a match as everything else matched.
*/
return true;
}
/*
* Fails a connection request if it matches an existing controller
* (association) with the same tuple:
* <Host NQN, Host ID, local address, remote address, remote port, SUBSYS NQN>
*
* if local address is not specified in the request, it will match an
* existing controller with all the other parameters the same and no
* local port address specified as well.
*
* The ports don't need to be compared as they are intrinsically
* already matched by the port pointers supplied.
*/
static bool
nvme_rdma_existing_controller(struct nvmf_ctrl_options *opts)
{
struct nvme_rdma_ctrl *ctrl;
bool found = false;
mutex_lock(&nvme_rdma_ctrl_mutex);
list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
found = __nvme_rdma_options_match(ctrl, opts);
if (found)
break;
}
mutex_unlock(&nvme_rdma_ctrl_mutex);
return found;
}
static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev, static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
struct nvmf_ctrl_options *opts) struct nvmf_ctrl_options *opts)
{ {
@ -1894,6 +1926,11 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
} }
} }
if (!opts->duplicate_connect && nvme_rdma_existing_controller(opts)) {
ret = -EALREADY;
goto out_free_ctrl;
}
ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops, ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops,
0 /* no quirks, we're perfect! */); 0 /* no quirks, we're perfect! */);
if (ret) if (ret)
@ -1902,7 +1939,6 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
INIT_DELAYED_WORK(&ctrl->reconnect_work, INIT_DELAYED_WORK(&ctrl->reconnect_work,
nvme_rdma_reconnect_ctrl_work); nvme_rdma_reconnect_ctrl_work);
INIT_WORK(&ctrl->err_work, nvme_rdma_error_recovery_work); INIT_WORK(&ctrl->err_work, nvme_rdma_error_recovery_work);
INIT_WORK(&ctrl->delete_work, nvme_rdma_del_ctrl_work);
INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work); INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work);
ctrl->ctrl.queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */ ctrl->ctrl.queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */
@ -1961,7 +1997,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
dev_info(ctrl->ctrl.device, "new ctrl: NQN \"%s\", addr %pISpcs\n", dev_info(ctrl->ctrl.device, "new ctrl: NQN \"%s\", addr %pISpcs\n",
ctrl->ctrl.opts->subsysnqn, &ctrl->addr); ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
kref_get(&ctrl->ctrl.kref); nvme_get_ctrl(&ctrl->ctrl);
mutex_lock(&nvme_rdma_ctrl_mutex); mutex_lock(&nvme_rdma_ctrl_mutex);
list_add_tail(&ctrl->list, &nvme_rdma_ctrl_list); list_add_tail(&ctrl->list, &nvme_rdma_ctrl_list);
@ -2006,7 +2042,7 @@ static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
dev_info(ctrl->ctrl.device, dev_info(ctrl->ctrl.device,
"Removing ctrl: NQN \"%s\", addr %pISp\n", "Removing ctrl: NQN \"%s\", addr %pISp\n",
ctrl->ctrl.opts->subsysnqn, &ctrl->addr); ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
__nvme_rdma_del_ctrl(ctrl); nvme_delete_ctrl(&ctrl->ctrl);
} }
mutex_unlock(&nvme_rdma_ctrl_mutex); mutex_unlock(&nvme_rdma_ctrl_mutex);
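
Note on the queue flag change above: `NVME_RDMA_Q_DELETING` is replaced by `NVME_RDMA_Q_ALLOCATED`, which is set at the end of `nvme_rdma_alloc_queue()` and consumed with `test_and_clear_bit()` in `nvme_rdma_free_queue()`. The effect is that freeing a queue that was never allocated, or that was already freed, returns early. The sketch below reproduces the pattern in user space with C11 atomics. `struct queue`, `Q_ALLOCATED`, and the helper names are hypothetical stand-ins, not the driver's types.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define Q_ALLOCATED (1u << 0)  /* hypothetical flag bit */

/* Hypothetical queue object for illustration only. */
struct queue {
	atomic_uint flags;
	int free_count;  /* counts real teardowns, for illustration */
};

/* Atomically clear a bit and report whether it was set beforehand. */
static bool test_and_clear(atomic_uint *flags, unsigned int bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/* Mark the queue as allocated once setup has fully succeeded. */
static void queue_alloc(struct queue *q)
{
	atomic_fetch_or(&q->flags, Q_ALLOCATED);
}

/* Tear down only if this caller is the one that cleared the flag. */
static void queue_free(struct queue *q)
{
	if (!test_and_clear(&q->flags, Q_ALLOCATED))
		return;
	q->free_count++;
}
```

Calling `queue_free()` twice after one `queue_alloc()` performs the teardown once. Calling it on a queue that was never allocated does nothing, which is the case the old `DELETING` flag did not cover.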

@ -35,17 +35,14 @@ u32 nvmet_get_log_page_len(struct nvme_command *cmd)
static u16 nvmet_get_smart_log_nsid(struct nvmet_req *req, static u16 nvmet_get_smart_log_nsid(struct nvmet_req *req,
struct nvme_smart_log *slog) struct nvme_smart_log *slog)
{ {
u16 status;
struct nvmet_ns *ns; struct nvmet_ns *ns;
u64 host_reads, host_writes, data_units_read, data_units_written; u64 host_reads, host_writes, data_units_read, data_units_written;
status = NVME_SC_SUCCESS;
ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->get_log_page.nsid); ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->get_log_page.nsid);
if (!ns) { if (!ns) {
status = NVME_SC_INVALID_NS;
pr_err("nvmet : Could not find namespace id : %d\n", pr_err("nvmet : Could not find namespace id : %d\n",
le32_to_cpu(req->cmd->get_log_page.nsid)); le32_to_cpu(req->cmd->get_log_page.nsid));
goto out; return NVME_SC_INVALID_NS;
} }
host_reads = part_stat_read(ns->bdev->bd_part, ios[READ]); host_reads = part_stat_read(ns->bdev->bd_part, ios[READ]);
@ -58,20 +55,18 @@ static u16 nvmet_get_smart_log_nsid(struct nvmet_req *req,
put_unaligned_le64(host_writes, &slog->host_writes[0]); put_unaligned_le64(host_writes, &slog->host_writes[0]);
put_unaligned_le64(data_units_written, &slog->data_units_written[0]); put_unaligned_le64(data_units_written, &slog->data_units_written[0]);
nvmet_put_namespace(ns); nvmet_put_namespace(ns);
out:
return status; return NVME_SC_SUCCESS;
} }
static u16 nvmet_get_smart_log_all(struct nvmet_req *req, static u16 nvmet_get_smart_log_all(struct nvmet_req *req,
struct nvme_smart_log *slog) struct nvme_smart_log *slog)
{ {
u16 status;
u64 host_reads = 0, host_writes = 0; u64 host_reads = 0, host_writes = 0;
u64 data_units_read = 0, data_units_written = 0; u64 data_units_read = 0, data_units_written = 0;
struct nvmet_ns *ns; struct nvmet_ns *ns;
struct nvmet_ctrl *ctrl; struct nvmet_ctrl *ctrl;
status = NVME_SC_SUCCESS;
ctrl = req->sq->ctrl; ctrl = req->sq->ctrl;
rcu_read_lock(); rcu_read_lock();
@ -91,7 +86,7 @@ static u16 nvmet_get_smart_log_all(struct nvmet_req *req,
put_unaligned_le64(host_writes, &slog->host_writes[0]); put_unaligned_le64(host_writes, &slog->host_writes[0]);
put_unaligned_le64(data_units_written, &slog->data_units_written[0]); put_unaligned_le64(data_units_written, &slog->data_units_written[0]);
return status; return NVME_SC_SUCCESS;
} }
static u16 nvmet_get_smart_log(struct nvmet_req *req, static u16 nvmet_get_smart_log(struct nvmet_req *req,
@ -144,10 +139,8 @@ static void nvmet_execute_get_log_page(struct nvmet_req *req)
} }
smart_log = buf; smart_log = buf;
status = nvmet_get_smart_log(req, smart_log); status = nvmet_get_smart_log(req, smart_log);
if (status) { if (status)
memset(buf, '\0', data_len);
goto err; goto err;
}
break; break;
case NVME_LOG_FW_SLOT: case NVME_LOG_FW_SLOT:
/* /*
@ -300,7 +293,7 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
} }
/* /*
* nuse = ncap = nsze isn't aways true, but we have no way to find * nuse = ncap = nsze isn't always true, but we have no way to find
* that out from the underlying device. * that out from the underlying device.
*/ */
id->ncap = id->nuse = id->nsze = id->ncap = id->nuse = id->nsze =
@ -424,7 +417,7 @@ out:
} }
/* /*
* A "mimimum viable" abort implementation: the command is mandatory in the * A "minimum viable" abort implementation: the command is mandatory in the
* spec, but we are not required to do any useful work. We couldn't really * spec, but we are not required to do any useful work. We couldn't really
* do a useful abort, so don't bother even with waiting for the command * do a useful abort, so don't bother even with waiting for the command
* to be exectuted and return immediately telling the command to abort * to be exectuted and return immediately telling the command to abort

@ -57,6 +57,17 @@ u16 nvmet_copy_from_sgl(struct nvmet_req *req, off_t off, void *buf, size_t len)
return 0; return 0;
} }
static unsigned int nvmet_max_nsid(struct nvmet_subsys *subsys)
{
struct nvmet_ns *ns;
if (list_empty(&subsys->namespaces))
return 0;
ns = list_last_entry(&subsys->namespaces, struct nvmet_ns, dev_link);
return ns->nsid;
}
static u32 nvmet_async_event_result(struct nvmet_async_event *aen) static u32 nvmet_async_event_result(struct nvmet_async_event *aen)
{ {
return aen->event_type | (aen->event_info << 8) | (aen->log_page << 16); return aen->event_type | (aen->event_info << 8) | (aen->log_page << 16);
@ -334,6 +345,8 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
ns->enabled = false; ns->enabled = false;
list_del_rcu(&ns->dev_link); list_del_rcu(&ns->dev_link);
if (ns->nsid == subsys->max_nsid)
subsys->max_nsid = nvmet_max_nsid(subsys);
mutex_unlock(&subsys->lock); mutex_unlock(&subsys->lock);
/* /*
@ -497,6 +510,7 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
req->ops = ops; req->ops = ops;
req->sg = NULL; req->sg = NULL;
req->sg_cnt = 0; req->sg_cnt = 0;
req->transfer_len = 0;
req->rsp->status = 0; req->rsp->status = 0;
/* no support for fused commands yet */ /* no support for fused commands yet */
@ -546,6 +560,15 @@ void nvmet_req_uninit(struct nvmet_req *req)
} }
EXPORT_SYMBOL_GPL(nvmet_req_uninit); EXPORT_SYMBOL_GPL(nvmet_req_uninit);
void nvmet_req_execute(struct nvmet_req *req)
{
if (unlikely(req->data_len != req->transfer_len))
nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR);
else
req->execute(req);
}
EXPORT_SYMBOL_GPL(nvmet_req_execute);
static inline bool nvmet_cc_en(u32 cc) static inline bool nvmet_cc_en(u32 cc)
{ {
return (cc >> NVME_CC_EN_SHIFT) & 0x1; return (cc >> NVME_CC_EN_SHIFT) & 0x1;
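
Note on `nvmet_req_execute()` above: transports now call this wrapper instead of invoking `req->execute()` directly, so a mismatch between the length the command expects (`data_len`) and the length the transport actually moved (`transfer_len`) fails the request before the handler runs. A minimal user-space sketch of that guard follows. `struct request`, the status values, and the helper names are hypothetical and do not match the nvmet definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_OK           0x0000u
#define STATUS_BAD_DATA_LEN 0x0001u  /* hypothetical status code */

/* Hypothetical request object for illustration only. */
struct request {
	size_t data_len;      /* length the command handler expects */
	size_t transfer_len;  /* length the transport transferred */
	uint16_t status;
	void (*execute)(struct request *req);
};

static void request_complete(struct request *req, uint16_t status)
{
	req->status = status;
}

/* Single entry point: reject length mismatches, otherwise dispatch. */
static void request_execute(struct request *req)
{
	if (req->data_len != req->transfer_len)
		request_complete(req, STATUS_BAD_DATA_LEN);
	else
		req->execute(req);
}

/* Example handler that simply reports success. */
static void handler_ok(struct request *req)
{
	request_complete(req, STATUS_OK);
}
```

Keeping the check in one wrapper means each transport only has to record how much data it moved, rather than repeating the comparison at every call site.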

@ -76,7 +76,6 @@ struct nvmet_fc_fcp_iod {
dma_addr_t rspdma; dma_addr_t rspdma;
struct scatterlist *data_sg; struct scatterlist *data_sg;
int data_sg_cnt; int data_sg_cnt;
u32 total_length;
u32 offset; u32 offset;
enum nvmet_fcp_datadir io_dir; enum nvmet_fcp_datadir io_dir;
bool active; bool active;
@ -150,6 +149,7 @@ struct nvmet_fc_tgt_assoc {
struct list_head a_list; struct list_head a_list;
struct nvmet_fc_tgt_queue *queues[NVMET_NR_QUEUES + 1]; struct nvmet_fc_tgt_queue *queues[NVMET_NR_QUEUES + 1];
struct kref ref; struct kref ref;
struct work_struct del_work;
}; };
@ -232,6 +232,7 @@ static void nvmet_fc_tgtport_put(struct nvmet_fc_tgtport *tgtport);
static int nvmet_fc_tgtport_get(struct nvmet_fc_tgtport *tgtport); static int nvmet_fc_tgtport_get(struct nvmet_fc_tgtport *tgtport);
static void nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport, static void nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
struct nvmet_fc_fcp_iod *fod); struct nvmet_fc_fcp_iod *fod);
static void nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc);
/* *********************** FC-NVME DMA Handling **************************** */ /* *********************** FC-NVME DMA Handling **************************** */
@ -802,6 +803,16 @@ nvmet_fc_find_target_queue(struct nvmet_fc_tgtport *tgtport,
return NULL; return NULL;
} }
static void
nvmet_fc_delete_assoc(struct work_struct *work)
{
struct nvmet_fc_tgt_assoc *assoc =
container_of(work, struct nvmet_fc_tgt_assoc, del_work);
nvmet_fc_delete_target_assoc(assoc);
nvmet_fc_tgt_a_put(assoc);
}
static struct nvmet_fc_tgt_assoc * static struct nvmet_fc_tgt_assoc *
nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport) nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport)
{ {
@ -826,6 +837,7 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport)
assoc->a_id = idx; assoc->a_id = idx;
INIT_LIST_HEAD(&assoc->a_list); INIT_LIST_HEAD(&assoc->a_list);
kref_init(&assoc->ref); kref_init(&assoc->ref);
INIT_WORK(&assoc->del_work, nvmet_fc_delete_assoc);
while (needrandom) { while (needrandom) {
get_random_bytes(&ran, sizeof(ran) - BYTES_FOR_QID); get_random_bytes(&ran, sizeof(ran) - BYTES_FOR_QID);
@ -1118,8 +1130,7 @@ nvmet_fc_delete_ctrl(struct nvmet_ctrl *ctrl)
nvmet_fc_tgtport_put(tgtport); nvmet_fc_tgtport_put(tgtport);
if (found_ctrl) { if (found_ctrl) {
nvmet_fc_delete_target_assoc(assoc); schedule_work(&assoc->del_work);
nvmet_fc_tgt_a_put(assoc);
return; return;
} }
@ -1688,7 +1699,7 @@ nvmet_fc_alloc_tgt_pgs(struct nvmet_fc_fcp_iod *fod)
u32 page_len, length; u32 page_len, length;
int i = 0; int i = 0;
length = fod->total_length; length = fod->req.transfer_len;
nent = DIV_ROUND_UP(length, PAGE_SIZE); nent = DIV_ROUND_UP(length, PAGE_SIZE);
sg = kmalloc_array(nent, sizeof(struct scatterlist), GFP_KERNEL); sg = kmalloc_array(nent, sizeof(struct scatterlist), GFP_KERNEL);
if (!sg) if (!sg)
@ -1777,7 +1788,7 @@ nvmet_fc_prep_fcp_rsp(struct nvmet_fc_tgtport *tgtport,
u32 rsn, rspcnt, xfr_length; u32 rsn, rspcnt, xfr_length;
if (fod->fcpreq->op == NVMET_FCOP_READDATA_RSP) if (fod->fcpreq->op == NVMET_FCOP_READDATA_RSP)
xfr_length = fod->total_length; xfr_length = fod->req.transfer_len;
else else
xfr_length = fod->offset; xfr_length = fod->offset;
@ -1803,7 +1814,7 @@ nvmet_fc_prep_fcp_rsp(struct nvmet_fc_tgtport *tgtport,
rspcnt = atomic_inc_return(&fod->queue->zrspcnt); rspcnt = atomic_inc_return(&fod->queue->zrspcnt);
if (!(rspcnt % fod->queue->ersp_ratio) || if (!(rspcnt % fod->queue->ersp_ratio) ||
sqe->opcode == nvme_fabrics_command || sqe->opcode == nvme_fabrics_command ||
xfr_length != fod->total_length || xfr_length != fod->req.transfer_len ||
(le16_to_cpu(cqe->status) & 0xFFFE) || cqewd[0] || cqewd[1] || (le16_to_cpu(cqe->status) & 0xFFFE) || cqewd[0] || cqewd[1] ||
(sqe->flags & (NVME_CMD_FUSE_FIRST | NVME_CMD_FUSE_SECOND)) || (sqe->flags & (NVME_CMD_FUSE_FIRST | NVME_CMD_FUSE_SECOND)) ||
queue_90percent_full(fod->queue, le16_to_cpu(cqe->sq_head))) queue_90percent_full(fod->queue, le16_to_cpu(cqe->sq_head)))
@ -1880,7 +1891,7 @@ nvmet_fc_transfer_fcp_data(struct nvmet_fc_tgtport *tgtport,
fcpreq->timeout = NVME_FC_TGTOP_TIMEOUT_SEC; fcpreq->timeout = NVME_FC_TGTOP_TIMEOUT_SEC;
tlen = min_t(u32, tgtport->max_sg_cnt * PAGE_SIZE, tlen = min_t(u32, tgtport->max_sg_cnt * PAGE_SIZE,
(fod->total_length - fod->offset)); (fod->req.transfer_len - fod->offset));
fcpreq->transfer_length = tlen; fcpreq->transfer_length = tlen;
fcpreq->transferred_length = 0; fcpreq->transferred_length = 0;
fcpreq->fcp_error = 0; fcpreq->fcp_error = 0;
@ -1894,7 +1905,7 @@ nvmet_fc_transfer_fcp_data(struct nvmet_fc_tgtport *tgtport,
* combined xfr with response. * combined xfr with response.
*/ */
if ((op == NVMET_FCOP_READDATA) && if ((op == NVMET_FCOP_READDATA) &&
((fod->offset + fcpreq->transfer_length) == fod->total_length) && ((fod->offset + fcpreq->transfer_length) == fod->req.transfer_len) &&
(tgtport->ops->target_features & NVMET_FCTGTFEAT_READDATA_RSP)) { (tgtport->ops->target_features & NVMET_FCTGTFEAT_READDATA_RSP)) {
fcpreq->op = NVMET_FCOP_READDATA_RSP; fcpreq->op = NVMET_FCOP_READDATA_RSP;
nvmet_fc_prep_fcp_rsp(tgtport, fod); nvmet_fc_prep_fcp_rsp(tgtport, fod);
@ -1974,7 +1985,7 @@ nvmet_fc_fod_op_done(struct nvmet_fc_fcp_iod *fod)
} }
fod->offset += fcpreq->transferred_length; fod->offset += fcpreq->transferred_length;
if (fod->offset != fod->total_length) { if (fod->offset != fod->req.transfer_len) {
spin_lock_irqsave(&fod->flock, flags); spin_lock_irqsave(&fod->flock, flags);
fod->writedataactive = true; fod->writedataactive = true;
spin_unlock_irqrestore(&fod->flock, flags); spin_unlock_irqrestore(&fod->flock, flags);
@ -1986,9 +1997,7 @@ nvmet_fc_fod_op_done(struct nvmet_fc_fcp_iod *fod)
} }
/* data transfer complete, resume with nvmet layer */ /* data transfer complete, resume with nvmet layer */
nvmet_req_execute(&fod->req);
fod->req.execute(&fod->req);
break; break;
case NVMET_FCOP_READDATA: case NVMET_FCOP_READDATA:
@ -2011,7 +2020,7 @@ nvmet_fc_fod_op_done(struct nvmet_fc_fcp_iod *fod)
} }
fod->offset += fcpreq->transferred_length; fod->offset += fcpreq->transferred_length;
if (fod->offset != fod->total_length) { if (fod->offset != fod->req.transfer_len) {
/* transfer the next chunk */ /* transfer the next chunk */
nvmet_fc_transfer_fcp_data(tgtport, fod, nvmet_fc_transfer_fcp_data(tgtport, fod,
NVMET_FCOP_READDATA); NVMET_FCOP_READDATA);
@ -2148,7 +2157,7 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
fod->fcpreq->done = nvmet_fc_xmt_fcp_op_done; fod->fcpreq->done = nvmet_fc_xmt_fcp_op_done;
fod->total_length = be32_to_cpu(cmdiu->data_len); fod->req.transfer_len = be32_to_cpu(cmdiu->data_len);
if (cmdiu->flags & FCNVME_CMD_FLAGS_WRITE) { if (cmdiu->flags & FCNVME_CMD_FLAGS_WRITE) {
fod->io_dir = NVMET_FCP_WRITE; fod->io_dir = NVMET_FCP_WRITE;
if (!nvme_is_write(&cmdiu->sqe)) if (!nvme_is_write(&cmdiu->sqe))
@ -2159,7 +2168,7 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
goto transport_error; goto transport_error;
} else { } else {
fod->io_dir = NVMET_FCP_NODATA; fod->io_dir = NVMET_FCP_NODATA;
if (fod->total_length) if (fod->req.transfer_len)
goto transport_error; goto transport_error;
} }
@ -2167,9 +2176,6 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
fod->req.rsp = &fod->rspiubuf.cqe; fod->req.rsp = &fod->rspiubuf.cqe;
fod->req.port = fod->queue->port; fod->req.port = fod->queue->port;
/* ensure nvmet handlers will set cmd handler callback */
fod->req.execute = NULL;
/* clear any response payload */ /* clear any response payload */
memset(&fod->rspiubuf, 0, sizeof(fod->rspiubuf)); memset(&fod->rspiubuf, 0, sizeof(fod->rspiubuf));
@ -2189,7 +2195,7 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
/* keep a running counter of tail position */ /* keep a running counter of tail position */
atomic_inc(&fod->queue->sqtail); atomic_inc(&fod->queue->sqtail);
if (fod->total_length) { if (fod->req.transfer_len) {
ret = nvmet_fc_alloc_tgt_pgs(fod); ret = nvmet_fc_alloc_tgt_pgs(fod);
if (ret) { if (ret) {
nvmet_req_complete(&fod->req, ret); nvmet_req_complete(&fod->req, ret);
@ -2212,9 +2218,7 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
* can invoke the nvmet_layer now. If read data, cmd completion will * can invoke the nvmet_layer now. If read data, cmd completion will
* push the data * push the data
*/ */
nvmet_req_execute(&fod->req);
fod->req.execute(&fod->req);
return; return;
transport_error: transport_error:


@ -33,18 +33,11 @@ static inline u32 nvmet_rw_len(struct nvmet_req *req)
req->ns->blksize_shift; req->ns->blksize_shift;
} }
static void nvmet_inline_bio_init(struct nvmet_req *req)
{
struct bio *bio = &req->inline_bio;
bio_init(bio, req->inline_bvec, NVMET_MAX_INLINE_BIOVEC);
}
static void nvmet_execute_rw(struct nvmet_req *req) static void nvmet_execute_rw(struct nvmet_req *req)
{ {
int sg_cnt = req->sg_cnt; int sg_cnt = req->sg_cnt;
struct bio *bio = &req->inline_bio;
struct scatterlist *sg; struct scatterlist *sg;
struct bio *bio;
sector_t sector; sector_t sector;
blk_qc_t cookie; blk_qc_t cookie;
int op, op_flags = 0, i; int op, op_flags = 0, i;
@ -66,8 +59,7 @@ static void nvmet_execute_rw(struct nvmet_req *req)
sector = le64_to_cpu(req->cmd->rw.slba); sector = le64_to_cpu(req->cmd->rw.slba);
sector <<= (req->ns->blksize_shift - 9); sector <<= (req->ns->blksize_shift - 9);
nvmet_inline_bio_init(req); bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
bio = &req->inline_bio;
bio_set_dev(bio, req->ns->bdev); bio_set_dev(bio, req->ns->bdev);
bio->bi_iter.bi_sector = sector; bio->bi_iter.bi_sector = sector;
bio->bi_private = req; bio->bi_private = req;
@ -94,16 +86,14 @@ static void nvmet_execute_rw(struct nvmet_req *req)
cookie = submit_bio(bio); cookie = submit_bio(bio);
blk_mq_poll(bdev_get_queue(req->ns->bdev), cookie); blk_poll(bdev_get_queue(req->ns->bdev), cookie);
} }
static void nvmet_execute_flush(struct nvmet_req *req) static void nvmet_execute_flush(struct nvmet_req *req)
{ {
struct bio *bio; struct bio *bio = &req->inline_bio;
nvmet_inline_bio_init(req);
bio = &req->inline_bio;
bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
bio_set_dev(bio, req->ns->bdev); bio_set_dev(bio, req->ns->bdev);
bio->bi_private = req; bio->bi_private = req;
bio->bi_end_io = nvmet_bio_done; bio->bi_end_io = nvmet_bio_done;


@ -23,14 +23,6 @@
#define NVME_LOOP_MAX_SEGMENTS 256 #define NVME_LOOP_MAX_SEGMENTS 256
/*
* We handle AEN commands ourselves and don't even let the
* block layer know about them.
*/
#define NVME_LOOP_NR_AEN_COMMANDS 1
#define NVME_LOOP_AQ_BLKMQ_DEPTH \
(NVME_AQ_DEPTH - NVME_LOOP_NR_AEN_COMMANDS)
struct nvme_loop_iod { struct nvme_loop_iod {
struct nvme_request nvme_req; struct nvme_request nvme_req;
struct nvme_command cmd; struct nvme_command cmd;
@ -53,7 +45,6 @@ struct nvme_loop_ctrl {
struct nvme_ctrl ctrl; struct nvme_ctrl ctrl;
struct nvmet_ctrl *target_ctrl; struct nvmet_ctrl *target_ctrl;
struct work_struct delete_work;
}; };
static inline struct nvme_loop_ctrl *to_loop_ctrl(struct nvme_ctrl *ctrl) static inline struct nvme_loop_ctrl *to_loop_ctrl(struct nvme_ctrl *ctrl)
@ -113,7 +104,7 @@ static void nvme_loop_queue_response(struct nvmet_req *req)
* for them but rather special case them here. * for them but rather special case them here.
*/ */
if (unlikely(nvme_loop_queue_idx(queue) == 0 && if (unlikely(nvme_loop_queue_idx(queue) == 0 &&
cqe->command_id >= NVME_LOOP_AQ_BLKMQ_DEPTH)) { cqe->command_id >= NVME_AQ_BLK_MQ_DEPTH)) {
nvme_complete_async_event(&queue->ctrl->ctrl, cqe->status, nvme_complete_async_event(&queue->ctrl->ctrl, cqe->status,
&cqe->result); &cqe->result);
} else { } else {
@ -136,7 +127,7 @@ static void nvme_loop_execute_work(struct work_struct *work)
struct nvme_loop_iod *iod = struct nvme_loop_iod *iod =
container_of(work, struct nvme_loop_iod, work); container_of(work, struct nvme_loop_iod, work);
iod->req.execute(&iod->req); nvmet_req_execute(&iod->req);
} }
static enum blk_eh_timer_return static enum blk_eh_timer_return
@ -185,6 +176,7 @@ static blk_status_t nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
iod->req.sg = iod->sg_table.sgl; iod->req.sg = iod->sg_table.sgl;
iod->req.sg_cnt = blk_rq_map_sg(req->q, req, iod->sg_table.sgl); iod->req.sg_cnt = blk_rq_map_sg(req->q, req, iod->sg_table.sgl);
iod->req.transfer_len = blk_rq_bytes(req);
} }
blk_mq_start_request(req); blk_mq_start_request(req);
@ -193,7 +185,7 @@ static blk_status_t nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
return BLK_STS_OK; return BLK_STS_OK;
} }
static void nvme_loop_submit_async_event(struct nvme_ctrl *arg, int aer_idx) static void nvme_loop_submit_async_event(struct nvme_ctrl *arg)
{ {
struct nvme_loop_ctrl *ctrl = to_loop_ctrl(arg); struct nvme_loop_ctrl *ctrl = to_loop_ctrl(arg);
struct nvme_loop_queue *queue = &ctrl->queues[0]; struct nvme_loop_queue *queue = &ctrl->queues[0];
@ -201,7 +193,7 @@ static void nvme_loop_submit_async_event(struct nvme_ctrl *arg, int aer_idx)
memset(&iod->cmd, 0, sizeof(iod->cmd)); memset(&iod->cmd, 0, sizeof(iod->cmd));
iod->cmd.common.opcode = nvme_admin_async_event; iod->cmd.common.opcode = nvme_admin_async_event;
iod->cmd.common.command_id = NVME_LOOP_AQ_BLKMQ_DEPTH; iod->cmd.common.command_id = NVME_AQ_BLK_MQ_DEPTH;
iod->cmd.common.flags |= NVME_CMD_SGL_METABUF; iod->cmd.common.flags |= NVME_CMD_SGL_METABUF;
if (!nvmet_req_init(&iod->req, &queue->nvme_cq, &queue->nvme_sq, if (!nvmet_req_init(&iod->req, &queue->nvme_cq, &queue->nvme_sq,
@ -357,7 +349,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set)); memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
ctrl->admin_tag_set.ops = &nvme_loop_admin_mq_ops; ctrl->admin_tag_set.ops = &nvme_loop_admin_mq_ops;
ctrl->admin_tag_set.queue_depth = NVME_LOOP_AQ_BLKMQ_DEPTH; ctrl->admin_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
ctrl->admin_tag_set.reserved_tags = 2; /* connect + keep-alive */ ctrl->admin_tag_set.reserved_tags = 2; /* connect + keep-alive */
ctrl->admin_tag_set.numa_node = NUMA_NO_NODE; ctrl->admin_tag_set.numa_node = NUMA_NO_NODE;
ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_loop_iod) + ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_loop_iod) +
@ -365,6 +357,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
ctrl->admin_tag_set.driver_data = ctrl; ctrl->admin_tag_set.driver_data = ctrl;
ctrl->admin_tag_set.nr_hw_queues = 1; ctrl->admin_tag_set.nr_hw_queues = 1;
ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT; ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
ctrl->admin_tag_set.flags = BLK_MQ_F_NO_SCHED;
ctrl->queues[0].ctrl = ctrl; ctrl->queues[0].ctrl = ctrl;
error = nvmet_sq_init(&ctrl->queues[0].nvme_sq); error = nvmet_sq_init(&ctrl->queues[0].nvme_sq);
@ -438,41 +431,9 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
nvme_loop_destroy_admin_queue(ctrl); nvme_loop_destroy_admin_queue(ctrl);
} }
static void nvme_loop_del_ctrl_work(struct work_struct *work) static void nvme_loop_delete_ctrl_host(struct nvme_ctrl *ctrl)
{ {
struct nvme_loop_ctrl *ctrl = container_of(work, nvme_loop_shutdown_ctrl(to_loop_ctrl(ctrl));
struct nvme_loop_ctrl, delete_work);
nvme_stop_ctrl(&ctrl->ctrl);
nvme_remove_namespaces(&ctrl->ctrl);
nvme_loop_shutdown_ctrl(ctrl);
nvme_uninit_ctrl(&ctrl->ctrl);
nvme_put_ctrl(&ctrl->ctrl);
}
static int __nvme_loop_del_ctrl(struct nvme_loop_ctrl *ctrl)
{
if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
return -EBUSY;
if (!queue_work(nvme_wq, &ctrl->delete_work))
return -EBUSY;
return 0;
}
static int nvme_loop_del_ctrl(struct nvme_ctrl *nctrl)
{
struct nvme_loop_ctrl *ctrl = to_loop_ctrl(nctrl);
int ret;
ret = __nvme_loop_del_ctrl(ctrl);
if (ret)
return ret;
flush_work(&ctrl->delete_work);
return 0;
} }
static void nvme_loop_delete_ctrl(struct nvmet_ctrl *nctrl) static void nvme_loop_delete_ctrl(struct nvmet_ctrl *nctrl)
@ -482,7 +443,7 @@ static void nvme_loop_delete_ctrl(struct nvmet_ctrl *nctrl)
mutex_lock(&nvme_loop_ctrl_mutex); mutex_lock(&nvme_loop_ctrl_mutex);
list_for_each_entry(ctrl, &nvme_loop_ctrl_list, list) { list_for_each_entry(ctrl, &nvme_loop_ctrl_list, list) {
if (ctrl->ctrl.cntlid == nctrl->cntlid) if (ctrl->ctrl.cntlid == nctrl->cntlid)
__nvme_loop_del_ctrl(ctrl); nvme_delete_ctrl(&ctrl->ctrl);
} }
mutex_unlock(&nvme_loop_ctrl_mutex); mutex_unlock(&nvme_loop_ctrl_mutex);
} }
@ -538,7 +499,7 @@ static const struct nvme_ctrl_ops nvme_loop_ctrl_ops = {
.reg_write32 = nvmf_reg_write32, .reg_write32 = nvmf_reg_write32,
.free_ctrl = nvme_loop_free_ctrl, .free_ctrl = nvme_loop_free_ctrl,
.submit_async_event = nvme_loop_submit_async_event, .submit_async_event = nvme_loop_submit_async_event,
.delete_ctrl = nvme_loop_del_ctrl, .delete_ctrl = nvme_loop_delete_ctrl_host,
}; };
static int nvme_loop_create_io_queues(struct nvme_loop_ctrl *ctrl) static int nvme_loop_create_io_queues(struct nvme_loop_ctrl *ctrl)
@ -600,7 +561,6 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
ctrl->ctrl.opts = opts; ctrl->ctrl.opts = opts;
INIT_LIST_HEAD(&ctrl->list); INIT_LIST_HEAD(&ctrl->list);
INIT_WORK(&ctrl->delete_work, nvme_loop_del_ctrl_work);
INIT_WORK(&ctrl->ctrl.reset_work, nvme_loop_reset_ctrl_work); INIT_WORK(&ctrl->ctrl.reset_work, nvme_loop_reset_ctrl_work);
ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_loop_ctrl_ops, ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_loop_ctrl_ops,
@ -641,7 +601,7 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
dev_info(ctrl->ctrl.device, dev_info(ctrl->ctrl.device,
"new ctrl: \"%s\"\n", ctrl->ctrl.opts->subsysnqn); "new ctrl: \"%s\"\n", ctrl->ctrl.opts->subsysnqn);
kref_get(&ctrl->ctrl.kref); nvme_get_ctrl(&ctrl->ctrl);
changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE); changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
WARN_ON_ONCE(!changed); WARN_ON_ONCE(!changed);
@ -730,7 +690,7 @@ static void __exit nvme_loop_cleanup_module(void)
mutex_lock(&nvme_loop_ctrl_mutex); mutex_lock(&nvme_loop_ctrl_mutex);
list_for_each_entry_safe(ctrl, next, &nvme_loop_ctrl_list, list) list_for_each_entry_safe(ctrl, next, &nvme_loop_ctrl_list, list)
__nvme_loop_del_ctrl(ctrl); nvme_delete_ctrl(&ctrl->ctrl);
mutex_unlock(&nvme_loop_ctrl_mutex); mutex_unlock(&nvme_loop_ctrl_mutex);
flush_workqueue(nvme_wq); flush_workqueue(nvme_wq);


@ -223,7 +223,10 @@ struct nvmet_req {
struct bio inline_bio; struct bio inline_bio;
struct bio_vec inline_bvec[NVMET_MAX_INLINE_BIOVEC]; struct bio_vec inline_bvec[NVMET_MAX_INLINE_BIOVEC];
int sg_cnt; int sg_cnt;
/* data length as parsed from the command: */
size_t data_len; size_t data_len;
/* data length as parsed from the SGL descriptor: */
size_t transfer_len;
struct nvmet_port *port; struct nvmet_port *port;
@ -266,6 +269,7 @@ u16 nvmet_parse_fabrics_cmd(struct nvmet_req *req);
bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq, bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops); struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops);
void nvmet_req_uninit(struct nvmet_req *req); void nvmet_req_uninit(struct nvmet_req *req);
void nvmet_req_execute(struct nvmet_req *req);
void nvmet_req_complete(struct nvmet_req *req, u16 status); void nvmet_req_complete(struct nvmet_req *req, u16 status);
void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid, void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
@ -314,7 +318,7 @@ u16 nvmet_copy_from_sgl(struct nvmet_req *req, off_t off, void *buf,
u32 nvmet_get_log_page_len(struct nvme_command *cmd); u32 nvmet_get_log_page_len(struct nvme_command *cmd);
#define NVMET_QUEUE_SIZE 1024 #define NVMET_QUEUE_SIZE 1024
#define NVMET_NR_QUEUES 64 #define NVMET_NR_QUEUES 128
#define NVMET_MAX_CMD NVMET_QUEUE_SIZE #define NVMET_MAX_CMD NVMET_QUEUE_SIZE
#define NVMET_KAS 10 #define NVMET_KAS 10
#define NVMET_DISC_KATO 120 #define NVMET_DISC_KATO 120


@ -148,14 +148,14 @@ static inline u32 get_unaligned_le24(const u8 *p)
static inline bool nvmet_rdma_need_data_in(struct nvmet_rdma_rsp *rsp) static inline bool nvmet_rdma_need_data_in(struct nvmet_rdma_rsp *rsp)
{ {
return nvme_is_write(rsp->req.cmd) && return nvme_is_write(rsp->req.cmd) &&
rsp->req.data_len && rsp->req.transfer_len &&
!(rsp->flags & NVMET_RDMA_REQ_INLINE_DATA); !(rsp->flags & NVMET_RDMA_REQ_INLINE_DATA);
} }
static inline bool nvmet_rdma_need_data_out(struct nvmet_rdma_rsp *rsp) static inline bool nvmet_rdma_need_data_out(struct nvmet_rdma_rsp *rsp)
{ {
return !nvme_is_write(rsp->req.cmd) && return !nvme_is_write(rsp->req.cmd) &&
rsp->req.data_len && rsp->req.transfer_len &&
!rsp->req.rsp->status && !rsp->req.rsp->status &&
!(rsp->flags & NVMET_RDMA_REQ_INLINE_DATA); !(rsp->flags & NVMET_RDMA_REQ_INLINE_DATA);
} }
@ -577,7 +577,7 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc)
return; return;
} }
rsp->req.execute(&rsp->req); nvmet_req_execute(&rsp->req);
} }
static void nvmet_rdma_use_inline_sg(struct nvmet_rdma_rsp *rsp, u32 len, static void nvmet_rdma_use_inline_sg(struct nvmet_rdma_rsp *rsp, u32 len,
@ -609,6 +609,7 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
nvmet_rdma_use_inline_sg(rsp, len, off); nvmet_rdma_use_inline_sg(rsp, len, off);
rsp->flags |= NVMET_RDMA_REQ_INLINE_DATA; rsp->flags |= NVMET_RDMA_REQ_INLINE_DATA;
rsp->req.transfer_len += len;
return 0; return 0;
} }
@ -636,6 +637,7 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
nvmet_data_dir(&rsp->req)); nvmet_data_dir(&rsp->req));
if (ret < 0) if (ret < 0)
return NVME_SC_INTERNAL; return NVME_SC_INTERNAL;
rsp->req.transfer_len += len;
rsp->n_rdma += ret; rsp->n_rdma += ret;
if (invalidate) { if (invalidate) {
@ -693,7 +695,7 @@ static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
queue->cm_id->port_num, &rsp->read_cqe, NULL)) queue->cm_id->port_num, &rsp->read_cqe, NULL))
nvmet_req_complete(&rsp->req, NVME_SC_DATA_XFER_ERROR); nvmet_req_complete(&rsp->req, NVME_SC_DATA_XFER_ERROR);
} else { } else {
rsp->req.execute(&rsp->req); nvmet_req_execute(&rsp->req);
} }
return true; return true;
@ -1512,15 +1514,17 @@ static struct nvmet_fabrics_ops nvmet_rdma_ops = {
static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data) static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
{ {
struct nvmet_rdma_queue *queue; struct nvmet_rdma_queue *queue, *tmp;
/* Device is being removed, delete all queues using this device */ /* Device is being removed, delete all queues using this device */
mutex_lock(&nvmet_rdma_queue_mutex); mutex_lock(&nvmet_rdma_queue_mutex);
list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list,
queue_list) {
if (queue->dev->device != ib_device) if (queue->dev->device != ib_device)
continue; continue;
pr_info("Removing queue %d\n", queue->idx); pr_info("Removing queue %d\n", queue->idx);
list_del_init(&queue->queue_list);
__nvmet_rdma_queue_disconnect(queue); __nvmet_rdma_queue_disconnect(queue);
} }
mutex_unlock(&nvmet_rdma_queue_mutex); mutex_unlock(&nvmet_rdma_queue_mutex);


@ -130,7 +130,8 @@ config CHR_DEV_OSST
config BLK_DEV_SR config BLK_DEV_SR
tristate "SCSI CDROM support" tristate "SCSI CDROM support"
depends on SCSI depends on SCSI && BLK_DEV
select CDROM
---help--- ---help---
If you want to use a CD or DVD drive attached to your computer If you want to use a CD or DVD drive attached to your computer
by SCSI, FireWire, USB or ATAPI, say Y and read the SCSI-HOWTO by SCSI, FireWire, USB or ATAPI, say Y and read the SCSI-HOWTO


@ -3246,6 +3246,11 @@ lpfc_update_rport_devloss_tmo(struct lpfc_vport *vport)
continue; continue;
if (ndlp->rport) if (ndlp->rport)
ndlp->rport->dev_loss_tmo = vport->cfg_devloss_tmo; ndlp->rport->dev_loss_tmo = vport->cfg_devloss_tmo;
#if (IS_ENABLED(CONFIG_NVME_FC))
if (ndlp->nrport)
nvme_fc_set_remoteport_devloss(ndlp->nrport->remoteport,
vport->cfg_devloss_tmo);
#endif
} }
spin_unlock_irq(shost->host_lock); spin_unlock_irq(shost->host_lock);
} }


@ -252,9 +252,9 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
struct scsi_request *rq; struct scsi_request *rq;
int ret = DRIVER_ERROR << 24; int ret = DRIVER_ERROR << 24;
req = blk_get_request(sdev->request_queue, req = blk_get_request_flags(sdev->request_queue,
data_direction == DMA_TO_DEVICE ? data_direction == DMA_TO_DEVICE ?
REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM); REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
if (IS_ERR(req)) if (IS_ERR(req))
return ret; return ret;
rq = scsi_req(req); rq = scsi_req(req);
@ -268,7 +268,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
rq->retries = retries; rq->retries = retries;
req->timeout = timeout; req->timeout = timeout;
req->cmd_flags |= flags; req->cmd_flags |= flags;
req->rq_flags |= rq_flags | RQF_QUIET | RQF_PREEMPT; req->rq_flags |= rq_flags | RQF_QUIET;
/* /*
* head injection *required* here otherwise quiesce won't work * head injection *required* here otherwise quiesce won't work
@ -1301,7 +1301,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
/* /*
* If the devices is blocked we defer normal commands. * If the devices is blocked we defer normal commands.
*/ */
if (!(req->rq_flags & RQF_PREEMPT)) if (req && !(req->rq_flags & RQF_PREEMPT))
ret = BLKPREP_DEFER; ret = BLKPREP_DEFER;
break; break;
default: default:
@ -1310,7 +1310,7 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
* special commands. In particular any user initiated * special commands. In particular any user initiated
* command is not allowed. * command is not allowed.
*/ */
if (!(req->rq_flags & RQF_PREEMPT)) if (req && !(req->rq_flags & RQF_PREEMPT))
ret = BLKPREP_KILL; ret = BLKPREP_KILL;
break; break;
} }
@ -1940,6 +1940,33 @@ static void scsi_mq_done(struct scsi_cmnd *cmd)
blk_mq_complete_request(cmd->request); blk_mq_complete_request(cmd->request);
} }
static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct scsi_device *sdev = q->queuedata;
atomic_dec(&sdev->device_busy);
put_device(&sdev->sdev_gendev);
}
static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct scsi_device *sdev = q->queuedata;
if (!get_device(&sdev->sdev_gendev))
goto out;
if (!scsi_dev_queue_ready(q, sdev))
goto out_put_device;
return true;
out_put_device:
put_device(&sdev->sdev_gendev);
out:
return false;
}
static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd) const struct blk_mq_queue_data *bd)
{ {
@ -1953,16 +1980,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
ret = prep_to_mq(scsi_prep_state_check(sdev, req)); ret = prep_to_mq(scsi_prep_state_check(sdev, req));
if (ret != BLK_STS_OK) if (ret != BLK_STS_OK)
goto out; goto out_put_budget;
ret = BLK_STS_RESOURCE; ret = BLK_STS_RESOURCE;
if (!get_device(&sdev->sdev_gendev))
goto out;
if (!scsi_dev_queue_ready(q, sdev))
goto out_put_device;
if (!scsi_target_queue_ready(shost, sdev)) if (!scsi_target_queue_ready(shost, sdev))
goto out_dec_device_busy; goto out_put_budget;
if (!scsi_host_queue_ready(q, shost, sdev)) if (!scsi_host_queue_ready(q, shost, sdev))
goto out_dec_target_busy; goto out_dec_target_busy;
@ -1993,15 +2015,12 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
return BLK_STS_OK; return BLK_STS_OK;
out_dec_host_busy: out_dec_host_busy:
atomic_dec(&shost->host_busy); atomic_dec(&shost->host_busy);
out_dec_target_busy: out_dec_target_busy:
if (scsi_target(sdev)->can_queue > 0) if (scsi_target(sdev)->can_queue > 0)
atomic_dec(&scsi_target(sdev)->target_busy); atomic_dec(&scsi_target(sdev)->target_busy);
out_dec_device_busy: out_put_budget:
atomic_dec(&sdev->device_busy); scsi_mq_put_budget(hctx);
out_put_device:
put_device(&sdev->sdev_gendev);
out:
switch (ret) { switch (ret) {
case BLK_STS_OK: case BLK_STS_OK:
break; break;
@ -2205,6 +2224,8 @@ struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
} }
static const struct blk_mq_ops scsi_mq_ops = { static const struct blk_mq_ops scsi_mq_ops = {
.get_budget = scsi_mq_get_budget,
.put_budget = scsi_mq_put_budget,
.queue_rq = scsi_queue_rq, .queue_rq = scsi_queue_rq,
.complete = scsi_softirq_done, .complete = scsi_softirq_done,
.timeout = scsi_timeout, .timeout = scsi_timeout,
@ -2919,21 +2940,37 @@ static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
int int
scsi_device_quiesce(struct scsi_device *sdev) scsi_device_quiesce(struct scsi_device *sdev)
{ {
struct request_queue *q = sdev->request_queue;
int err; int err;
/*
* It is allowed to call scsi_device_quiesce() multiple times from
* the same context but concurrent scsi_device_quiesce() calls are
* not allowed.
*/
WARN_ON_ONCE(sdev->quiesced_by && sdev->quiesced_by != current);
blk_set_preempt_only(q);
blk_mq_freeze_queue(q);
/*
* Ensure that the effect of blk_set_preempt_only() will be visible
* for percpu_ref_tryget() callers that occur after the queue
* unfreeze even if the queue was already frozen before this function
* was called. See also https://lwn.net/Articles/573497/.
*/
synchronize_rcu();
blk_mq_unfreeze_queue(q);
mutex_lock(&sdev->state_mutex); mutex_lock(&sdev->state_mutex);
err = scsi_device_set_state(sdev, SDEV_QUIESCE); err = scsi_device_set_state(sdev, SDEV_QUIESCE);
if (err == 0)
sdev->quiesced_by = current;
else
blk_clear_preempt_only(q);
mutex_unlock(&sdev->state_mutex); mutex_unlock(&sdev->state_mutex);
if (err) return err;
return err;
scsi_run_queue(sdev->request_queue);
while (atomic_read(&sdev->device_busy)) {
msleep_interruptible(200);
scsi_run_queue(sdev->request_queue);
}
return 0;
} }
EXPORT_SYMBOL(scsi_device_quiesce); EXPORT_SYMBOL(scsi_device_quiesce);
@ -2953,9 +2990,11 @@ void scsi_device_resume(struct scsi_device *sdev)
* device deleted during suspend) * device deleted during suspend)
*/ */
mutex_lock(&sdev->state_mutex); mutex_lock(&sdev->state_mutex);
if (sdev->sdev_state == SDEV_QUIESCE && WARN_ON_ONCE(!sdev->quiesced_by);
scsi_device_set_state(sdev, SDEV_RUNNING) == 0) sdev->quiesced_by = NULL;
scsi_run_queue(sdev->request_queue); blk_clear_preempt_only(sdev->request_queue);
if (sdev->sdev_state == SDEV_QUIESCE)
scsi_device_set_state(sdev, SDEV_RUNNING);
mutex_unlock(&sdev->state_mutex); mutex_unlock(&sdev->state_mutex);
} }
EXPORT_SYMBOL(scsi_device_resume); EXPORT_SYMBOL(scsi_device_resume);


@ -217,7 +217,7 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
if (sfp->parentdp->device->type == TYPE_SCANNER) if (sfp->parentdp->device->type == TYPE_SCANNER)
return 0; return 0;
return blk_verify_command(cmd, filp->f_mode & FMODE_WRITE); return blk_verify_command(cmd, filp->f_mode);
} }
static int static int


@ -54,18 +54,6 @@ struct block_device *I_BDEV(struct inode *inode)
} }
EXPORT_SYMBOL(I_BDEV); EXPORT_SYMBOL(I_BDEV);
void __vfs_msg(struct super_block *sb, const char *prefix, const char *fmt, ...)
{
struct va_format vaf;
va_list args;
va_start(args, fmt);
vaf.fmt = fmt;
vaf.va = &args;
printk_ratelimited("%sVFS (%s): %pV\n", prefix, sb->s_id, &vaf);
va_end(args);
}
static void bdev_write_inode(struct block_device *bdev) static void bdev_write_inode(struct block_device *bdev)
{ {
struct inode *inode = bdev->bd_inode; struct inode *inode = bdev->bd_inode;
@ -249,7 +237,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
if (!READ_ONCE(bio.bi_private)) if (!READ_ONCE(bio.bi_private))
break; break;
if (!(iocb->ki_flags & IOCB_HIPRI) || if (!(iocb->ki_flags & IOCB_HIPRI) ||
!blk_mq_poll(bdev_get_queue(bdev), qc)) !blk_poll(bdev_get_queue(bdev), qc))
io_schedule(); io_schedule();
} }
__set_current_state(TASK_RUNNING); __set_current_state(TASK_RUNNING);
@ -414,7 +402,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
break; break;
if (!(iocb->ki_flags & IOCB_HIPRI) || if (!(iocb->ki_flags & IOCB_HIPRI) ||
!blk_mq_poll(bdev_get_queue(bdev), qc)) !blk_poll(bdev_get_queue(bdev), qc))
io_schedule(); io_schedule();
} }
__set_current_state(TASK_RUNNING); __set_current_state(TASK_RUNNING);
@ -674,7 +662,7 @@ int bdev_read_page(struct block_device *bdev, sector_t sector,
if (!ops->rw_page || bdev_get_integrity(bdev)) if (!ops->rw_page || bdev_get_integrity(bdev))
return result; return result;
result = blk_queue_enter(bdev->bd_queue, false); result = blk_queue_enter(bdev->bd_queue, 0);
if (result) if (result)
return result; return result;
result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, false); result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, false);
@ -710,7 +698,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
if (!ops->rw_page || bdev_get_integrity(bdev)) if (!ops->rw_page || bdev_get_integrity(bdev))
return -EOPNOTSUPP; return -EOPNOTSUPP;
result = blk_queue_enter(bdev->bd_queue, false); result = blk_queue_enter(bdev->bd_queue, 0);
if (result) if (result)
return result; return result;


@ -252,27 +252,6 @@ out:
return ret; return ret;
} }
/*
* Kick the writeback threads then try to free up some ZONE_NORMAL memory.
*/
static void free_more_memory(void)
{
struct zoneref *z;
int nid;
wakeup_flusher_threads(1024, WB_REASON_FREE_MORE_MEM);
yield();
for_each_online_node(nid) {
z = first_zones_zonelist(node_zonelist(nid, GFP_NOFS),
gfp_zone(GFP_NOFS), NULL);
if (z->zone)
try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0,
GFP_NOFS, NULL);
}
}
/* /*
* I/O completion handler for block_read_full_page() - pages * I/O completion handler for block_read_full_page() - pages
* which come unlocked at the end of I/O. * which come unlocked at the end of I/O.
@ -861,16 +840,19 @@ int remove_inode_buffers(struct inode *inode)
* which may not fail from ordinary buffer allocations. * which may not fail from ordinary buffer allocations.
*/ */
struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
int retry) bool retry)
{ {
struct buffer_head *bh, *head; struct buffer_head *bh, *head;
gfp_t gfp = GFP_NOFS;
long offset; long offset;
try_again: if (retry)
gfp |= __GFP_NOFAIL;
head = NULL; head = NULL;
offset = PAGE_SIZE; offset = PAGE_SIZE;
while ((offset -= size) >= 0) { while ((offset -= size) >= 0) {
bh = alloc_buffer_head(GFP_NOFS); bh = alloc_buffer_head(gfp);
if (!bh) if (!bh)
goto no_grow; goto no_grow;
@ -896,23 +878,7 @@ no_grow:
} while (head); } while (head);
} }
/* return NULL;
* Return failure for non-async IO requests. Async IO requests
* are not allowed to fail, so we have to wait until buffer heads
* become available. But we don't want tasks sleeping with
* partially complete buffers, so all were released above.
*/
if (!retry)
return NULL;
/* We're _really_ low on memory. Now we just
* wait for old buffer heads to become free due to
* finishing IO. Since this is an async request and
* the reserve list is empty, we're sure there are
* async buffer heads in use.
*/
free_more_memory();
goto try_again;
} }
EXPORT_SYMBOL_GPL(alloc_page_buffers); EXPORT_SYMBOL_GPL(alloc_page_buffers);
@ -1001,8 +967,6 @@ grow_dev_page(struct block_device *bdev, sector_t block,
gfp_mask |= __GFP_NOFAIL; gfp_mask |= __GFP_NOFAIL;
page = find_or_create_page(inode->i_mapping, index, gfp_mask); page = find_or_create_page(inode->i_mapping, index, gfp_mask);
if (!page)
return ret;
BUG_ON(!PageLocked(page)); BUG_ON(!PageLocked(page));
@ -1021,9 +985,7 @@ grow_dev_page(struct block_device *bdev, sector_t block,
/* /*
* Allocate some buffers for this page * Allocate some buffers for this page
*/ */
bh = alloc_page_buffers(page, size, 0); bh = alloc_page_buffers(page, size, true);
if (!bh)
goto failed;
/* /*
* Link the page to the buffers and initialise them. Take the * Link the page to the buffers and initialise them. Take the
@ -1103,8 +1065,6 @@ __getblk_slow(struct block_device *bdev, sector_t block,
ret = grow_buffers(bdev, block, size, gfp); ret = grow_buffers(bdev, block, size, gfp);
if (ret < 0) if (ret < 0)
return NULL; return NULL;
if (ret == 0)
free_more_memory();
} }
} }
@ -1575,7 +1535,7 @@ void create_empty_buffers(struct page *page,
{ {
struct buffer_head *bh, *head, *tail; struct buffer_head *bh, *head, *tail;
head = alloc_page_buffers(page, blocksize, 1); head = alloc_page_buffers(page, blocksize, true);
bh = head; bh = head;
do { do {
bh->b_state |= b_state; bh->b_state |= b_state;
@ -2639,7 +2599,7 @@ int nobh_write_begin(struct address_space *mapping,
* Be careful: the buffer linked list is a NULL terminated one, rather * Be careful: the buffer linked list is a NULL terminated one, rather
* than the circular one we're used to. * than the circular one we're used to.
*/ */
head = alloc_page_buffers(page, blocksize, 0); head = alloc_page_buffers(page, blocksize, false);
if (!head) { if (!head) {
ret = -ENOMEM; ret = -ENOMEM;
goto out_release; goto out_release;
@ -3056,8 +3016,16 @@ void guard_bio_eod(int op, struct bio *bio)
sector_t maxsector; sector_t maxsector;
struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1]; struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
unsigned truncated_bytes; unsigned truncated_bytes;
struct hd_struct *part;
rcu_read_lock();
part = __disk_get_part(bio->bi_disk, bio->bi_partno);
if (part)
maxsector = part_nr_sects_read(part);
else
maxsector = get_capacity(bio->bi_disk);
rcu_read_unlock();
maxsector = get_capacity(bio->bi_disk);
if (!maxsector) if (!maxsector)
return; return;

Some files were not shown because too many files have changed in this diff