for-4.21/block-20181221

-----BEGIN PGP SIGNATURE-----

iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlwb7R8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjiID/97oDjMhNT7rwpuMbHw855h62j1hEN/m+N3
FI0uxivYoYZLD+eJRnMcBwHlKjrCX8iJQAcv9ffI3ThtFW7dnZT3atUacaZVR/Dt
IrxdymdBP3qsmuaId5NYBug7rJ+AiqFJKjEvCcSPu5X397J4I3SEbzhfvYLJ/aZX
16o0HJlVVIrcbmq1IP4HwiIIOaKXvPaw04L4z4fpeynRSWG7EAi8NLSnhlR4Rxbb
BTiMkCTsjRCFdyO6da4fvNQKWmPGPa3bJkYy3qR99cvJCeIbQjRyCloQlWNJRRgi
3eJpCHVxqFmN0/+DNTJVQEEr4H8o0AVucrLVct1Jc4pessenkpoUniP8vELqwlng
Z2VHLkhTfCEmvFlk82grrYdNvGATRsrbswt/PlP4T7rBfr1IpDk8kXDWF59EL2dy
ly35Sk3wJGHBl8qa+vEPXOAnaWdqJXuVGpwB4ifOIatOls8mOxwfZjiRc7x05/fC
1O4rR2IfLwRqwoYHs0AJ+h6ohOSn1mkGezl2Tch1VSFcJUOHmuYvraTaUi6hblpA
SslaAoEhO39hRBL0HsvsMeqVWM9uzqvFkLDCfNPdiA81H1258CIbo4vF8z6czCIS
eeXnTJxVhPVbZgb3a1a93SPwM6KIDZFoIijyd+NqjpU94thlnhYD0QEcKJIKH7os
2p4aHs6ktw==
=TRdW
-----END PGP SIGNATURE-----

Merge tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "This is the main pull request for block/storage for 4.21.

  Larger than usual, it was a busy round with lots of goodies queued
  up. Most notable is the removal of the old IO stack, which has been a
  long time coming. No new features for a while, everything coming in
  this week has all been fixes for things that were previously merged.

  This contains:

   - Use atomic counters instead of semaphores for mtip32xx (Arnd)

   - Cleanup of the mtip32xx request setup (Christoph)

   - Fix for circular locking dependency in loop (Jan, Tetsuo)

   - bcache (Coly, Guoju, Shenghui)
      * Optimizations for writeback caching
      * Various fixes and improvements

   - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
      * host and target support for NVMe over TCP
      * Error log page support
      * Support for separate read/write/poll queues
      * Much improved polling
      * discard OOM fallback
      * Tracepoint improvements

   - lightnvm (Hans, Hua, Igor, Matias, Javier)
      * Igor added packed metadata to pblk. Now drives without metadata
        per LBA can be used as well.
      * Fix from Geert on uninitialized value on chunk metadata reads.
      * Fixes from Hans and Javier to pblk recovery and write path.
      * Fix from Hua Su to fix a race condition in the pblk recovery
        code.
      * Scan optimization added to pblk recovery from Zhoujie.
      * Small geometry cleanup from me.

   - Conversion of the last few drivers that used the legacy path to
     blk-mq (me)

   - Removal of legacy IO path in SCSI (me, Christoph)

   - Removal of legacy IO stack and schedulers (me)

   - Support for much better polling, now without interrupts at all.
     blk-mq adds support for multiple queue maps, which enables us to
     have a map per type. This in turn enables nvme to have separate
     completion queues for polling, which can then be interrupt-less.
     Also means we're ready for async polled IO, which is hopefully
     coming in the next release.

   - Killing of (now) unused block exports (Christoph)

   - Unification of the blk-rq-qos and blk-wbt wait handling (Josef)

   - Support for zoned testing with null_blk (Masato)

   - sx8 conversion to per-host tag sets (Christoph)

   - IO priority improvements (Damien)

   - mq-deadline zoned fix (Damien)

   - Ref count blkcg series (Dennis)

   - Lots of blk-mq improvements and speedups (me)

   - sbitmap scalability improvements (me)

   - Make core inflight IO accounting per-cpu (Mikulas)

   - Export timeout setting in sysfs (Weiping)

   - Cleanup the direct issue path (Jianchao)

   - Export blk-wbt internals in block debugfs for easier debugging
     (Ming)

   - Lots of other fixes and improvements"

* tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
  kyber: use sbitmap add_wait_queue/list_del wait helpers
  sbitmap: add helpers for add/del wait queue handling
  block: save irq state in blkg_lookup_create()
  dm: don't reuse bio for flushes
  nvme-pci: trace SQ status on completions
  nvme-rdma: implement polling queue map
  nvme-fabrics: allow user to pass in nr_poll_queues
  nvme-fabrics: allow nvmf_connect_io_queue to poll
  nvme-core: optionally poll sync commands
  block: make request_to_qc_t public
  nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
  nvme-tcp: fix endianess annotations
  nvmet-tcp: fix endianess annotations
  nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
  nvme-pci: only set nr_maps to 2 if poll queues are supported
  nvmet: use a macro for default error location
  nvmet: fix comparison of a u16 with -1
  blk-mq: enable IO poll if .nr_queues of type poll > 0
  blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
  blk-mq: skip zero-queue maps in blk_mq_map_swqueue
  ...
commit 0e9da3fbf7
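The polling work called out above (separate poll queue maps, interrupt-less completions) is driven from user space through the existing RWF_HIPRI interface. As a rough, non-authoritative illustration that is not part of this series, the sketch below issues a polled O_DIRECT read with preadv2(2); the device path /dev/nvme0n1 and the presence of configured poll queues are assumptions.

/*
 * Illustrative only: a polled read via preadv2(2) + RWF_HIPRI.
 * Requires glibc 2.26+ for preadv2()/RWF_HIPRI, root access to the
 * device, and a kernel/driver with poll queues enabled to actually
 * take the polled path; otherwise the read completes normally.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	void *buf;
	if (posix_memalign(&buf, 4096, 4096))	/* O_DIRECT wants aligned buffers */
		return 1;

	struct iovec iov = { .iov_base = buf, .iov_len = 4096 };

	/* RWF_HIPRI asks the kernel to poll for completion instead of sleeping */
	ssize_t ret = preadv2(fd, &iov, 1, 0, RWF_HIPRI);
	if (ret < 0)
		perror("preadv2");
	else
		printf("read %zd bytes\n", ret);

	close(fd);
	return 0;
}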
@@ -244,7 +244,7 @@ Description:
 What:		/sys/block/<disk>/queue/zoned
 Date:		September 2016
-Contact:	Damien Le Moal <damien.lemoal@hgst.com>
+Contact:	Damien Le Moal <damien.lemoal@wdc.com>
 Description:
 		zoned indicates if the device is a zoned block device
 		and the zone model of the device if it is indeed zoned.

@@ -259,6 +259,14 @@ Description:
 		zone commands, they will be treated as regular block
 		devices and zoned will report "none".
 
+What:		/sys/block/<disk>/queue/nr_zones
+Date:		November 2018
+Contact:	Damien Le Moal <damien.lemoal@wdc.com>
+Description:
+		nr_zones indicates the total number of zones of a zoned block
+		device ("host-aware" or "host-managed" zone model). For regular
+		block devices, the value is always 0.
+
 What:		/sys/block/<disk>/queue/chunk_sectors
 Date:		September 2016
 Contact:	Hannes Reinecke <hare@suse.com>

@@ -268,6 +276,6 @@ Description:
 		indicates the size in 512B sectors of the RAID volume
 		stripe segment. For a zoned block device, either
 		host-aware or host-managed, chunk_sectors indicates the
-		size of 512B sectors of the zones of the device, with
+		size in 512B sectors of the zones of the device, with
 		the eventual exception of the last zone of the device
 		which may be smaller.
@@ -1879,8 +1879,10 @@ following two functions.

   wbc_init_bio(@wbc, @bio)
	Should be called for each bio carrying writeback data and
-	associates the bio with the inode's owner cgroup. Can be
-	called anytime between bio allocation and submission.
+	associates the bio with the inode's owner cgroup and the
+	corresponding request queue. This must be called after
+	a queue (device) has been associated with the bio and
+	before submission.

   wbc_account_io(@wbc, @page, @bytes)
	Should be called for each data segment being written out.

@@ -1899,7 +1901,7 @@ the configuration, the bio may be executed at a lower priority and if
 the writeback session is holding shared resources, e.g. a journal
 entry, may lead to priority inversion. There is no one easy solution
 for the problem. Filesystems can try to work around specific problem
-cases by skipping wbc_init_bio() or using bio_associate_blkcg()
+cases by skipping wbc_init_bio() and using bio_associate_blkg()
 directly.
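For context on the wbc_init_bio() change above, here is a minimal, hedged sketch of how a filesystem writeback path is expected to order the calls after this series: the bio must have a device (and therefore a request_queue) before wbc_init_bio() runs. The my_fs_submit_writeback_page() helper and its parameters are hypothetical and not taken from this commit.

/*
 * Illustrative kernel-style sketch only, not code from this series.
 * Order matters after the change: bio_set_dev() first, then
 * wbc_init_bio(), then submit_bio().
 */
static void my_fs_submit_writeback_page(struct writeback_control *wbc,
					struct page *page,
					struct block_device *bdev,
					sector_t sector)
{
	struct bio *bio = bio_alloc(GFP_NOFS, 1);

	bio_set_dev(bio, bdev);			/* associate a queue first...   */
	bio->bi_iter.bi_sector = sector;
	bio->bi_opf = REQ_OP_WRITE;
	bio_add_page(bio, page, PAGE_SIZE, 0);

	wbc_init_bio(wbc, bio);			/* ...then attach blkcg + queue */
	wbc_account_io(wbc, page, PAGE_SIZE);	/* charge the owning cgroup     */

	submit_bio(bio);
}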
@@ -65,7 +65,6 @@ Description of Contents:
 3.2.3 I/O completion
 3.2.4 Implications for drivers that do not interpret bios (don't handle
  multiple segments)
-3.2.5 Request command tagging
 3.3 I/O submission
 4. The I/O scheduler
 5. Scalability related changes
@@ -708,93 +707,6 @@ is crossed on completion of a transfer. (The end*request* functions should
 be used if only if the request has come down from block/bio path, not for
 direct access requests which only specify rq->buffer without a valid rq->bio)
 
-3.2.5 Generic request command tagging
-
-3.2.5.1 Tag helpers
-
-Block now offers some simple generic functionality to help support command
-queueing (typically known as tagged command queueing), ie manage more than
-one outstanding command on a queue at any given time.
-
-	blk_queue_init_tags(struct request_queue *q, int depth)
-
-	Initialize internal command tagging structures for a maximum
-	depth of 'depth'.
-
-	blk_queue_free_tags((struct request_queue *q)
-
-	Teardown tag info associated with the queue. This will be done
-	automatically by block if blk_queue_cleanup() is called on a queue
-	that is using tagging.
-
-The above are initialization and exit management, the main helpers during
-normal operations are:
-
-	blk_queue_start_tag(struct request_queue *q, struct request *rq)
-
-	Start tagged operation for this request. A free tag number between
-	0 and 'depth' is assigned to the request (rq->tag holds this number),
-	and 'rq' is added to the internal tag management. If the maximum depth
-	for this queue is already achieved (or if the tag wasn't started for
-	some other reason), 1 is returned. Otherwise 0 is returned.
-
-	blk_queue_end_tag(struct request_queue *q, struct request *rq)
-
-	End tagged operation on this request. 'rq' is removed from the internal
-	book keeping structures.
-
-To minimize struct request and queue overhead, the tag helpers utilize some
-of the same request members that are used for normal request queue management.
-This means that a request cannot both be an active tag and be on the queue
-list at the same time. blk_queue_start_tag() will remove the request, but
-the driver must remember to call blk_queue_end_tag() before signalling
-completion of the request to the block layer. This means ending tag
-operations before calling end_that_request_last()! For an example of a user
-of these helpers, see the IDE tagged command queueing support.
-
-3.2.5.2 Tag info
-
-Some block functions exist to query current tag status or to go from a
-tag number to the associated request. These are, in no particular order:
-
-	blk_queue_tagged(q)
-
-	Returns 1 if the queue 'q' is using tagging, 0 if not.
-
-	blk_queue_tag_request(q, tag)
-
-	Returns a pointer to the request associated with tag 'tag'.
-
-	blk_queue_tag_depth(q)
-
-	Return current queue depth.
-
-	blk_queue_tag_queue(q)
-
-	Returns 1 if the queue can accept a new queued command, 0 if we are
-	at the maximum depth already.
-
-	blk_queue_rq_tagged(rq)
-
-	Returns 1 if the request 'rq' is tagged.
-
-3.2.5.2 Internal structure
-
-Internally, block manages tags in the blk_queue_tag structure:
-
-	struct blk_queue_tag {
-		struct request **tag_index;	/* array or pointers to rq */
-		unsigned long *tag_map;		/* bitmap of free tags */
-		struct list_head busy_list;	/* fifo list of busy tags */
-		int busy;			/* queue depth */
-		int max_depth;			/* max queue depth */
-	};
-
-Most of the above is simple and straight forward, however busy_list may need
-a bit of explaining. Normally we don't care too much about request ordering,
-but in the event of any barrier requests in the tag queue we need to ensure
-that requests are restarted in the order they were queue.
-
 3.3 I/O Submission
 
 The routine submit_bio() is used to submit a single io. Higher level i/o
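Since the text above documents the legacy tag helpers that this series deletes, a short sketch of the old calling convention may help when reading older drivers. The legacy_driver_* functions are hypothetical; under blk-mq the core allocates tags itself and none of these calls exist anymore.

/*
 * Hedged sketch of the legacy (removed) tagging flow described above.
 * blk_queue_start_tag() returned 1 when no tag was available and 0 on
 * success; blk_queue_end_tag() had to run before final completion.
 */
static void legacy_driver_queue_rq(struct request_queue *q, struct request *rq)
{
	if (blk_queue_start_tag(q, rq))	/* 1 == no free tag, retry later */
		return;

	/* ... issue rq->tag to the hardware ... */
}

static void legacy_driver_complete_rq(struct request_queue *q, struct request *rq)
{
	blk_queue_end_tag(q, rq);	/* must precede end_that_request_last() */
	/* ... signal completion to the block layer ... */
}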
@ -1,291 +0,0 @@
|
|||
CFQ (Complete Fairness Queueing)
|
||||
===============================
|
||||
|
||||
The main aim of CFQ scheduler is to provide a fair allocation of the disk
|
||||
I/O bandwidth for all the processes which requests an I/O operation.
|
||||
|
||||
CFQ maintains the per process queue for the processes which request I/O
|
||||
operation(synchronous requests). In case of asynchronous requests, all the
|
||||
requests from all the processes are batched together according to their
|
||||
process's I/O priority.
|
||||
|
||||
CFQ ioscheduler tunables
|
||||
========================
|
||||
|
||||
slice_idle
|
||||
----------
|
||||
This specifies how long CFQ should idle for next request on certain cfq queues
|
||||
(for sequential workloads) and service trees (for random workloads) before
|
||||
queue is expired and CFQ selects next queue to dispatch from.
|
||||
|
||||
By default slice_idle is a non-zero value. That means by default we idle on
|
||||
queues/service trees. This can be very helpful on highly seeky media like
|
||||
single spindle SATA/SAS disks where we can cut down on overall number of
|
||||
seeks and see improved throughput.
|
||||
|
||||
Setting slice_idle to 0 will remove all the idling on queues/service tree
|
||||
level and one should see an overall improved throughput on faster storage
|
||||
devices like multiple SATA/SAS disks in hardware RAID configuration. The down
|
||||
side is that isolation provided from WRITES also goes down and notion of
|
||||
IO priority becomes weaker.
|
||||
|
||||
So depending on storage and workload, it might be useful to set slice_idle=0.
|
||||
In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
|
||||
keeping slice_idle enabled should be useful. For any configurations where
|
||||
there are multiple spindles behind single LUN (Host based hardware RAID
|
||||
controller or for storage arrays), setting slice_idle=0 might end up in better
|
||||
throughput and acceptable latencies.
|
||||
|
||||
back_seek_max
|
||||
-------------
|
||||
This specifies, given in Kbytes, the maximum "distance" for backward seeking.
|
||||
The distance is the amount of space from the current head location to the
|
||||
sectors that are backward in terms of distance.
|
||||
|
||||
This parameter allows the scheduler to anticipate requests in the "backward"
|
||||
direction and consider them as being the "next" if they are within this
|
||||
distance from the current head location.
|
||||
|
||||
back_seek_penalty
|
||||
-----------------
|
||||
This parameter is used to compute the cost of backward seeking. If the
|
||||
backward distance of request is just 1/back_seek_penalty from a "front"
|
||||
request, then the seeking cost of two requests is considered equivalent.
|
||||
|
||||
So scheduler will not bias toward one or the other request (otherwise scheduler
|
||||
will bias toward front request). Default value of back_seek_penalty is 2.
|
||||
|
||||
fifo_expire_async
|
||||
-----------------
|
||||
This parameter is used to set the timeout of asynchronous requests. Default
|
||||
value of this is 248ms.
|
||||
|
||||
fifo_expire_sync
|
||||
----------------
|
||||
This parameter is used to set the timeout of synchronous requests. Default
|
||||
value of this is 124ms. In case to favor synchronous requests over asynchronous
|
||||
one, this value should be decreased relative to fifo_expire_async.
|
||||
|
||||
group_idle
|
||||
-----------
|
||||
This parameter forces idling at the CFQ group level instead of CFQ
|
||||
queue level. This was introduced after a bottleneck was observed
|
||||
in higher end storage due to idle on sequential queue and allow dispatch
|
||||
from a single queue. The idea with this parameter is that it can be run with
|
||||
slice_idle=0 and group_idle=8, so that idling does not happen on individual
|
||||
queues in the group but happens overall on the group and thus still keeps the
|
||||
IO controller working.
|
||||
Not idling on individual queues in the group will dispatch requests from
|
||||
multiple queues in the group at the same time and achieve higher throughput
|
||||
on higher end storage.
|
||||
|
||||
Default value for this parameter is 8ms.
|
||||
|
||||
low_latency
|
||||
-----------
|
||||
This parameter is used to enable/disable the low latency mode of the CFQ
|
||||
scheduler. If enabled, CFQ tries to recompute the slice time for each process
|
||||
based on the target_latency set for the system. This favors fairness over
|
||||
throughput. Disabling low latency (setting it to 0) ignores target latency,
|
||||
allowing each process in the system to get a full time slice.
|
||||
|
||||
By default low latency mode is enabled.
|
||||
|
||||
target_latency
|
||||
--------------
|
||||
This parameter is used to calculate the time slice for a process if cfq's
|
||||
latency mode is enabled. It will ensure that sync requests have an estimated
|
||||
latency. But if sequential workload is higher(e.g. sequential read),
|
||||
then to meet the latency constraints, throughput may decrease because of less
|
||||
time for each process to issue I/O request before the cfq queue is switched.
|
||||
|
||||
Though this can be overcome by disabling the latency_mode, it may increase
|
||||
the read latency for some applications. This parameter allows for changing
|
||||
target_latency through the sysfs interface which can provide the balanced
|
||||
throughput and read latency.
|
||||
|
||||
Default value for target_latency is 300ms.
|
||||
|
||||
slice_async
|
||||
-----------
|
||||
This parameter is same as of slice_sync but for asynchronous queue. The
|
||||
default value is 40ms.
|
||||
|
||||
slice_async_rq
|
||||
--------------
|
||||
This parameter is used to limit the dispatching of asynchronous request to
|
||||
device request queue in queue's slice time. The maximum number of request that
|
||||
are allowed to be dispatched also depends upon the io priority. Default value
|
||||
for this is 2.
|
||||
|
||||
slice_sync
|
||||
----------
|
||||
When a queue is selected for execution, the queues IO requests are only
|
||||
executed for a certain amount of time(time_slice) before switching to another
|
||||
queue. This parameter is used to calculate the time slice of synchronous
|
||||
queue.
|
||||
|
||||
time_slice is computed using the below equation:-
|
||||
time_slice = slice_sync + (slice_sync/5 * (4 - prio)). To increase the
|
||||
time_slice of synchronous queue, increase the value of slice_sync. Default
|
||||
value is 100ms.
|
||||
|
||||
quantum
|
||||
-------
|
||||
This specifies the number of request dispatched to the device queue. In a
|
||||
queue's time slice, a request will not be dispatched if the number of request
|
||||
in the device exceeds this parameter. This parameter is used for synchronous
|
||||
request.
|
||||
|
||||
In case of storage with several disk, this setting can limit the parallel
|
||||
processing of request. Therefore, increasing the value can improve the
|
||||
performance although this can cause the latency of some I/O to increase due
|
||||
to more number of requests.
|
||||
|
||||
CFQ Group scheduling
|
||||
====================
|
||||
|
||||
CFQ supports blkio cgroup and has "blkio." prefixed files in each
|
||||
blkio cgroup directory. It is weight-based and there are four knobs
|
||||
for configuration - weight[_device] and leaf_weight[_device].
|
||||
Internal cgroup nodes (the ones with children) can also have tasks in
|
||||
them, so the former two configure how much proportion the cgroup as a
|
||||
whole is entitled to at its parent's level while the latter two
|
||||
configure how much proportion the tasks in the cgroup have compared to
|
||||
its direct children.
|
||||
|
||||
Another way to think about it is assuming that each internal node has
|
||||
an implicit leaf child node which hosts all the tasks whose weight is
|
||||
configured by leaf_weight[_device]. Let's assume a blkio hierarchy
|
||||
composed of five cgroups - root, A, B, AA and AB - with the following
|
||||
weights where the names represent the hierarchy.
|
||||
|
||||
weight leaf_weight
|
||||
root : 125 125
|
||||
A : 500 750
|
||||
B : 250 500
|
||||
AA : 500 500
|
||||
AB : 1000 500
|
||||
|
||||
root never has a parent making its weight is meaningless. For backward
|
||||
compatibility, weight is always kept in sync with leaf_weight. B, AA
|
||||
and AB have no child and thus its tasks have no children cgroup to
|
||||
compete with. They always get 100% of what the cgroup won at the
|
||||
parent level. Considering only the weights which matter, the hierarchy
|
||||
looks like the following.
|
||||
|
||||
root
|
||||
/ | \
|
||||
A B leaf
|
||||
500 250 125
|
||||
/ | \
|
||||
AA AB leaf
|
||||
500 1000 750
|
||||
|
||||
If all cgroups have active IOs and competing with each other, disk
|
||||
time will be distributed like the following.
|
||||
|
||||
Distribution below root. The total active weight at this level is
|
||||
A:500 + B:250 + C:125 = 875.
|
||||
|
||||
root-leaf : 125 / 875 =~ 14%
|
||||
A : 500 / 875 =~ 57%
|
||||
B(-leaf) : 250 / 875 =~ 28%
|
||||
|
||||
A has children and further distributes its 57% among the children and
|
||||
the implicit leaf node. The total active weight at this level is
|
||||
AA:500 + AB:1000 + A-leaf:750 = 2250.
|
||||
|
||||
A-leaf : ( 750 / 2250) * A =~ 19%
|
||||
AA(-leaf) : ( 500 / 2250) * A =~ 12%
|
||||
AB(-leaf) : (1000 / 2250) * A =~ 25%
|
||||
|
||||
CFQ IOPS Mode for group scheduling
|
||||
===================================
|
||||
Basic CFQ design is to provide priority based time slices. Higher priority
|
||||
process gets bigger time slice and lower priority process gets smaller time
|
||||
slice. Measuring time becomes harder if storage is fast and supports NCQ and
|
||||
it would be better to dispatch multiple requests from multiple cfq queues in
|
||||
request queue at a time. In such scenario, it is not possible to measure time
|
||||
consumed by single queue accurately.
|
||||
|
||||
What is possible though is to measure number of requests dispatched from a
|
||||
single queue and also allow dispatch from multiple cfq queue at the same time.
|
||||
This effectively becomes the fairness in terms of IOPS (IO operations per
|
||||
second).
|
||||
|
||||
If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
|
||||
to IOPS mode and starts providing fairness in terms of number of requests
|
||||
dispatched. Note that this mode switching takes effect only for group
|
||||
scheduling. For non-cgroup users nothing should change.
|
||||
|
||||
CFQ IO scheduler Idling Theory
|
||||
===============================
|
||||
Idling on a queue is primarily about waiting for the next request to come
|
||||
on same queue after completion of a request. In this process CFQ will not
|
||||
dispatch requests from other cfq queues even if requests are pending there.
|
||||
|
||||
The rationale behind idling is that it can cut down on number of seeks
|
||||
on rotational media. For example, if a process is doing dependent
|
||||
sequential reads (next read will come on only after completion of previous
|
||||
one), then not dispatching request from other queue should help as we
|
||||
did not move the disk head and kept on dispatching sequential IO from
|
||||
one queue.
|
||||
|
||||
CFQ has following service trees and various queues are put on these trees.
|
||||
|
||||
sync-idle sync-noidle async
|
||||
|
||||
All cfq queues doing synchronous sequential IO go on to sync-idle tree.
|
||||
On this tree we idle on each queue individually.
|
||||
|
||||
All synchronous non-sequential queues go on sync-noidle tree. Also any
|
||||
synchronous write request which is not marked with REQ_IDLE goes on this
|
||||
service tree. On this tree we do not idle on individual queues instead idle
|
||||
on the whole group of queues or the tree. So if there are 4 queues waiting
|
||||
for IO to dispatch we will idle only once last queue has dispatched the IO
|
||||
and there is no more IO on this service tree.
|
||||
|
||||
All async writes go on async service tree. There is no idling on async
|
||||
queues.
|
||||
|
||||
CFQ has some optimizations for SSDs and if it detects a non-rotational
|
||||
media which can support higher queue depth (multiple requests at in
|
||||
flight at a time), then it cuts down on idling of individual queues and
|
||||
all the queues move to sync-noidle tree and only tree idle remains. This
|
||||
tree idling provides isolation with buffered write queues on async tree.
|
||||
|
||||
FAQ
|
||||
===
|
||||
Q1. Why to idle at all on queues not marked with REQ_IDLE.
|
||||
|
||||
A1. We only do tree idle (all queues on sync-noidle tree) on queues not marked
|
||||
with REQ_IDLE. This helps in providing isolation with all the sync-idle
|
||||
queues. Otherwise in presence of many sequential readers, other
|
||||
synchronous IO might not get fair share of disk.
|
||||
|
||||
For example, if there are 10 sequential readers doing IO and they get
|
||||
100ms each. If a !REQ_IDLE request comes in, it will be scheduled
|
||||
roughly after 1 second. If after completion of !REQ_IDLE request we
|
||||
do not idle, and after a couple of milli seconds a another !REQ_IDLE
|
||||
request comes in, again it will be scheduled after 1second. Repeat it
|
||||
and notice how a workload can lose its disk share and suffer due to
|
||||
multiple sequential readers.
|
||||
|
||||
fsync can generate dependent IO where bunch of data is written in the
|
||||
context of fsync, and later some journaling data is written. Journaling
|
||||
data comes in only after fsync has finished its IO (atleast for ext4
|
||||
that seemed to be the case). Now if one decides not to idle on fsync
|
||||
thread due to !REQ_IDLE, then next journaling write will not get
|
||||
scheduled for another second. A process doing small fsync, will suffer
|
||||
badly in presence of multiple sequential readers.
|
||||
|
||||
Hence doing tree idling on threads using !REQ_IDLE flag on requests
|
||||
provides isolation from multiple sequential readers and at the same
|
||||
time we do not idle on individual threads.
|
||||
|
||||
Q2. When to specify REQ_IDLE
|
||||
A2. I would think whenever one is doing synchronous write and expecting
|
||||
more writes to be dispatched from same context soon, should be able
|
||||
to specify REQ_IDLE on writes and that probably should work well for
|
||||
most of the cases.
|
|
@@ -64,7 +64,7 @@ guess, the kernel will put the process issuing IO to sleep for an amount
 of time, before entering a classic poll loop. This mode might be a
 little slower than pure classic polling, but it will be more efficient.
 If set to a value larger than 0, the kernel will put the process issuing
-IO to sleep for this amont of microseconds before entering classic
+IO to sleep for this amount of microseconds before entering classic
 polling.
 
 iostats (RW)

@@ -194,4 +194,31 @@ blk-throttle makes decision based on the samplings. Lower time means cgroups
 have more smooth throughput, but higher CPU overhead. This exists only when
 CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
 
+zoned (RO)
+----------
+This indicates if the device is a zoned block device and the zone model of the
+device if it is indeed zoned. The possible values indicated by zoned are
+"none" for regular block devices and "host-aware" or "host-managed" for zoned
+block devices. The characteristics of host-aware and host-managed zoned block
+devices are described in the ZBC (Zoned Block Commands) and ZAC
+(Zoned Device ATA Command Set) standards. These standards also define the
+"drive-managed" zone model. However, since drive-managed zoned block devices
+do not support zone commands, they will be treated as regular block devices
+and zoned will report "none".
+
+nr_zones (RO)
+-------------
+For zoned block devices (zoned attribute indicating "host-managed" or
+"host-aware"), this indicates the total number of zones of the device.
+This is always 0 for regular block devices.
+
+chunk_sectors (RO)
+------------------
+This has different meaning depending on the type of the block device.
+For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
+of the RAID volume stripe segment. For a zoned block device, either host-aware
+or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
+of the device, with the eventual exception of the last zone of the device which
+may be smaller.
+
 Jens Axboe <jens.axboe@oracle.com>, February 2009
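As a small, hedged illustration of the attributes documented above (not part of this commit), the following user-space program prints zoned, nr_zones and chunk_sectors for one disk. The disk name "sda" is a placeholder, and nr_zones only appears on kernels that include this series.

#include <stdio.h>

static void print_queue_attr(const char *disk, const char *attr)
{
	char path[256], value[64];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	f = fopen(path, "r");
	if (!f || !fgets(value, sizeof(value), f))
		printf("%s: <unavailable>\n", attr);
	else
		printf("%s: %s", attr, value);	/* value keeps its newline */
	if (f)
		fclose(f);
}

int main(void)
{
	print_queue_attr("sda", "zoned");		/* none / host-aware / host-managed */
	print_queue_attr("sda", "nr_zones");		/* 0 for regular block devices */
	print_queue_attr("sda", "chunk_sectors");	/* zone size in 512B sectors, if zoned */
	return 0;
}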
@@ -97,11 +97,6 @@ parameters may be changed at runtime by the command
			allowing boot to proceed. none ignores them, expecting
			user space to do the scan.
 
-	scsi_mod.use_blk_mq=
-			[SCSI] use blk-mq I/O path by default
-			See SCSI_MQ_DEFAULT in drivers/scsi/Kconfig.
-			Format: <y/n>
-
	sim710=		[SCSI,HW]
			See header of drivers/scsi/sim710.c.
@@ -155,12 +155,6 @@ config BLK_CGROUP_IOLATENCY
 
	  Note, this is an experimental interface and could be changed someday.
 
-config BLK_WBT_SQ
-	bool "Single queue writeback throttling"
-	depends on BLK_WBT
-	---help---
-	Enable writeback throttling by default on legacy single queue devices
-
 config BLK_WBT_MQ
	bool "Multiqueue writeback throttling"
	default y
@@ -3,67 +3,6 @@ if BLOCK
 
 menu "IO Schedulers"
 
-config IOSCHED_NOOP
-	bool
-	default y
-	---help---
-	  The no-op I/O scheduler is a minimal scheduler that does basic merging
-	  and sorting. Its main uses include non-disk based block devices like
-	  memory devices, and specialised software or hardware environments
-	  that do their own scheduling and require only minimal assistance from
-	  the kernel.
-
-config IOSCHED_DEADLINE
-	tristate "Deadline I/O scheduler"
-	default y
-	---help---
-	  The deadline I/O scheduler is simple and compact. It will provide
-	  CSCAN service with FIFO expiration of requests, switching to
-	  a new point in the service tree and doing a batch of IO from there
-	  in case of expiry.
-
-config IOSCHED_CFQ
-	tristate "CFQ I/O scheduler"
-	default y
-	---help---
-	  The CFQ I/O scheduler tries to distribute bandwidth equally
-	  among all processes in the system. It should provide a fair
-	  and low latency working environment, suitable for both desktop
-	  and server systems.
-
-	  This is the default I/O scheduler.
-
-config CFQ_GROUP_IOSCHED
-	bool "CFQ Group Scheduling support"
-	depends on IOSCHED_CFQ && BLK_CGROUP
-	---help---
-	  Enable group IO scheduling in CFQ.
-
-choice
-
-	prompt "Default I/O scheduler"
-	default DEFAULT_CFQ
-	help
-	  Select the I/O scheduler which will be used by default for all
-	  block devices.
-
-	config DEFAULT_DEADLINE
-		bool "Deadline" if IOSCHED_DEADLINE=y
-
-	config DEFAULT_CFQ
-		bool "CFQ" if IOSCHED_CFQ=y
-
-	config DEFAULT_NOOP
-		bool "No-op"
-
-endchoice
-
-config DEFAULT_IOSCHED
-	string
-	default "deadline" if DEFAULT_DEADLINE
-	default "cfq" if DEFAULT_CFQ
-	default "noop" if DEFAULT_NOOP
-
 config MQ_IOSCHED_DEADLINE
	tristate "MQ deadline I/O scheduler"
	default y
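With the legacy elevators gone, scheduler selection still goes through /sys/block/<disk>/queue/scheduler, but only the multiqueue schedulers (mq-deadline, kyber, bfq, none) remain. A hedged sketch, assuming a disk named sda and root privileges:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/block/sda/queue/scheduler", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Writing a scheduler name switches the elevator for that queue */
	fprintf(f, "mq-deadline\n");
	fclose(f);
	return 0;
}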
@@ -3,7 +3,7 @@
 # Makefile for the kernel block layer
 #
 
-obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
+obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
			blk-flush.o blk-settings.o blk-ioc.o blk-map.o \
			blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \
			blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \

@@ -18,9 +18,6 @@ obj-$(CONFIG_BLK_DEV_BSGLIB) += bsg-lib.o
 obj-$(CONFIG_BLK_CGROUP) += blk-cgroup.o
 obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-throttle.o
 obj-$(CONFIG_BLK_CGROUP_IOLATENCY) += blk-iolatency.o
-obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
-obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
-obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
 obj-$(CONFIG_MQ_IOSCHED_DEADLINE) += mq-deadline.o
 obj-$(CONFIG_MQ_IOSCHED_KYBER) += kyber-iosched.o
 bfq-y := bfq-iosched.o bfq-wf2q.o bfq-cgroup.o
@@ -334,7 +334,7 @@ static void bfqg_stats_xfer_dead(struct bfq_group *bfqg)
 
	parent = bfqg_parent(bfqg);
 
-	lockdep_assert_held(bfqg_to_blkg(bfqg)->q->queue_lock);
+	lockdep_assert_held(&bfqg_to_blkg(bfqg)->q->queue_lock);
 
	if (unlikely(!parent))
		return;

@@ -642,7 +642,7 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
	uint64_t serial_nr;
 
	rcu_read_lock();
-	serial_nr = bio_blkcg(bio)->css.serial_nr;
+	serial_nr = __bio_blkcg(bio)->css.serial_nr;
 
	/*
	 * Check whether blkcg has changed. The condition may trigger

@@ -651,7 +651,7 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
	if (unlikely(!bfqd) || likely(bic->blkcg_serial_nr == serial_nr))
		goto out;
 
-	bfqg = __bfq_bic_change_cgroup(bfqd, bic, bio_blkcg(bio));
+	bfqg = __bfq_bic_change_cgroup(bfqd, bic, __bio_blkcg(bio));
	/*
	 * Update blkg_path for bfq_log_* functions. We cache this
	 * path, and update it here, for the following
@@ -399,9 +399,9 @@ static struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
	unsigned long flags;
	struct bfq_io_cq *icq;
 
-	spin_lock_irqsave(q->queue_lock, flags);
+	spin_lock_irqsave(&q->queue_lock, flags);
	icq = icq_to_bic(ioc_lookup_icq(ioc, q));
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	spin_unlock_irqrestore(&q->queue_lock, flags);
 
	return icq;
 }

@@ -4066,7 +4066,7 @@ static void bfq_update_dispatch_stats(struct request_queue *q,
	 * In addition, the following queue lock guarantees that
	 * bfqq_group(bfqq) exists as well.
	 */
-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->queue_lock);
	if (idle_timer_disabled)
		/*
		 * Since the idle timer has been disabled,

@@ -4085,7 +4085,7 @@ static void bfq_update_dispatch_stats(struct request_queue *q,
		bfqg_stats_set_start_empty_time(bfqg);
		bfqg_stats_update_io_remove(bfqg, rq->cmd_flags);
	}
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->queue_lock);
 }
 #else
 static inline void bfq_update_dispatch_stats(struct request_queue *q,

@@ -4416,7 +4416,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 
	rcu_read_lock();
 
-	bfqg = bfq_find_set_group(bfqd, bio_blkcg(bio));
+	bfqg = bfq_find_set_group(bfqd, __bio_blkcg(bio));
	if (!bfqg) {
		bfqq = &bfqd->oom_bfqq;
		goto out;

@@ -4669,11 +4669,11 @@ static void bfq_update_insert_stats(struct request_queue *q,
	 * In addition, the following queue lock guarantees that
	 * bfqq_group(bfqq) exists as well.
	 */
-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->queue_lock);
	bfqg_stats_update_io_add(bfqq_group(bfqq), bfqq, cmd_flags);
	if (idle_timer_disabled)
		bfqg_stats_update_idle_time(bfqq_group(bfqq));
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->queue_lock);
 }
 #else
 static inline void bfq_update_insert_stats(struct request_queue *q,

@@ -5414,9 +5414,9 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
	}
	eq->elevator_data = bfqd;
 
-	spin_lock_irq(q->queue_lock);
+	spin_lock_irq(&q->queue_lock);
	q->elevator = eq;
-	spin_unlock_irq(q->queue_lock);
+	spin_unlock_irq(&q->queue_lock);
 
	/*
	 * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.

@@ -5756,7 +5756,7 @@ static struct elv_fs_entry bfq_attrs[] = {
 };
 
 static struct elevator_type iosched_bfq_mq = {
-	.ops.mq = {
+	.ops = {
		.limit_depth = bfq_limit_depth,
		.prepare_request = bfq_prepare_request,
		.requeue_request = bfq_finish_requeue_request,

@@ -5777,7 +5777,6 @@ static struct elevator_type iosched_bfq_mq = {
		.exit_sched = bfq_exit_queue,
	},
 
-	.uses_mq = true,
	.icq_size = sizeof(struct bfq_io_cq),
	.icq_align = __alignof__(struct bfq_io_cq),
	.elevator_attrs = bfq_attrs,
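Most of the hunks above are mechanical: this series turns request_queue::queue_lock from a pointer (which legacy drivers could point at their own lock) into a spinlock embedded in the queue itself, so callers now take its address. A tiny illustrative sketch of the before/after shape, with hypothetical struct names:

/*
 * Illustration only; the *_before/*_after struct names are made up.
 * Old code did spin_lock_irq(q->queue_lock) through the pointer,
 * new code does spin_lock_irq(&q->queue_lock) on the embedded lock.
 */
struct request_queue_before { spinlock_t *queue_lock; };	/* old: indirection */
struct request_queue_after  { spinlock_t  queue_lock; };	/* new: embedded    */

static void lock_example(struct request_queue_after *q)
{
	spin_lock_irq(&q->queue_lock);	/* callers now take the address */
	/* ... */
	spin_unlock_irq(&q->queue_lock);
}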
@@ -390,7 +390,6 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
	bip->bip_iter.bi_sector += bytes_done >> 9;
	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
 }
-EXPORT_SYMBOL(bio_integrity_advance);
 
 /**
  * bio_integrity_trim - Trim integrity vector

@@ -460,7 +459,6 @@ void bioset_integrity_free(struct bio_set *bs)
	mempool_exit(&bs->bio_integrity_pool);
	mempool_exit(&bs->bvec_integrity_pool);
 }
-EXPORT_SYMBOL(bioset_integrity_free);
 
 void __init bio_integrity_init(void)
 {
block/bio.c
@ -244,7 +244,7 @@ fallback:
|
|||
|
||||
void bio_uninit(struct bio *bio)
|
||||
{
|
||||
bio_disassociate_task(bio);
|
||||
bio_disassociate_blkg(bio);
|
||||
}
|
||||
EXPORT_SYMBOL(bio_uninit);
|
||||
|
||||
|
@ -571,14 +571,13 @@ void bio_put(struct bio *bio)
|
|||
}
|
||||
EXPORT_SYMBOL(bio_put);
|
||||
|
||||
inline int bio_phys_segments(struct request_queue *q, struct bio *bio)
|
||||
int bio_phys_segments(struct request_queue *q, struct bio *bio)
|
||||
{
|
||||
if (unlikely(!bio_flagged(bio, BIO_SEG_VALID)))
|
||||
blk_recount_segments(q, bio);
|
||||
|
||||
return bio->bi_phys_segments;
|
||||
}
|
||||
EXPORT_SYMBOL(bio_phys_segments);
|
||||
|
||||
/**
|
||||
* __bio_clone_fast - clone a bio that shares the original bio's biovec
|
||||
|
@ -610,7 +609,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
|
|||
bio->bi_iter = bio_src->bi_iter;
|
||||
bio->bi_io_vec = bio_src->bi_io_vec;
|
||||
|
||||
bio_clone_blkcg_association(bio, bio_src);
|
||||
bio_clone_blkg_association(bio, bio_src);
|
||||
blkcg_bio_issue_init(bio);
|
||||
}
|
||||
EXPORT_SYMBOL(__bio_clone_fast);
|
||||
|
||||
|
@ -901,7 +901,6 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
|
|||
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
|
||||
|
||||
static void submit_bio_wait_endio(struct bio *bio)
|
||||
{
|
||||
|
@ -1592,7 +1591,6 @@ void bio_set_pages_dirty(struct bio *bio)
|
|||
set_page_dirty_lock(bvec->bv_page);
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_set_pages_dirty);
|
||||
|
||||
static void bio_release_pages(struct bio *bio)
|
||||
{
|
||||
|
@ -1662,17 +1660,33 @@ defer:
|
|||
spin_unlock_irqrestore(&bio_dirty_lock, flags);
|
||||
schedule_work(&bio_dirty_work);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_check_pages_dirty);
|
||||
|
||||
void update_io_ticks(struct hd_struct *part, unsigned long now)
|
||||
{
|
||||
unsigned long stamp;
|
||||
again:
|
||||
stamp = READ_ONCE(part->stamp);
|
||||
if (unlikely(stamp != now)) {
|
||||
if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) {
|
||||
__part_stat_add(part, io_ticks, 1);
|
||||
}
|
||||
}
|
||||
if (part->partno) {
|
||||
part = &part_to_disk(part)->part0;
|
||||
goto again;
|
||||
}
|
||||
}
|
||||
|
||||
void generic_start_io_acct(struct request_queue *q, int op,
|
||||
unsigned long sectors, struct hd_struct *part)
|
||||
{
|
||||
const int sgrp = op_stat_group(op);
|
||||
int cpu = part_stat_lock();
|
||||
|
||||
part_round_stats(q, cpu, part);
|
||||
part_stat_inc(cpu, part, ios[sgrp]);
|
||||
part_stat_add(cpu, part, sectors[sgrp], sectors);
|
||||
part_stat_lock();
|
||||
|
||||
update_io_ticks(part, jiffies);
|
||||
part_stat_inc(part, ios[sgrp]);
|
||||
part_stat_add(part, sectors[sgrp], sectors);
|
||||
part_inc_in_flight(q, part, op_is_write(op));
|
||||
|
||||
part_stat_unlock();
|
||||
|
@ -1682,12 +1696,15 @@ EXPORT_SYMBOL(generic_start_io_acct);
|
|||
void generic_end_io_acct(struct request_queue *q, int req_op,
|
||||
struct hd_struct *part, unsigned long start_time)
|
||||
{
|
||||
unsigned long duration = jiffies - start_time;
|
||||
unsigned long now = jiffies;
|
||||
unsigned long duration = now - start_time;
|
||||
const int sgrp = op_stat_group(req_op);
|
||||
int cpu = part_stat_lock();
|
||||
|
||||
part_stat_add(cpu, part, nsecs[sgrp], jiffies_to_nsecs(duration));
|
||||
part_round_stats(q, cpu, part);
|
||||
part_stat_lock();
|
||||
|
||||
update_io_ticks(part, now);
|
||||
part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
|
||||
part_stat_add(part, time_in_queue, duration);
|
||||
part_dec_in_flight(q, part, op_is_write(req_op));
|
||||
|
||||
part_stat_unlock();
|
||||
|
@ -1957,102 +1974,133 @@ EXPORT_SYMBOL(bioset_init_from_src);
|
|||
|
||||
#ifdef CONFIG_BLK_CGROUP
|
||||
|
||||
#ifdef CONFIG_MEMCG
|
||||
/**
|
||||
* bio_associate_blkcg_from_page - associate a bio with the page's blkcg
|
||||
* bio_disassociate_blkg - puts back the blkg reference if associated
|
||||
* @bio: target bio
|
||||
* @page: the page to lookup the blkcg from
|
||||
*
|
||||
* Associate @bio with the blkcg from @page's owning memcg. This works like
|
||||
* every other associate function wrt references.
|
||||
* Helper to disassociate the blkg from @bio if a blkg is associated.
|
||||
*/
|
||||
int bio_associate_blkcg_from_page(struct bio *bio, struct page *page)
|
||||
void bio_disassociate_blkg(struct bio *bio)
|
||||
{
|
||||
struct cgroup_subsys_state *blkcg_css;
|
||||
|
||||
if (unlikely(bio->bi_css))
|
||||
return -EBUSY;
|
||||
if (!page->mem_cgroup)
|
||||
return 0;
|
||||
blkcg_css = cgroup_get_e_css(page->mem_cgroup->css.cgroup,
|
||||
&io_cgrp_subsys);
|
||||
bio->bi_css = blkcg_css;
|
||||
return 0;
|
||||
}
|
||||
#endif /* CONFIG_MEMCG */
|
||||
|
||||
/**
|
||||
* bio_associate_blkcg - associate a bio with the specified blkcg
|
||||
* @bio: target bio
|
||||
* @blkcg_css: css of the blkcg to associate
|
||||
*
|
||||
* Associate @bio with the blkcg specified by @blkcg_css. Block layer will
|
||||
* treat @bio as if it were issued by a task which belongs to the blkcg.
|
||||
*
|
||||
* This function takes an extra reference of @blkcg_css which will be put
|
||||
* when @bio is released. The caller must own @bio and is responsible for
|
||||
* synchronizing calls to this function.
|
||||
*/
|
||||
int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
|
||||
{
|
||||
if (unlikely(bio->bi_css))
|
||||
return -EBUSY;
|
||||
css_get(blkcg_css);
|
||||
bio->bi_css = blkcg_css;
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_associate_blkcg);
|
||||
|
||||
/**
|
||||
* bio_associate_blkg - associate a bio with the specified blkg
|
||||
* @bio: target bio
|
||||
* @blkg: the blkg to associate
|
||||
*
|
||||
* Associate @bio with the blkg specified by @blkg. This is the queue specific
|
||||
* blkcg information associated with the @bio, a reference will be taken on the
|
||||
* @blkg and will be freed when the bio is freed.
|
||||
*/
|
||||
int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
|
||||
{
|
||||
if (unlikely(bio->bi_blkg))
|
||||
return -EBUSY;
|
||||
if (!blkg_try_get(blkg))
|
||||
return -ENODEV;
|
||||
bio->bi_blkg = blkg;
|
||||
return 0;
|
||||
}
|
||||
|
||||
/**
|
||||
* bio_disassociate_task - undo bio_associate_current()
|
||||
* @bio: target bio
|
||||
*/
|
||||
void bio_disassociate_task(struct bio *bio)
|
||||
{
|
||||
if (bio->bi_ioc) {
|
||||
put_io_context(bio->bi_ioc);
|
||||
bio->bi_ioc = NULL;
|
||||
}
|
||||
if (bio->bi_css) {
|
||||
css_put(bio->bi_css);
|
||||
bio->bi_css = NULL;
|
||||
}
|
||||
if (bio->bi_blkg) {
|
||||
blkg_put(bio->bi_blkg);
|
||||
bio->bi_blkg = NULL;
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_disassociate_blkg);
|
||||
|
||||
/**
|
||||
* bio_clone_blkcg_association - clone blkcg association from src to dst bio
|
||||
* __bio_associate_blkg - associate a bio with the a blkg
|
||||
* @bio: target bio
|
||||
* @blkg: the blkg to associate
|
||||
*
|
||||
* This tries to associate @bio with the specified @blkg. Association failure
|
||||
* is handled by walking up the blkg tree. Therefore, the blkg associated can
|
||||
* be anything between @blkg and the root_blkg. This situation only happens
|
||||
* when a cgroup is dying and then the remaining bios will spill to the closest
|
||||
* alive blkg.
|
||||
*
|
||||
* A reference will be taken on the @blkg and will be released when @bio is
|
||||
* freed.
|
||||
*/
|
||||
static void __bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
|
||||
{
|
||||
bio_disassociate_blkg(bio);
|
||||
|
||||
bio->bi_blkg = blkg_tryget_closest(blkg);
|
||||
}
|
||||
|
||||
/**
|
||||
* bio_associate_blkg_from_css - associate a bio with a specified css
|
||||
* @bio: target bio
|
||||
* @css: target css
|
||||
*
|
||||
* Associate @bio with the blkg found by combining the css's blkg and the
|
||||
* request_queue of the @bio. This falls back to the queue's root_blkg if
|
||||
* the association fails with the css.
|
||||
*/
|
||||
void bio_associate_blkg_from_css(struct bio *bio,
|
||||
struct cgroup_subsys_state *css)
|
||||
{
|
||||
struct request_queue *q = bio->bi_disk->queue;
|
||||
struct blkcg_gq *blkg;
|
||||
|
||||
rcu_read_lock();
|
||||
|
||||
if (!css || !css->parent)
|
||||
blkg = q->root_blkg;
|
||||
else
|
||||
blkg = blkg_lookup_create(css_to_blkcg(css), q);
|
||||
|
||||
__bio_associate_blkg(bio, blkg);
|
||||
|
||||
rcu_read_unlock();
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
|
||||
|
||||
#ifdef CONFIG_MEMCG
|
||||
/**
|
||||
* bio_associate_blkg_from_page - associate a bio with the page's blkg
|
||||
* @bio: target bio
|
||||
* @page: the page to lookup the blkcg from
|
||||
*
|
||||
* Associate @bio with the blkg from @page's owning memcg and the respective
|
||||
* request_queue. If cgroup_e_css returns %NULL, fall back to the queue's
|
||||
* root_blkg.
|
||||
*/
|
||||
void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
|
||||
{
|
||||
struct cgroup_subsys_state *css;
|
||||
|
||||
if (!page->mem_cgroup)
|
||||
return;
|
||||
|
||||
rcu_read_lock();
|
||||
|
||||
css = cgroup_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
|
||||
bio_associate_blkg_from_css(bio, css);
|
||||
|
||||
rcu_read_unlock();
|
||||
}
|
||||
#endif /* CONFIG_MEMCG */
|
||||
|
||||
/**
|
||||
* bio_associate_blkg - associate a bio with a blkg
|
||||
* @bio: target bio
|
||||
*
|
||||
* Associate @bio with the blkg found from the bio's css and request_queue.
|
||||
* If one is not found, bio_lookup_blkg() creates the blkg. If a blkg is
|
||||
* already associated, the css is reused and association redone as the
|
||||
* request_queue may have changed.
|
||||
*/
|
||||
void bio_associate_blkg(struct bio *bio)
|
||||
{
|
||||
struct cgroup_subsys_state *css;
|
||||
|
||||
rcu_read_lock();
|
||||
|
||||
if (bio->bi_blkg)
|
||||
css = &bio_blkcg(bio)->css;
|
||||
else
|
||||
css = blkcg_css();
|
||||
|
||||
bio_associate_blkg_from_css(bio, css);
|
||||
|
||||
rcu_read_unlock();
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_associate_blkg);
|
||||
|
||||
/**
|
||||
* bio_clone_blkg_association - clone blkg association from src to dst bio
|
||||
* @dst: destination bio
|
||||
* @src: source bio
|
||||
*/
|
||||
void bio_clone_blkcg_association(struct bio *dst, struct bio *src)
|
||||
void bio_clone_blkg_association(struct bio *dst, struct bio *src)
|
||||
{
|
||||
if (src->bi_css)
|
||||
WARN_ON(bio_associate_blkcg(dst, src->bi_css));
|
||||
if (src->bi_blkg)
|
||||
__bio_associate_blkg(dst, src->bi_blkg);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_clone_blkcg_association);
|
||||
EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
|
||||
#endif /* CONFIG_BLK_CGROUP */
|
||||
|
||||
static void __init biovec_init_slabs(void)
|
||||
|
|
|
@ -76,14 +76,42 @@ static void blkg_free(struct blkcg_gq *blkg)
|
|||
if (blkg->pd[i])
|
||||
blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
|
||||
|
||||
if (blkg->blkcg != &blkcg_root)
|
||||
blk_exit_rl(blkg->q, &blkg->rl);
|
||||
|
||||
blkg_rwstat_exit(&blkg->stat_ios);
|
||||
blkg_rwstat_exit(&blkg->stat_bytes);
|
||||
kfree(blkg);
|
||||
}
|
||||
|
||||
static void __blkg_release(struct rcu_head *rcu)
|
||||
{
|
||||
struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);
|
||||
|
||||
percpu_ref_exit(&blkg->refcnt);
|
||||
|
||||
/* release the blkcg and parent blkg refs this blkg has been holding */
|
||||
css_put(&blkg->blkcg->css);
|
||||
if (blkg->parent)
|
||||
blkg_put(blkg->parent);
|
||||
|
||||
wb_congested_put(blkg->wb_congested);
|
||||
|
||||
blkg_free(blkg);
|
||||
}
|
||||
|
||||
/*
|
||||
* A group is RCU protected, but having an rcu lock does not mean that one
|
||||
* can access all the fields of blkg and assume these are valid. For
|
||||
* example, don't try to follow throtl_data and request queue links.
|
||||
*
|
||||
* Having a reference to blkg under an rcu allows accesses to only values
|
||||
* local to groups like group stats and group rate limits.
|
||||
*/
|
||||
static void blkg_release(struct percpu_ref *ref)
|
||||
{
|
||||
struct blkcg_gq *blkg = container_of(ref, struct blkcg_gq, refcnt);
|
||||
|
||||
call_rcu(&blkg->rcu_head, __blkg_release);
|
||||
}
|
||||
|
||||
/**
|
||||
* blkg_alloc - allocate a blkg
|
||||
* @blkcg: block cgroup the new blkg is associated with
|
||||
|
@ -110,14 +138,6 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
|
|||
blkg->q = q;
|
||||
INIT_LIST_HEAD(&blkg->q_node);
|
||||
blkg->blkcg = blkcg;
|
||||
atomic_set(&blkg->refcnt, 1);
|
||||
|
||||
/* root blkg uses @q->root_rl, init rl only for !root blkgs */
|
||||
if (blkcg != &blkcg_root) {
|
||||
if (blk_init_rl(&blkg->rl, q, gfp_mask))
|
||||
goto err_free;
|
||||
blkg->rl.blkg = blkg;
|
||||
}
|
||||
|
||||
for (i = 0; i < BLKCG_MAX_POLS; i++) {
|
||||
struct blkcg_policy *pol = blkcg_policy[i];
|
||||
|
@ -157,7 +177,7 @@ struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg,
|
|||
blkg = radix_tree_lookup(&blkcg->blkg_tree, q->id);
|
||||
if (blkg && blkg->q == q) {
|
||||
if (update_hint) {
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
lockdep_assert_held(&q->queue_lock);
|
||||
rcu_assign_pointer(blkcg->blkg_hint, blkg);
|
||||
}
|
||||
return blkg;
|
||||
|
@ -180,7 +200,13 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
|
|||
int i, ret;
|
||||
|
||||
WARN_ON_ONCE(!rcu_read_lock_held());
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
lockdep_assert_held(&q->queue_lock);
|
||||
|
||||
/* request_queue is dying, do not create/recreate a blkg */
|
||||
if (blk_queue_dying(q)) {
|
||||
ret = -ENODEV;
|
||||
goto err_free_blkg;
|
||||
}
|
||||
|
||||
/* blkg holds a reference to blkcg */
|
||||
if (!css_tryget_online(&blkcg->css)) {
|
||||
|
@ -217,6 +243,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
|
|||
blkg_get(blkg->parent);
|
||||
}
|
||||
|
||||
ret = percpu_ref_init(&blkg->refcnt, blkg_release, 0,
|
||||
GFP_NOWAIT | __GFP_NOWARN);
|
||||
if (ret)
|
||||
goto err_cancel_ref;
|
||||
|
||||
/* invoke per-policy init */
|
||||
for (i = 0; i < BLKCG_MAX_POLS; i++) {
|
||||
struct blkcg_policy *pol = blkcg_policy[i];
|
||||
|
@ -249,6 +280,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
|
|||
blkg_put(blkg);
|
||||
return ERR_PTR(ret);
|
||||
|
||||
err_cancel_ref:
|
||||
percpu_ref_exit(&blkg->refcnt);
|
||||
err_put_congested:
|
||||
wb_congested_put(wb_congested);
|
||||
err_put_css:
|
||||
|
@ -259,7 +292,7 @@ err_free_blkg:
|
|||
}
|
||||
|
||||
/**
|
||||
* blkg_lookup_create - lookup blkg, try to create one if not there
|
||||
* __blkg_lookup_create - lookup blkg, try to create one if not there
|
||||
* @blkcg: blkcg of interest
|
||||
* @q: request_queue of interest
|
||||
*
|
||||
|
@ -268,24 +301,16 @@ err_free_blkg:
|
|||
* that all non-root blkg's have access to the parent blkg. This function
|
||||
* should be called under RCU read lock and @q->queue_lock.
|
||||
*
|
||||
* Returns pointer to the looked up or created blkg on success, ERR_PTR()
|
||||
* value on error. If @q is dead, returns ERR_PTR(-EINVAL). If @q is not
|
||||
* dead and bypassing, returns ERR_PTR(-EBUSY).
|
||||
* Returns the blkg or the closest blkg if blkg_create() fails as it walks
|
||||
* down from root.
|
||||
*/
|
||||
struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
|
||||
struct request_queue *q)
|
||||
struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
|
||||
struct request_queue *q)
|
||||
{
|
||||
struct blkcg_gq *blkg;
|
||||
|
||||
WARN_ON_ONCE(!rcu_read_lock_held());
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
|
||||
/*
|
||||
* This could be the first entry point of blkcg implementation and
|
||||
* we shouldn't allow anything to go through for a bypassing queue.
|
||||
*/
|
||||
if (unlikely(blk_queue_bypass(q)))
|
||||
return ERR_PTR(blk_queue_dying(q) ? -ENODEV : -EBUSY);
|
||||
lockdep_assert_held(&q->queue_lock);
|
||||
|
||||
blkg = __blkg_lookup(blkcg, q, true);
|
||||
if (blkg)
|
||||
|
@ -293,30 +318,64 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
|
|||
|
||||
/*
|
||||
* Create blkgs walking down from blkcg_root to @blkcg, so that all
|
||||
* non-root blkgs have access to their parents.
|
||||
* non-root blkgs have access to their parents. Returns the closest
|
||||
* blkg to the intended blkg should blkg_create() fail.
|
||||
*/
|
||||
while (true) {
|
||||
struct blkcg *pos = blkcg;
|
||||
struct blkcg *parent = blkcg_parent(blkcg);
|
||||
struct blkcg_gq *ret_blkg = q->root_blkg;
|
||||
|
||||
while (parent && !__blkg_lookup(parent, q, false)) {
|
||||
while (parent) {
|
||||
blkg = __blkg_lookup(parent, q, false);
|
||||
if (blkg) {
|
||||
/* remember closest blkg */
|
||||
ret_blkg = blkg;
|
||||
break;
|
||||
}
|
||||
pos = parent;
|
||||
parent = blkcg_parent(parent);
|
||||
}
|
||||
|
||||
blkg = blkg_create(pos, q, NULL);
|
||||
if (pos == blkcg || IS_ERR(blkg))
|
||||
if (IS_ERR(blkg))
|
||||
return ret_blkg;
|
||||
if (pos == blkcg)
|
||||
return blkg;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* blkg_lookup_create - find or create a blkg
|
||||
* @blkcg: target block cgroup
|
||||
* @q: target request_queue
|
||||
*
|
||||
* This looks up or creates the blkg representing the unique pair
|
||||
* of the blkcg and the request_queue.
|
||||
*/
|
||||
struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
|
||||
struct request_queue *q)
|
||||
{
|
||||
struct blkcg_gq *blkg = blkg_lookup(blkcg, q);
|
||||
|
||||
if (unlikely(!blkg)) {
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&q->queue_lock, flags);
|
||||
blkg = __blkg_lookup_create(blkcg, q);
|
||||
spin_unlock_irqrestore(&q->queue_lock, flags);
|
||||
}
|
||||
|
||||
return blkg;
|
||||
}
|
||||
|
||||
static void blkg_destroy(struct blkcg_gq *blkg)
|
||||
{
|
||||
struct blkcg *blkcg = blkg->blkcg;
|
||||
struct blkcg_gq *parent = blkg->parent;
|
||||
int i;
|
||||
|
||||
lockdep_assert_held(blkg->q->queue_lock);
|
||||
lockdep_assert_held(&blkg->q->queue_lock);
|
||||
lockdep_assert_held(&blkcg->lock);
|
||||
|
||||
/* Something wrong if we are trying to remove same group twice */
|
||||
|
@ -353,7 +412,7 @@ static void blkg_destroy(struct blkcg_gq *blkg)
|
|||
* Put the reference taken at the time of creation so that when all
|
||||
* queues are gone, group can be destroyed.
|
||||
*/
|
||||
blkg_put(blkg);
|
||||
percpu_ref_kill(&blkg->refcnt);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -366,8 +425,7 @@ static void blkg_destroy_all(struct request_queue *q)
|
|||
{
|
||||
struct blkcg_gq *blkg, *n;
|
||||
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) {
|
||||
struct blkcg *blkcg = blkg->blkcg;
|
||||
|
||||
|
@ -377,7 +435,7 @@ static void blkg_destroy_all(struct request_queue *q)
|
|||
}
|
||||
|
||||
q->root_blkg = NULL;
|
||||
q->root_rl.blkg = NULL;
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
}

/*
@@ -403,41 +461,6 @@ void __blkg_release_rcu(struct rcu_head *rcu_head)
}
EXPORT_SYMBOL_GPL(__blkg_release_rcu);

/*
* The next function used by blk_queue_for_each_rl(). It's a bit tricky
* because the root blkg uses @q->root_rl instead of its own rl.
*/
struct request_list *__blk_queue_next_rl(struct request_list *rl,
struct request_queue *q)
{
struct list_head *ent;
struct blkcg_gq *blkg;

/*
* Determine the current blkg list_head. The first entry is
* root_rl which is off @q->blkg_list and mapped to the head.
*/
if (rl == &q->root_rl) {
ent = &q->blkg_list;
/* There are no more block groups, hence no request lists */
if (list_empty(ent))
return NULL;
} else {
blkg = container_of(rl, struct blkcg_gq, rl);
ent = &blkg->q_node;
}

/* walk to the next list_head, skip root blkcg */
ent = ent->next;
if (ent == &q->root_blkg->q_node)
ent = ent->next;
if (ent == &q->blkg_list)
return NULL;

blkg = container_of(ent, struct blkcg_gq, q_node);
return &blkg->rl;
}
|
||||
|
||||
static int blkcg_reset_stats(struct cgroup_subsys_state *css,
|
||||
struct cftype *cftype, u64 val)
|
||||
{
|
||||
|
@ -477,7 +500,6 @@ const char *blkg_dev_name(struct blkcg_gq *blkg)
|
|||
return dev_name(blkg->q->backing_dev_info->dev);
|
||||
return NULL;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blkg_dev_name);
|
||||
|
||||
/**
|
||||
* blkcg_print_blkgs - helper for printing per-blkg data
|
||||
|
@ -508,10 +530,10 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg,
|
|||
|
||||
rcu_read_lock();
|
||||
hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
|
||||
spin_lock_irq(blkg->q->queue_lock);
|
||||
spin_lock_irq(&blkg->q->queue_lock);
|
||||
if (blkcg_policy_enabled(blkg->q, pol))
|
||||
total += prfill(sf, blkg->pd[pol->plid], data);
|
||||
spin_unlock_irq(blkg->q->queue_lock);
|
||||
spin_unlock_irq(&blkg->q->queue_lock);
|
||||
}
|
||||
rcu_read_unlock();
|
||||
|
||||
|
@ -709,7 +731,7 @@ u64 blkg_stat_recursive_sum(struct blkcg_gq *blkg,
|
|||
struct cgroup_subsys_state *pos_css;
|
||||
u64 sum = 0;
|
||||
|
||||
lockdep_assert_held(blkg->q->queue_lock);
|
||||
lockdep_assert_held(&blkg->q->queue_lock);
|
||||
|
||||
rcu_read_lock();
|
||||
blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
|
||||
|
@ -752,7 +774,7 @@ struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
|
|||
struct blkg_rwstat sum = { };
|
||||
int i;
|
||||
|
||||
lockdep_assert_held(blkg->q->queue_lock);
|
||||
lockdep_assert_held(&blkg->q->queue_lock);
|
||||
|
||||
rcu_read_lock();
|
||||
blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
|
||||
|
@ -783,18 +805,10 @@ static struct blkcg_gq *blkg_lookup_check(struct blkcg *blkcg,
|
|||
struct request_queue *q)
|
||||
{
|
||||
WARN_ON_ONCE(!rcu_read_lock_held());
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
lockdep_assert_held(&q->queue_lock);
|
||||
|
||||
if (!blkcg_policy_enabled(q, pol))
|
||||
return ERR_PTR(-EOPNOTSUPP);
|
||||
|
||||
/*
|
||||
* This could be the first entry point of blkcg implementation and
|
||||
* we shouldn't allow anything to go through for a bypassing queue.
|
||||
*/
|
||||
if (unlikely(blk_queue_bypass(q)))
|
||||
return ERR_PTR(blk_queue_dying(q) ? -ENODEV : -EBUSY);
|
||||
|
||||
return __blkg_lookup(blkcg, q, true /* update_hint */);
|
||||
}
|
||||
|
||||
|
@ -812,7 +826,7 @@ static struct blkcg_gq *blkg_lookup_check(struct blkcg *blkcg,
|
|||
*/
|
||||
int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
|
||||
char *input, struct blkg_conf_ctx *ctx)
|
||||
__acquires(rcu) __acquires(disk->queue->queue_lock)
|
||||
__acquires(rcu) __acquires(&disk->queue->queue_lock)
|
||||
{
|
||||
struct gendisk *disk;
|
||||
struct request_queue *q;
|
||||
|
@ -840,7 +854,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
|
|||
q = disk->queue;
|
||||
|
||||
rcu_read_lock();
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
|
||||
blkg = blkg_lookup_check(blkcg, pol, q);
|
||||
if (IS_ERR(blkg)) {
|
||||
|
@ -867,7 +881,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
|
|||
}
|
||||
|
||||
/* Drop locks to do new blkg allocation with GFP_KERNEL. */
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
rcu_read_unlock();
|
||||
|
||||
new_blkg = blkg_alloc(pos, q, GFP_KERNEL);
|
||||
|
@ -877,7 +891,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
|
|||
}
|
||||
|
||||
rcu_read_lock();
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
|
||||
blkg = blkg_lookup_check(pos, pol, q);
|
||||
if (IS_ERR(blkg)) {
|
||||
|
@ -905,7 +919,7 @@ success:
|
|||
return 0;
|
||||
|
||||
fail_unlock:
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
rcu_read_unlock();
|
||||
fail:
|
||||
put_disk_and_module(disk);
|
||||
|
@ -921,7 +935,6 @@ fail:
|
|||
}
|
||||
return ret;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blkg_conf_prep);
|
||||
|
||||
/**
|
||||
* blkg_conf_finish - finish up per-blkg config update
|
||||
|
@ -931,13 +944,12 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep);
|
|||
* with blkg_conf_prep().
|
||||
*/
|
||||
void blkg_conf_finish(struct blkg_conf_ctx *ctx)
|
||||
__releases(ctx->disk->queue->queue_lock) __releases(rcu)
|
||||
__releases(&ctx->disk->queue->queue_lock) __releases(rcu)
|
||||
{
|
||||
spin_unlock_irq(ctx->disk->queue->queue_lock);
|
||||
spin_unlock_irq(&ctx->disk->queue->queue_lock);
|
||||
rcu_read_unlock();
|
||||
put_disk_and_module(ctx->disk);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blkg_conf_finish);
|
||||
|
||||
static int blkcg_print_stat(struct seq_file *sf, void *v)
|
||||
{
|
||||
|
@ -967,7 +979,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
|
|||
*/
|
||||
off += scnprintf(buf+off, size-off, "%s ", dname);
|
||||
|
||||
spin_lock_irq(blkg->q->queue_lock);
|
||||
spin_lock_irq(&blkg->q->queue_lock);
|
||||
|
||||
rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
|
||||
offsetof(struct blkcg_gq, stat_bytes));
|
||||
|
@ -981,7 +993,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
|
|||
wios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
|
||||
dios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_DISCARD]);
|
||||
|
||||
spin_unlock_irq(blkg->q->queue_lock);
|
||||
spin_unlock_irq(&blkg->q->queue_lock);
|
||||
|
||||
if (rbytes || wbytes || rios || wios) {
|
||||
has_stats = true;
|
||||
|
@ -1102,9 +1114,9 @@ void blkcg_destroy_blkgs(struct blkcg *blkcg)
|
|||
struct blkcg_gq, blkcg_node);
|
||||
struct request_queue *q = blkg->q;
|
||||
|
||||
if (spin_trylock(q->queue_lock)) {
|
||||
if (spin_trylock(&q->queue_lock)) {
|
||||
blkg_destroy(blkg);
|
||||
spin_unlock(q->queue_lock);
|
||||
spin_unlock(&q->queue_lock);
|
||||
} else {
|
||||
spin_unlock_irq(&blkcg->lock);
|
||||
cpu_relax();
|
||||
|
@ -1225,36 +1237,31 @@ int blkcg_init_queue(struct request_queue *q)
|
|||
|
||||
/* Make sure the root blkg exists. */
|
||||
rcu_read_lock();
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
blkg = blkg_create(&blkcg_root, q, new_blkg);
|
||||
if (IS_ERR(blkg))
|
||||
goto err_unlock;
|
||||
q->root_blkg = blkg;
|
||||
q->root_rl.blkg = blkg;
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
rcu_read_unlock();
|
||||
|
||||
if (preloaded)
|
||||
radix_tree_preload_end();
|
||||
|
||||
ret = blk_iolatency_init(q);
|
||||
if (ret) {
|
||||
spin_lock_irq(q->queue_lock);
|
||||
blkg_destroy_all(q);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
return ret;
|
||||
}
|
||||
if (ret)
|
||||
goto err_destroy_all;
|
||||
|
||||
ret = blk_throtl_init(q);
|
||||
if (ret) {
|
||||
spin_lock_irq(q->queue_lock);
|
||||
blkg_destroy_all(q);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
}
|
||||
return ret;
|
||||
if (ret)
|
||||
goto err_destroy_all;
|
||||
return 0;
|
||||
|
||||
err_destroy_all:
|
||||
blkg_destroy_all(q);
|
||||
return ret;
|
||||
err_unlock:
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
rcu_read_unlock();
|
||||
if (preloaded)
|
||||
radix_tree_preload_end();
|
||||
|
@ -1269,7 +1276,7 @@ err_unlock:
|
|||
*/
|
||||
void blkcg_drain_queue(struct request_queue *q)
|
||||
{
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
lockdep_assert_held(&q->queue_lock);
|
||||
|
||||
/*
|
||||
* @q could be exiting and already have destroyed all blkgs as
|
||||
|
@ -1289,10 +1296,7 @@ void blkcg_drain_queue(struct request_queue *q)
|
|||
*/
|
||||
void blkcg_exit_queue(struct request_queue *q)
|
||||
{
|
||||
spin_lock_irq(q->queue_lock);
|
||||
blkg_destroy_all(q);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
|
||||
blk_throtl_exit(q);
|
||||
}
|
||||
|
||||
|
@ -1396,10 +1400,8 @@ int blkcg_activate_policy(struct request_queue *q,
|
|||
if (blkcg_policy_enabled(q, pol))
|
||||
return 0;
|
||||
|
||||
if (q->mq_ops)
|
||||
if (queue_is_mq(q))
|
||||
blk_mq_freeze_queue(q);
|
||||
else
|
||||
blk_queue_bypass_start(q);
|
||||
pd_prealloc:
|
||||
if (!pd_prealloc) {
|
||||
pd_prealloc = pol->pd_alloc_fn(GFP_KERNEL, q->node);
|
||||
|
@ -1409,7 +1411,7 @@ pd_prealloc:
|
|||
}
|
||||
}
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
|
||||
list_for_each_entry(blkg, &q->blkg_list, q_node) {
|
||||
struct blkg_policy_data *pd;
|
||||
|
@ -1421,7 +1423,7 @@ pd_prealloc:
|
|||
if (!pd)
|
||||
swap(pd, pd_prealloc);
|
||||
if (!pd) {
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
goto pd_prealloc;
|
||||
}
|
||||
|
||||
|
@ -1435,12 +1437,10 @@ pd_prealloc:
|
|||
__set_bit(pol->plid, q->blkcg_pols);
|
||||
ret = 0;
|
||||
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
out_bypass_end:
|
||||
if (q->mq_ops)
|
||||
if (queue_is_mq(q))
|
||||
blk_mq_unfreeze_queue(q);
|
||||
else
|
||||
blk_queue_bypass_end(q);
|
||||
if (pd_prealloc)
|
||||
pol->pd_free_fn(pd_prealloc);
|
||||
return ret;
|
||||
|
@ -1463,12 +1463,10 @@ void blkcg_deactivate_policy(struct request_queue *q,
|
|||
if (!blkcg_policy_enabled(q, pol))
|
||||
return;
|
||||
|
||||
if (q->mq_ops)
|
||||
if (queue_is_mq(q))
|
||||
blk_mq_freeze_queue(q);
|
||||
else
|
||||
blk_queue_bypass_start(q);
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
|
||||
__clear_bit(pol->plid, q->blkcg_pols);
|
||||
|
||||
|
@ -1481,12 +1479,10 @@ void blkcg_deactivate_policy(struct request_queue *q,
|
|||
}
|
||||
}
|
||||
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
|
||||
if (q->mq_ops)
|
||||
if (queue_is_mq(q))
|
||||
blk_mq_unfreeze_queue(q);
|
||||
else
|
||||
blk_queue_bypass_end(q);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
|
||||
|
||||
|
@ -1748,8 +1744,7 @@ void blkcg_maybe_throttle_current(void)
|
|||
blkg = blkg_lookup(blkcg, q);
|
||||
if (!blkg)
|
||||
goto out;
|
||||
blkg = blkg_try_get(blkg);
|
||||
if (!blkg)
|
||||
if (!blkg_tryget(blkg))
|
||||
goto out;
|
||||
rcu_read_unlock();
|
||||
|
||||
|
@ -1761,7 +1756,6 @@ out:
|
|||
rcu_read_unlock();
|
||||
blk_put_queue(q);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blkcg_maybe_throttle_current);
|
||||
|
||||
/**
|
||||
* blkcg_schedule_throttle - this task needs to check for throttling
|
||||
|
@ -1795,7 +1789,6 @@ void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay)
|
|||
current->use_memdelay = use_memdelay;
|
||||
set_notify_resume(current);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blkcg_schedule_throttle);
|
||||
|
||||
/**
|
||||
* blkcg_add_delay - add delay to this blkg
|
||||
|
@ -1810,7 +1803,6 @@ void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta)
|
|||
blkcg_scale_delay(blkg, now);
|
||||
atomic64_add(delta, &blkg->delay_nsec);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blkcg_add_delay);
|
||||
|
||||
module_param(blkcg_debug_stats, bool, 0644);
|
||||
MODULE_PARM_DESC(blkcg_debug_stats, "True if you want debug stats, false if not");

block/blk-core.c: 2066 changed lines (file diff suppressed because it is too large)
@@ -48,8 +48,6 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
struct request *rq, int at_head,
rq_end_io_fn *done)
{
int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;

WARN_ON(irqs_disabled());
WARN_ON(!blk_rq_is_passthrough(rq));

@@ -60,23 +58,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
* don't check dying flag for MQ because the request won't
* be reused after dying flag is set
*/
if (q->mq_ops) {
blk_mq_sched_insert_request(rq, at_head, true, false);
return;
}

spin_lock_irq(q->queue_lock);

if (unlikely(blk_queue_dying(q))) {
rq->rq_flags |= RQF_QUIET;
__blk_end_request_all(rq, BLK_STS_IOERR);
spin_unlock_irq(q->queue_lock);
return;
}

__elv_add_request(q, rq, where);
__blk_run_queue(q);
spin_unlock_irq(q->queue_lock);
blk_mq_sched_insert_request(rq, at_head, true, false);
}
EXPORT_SYMBOL_GPL(blk_execute_rq_nowait);
|
||||
|
||||
|
|
|
@ -93,7 +93,7 @@ enum {
|
|||
FLUSH_PENDING_TIMEOUT = 5 * HZ,
|
||||
};
|
||||
|
||||
static bool blk_kick_flush(struct request_queue *q,
|
||||
static void blk_kick_flush(struct request_queue *q,
|
||||
struct blk_flush_queue *fq, unsigned int flags);
|
||||
|
||||
static unsigned int blk_flush_policy(unsigned long fflags, struct request *rq)
|
||||
|
@ -132,18 +132,9 @@ static void blk_flush_restore_request(struct request *rq)
|
|||
rq->end_io = rq->flush.saved_end_io;
|
||||
}
|
||||
|
||||
static bool blk_flush_queue_rq(struct request *rq, bool add_front)
|
||||
static void blk_flush_queue_rq(struct request *rq, bool add_front)
|
||||
{
|
||||
if (rq->q->mq_ops) {
|
||||
blk_mq_add_to_requeue_list(rq, add_front, true);
|
||||
return false;
|
||||
} else {
|
||||
if (add_front)
|
||||
list_add(&rq->queuelist, &rq->q->queue_head);
|
||||
else
|
||||
list_add_tail(&rq->queuelist, &rq->q->queue_head);
|
||||
return true;
|
||||
}
|
||||
blk_mq_add_to_requeue_list(rq, add_front, true);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -157,18 +148,17 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
|
|||
* completion and trigger the next step.
|
||||
*
|
||||
* CONTEXT:
|
||||
* spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
|
||||
* spin_lock_irq(fq->mq_flush_lock)
|
||||
*
|
||||
* RETURNS:
|
||||
* %true if requests were added to the dispatch queue, %false otherwise.
|
||||
*/
|
||||
static bool blk_flush_complete_seq(struct request *rq,
|
||||
static void blk_flush_complete_seq(struct request *rq,
|
||||
struct blk_flush_queue *fq,
|
||||
unsigned int seq, blk_status_t error)
|
||||
{
|
||||
struct request_queue *q = rq->q;
|
||||
struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
|
||||
bool queued = false, kicked;
|
||||
unsigned int cmd_flags;
|
||||
|
||||
BUG_ON(rq->flush.seq & seq);
|
||||
|
@ -191,7 +181,7 @@ static bool blk_flush_complete_seq(struct request *rq,
|
|||
|
||||
case REQ_FSEQ_DATA:
|
||||
list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
|
||||
queued = blk_flush_queue_rq(rq, true);
|
||||
blk_flush_queue_rq(rq, true);
|
||||
break;
|
||||
|
||||
case REQ_FSEQ_DONE:
|
||||
|
@ -204,42 +194,34 @@ static bool blk_flush_complete_seq(struct request *rq,
|
|||
BUG_ON(!list_empty(&rq->queuelist));
|
||||
list_del_init(&rq->flush.list);
|
||||
blk_flush_restore_request(rq);
|
||||
if (q->mq_ops)
|
||||
blk_mq_end_request(rq, error);
|
||||
else
|
||||
__blk_end_request_all(rq, error);
|
||||
blk_mq_end_request(rq, error);
|
||||
break;
|
||||
|
||||
default:
|
||||
BUG();
|
||||
}
|
||||
|
||||
kicked = blk_kick_flush(q, fq, cmd_flags);
|
||||
return kicked | queued;
|
||||
blk_kick_flush(q, fq, cmd_flags);
|
||||
}
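
With the legacy path gone, blk_flush_complete_seq() above no longer needs to report whether the queue must be kicked; it simply advances a request through the flush steps recorded in rq->flush.seq and lets blk_kick_flush() decide what to issue next. The following standalone sketch only illustrates that step-sequencing idea; the toy_* names and FSEQ_* constants are invented for the example, not the kernel's definitions.

#include <stdio.h>

/* Illustrative flush-sequence bits, loosely mirroring the REQ_FSEQ_* idea. */
enum {
	FSEQ_PREFLUSH	= (1 << 0),	/* flush before the data write */
	FSEQ_DATA	= (1 << 1),	/* the data write itself */
	FSEQ_POSTFLUSH	= (1 << 2),	/* flush after the data write */
	FSEQ_DONE	= (1 << 3),
};

struct toy_rq {
	unsigned int policy;	/* steps this request needs */
	unsigned int seq;	/* steps already completed */
};

/* Next pending step: lowest policy bit not yet marked done, else DONE. */
static unsigned int toy_cur_seq(const struct toy_rq *rq)
{
	unsigned int pending = rq->policy & ~rq->seq;

	return pending ? pending & -pending : FSEQ_DONE;
}

int main(void)
{
	struct toy_rq rq = { .policy = FSEQ_PREFLUSH | FSEQ_DATA | FSEQ_POSTFLUSH };
	unsigned int step;

	while ((step = toy_cur_seq(&rq)) != FSEQ_DONE) {
		printf("issue step 0x%x\n", step);
		rq.seq |= step;		/* completion advances the sequence */
	}
	printf("request complete\n");
	return 0;
}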
|
||||
|
||||
static void flush_end_io(struct request *flush_rq, blk_status_t error)
|
||||
{
|
||||
struct request_queue *q = flush_rq->q;
|
||||
struct list_head *running;
|
||||
bool queued = false;
|
||||
struct request *rq, *n;
|
||||
unsigned long flags = 0;
|
||||
struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
|
||||
if (q->mq_ops) {
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
|
||||
/* release the tag's ownership to the req cloned from */
|
||||
spin_lock_irqsave(&fq->mq_flush_lock, flags);
|
||||
hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
|
||||
if (!q->elevator) {
|
||||
blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
|
||||
flush_rq->tag = -1;
|
||||
} else {
|
||||
blk_mq_put_driver_tag_hctx(hctx, flush_rq);
|
||||
flush_rq->internal_tag = -1;
|
||||
}
|
||||
/* release the tag's ownership to the req cloned from */
|
||||
spin_lock_irqsave(&fq->mq_flush_lock, flags);
|
||||
hctx = flush_rq->mq_hctx;
|
||||
if (!q->elevator) {
|
||||
blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
|
||||
flush_rq->tag = -1;
|
||||
} else {
|
||||
blk_mq_put_driver_tag_hctx(hctx, flush_rq);
|
||||
flush_rq->internal_tag = -1;
|
||||
}
|
||||
|
||||
running = &fq->flush_queue[fq->flush_running_idx];
|
||||
|
@ -248,35 +230,16 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
|
|||
/* account completion of the flush request */
|
||||
fq->flush_running_idx ^= 1;
|
||||
|
||||
if (!q->mq_ops)
|
||||
elv_completed_request(q, flush_rq);
|
||||
|
||||
/* and push the waiting requests to the next stage */
|
||||
list_for_each_entry_safe(rq, n, running, flush.list) {
|
||||
unsigned int seq = blk_flush_cur_seq(rq);
|
||||
|
||||
BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
|
||||
queued |= blk_flush_complete_seq(rq, fq, seq, error);
|
||||
blk_flush_complete_seq(rq, fq, seq, error);
|
||||
}
|
||||
|
||||
/*
|
||||
* Kick the queue to avoid stall for two cases:
|
||||
* 1. Moving a request silently to empty queue_head may stall the
|
||||
* queue.
|
||||
* 2. When flush request is running in non-queueable queue, the
|
||||
* queue is hold. Restart the queue after flush request is finished
|
||||
* to avoid stall.
|
||||
* This function is called from request completion path and calling
|
||||
* directly into request_fn may confuse the driver. Always use
|
||||
* kblockd.
|
||||
*/
|
||||
if (queued || fq->flush_queue_delayed) {
|
||||
WARN_ON(q->mq_ops);
|
||||
blk_run_queue_async(q);
|
||||
}
|
||||
fq->flush_queue_delayed = 0;
|
||||
if (q->mq_ops)
|
||||
spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
|
||||
spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -289,12 +252,10 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
|
|||
* Please read the comment at the top of this file for more info.
|
||||
*
|
||||
* CONTEXT:
|
||||
* spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
|
||||
* spin_lock_irq(fq->mq_flush_lock)
|
||||
*
|
||||
* RETURNS:
|
||||
* %true if flush was issued, %false otherwise.
|
||||
*/
|
||||
static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
|
||||
static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
|
||||
unsigned int flags)
|
||||
{
|
||||
struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
|
||||
|
@ -304,7 +265,7 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
|
|||
|
||||
/* C1 described at the top of this file */
|
||||
if (fq->flush_pending_idx != fq->flush_running_idx || list_empty(pending))
|
||||
return false;
|
||||
return;
|
||||
|
||||
/* C2 and C3
|
||||
*
|
||||
|
@ -312,11 +273,10 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
|
|||
* assigned to empty flushes, and we deadlock if we are expecting
|
||||
* other requests to make progress. Don't defer for that case.
|
||||
*/
|
||||
if (!list_empty(&fq->flush_data_in_flight) &&
|
||||
!(q->mq_ops && q->elevator) &&
|
||||
if (!list_empty(&fq->flush_data_in_flight) && q->elevator &&
|
||||
time_before(jiffies,
|
||||
fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
|
||||
return false;
|
||||
return;
|
||||
|
||||
/*
|
||||
* Issue flush and toggle pending_idx. This makes pending_idx
|
||||
|
@ -334,19 +294,15 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
|
|||
* In case of IO scheduler, flush rq need to borrow scheduler tag
|
||||
* just for cheating put/get driver tag.
|
||||
*/
|
||||
if (q->mq_ops) {
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
flush_rq->mq_ctx = first_rq->mq_ctx;
|
||||
flush_rq->mq_hctx = first_rq->mq_hctx;
|
||||
|
||||
flush_rq->mq_ctx = first_rq->mq_ctx;
|
||||
|
||||
if (!q->elevator) {
|
||||
fq->orig_rq = first_rq;
|
||||
flush_rq->tag = first_rq->tag;
|
||||
hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
|
||||
blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
|
||||
} else {
|
||||
flush_rq->internal_tag = first_rq->internal_tag;
|
||||
}
|
||||
if (!q->elevator) {
|
||||
fq->orig_rq = first_rq;
|
||||
flush_rq->tag = first_rq->tag;
|
||||
blk_mq_tag_set_rq(flush_rq->mq_hctx, first_rq->tag, flush_rq);
|
||||
} else {
|
||||
flush_rq->internal_tag = first_rq->internal_tag;
|
||||
}
|
||||
|
||||
flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
|
||||
|
@ -355,62 +311,17 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
|
|||
flush_rq->rq_disk = first_rq->rq_disk;
|
||||
flush_rq->end_io = flush_end_io;
|
||||
|
||||
return blk_flush_queue_rq(flush_rq, false);
|
||||
}
|
||||
|
||||
static void flush_data_end_io(struct request *rq, blk_status_t error)
|
||||
{
|
||||
struct request_queue *q = rq->q;
|
||||
struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
|
||||
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
|
||||
/*
|
||||
* Updating q->in_flight[] here for making this tag usable
|
||||
* early. Because in blk_queue_start_tag(),
|
||||
* q->in_flight[BLK_RW_ASYNC] is used to limit async I/O and
|
||||
* reserve tags for sync I/O.
|
||||
*
|
||||
* More importantly this way can avoid the following I/O
|
||||
* deadlock:
|
||||
*
|
||||
* - suppose there are 40 fua requests comming to flush queue
|
||||
* and queue depth is 31
|
||||
* - 30 rqs are scheduled then blk_queue_start_tag() can't alloc
|
||||
* tag for async I/O any more
|
||||
* - all the 30 rqs are completed before FLUSH_PENDING_TIMEOUT
|
||||
* and flush_data_end_io() is called
|
||||
* - the other rqs still can't go ahead if not updating
|
||||
* q->in_flight[BLK_RW_ASYNC] here, meantime these rqs
|
||||
* are held in flush data queue and make no progress of
|
||||
* handling post flush rq
|
||||
* - only after the post flush rq is handled, all these rqs
|
||||
* can be completed
|
||||
*/
|
||||
|
||||
elv_completed_request(q, rq);
|
||||
|
||||
/* for avoiding double accounting */
|
||||
rq->rq_flags &= ~RQF_STARTED;
|
||||
|
||||
/*
|
||||
* After populating an empty queue, kick it to avoid stall. Read
|
||||
* the comment in flush_end_io().
|
||||
*/
|
||||
if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
|
||||
blk_run_queue_async(q);
|
||||
blk_flush_queue_rq(flush_rq, false);
|
||||
}
|
||||
|
||||
static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
|
||||
{
|
||||
struct request_queue *q = rq->q;
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
|
||||
struct blk_mq_ctx *ctx = rq->mq_ctx;
|
||||
unsigned long flags;
|
||||
struct blk_flush_queue *fq = blk_get_flush_queue(q, ctx);
|
||||
|
||||
hctx = blk_mq_map_queue(q, ctx->cpu);
|
||||
|
||||
if (q->elevator) {
|
||||
WARN_ON(rq->tag < 0);
|
||||
blk_mq_put_driver_tag_hctx(hctx, rq);
|
||||
|
@ -443,9 +354,6 @@ void blk_insert_flush(struct request *rq)
|
|||
unsigned int policy = blk_flush_policy(fflags, rq);
|
||||
struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
|
||||
|
||||
if (!q->mq_ops)
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
|
||||
/*
|
||||
* @policy now records what operations need to be done. Adjust
|
||||
* REQ_PREFLUSH and FUA for the driver.
|
||||
|
@ -468,10 +376,7 @@ void blk_insert_flush(struct request *rq)
|
|||
* complete the request.
|
||||
*/
|
||||
if (!policy) {
|
||||
if (q->mq_ops)
|
||||
blk_mq_end_request(rq, 0);
|
||||
else
|
||||
__blk_end_request(rq, 0, 0);
|
||||
blk_mq_end_request(rq, 0);
|
||||
return;
|
||||
}
|
||||
|
||||
|
@ -484,10 +389,7 @@ void blk_insert_flush(struct request *rq)
|
|||
*/
|
||||
if ((policy & REQ_FSEQ_DATA) &&
|
||||
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
|
||||
if (q->mq_ops)
|
||||
blk_mq_request_bypass_insert(rq, false);
|
||||
else
|
||||
list_add_tail(&rq->queuelist, &q->queue_head);
|
||||
blk_mq_request_bypass_insert(rq, false);
|
||||
return;
|
||||
}
|
||||
|
||||
|
@ -499,17 +401,12 @@ void blk_insert_flush(struct request *rq)
|
|||
INIT_LIST_HEAD(&rq->flush.list);
|
||||
rq->rq_flags |= RQF_FLUSH_SEQ;
|
||||
rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
|
||||
if (q->mq_ops) {
|
||||
rq->end_io = mq_flush_data_end_io;
|
||||
|
||||
spin_lock_irq(&fq->mq_flush_lock);
|
||||
blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
|
||||
spin_unlock_irq(&fq->mq_flush_lock);
|
||||
return;
|
||||
}
|
||||
rq->end_io = flush_data_end_io;
|
||||
rq->end_io = mq_flush_data_end_io;
|
||||
|
||||
spin_lock_irq(&fq->mq_flush_lock);
|
||||
blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
|
||||
spin_unlock_irq(&fq->mq_flush_lock);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -575,8 +472,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
|
|||
if (!fq)
|
||||
goto fail;
|
||||
|
||||
if (q->mq_ops)
|
||||
spin_lock_init(&fq->mq_flush_lock);
|
||||
spin_lock_init(&fq->mq_flush_lock);
|
||||
|
||||
rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
|
||||
fq->flush_rq = kzalloc_node(rq_sz, flags, node);
|
||||
|
|
|
@ -28,7 +28,6 @@ void get_io_context(struct io_context *ioc)
|
|||
BUG_ON(atomic_long_read(&ioc->refcount) <= 0);
|
||||
atomic_long_inc(&ioc->refcount);
|
||||
}
|
||||
EXPORT_SYMBOL(get_io_context);
|
||||
|
||||
static void icq_free_icq_rcu(struct rcu_head *head)
|
||||
{
|
||||
|
@ -48,10 +47,8 @@ static void ioc_exit_icq(struct io_cq *icq)
|
|||
if (icq->flags & ICQ_EXITED)
|
||||
return;
|
||||
|
||||
if (et->uses_mq && et->ops.mq.exit_icq)
|
||||
et->ops.mq.exit_icq(icq);
|
||||
else if (!et->uses_mq && et->ops.sq.elevator_exit_icq_fn)
|
||||
et->ops.sq.elevator_exit_icq_fn(icq);
|
||||
if (et->ops.exit_icq)
|
||||
et->ops.exit_icq(icq);
|
||||
|
||||
icq->flags |= ICQ_EXITED;
|
||||
}
|
||||
|
@ -113,9 +110,9 @@ static void ioc_release_fn(struct work_struct *work)
|
|||
struct io_cq, ioc_node);
|
||||
struct request_queue *q = icq->q;
|
||||
|
||||
if (spin_trylock(q->queue_lock)) {
|
||||
if (spin_trylock(&q->queue_lock)) {
|
||||
ioc_destroy_icq(icq);
|
||||
spin_unlock(q->queue_lock);
|
||||
spin_unlock(&q->queue_lock);
|
||||
} else {
|
||||
spin_unlock_irqrestore(&ioc->lock, flags);
|
||||
cpu_relax();
|
||||
|
@ -162,7 +159,6 @@ void put_io_context(struct io_context *ioc)
|
|||
if (free_ioc)
|
||||
kmem_cache_free(iocontext_cachep, ioc);
|
||||
}
|
||||
EXPORT_SYMBOL(put_io_context);
|
||||
|
||||
/**
|
||||
* put_io_context_active - put active reference on ioc
|
||||
|
@ -173,7 +169,6 @@ EXPORT_SYMBOL(put_io_context);
|
|||
*/
|
||||
void put_io_context_active(struct io_context *ioc)
|
||||
{
|
||||
struct elevator_type *et;
|
||||
unsigned long flags;
|
||||
struct io_cq *icq;
|
||||
|
||||
|
@ -187,25 +182,12 @@ void put_io_context_active(struct io_context *ioc)
|
|||
* reverse double locking. Read comment in ioc_release_fn() for
|
||||
* explanation on the nested locking annotation.
|
||||
*/
|
||||
retry:
|
||||
spin_lock_irqsave_nested(&ioc->lock, flags, 1);
|
||||
hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) {
|
||||
if (icq->flags & ICQ_EXITED)
|
||||
continue;
|
||||
|
||||
et = icq->q->elevator->type;
|
||||
if (et->uses_mq) {
|
||||
ioc_exit_icq(icq);
|
||||
} else {
|
||||
if (spin_trylock(icq->q->queue_lock)) {
|
||||
ioc_exit_icq(icq);
|
||||
spin_unlock(icq->q->queue_lock);
|
||||
} else {
|
||||
spin_unlock_irqrestore(&ioc->lock, flags);
|
||||
cpu_relax();
|
||||
goto retry;
|
||||
}
|
||||
}
|
||||
ioc_exit_icq(icq);
|
||||
}
|
||||
spin_unlock_irqrestore(&ioc->lock, flags);
|
||||
|
||||
|
@ -232,7 +214,7 @@ static void __ioc_clear_queue(struct list_head *icq_list)
|
|||
|
||||
while (!list_empty(icq_list)) {
|
||||
struct io_cq *icq = list_entry(icq_list->next,
|
||||
struct io_cq, q_node);
|
||||
struct io_cq, q_node);
|
||||
struct io_context *ioc = icq->ioc;
|
||||
|
||||
spin_lock_irqsave(&ioc->lock, flags);
|
||||
|
@ -251,16 +233,11 @@ void ioc_clear_queue(struct request_queue *q)
|
|||
{
|
||||
LIST_HEAD(icq_list);
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
list_splice_init(&q->icq_list, &icq_list);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
|
||||
if (q->mq_ops) {
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
__ioc_clear_queue(&icq_list);
|
||||
} else {
|
||||
__ioc_clear_queue(&icq_list);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
}
|
||||
__ioc_clear_queue(&icq_list);
|
||||
}
|
||||
|
||||
int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
|
||||
|
@ -336,7 +313,6 @@ struct io_context *get_task_io_context(struct task_struct *task,
|
|||
|
||||
return NULL;
|
||||
}
|
||||
EXPORT_SYMBOL(get_task_io_context);
|
||||
|
||||
/**
|
||||
* ioc_lookup_icq - lookup io_cq from ioc
|
||||
|
@ -350,7 +326,7 @@ struct io_cq *ioc_lookup_icq(struct io_context *ioc, struct request_queue *q)
|
|||
{
|
||||
struct io_cq *icq;
|
||||
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
lockdep_assert_held(&q->queue_lock);
|
||||
|
||||
/*
|
||||
* icq's are indexed from @ioc using radix tree and hint pointer,
|
||||
|
@ -409,16 +385,14 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q,
|
|||
INIT_HLIST_NODE(&icq->ioc_node);
|
||||
|
||||
/* lock both q and ioc and try to link @icq */
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
spin_lock(&ioc->lock);
|
||||
|
||||
if (likely(!radix_tree_insert(&ioc->icq_tree, q->id, icq))) {
|
||||
hlist_add_head(&icq->ioc_node, &ioc->icq_list);
|
||||
list_add(&icq->q_node, &q->icq_list);
|
||||
if (et->uses_mq && et->ops.mq.init_icq)
|
||||
et->ops.mq.init_icq(icq);
|
||||
else if (!et->uses_mq && et->ops.sq.elevator_init_icq_fn)
|
||||
et->ops.sq.elevator_init_icq_fn(icq);
|
||||
if (et->ops.init_icq)
|
||||
et->ops.init_icq(icq);
|
||||
} else {
|
||||
kmem_cache_free(et->icq_cache, icq);
|
||||
icq = ioc_lookup_icq(ioc, q);
|
||||
|
@ -427,7 +401,7 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q,
|
|||
}
|
||||
|
||||
spin_unlock(&ioc->lock);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
radix_tree_preload_end();
|
||||
return icq;
|
||||
}
|
||||
|
|
|
@ -262,29 +262,25 @@ static inline void iolat_update_total_lat_avg(struct iolatency_grp *iolat,
|
|||
stat->rqs.mean);
|
||||
}
|
||||
|
||||
static inline bool iolatency_may_queue(struct iolatency_grp *iolat,
wait_queue_entry_t *wait,
bool first_block)
static void iolat_cleanup_cb(struct rq_wait *rqw, void *private_data)
{
struct rq_wait *rqw = &iolat->rq_wait;
atomic_dec(&rqw->inflight);
wake_up(&rqw->wait);
}

if (first_block && waitqueue_active(&rqw->wait) &&
rqw->wait.head.next != &wait->entry)
return false;
static bool iolat_acquire_inflight(struct rq_wait *rqw, void *private_data)
{
struct iolatency_grp *iolat = private_data;
return rq_wait_inc_below(rqw, iolat->rq_depth.max_depth);
}
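
The hunk above replaces blk-iolatency's open-coded waitqueue loop with two small callbacks, iolat_acquire_inflight() and iolat_cleanup_cb(), which the shared rq_qos_wait() helper drives (see the call further down in __blkcg_iolatency_throttle()). Here is a minimal userspace sketch of that acquire/cleanup callback pattern, using pthreads and invented toy_* names rather than the kernel's rq_wait machinery.

#include <pthread.h>
#include <stdbool.h>

/* Illustrative stand-in for rq_wait: an inflight count bounded by a depth. */
struct toy_rqw {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int inflight;
	int max_depth;
};

/* acquire callback: may the caller take another slot right now? */
static bool toy_acquire(struct toy_rqw *rqw, void *private_data)
{
	(void)private_data;
	if (rqw->inflight < rqw->max_depth) {
		rqw->inflight++;
		return true;
	}
	return false;
}

/* cleanup callback: release a slot and wake a waiter. */
static void toy_cleanup(struct toy_rqw *rqw, void *private_data)
{
	(void)private_data;
	pthread_mutex_lock(&rqw->lock);
	rqw->inflight--;
	pthread_cond_signal(&rqw->cond);
	pthread_mutex_unlock(&rqw->lock);
}

/* Shared helper in the spirit of rq_qos_wait(): block until acquire succeeds. */
static void toy_wait(struct toy_rqw *rqw, void *private_data,
		     bool (*acquire)(struct toy_rqw *, void *))
{
	pthread_mutex_lock(&rqw->lock);
	while (!acquire(rqw, private_data))
		pthread_cond_wait(&rqw->cond, &rqw->lock);
	pthread_mutex_unlock(&rqw->lock);
}

int main(void)
{
	struct toy_rqw rqw = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.cond = PTHREAD_COND_INITIALIZER,
		.max_depth = 2,
	};

	toy_wait(&rqw, NULL, toy_acquire);	/* takes a slot */
	toy_cleanup(&rqw, NULL);		/* releases it */
	return 0;
}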
|
||||
|
||||
static void __blkcg_iolatency_throttle(struct rq_qos *rqos,
|
||||
struct iolatency_grp *iolat,
|
||||
spinlock_t *lock, bool issue_as_root,
|
||||
bool issue_as_root,
|
||||
bool use_memdelay)
|
||||
__releases(lock)
|
||||
__acquires(lock)
|
||||
{
|
||||
struct rq_wait *rqw = &iolat->rq_wait;
|
||||
unsigned use_delay = atomic_read(&lat_to_blkg(iolat)->use_delay);
|
||||
DEFINE_WAIT(wait);
|
||||
bool first_block = true;
|
||||
|
||||
if (use_delay)
|
||||
blkcg_schedule_throttle(rqos->q, use_memdelay);
|
||||
|
@ -301,27 +297,7 @@ static void __blkcg_iolatency_throttle(struct rq_qos *rqos,
|
|||
return;
|
||||
}
|
||||
|
||||
if (iolatency_may_queue(iolat, &wait, first_block))
|
||||
return;
|
||||
|
||||
do {
|
||||
prepare_to_wait_exclusive(&rqw->wait, &wait,
|
||||
TASK_UNINTERRUPTIBLE);
|
||||
|
||||
if (iolatency_may_queue(iolat, &wait, first_block))
|
||||
break;
|
||||
first_block = false;
|
||||
|
||||
if (lock) {
|
||||
spin_unlock_irq(lock);
|
||||
io_schedule();
|
||||
spin_lock_irq(lock);
|
||||
} else {
|
||||
io_schedule();
|
||||
}
|
||||
} while (1);
|
||||
|
||||
finish_wait(&rqw->wait, &wait);
|
||||
rq_qos_wait(rqw, iolat, iolat_acquire_inflight, iolat_cleanup_cb);
|
||||
}
|
||||
|
||||
#define SCALE_DOWN_FACTOR 2
|
||||
|
@ -478,38 +454,15 @@ static void check_scale_change(struct iolatency_grp *iolat)
|
|||
scale_change(iolat, direction > 0);
|
||||
}
|
||||
|
||||
static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
|
||||
spinlock_t *lock)
|
||||
static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio)
|
||||
{
|
||||
struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
|
||||
struct blkcg *blkcg;
|
||||
struct blkcg_gq *blkg;
|
||||
struct request_queue *q = rqos->q;
|
||||
struct blkcg_gq *blkg = bio->bi_blkg;
|
||||
bool issue_as_root = bio_issue_as_root_blkg(bio);
|
||||
|
||||
if (!blk_iolatency_enabled(blkiolat))
|
||||
return;
|
||||
|
||||
rcu_read_lock();
|
||||
blkcg = bio_blkcg(bio);
|
||||
bio_associate_blkcg(bio, &blkcg->css);
|
||||
blkg = blkg_lookup(blkcg, q);
|
||||
if (unlikely(!blkg)) {
|
||||
if (!lock)
|
||||
spin_lock_irq(q->queue_lock);
|
||||
blkg = blkg_lookup_create(blkcg, q);
|
||||
if (IS_ERR(blkg))
|
||||
blkg = NULL;
|
||||
if (!lock)
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
}
|
||||
if (!blkg)
|
||||
goto out;
|
||||
|
||||
bio_issue_init(&bio->bi_issue, bio_sectors(bio));
|
||||
bio_associate_blkg(bio, blkg);
|
||||
out:
|
||||
rcu_read_unlock();
|
||||
while (blkg && blkg->parent) {
|
||||
struct iolatency_grp *iolat = blkg_to_lat(blkg);
|
||||
if (!iolat) {
|
||||
|
@ -518,7 +471,7 @@ out:
|
|||
}
|
||||
|
||||
check_scale_change(iolat);
|
||||
__blkcg_iolatency_throttle(rqos, iolat, lock, issue_as_root,
|
||||
__blkcg_iolatency_throttle(rqos, iolat, issue_as_root,
|
||||
(bio->bi_opf & REQ_SWAP) == REQ_SWAP);
|
||||
blkg = blkg->parent;
|
||||
}
|
||||
|
@ -640,7 +593,7 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
|
|||
bool enabled = false;
|
||||
|
||||
blkg = bio->bi_blkg;
|
||||
if (!blkg)
|
||||
if (!blkg || !bio_flagged(bio, BIO_TRACKED))
|
||||
return;
|
||||
|
||||
iolat = blkg_to_lat(bio->bi_blkg);
|
||||
|
@ -730,7 +683,7 @@ static void blkiolatency_timer_fn(struct timer_list *t)
|
|||
* We could be exiting, don't access the pd unless we have a
|
||||
* ref on the blkg.
|
||||
*/
|
||||
if (!blkg_try_get(blkg))
|
||||
if (!blkg_tryget(blkg))
|
||||
continue;
|
||||
|
||||
iolat = blkg_to_lat(blkg);
|
||||
|
|
|
@ -389,7 +389,6 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio)
|
|||
|
||||
bio_set_flag(bio, BIO_SEG_VALID);
|
||||
}
|
||||
EXPORT_SYMBOL(blk_recount_segments);
|
||||
|
||||
static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
|
||||
struct bio *nxt)
|
||||
|
@ -596,17 +595,6 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
|
|||
return ll_new_hw_segment(q, req, bio);
|
||||
}
|
||||
|
||||
/*
|
||||
* blk-mq uses req->special to carry normal driver per-request payload, it
|
||||
* does not indicate a prepared command that we cannot merge with.
|
||||
*/
|
||||
static bool req_no_special_merge(struct request *req)
|
||||
{
|
||||
struct request_queue *q = req->q;
|
||||
|
||||
return !q->mq_ops && req->special;
|
||||
}
|
||||
|
||||
static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
|
||||
struct request *next)
|
||||
{
|
||||
|
@ -632,13 +620,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
|
|||
unsigned int seg_size =
|
||||
req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
|
||||
|
||||
/*
|
||||
* First check if the either of the requests are re-queued
|
||||
* requests. Can't merge them if they are.
|
||||
*/
|
||||
if (req_no_special_merge(req) || req_no_special_merge(next))
|
||||
return 0;
|
||||
|
||||
if (req_gap_back_merge(req, next->bio))
|
||||
return 0;
|
||||
|
||||
|
@ -703,12 +684,10 @@ static void blk_account_io_merge(struct request *req)
|
|||
{
|
||||
if (blk_do_io_stat(req)) {
|
||||
struct hd_struct *part;
|
||||
int cpu;
|
||||
|
||||
cpu = part_stat_lock();
|
||||
part_stat_lock();
|
||||
part = req->part;
|
||||
|
||||
part_round_stats(req->q, cpu, part);
|
||||
part_dec_in_flight(req->q, part, rq_data_dir(req));
|
||||
|
||||
hd_struct_put(part);
|
||||
|
@ -731,7 +710,8 @@ static inline bool blk_discard_mergable(struct request *req)
|
|||
return false;
|
||||
}
|
||||
|
||||
enum elv_merge blk_try_req_merge(struct request *req, struct request *next)
|
||||
static enum elv_merge blk_try_req_merge(struct request *req,
|
||||
struct request *next)
|
||||
{
|
||||
if (blk_discard_mergable(req))
|
||||
return ELEVATOR_DISCARD_MERGE;
|
||||
|
@ -748,9 +728,6 @@ enum elv_merge blk_try_req_merge(struct request *req, struct request *next)
|
|||
static struct request *attempt_merge(struct request_queue *q,
|
||||
struct request *req, struct request *next)
|
||||
{
|
||||
if (!q->mq_ops)
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
|
||||
if (!rq_mergeable(req) || !rq_mergeable(next))
|
||||
return NULL;
|
||||
|
||||
|
@ -758,8 +735,7 @@ static struct request *attempt_merge(struct request_queue *q,
|
|||
return NULL;
|
||||
|
||||
if (rq_data_dir(req) != rq_data_dir(next)
|
||||
|| req->rq_disk != next->rq_disk
|
||||
|| req_no_special_merge(next))
|
||||
|| req->rq_disk != next->rq_disk)
|
||||
return NULL;
|
||||
|
||||
if (req_op(req) == REQ_OP_WRITE_SAME &&
|
||||
|
@ -773,6 +749,9 @@ static struct request *attempt_merge(struct request_queue *q,
|
|||
if (req->write_hint != next->write_hint)
|
||||
return NULL;
|
||||
|
||||
if (req->ioprio != next->ioprio)
|
||||
return NULL;
|
||||
|
||||
/*
|
||||
* If we are allowed to merge, then append bio list
|
||||
* from next to rq and release next. merge_requests_fn
|
||||
|
@ -828,10 +807,6 @@ static struct request *attempt_merge(struct request_queue *q,
|
|||
*/
|
||||
blk_account_io_merge(next);
|
||||
|
||||
req->ioprio = ioprio_best(req->ioprio, next->ioprio);
|
||||
if (blk_rq_cpu_valid(next))
|
||||
req->cpu = next->cpu;
|
||||
|
||||
/*
|
||||
* ownership of bio passed from next to req, return 'next' for
|
||||
* the caller to free
|
||||
|
@ -863,16 +838,11 @@ struct request *attempt_front_merge(struct request_queue *q, struct request *rq)
|
|||
int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
|
||||
struct request *next)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
struct request *free;
|
||||
|
||||
if (!e->uses_mq && e->type->ops.sq.elevator_allow_rq_merge_fn)
|
||||
if (!e->type->ops.sq.elevator_allow_rq_merge_fn(q, rq, next))
|
||||
return 0;
|
||||
|
||||
free = attempt_merge(q, rq, next);
|
||||
if (free) {
|
||||
__blk_put_request(q, free);
|
||||
blk_put_request(free);
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
@ -891,8 +861,8 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
|
|||
if (bio_data_dir(bio) != rq_data_dir(rq))
|
||||
return false;
|
||||
|
||||
/* must be same device and not a special request */
|
||||
if (rq->rq_disk != bio->bi_disk || req_no_special_merge(rq))
|
||||
/* must be same device */
|
||||
if (rq->rq_disk != bio->bi_disk)
|
||||
return false;
|
||||
|
||||
/* only merge integrity protected bio into ditto rq */
|
||||
|
@ -911,6 +881,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
|
|||
if (rq->write_hint != bio->bi_write_hint)
|
||||
return false;
|
||||
|
||||
if (rq->ioprio != bio_prio(bio))
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
|
|
@@ -14,9 +14,10 @@
#include "blk.h"
#include "blk-mq.h"

static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
unsigned int nr_queues, const int cpu)
{
return cpu % nr_queues;
return qmap->queue_offset + (cpu % nr_queues);
}

static int get_first_sibling(unsigned int cpu)
@@ -30,10 +31,10 @@ static int get_first_sibling(unsigned int cpu)
return cpu;
}

int blk_mq_map_queues(struct blk_mq_tag_set *set)
int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
unsigned int *map = set->mq_map;
unsigned int nr_queues = set->nr_hw_queues;
unsigned int *map = qmap->mq_map;
unsigned int nr_queues = qmap->nr_queues;
unsigned int cpu, first_sibling;

for_each_possible_cpu(cpu) {
@@ -44,11 +45,11 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
* performace optimizations.
*/
if (cpu < nr_queues) {
map[cpu] = cpu_to_queue_index(nr_queues, cpu);
map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
} else {
first_sibling = get_first_sibling(cpu);
if (first_sibling == cpu)
map[cpu] = cpu_to_queue_index(nr_queues, cpu);
map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
else
map[cpu] = map[first_sibling];
}
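
With the change above, blk_mq_map_queues() fills one blk_mq_queue_map per queue type: each entry is offset by qmap->queue_offset, the first nr_queues CPUs map directly, and the remaining CPUs inherit the mapping of their first sibling so hyperthreads share a hardware queue. Below is a standalone sketch of the same spreading logic, using fixed-size toy arrays and an invented first_sibling() table instead of the kernel structures.

#include <stdio.h>

#define NR_CPUS		8
#define NR_QUEUES	3
#define QUEUE_OFFSET	0	/* per-type offset, as in blk_mq_queue_map */

/* Toy sibling table: CPU i and CPU i + NR_CPUS/2 are hyperthread siblings. */
static int first_sibling(int cpu)
{
	return cpu % (NR_CPUS / 2);
}

int main(void)
{
	int map[NR_CPUS];

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu < NR_QUEUES) {
			/* direct mapping for the first nr_queues CPUs */
			map[cpu] = QUEUE_OFFSET + cpu % NR_QUEUES;
		} else {
			int sib = first_sibling(cpu);

			/* share the sibling's queue when possible */
			if (sib == cpu)
				map[cpu] = QUEUE_OFFSET + cpu % NR_QUEUES;
			else
				map[cpu] = map[sib];
		}
	}

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu %d -> hw queue %d\n", cpu, map[cpu]);
	return 0;
}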
|
||||
|
@ -62,12 +63,12 @@ EXPORT_SYMBOL_GPL(blk_mq_map_queues);
|
|||
* We have no quick way of doing reverse lookups. This is only used at
|
||||
* queue init time, so runtime isn't important.
|
||||
*/
|
||||
int blk_mq_hw_queue_to_node(unsigned int *mq_map, unsigned int index)
|
||||
int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int index)
|
||||
{
|
||||
int i;
|
||||
|
||||
for_each_possible_cpu(i) {
|
||||
if (index == mq_map[i])
|
||||
if (index == qmap->mq_map[i])
|
||||
return local_memory_node(cpu_to_node(i));
|
||||
}
|
||||
|
||||
|
|
|
@ -23,6 +23,7 @@
|
|||
#include "blk-mq.h"
|
||||
#include "blk-mq-debugfs.h"
|
||||
#include "blk-mq-tag.h"
|
||||
#include "blk-rq-qos.h"
|
||||
|
||||
static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
|
||||
{
|
||||
|
@ -112,10 +113,8 @@ static int queue_pm_only_show(void *data, struct seq_file *m)
|
|||
|
||||
#define QUEUE_FLAG_NAME(name) [QUEUE_FLAG_##name] = #name
|
||||
static const char *const blk_queue_flag_name[] = {
|
||||
QUEUE_FLAG_NAME(QUEUED),
|
||||
QUEUE_FLAG_NAME(STOPPED),
|
||||
QUEUE_FLAG_NAME(DYING),
|
||||
QUEUE_FLAG_NAME(BYPASS),
|
||||
QUEUE_FLAG_NAME(BIDI),
|
||||
QUEUE_FLAG_NAME(NOMERGES),
|
||||
QUEUE_FLAG_NAME(SAME_COMP),
|
||||
|
@ -318,7 +317,6 @@ static const char *const cmd_flag_name[] = {
|
|||
static const char *const rqf_name[] = {
|
||||
RQF_NAME(SORTED),
|
||||
RQF_NAME(STARTED),
|
||||
RQF_NAME(QUEUED),
|
||||
RQF_NAME(SOFTBARRIER),
|
||||
RQF_NAME(FLUSH_SEQ),
|
||||
RQF_NAME(MIXED_MERGE),
|
||||
|
@ -424,15 +422,18 @@ struct show_busy_params {
|
|||
|
||||
/*
|
||||
* Note: the state of a request may change while this function is in progress,
|
||||
* e.g. due to a concurrent blk_mq_finish_request() call.
|
||||
* e.g. due to a concurrent blk_mq_finish_request() call. Returns true to
|
||||
* keep iterating requests.
|
||||
*/
|
||||
static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
|
||||
static bool hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
|
||||
{
|
||||
const struct show_busy_params *params = data;
|
||||
|
||||
if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx)
|
||||
if (rq->mq_hctx == params->hctx)
|
||||
__blk_mq_debugfs_rq_show(params->m,
|
||||
list_entry_rq(&rq->queuelist));
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static int hctx_busy_show(void *data, struct seq_file *m)
|
||||
|
@ -446,6 +447,21 @@ static int hctx_busy_show(void *data, struct seq_file *m)
|
|||
return 0;
|
||||
}
|
||||
|
||||
static const char *const hctx_types[] = {
|
||||
[HCTX_TYPE_DEFAULT] = "default",
|
||||
[HCTX_TYPE_READ] = "read",
|
||||
[HCTX_TYPE_POLL] = "poll",
|
||||
};
|
||||
|
||||
static int hctx_type_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
|
||||
BUILD_BUG_ON(ARRAY_SIZE(hctx_types) != HCTX_MAX_TYPES);
|
||||
seq_printf(m, "%s\n", hctx_types[hctx->type]);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int hctx_ctx_map_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
|
@ -636,36 +652,43 @@ static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
|
|||
return 0;
|
||||
}
|
||||
|
||||
static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
|
||||
__acquires(&ctx->lock)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = m->private;
|
||||
|
||||
spin_lock(&ctx->lock);
|
||||
return seq_list_start(&ctx->rq_list, *pos);
|
||||
#define CTX_RQ_SEQ_OPS(name, type) \
|
||||
static void *ctx_##name##_rq_list_start(struct seq_file *m, loff_t *pos) \
|
||||
__acquires(&ctx->lock) \
|
||||
{ \
|
||||
struct blk_mq_ctx *ctx = m->private; \
|
||||
\
|
||||
spin_lock(&ctx->lock); \
|
||||
return seq_list_start(&ctx->rq_lists[type], *pos); \
|
||||
} \
|
||||
\
|
||||
static void *ctx_##name##_rq_list_next(struct seq_file *m, void *v, \
|
||||
loff_t *pos) \
|
||||
{ \
|
||||
struct blk_mq_ctx *ctx = m->private; \
|
||||
\
|
||||
return seq_list_next(v, &ctx->rq_lists[type], pos); \
|
||||
} \
|
||||
\
|
||||
static void ctx_##name##_rq_list_stop(struct seq_file *m, void *v) \
|
||||
__releases(&ctx->lock) \
|
||||
{ \
|
||||
struct blk_mq_ctx *ctx = m->private; \
|
||||
\
|
||||
spin_unlock(&ctx->lock); \
|
||||
} \
|
||||
\
|
||||
static const struct seq_operations ctx_##name##_rq_list_seq_ops = { \
|
||||
.start = ctx_##name##_rq_list_start, \
|
||||
.next = ctx_##name##_rq_list_next, \
|
||||
.stop = ctx_##name##_rq_list_stop, \
|
||||
.show = blk_mq_debugfs_rq_show, \
|
||||
}
|
||||
|
||||
static void *ctx_rq_list_next(struct seq_file *m, void *v, loff_t *pos)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = m->private;
|
||||
CTX_RQ_SEQ_OPS(default, HCTX_TYPE_DEFAULT);
|
||||
CTX_RQ_SEQ_OPS(read, HCTX_TYPE_READ);
|
||||
CTX_RQ_SEQ_OPS(poll, HCTX_TYPE_POLL);
|
||||
|
||||
return seq_list_next(v, &ctx->rq_list, pos);
|
||||
}
|
||||
|
||||
static void ctx_rq_list_stop(struct seq_file *m, void *v)
|
||||
__releases(&ctx->lock)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = m->private;
|
||||
|
||||
spin_unlock(&ctx->lock);
|
||||
}
|
||||
|
||||
static const struct seq_operations ctx_rq_list_seq_ops = {
|
||||
.start = ctx_rq_list_start,
|
||||
.next = ctx_rq_list_next,
|
||||
.stop = ctx_rq_list_stop,
|
||||
.show = blk_mq_debugfs_rq_show,
|
||||
};
|
||||
static int ctx_dispatched_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = data;
|
||||
|
@ -798,11 +821,14 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
|
|||
{"run", 0600, hctx_run_show, hctx_run_write},
|
||||
{"active", 0400, hctx_active_show},
|
||||
{"dispatch_busy", 0400, hctx_dispatch_busy_show},
|
||||
{"type", 0400, hctx_type_show},
|
||||
{},
|
||||
};
|
||||
|
||||
static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
|
||||
{"rq_list", 0400, .seq_ops = &ctx_rq_list_seq_ops},
|
||||
{"default_rq_list", 0400, .seq_ops = &ctx_default_rq_list_seq_ops},
|
||||
{"read_rq_list", 0400, .seq_ops = &ctx_read_rq_list_seq_ops},
|
||||
{"poll_rq_list", 0400, .seq_ops = &ctx_poll_rq_list_seq_ops},
|
||||
{"dispatched", 0600, ctx_dispatched_show, ctx_dispatched_write},
|
||||
{"merged", 0600, ctx_merged_show, ctx_merged_write},
|
||||
{"completed", 0600, ctx_completed_show, ctx_completed_write},
|
||||
|
@ -856,6 +882,15 @@ int blk_mq_debugfs_register(struct request_queue *q)
|
|||
goto err;
|
||||
}
|
||||
|
||||
if (q->rq_qos) {
|
||||
struct rq_qos *rqos = q->rq_qos;
|
||||
|
||||
while (rqos) {
|
||||
blk_mq_debugfs_register_rqos(rqos);
|
||||
rqos = rqos->next;
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
|
||||
err:
|
||||
|
@ -978,6 +1013,50 @@ void blk_mq_debugfs_unregister_sched(struct request_queue *q)
|
|||
q->sched_debugfs_dir = NULL;
|
||||
}
|
||||
|
||||
void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
|
||||
{
|
||||
debugfs_remove_recursive(rqos->debugfs_dir);
|
||||
rqos->debugfs_dir = NULL;
|
||||
}
|
||||
|
||||
int blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
|
||||
{
|
||||
struct request_queue *q = rqos->q;
|
||||
const char *dir_name = rq_qos_id_to_name(rqos->id);
|
||||
|
||||
if (!q->debugfs_dir)
|
||||
return -ENOENT;
|
||||
|
||||
if (rqos->debugfs_dir || !rqos->ops->debugfs_attrs)
|
||||
return 0;
|
||||
|
||||
if (!q->rqos_debugfs_dir) {
|
||||
q->rqos_debugfs_dir = debugfs_create_dir("rqos",
|
||||
q->debugfs_dir);
|
||||
if (!q->rqos_debugfs_dir)
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
rqos->debugfs_dir = debugfs_create_dir(dir_name,
|
||||
rqos->q->rqos_debugfs_dir);
|
||||
if (!rqos->debugfs_dir)
|
||||
return -ENOMEM;
|
||||
|
||||
if (!debugfs_create_files(rqos->debugfs_dir, rqos,
|
||||
rqos->ops->debugfs_attrs))
|
||||
goto err;
|
||||
return 0;
|
||||
err:
|
||||
blk_mq_debugfs_unregister_rqos(rqos);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
void blk_mq_debugfs_unregister_queue_rqos(struct request_queue *q)
|
||||
{
|
||||
debugfs_remove_recursive(q->rqos_debugfs_dir);
|
||||
q->rqos_debugfs_dir = NULL;
|
||||
}
|
||||
|
||||
int blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
|
||||
struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
|
|
|
@ -31,6 +31,10 @@ void blk_mq_debugfs_unregister_sched(struct request_queue *q);
|
|||
int blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
|
||||
struct blk_mq_hw_ctx *hctx);
|
||||
void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
|
||||
|
||||
int blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
|
||||
void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos);
|
||||
void blk_mq_debugfs_unregister_queue_rqos(struct request_queue *q);
|
||||
#else
|
||||
static inline int blk_mq_debugfs_register(struct request_queue *q)
|
||||
{
|
||||
|
@ -78,6 +82,19 @@ static inline int blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
|
|||
static inline void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
}
|
||||
|
||||
static inline int blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
|
||||
{
|
||||
}
|
||||
|
||||
static inline void blk_mq_debugfs_unregister_queue_rqos(struct request_queue *q)
|
||||
{
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_BLK_DEBUG_FS_ZONED
|
||||
|
|
|
@ -31,26 +31,26 @@
|
|||
* that maps a queue to the CPUs that have irq affinity for the corresponding
|
||||
* vector.
|
||||
*/
|
||||
int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
|
||||
int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev,
|
||||
int offset)
|
||||
{
|
||||
const struct cpumask *mask;
|
||||
unsigned int queue, cpu;
|
||||
|
||||
for (queue = 0; queue < set->nr_hw_queues; queue++) {
|
||||
for (queue = 0; queue < qmap->nr_queues; queue++) {
|
||||
mask = pci_irq_get_affinity(pdev, queue + offset);
|
||||
if (!mask)
|
||||
goto fallback;
|
||||
|
||||
for_each_cpu(cpu, mask)
|
||||
set->mq_map[cpu] = queue;
|
||||
qmap->mq_map[cpu] = qmap->queue_offset + queue;
|
||||
}
|
||||
|
||||
return 0;
|
||||
|
||||
fallback:
|
||||
WARN_ON_ONCE(set->nr_hw_queues > 1);
|
||||
blk_mq_clear_mq_map(set);
|
||||
WARN_ON_ONCE(qmap->nr_queues > 1);
|
||||
blk_mq_clear_mq_map(qmap);
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_mq_pci_map_queues);
|
||||
|
|
|
@ -29,24 +29,24 @@
|
|||
* @set->nr_hw_queues, or @dev does not provide an affinity mask for a
|
||||
* vector, we fallback to the naive mapping.
|
||||
*/
|
||||
int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
|
||||
int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map,
|
||||
struct ib_device *dev, int first_vec)
|
||||
{
|
||||
const struct cpumask *mask;
|
||||
unsigned int queue, cpu;
|
||||
|
||||
for (queue = 0; queue < set->nr_hw_queues; queue++) {
|
||||
for (queue = 0; queue < map->nr_queues; queue++) {
|
||||
mask = ib_get_vector_affinity(dev, first_vec + queue);
|
||||
if (!mask)
|
||||
goto fallback;
|
||||
|
||||
for_each_cpu(cpu, mask)
|
||||
set->mq_map[cpu] = queue;
|
||||
map->mq_map[cpu] = map->queue_offset + queue;
|
||||
}
|
||||
|
||||
return 0;
|
||||
|
||||
fallback:
|
||||
return blk_mq_map_queues(set);
|
||||
return blk_mq_map_queues(map);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);
|
||||
|
|
|
@ -31,15 +31,22 @@ void blk_mq_sched_free_hctx_data(struct request_queue *q,
|
|||
}
|
||||
EXPORT_SYMBOL_GPL(blk_mq_sched_free_hctx_data);
|
||||
|
||||
void blk_mq_sched_assign_ioc(struct request *rq, struct bio *bio)
|
||||
void blk_mq_sched_assign_ioc(struct request *rq)
|
||||
{
|
||||
struct request_queue *q = rq->q;
|
||||
struct io_context *ioc = rq_ioc(bio);
|
||||
struct io_context *ioc;
|
||||
struct io_cq *icq;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
/*
|
||||
* May not have an IO context if it's a passthrough request
|
||||
*/
|
||||
ioc = current->io_context;
|
||||
if (!ioc)
|
||||
return;
|
||||
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
icq = ioc_lookup_icq(ioc, q);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
|
||||
if (!icq) {
|
||||
icq = ioc_create_icq(ioc, q, GFP_ATOMIC);
|
||||
|
@ -54,13 +61,14 @@ void blk_mq_sched_assign_ioc(struct request *rq, struct bio *bio)
|
|||
* Mark a hardware queue as needing a restart. For shared queues, maintain
|
||||
* a count of how many hardware queues are marked for restart.
|
||||
*/
|
||||
static void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
|
||||
void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
|
||||
return;
|
||||
|
||||
set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
|
||||
|
||||
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
|
@ -85,14 +93,13 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
|
|||
do {
|
||||
struct request *rq;
|
||||
|
||||
if (e->type->ops.mq.has_work &&
|
||||
!e->type->ops.mq.has_work(hctx))
|
||||
if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
|
||||
break;
|
||||
|
||||
if (!blk_mq_get_dispatch_budget(hctx))
|
||||
break;
|
||||
|
||||
rq = e->type->ops.mq.dispatch_request(hctx);
|
||||
rq = e->type->ops.dispatch_request(hctx);
|
||||
if (!rq) {
|
||||
blk_mq_put_dispatch_budget(hctx);
|
||||
break;
|
||||
|
@ -110,7 +117,7 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
|
|||
static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
|
||||
struct blk_mq_ctx *ctx)
|
||||
{
|
||||
unsigned idx = ctx->index_hw;
|
||||
unsigned short idx = ctx->index_hw[hctx->type];
|
||||
|
||||
if (++idx == hctx->nr_ctx)
|
||||
idx = 0;
|
||||
|
@ -163,7 +170,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
|
|||
{
|
||||
struct request_queue *q = hctx->queue;
|
||||
struct elevator_queue *e = q->elevator;
|
||||
const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
|
||||
const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
|
||||
LIST_HEAD(rq_list);
|
||||
|
||||
/* RCU or SRCU read lock is needed before checking quiesced flag */
|
||||
|
@ -295,11 +302,14 @@ EXPORT_SYMBOL_GPL(blk_mq_bio_list_merge);
|
|||
* too much time checking for merges.
|
||||
*/
|
||||
static bool blk_mq_attempt_merge(struct request_queue *q,
|
||||
struct blk_mq_hw_ctx *hctx,
|
||||
struct blk_mq_ctx *ctx, struct bio *bio)
|
||||
{
|
||||
enum hctx_type type = hctx->type;
|
||||
|
||||
lockdep_assert_held(&ctx->lock);
|
||||
|
||||
if (blk_mq_bio_list_merge(q, &ctx->rq_list, bio)) {
|
||||
if (blk_mq_bio_list_merge(q, &ctx->rq_lists[type], bio)) {
|
||||
ctx->rq_merged++;
|
||||
return true;
|
||||
}
|
||||
|
@ -311,19 +321,21 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
|
|||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
|
||||
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
|
||||
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx->cpu);
|
||||
bool ret = false;
|
||||
enum hctx_type type;
|
||||
|
||||
if (e && e->type->ops.mq.bio_merge) {
|
||||
if (e && e->type->ops.bio_merge) {
|
||||
blk_mq_put_ctx(ctx);
|
||||
return e->type->ops.mq.bio_merge(hctx, bio);
|
||||
return e->type->ops.bio_merge(hctx, bio);
|
||||
}
|
||||
|
||||
type = hctx->type;
|
||||
if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
|
||||
!list_empty_careful(&ctx->rq_list)) {
|
||||
!list_empty_careful(&ctx->rq_lists[type])) {
|
||||
/* default per sw-queue merge */
|
||||
spin_lock(&ctx->lock);
|
||||
ret = blk_mq_attempt_merge(q, ctx, bio);
|
||||
ret = blk_mq_attempt_merge(q, hctx, ctx, bio);
|
||||
spin_unlock(&ctx->lock);
|
||||
}
|
||||
|
||||
|
@ -367,7 +379,7 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
|
|||
struct request_queue *q = rq->q;
|
||||
struct elevator_queue *e = q->elevator;
|
||||
struct blk_mq_ctx *ctx = rq->mq_ctx;
|
||||
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
|
||||
struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
|
||||
|
||||
/* flush rq in flush machinery need to be dispatched directly */
|
||||
if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
|
||||
|
@ -380,11 +392,11 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
|
|||
if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
|
||||
goto run;
|
||||
|
||||
if (e && e->type->ops.mq.insert_requests) {
|
||||
if (e && e->type->ops.insert_requests) {
|
||||
LIST_HEAD(list);
|
||||
|
||||
list_add(&rq->queuelist, &list);
|
||||
e->type->ops.mq.insert_requests(hctx, &list, at_head);
|
||||
e->type->ops.insert_requests(hctx, &list, at_head);
|
||||
} else {
|
||||
spin_lock(&ctx->lock);
|
||||
__blk_mq_insert_request(hctx, rq, at_head);
|
||||
|
@ -396,27 +408,25 @@ run:
|
|||
blk_mq_run_hw_queue(hctx, async);
|
||||
}
|
||||
|
||||
void blk_mq_sched_insert_requests(struct request_queue *q,
|
||||
void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
|
||||
struct blk_mq_ctx *ctx,
|
||||
struct list_head *list, bool run_queue_async)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
|
||||
struct elevator_queue *e = hctx->queue->elevator;
|
||||
struct elevator_queue *e;
|
||||
|
||||
if (e && e->type->ops.mq.insert_requests)
|
||||
e->type->ops.mq.insert_requests(hctx, list, false);
|
||||
e = hctx->queue->elevator;
|
||||
if (e && e->type->ops.insert_requests)
|
||||
e->type->ops.insert_requests(hctx, list, false);
|
||||
else {
|
||||
/*
|
||||
* try to issue requests directly if the hw queue isn't
|
||||
* busy in case of 'none' scheduler, and this way may save
|
||||
* us one extra enqueue & dequeue to sw queue.
|
||||
*/
|
||||
if (!hctx->dispatch_busy && !e && !run_queue_async) {
|
||||
if (!hctx->dispatch_busy && !e && !run_queue_async)
|
||||
blk_mq_try_issue_list_directly(hctx, list);
|
||||
if (list_empty(list))
|
||||
return;
|
||||
}
|
||||
blk_mq_insert_requests(hctx, ctx, list);
|
||||
else
|
||||
blk_mq_insert_requests(hctx, ctx, list);
|
||||
}
|
||||
|
||||
blk_mq_run_hw_queue(hctx, run_queue_async);
|
||||
|
@ -489,15 +499,15 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
|
|||
goto err;
|
||||
}
|
||||
|
||||
ret = e->ops.mq.init_sched(q, e);
|
||||
ret = e->ops.init_sched(q, e);
|
||||
if (ret)
|
||||
goto err;
|
||||
|
||||
blk_mq_debugfs_register_sched(q);
|
||||
|
||||
queue_for_each_hw_ctx(q, hctx, i) {
|
||||
if (e->ops.mq.init_hctx) {
|
||||
ret = e->ops.mq.init_hctx(hctx, i);
|
||||
if (e->ops.init_hctx) {
|
||||
ret = e->ops.init_hctx(hctx, i);
|
||||
if (ret) {
|
||||
eq = q->elevator;
|
||||
blk_mq_exit_sched(q, eq);
|
||||
|
@ -523,14 +533,14 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
|
|||
|
||||
queue_for_each_hw_ctx(q, hctx, i) {
|
||||
blk_mq_debugfs_unregister_sched_hctx(hctx);
|
||||
if (e->type->ops.mq.exit_hctx && hctx->sched_data) {
|
||||
e->type->ops.mq.exit_hctx(hctx, i);
|
||||
if (e->type->ops.exit_hctx && hctx->sched_data) {
|
||||
e->type->ops.exit_hctx(hctx, i);
|
||||
hctx->sched_data = NULL;
|
||||
}
|
||||
}
|
||||
blk_mq_debugfs_unregister_sched(q);
|
||||
if (e->type->ops.mq.exit_sched)
|
||||
e->type->ops.mq.exit_sched(e);
|
||||
if (e->type->ops.exit_sched)
|
||||
e->type->ops.exit_sched(e);
|
||||
blk_mq_sched_tags_teardown(q);
|
||||
q->elevator = NULL;
|
||||
}
|
||||
|
|
|
@ -8,18 +8,19 @@
|
|||
void blk_mq_sched_free_hctx_data(struct request_queue *q,
|
||||
void (*exit)(struct blk_mq_hw_ctx *));
|
||||
|
||||
void blk_mq_sched_assign_ioc(struct request *rq, struct bio *bio);
|
||||
void blk_mq_sched_assign_ioc(struct request *rq);
|
||||
|
||||
void blk_mq_sched_request_inserted(struct request *rq);
|
||||
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
|
||||
struct request **merged_request);
|
||||
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
|
||||
bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
|
||||
void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx);
|
||||
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
|
||||
|
||||
void blk_mq_sched_insert_request(struct request *rq, bool at_head,
|
||||
bool run_queue, bool async);
|
||||
void blk_mq_sched_insert_requests(struct request_queue *q,
|
||||
void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
|
||||
struct blk_mq_ctx *ctx,
|
||||
struct list_head *list, bool run_queue_async);
|
||||
|
||||
|
@ -43,8 +44,8 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
|
|||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e && e->type->ops.mq.allow_merge)
|
||||
return e->type->ops.mq.allow_merge(q, rq, bio);
|
||||
if (e && e->type->ops.allow_merge)
|
||||
return e->type->ops.allow_merge(q, rq, bio);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
@ -53,8 +54,8 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
|
|||
{
|
||||
struct elevator_queue *e = rq->q->elevator;
|
||||
|
||||
if (e && e->type->ops.mq.completed_request)
|
||||
e->type->ops.mq.completed_request(rq, now);
|
||||
if (e && e->type->ops.completed_request)
|
||||
e->type->ops.completed_request(rq, now);
|
||||
}
|
||||
|
||||
static inline void blk_mq_sched_started_request(struct request *rq)
|
||||
|
@ -62,8 +63,8 @@ static inline void blk_mq_sched_started_request(struct request *rq)
|
|||
struct request_queue *q = rq->q;
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e && e->type->ops.mq.started_request)
|
||||
e->type->ops.mq.started_request(rq);
|
||||
if (e && e->type->ops.started_request)
|
||||
e->type->ops.started_request(rq);
|
||||
}
|
||||
|
||||
static inline void blk_mq_sched_requeue_request(struct request *rq)
|
||||
|
@ -71,16 +72,16 @@ static inline void blk_mq_sched_requeue_request(struct request *rq)
|
|||
struct request_queue *q = rq->q;
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e && e->type->ops.mq.requeue_request)
|
||||
e->type->ops.mq.requeue_request(rq);
|
||||
if (e && e->type->ops.requeue_request)
|
||||
e->type->ops.requeue_request(rq);
|
||||
}
|
||||
|
||||
static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
struct elevator_queue *e = hctx->queue->elevator;
|
||||
|
||||
if (e && e->type->ops.mq.has_work)
|
||||
return e->type->ops.mq.has_work(hctx);
|
||||
if (e && e->type->ops.has_work)
|
||||
return e->type->ops.has_work(hctx);
|
||||
|
||||
return false;
|
||||
}
|
||||
|
|
|
@ -15,6 +15,18 @@
|
|||
|
||||
static void blk_mq_sysfs_release(struct kobject *kobj)
|
||||
{
|
||||
struct blk_mq_ctxs *ctxs = container_of(kobj, struct blk_mq_ctxs, kobj);
|
||||
|
||||
free_percpu(ctxs->queue_ctx);
|
||||
kfree(ctxs);
|
||||
}
|
||||
|
||||
static void blk_mq_ctx_sysfs_release(struct kobject *kobj)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = container_of(kobj, struct blk_mq_ctx, kobj);
|
||||
|
||||
/* ctx->ctxs won't be released until all ctx are freed */
|
||||
kobject_put(&ctx->ctxs->kobj);
|
||||
}
|
||||
|
||||
static void blk_mq_hw_sysfs_release(struct kobject *kobj)
|
||||
|
@ -203,7 +215,7 @@ static struct kobj_type blk_mq_ktype = {
|
|||
static struct kobj_type blk_mq_ctx_ktype = {
|
||||
.sysfs_ops = &blk_mq_sysfs_ops,
|
||||
.default_attrs = default_ctx_attrs,
|
||||
.release = blk_mq_sysfs_release,
|
||||
.release = blk_mq_ctx_sysfs_release,
|
||||
};
|
||||
|
||||
static struct kobj_type blk_mq_hw_ktype = {
|
||||
|
@ -235,7 +247,7 @@ static int blk_mq_register_hctx(struct blk_mq_hw_ctx *hctx)
|
|||
if (!hctx->nr_ctx)
|
||||
return 0;
|
||||
|
||||
ret = kobject_add(&hctx->kobj, &q->mq_kobj, "%u", hctx->queue_num);
|
||||
ret = kobject_add(&hctx->kobj, q->mq_kobj, "%u", hctx->queue_num);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
|
@ -258,8 +270,8 @@ void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
|
|||
queue_for_each_hw_ctx(q, hctx, i)
|
||||
blk_mq_unregister_hctx(hctx);
|
||||
|
||||
kobject_uevent(&q->mq_kobj, KOBJ_REMOVE);
|
||||
kobject_del(&q->mq_kobj);
|
||||
kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
|
||||
kobject_del(q->mq_kobj);
|
||||
kobject_put(&dev->kobj);
|
||||
|
||||
q->mq_sysfs_init_done = false;
|
||||
|
@ -279,7 +291,7 @@ void blk_mq_sysfs_deinit(struct request_queue *q)
|
|||
ctx = per_cpu_ptr(q->queue_ctx, cpu);
|
||||
kobject_put(&ctx->kobj);
|
||||
}
|
||||
kobject_put(&q->mq_kobj);
|
||||
kobject_put(q->mq_kobj);
|
||||
}
|
||||
|
||||
void blk_mq_sysfs_init(struct request_queue *q)
|
||||
|
@ -287,10 +299,12 @@ void blk_mq_sysfs_init(struct request_queue *q)
|
|||
struct blk_mq_ctx *ctx;
|
||||
int cpu;
|
||||
|
||||
kobject_init(&q->mq_kobj, &blk_mq_ktype);
|
||||
kobject_init(q->mq_kobj, &blk_mq_ktype);
|
||||
|
||||
for_each_possible_cpu(cpu) {
|
||||
ctx = per_cpu_ptr(q->queue_ctx, cpu);
|
||||
|
||||
kobject_get(q->mq_kobj);
|
||||
kobject_init(&ctx->kobj, &blk_mq_ctx_ktype);
|
||||
}
|
||||
}
|
||||
|
@ -303,11 +317,11 @@ int __blk_mq_register_dev(struct device *dev, struct request_queue *q)
|
|||
WARN_ON_ONCE(!q->kobj.parent);
|
||||
lockdep_assert_held(&q->sysfs_lock);
|
||||
|
||||
ret = kobject_add(&q->mq_kobj, kobject_get(&dev->kobj), "%s", "mq");
|
||||
ret = kobject_add(q->mq_kobj, kobject_get(&dev->kobj), "%s", "mq");
|
||||
if (ret < 0)
|
||||
goto out;
|
||||
|
||||
kobject_uevent(&q->mq_kobj, KOBJ_ADD);
|
||||
kobject_uevent(q->mq_kobj, KOBJ_ADD);
|
||||
|
||||
queue_for_each_hw_ctx(q, hctx, i) {
|
||||
ret = blk_mq_register_hctx(hctx);
|
||||
|
@ -324,8 +338,8 @@ unreg:
|
|||
while (--i >= 0)
|
||||
blk_mq_unregister_hctx(q->queue_hw_ctx[i]);
|
||||
|
||||
kobject_uevent(&q->mq_kobj, KOBJ_REMOVE);
|
||||
kobject_del(&q->mq_kobj);
|
||||
kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
|
||||
kobject_del(q->mq_kobj);
|
||||
kobject_put(&dev->kobj);
|
||||
return ret;
|
||||
}
|
||||
|
@ -340,7 +354,6 @@ int blk_mq_register_dev(struct device *dev, struct request_queue *q)
|
|||
|
||||
return ret;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_mq_register_dev);
|
||||
|
||||
void blk_mq_sysfs_unregister(struct request_queue *q)
|
||||
{
|
||||
|
|
|
@ -110,7 +110,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
|
|||
struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
|
||||
struct sbitmap_queue *bt;
|
||||
struct sbq_wait_state *ws;
|
||||
DEFINE_WAIT(wait);
|
||||
DEFINE_SBQ_WAIT(wait);
|
||||
unsigned int tag_offset;
|
||||
bool drop_ctx;
|
||||
int tag;
|
||||
|
@ -154,8 +154,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
|
|||
if (tag != -1)
|
||||
break;
|
||||
|
||||
prepare_to_wait_exclusive(&ws->wait, &wait,
|
||||
TASK_UNINTERRUPTIBLE);
|
||||
sbitmap_prepare_to_wait(bt, ws, &wait, TASK_UNINTERRUPTIBLE);
|
||||
|
||||
tag = __blk_mq_get_tag(data, bt);
|
||||
if (tag != -1)
|
||||
|
@ -167,16 +166,17 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
|
|||
bt_prev = bt;
|
||||
io_schedule();
|
||||
|
||||
sbitmap_finish_wait(bt, ws, &wait);
|
||||
|
||||
data->ctx = blk_mq_get_ctx(data->q);
|
||||
data->hctx = blk_mq_map_queue(data->q, data->ctx->cpu);
|
||||
data->hctx = blk_mq_map_queue(data->q, data->cmd_flags,
|
||||
data->ctx->cpu);
|
||||
tags = blk_mq_tags_from_data(data);
|
||||
if (data->flags & BLK_MQ_REQ_RESERVED)
|
||||
bt = &tags->breserved_tags;
|
||||
else
|
||||
bt = &tags->bitmap_tags;
|
||||
|
||||
finish_wait(&ws->wait, &wait);
|
||||
|
||||
/*
|
||||
* If destination hw queue is changed, fake wake up on
|
||||
* previous queue for compensating the wake up miss, so
|
||||
|
@ -191,7 +191,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
|
|||
if (drop_ctx && data->ctx)
|
||||
blk_mq_put_ctx(data->ctx);
|
||||
|
||||
finish_wait(&ws->wait, &wait);
|
||||
sbitmap_finish_wait(bt, ws, &wait);
|
||||
|
||||
found_tag:
|
||||
return tag + tag_offset;
|
||||
|
@@ -235,7 +235,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * test and set the bit before assigning ->rqs[].
 	 */
 	if (rq && rq->q == hctx->queue)
-		iter_data->fn(hctx, rq, iter_data->data, reserved);
+		return iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
|
||||
|
||||
|
@ -247,7 +247,8 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
|
|||
* @fn: Pointer to the function that will be called for each request
|
||||
* associated with @hctx that has been assigned a driver tag.
|
||||
* @fn will be called as follows: @fn(@hctx, rq, @data, @reserved)
|
||||
* where rq is a pointer to a request.
|
||||
* where rq is a pointer to a request. Return true to continue
|
||||
* iterating tags, false to stop.
|
||||
* @data: Will be passed as third argument to @fn.
|
||||
* @reserved: Indicates whether @bt is the breserved_tags member or the
|
||||
* bitmap_tags member of struct blk_mq_tags.
|
||||
|
@@ -288,7 +289,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 */
 	rq = tags->rqs[bitnr];
 	if (rq && blk_mq_request_started(rq))
-		iter_data->fn(rq, iter_data->data, reserved);
+		return iter_data->fn(rq, iter_data->data, reserved);
 
 	return true;
 }
|
||||
|
@ -300,7 +301,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
|
|||
* or the bitmap_tags member of struct blk_mq_tags.
|
||||
* @fn: Pointer to the function that will be called for each started
|
||||
* request. @fn will be called as follows: @fn(rq, @data,
|
||||
* @reserved) where rq is a pointer to a request.
|
||||
* @reserved) where rq is a pointer to a request. Return true
|
||||
* to continue iterating tags, false to stop.
|
||||
* @data: Will be passed as second argument to @fn.
|
||||
* @reserved: Indicates whether @bt is the breserved_tags member or the
|
||||
* bitmap_tags member of struct blk_mq_tags.
|
||||
|
@ -325,7 +327,8 @@ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
|
|||
* @fn: Pointer to the function that will be called for each started
|
||||
* request. @fn will be called as follows: @fn(rq, @priv,
|
||||
* reserved) where rq is a pointer to a request. 'reserved'
|
||||
* indicates whether or not @rq is a reserved request.
|
||||
* indicates whether or not @rq is a reserved request. Return
|
||||
* true to continue iterating tags, false to stop.
|
||||
* @priv: Will be passed as second argument to @fn.
|
||||
*/
|
||||
static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
|
||||
|
@ -342,7 +345,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
|
|||
* @fn: Pointer to the function that will be called for each started
|
||||
* request. @fn will be called as follows: @fn(rq, @priv,
|
||||
* reserved) where rq is a pointer to a request. 'reserved'
|
||||
* indicates whether or not @rq is a reserved request.
|
||||
* indicates whether or not @rq is a reserved request. Return
|
||||
* true to continue iterating tags, false to stop.
|
||||
* @priv: Will be passed as second argument to @fn.
|
||||
*/
|
||||
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
|
||||
|
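
Illustrative sketch (not from the patch): with the return value documented above, an iteration callback now decides whether the tag walk continues. A minimal counting callback for blk_mq_tagset_busy_iter(), assuming the caller passes a pointer to an unsigned int:

/* Count started requests; returning true keeps the tag walk going. */
static bool sketch_count_inflight(struct request *rq, void *data, bool reserved)
{
	unsigned int *count = data;

	(*count)++;
	return true;
}

/* usage: unsigned int count = 0; blk_mq_tagset_busy_iter(set, sketch_count_inflight, &count); */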
@@ -526,16 +530,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
  */
 u32 blk_mq_unique_tag(struct request *rq)
 {
-	struct request_queue *q = rq->q;
-	struct blk_mq_hw_ctx *hctx;
-	int hwq = 0;
-
-	if (q->mq_ops) {
-		hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
-		hwq = hctx->queue_num;
-	}
-
-	return (hwq << BLK_MQ_UNIQUE_TAG_BITS) |
+	return (rq->mq_hctx->queue_num << BLK_MQ_UNIQUE_TAG_BITS) |
 		(rq->tag & BLK_MQ_UNIQUE_TAG_MASK);
 }
 EXPORT_SYMBOL(blk_mq_unique_tag);
|
||||
|
|
|
@@ -29,7 +29,7 @@
  * that maps a queue to the CPUs that have irq affinity for the corresponding
  * vector.
  */
-int blk_mq_virtio_map_queues(struct blk_mq_tag_set *set,
+int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
 		struct virtio_device *vdev, int first_vec)
 {
 	const struct cpumask *mask;
@@ -38,17 +38,17 @@ int blk_mq_virtio_map_queues(struct blk_mq_tag_set *set,
 	if (!vdev->config->get_vq_affinity)
 		goto fallback;
 
-	for (queue = 0; queue < set->nr_hw_queues; queue++) {
+	for (queue = 0; queue < qmap->nr_queues; queue++) {
 		mask = vdev->config->get_vq_affinity(vdev, first_vec + queue);
 		if (!mask)
 			goto fallback;
 
 		for_each_cpu(cpu, mask)
-			set->mq_map[cpu] = queue;
+			qmap->mq_map[cpu] = qmap->queue_offset + queue;
 	}
 
 	return 0;
 fallback:
-	return blk_mq_map_queues(set);
+	return blk_mq_map_queues(qmap);
 }
 EXPORT_SYMBOL_GPL(blk_mq_virtio_map_queues);
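
Illustrative sketch (not from the patch): a driver's ->map_queues() callback now hands blk_mq_virtio_map_queues() a single blk_mq_queue_map rather than the whole tag set, roughly along these lines; struct sketch_dev and its vdev member are assumptions:

/* Map the default queue type using the device's virtqueue affinity. */
static int sketch_virtio_map_queues(struct blk_mq_tag_set *set)
{
	struct sketch_dev *dev = set->driver_data;	/* hypothetical driver data */

	return blk_mq_virtio_map_queues(&set->map[HCTX_TYPE_DEFAULT],
					dev->vdev, 0);
}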
[block/blk-mq.c: 783 changes; diff suppressed by the viewer because it is too large and not shown here]
|
@ -7,17 +7,22 @@
|
|||
|
||||
struct blk_mq_tag_set;
|
||||
|
||||
struct blk_mq_ctxs {
|
||||
struct kobject kobj;
|
||||
struct blk_mq_ctx __percpu *queue_ctx;
|
||||
};
|
||||
|
||||
/**
|
||||
* struct blk_mq_ctx - State for a software queue facing the submitting CPUs
|
||||
*/
|
||||
struct blk_mq_ctx {
|
||||
struct {
|
||||
spinlock_t lock;
|
||||
struct list_head rq_list;
|
||||
} ____cacheline_aligned_in_smp;
|
||||
struct list_head rq_lists[HCTX_MAX_TYPES];
|
||||
} ____cacheline_aligned_in_smp;
|
||||
|
||||
unsigned int cpu;
|
||||
unsigned int index_hw;
|
||||
unsigned short index_hw[HCTX_MAX_TYPES];
|
||||
|
||||
/* incremented at dispatch time */
|
||||
unsigned long rq_dispatched[2];
|
||||
|
@ -27,6 +32,7 @@ struct blk_mq_ctx {
|
|||
unsigned long ____cacheline_aligned_in_smp rq_completed[2];
|
||||
|
||||
struct request_queue *queue;
|
||||
struct blk_mq_ctxs *ctxs;
|
||||
struct kobject kobj;
|
||||
} ____cacheline_aligned_in_smp;
|
||||
|
||||
|
@ -62,20 +68,55 @@ void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
|
|||
void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
|
||||
struct list_head *list);
|
||||
|
||||
/* Used by blk_insert_cloned_request() to issue request directly */
|
||||
blk_status_t blk_mq_request_issue_directly(struct request *rq);
|
||||
blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
|
||||
struct request *rq,
|
||||
blk_qc_t *cookie,
|
||||
bool bypass, bool last);
|
||||
void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
|
||||
struct list_head *list);
|
||||
|
||||
 /*
  * CPU -> queue mappings
  */
-extern int blk_mq_hw_queue_to_node(unsigned int *map, unsigned int);
+extern int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int);
 
-static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
-						     int cpu)
+/*
+ * blk_mq_map_queue_type() - map (hctx_type,cpu) to hardware queue
+ * @q: request queue
+ * @type: the hctx type index
+ * @cpu: CPU
+ */
+static inline struct blk_mq_hw_ctx *blk_mq_map_queue_type(struct request_queue *q,
+							  enum hctx_type type,
+							  unsigned int cpu)
 {
-	return q->queue_hw_ctx[q->mq_map[cpu]];
+	return q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]];
 }
 
+/*
+ * blk_mq_map_queue() - map (cmd_flags,type) to hardware queue
+ * @q: request queue
+ * @flags: request command flags
+ * @cpu: CPU
+ */
+static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
+						     unsigned int flags,
+						     unsigned int cpu)
+{
+	enum hctx_type type = HCTX_TYPE_DEFAULT;
+
+	if ((flags & REQ_HIPRI) &&
+	    q->tag_set->nr_maps > HCTX_TYPE_POLL &&
+	    q->tag_set->map[HCTX_TYPE_POLL].nr_queues &&
+	    test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+		type = HCTX_TYPE_POLL;
+
+	else if (((flags & REQ_OP_MASK) == REQ_OP_READ) &&
+		 q->tag_set->nr_maps > HCTX_TYPE_READ &&
+		 q->tag_set->map[HCTX_TYPE_READ].nr_queues)
+		type = HCTX_TYPE_READ;
+
+	return blk_mq_map_queue_type(q, type, cpu);
+}
|
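
Illustrative sketch (not from the patch): for the poll and read branches above to pick distinct hardware queues, the tag set has to carry more than one queue map. A rough sketch of the setup side; the split of nr_hw_queues is a placeholder, and a real driver would still fill queue_offset and mq_map from its ->map_queues() callback:

/* Divide the hardware queues between the default, read and poll maps. */
static void sketch_setup_queue_maps(struct blk_mq_tag_set *set,
				    unsigned int nr_read, unsigned int nr_poll)
{
	set->nr_maps = HCTX_MAX_TYPES;
	set->map[HCTX_TYPE_DEFAULT].nr_queues =
			set->nr_hw_queues - nr_read - nr_poll;
	set->map[HCTX_TYPE_READ].nr_queues = nr_read;
	set->map[HCTX_TYPE_POLL].nr_queues = nr_poll;
}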
||||
|
||||
/*
|
||||
|
@ -126,6 +167,7 @@ struct blk_mq_alloc_data {
|
|||
struct request_queue *q;
|
||||
blk_mq_req_flags_t flags;
|
||||
unsigned int shallow_depth;
|
||||
unsigned int cmd_flags;
|
||||
|
||||
/* input & output parameter */
|
||||
struct blk_mq_ctx *ctx;
|
||||
|
@ -150,8 +192,7 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
|
|||
return hctx->nr_ctx && hctx->tags;
|
||||
}
|
||||
|
||||
void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
|
||||
unsigned int inflight[2]);
|
||||
unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part);
|
||||
void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part,
|
||||
unsigned int inflight[2]);
|
||||
|
||||
|
@ -195,21 +236,18 @@ static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
|
|||
|
||||
static inline void blk_mq_put_driver_tag(struct request *rq)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
|
||||
if (rq->tag == -1 || rq->internal_tag == -1)
|
||||
return;
|
||||
|
||||
hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
|
||||
__blk_mq_put_driver_tag(hctx, rq);
|
||||
__blk_mq_put_driver_tag(rq->mq_hctx, rq);
|
||||
}
|
||||
|
||||
static inline void blk_mq_clear_mq_map(struct blk_mq_tag_set *set)
|
||||
static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
|
||||
{
|
||||
int cpu;
|
||||
|
||||
for_each_possible_cpu(cpu)
|
||||
set->mq_map[cpu] = 0;
|
||||
qmap->mq_map[cpu] = 0;
|
||||
}
|
||||
|
||||
#endif
|
||||
|
|
|
@ -89,12 +89,12 @@ int blk_pre_runtime_suspend(struct request_queue *q)
|
|||
/* Switch q_usage_counter back to per-cpu mode. */
|
||||
blk_mq_unfreeze_queue(q);
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
if (ret < 0)
|
||||
pm_runtime_mark_last_busy(q->dev);
|
||||
else
|
||||
q->rpm_status = RPM_SUSPENDING;
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
|
||||
if (ret)
|
||||
blk_clear_pm_only(q);
|
||||
|
@ -121,14 +121,14 @@ void blk_post_runtime_suspend(struct request_queue *q, int err)
|
|||
if (!q->dev)
|
||||
return;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
if (!err) {
|
||||
q->rpm_status = RPM_SUSPENDED;
|
||||
} else {
|
||||
q->rpm_status = RPM_ACTIVE;
|
||||
pm_runtime_mark_last_busy(q->dev);
|
||||
}
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
|
||||
if (err)
|
||||
blk_clear_pm_only(q);
|
||||
|
@ -151,9 +151,9 @@ void blk_pre_runtime_resume(struct request_queue *q)
|
|||
if (!q->dev)
|
||||
return;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
q->rpm_status = RPM_RESUMING;
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
}
|
||||
EXPORT_SYMBOL(blk_pre_runtime_resume);
|
||||
|
||||
|
@ -176,7 +176,7 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
|
|||
if (!q->dev)
|
||||
return;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
if (!err) {
|
||||
q->rpm_status = RPM_ACTIVE;
|
||||
pm_runtime_mark_last_busy(q->dev);
|
||||
|
@ -184,7 +184,7 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
|
|||
} else {
|
||||
q->rpm_status = RPM_SUSPENDED;
|
||||
}
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
|
||||
if (!err)
|
||||
blk_clear_pm_only(q);
|
||||
|
@ -207,10 +207,10 @@ EXPORT_SYMBOL(blk_post_runtime_resume);
|
|||
*/
|
||||
void blk_set_runtime_active(struct request_queue *q)
|
||||
{
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
q->rpm_status = RPM_ACTIVE;
|
||||
pm_runtime_mark_last_busy(q->dev);
|
||||
pm_request_autosuspend(q->dev);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
}
|
||||
EXPORT_SYMBOL(blk_set_runtime_active);
|
||||
|
|
|
@ -21,7 +21,7 @@ static inline void blk_pm_mark_last_busy(struct request *rq)
|
|||
|
||||
static inline void blk_pm_requeue_request(struct request *rq)
|
||||
{
|
||||
lockdep_assert_held(rq->q->queue_lock);
|
||||
lockdep_assert_held(&rq->q->queue_lock);
|
||||
|
||||
if (rq->q->dev && !(rq->rq_flags & RQF_PM))
|
||||
rq->q->nr_pending--;
|
||||
|
@ -30,7 +30,7 @@ static inline void blk_pm_requeue_request(struct request *rq)
|
|||
static inline void blk_pm_add_request(struct request_queue *q,
|
||||
struct request *rq)
|
||||
{
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
lockdep_assert_held(&q->queue_lock);
|
||||
|
||||
if (q->dev && !(rq->rq_flags & RQF_PM))
|
||||
q->nr_pending++;
|
||||
|
@ -38,7 +38,7 @@ static inline void blk_pm_add_request(struct request_queue *q,
|
|||
|
||||
static inline void blk_pm_put_request(struct request *rq)
|
||||
{
|
||||
lockdep_assert_held(rq->q->queue_lock);
|
||||
lockdep_assert_held(&rq->q->queue_lock);
|
||||
|
||||
if (rq->q->dev && !(rq->rq_flags & RQF_PM))
|
||||
--rq->q->nr_pending;
|
||||
|
|
|
@ -27,75 +27,67 @@ bool rq_wait_inc_below(struct rq_wait *rq_wait, unsigned int limit)
|
|||
return atomic_inc_below(&rq_wait->inflight, limit);
|
||||
}
|
||||
|
||||
void rq_qos_cleanup(struct request_queue *q, struct bio *bio)
|
||||
void __rq_qos_cleanup(struct rq_qos *rqos, struct bio *bio)
|
||||
{
|
||||
struct rq_qos *rqos;
|
||||
|
||||
for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
|
||||
do {
|
||||
if (rqos->ops->cleanup)
|
||||
rqos->ops->cleanup(rqos, bio);
|
||||
}
|
||||
rqos = rqos->next;
|
||||
} while (rqos);
|
||||
}
|
||||
|
||||
void rq_qos_done(struct request_queue *q, struct request *rq)
|
||||
void __rq_qos_done(struct rq_qos *rqos, struct request *rq)
|
||||
{
|
||||
struct rq_qos *rqos;
|
||||
|
||||
for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
|
||||
do {
|
||||
if (rqos->ops->done)
|
||||
rqos->ops->done(rqos, rq);
|
||||
}
|
||||
rqos = rqos->next;
|
||||
} while (rqos);
|
||||
}
|
||||
|
||||
void rq_qos_issue(struct request_queue *q, struct request *rq)
|
||||
void __rq_qos_issue(struct rq_qos *rqos, struct request *rq)
|
||||
{
|
||||
struct rq_qos *rqos;
|
||||
|
||||
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
|
||||
do {
|
||||
if (rqos->ops->issue)
|
||||
rqos->ops->issue(rqos, rq);
|
||||
}
|
||||
rqos = rqos->next;
|
||||
} while (rqos);
|
||||
}
|
||||
|
||||
void rq_qos_requeue(struct request_queue *q, struct request *rq)
|
||||
void __rq_qos_requeue(struct rq_qos *rqos, struct request *rq)
|
||||
{
|
||||
struct rq_qos *rqos;
|
||||
|
||||
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
|
||||
do {
|
||||
if (rqos->ops->requeue)
|
||||
rqos->ops->requeue(rqos, rq);
|
||||
}
|
||||
rqos = rqos->next;
|
||||
} while (rqos);
|
||||
}
|
||||
|
||||
void rq_qos_throttle(struct request_queue *q, struct bio *bio,
|
||||
spinlock_t *lock)
|
||||
void __rq_qos_throttle(struct rq_qos *rqos, struct bio *bio)
|
||||
{
|
||||
struct rq_qos *rqos;
|
||||
|
||||
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
|
||||
do {
|
||||
if (rqos->ops->throttle)
|
||||
rqos->ops->throttle(rqos, bio, lock);
|
||||
}
|
||||
rqos->ops->throttle(rqos, bio);
|
||||
rqos = rqos->next;
|
||||
} while (rqos);
|
||||
}
|
||||
|
||||
void rq_qos_track(struct request_queue *q, struct request *rq, struct bio *bio)
|
||||
void __rq_qos_track(struct rq_qos *rqos, struct request *rq, struct bio *bio)
|
||||
{
|
||||
struct rq_qos *rqos;
|
||||
|
||||
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
|
||||
do {
|
||||
if (rqos->ops->track)
|
||||
rqos->ops->track(rqos, rq, bio);
|
||||
}
|
||||
rqos = rqos->next;
|
||||
} while (rqos);
|
||||
}
|
||||
|
||||
void rq_qos_done_bio(struct request_queue *q, struct bio *bio)
|
||||
void __rq_qos_done_bio(struct rq_qos *rqos, struct bio *bio)
|
||||
{
|
||||
struct rq_qos *rqos;
|
||||
|
||||
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
|
||||
do {
|
||||
if (rqos->ops->done_bio)
|
||||
rqos->ops->done_bio(rqos, bio);
|
||||
}
|
||||
rqos = rqos->next;
|
||||
} while (rqos);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -184,8 +176,96 @@ void rq_depth_scale_down(struct rq_depth *rqd, bool hard_throttle)
|
|||
rq_depth_calc_max_depth(rqd);
|
||||
}
|
||||
|
||||
struct rq_qos_wait_data {
|
||||
struct wait_queue_entry wq;
|
||||
struct task_struct *task;
|
||||
struct rq_wait *rqw;
|
||||
acquire_inflight_cb_t *cb;
|
||||
void *private_data;
|
||||
bool got_token;
|
||||
};
|
||||
|
||||
static int rq_qos_wake_function(struct wait_queue_entry *curr,
|
||||
unsigned int mode, int wake_flags, void *key)
|
||||
{
|
||||
struct rq_qos_wait_data *data = container_of(curr,
|
||||
struct rq_qos_wait_data,
|
||||
wq);
|
||||
|
||||
/*
|
||||
* If we fail to get a budget, return -1 to interrupt the wake up loop
|
||||
* in __wake_up_common.
|
||||
*/
|
||||
if (!data->cb(data->rqw, data->private_data))
|
||||
return -1;
|
||||
|
||||
data->got_token = true;
|
||||
list_del_init(&curr->entry);
|
||||
wake_up_process(data->task);
|
||||
return 1;
|
||||
}
|
||||
|
||||
/**
|
||||
* rq_qos_wait - throttle on a rqw if we need to
|
||||
* @private_data - caller provided specific data
|
||||
* @acquire_inflight_cb - inc the rqw->inflight counter if we can
|
||||
* @cleanup_cb - the callback to cleanup in case we race with a waker
|
||||
*
|
||||
* This provides a uniform place for the rq_qos users to do their throttling.
|
||||
* Since you can end up with a lot of things sleeping at once, this manages the
|
||||
* waking up based on the resources available. The acquire_inflight_cb should
|
||||
* inc the rqw->inflight if we have the ability to do so, or return false if not
|
||||
* and then we will sleep until the room becomes available.
|
||||
*
|
||||
* cleanup_cb is in case that we race with a waker and need to cleanup the
|
||||
* inflight count accordingly.
|
||||
*/
|
||||
void rq_qos_wait(struct rq_wait *rqw, void *private_data,
|
||||
acquire_inflight_cb_t *acquire_inflight_cb,
|
||||
cleanup_cb_t *cleanup_cb)
|
||||
{
|
||||
struct rq_qos_wait_data data = {
|
||||
.wq = {
|
||||
.func = rq_qos_wake_function,
|
||||
.entry = LIST_HEAD_INIT(data.wq.entry),
|
||||
},
|
||||
.task = current,
|
||||
.rqw = rqw,
|
||||
.cb = acquire_inflight_cb,
|
||||
.private_data = private_data,
|
||||
};
|
||||
bool has_sleeper;
|
||||
|
||||
has_sleeper = wq_has_sleeper(&rqw->wait);
|
||||
if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
|
||||
return;
|
||||
|
||||
prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
|
||||
do {
|
||||
if (data.got_token)
|
||||
break;
|
||||
if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
|
||||
finish_wait(&rqw->wait, &data.wq);
|
||||
|
||||
/*
|
||||
* We raced with wbt_wake_function() getting a token,
|
||||
* which means we now have two. Put our local token
|
||||
* and wake anyone else potentially waiting for one.
|
||||
*/
|
||||
if (data.got_token)
|
||||
cleanup_cb(rqw, private_data);
|
||||
break;
|
||||
}
|
||||
io_schedule();
|
||||
has_sleeper = false;
|
||||
} while (1);
|
||||
finish_wait(&rqw->wait, &data.wq);
|
||||
}
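
Illustrative sketch (not from the patch): a caller of rq_qos_wait() supplies the two callbacks described in the kernel-doc above. A stripped-down pair built on rq_wait_inc_below(); the inflight limit of 32 is arbitrary:

/* Try to take one inflight slot; true means we got the budget. */
static bool sketch_acquire_inflight(struct rq_wait *rqw, void *private_data)
{
	return rq_wait_inc_below(rqw, 32);
}

/* We raced with a waker and ended up with two slots; give one back. */
static void sketch_cleanup(struct rq_wait *rqw, void *private_data)
{
	atomic_dec(&rqw->inflight);
	wake_up(&rqw->wait);
}

/* usage: rq_qos_wait(rqw, NULL, sketch_acquire_inflight, sketch_cleanup); */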
|
||||
|
||||
void rq_qos_exit(struct request_queue *q)
|
||||
{
|
||||
blk_mq_debugfs_unregister_queue_rqos(q);
|
||||
|
||||
while (q->rq_qos) {
|
||||
struct rq_qos *rqos = q->rq_qos;
|
||||
q->rq_qos = rqos->next;
|
||||
|
|
|
@ -7,6 +7,10 @@
|
|||
#include <linux/atomic.h>
|
||||
#include <linux/wait.h>
|
||||
|
||||
#include "blk-mq-debugfs.h"
|
||||
|
||||
struct blk_mq_debugfs_attr;
|
||||
|
||||
enum rq_qos_id {
|
||||
RQ_QOS_WBT,
|
||||
RQ_QOS_CGROUP,
|
||||
|
@ -22,10 +26,13 @@ struct rq_qos {
|
|||
struct request_queue *q;
|
||||
enum rq_qos_id id;
|
||||
struct rq_qos *next;
|
||||
#ifdef CONFIG_BLK_DEBUG_FS
|
||||
struct dentry *debugfs_dir;
|
||||
#endif
|
||||
};
|
||||
|
||||
struct rq_qos_ops {
|
||||
void (*throttle)(struct rq_qos *, struct bio *, spinlock_t *);
|
||||
void (*throttle)(struct rq_qos *, struct bio *);
|
||||
void (*track)(struct rq_qos *, struct request *, struct bio *);
|
||||
void (*issue)(struct rq_qos *, struct request *);
|
||||
void (*requeue)(struct rq_qos *, struct request *);
|
||||
|
@ -33,6 +40,7 @@ struct rq_qos_ops {
|
|||
void (*done_bio)(struct rq_qos *, struct bio *);
|
||||
void (*cleanup)(struct rq_qos *, struct bio *);
|
||||
void (*exit)(struct rq_qos *);
|
||||
const struct blk_mq_debugfs_attr *debugfs_attrs;
|
||||
};
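
Illustrative sketch (not from the patch): a policy hangs a struct rq_qos_ops like the one above off a struct rq_qos and registers it with rq_qos_add(). A bare-bones registration; RQ_QOS_EXAMPLE and the sketch_* callbacks are hypothetical:

static void sketch_throttle(struct rq_qos *rqos, struct bio *bio)
{
	/* e.g. block via rq_qos_wait() until the bio may be submitted */
}

static void sketch_exit(struct rq_qos *rqos)
{
	kfree(rqos);
}

static struct rq_qos_ops sketch_rqos_ops = {
	.throttle	= sketch_throttle,
	.exit		= sketch_exit,
};

static int sketch_rqos_init(struct request_queue *q)
{
	struct rq_qos *rqos = kzalloc(sizeof(*rqos), GFP_KERNEL);

	if (!rqos)
		return -ENOMEM;
	rqos->q = q;
	rqos->ops = &sketch_rqos_ops;
	rqos->id = RQ_QOS_EXAMPLE;	/* hypothetical enum rq_qos_id value */
	rq_qos_add(q, rqos);
	return 0;
}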
|
||||
|
||||
struct rq_depth {
|
||||
|
@ -66,6 +74,17 @@ static inline struct rq_qos *blkcg_rq_qos(struct request_queue *q)
|
|||
return rq_qos_id(q, RQ_QOS_CGROUP);
|
||||
}
|
||||
|
||||
static inline const char *rq_qos_id_to_name(enum rq_qos_id id)
|
||||
{
|
||||
switch (id) {
|
||||
case RQ_QOS_WBT:
|
||||
return "wbt";
|
||||
case RQ_QOS_CGROUP:
|
||||
return "cgroup";
|
||||
}
|
||||
return "unknown";
|
||||
}
|
||||
|
||||
static inline void rq_wait_init(struct rq_wait *rq_wait)
|
||||
{
|
||||
atomic_set(&rq_wait->inflight, 0);
|
||||
|
@ -76,6 +95,9 @@ static inline void rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
|
|||
{
|
||||
rqos->next = q->rq_qos;
|
||||
q->rq_qos = rqos;
|
||||
|
||||
if (rqos->ops->debugfs_attrs)
|
||||
blk_mq_debugfs_register_rqos(rqos);
|
||||
}
|
||||
|
||||
static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
|
||||
|
@ -91,19 +113,77 @@ static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
|
|||
}
|
||||
prev = cur;
|
||||
}
|
||||
|
||||
blk_mq_debugfs_unregister_rqos(rqos);
|
||||
}
|
||||
|
||||
typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
|
||||
typedef void (cleanup_cb_t)(struct rq_wait *rqw, void *private_data);
|
||||
|
||||
void rq_qos_wait(struct rq_wait *rqw, void *private_data,
|
||||
acquire_inflight_cb_t *acquire_inflight_cb,
|
||||
cleanup_cb_t *cleanup_cb);
|
||||
bool rq_wait_inc_below(struct rq_wait *rq_wait, unsigned int limit);
|
||||
void rq_depth_scale_up(struct rq_depth *rqd);
|
||||
void rq_depth_scale_down(struct rq_depth *rqd, bool hard_throttle);
|
||||
bool rq_depth_calc_max_depth(struct rq_depth *rqd);
|
||||
|
||||
void rq_qos_cleanup(struct request_queue *, struct bio *);
|
||||
void rq_qos_done(struct request_queue *, struct request *);
|
||||
void rq_qos_issue(struct request_queue *, struct request *);
|
||||
void rq_qos_requeue(struct request_queue *, struct request *);
|
||||
void rq_qos_done_bio(struct request_queue *q, struct bio *bio);
|
||||
void rq_qos_throttle(struct request_queue *, struct bio *, spinlock_t *);
|
||||
void rq_qos_track(struct request_queue *q, struct request *, struct bio *);
|
||||
void __rq_qos_cleanup(struct rq_qos *rqos, struct bio *bio);
|
||||
void __rq_qos_done(struct rq_qos *rqos, struct request *rq);
|
||||
void __rq_qos_issue(struct rq_qos *rqos, struct request *rq);
|
||||
void __rq_qos_requeue(struct rq_qos *rqos, struct request *rq);
|
||||
void __rq_qos_throttle(struct rq_qos *rqos, struct bio *bio);
|
||||
void __rq_qos_track(struct rq_qos *rqos, struct request *rq, struct bio *bio);
|
||||
void __rq_qos_done_bio(struct rq_qos *rqos, struct bio *bio);
|
||||
|
||||
static inline void rq_qos_cleanup(struct request_queue *q, struct bio *bio)
|
||||
{
|
||||
if (q->rq_qos)
|
||||
__rq_qos_cleanup(q->rq_qos, bio);
|
||||
}
|
||||
|
||||
static inline void rq_qos_done(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
if (q->rq_qos)
|
||||
__rq_qos_done(q->rq_qos, rq);
|
||||
}
|
||||
|
||||
static inline void rq_qos_issue(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
if (q->rq_qos)
|
||||
__rq_qos_issue(q->rq_qos, rq);
|
||||
}
|
||||
|
||||
static inline void rq_qos_requeue(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
if (q->rq_qos)
|
||||
__rq_qos_requeue(q->rq_qos, rq);
|
||||
}
|
||||
|
||||
static inline void rq_qos_done_bio(struct request_queue *q, struct bio *bio)
|
||||
{
|
||||
if (q->rq_qos)
|
||||
__rq_qos_done_bio(q->rq_qos, bio);
|
||||
}
|
||||
|
||||
static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
|
||||
{
|
||||
/*
|
||||
* BIO_TRACKED lets controllers know that a bio went through the
|
||||
* normal rq_qos path.
|
||||
*/
|
||||
bio_set_flag(bio, BIO_TRACKED);
|
||||
if (q->rq_qos)
|
||||
__rq_qos_throttle(q->rq_qos, bio);
|
||||
}
|
||||
|
||||
static inline void rq_qos_track(struct request_queue *q, struct request *rq,
|
||||
struct bio *bio)
|
||||
{
|
||||
if (q->rq_qos)
|
||||
__rq_qos_track(q->rq_qos, rq, bio);
|
||||
}
|
||||
|
||||
void rq_qos_exit(struct request_queue *);
|
||||
|
||||
#endif
|
||||
|
|
|
@ -20,65 +20,12 @@ EXPORT_SYMBOL(blk_max_low_pfn);
|
|||
|
||||
unsigned long blk_max_pfn;
|
||||
|
||||
/**
|
||||
* blk_queue_prep_rq - set a prepare_request function for queue
|
||||
* @q: queue
|
||||
* @pfn: prepare_request function
|
||||
*
|
||||
* It's possible for a queue to register a prepare_request callback which
|
||||
* is invoked before the request is handed to the request_fn. The goal of
|
||||
* the function is to prepare a request for I/O, it can be used to build a
|
||||
* cdb from the request data for instance.
|
||||
*
|
||||
*/
|
||||
void blk_queue_prep_rq(struct request_queue *q, prep_rq_fn *pfn)
|
||||
{
|
||||
q->prep_rq_fn = pfn;
|
||||
}
|
||||
EXPORT_SYMBOL(blk_queue_prep_rq);
|
||||
|
||||
/**
|
||||
* blk_queue_unprep_rq - set an unprepare_request function for queue
|
||||
* @q: queue
|
||||
* @ufn: unprepare_request function
|
||||
*
|
||||
* It's possible for a queue to register an unprepare_request callback
|
||||
* which is invoked before the request is finally completed. The goal
|
||||
* of the function is to deallocate any data that was allocated in the
|
||||
* prepare_request callback.
|
||||
*
|
||||
*/
|
||||
void blk_queue_unprep_rq(struct request_queue *q, unprep_rq_fn *ufn)
|
||||
{
|
||||
q->unprep_rq_fn = ufn;
|
||||
}
|
||||
EXPORT_SYMBOL(blk_queue_unprep_rq);
|
||||
|
||||
void blk_queue_softirq_done(struct request_queue *q, softirq_done_fn *fn)
|
||||
{
|
||||
q->softirq_done_fn = fn;
|
||||
}
|
||||
EXPORT_SYMBOL(blk_queue_softirq_done);
|
||||
|
||||
void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout)
|
||||
{
|
||||
q->rq_timeout = timeout;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
|
||||
|
||||
void blk_queue_rq_timed_out(struct request_queue *q, rq_timed_out_fn *fn)
|
||||
{
|
||||
WARN_ON_ONCE(q->mq_ops);
|
||||
q->rq_timed_out_fn = fn;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_queue_rq_timed_out);
|
||||
|
||||
void blk_queue_lld_busy(struct request_queue *q, lld_busy_fn *fn)
|
||||
{
|
||||
q->lld_busy_fn = fn;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_queue_lld_busy);
|
||||
|
||||
/**
|
||||
* blk_set_default_limits - reset limits to default values
|
||||
* @lim: the queue_limits structure to reset
|
||||
|
@ -169,8 +116,6 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
|
|||
|
||||
q->make_request_fn = mfn;
|
||||
blk_queue_dma_alignment(q, 511);
|
||||
blk_queue_congestion_threshold(q);
|
||||
q->nr_batching = BLK_BATCH_REQ;
|
||||
|
||||
blk_set_default_limits(&q->limits);
|
||||
}
|
||||
|
@ -889,16 +834,14 @@ EXPORT_SYMBOL(blk_set_queue_depth);
|
|||
*/
|
||||
void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
|
||||
{
|
||||
spin_lock_irq(q->queue_lock);
|
||||
if (wc)
|
||||
queue_flag_set(QUEUE_FLAG_WC, q);
|
||||
blk_queue_flag_set(QUEUE_FLAG_WC, q);
|
||||
else
|
||||
queue_flag_clear(QUEUE_FLAG_WC, q);
|
||||
blk_queue_flag_clear(QUEUE_FLAG_WC, q);
|
||||
if (fua)
|
||||
queue_flag_set(QUEUE_FLAG_FUA, q);
|
||||
blk_queue_flag_set(QUEUE_FLAG_FUA, q);
|
||||
else
|
||||
queue_flag_clear(QUEUE_FLAG_FUA, q);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
blk_queue_flag_clear(QUEUE_FLAG_FUA, q);
|
||||
|
||||
wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
|
||||
}
|
||||
|
|
|
@ -34,7 +34,7 @@ static __latent_entropy void blk_done_softirq(struct softirq_action *h)
|
|||
|
||||
rq = list_entry(local_list.next, struct request, ipi_list);
|
||||
list_del_init(&rq->ipi_list);
|
||||
rq->q->softirq_done_fn(rq);
|
||||
rq->q->mq_ops->complete(rq);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -98,11 +98,11 @@ static int blk_softirq_cpu_dead(unsigned int cpu)
|
|||
void __blk_complete_request(struct request *req)
|
||||
{
|
||||
struct request_queue *q = req->q;
|
||||
int cpu, ccpu = q->mq_ops ? req->mq_ctx->cpu : req->cpu;
|
||||
int cpu, ccpu = req->mq_ctx->cpu;
|
||||
unsigned long flags;
|
||||
bool shared = false;
|
||||
|
||||
BUG_ON(!q->softirq_done_fn);
|
||||
BUG_ON(!q->mq_ops->complete);
|
||||
|
||||
local_irq_save(flags);
|
||||
cpu = smp_processor_id();
|
||||
|
@ -143,27 +143,6 @@ do_local:
|
|||
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
EXPORT_SYMBOL(__blk_complete_request);
|
||||
|
||||
/**
|
||||
* blk_complete_request - end I/O on a request
|
||||
* @req: the request being processed
|
||||
*
|
||||
* Description:
|
||||
* Ends all I/O on a request. It does not handle partial completions,
|
||||
* unless the driver actually implements this in its completion callback
|
||||
* through requeueing. The actual completion happens out-of-order,
|
||||
* through a softirq handler. The user must have registered a completion
|
||||
* callback through blk_queue_softirq_done().
|
||||
**/
|
||||
void blk_complete_request(struct request *req)
|
||||
{
|
||||
if (unlikely(blk_should_fake_timeout(req->q)))
|
||||
return;
|
||||
if (!blk_mark_rq_complete(req))
|
||||
__blk_complete_request(req);
|
||||
}
|
||||
EXPORT_SYMBOL(blk_complete_request);
|
||||
|
||||
static __init int blk_softirq_init(void)
|
||||
{
|
||||
|
|
|
@ -130,7 +130,6 @@ blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
|
|||
|
||||
return cb;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_stat_alloc_callback);
|
||||
|
||||
void blk_stat_add_callback(struct request_queue *q,
|
||||
struct blk_stat_callback *cb)
|
||||
|
@ -151,7 +150,6 @@ void blk_stat_add_callback(struct request_queue *q,
|
|||
blk_queue_flag_set(QUEUE_FLAG_STATS, q);
|
||||
spin_unlock(&q->stats->lock);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_stat_add_callback);
|
||||
|
||||
void blk_stat_remove_callback(struct request_queue *q,
|
||||
struct blk_stat_callback *cb)
|
||||
|
@ -164,7 +162,6 @@ void blk_stat_remove_callback(struct request_queue *q,
|
|||
|
||||
del_timer_sync(&cb->timer);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_stat_remove_callback);
|
||||
|
||||
static void blk_stat_free_callback_rcu(struct rcu_head *head)
|
||||
{
|
||||
|
@ -181,7 +178,6 @@ void blk_stat_free_callback(struct blk_stat_callback *cb)
|
|||
if (cb)
|
||||
call_rcu(&cb->rcu, blk_stat_free_callback_rcu);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_stat_free_callback);
|
||||
|
||||
void blk_stat_enable_accounting(struct request_queue *q)
|
||||
{
|
||||
|
|
|
@ -145,6 +145,11 @@ static inline void blk_stat_activate_nsecs(struct blk_stat_callback *cb,
|
|||
mod_timer(&cb->timer, jiffies + nsecs_to_jiffies(nsecs));
|
||||
}
|
||||
|
||||
static inline void blk_stat_deactivate(struct blk_stat_callback *cb)
|
||||
{
|
||||
del_timer_sync(&cb->timer);
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_stat_activate_msecs() - Gather block statistics during a time window in
|
||||
* milliseconds.
|
||||
|
|
|
@ -68,7 +68,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
|
|||
unsigned long nr;
|
||||
int ret, err;
|
||||
|
||||
if (!q->request_fn && !q->mq_ops)
|
||||
if (!queue_is_mq(q))
|
||||
return -EINVAL;
|
||||
|
||||
ret = queue_var_store(&nr, page, count);
|
||||
|
@ -78,11 +78,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
|
|||
if (nr < BLKDEV_MIN_RQ)
|
||||
nr = BLKDEV_MIN_RQ;
|
||||
|
||||
if (q->request_fn)
|
||||
err = blk_update_nr_requests(q, nr);
|
||||
else
|
||||
err = blk_mq_update_nr_requests(q, nr);
|
||||
|
||||
err = blk_mq_update_nr_requests(q, nr);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
|
@ -242,10 +238,10 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
|
|||
if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
|
||||
return -EINVAL;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
q->limits.max_sectors = max_sectors_kb << 1;
|
||||
q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
@ -320,14 +316,12 @@ static ssize_t queue_nomerges_store(struct request_queue *q, const char *page,
|
|||
if (ret < 0)
|
||||
return ret;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
queue_flag_clear(QUEUE_FLAG_NOMERGES, q);
|
||||
queue_flag_clear(QUEUE_FLAG_NOXMERGES, q);
|
||||
blk_queue_flag_clear(QUEUE_FLAG_NOMERGES, q);
|
||||
blk_queue_flag_clear(QUEUE_FLAG_NOXMERGES, q);
|
||||
if (nm == 2)
|
||||
queue_flag_set(QUEUE_FLAG_NOMERGES, q);
|
||||
blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q);
|
||||
else if (nm)
|
||||
queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
@ -351,18 +345,16 @@ queue_rq_affinity_store(struct request_queue *q, const char *page, size_t count)
|
|||
if (ret < 0)
|
||||
return ret;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
if (val == 2) {
|
||||
queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
|
||||
queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
|
||||
blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
|
||||
blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
|
||||
} else if (val == 1) {
|
||||
queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
|
||||
queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
|
||||
blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
|
||||
blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
|
||||
} else if (val == 0) {
|
||||
queue_flag_clear(QUEUE_FLAG_SAME_COMP, q);
|
||||
queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
|
||||
blk_queue_flag_clear(QUEUE_FLAG_SAME_COMP, q);
|
||||
blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
|
||||
}
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
#endif
|
||||
return ret;
|
||||
}
|
||||
|
@ -410,7 +402,8 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
|
|||
unsigned long poll_on;
|
||||
ssize_t ret;
|
||||
|
||||
if (!q->mq_ops || !q->mq_ops->poll)
|
||||
if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL ||
|
||||
!q->tag_set->map[HCTX_TYPE_POLL].nr_queues)
|
||||
return -EINVAL;
|
||||
|
||||
ret = queue_var_store(&poll_on, page, count);
|
||||
|
@ -425,6 +418,26 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
|
|||
return ret;
|
||||
}
|
||||
|
||||
static ssize_t queue_io_timeout_show(struct request_queue *q, char *page)
|
||||
{
|
||||
return sprintf(page, "%u\n", jiffies_to_msecs(q->rq_timeout));
|
||||
}
|
||||
|
||||
static ssize_t queue_io_timeout_store(struct request_queue *q, const char *page,
|
||||
size_t count)
|
||||
{
|
||||
unsigned int val;
|
||||
int err;
|
||||
|
||||
err = kstrtou32(page, 10, &val);
|
||||
if (err || val == 0)
|
||||
return -EINVAL;
|
||||
|
||||
blk_queue_rq_timeout(q, msecs_to_jiffies(val));
|
||||
|
||||
return count;
|
||||
}
|
||||
|
||||
static ssize_t queue_wb_lat_show(struct request_queue *q, char *page)
|
||||
{
|
||||
if (!wbt_rq_qos(q))
|
||||
|
@ -463,20 +476,14 @@ static ssize_t queue_wb_lat_store(struct request_queue *q, const char *page,
|
|||
* ends up either enabling or disabling wbt completely. We can't
|
||||
* have IO inflight if that happens.
|
||||
*/
|
||||
if (q->mq_ops) {
|
||||
blk_mq_freeze_queue(q);
|
||||
blk_mq_quiesce_queue(q);
|
||||
} else
|
||||
blk_queue_bypass_start(q);
|
||||
blk_mq_freeze_queue(q);
|
||||
blk_mq_quiesce_queue(q);
|
||||
|
||||
wbt_set_min_lat(q, val);
|
||||
wbt_update_limits(q);
|
||||
|
||||
if (q->mq_ops) {
|
||||
blk_mq_unquiesce_queue(q);
|
||||
blk_mq_unfreeze_queue(q);
|
||||
} else
|
||||
blk_queue_bypass_end(q);
|
||||
blk_mq_unquiesce_queue(q);
|
||||
blk_mq_unfreeze_queue(q);
|
||||
|
||||
return count;
|
||||
}
|
||||
|
@ -699,6 +706,12 @@ static struct queue_sysfs_entry queue_dax_entry = {
|
|||
.show = queue_dax_show,
|
||||
};
|
||||
|
||||
static struct queue_sysfs_entry queue_io_timeout_entry = {
|
||||
.attr = {.name = "io_timeout", .mode = 0644 },
|
||||
.show = queue_io_timeout_show,
|
||||
.store = queue_io_timeout_store,
|
||||
};
|
||||
|
||||
static struct queue_sysfs_entry queue_wb_lat_entry = {
|
||||
.attr = {.name = "wbt_lat_usec", .mode = 0644 },
|
||||
.show = queue_wb_lat_show,
|
||||
|
@ -748,6 +761,7 @@ static struct attribute *default_attrs[] = {
|
|||
&queue_dax_entry.attr,
|
||||
&queue_wb_lat_entry.attr,
|
||||
&queue_poll_delay_entry.attr,
|
||||
&queue_io_timeout_entry.attr,
|
||||
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
|
||||
&throtl_sample_time_entry.attr,
|
||||
#endif
|
||||
|
@ -847,24 +861,14 @@ static void __blk_release_queue(struct work_struct *work)
|
|||
|
||||
blk_free_queue_stats(q->stats);
|
||||
|
||||
blk_exit_rl(q, &q->root_rl);
|
||||
|
||||
if (q->queue_tags)
|
||||
__blk_queue_free_tags(q);
|
||||
|
||||
blk_queue_free_zone_bitmaps(q);
|
||||
|
||||
if (!q->mq_ops) {
|
||||
if (q->exit_rq_fn)
|
||||
q->exit_rq_fn(q, q->fq->flush_rq);
|
||||
blk_free_flush_queue(q->fq);
|
||||
} else {
|
||||
if (queue_is_mq(q))
|
||||
blk_mq_release(q);
|
||||
}
|
||||
|
||||
blk_trace_shutdown(q);
|
||||
|
||||
if (q->mq_ops)
|
||||
if (queue_is_mq(q))
|
||||
blk_mq_debugfs_unregister(q);
|
||||
|
||||
bioset_exit(&q->bio_split);
|
||||
|
@ -909,7 +913,7 @@ int blk_register_queue(struct gendisk *disk)
|
|||
WARN_ONCE(test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags),
|
||||
"%s is registering an already registered queue\n",
|
||||
kobject_name(&dev->kobj));
|
||||
queue_flag_set_unlocked(QUEUE_FLAG_REGISTERED, q);
|
||||
blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
|
||||
|
||||
/*
|
||||
* SCSI probing may synchronously create and destroy a lot of
|
||||
|
@ -921,9 +925,8 @@ int blk_register_queue(struct gendisk *disk)
|
|||
* request_queues for non-existent devices never get registered.
|
||||
*/
|
||||
if (!blk_queue_init_done(q)) {
|
||||
queue_flag_set_unlocked(QUEUE_FLAG_INIT_DONE, q);
|
||||
blk_queue_flag_set(QUEUE_FLAG_INIT_DONE, q);
|
||||
percpu_ref_switch_to_percpu(&q->q_usage_counter);
|
||||
blk_queue_bypass_end(q);
|
||||
}
|
||||
|
||||
ret = blk_trace_init_sysfs(dev);
|
||||
|
@ -939,7 +942,7 @@ int blk_register_queue(struct gendisk *disk)
|
|||
goto unlock;
|
||||
}
|
||||
|
||||
if (q->mq_ops) {
|
||||
if (queue_is_mq(q)) {
|
||||
__blk_mq_register_dev(dev, q);
|
||||
blk_mq_debugfs_register(q);
|
||||
}
|
||||
|
@ -950,7 +953,7 @@ int blk_register_queue(struct gendisk *disk)
|
|||
|
||||
blk_throtl_register_queue(q);
|
||||
|
||||
if (q->request_fn || (q->mq_ops && q->elevator)) {
|
||||
if (q->elevator) {
|
||||
ret = elv_register_queue(q);
|
||||
if (ret) {
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
|
@ -999,7 +1002,7 @@ void blk_unregister_queue(struct gendisk *disk)
|
|||
* Remove the sysfs attributes before unregistering the queue data
|
||||
* structures that can be modified through sysfs.
|
||||
*/
|
||||
if (q->mq_ops)
|
||||
if (queue_is_mq(q))
|
||||
blk_mq_unregister_dev(disk_to_dev(disk), q);
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
|
||||
|
@ -1008,7 +1011,7 @@ void blk_unregister_queue(struct gendisk *disk)
|
|||
blk_trace_remove_sysfs(disk_to_dev(disk));
|
||||
|
||||
mutex_lock(&q->sysfs_lock);
|
||||
if (q->request_fn || (q->mq_ops && q->elevator))
|
||||
if (q->elevator)
|
||||
elv_unregister_queue(q);
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
|
||||
|
|
[block/blk-tag.c: 378 deletions; the whole file is removed]
|
@ -1,378 +0,0 @@
|
|||
// SPDX-License-Identifier: GPL-2.0
|
||||
/*
|
||||
* Functions related to tagged command queuing
|
||||
*/
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/bio.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/slab.h>
|
||||
|
||||
#include "blk.h"
|
||||
|
||||
/**
|
||||
* blk_queue_find_tag - find a request by its tag and queue
|
||||
* @q: The request queue for the device
|
||||
* @tag: The tag of the request
|
||||
*
|
||||
* Notes:
|
||||
* Should be used when a device returns a tag and you want to match
|
||||
* it with a request.
|
||||
*
|
||||
* no locks need be held.
|
||||
**/
|
||||
struct request *blk_queue_find_tag(struct request_queue *q, int tag)
|
||||
{
|
||||
return blk_map_queue_find_tag(q->queue_tags, tag);
|
||||
}
|
||||
EXPORT_SYMBOL(blk_queue_find_tag);
|
||||
|
||||
/**
|
||||
* blk_free_tags - release a given set of tag maintenance info
|
||||
* @bqt: the tag map to free
|
||||
*
|
||||
* Drop the reference count on @bqt and frees it when the last reference
|
||||
* is dropped.
|
||||
*/
|
||||
void blk_free_tags(struct blk_queue_tag *bqt)
|
||||
{
|
||||
if (atomic_dec_and_test(&bqt->refcnt)) {
|
||||
BUG_ON(find_first_bit(bqt->tag_map, bqt->max_depth) <
|
||||
bqt->max_depth);
|
||||
|
||||
kfree(bqt->tag_index);
|
||||
bqt->tag_index = NULL;
|
||||
|
||||
kfree(bqt->tag_map);
|
||||
bqt->tag_map = NULL;
|
||||
|
||||
kfree(bqt);
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL(blk_free_tags);
|
||||
|
||||
/**
|
||||
* __blk_queue_free_tags - release tag maintenance info
|
||||
* @q: the request queue for the device
|
||||
*
|
||||
* Notes:
|
||||
* blk_cleanup_queue() will take care of calling this function, if tagging
|
||||
* has been used. So there's no need to call this directly.
|
||||
**/
|
||||
void __blk_queue_free_tags(struct request_queue *q)
|
||||
{
|
||||
struct blk_queue_tag *bqt = q->queue_tags;
|
||||
|
||||
if (!bqt)
|
||||
return;
|
||||
|
||||
blk_free_tags(bqt);
|
||||
|
||||
q->queue_tags = NULL;
|
||||
queue_flag_clear_unlocked(QUEUE_FLAG_QUEUED, q);
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_queue_free_tags - release tag maintenance info
|
||||
* @q: the request queue for the device
|
||||
*
|
||||
* Notes:
|
||||
* This is used to disable tagged queuing to a device, yet leave
|
||||
* queue in function.
|
||||
**/
|
||||
void blk_queue_free_tags(struct request_queue *q)
|
||||
{
|
||||
queue_flag_clear_unlocked(QUEUE_FLAG_QUEUED, q);
|
||||
}
|
||||
EXPORT_SYMBOL(blk_queue_free_tags);
|
||||
|
||||
static int
|
||||
init_tag_map(struct request_queue *q, struct blk_queue_tag *tags, int depth)
|
||||
{
|
||||
struct request **tag_index;
|
||||
unsigned long *tag_map;
|
||||
int nr_ulongs;
|
||||
|
||||
if (q && depth > q->nr_requests * 2) {
|
||||
depth = q->nr_requests * 2;
|
||||
printk(KERN_ERR "%s: adjusted depth to %d\n",
|
||||
__func__, depth);
|
||||
}
|
||||
|
||||
tag_index = kcalloc(depth, sizeof(struct request *), GFP_ATOMIC);
|
||||
if (!tag_index)
|
||||
goto fail;
|
||||
|
||||
nr_ulongs = ALIGN(depth, BITS_PER_LONG) / BITS_PER_LONG;
|
||||
tag_map = kcalloc(nr_ulongs, sizeof(unsigned long), GFP_ATOMIC);
|
||||
if (!tag_map)
|
||||
goto fail;
|
||||
|
||||
tags->real_max_depth = depth;
|
||||
tags->max_depth = depth;
|
||||
tags->tag_index = tag_index;
|
||||
tags->tag_map = tag_map;
|
||||
|
||||
return 0;
|
||||
fail:
|
||||
kfree(tag_index);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
static struct blk_queue_tag *__blk_queue_init_tags(struct request_queue *q,
|
||||
int depth, int alloc_policy)
|
||||
{
|
||||
struct blk_queue_tag *tags;
|
||||
|
||||
tags = kmalloc(sizeof(struct blk_queue_tag), GFP_ATOMIC);
|
||||
if (!tags)
|
||||
goto fail;
|
||||
|
||||
if (init_tag_map(q, tags, depth))
|
||||
goto fail;
|
||||
|
||||
atomic_set(&tags->refcnt, 1);
|
||||
tags->alloc_policy = alloc_policy;
|
||||
tags->next_tag = 0;
|
||||
return tags;
|
||||
fail:
|
||||
kfree(tags);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_init_tags - initialize the tag info for an external tag map
|
||||
* @depth: the maximum queue depth supported
|
||||
* @alloc_policy: tag allocation policy
|
||||
**/
|
||||
struct blk_queue_tag *blk_init_tags(int depth, int alloc_policy)
|
||||
{
|
||||
return __blk_queue_init_tags(NULL, depth, alloc_policy);
|
||||
}
|
||||
EXPORT_SYMBOL(blk_init_tags);
|
||||
|
||||
/**
|
||||
* blk_queue_init_tags - initialize the queue tag info
|
||||
* @q: the request queue for the device
|
||||
* @depth: the maximum queue depth supported
|
||||
* @tags: the tag to use
|
||||
* @alloc_policy: tag allocation policy
|
||||
*
|
||||
* Queue lock must be held here if the function is called to resize an
|
||||
* existing map.
|
||||
**/
|
||||
int blk_queue_init_tags(struct request_queue *q, int depth,
|
||||
struct blk_queue_tag *tags, int alloc_policy)
|
||||
{
|
||||
int rc;
|
||||
|
||||
BUG_ON(tags && q->queue_tags && tags != q->queue_tags);
|
||||
|
||||
if (!tags && !q->queue_tags) {
|
||||
tags = __blk_queue_init_tags(q, depth, alloc_policy);
|
||||
|
||||
if (!tags)
|
||||
return -ENOMEM;
|
||||
|
||||
} else if (q->queue_tags) {
|
||||
rc = blk_queue_resize_tags(q, depth);
|
||||
if (rc)
|
||||
return rc;
|
||||
queue_flag_set(QUEUE_FLAG_QUEUED, q);
|
||||
return 0;
|
||||
} else
|
||||
atomic_inc(&tags->refcnt);
|
||||
|
||||
/*
|
||||
* assign it, all done
|
||||
*/
|
||||
q->queue_tags = tags;
|
||||
queue_flag_set_unlocked(QUEUE_FLAG_QUEUED, q);
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL(blk_queue_init_tags);
|
||||
|
||||
/**
|
||||
* blk_queue_resize_tags - change the queueing depth
|
||||
* @q: the request queue for the device
|
||||
* @new_depth: the new max command queueing depth
|
||||
*
|
||||
* Notes:
|
||||
* Must be called with the queue lock held.
|
||||
**/
|
||||
int blk_queue_resize_tags(struct request_queue *q, int new_depth)
|
||||
{
|
||||
struct blk_queue_tag *bqt = q->queue_tags;
|
||||
struct request **tag_index;
|
||||
unsigned long *tag_map;
|
||||
int max_depth, nr_ulongs;
|
||||
|
||||
if (!bqt)
|
||||
return -ENXIO;
|
||||
|
||||
/*
|
||||
 * if we already have a large enough real_max_depth, just
|
||||
* adjust max_depth. *NOTE* as requests with tag value
|
||||
* between new_depth and real_max_depth can be in-flight, tag
|
||||
* map can not be shrunk blindly here.
|
||||
*/
|
||||
if (new_depth <= bqt->real_max_depth) {
|
||||
bqt->max_depth = new_depth;
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Currently cannot replace a shared tag map with a new
|
||||
* one, so error out if this is the case
|
||||
*/
|
||||
if (atomic_read(&bqt->refcnt) != 1)
|
||||
return -EBUSY;
|
||||
|
||||
/*
|
||||
* save the old state info, so we can copy it back
|
||||
*/
|
||||
tag_index = bqt->tag_index;
|
||||
tag_map = bqt->tag_map;
|
||||
max_depth = bqt->real_max_depth;
|
||||
|
||||
if (init_tag_map(q, bqt, new_depth))
|
||||
return -ENOMEM;
|
||||
|
||||
memcpy(bqt->tag_index, tag_index, max_depth * sizeof(struct request *));
|
||||
nr_ulongs = ALIGN(max_depth, BITS_PER_LONG) / BITS_PER_LONG;
|
||||
memcpy(bqt->tag_map, tag_map, nr_ulongs * sizeof(unsigned long));
|
||||
|
||||
kfree(tag_index);
|
||||
kfree(tag_map);
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL(blk_queue_resize_tags);
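blk_queue_resize_tags() above only ever grows the map: when a larger depth is needed it allocates a fresh index array and bitmap, copies the old contents so any in-flight tags stay valid, and then frees the old arrays. A hedged userspace sketch of that grow-and-copy step follows; the struct layout and the tag_map_resize() name are inventions for illustration, not kernel API.

/*
 * Sketch of the grow-only resize in blk_queue_resize_tags(): allocate
 * new arrays, copy the old state, free the old arrays.  Never shrinks,
 * because tags above the new depth may still be in flight.
 */
#include <stdlib.h>
#include <string.h>
#include <limits.h>

#define BITS_PER_LONG   (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS(d) (((size_t)(d) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct tag_map {
	int depth;
	void **tag_index;
	unsigned long *bits;
};

int tag_map_resize(struct tag_map *t, int new_depth)
{
	void **old_index = t->tag_index;
	unsigned long *old_bits = t->bits;
	int old_depth = t->depth;

	if (new_depth <= old_depth)
		return 0;                       /* grow only */

	t->tag_index = calloc(new_depth, sizeof(void *));
	t->bits = calloc(BITMAP_WORDS(new_depth), sizeof(unsigned long));
	if (!t->tag_index || !t->bits) {
		free(t->tag_index);
		free(t->bits);
		t->tag_index = old_index;       /* roll back on failure */
		t->bits = old_bits;
		return -1;
	}

	/* preserve busy tags: copy the owner table and the old bitmap */
	memcpy(t->tag_index, old_index, old_depth * sizeof(void *));
	memcpy(t->bits, old_bits, BITMAP_WORDS(old_depth) * sizeof(unsigned long));
	t->depth = new_depth;

	free(old_index);
	free(old_bits);
	return 0;
}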
|
||||
|
||||
/**
|
||||
* blk_queue_end_tag - end tag operations for a request
|
||||
* @q: the request queue for the device
|
||||
* @rq: the request that has completed
|
||||
*
|
||||
* Description:
|
||||
* Typically called when end_that_request_first() returns %0, meaning
|
||||
* all transfers have been done for a request. It's important to call
|
||||
* this function before end_that_request_last(), as that will put the
|
||||
* request back on the free list thus corrupting the internal tag list.
|
||||
**/
|
||||
void blk_queue_end_tag(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
struct blk_queue_tag *bqt = q->queue_tags;
|
||||
unsigned tag = rq->tag; /* negative tags invalid */
|
||||
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
|
||||
BUG_ON(tag >= bqt->real_max_depth);
|
||||
|
||||
list_del_init(&rq->queuelist);
|
||||
rq->rq_flags &= ~RQF_QUEUED;
|
||||
rq->tag = -1;
|
||||
rq->internal_tag = -1;
|
||||
|
||||
if (unlikely(bqt->tag_index[tag] == NULL))
|
||||
printk(KERN_ERR "%s: tag %d is missing\n",
|
||||
__func__, tag);
|
||||
|
||||
bqt->tag_index[tag] = NULL;
|
||||
|
||||
if (unlikely(!test_bit(tag, bqt->tag_map))) {
|
||||
printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
|
||||
__func__, tag);
|
||||
return;
|
||||
}
|
||||
/*
|
||||
* The tag_map bit acts as a lock for tag_index[bit], so we need
|
||||
* unlock memory barrier semantics.
|
||||
*/
|
||||
clear_bit_unlock(tag, bqt->tag_map);
|
||||
}
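The comment above notes that the bit in tag_map acts as a lock for the matching tag_index[] slot, which is why clearing it uses clear_bit_unlock() (release semantics) while the allocation side in blk_queue_start_tag() uses test_and_set_bit_lock() (acquire semantics). Below is a minimal userspace sketch of the same pattern with C11 atomics; the single-word map and the get_tag()/put_tag() names are inventions for illustration.

/*
 * "Bitmap bit as per-slot lock": acquire on set, release on clear, so
 * the store to tag_index[] is published before the bit is freed.
 * C11-atomics sketch of test_and_set_bit_lock()/clear_bit_unlock().
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TAGS 64

static _Atomic unsigned long tag_bits;     /* bit i set => tag i busy */
static void *tag_index[MAX_TAGS];          /* slot guarded by bit i   */

/* Try to claim @tag; acquire ordering pairs with put_tag()'s release. */
static bool get_tag(unsigned int tag, void *owner)
{
	unsigned long bit = 1UL << tag;

	if (atomic_fetch_or_explicit(&tag_bits, bit, memory_order_acquire) & bit)
		return false;                  /* already busy */
	tag_index[tag] = owner;                /* safe: we own the bit */
	return true;
}

/* Release @tag; the release store publishes the tag_index[] update. */
static void put_tag(unsigned int tag)
{
	unsigned long bit = 1UL << tag;

	tag_index[tag] = NULL;
	atomic_fetch_and_explicit(&tag_bits, ~bit, memory_order_release);
}

int main(void)
{
	int rq;

	if (get_tag(3, &rq))
		put_tag(3);
	return 0;
}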
|
||||
|
||||
/**
|
||||
* blk_queue_start_tag - find a free tag and assign it
|
||||
* @q: the request queue for the device
|
||||
* @rq: the block request that needs tagging
|
||||
*
|
||||
* Description:
|
||||
* This can either be used as a stand-alone helper, or possibly be
|
||||
* assigned as the queue &prep_rq_fn (in which case &struct request
|
||||
* automagically gets a tag assigned). Note that this function
|
||||
 * assumes that any type of request can be queued! If this is not
|
||||
* true for your device, you must check the request type before
|
||||
* calling this function. The request will also be removed from
|
||||
 * the request queue, so it's the driver's responsibility to re-add
|
||||
* it if it should need to be restarted for some reason.
|
||||
**/
|
||||
int blk_queue_start_tag(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
struct blk_queue_tag *bqt = q->queue_tags;
|
||||
unsigned max_depth;
|
||||
int tag;
|
||||
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
|
||||
if (unlikely((rq->rq_flags & RQF_QUEUED))) {
|
||||
printk(KERN_ERR
|
||||
"%s: request %p for device [%s] already tagged %d",
|
||||
__func__, rq,
|
||||
rq->rq_disk ? rq->rq_disk->disk_name : "?", rq->tag);
|
||||
BUG();
|
||||
}
|
||||
|
||||
/*
|
||||
* Protect against shared tag maps, as we may not have exclusive
|
||||
* access to the tag map.
|
||||
*
|
||||
* We reserve a few tags just for sync IO, since we don't want
|
||||
* to starve sync IO on behalf of flooding async IO.
|
||||
*/
|
||||
max_depth = bqt->max_depth;
|
||||
if (!rq_is_sync(rq) && max_depth > 1) {
|
||||
switch (max_depth) {
|
||||
case 2:
|
||||
max_depth = 1;
|
||||
break;
|
||||
case 3:
|
||||
max_depth = 2;
|
||||
break;
|
||||
default:
|
||||
max_depth -= 2;
|
||||
}
|
||||
if (q->in_flight[BLK_RW_ASYNC] > max_depth)
|
||||
return 1;
|
||||
}
|
||||
|
||||
do {
|
||||
if (bqt->alloc_policy == BLK_TAG_ALLOC_FIFO) {
|
||||
tag = find_first_zero_bit(bqt->tag_map, max_depth);
|
||||
if (tag >= max_depth)
|
||||
return 1;
|
||||
} else {
|
||||
int start = bqt->next_tag;
|
||||
int size = min_t(int, bqt->max_depth, max_depth + start);
|
||||
tag = find_next_zero_bit(bqt->tag_map, size, start);
|
||||
if (tag >= size && start + size > bqt->max_depth) {
|
||||
size = start + size - bqt->max_depth;
|
||||
tag = find_first_zero_bit(bqt->tag_map, size);
|
||||
}
|
||||
if (tag >= size)
|
||||
return 1;
|
||||
}
|
||||
|
||||
} while (test_and_set_bit_lock(tag, bqt->tag_map));
|
||||
/*
|
||||
* We need lock ordering semantics given by test_and_set_bit_lock.
|
||||
* See blk_queue_end_tag for details.
|
||||
*/
|
||||
|
||||
bqt->next_tag = (tag + 1) % bqt->max_depth;
|
||||
rq->rq_flags |= RQF_QUEUED;
|
||||
rq->tag = tag;
|
||||
bqt->tag_index[tag] = rq;
|
||||
blk_start_request(rq);
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL(blk_queue_start_tag);
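For BLK_TAG_ALLOC_RR the loop above resumes the search at next_tag, scans to the end of the map, and wraps back to the front before giving up. A plain-C sketch of just that wrap-around scan (no atomics; find_free_rr() is an invented helper) is shown below.

/*
 * Sketch of the round-robin free-tag search in blk_queue_start_tag():
 * scan from the last hand-out position to the end, then wrap to the
 * front.  Plain userspace C; not the kernel bitmap helpers.
 */
#include <stdbool.h>

/*
 * Return a free index in used[0..depth-1], starting the scan at *next,
 * or -1 if everything is busy.  On success, advance *next past the
 * returned index so the next caller continues round-robin.
 */
int find_free_rr(const bool *used, int depth, int *next)
{
	int start = *next;
	int i;

	for (i = start; i < depth; i++)        /* from the hand to the end */
		if (!used[i])
			goto found;
	for (i = 0; i < start; i++)            /* wrap to the front */
		if (!used[i])
			goto found;
	return -1;
found:
	*next = (i + 1) % depth;
	return i;
}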
|
|
@ -1243,7 +1243,7 @@ static void throtl_pending_timer_fn(struct timer_list *t)
|
|||
bool dispatched;
|
||||
int ret;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
if (throtl_can_upgrade(td, NULL))
|
||||
throtl_upgrade_state(td);
|
||||
|
||||
|
@ -1266,9 +1266,9 @@ again:
|
|||
break;
|
||||
|
||||
/* this dispatch window is still open, relax and repeat */
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
cpu_relax();
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
}
|
||||
|
||||
if (!dispatched)
|
||||
|
@ -1290,7 +1290,7 @@ again:
|
|||
queue_work(kthrotld_workqueue, &td->dispatch_work);
|
||||
}
|
||||
out_unlock:
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -1314,11 +1314,11 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
|
|||
|
||||
bio_list_init(&bio_list_on_stack);
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
for (rw = READ; rw <= WRITE; rw++)
|
||||
while ((bio = throtl_pop_queued(&td_sq->queued[rw], NULL)))
|
||||
bio_list_add(&bio_list_on_stack, bio);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
|
||||
if (!bio_list_empty(&bio_list_on_stack)) {
|
||||
blk_start_plug(&plug);
|
||||
|
@ -2115,16 +2115,6 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
|
|||
}
|
||||
#endif
|
||||
|
||||
static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
|
||||
{
|
||||
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
|
||||
/* fallback to root_blkg if we fail to get a blkg ref */
|
||||
if (bio->bi_css && (bio_associate_blkg(bio, tg_to_blkg(tg)) == -ENODEV))
|
||||
bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);
|
||||
bio_issue_init(&bio->bi_issue, bio_sectors(bio));
|
||||
#endif
|
||||
}
|
||||
|
||||
bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
|
||||
struct bio *bio)
|
||||
{
|
||||
|
@ -2141,14 +2131,10 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
|
|||
if (bio_flagged(bio, BIO_THROTTLED) || !tg->has_rules[rw])
|
||||
goto out;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
|
||||
throtl_update_latency_buckets(td);
|
||||
|
||||
if (unlikely(blk_queue_bypass(q)))
|
||||
goto out_unlock;
|
||||
|
||||
blk_throtl_assoc_bio(tg, bio);
|
||||
blk_throtl_update_idletime(tg);
|
||||
|
||||
sq = &tg->service_queue;
|
||||
|
@ -2227,7 +2213,7 @@ again:
|
|||
}
|
||||
|
||||
out_unlock:
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
out:
|
||||
bio_set_flag(bio, BIO_THROTTLED);
|
||||
|
||||
|
@ -2348,7 +2334,7 @@ static void tg_drain_bios(struct throtl_service_queue *parent_sq)
|
|||
* Dispatch all currently throttled bios on @q through ->make_request_fn().
|
||||
*/
|
||||
void blk_throtl_drain(struct request_queue *q)
|
||||
__releases(q->queue_lock) __acquires(q->queue_lock)
|
||||
__releases(&q->queue_lock) __acquires(&q->queue_lock)
|
||||
{
|
||||
struct throtl_data *td = q->td;
|
||||
struct blkcg_gq *blkg;
|
||||
|
@ -2356,7 +2342,6 @@ void blk_throtl_drain(struct request_queue *q)
|
|||
struct bio *bio;
|
||||
int rw;
|
||||
|
||||
queue_lockdep_assert_held(q);
|
||||
rcu_read_lock();
|
||||
|
||||
/*
|
||||
|
@ -2372,7 +2357,7 @@ void blk_throtl_drain(struct request_queue *q)
|
|||
tg_drain_bios(&td->service_queue);
|
||||
|
||||
rcu_read_unlock();
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
|
||||
/* all bios now should be in td->service_queue, issue them */
|
||||
for (rw = READ; rw <= WRITE; rw++)
|
||||
|
@ -2380,7 +2365,7 @@ void blk_throtl_drain(struct request_queue *q)
|
|||
NULL)))
|
||||
generic_make_request(bio);
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
}
|
||||
|
||||
int blk_throtl_init(struct request_queue *q)
|
||||
|
@ -2460,7 +2445,7 @@ void blk_throtl_register_queue(struct request_queue *q)
|
|||
td->throtl_slice = DFL_THROTL_SLICE_HD;
|
||||
#endif
|
||||
|
||||
td->track_bio_latency = !queue_is_rq_based(q);
|
||||
td->track_bio_latency = !queue_is_mq(q);
|
||||
if (!td->track_bio_latency)
|
||||
blk_stat_enable_accounting(q);
|
||||
}
|
||||
|
|
|
@ -68,80 +68,6 @@ ssize_t part_timeout_store(struct device *dev, struct device_attribute *attr,
|
|||
|
||||
#endif /* CONFIG_FAIL_IO_TIMEOUT */
|
||||
|
||||
/*
|
||||
* blk_delete_timer - Delete/cancel timer for a given function.
|
||||
* @req: request that we are canceling timer for
|
||||
*
|
||||
*/
|
||||
void blk_delete_timer(struct request *req)
|
||||
{
|
||||
list_del_init(&req->timeout_list);
|
||||
}
|
||||
|
||||
static void blk_rq_timed_out(struct request *req)
|
||||
{
|
||||
struct request_queue *q = req->q;
|
||||
enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
|
||||
|
||||
if (q->rq_timed_out_fn)
|
||||
ret = q->rq_timed_out_fn(req);
|
||||
switch (ret) {
|
||||
case BLK_EH_RESET_TIMER:
|
||||
blk_add_timer(req);
|
||||
blk_clear_rq_complete(req);
|
||||
break;
|
||||
case BLK_EH_DONE:
|
||||
/*
|
||||
* LLD handles this for now but in the future
|
||||
* we can send a request msg to abort the command
|
||||
* and we can move more of the generic scsi eh code to
|
||||
* the blk layer.
|
||||
*/
|
||||
break;
|
||||
default:
|
||||
printk(KERN_ERR "block: bad eh return: %d\n", ret);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
|
||||
unsigned int *next_set)
|
||||
{
|
||||
const unsigned long deadline = blk_rq_deadline(rq);
|
||||
|
||||
if (time_after_eq(jiffies, deadline)) {
|
||||
list_del_init(&rq->timeout_list);
|
||||
|
||||
/*
|
||||
* Check if we raced with end io completion
|
||||
*/
|
||||
if (!blk_mark_rq_complete(rq))
|
||||
blk_rq_timed_out(rq);
|
||||
} else if (!*next_set || time_after(*next_timeout, deadline)) {
|
||||
*next_timeout = deadline;
|
||||
*next_set = 1;
|
||||
}
|
||||
}
|
||||
|
||||
void blk_timeout_work(struct work_struct *work)
|
||||
{
|
||||
struct request_queue *q =
|
||||
container_of(work, struct request_queue, timeout_work);
|
||||
unsigned long flags, next = 0;
|
||||
struct request *rq, *tmp;
|
||||
int next_set = 0;
|
||||
|
||||
spin_lock_irqsave(q->queue_lock, flags);
|
||||
|
||||
list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
|
||||
blk_rq_check_expired(rq, &next, &next_set);
|
||||
|
||||
if (next_set)
|
||||
mod_timer(&q->timeout, round_jiffies_up(next));
|
||||
|
||||
spin_unlock_irqrestore(q->queue_lock, flags);
|
||||
}
|
||||
|
||||
/**
|
||||
 * blk_abort_request -- Request recovery for the specified command
|
||||
* @req: pointer to the request of interest
|
||||
|
@ -149,24 +75,17 @@ void blk_timeout_work(struct work_struct *work)
|
|||
* This function requests that the block layer start recovery for the
|
||||
* request by deleting the timer and calling the q's timeout function.
|
||||
* LLDDs who implement their own error recovery MAY ignore the timeout
|
||||
* event if they generated blk_abort_req. Must hold queue lock.
|
||||
* event if they generated blk_abort_request.
|
||||
*/
|
||||
void blk_abort_request(struct request *req)
|
||||
{
|
||||
if (req->q->mq_ops) {
|
||||
/*
|
||||
* All we need to ensure is that timeout scan takes place
|
||||
* immediately and that scan sees the new timeout value.
|
||||
* No need for fancy synchronizations.
|
||||
*/
|
||||
blk_rq_set_deadline(req, jiffies);
|
||||
kblockd_schedule_work(&req->q->timeout_work);
|
||||
} else {
|
||||
if (blk_mark_rq_complete(req))
|
||||
return;
|
||||
blk_delete_timer(req);
|
||||
blk_rq_timed_out(req);
|
||||
}
|
||||
/*
|
||||
* All we need to ensure is that timeout scan takes place
|
||||
* immediately and that scan sees the new timeout value.
|
||||
* No need for fancy synchronizations.
|
||||
*/
|
||||
WRITE_ONCE(req->deadline, jiffies);
|
||||
kblockd_schedule_work(&req->q->timeout_work);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_abort_request);
|
||||
|
||||
|
@ -194,15 +113,6 @@ void blk_add_timer(struct request *req)
|
|||
struct request_queue *q = req->q;
|
||||
unsigned long expiry;
|
||||
|
||||
if (!q->mq_ops)
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
|
||||
/* blk-mq has its own handler, so we don't need ->rq_timed_out_fn */
|
||||
if (!q->mq_ops && !q->rq_timed_out_fn)
|
||||
return;
|
||||
|
||||
BUG_ON(!list_empty(&req->timeout_list));
|
||||
|
||||
/*
|
||||
* Some LLDs, like scsi, peek at the timeout to prevent a
|
||||
* command from being retried forever.
|
||||
|
@ -211,21 +121,16 @@ void blk_add_timer(struct request *req)
|
|||
req->timeout = q->rq_timeout;
|
||||
|
||||
req->rq_flags &= ~RQF_TIMED_OUT;
|
||||
blk_rq_set_deadline(req, jiffies + req->timeout);
|
||||
|
||||
/*
|
||||
* Only the non-mq case needs to add the request to a protected list.
|
||||
* For the mq case we simply scan the tag map.
|
||||
*/
|
||||
if (!q->mq_ops)
|
||||
list_add_tail(&req->timeout_list, &req->q->timeout_list);
|
||||
expiry = jiffies + req->timeout;
|
||||
WRITE_ONCE(req->deadline, expiry);
|
||||
|
||||
/*
|
||||
* If the timer isn't already pending or this timeout is earlier
|
||||
* than an existing one, modify the timer. Round up to next nearest
|
||||
* second.
|
||||
*/
|
||||
expiry = blk_rq_timeout(round_jiffies_up(blk_rq_deadline(req)));
|
||||
expiry = blk_rq_timeout(round_jiffies_up(expiry));
|
||||
|
||||
if (!timer_pending(&q->timeout) ||
|
||||
time_before(expiry, q->timeout.expires)) {
|
||||
|
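blk_add_timer() above stores the raw deadline (jiffies + req->timeout) in the request but arms the queue timer on a coarser value: round_jiffies_up() pushes the expiry to the next second and blk_rq_timeout() caps how far out it may land, so many requests can share one timer. A rough sketch of that arithmetic with made-up constants and helper names:

/*
 * Sketch of the expiry arithmetic in blk_add_timer(): round a deadline
 * up to a coarse granularity so nearby timeouts share a timer, then
 * cap how far into the future it may land.  Names are illustrative.
 */
#define TICK_HZ     250UL           /* pretend jiffies-per-second     */
#define MAX_TIMEOUT (5 * TICK_HZ)   /* analogue of BLK_MAX_TIMEOUT    */

unsigned long round_up_to(unsigned long t, unsigned long gran)
{
	return ((t + gran - 1) / gran) * gran;
}

unsigned long timer_expiry(unsigned long now, unsigned long timeout)
{
	unsigned long deadline = now + timeout;
	unsigned long expiry = round_up_to(deadline, TICK_HZ);

	/* never arm the timer further out than the allowed maximum */
	if (expiry - now > MAX_TIMEOUT)
		expiry = now + MAX_TIMEOUT;
	return expiry;
}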
|
block/blk-wbt.c
|
@ -489,31 +489,21 @@ static inline unsigned int get_limit(struct rq_wb *rwb, unsigned long rw)
|
|||
}
|
||||
|
||||
struct wbt_wait_data {
|
||||
struct wait_queue_entry wq;
|
||||
struct task_struct *task;
|
||||
struct rq_wb *rwb;
|
||||
struct rq_wait *rqw;
|
||||
enum wbt_flags wb_acct;
|
||||
unsigned long rw;
|
||||
bool got_token;
|
||||
};
|
||||
|
||||
static int wbt_wake_function(struct wait_queue_entry *curr, unsigned int mode,
|
||||
int wake_flags, void *key)
|
||||
static bool wbt_inflight_cb(struct rq_wait *rqw, void *private_data)
|
||||
{
|
||||
struct wbt_wait_data *data = container_of(curr, struct wbt_wait_data,
|
||||
wq);
|
||||
struct wbt_wait_data *data = private_data;
|
||||
return rq_wait_inc_below(rqw, get_limit(data->rwb, data->rw));
|
||||
}
|
||||
|
||||
/*
|
||||
* If we fail to get a budget, return -1 to interrupt the wake up
|
||||
* loop in __wake_up_common.
|
||||
*/
|
||||
if (!rq_wait_inc_below(data->rqw, get_limit(data->rwb, data->rw)))
|
||||
return -1;
|
||||
|
||||
data->got_token = true;
|
||||
list_del_init(&curr->entry);
|
||||
wake_up_process(data->task);
|
||||
return 1;
|
||||
static void wbt_cleanup_cb(struct rq_wait *rqw, void *private_data)
|
||||
{
|
||||
struct wbt_wait_data *data = private_data;
|
||||
wbt_rqw_done(data->rwb, rqw, data->wb_acct);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -521,57 +511,16 @@ static int wbt_wake_function(struct wait_queue_entry *curr, unsigned int mode,
|
|||
* the timer to kick off queuing again.
|
||||
*/
|
||||
static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
|
||||
unsigned long rw, spinlock_t *lock)
|
||||
__releases(lock)
|
||||
__acquires(lock)
|
||||
unsigned long rw)
|
||||
{
|
||||
struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
|
||||
struct wbt_wait_data data = {
|
||||
.wq = {
|
||||
.func = wbt_wake_function,
|
||||
.entry = LIST_HEAD_INIT(data.wq.entry),
|
||||
},
|
||||
.task = current,
|
||||
.rwb = rwb,
|
||||
.rqw = rqw,
|
||||
.wb_acct = wb_acct,
|
||||
.rw = rw,
|
||||
};
|
||||
bool has_sleeper;
|
||||
|
||||
has_sleeper = wq_has_sleeper(&rqw->wait);
|
||||
if (!has_sleeper && rq_wait_inc_below(rqw, get_limit(rwb, rw)))
|
||||
return;
|
||||
|
||||
prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
|
||||
do {
|
||||
if (data.got_token)
|
||||
break;
|
||||
|
||||
if (!has_sleeper &&
|
||||
rq_wait_inc_below(rqw, get_limit(rwb, rw))) {
|
||||
finish_wait(&rqw->wait, &data.wq);
|
||||
|
||||
/*
|
||||
* We raced with wbt_wake_function() getting a token,
|
||||
* which means we now have two. Put our local token
|
||||
* and wake anyone else potentially waiting for one.
|
||||
*/
|
||||
if (data.got_token)
|
||||
wbt_rqw_done(rwb, rqw, wb_acct);
|
||||
break;
|
||||
}
|
||||
|
||||
if (lock) {
|
||||
spin_unlock_irq(lock);
|
||||
io_schedule();
|
||||
spin_lock_irq(lock);
|
||||
} else
|
||||
io_schedule();
|
||||
|
||||
has_sleeper = false;
|
||||
} while (1);
|
||||
|
||||
finish_wait(&rqw->wait, &data.wq);
|
||||
rq_qos_wait(rqw, &data, wbt_inflight_cb, wbt_cleanup_cb);
|
||||
}
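The rewritten __wbt_wait() above delegates its sleep loop to rq_qos_wait(), passing an acquire callback (wbt_inflight_cb, which tries to take an inflight budget) and a cleanup callback (wbt_cleanup_cb, which returns a budget that is no longer wanted). Below is a hedged pthread-based analogy of that callback-driven wait pattern; it is not the kernel implementation, which is built on lockless wait queues, and all names in it are invented.

/*
 * Userspace analogy of the rq_qos_wait() pattern used by __wbt_wait():
 * the caller supplies an acquire callback that tries to take a budget
 * and a cleanup callback for handing one back.  pthread sketch only.
 */
#include <pthread.h>
#include <stdbool.h>

struct budget_waiter {
	pthread_mutex_t lock;
	pthread_cond_t  wait;
	int inflight;
	int limit;
};

typedef bool (*acquire_fn)(struct budget_waiter *bw, void *data);
typedef void (*cleanup_fn)(struct budget_waiter *bw, void *data);

/*
 * Block until @acquire succeeds.  @cleanup is unused in this simplified
 * loop; it is kept only to mirror the rq_qos_wait() signature.
 */
void budget_wait(struct budget_waiter *bw, void *data,
		 acquire_fn acquire, cleanup_fn cleanup)
{
	(void)cleanup;
	pthread_mutex_lock(&bw->lock);
	while (!acquire(bw, data))
		pthread_cond_wait(&bw->wait, &bw->lock);
	pthread_mutex_unlock(&bw->lock);
}

/* acquire callback: take a budget if we are under the limit */
bool take_inflight(struct budget_waiter *bw, void *data)
{
	(void)data;
	if (bw->inflight >= bw->limit)
		return false;
	bw->inflight++;
	return true;
}

/* completion side: return a budget and wake one waiter */
void budget_done(struct budget_waiter *bw)
{
	pthread_mutex_lock(&bw->lock);
	bw->inflight--;
	pthread_cond_signal(&bw->wait);
	pthread_mutex_unlock(&bw->lock);
}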
|
||||
|
||||
static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)
|
||||
|
@ -624,7 +573,7 @@ static void wbt_cleanup(struct rq_qos *rqos, struct bio *bio)
|
|||
* in an irq held spinlock, if it holds one when calling this function.
|
||||
* If we do sleep, we'll release and re-grab it.
|
||||
*/
|
||||
static void wbt_wait(struct rq_qos *rqos, struct bio *bio, spinlock_t *lock)
|
||||
static void wbt_wait(struct rq_qos *rqos, struct bio *bio)
|
||||
{
|
||||
struct rq_wb *rwb = RQWB(rqos);
|
||||
enum wbt_flags flags;
|
||||
|
@ -636,7 +585,7 @@ static void wbt_wait(struct rq_qos *rqos, struct bio *bio, spinlock_t *lock)
|
|||
return;
|
||||
}
|
||||
|
||||
__wbt_wait(rwb, flags, bio->bi_opf, lock);
|
||||
__wbt_wait(rwb, flags, bio->bi_opf);
|
||||
|
||||
if (!blk_stat_is_active(rwb->cb))
|
||||
rwb_arm_timer(rwb);
|
||||
|
@ -709,8 +658,7 @@ void wbt_enable_default(struct request_queue *q)
|
|||
if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
|
||||
return;
|
||||
|
||||
if ((q->mq_ops && IS_ENABLED(CONFIG_BLK_WBT_MQ)) ||
|
||||
(q->request_fn && IS_ENABLED(CONFIG_BLK_WBT_SQ)))
|
||||
if (queue_is_mq(q) && IS_ENABLED(CONFIG_BLK_WBT_MQ))
|
||||
wbt_init(q);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(wbt_enable_default);
|
||||
|
@ -760,11 +708,100 @@ void wbt_disable_default(struct request_queue *q)
|
|||
if (!rqos)
|
||||
return;
|
||||
rwb = RQWB(rqos);
|
||||
if (rwb->enable_state == WBT_STATE_ON_DEFAULT)
|
||||
if (rwb->enable_state == WBT_STATE_ON_DEFAULT) {
|
||||
blk_stat_deactivate(rwb->cb);
|
||||
rwb->wb_normal = 0;
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(wbt_disable_default);
|
||||
|
||||
#ifdef CONFIG_BLK_DEBUG_FS
|
||||
static int wbt_curr_win_nsec_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct rq_qos *rqos = data;
|
||||
struct rq_wb *rwb = RQWB(rqos);
|
||||
|
||||
seq_printf(m, "%llu\n", rwb->cur_win_nsec);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int wbt_enabled_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct rq_qos *rqos = data;
|
||||
struct rq_wb *rwb = RQWB(rqos);
|
||||
|
||||
seq_printf(m, "%d\n", rwb->enable_state);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int wbt_id_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct rq_qos *rqos = data;
|
||||
|
||||
seq_printf(m, "%u\n", rqos->id);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int wbt_inflight_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct rq_qos *rqos = data;
|
||||
struct rq_wb *rwb = RQWB(rqos);
|
||||
int i;
|
||||
|
||||
for (i = 0; i < WBT_NUM_RWQ; i++)
|
||||
seq_printf(m, "%d: inflight %d\n", i,
|
||||
atomic_read(&rwb->rq_wait[i].inflight));
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int wbt_min_lat_nsec_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct rq_qos *rqos = data;
|
||||
struct rq_wb *rwb = RQWB(rqos);
|
||||
|
||||
seq_printf(m, "%lu\n", rwb->min_lat_nsec);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int wbt_unknown_cnt_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct rq_qos *rqos = data;
|
||||
struct rq_wb *rwb = RQWB(rqos);
|
||||
|
||||
seq_printf(m, "%u\n", rwb->unknown_cnt);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int wbt_normal_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct rq_qos *rqos = data;
|
||||
struct rq_wb *rwb = RQWB(rqos);
|
||||
|
||||
seq_printf(m, "%u\n", rwb->wb_normal);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int wbt_background_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct rq_qos *rqos = data;
|
||||
struct rq_wb *rwb = RQWB(rqos);
|
||||
|
||||
seq_printf(m, "%u\n", rwb->wb_background);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static const struct blk_mq_debugfs_attr wbt_debugfs_attrs[] = {
|
||||
{"curr_win_nsec", 0400, wbt_curr_win_nsec_show},
|
||||
{"enabled", 0400, wbt_enabled_show},
|
||||
{"id", 0400, wbt_id_show},
|
||||
{"inflight", 0400, wbt_inflight_show},
|
||||
{"min_lat_nsec", 0400, wbt_min_lat_nsec_show},
|
||||
{"unknown_cnt", 0400, wbt_unknown_cnt_show},
|
||||
{"wb_normal", 0400, wbt_normal_show},
|
||||
{"wb_background", 0400, wbt_background_show},
|
||||
{},
|
||||
};
|
||||
#endif
|
||||
|
||||
static struct rq_qos_ops wbt_rqos_ops = {
|
||||
.throttle = wbt_wait,
|
||||
|
@ -774,6 +811,9 @@ static struct rq_qos_ops wbt_rqos_ops = {
|
|||
.done = wbt_done,
|
||||
.cleanup = wbt_cleanup,
|
||||
.exit = wbt_exit,
|
||||
#ifdef CONFIG_BLK_DEBUG_FS
|
||||
.debugfs_attrs = wbt_debugfs_attrs,
|
||||
#endif
|
||||
};
|
||||
|
||||
int wbt_init(struct request_queue *q)
|
||||
|
|
|
@ -421,7 +421,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
|
|||
* BIO based queues do not use a scheduler so only q->nr_zones
|
||||
* needs to be updated so that the sysfs exposed value is correct.
|
||||
*/
|
||||
if (!queue_is_rq_based(q)) {
|
||||
if (!queue_is_mq(q)) {
|
||||
q->nr_zones = nr_zones;
|
||||
return 0;
|
||||
}
|
||||
|
|
block/blk.h
|
@ -7,12 +7,6 @@
|
|||
#include <xen/xen.h>
|
||||
#include "blk-mq.h"
|
||||
|
||||
/* Amount of time in which a process may batch requests */
|
||||
#define BLK_BATCH_TIME (HZ/50UL)
|
||||
|
||||
/* Number of requests a "batching" process may submit */
|
||||
#define BLK_BATCH_REQ 32
|
||||
|
||||
/* Max future timer expiry for timeouts */
|
||||
#define BLK_MAX_TIMEOUT (5 * HZ)
|
||||
|
||||
|
@ -38,85 +32,13 @@ struct blk_flush_queue {
|
|||
};
|
||||
|
||||
extern struct kmem_cache *blk_requestq_cachep;
|
||||
extern struct kmem_cache *request_cachep;
|
||||
extern struct kobj_type blk_queue_ktype;
|
||||
extern struct ida blk_queue_ida;
|
||||
|
||||
/*
|
||||
* @q->queue_lock is set while a queue is being initialized. Since we know
|
||||
* that no other threads access the queue object before @q->queue_lock has
|
||||
* been set, it is safe to manipulate queue flags without holding the
|
||||
* queue_lock if @q->queue_lock == NULL. See also blk_alloc_queue_node() and
|
||||
* blk_init_allocated_queue().
|
||||
*/
|
||||
static inline void queue_lockdep_assert_held(struct request_queue *q)
|
||||
static inline struct blk_flush_queue *
|
||||
blk_get_flush_queue(struct request_queue *q, struct blk_mq_ctx *ctx)
|
||||
{
|
||||
if (q->queue_lock)
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
}
|
||||
|
||||
static inline void queue_flag_set_unlocked(unsigned int flag,
|
||||
struct request_queue *q)
|
||||
{
|
||||
if (test_bit(QUEUE_FLAG_INIT_DONE, &q->queue_flags) &&
|
||||
kref_read(&q->kobj.kref))
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
__set_bit(flag, &q->queue_flags);
|
||||
}
|
||||
|
||||
static inline void queue_flag_clear_unlocked(unsigned int flag,
|
||||
struct request_queue *q)
|
||||
{
|
||||
if (test_bit(QUEUE_FLAG_INIT_DONE, &q->queue_flags) &&
|
||||
kref_read(&q->kobj.kref))
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
__clear_bit(flag, &q->queue_flags);
|
||||
}
|
||||
|
||||
static inline int queue_flag_test_and_clear(unsigned int flag,
|
||||
struct request_queue *q)
|
||||
{
|
||||
queue_lockdep_assert_held(q);
|
||||
|
||||
if (test_bit(flag, &q->queue_flags)) {
|
||||
__clear_bit(flag, &q->queue_flags);
|
||||
return 1;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline int queue_flag_test_and_set(unsigned int flag,
|
||||
struct request_queue *q)
|
||||
{
|
||||
queue_lockdep_assert_held(q);
|
||||
|
||||
if (!test_bit(flag, &q->queue_flags)) {
|
||||
__set_bit(flag, &q->queue_flags);
|
||||
return 0;
|
||||
}
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
static inline void queue_flag_set(unsigned int flag, struct request_queue *q)
|
||||
{
|
||||
queue_lockdep_assert_held(q);
|
||||
__set_bit(flag, &q->queue_flags);
|
||||
}
|
||||
|
||||
static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
|
||||
{
|
||||
queue_lockdep_assert_held(q);
|
||||
__clear_bit(flag, &q->queue_flags);
|
||||
}
|
||||
|
||||
static inline struct blk_flush_queue *blk_get_flush_queue(
|
||||
struct request_queue *q, struct blk_mq_ctx *ctx)
|
||||
{
|
||||
if (q->mq_ops)
|
||||
return blk_mq_map_queue(q, ctx->cpu)->fq;
|
||||
return q->fq;
|
||||
return blk_mq_map_queue(q, REQ_OP_FLUSH, ctx->cpu)->fq;
|
||||
}
|
||||
|
||||
static inline void __blk_get_queue(struct request_queue *q)
|
||||
|
@ -128,15 +50,9 @@ struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
|
|||
int node, int cmd_size, gfp_t flags);
|
||||
void blk_free_flush_queue(struct blk_flush_queue *q);
|
||||
|
||||
int blk_init_rl(struct request_list *rl, struct request_queue *q,
|
||||
gfp_t gfp_mask);
|
||||
void blk_exit_rl(struct request_queue *q, struct request_list *rl);
|
||||
void blk_exit_queue(struct request_queue *q);
|
||||
void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
|
||||
struct bio *bio);
|
||||
void blk_queue_bypass_start(struct request_queue *q);
|
||||
void blk_queue_bypass_end(struct request_queue *q);
|
||||
void __blk_queue_free_tags(struct request_queue *q);
|
||||
void blk_freeze_queue(struct request_queue *q);
|
||||
|
||||
static inline void blk_queue_enter_live(struct request_queue *q)
|
||||
|
@ -235,11 +151,8 @@ static inline bool bio_integrity_endio(struct bio *bio)
|
|||
}
|
||||
#endif /* CONFIG_BLK_DEV_INTEGRITY */
|
||||
|
||||
void blk_timeout_work(struct work_struct *work);
|
||||
unsigned long blk_rq_timeout(unsigned long timeout);
|
||||
void blk_add_timer(struct request *req);
|
||||
void blk_delete_timer(struct request *);
|
||||
|
||||
|
||||
bool bio_attempt_front_merge(struct request_queue *q, struct request *req,
|
||||
struct bio *bio);
|
||||
|
@ -248,34 +161,12 @@ bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
|
|||
bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
|
||||
struct bio *bio);
|
||||
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
|
||||
unsigned int *request_count,
|
||||
struct request **same_queue_rq);
|
||||
unsigned int blk_plug_queued_count(struct request_queue *q);
|
||||
|
||||
void blk_account_io_start(struct request *req, bool new_io);
|
||||
void blk_account_io_completion(struct request *req, unsigned int bytes);
|
||||
void blk_account_io_done(struct request *req, u64 now);
|
||||
|
||||
/*
|
||||
* EH timer and IO completion will both attempt to 'grab' the request, make
|
||||
* sure that only one of them succeeds. Steal the bottom bit of the
|
||||
* __deadline field for this.
|
||||
*/
|
||||
static inline int blk_mark_rq_complete(struct request *rq)
|
||||
{
|
||||
return test_and_set_bit(0, &rq->__deadline);
|
||||
}
|
||||
|
||||
static inline void blk_clear_rq_complete(struct request *rq)
|
||||
{
|
||||
clear_bit(0, &rq->__deadline);
|
||||
}
|
||||
|
||||
static inline bool blk_rq_is_complete(struct request *rq)
|
||||
{
|
||||
return test_bit(0, &rq->__deadline);
|
||||
}
|
||||
|
||||
/*
|
||||
* Internal elevator interface
|
||||
*/
|
||||
|
@ -283,23 +174,6 @@ static inline bool blk_rq_is_complete(struct request *rq)
|
|||
|
||||
void blk_insert_flush(struct request *rq);
|
||||
|
||||
static inline void elv_activate_rq(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e->type->ops.sq.elevator_activate_req_fn)
|
||||
e->type->ops.sq.elevator_activate_req_fn(q, rq);
|
||||
}
|
||||
|
||||
static inline void elv_deactivate_rq(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e->type->ops.sq.elevator_deactivate_req_fn)
|
||||
e->type->ops.sq.elevator_deactivate_req_fn(q, rq);
|
||||
}
|
||||
|
||||
int elevator_init(struct request_queue *);
|
||||
int elevator_init_mq(struct request_queue *q);
|
||||
int elevator_switch_mq(struct request_queue *q,
|
||||
struct elevator_type *new_e);
|
||||
|
@ -334,31 +208,8 @@ void blk_rq_set_mixed_merge(struct request *rq);
|
|||
bool blk_rq_merge_ok(struct request *rq, struct bio *bio);
|
||||
enum elv_merge blk_try_merge(struct request *rq, struct bio *bio);
|
||||
|
||||
void blk_queue_congestion_threshold(struct request_queue *q);
|
||||
|
||||
int blk_dev_init(void);
|
||||
|
||||
|
||||
/*
|
||||
* Return the threshold (number of used requests) at which the queue is
|
||||
 * considered to be congested. It includes a little hysteresis to keep the
|
||||
* context switch rate down.
|
||||
*/
|
||||
static inline int queue_congestion_on_threshold(struct request_queue *q)
|
||||
{
|
||||
return q->nr_congestion_on;
|
||||
}
|
||||
|
||||
/*
|
||||
* The threshold at which a queue is considered to be uncongested
|
||||
*/
|
||||
static inline int queue_congestion_off_threshold(struct request_queue *q)
|
||||
{
|
||||
return q->nr_congestion_off;
|
||||
}
|
||||
|
||||
extern int blk_update_nr_requests(struct request_queue *, unsigned int);
|
||||
|
||||
/*
|
||||
* Contribute to IO statistics IFF:
|
||||
*
|
||||
|
@ -380,21 +231,6 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req)
|
|||
q->last_merge = NULL;
|
||||
}
|
||||
|
||||
/*
|
||||
* Steal a bit from this field for legacy IO path atomic IO marking. Note that
|
||||
* setting the deadline clears the bottom bit, potentially clearing the
|
||||
* completed bit. The user has to be OK with this (current ones are fine).
|
||||
*/
|
||||
static inline void blk_rq_set_deadline(struct request *rq, unsigned long time)
|
||||
{
|
||||
rq->__deadline = time & ~0x1UL;
|
||||
}
|
||||
|
||||
static inline unsigned long blk_rq_deadline(struct request *rq)
|
||||
{
|
||||
return rq->__deadline & ~0x1UL;
|
||||
}
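The comments above describe stealing the bottom bit of __deadline: bit 0 records completion for the legacy timeout path while the remaining bits carry the deadline, which is why the set/get helpers mask with ~0x1UL. A simplified, single-threaded sketch of the same trick follows; the kernel marks completion with an atomic test_and_set_bit(), and the struct and helper names here are invented.

/*
 * Sketch of the "steal the low bit" trick used for rq->__deadline in
 * the legacy path: bit 0 is a completion flag, the remaining bits hold
 * a jiffies-like deadline with the low bit dropped.  Non-atomic, for
 * illustration only.
 */
#include <stdbool.h>

struct fake_rq {
	unsigned long __deadline;   /* deadline in bits 1..N, flag in bit 0 */
};

void rq_set_deadline(struct fake_rq *rq, unsigned long time)
{
	/* setting a new deadline also clears the completion flag */
	rq->__deadline = time & ~0x1UL;
}

unsigned long rq_deadline(const struct fake_rq *rq)
{
	return rq->__deadline & ~0x1UL;
}

bool rq_mark_complete(struct fake_rq *rq)
{
	/* returns true if the flag was already set (we lost the race) */
	bool was_set = rq->__deadline & 0x1UL;

	rq->__deadline |= 0x1UL;
	return was_set;
}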
|
||||
|
||||
/*
|
||||
 * The max size one bio can handle is UINT_MAX because bvec_iter.bi_size
|
||||
 * is defined as 'unsigned int', meanwhile it has to be aligned to the logical
|
||||
|
@ -416,22 +252,6 @@ void ioc_clear_queue(struct request_queue *q);
|
|||
|
||||
int create_task_io_context(struct task_struct *task, gfp_t gfp_mask, int node);
|
||||
|
||||
/**
|
||||
* rq_ioc - determine io_context for request allocation
|
||||
* @bio: request being allocated is for this bio (can be %NULL)
|
||||
*
|
||||
* Determine io_context to use for request allocation for @bio. May return
|
||||
* %NULL if %current->io_context doesn't exist.
|
||||
*/
|
||||
static inline struct io_context *rq_ioc(struct bio *bio)
|
||||
{
|
||||
#ifdef CONFIG_BLK_CGROUP
|
||||
if (bio && bio->bi_ioc)
|
||||
return bio->bi_ioc;
|
||||
#endif
|
||||
return current->io_context;
|
||||
}
|
||||
|
||||
/**
|
||||
* create_io_context - try to create task->io_context
|
||||
* @gfp_mask: allocation mask
|
||||
|
@ -490,8 +310,6 @@ static inline void blk_queue_bounce(struct request_queue *q, struct bio **bio)
|
|||
}
|
||||
#endif /* CONFIG_BOUNCE */
|
||||
|
||||
extern void blk_drain_queue(struct request_queue *q);
|
||||
|
||||
#ifdef CONFIG_BLK_CGROUP_IOLATENCY
|
||||
extern int blk_iolatency_init(struct request_queue *q);
|
||||
#else
|
||||
|
|
|
@ -277,7 +277,8 @@ static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
|
|||
}
|
||||
}
|
||||
|
||||
bio_clone_blkcg_association(bio, bio_src);
|
||||
bio_clone_blkg_association(bio, bio_src);
|
||||
blkcg_bio_issue_init(bio);
|
||||
|
||||
return bio;
|
||||
}
|
||||
|
|
block/bsg-lib.c
|
@ -21,7 +21,7 @@
|
|||
*
|
||||
*/
|
||||
#include <linux/slab.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-mq.h>
|
||||
#include <linux/delay.h>
|
||||
#include <linux/scatterlist.h>
|
||||
#include <linux/bsg-lib.h>
|
||||
|
@ -31,6 +31,12 @@
|
|||
|
||||
#define uptr64(val) ((void __user *)(uintptr_t)(val))
|
||||
|
||||
struct bsg_set {
|
||||
struct blk_mq_tag_set tag_set;
|
||||
bsg_job_fn *job_fn;
|
||||
bsg_timeout_fn *timeout_fn;
|
||||
};
|
||||
|
||||
static int bsg_transport_check_proto(struct sg_io_v4 *hdr)
|
||||
{
|
||||
if (hdr->protocol != BSG_PROTOCOL_SCSI ||
|
||||
|
@ -129,7 +135,7 @@ static void bsg_teardown_job(struct kref *kref)
|
|||
kfree(job->request_payload.sg_list);
|
||||
kfree(job->reply_payload.sg_list);
|
||||
|
||||
blk_end_request_all(rq, BLK_STS_OK);
|
||||
blk_mq_end_request(rq, BLK_STS_OK);
|
||||
}
|
||||
|
||||
void bsg_job_put(struct bsg_job *job)
|
||||
|
@ -157,15 +163,15 @@ void bsg_job_done(struct bsg_job *job, int result,
|
|||
{
|
||||
job->result = result;
|
||||
job->reply_payload_rcv_len = reply_payload_rcv_len;
|
||||
blk_complete_request(blk_mq_rq_from_pdu(job));
|
||||
blk_mq_complete_request(blk_mq_rq_from_pdu(job));
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bsg_job_done);
|
||||
|
||||
/**
|
||||
* bsg_softirq_done - softirq done routine for destroying the bsg requests
|
||||
* bsg_complete - softirq done routine for destroying the bsg requests
|
||||
* @rq: BSG request that holds the job to be destroyed
|
||||
*/
|
||||
static void bsg_softirq_done(struct request *rq)
|
||||
static void bsg_complete(struct request *rq)
|
||||
{
|
||||
struct bsg_job *job = blk_mq_rq_to_pdu(rq);
|
||||
|
||||
|
@ -224,54 +230,48 @@ failjob_rls_job:
|
|||
}
|
||||
|
||||
/**
|
||||
* bsg_request_fn - generic handler for bsg requests
|
||||
* @q: request queue to manage
|
||||
* bsg_queue_rq - generic handler for bsg requests
|
||||
* @hctx: hardware queue
|
||||
* @bd: queue data
|
||||
*
|
||||
* On error the create_bsg_job function should return a -Exyz error value
|
||||
* that will be set to ->result.
|
||||
*
|
||||
* Drivers/subsys should pass this to the queue init function.
|
||||
*/
|
||||
static void bsg_request_fn(struct request_queue *q)
|
||||
__releases(q->queue_lock)
|
||||
__acquires(q->queue_lock)
|
||||
static blk_status_t bsg_queue_rq(struct blk_mq_hw_ctx *hctx,
|
||||
const struct blk_mq_queue_data *bd)
|
||||
{
|
||||
struct request_queue *q = hctx->queue;
|
||||
struct device *dev = q->queuedata;
|
||||
struct request *req;
|
||||
struct request *req = bd->rq;
|
||||
struct bsg_set *bset =
|
||||
container_of(q->tag_set, struct bsg_set, tag_set);
|
||||
int ret;
|
||||
|
||||
blk_mq_start_request(req);
|
||||
|
||||
if (!get_device(dev))
|
||||
return;
|
||||
return BLK_STS_IOERR;
|
||||
|
||||
while (1) {
|
||||
req = blk_fetch_request(q);
|
||||
if (!req)
|
||||
break;
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
if (!bsg_prepare_job(dev, req))
|
||||
return BLK_STS_IOERR;
|
||||
|
||||
if (!bsg_prepare_job(dev, req)) {
|
||||
blk_end_request_all(req, BLK_STS_OK);
|
||||
spin_lock_irq(q->queue_lock);
|
||||
continue;
|
||||
}
|
||||
ret = bset->job_fn(blk_mq_rq_to_pdu(req));
|
||||
if (ret)
|
||||
return BLK_STS_IOERR;
|
||||
|
||||
ret = q->bsg_job_fn(blk_mq_rq_to_pdu(req));
|
||||
spin_lock_irq(q->queue_lock);
|
||||
if (ret)
|
||||
break;
|
||||
}
|
||||
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
put_device(dev);
|
||||
spin_lock_irq(q->queue_lock);
|
||||
return BLK_STS_OK;
|
||||
}
|
||||
|
||||
/* called right after the request is allocated for the request_queue */
|
||||
static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
|
||||
static int bsg_init_rq(struct blk_mq_tag_set *set, struct request *req,
|
||||
unsigned int hctx_idx, unsigned int numa_node)
|
||||
{
|
||||
struct bsg_job *job = blk_mq_rq_to_pdu(req);
|
||||
|
||||
job->reply = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
|
||||
job->reply = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
|
||||
if (!job->reply)
|
||||
return -ENOMEM;
|
||||
return 0;
|
||||
|
@ -289,13 +289,47 @@ static void bsg_initialize_rq(struct request *req)
|
|||
job->dd_data = job + 1;
|
||||
}
|
||||
|
||||
static void bsg_exit_rq(struct request_queue *q, struct request *req)
|
||||
static void bsg_exit_rq(struct blk_mq_tag_set *set, struct request *req,
|
||||
unsigned int hctx_idx)
|
||||
{
|
||||
struct bsg_job *job = blk_mq_rq_to_pdu(req);
|
||||
|
||||
kfree(job->reply);
|
||||
}
|
||||
|
||||
void bsg_remove_queue(struct request_queue *q)
|
||||
{
|
||||
if (q) {
|
||||
struct bsg_set *bset =
|
||||
container_of(q->tag_set, struct bsg_set, tag_set);
|
||||
|
||||
bsg_unregister_queue(q);
|
||||
blk_cleanup_queue(q);
|
||||
blk_mq_free_tag_set(&bset->tag_set);
|
||||
kfree(bset);
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bsg_remove_queue);
|
||||
|
||||
static enum blk_eh_timer_return bsg_timeout(struct request *rq, bool reserved)
|
||||
{
|
||||
struct bsg_set *bset =
|
||||
container_of(rq->q->tag_set, struct bsg_set, tag_set);
|
||||
|
||||
if (!bset->timeout_fn)
|
||||
return BLK_EH_DONE;
|
||||
return bset->timeout_fn(rq);
|
||||
}
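bsg_queue_rq() and bsg_timeout() above recover the driver-private struct bsg_set from the queue's tag set with container_of(q->tag_set, struct bsg_set, tag_set). The standalone program below illustrates that embed-and-recover pattern; the struct names are stand-ins, not the kernel types.

/*
 * Standalone illustration of the container_of() pattern used by
 * struct bsg_set: embed the generic object, then recover the wrapper
 * from a pointer to the embedded member.
 */
#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct tag_set {            /* stand-in for struct blk_mq_tag_set */
	int queue_depth;
};

struct my_set {             /* stand-in for struct bsg_set */
	struct tag_set tag_set;
	int (*job_fn)(void);
};

static int dummy_job(void)
{
	return 0;
}

int main(void)
{
	struct my_set set = {
		.tag_set = { .queue_depth = 128 },
		.job_fn = dummy_job,
	};
	struct tag_set *ts = &set.tag_set;      /* what the queue stores */

	/* recover the wrapper exactly like bsg_queue_rq() does */
	struct my_set *back = container_of(ts, struct my_set, tag_set);

	printf("depth=%d same=%d\n", back->tag_set.queue_depth, back == &set);
	return 0;
}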
|
||||
|
||||
static const struct blk_mq_ops bsg_mq_ops = {
|
||||
.queue_rq = bsg_queue_rq,
|
||||
.init_request = bsg_init_rq,
|
||||
.exit_request = bsg_exit_rq,
|
||||
.initialize_rq_fn = bsg_initialize_rq,
|
||||
.complete = bsg_complete,
|
||||
.timeout = bsg_timeout,
|
||||
};
|
||||
|
||||
/**
|
||||
* bsg_setup_queue - Create and add the bsg hooks so we can receive requests
|
||||
* @dev: device to attach bsg device to
|
||||
|
@ -304,28 +338,38 @@ static void bsg_exit_rq(struct request_queue *q, struct request *req)
|
|||
* @dd_job_size: size of LLD data needed for each job
|
||||
*/
|
||||
struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
|
||||
bsg_job_fn *job_fn, int dd_job_size)
|
||||
bsg_job_fn *job_fn, bsg_timeout_fn *timeout, int dd_job_size)
|
||||
{
|
||||
struct bsg_set *bset;
|
||||
struct blk_mq_tag_set *set;
|
||||
struct request_queue *q;
|
||||
int ret;
|
||||
int ret = -ENOMEM;
|
||||
|
||||
q = blk_alloc_queue(GFP_KERNEL);
|
||||
if (!q)
|
||||
bset = kzalloc(sizeof(*bset), GFP_KERNEL);
|
||||
if (!bset)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
q->cmd_size = sizeof(struct bsg_job) + dd_job_size;
|
||||
q->init_rq_fn = bsg_init_rq;
|
||||
q->exit_rq_fn = bsg_exit_rq;
|
||||
q->initialize_rq_fn = bsg_initialize_rq;
|
||||
q->request_fn = bsg_request_fn;
|
||||
|
||||
ret = blk_init_allocated_queue(q);
|
||||
if (ret)
|
||||
goto out_cleanup_queue;
|
||||
bset->job_fn = job_fn;
|
||||
bset->timeout_fn = timeout;
|
||||
|
||||
set = &bset->tag_set;
|
||||
set->ops = &bsg_mq_ops,
|
||||
set->nr_hw_queues = 1;
|
||||
set->queue_depth = 128;
|
||||
set->numa_node = NUMA_NO_NODE;
|
||||
set->cmd_size = sizeof(struct bsg_job) + dd_job_size;
|
||||
set->flags = BLK_MQ_F_NO_SCHED | BLK_MQ_F_BLOCKING;
|
||||
if (blk_mq_alloc_tag_set(set))
|
||||
goto out_tag_set;
|
||||
|
||||
q = blk_mq_init_queue(set);
|
||||
if (IS_ERR(q)) {
|
||||
ret = PTR_ERR(q);
|
||||
goto out_queue;
|
||||
}
|
||||
|
||||
q->queuedata = dev;
|
||||
q->bsg_job_fn = job_fn;
|
||||
blk_queue_flag_set(QUEUE_FLAG_BIDI, q);
|
||||
blk_queue_softirq_done(q, bsg_softirq_done);
|
||||
blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);
|
||||
|
||||
ret = bsg_register_queue(q, dev, name, &bsg_transport_ops);
|
||||
|
@ -338,6 +382,10 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
|
|||
return q;
|
||||
out_cleanup_queue:
|
||||
blk_cleanup_queue(q);
|
||||
out_queue:
|
||||
blk_mq_free_tag_set(set);
|
||||
out_tag_set:
|
||||
kfree(bset);
|
||||
return ERR_PTR(ret);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bsg_setup_queue);
|
||||
|
|
|
@ -471,7 +471,7 @@ int bsg_register_queue(struct request_queue *q, struct device *parent,
|
|||
/*
|
||||
* we need a proper transport to send commands, not a stacked device
|
||||
*/
|
||||
if (!queue_is_rq_based(q))
|
||||
if (!queue_is_mq(q))
|
||||
return 0;
|
||||
|
||||
bcd = &q->bsg_dev;
|
||||
|
|
block/cfq-iosched.c: diff suppressed because it is too large (4916 lines)
|
@ -1,560 +0,0 @@
|
|||
/*
|
||||
* Deadline i/o scheduler.
|
||||
*
|
||||
* Copyright (C) 2002 Jens Axboe <axboe@kernel.dk>
|
||||
*/
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/fs.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/elevator.h>
|
||||
#include <linux/bio.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/compiler.h>
|
||||
#include <linux/rbtree.h>
|
||||
|
||||
/*
|
||||
* See Documentation/block/deadline-iosched.txt
|
||||
*/
|
||||
static const int read_expire = HZ / 2; /* max time before a read is submitted. */
|
||||
static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
|
||||
static const int writes_starved = 2; /* max times reads can starve a write */
|
||||
static const int fifo_batch = 16; /* # of sequential requests treated as one
|
||||
by the above parameters. For throughput. */
|
||||
|
||||
struct deadline_data {
|
||||
/*
|
||||
* run time data
|
||||
*/
|
||||
|
||||
/*
|
||||
* requests (deadline_rq s) are present on both sort_list and fifo_list
|
||||
*/
|
||||
struct rb_root sort_list[2];
|
||||
struct list_head fifo_list[2];
|
||||
|
||||
/*
|
||||
* next in sort order. read, write or both are NULL
|
||||
*/
|
||||
struct request *next_rq[2];
|
||||
unsigned int batching; /* number of sequential requests made */
|
||||
unsigned int starved; /* times reads have starved writes */
|
||||
|
||||
/*
|
||||
* settings that change how the i/o scheduler behaves
|
||||
*/
|
||||
int fifo_expire[2];
|
||||
int fifo_batch;
|
||||
int writes_starved;
|
||||
int front_merges;
|
||||
};
|
||||
|
||||
static inline struct rb_root *
|
||||
deadline_rb_root(struct deadline_data *dd, struct request *rq)
|
||||
{
|
||||
return &dd->sort_list[rq_data_dir(rq)];
|
||||
}
|
||||
|
||||
/*
|
||||
* get the request after `rq' in sector-sorted order
|
||||
*/
|
||||
static inline struct request *
|
||||
deadline_latter_request(struct request *rq)
|
||||
{
|
||||
struct rb_node *node = rb_next(&rq->rb_node);
|
||||
|
||||
if (node)
|
||||
return rb_entry_rq(node);
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static void
|
||||
deadline_add_rq_rb(struct deadline_data *dd, struct request *rq)
|
||||
{
|
||||
struct rb_root *root = deadline_rb_root(dd, rq);
|
||||
|
||||
elv_rb_add(root, rq);
|
||||
}
|
||||
|
||||
static inline void
|
||||
deadline_del_rq_rb(struct deadline_data *dd, struct request *rq)
|
||||
{
|
||||
const int data_dir = rq_data_dir(rq);
|
||||
|
||||
if (dd->next_rq[data_dir] == rq)
|
||||
dd->next_rq[data_dir] = deadline_latter_request(rq);
|
||||
|
||||
elv_rb_del(deadline_rb_root(dd, rq), rq);
|
||||
}
|
||||
|
||||
/*
|
||||
* add rq to rbtree and fifo
|
||||
*/
|
||||
static void
|
||||
deadline_add_request(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
struct deadline_data *dd = q->elevator->elevator_data;
|
||||
const int data_dir = rq_data_dir(rq);
|
||||
|
||||
/*
|
||||
* This may be a requeue of a write request that has locked its
|
||||
* target zone. If it is the case, this releases the zone lock.
|
||||
*/
|
||||
blk_req_zone_write_unlock(rq);
|
||||
|
||||
deadline_add_rq_rb(dd, rq);
|
||||
|
||||
/*
|
||||
* set expire time and add to fifo list
|
||||
*/
|
||||
rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
|
||||
list_add_tail(&rq->queuelist, &dd->fifo_list[data_dir]);
|
||||
}
|
||||
|
||||
/*
|
||||
* remove rq from rbtree and fifo.
|
||||
*/
|
||||
static void deadline_remove_request(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
struct deadline_data *dd = q->elevator->elevator_data;
|
||||
|
||||
rq_fifo_clear(rq);
|
||||
deadline_del_rq_rb(dd, rq);
|
||||
}
|
||||
|
||||
static enum elv_merge
|
||||
deadline_merge(struct request_queue *q, struct request **req, struct bio *bio)
|
||||
{
|
||||
struct deadline_data *dd = q->elevator->elevator_data;
|
||||
struct request *__rq;
|
||||
|
||||
/*
|
||||
* check for front merge
|
||||
*/
|
||||
if (dd->front_merges) {
|
||||
sector_t sector = bio_end_sector(bio);
|
||||
|
||||
__rq = elv_rb_find(&dd->sort_list[bio_data_dir(bio)], sector);
|
||||
if (__rq) {
|
||||
BUG_ON(sector != blk_rq_pos(__rq));
|
||||
|
||||
if (elv_bio_merge_ok(__rq, bio)) {
|
||||
*req = __rq;
|
||||
return ELEVATOR_FRONT_MERGE;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return ELEVATOR_NO_MERGE;
|
||||
}
|
||||
|
||||
static void deadline_merged_request(struct request_queue *q,
|
||||
struct request *req, enum elv_merge type)
|
||||
{
|
||||
struct deadline_data *dd = q->elevator->elevator_data;
|
||||
|
||||
/*
|
||||
* if the merge was a front merge, we need to reposition request
|
||||
*/
|
||||
if (type == ELEVATOR_FRONT_MERGE) {
|
||||
elv_rb_del(deadline_rb_root(dd, req), req);
|
||||
deadline_add_rq_rb(dd, req);
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
deadline_merged_requests(struct request_queue *q, struct request *req,
|
||||
struct request *next)
|
||||
{
|
||||
/*
|
||||
* if next expires before rq, assign its expire time to rq
|
||||
* and move into next position (next will be deleted) in fifo
|
||||
*/
|
||||
if (!list_empty(&req->queuelist) && !list_empty(&next->queuelist)) {
|
||||
if (time_before((unsigned long)next->fifo_time,
|
||||
(unsigned long)req->fifo_time)) {
|
||||
list_move(&req->queuelist, &next->queuelist);
|
||||
req->fifo_time = next->fifo_time;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* kill knowledge of next, this one is a goner
|
||||
*/
|
||||
deadline_remove_request(q, next);
|
||||
}
|
||||
|
||||
/*
|
||||
* move request from sort list to dispatch queue.
|
||||
*/
|
||||
static inline void
|
||||
deadline_move_to_dispatch(struct deadline_data *dd, struct request *rq)
|
||||
{
|
||||
struct request_queue *q = rq->q;
|
||||
|
||||
/*
|
||||
* For a zoned block device, write requests must write lock their
|
||||
* target zone.
|
||||
*/
|
||||
blk_req_zone_write_lock(rq);
|
||||
|
||||
deadline_remove_request(q, rq);
|
||||
elv_dispatch_add_tail(q, rq);
|
||||
}
|
||||
|
||||
/*
|
||||
* move an entry to dispatch queue
|
||||
*/
|
||||
static void
|
||||
deadline_move_request(struct deadline_data *dd, struct request *rq)
|
||||
{
|
||||
const int data_dir = rq_data_dir(rq);
|
||||
|
||||
dd->next_rq[READ] = NULL;
|
||||
dd->next_rq[WRITE] = NULL;
|
||||
dd->next_rq[data_dir] = deadline_latter_request(rq);
|
||||
|
||||
/*
|
||||
* take it off the sort and fifo list, move
|
||||
* to dispatch queue
|
||||
*/
|
||||
deadline_move_to_dispatch(dd, rq);
|
||||
}
|
||||
|
||||
/*
|
||||
* deadline_check_fifo returns 0 if there are no expired requests on the fifo,
|
||||
* 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
|
||||
*/
|
||||
static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
|
||||
{
|
||||
struct request *rq = rq_entry_fifo(dd->fifo_list[ddir].next);
|
||||
|
||||
/*
|
||||
* rq is expired!
|
||||
*/
|
||||
if (time_after_eq(jiffies, (unsigned long)rq->fifo_time))
|
||||
return 1;
|
||||
|
||||
return 0;
|
||||
}
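deadline_check_fifo() relies on time_after_eq(), which stays correct across a jiffies wrap because it compares the signed difference of the two counters rather than their raw values. A minimal sketch of that wrap-safe comparison, with invented helper names:

/*
 * Sketch of the wrap-safe time comparison behind time_after_eq() as
 * used by deadline_check_fifo(): compare via the signed difference so
 * a wrapped counter still orders correctly.  Userspace illustration.
 */
#include <stdbool.h>

bool time_after_eq_ul(unsigned long a, unsigned long b)
{
	/* true if a is at or after b, even across an unsigned wrap */
	return (long)(a - b) >= 0;
}

bool fifo_expired(unsigned long now, unsigned long fifo_time)
{
	return time_after_eq_ul(now, fifo_time);   /* "rq is expired!" */
}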
|
||||
|
||||
/*
|
||||
* For the specified data direction, return the next request to dispatch using
|
||||
* arrival ordered lists.
|
||||
*/
|
||||
static struct request *
|
||||
deadline_fifo_request(struct deadline_data *dd, int data_dir)
|
||||
{
|
||||
struct request *rq;
|
||||
|
||||
if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
|
||||
return NULL;
|
||||
|
||||
if (list_empty(&dd->fifo_list[data_dir]))
|
||||
return NULL;
|
||||
|
||||
rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
|
||||
if (data_dir == READ || !blk_queue_is_zoned(rq->q))
|
||||
return rq;
|
||||
|
||||
/*
|
||||
* Look for a write request that can be dispatched, that is one with
|
||||
* an unlocked target zone.
|
||||
*/
|
||||
list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
|
||||
if (blk_req_can_dispatch_to_zone(rq))
|
||||
return rq;
|
||||
}
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/*
|
||||
* For the specified data direction, return the next request to dispatch using
|
||||
* sector position sorted lists.
|
||||
*/
|
||||
static struct request *
|
||||
deadline_next_request(struct deadline_data *dd, int data_dir)
|
||||
{
|
||||
struct request *rq;
|
||||
|
||||
if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
|
||||
return NULL;
|
||||
|
||||
rq = dd->next_rq[data_dir];
|
||||
if (!rq)
|
||||
return NULL;
|
||||
|
||||
if (data_dir == READ || !blk_queue_is_zoned(rq->q))
|
||||
return rq;
|
||||
|
||||
/*
|
||||
* Look for a write request that can be dispatched, that is one with
|
||||
* an unlocked target zone.
|
||||
*/
|
||||
while (rq) {
|
||||
if (blk_req_can_dispatch_to_zone(rq))
|
||||
return rq;
|
||||
rq = deadline_latter_request(rq);
|
||||
}
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/*
|
||||
* deadline_dispatch_requests selects the best request according to
|
||||
* read/write expire, fifo_batch, etc
|
||||
*/
|
||||
static int deadline_dispatch_requests(struct request_queue *q, int force)
|
||||
{
|
||||
struct deadline_data *dd = q->elevator->elevator_data;
|
||||
const int reads = !list_empty(&dd->fifo_list[READ]);
|
||||
const int writes = !list_empty(&dd->fifo_list[WRITE]);
|
||||
struct request *rq, *next_rq;
|
||||
int data_dir;
|
||||
|
||||
/*
|
||||
* batches are currently reads XOR writes
|
||||
*/
|
||||
rq = deadline_next_request(dd, WRITE);
|
||||
if (!rq)
|
||||
rq = deadline_next_request(dd, READ);
|
||||
|
||||
if (rq && dd->batching < dd->fifo_batch)
|
||||
/* we have a next request and are still entitled to batch */
|
||||
goto dispatch_request;
|
||||
|
||||
/*
|
||||
* at this point we are not running a batch. select the appropriate
|
||||
* data direction (read / write)
|
||||
*/
|
||||
|
||||
if (reads) {
|
||||
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
|
||||
|
||||
if (deadline_fifo_request(dd, WRITE) &&
|
||||
(dd->starved++ >= dd->writes_starved))
|
||||
goto dispatch_writes;
|
||||
|
||||
data_dir = READ;
|
||||
|
||||
goto dispatch_find_request;
|
||||
}
|
||||
|
||||
/*
|
||||
* there are either no reads or writes have been starved
|
||||
*/
|
||||
|
||||
if (writes) {
|
||||
dispatch_writes:
|
||||
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));
|
||||
|
||||
dd->starved = 0;
|
||||
|
||||
data_dir = WRITE;
|
||||
|
||||
goto dispatch_find_request;
|
||||
}
|
||||
|
||||
return 0;
|
||||
|
||||
dispatch_find_request:
|
||||
/*
|
||||
* we are not running a batch, find best request for selected data_dir
|
||||
*/
|
||||
next_rq = deadline_next_request(dd, data_dir);
|
||||
if (deadline_check_fifo(dd, data_dir) || !next_rq) {
|
||||
/*
|
||||
* A deadline has expired, the last request was in the other
|
||||
* direction, or we have run out of higher-sectored requests.
|
||||
* Start again from the request with the earliest expiry time.
|
||||
*/
|
||||
rq = deadline_fifo_request(dd, data_dir);
|
||||
} else {
|
||||
/*
|
||||
* The last req was the same dir and we have a next request in
|
||||
* sort order. No expired requests so continue on from here.
|
||||
*/
|
||||
rq = next_rq;
|
||||
}
|
||||
|
||||
/*
|
||||
* For a zoned block device, if we only have writes queued and none of
|
||||
* them can be dispatched, rq will be NULL.
|
||||
*/
|
||||
if (!rq)
|
||||
return 0;
|
||||
|
||||
dd->batching = 0;
|
||||
|
||||
dispatch_request:
|
||||
/*
|
||||
* rq is the selected appropriate request.
|
||||
*/
|
||||
dd->batching++;
|
||||
deadline_move_request(dd, rq);
|
||||
|
||||
return 1;
|
||||
}
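deadline_dispatch_requests() above boils down to a direction decision: keep batching the current direction until fifo_batch is reached, prefer reads otherwise, but switch to writes once they have been passed over writes_starved times. The sketch below isolates just that decision, leaving out the rbtree/FIFO and zone-locking bookkeeping; every name in it is invented for the illustration.

/*
 * Compact sketch of the direction/batching decision inside
 * deadline_dispatch_requests(): batch the current direction up to
 * fifo_batch, and bound read-over-write starvation by writes_starved.
 */
#include <stdbool.h>

enum dir { DIR_READ, DIR_WRITE, DIR_NONE };

struct dl_state {
	bool have_read;     /* any queued reads?               */
	bool have_write;    /* any queued writes?              */
	enum dir cur;       /* direction of the running batch  */
	int batching;       /* requests issued in this batch   */
	int fifo_batch;     /* max batch length                */
	int starved;        /* reads dispatched past waiting writes */
	int writes_starved; /* starvation limit                */
};

enum dir deadline_pick_dir(struct dl_state *s)
{
	bool cur_has_rq = (s->cur == DIR_READ && s->have_read) ||
			  (s->cur == DIR_WRITE && s->have_write);

	/* still entitled to keep batching in the current direction */
	if (cur_has_rq && s->batching < s->fifo_batch) {
		s->batching++;
		return s->cur;
	}

	/* prefer reads unless writes have been starved long enough */
	if (s->have_read &&
	    !(s->have_write && s->starved++ >= s->writes_starved)) {
		s->cur = DIR_READ;
		s->batching = 1;
		return DIR_READ;
	}

	if (s->have_write) {
		s->starved = 0;
		s->cur = DIR_WRITE;
		s->batching = 1;
		return DIR_WRITE;
	}

	return DIR_NONE;
}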
|
||||
|
||||
/*
|
||||
* For zoned block devices, write unlock the target zone of completed
|
||||
* write requests.
|
||||
*/
|
||||
static void
|
||||
deadline_completed_request(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
blk_req_zone_write_unlock(rq);
|
||||
}
|
||||
|
||||
static void deadline_exit_queue(struct elevator_queue *e)
|
||||
{
|
||||
struct deadline_data *dd = e->elevator_data;
|
||||
|
||||
BUG_ON(!list_empty(&dd->fifo_list[READ]));
|
||||
BUG_ON(!list_empty(&dd->fifo_list[WRITE]));
|
||||
|
||||
kfree(dd);
|
||||
}
|
||||
|
||||
/*
|
||||
* initialize elevator private data (deadline_data).
|
||||
*/
|
||||
static int deadline_init_queue(struct request_queue *q, struct elevator_type *e)
|
||||
{
|
||||
struct deadline_data *dd;
|
||||
struct elevator_queue *eq;
|
||||
|
||||
eq = elevator_alloc(q, e);
|
||||
if (!eq)
|
||||
return -ENOMEM;
|
||||
|
||||
dd = kzalloc_node(sizeof(*dd), GFP_KERNEL, q->node);
|
||||
if (!dd) {
|
||||
kobject_put(&eq->kobj);
|
||||
return -ENOMEM;
|
||||
}
|
||||
eq->elevator_data = dd;
|
||||
|
||||
INIT_LIST_HEAD(&dd->fifo_list[READ]);
|
||||
INIT_LIST_HEAD(&dd->fifo_list[WRITE]);
|
||||
dd->sort_list[READ] = RB_ROOT;
|
||||
dd->sort_list[WRITE] = RB_ROOT;
|
||||
dd->fifo_expire[READ] = read_expire;
|
||||
dd->fifo_expire[WRITE] = write_expire;
|
||||
dd->writes_starved = writes_starved;
|
||||
dd->front_merges = 1;
|
||||
dd->fifo_batch = fifo_batch;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
q->elevator = eq;
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* sysfs parts below
|
||||
*/
|
||||
|
||||
static ssize_t
|
||||
deadline_var_show(int var, char *page)
|
||||
{
|
||||
return sprintf(page, "%d\n", var);
|
||||
}
|
||||
|
||||
static void
|
||||
deadline_var_store(int *var, const char *page)
|
||||
{
|
||||
char *p = (char *) page;
|
||||
|
||||
*var = simple_strtol(p, &p, 10);
|
||||
}
|
||||
|
||||
#define SHOW_FUNCTION(__FUNC, __VAR, __CONV) \
|
||||
static ssize_t __FUNC(struct elevator_queue *e, char *page) \
|
||||
{ \
|
||||
struct deadline_data *dd = e->elevator_data; \
|
||||
int __data = __VAR; \
|
||||
if (__CONV) \
|
||||
__data = jiffies_to_msecs(__data); \
|
||||
return deadline_var_show(__data, (page)); \
|
||||
}
|
||||
SHOW_FUNCTION(deadline_read_expire_show, dd->fifo_expire[READ], 1);
|
||||
SHOW_FUNCTION(deadline_write_expire_show, dd->fifo_expire[WRITE], 1);
|
||||
SHOW_FUNCTION(deadline_writes_starved_show, dd->writes_starved, 0);
|
||||
SHOW_FUNCTION(deadline_front_merges_show, dd->front_merges, 0);
|
||||
SHOW_FUNCTION(deadline_fifo_batch_show, dd->fifo_batch, 0);
|
||||
#undef SHOW_FUNCTION
|
||||
|
||||
#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \
|
||||
static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count) \
|
||||
{ \
|
||||
struct deadline_data *dd = e->elevator_data; \
|
||||
int __data; \
|
||||
deadline_var_store(&__data, (page)); \
|
||||
if (__data < (MIN)) \
|
||||
__data = (MIN); \
|
||||
else if (__data > (MAX)) \
|
||||
__data = (MAX); \
|
||||
if (__CONV) \
|
||||
*(__PTR) = msecs_to_jiffies(__data); \
|
||||
else \
|
||||
*(__PTR) = __data; \
|
||||
return count; \
|
||||
}
|
||||
STORE_FUNCTION(deadline_read_expire_store, &dd->fifo_expire[READ], 0, INT_MAX, 1);
|
||||
STORE_FUNCTION(deadline_write_expire_store, &dd->fifo_expire[WRITE], 0, INT_MAX, 1);
|
||||
STORE_FUNCTION(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX, 0);
|
||||
STORE_FUNCTION(deadline_front_merges_store, &dd->front_merges, 0, 1, 0);
|
||||
STORE_FUNCTION(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX, 0);
|
||||
#undef STORE_FUNCTION
|
||||
|
||||
#define DD_ATTR(name) \
|
||||
__ATTR(name, 0644, deadline_##name##_show, deadline_##name##_store)
|
||||
|
||||
static struct elv_fs_entry deadline_attrs[] = {
|
||||
DD_ATTR(read_expire),
|
||||
DD_ATTR(write_expire),
|
||||
DD_ATTR(writes_starved),
|
||||
DD_ATTR(front_merges),
|
||||
DD_ATTR(fifo_batch),
|
||||
__ATTR_NULL
|
||||
};
|
||||
|
||||
static struct elevator_type iosched_deadline = {
|
||||
.ops.sq = {
|
||||
.elevator_merge_fn = deadline_merge,
|
||||
.elevator_merged_fn = deadline_merged_request,
|
||||
.elevator_merge_req_fn = deadline_merged_requests,
|
||||
.elevator_dispatch_fn = deadline_dispatch_requests,
|
||||
.elevator_completed_req_fn = deadline_completed_request,
|
||||
.elevator_add_req_fn = deadline_add_request,
|
||||
.elevator_former_req_fn = elv_rb_former_request,
|
||||
.elevator_latter_req_fn = elv_rb_latter_request,
|
||||
.elevator_init_fn = deadline_init_queue,
|
||||
.elevator_exit_fn = deadline_exit_queue,
|
||||
},
|
||||
|
||||
.elevator_attrs = deadline_attrs,
|
||||
.elevator_name = "deadline",
|
||||
.elevator_owner = THIS_MODULE,
|
||||
};
|
||||
|
||||
static int __init deadline_init(void)
|
||||
{
|
||||
return elv_register(&iosched_deadline);
|
||||
}
|
||||
|
||||
static void __exit deadline_exit(void)
|
||||
{
|
||||
elv_unregister(&iosched_deadline);
|
||||
}
|
||||
|
||||
module_init(deadline_init);
|
||||
module_exit(deadline_exit);
|
||||
|
||||
MODULE_AUTHOR("Jens Axboe");
|
||||
MODULE_LICENSE("GPL");
|
||||
MODULE_DESCRIPTION("deadline IO scheduler");
block/elevator.c
@ -61,10 +61,8 @@ static int elv_iosched_allow_bio_merge(struct request *rq, struct bio *bio)
|
|||
struct request_queue *q = rq->q;
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e->uses_mq && e->type->ops.mq.allow_merge)
|
||||
return e->type->ops.mq.allow_merge(q, rq, bio);
|
||||
else if (!e->uses_mq && e->type->ops.sq.elevator_allow_bio_merge_fn)
|
||||
return e->type->ops.sq.elevator_allow_bio_merge_fn(q, rq, bio);
|
||||
if (e->type->ops.allow_merge)
|
||||
return e->type->ops.allow_merge(q, rq, bio);
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
@ -95,14 +93,14 @@ static bool elevator_match(const struct elevator_type *e, const char *name)
|
|||
}
|
||||
|
||||
/*
|
||||
* Return scheduler with name 'name' and with matching 'mq capability
|
||||
* Return scheduler with name 'name'
|
||||
*/
|
||||
static struct elevator_type *elevator_find(const char *name, bool mq)
|
||||
static struct elevator_type *elevator_find(const char *name)
|
||||
{
|
||||
struct elevator_type *e;
|
||||
|
||||
list_for_each_entry(e, &elv_list, list) {
|
||||
if (elevator_match(e, name) && (mq == e->uses_mq))
|
||||
if (elevator_match(e, name))
|
||||
return e;
|
||||
}
|
||||
|
||||
|
@ -121,12 +119,12 @@ static struct elevator_type *elevator_get(struct request_queue *q,
|
|||
|
||||
spin_lock(&elv_list_lock);
|
||||
|
||||
e = elevator_find(name, q->mq_ops != NULL);
|
||||
e = elevator_find(name);
|
||||
if (!e && try_loading) {
|
||||
spin_unlock(&elv_list_lock);
|
||||
request_module("%s-iosched", name);
|
||||
spin_lock(&elv_list_lock);
|
||||
e = elevator_find(name, q->mq_ops != NULL);
|
||||
e = elevator_find(name);
|
||||
}
|
||||
|
||||
if (e && !try_module_get(e->elevator_owner))
|
||||
|
@ -150,26 +148,6 @@ static int __init elevator_setup(char *str)
|
|||
|
||||
__setup("elevator=", elevator_setup);
|
||||
|
||||
/* called during boot to load the elevator chosen by the elevator param */
|
||||
void __init load_default_elevator_module(void)
|
||||
{
|
||||
struct elevator_type *e;
|
||||
|
||||
if (!chosen_elevator[0])
|
||||
return;
|
||||
|
||||
/*
|
||||
* Boot parameter is deprecated, we haven't supported that for MQ.
|
||||
* Only look for non-mq schedulers from here.
|
||||
*/
|
||||
spin_lock(&elv_list_lock);
|
||||
e = elevator_find(chosen_elevator, false);
|
||||
spin_unlock(&elv_list_lock);
|
||||
|
||||
if (!e)
|
||||
request_module("%s-iosched", chosen_elevator);
|
||||
}
|
||||
|
||||
static struct kobj_type elv_ktype;
|
||||
|
||||
struct elevator_queue *elevator_alloc(struct request_queue *q,
|
||||
|
@ -185,7 +163,6 @@ struct elevator_queue *elevator_alloc(struct request_queue *q,
|
|||
kobject_init(&eq->kobj, &elv_ktype);
|
||||
mutex_init(&eq->sysfs_lock);
|
||||
hash_init(eq->hash);
|
||||
eq->uses_mq = e->uses_mq;
|
||||
|
||||
return eq;
|
||||
}
|
||||
|
@ -200,54 +177,11 @@ static void elevator_release(struct kobject *kobj)
|
|||
kfree(e);
|
||||
}
|
||||
|
||||
/*
|
||||
* Use the default elevator specified by config boot param for non-mq devices,
|
||||
* or by config option. Don't try to load modules as we could be running off
|
||||
* async and request_module() isn't allowed from async.
|
||||
*/
|
||||
int elevator_init(struct request_queue *q)
|
||||
{
|
||||
struct elevator_type *e = NULL;
|
||||
int err = 0;
|
||||
|
||||
/*
|
||||
* q->sysfs_lock must be held to provide mutual exclusion between
|
||||
* elevator_switch() and here.
|
||||
*/
|
||||
mutex_lock(&q->sysfs_lock);
|
||||
if (unlikely(q->elevator))
|
||||
goto out_unlock;
|
||||
|
||||
if (*chosen_elevator) {
|
||||
e = elevator_get(q, chosen_elevator, false);
|
||||
if (!e)
|
||||
printk(KERN_ERR "I/O scheduler %s not found\n",
|
||||
chosen_elevator);
|
||||
}
|
||||
|
||||
if (!e)
|
||||
e = elevator_get(q, CONFIG_DEFAULT_IOSCHED, false);
|
||||
if (!e) {
|
||||
printk(KERN_ERR
|
||||
"Default I/O scheduler not found. Using noop.\n");
|
||||
e = elevator_get(q, "noop", false);
|
||||
}
|
||||
|
||||
err = e->ops.sq.elevator_init_fn(q, e);
|
||||
if (err)
|
||||
elevator_put(e);
|
||||
out_unlock:
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
return err;
|
||||
}
|
||||
|
||||
void elevator_exit(struct request_queue *q, struct elevator_queue *e)
|
||||
{
|
||||
mutex_lock(&e->sysfs_lock);
|
||||
if (e->uses_mq && e->type->ops.mq.exit_sched)
|
||||
if (e->type->ops.exit_sched)
|
||||
blk_mq_exit_sched(q, e);
|
||||
else if (!e->uses_mq && e->type->ops.sq.elevator_exit_fn)
|
||||
e->type->ops.sq.elevator_exit_fn(e);
|
||||
mutex_unlock(&e->sysfs_lock);
|
||||
|
||||
kobject_put(&e->kobj);
|
||||
|
@ -356,68 +290,6 @@ struct request *elv_rb_find(struct rb_root *root, sector_t sector)
|
|||
}
|
||||
EXPORT_SYMBOL(elv_rb_find);
|
||||
|
||||
/*
|
||||
* Insert rq into dispatch queue of q. Queue lock must be held on
|
||||
* entry. rq is sort instead into the dispatch queue. To be used by
|
||||
* specific elevators.
|
||||
*/
|
||||
void elv_dispatch_sort(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
sector_t boundary;
|
||||
struct list_head *entry;
|
||||
|
||||
if (q->last_merge == rq)
|
||||
q->last_merge = NULL;
|
||||
|
||||
elv_rqhash_del(q, rq);
|
||||
|
||||
q->nr_sorted--;
|
||||
|
||||
boundary = q->end_sector;
|
||||
list_for_each_prev(entry, &q->queue_head) {
|
||||
struct request *pos = list_entry_rq(entry);
|
||||
|
||||
if (req_op(rq) != req_op(pos))
|
||||
break;
|
||||
if (rq_data_dir(rq) != rq_data_dir(pos))
|
||||
break;
|
||||
if (pos->rq_flags & (RQF_STARTED | RQF_SOFTBARRIER))
|
||||
break;
|
||||
if (blk_rq_pos(rq) >= boundary) {
|
||||
if (blk_rq_pos(pos) < boundary)
|
||||
continue;
|
||||
} else {
|
||||
if (blk_rq_pos(pos) >= boundary)
|
||||
break;
|
||||
}
|
||||
if (blk_rq_pos(rq) >= blk_rq_pos(pos))
|
||||
break;
|
||||
}
|
||||
|
||||
list_add(&rq->queuelist, entry);
|
||||
}
|
||||
EXPORT_SYMBOL(elv_dispatch_sort);
|
||||
|
||||
/*
|
||||
* Insert rq into dispatch queue of q. Queue lock must be held on
|
||||
* entry. rq is added to the back of the dispatch queue. To be used by
|
||||
* specific elevators.
|
||||
*/
|
||||
void elv_dispatch_add_tail(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
if (q->last_merge == rq)
|
||||
q->last_merge = NULL;
|
||||
|
||||
elv_rqhash_del(q, rq);
|
||||
|
||||
q->nr_sorted--;
|
||||
|
||||
q->end_sector = rq_end_sector(rq);
|
||||
q->boundary_rq = rq;
|
||||
list_add_tail(&rq->queuelist, &q->queue_head);
|
||||
}
|
||||
EXPORT_SYMBOL(elv_dispatch_add_tail);
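
The boundary handling in elv_dispatch_sort() above amounts to one elevator sweep: positions at or beyond q->end_sector are served before positions that have wrapped back below it. A rough user-space comparator expressing that order (illustrative sketch, not the kernel code):

#include <stdint.h>
#include <stdio.h>

/*
 * Order request positions the way the removed elv_dispatch_sort() did:
 * sectors at or beyond the current boundary belong to the ongoing sweep
 * and come first; sectors below the boundary wrapped around and come later.
 */
static uint64_t sweep_key(uint64_t pos, uint64_t boundary)
{
	return pos >= boundary ? pos - boundary : pos + (UINT64_MAX - boundary);
}

int main(void)
{
	uint64_t boundary = 1000;
	uint64_t a = 1200, b = 100;	/* b has wrapped below the boundary */

	printf("%s dispatches first\n",
	       sweep_key(a, boundary) < sweep_key(b, boundary) ? "a" : "b");
	return 0;
}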
|
||||
|
||||
enum elv_merge elv_merge(struct request_queue *q, struct request **req,
|
||||
struct bio *bio)
|
||||
{
|
||||
|
@ -457,10 +329,8 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
|
|||
return ELEVATOR_BACK_MERGE;
|
||||
}
|
||||
|
||||
if (e->uses_mq && e->type->ops.mq.request_merge)
|
||||
return e->type->ops.mq.request_merge(q, req, bio);
|
||||
else if (!e->uses_mq && e->type->ops.sq.elevator_merge_fn)
|
||||
return e->type->ops.sq.elevator_merge_fn(q, req, bio);
|
||||
if (e->type->ops.request_merge)
|
||||
return e->type->ops.request_merge(q, req, bio);
|
||||
|
||||
return ELEVATOR_NO_MERGE;
|
||||
}
|
||||
|
@ -511,10 +381,8 @@ void elv_merged_request(struct request_queue *q, struct request *rq,
|
|||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e->uses_mq && e->type->ops.mq.request_merged)
|
||||
e->type->ops.mq.request_merged(q, rq, type);
|
||||
else if (!e->uses_mq && e->type->ops.sq.elevator_merged_fn)
|
||||
e->type->ops.sq.elevator_merged_fn(q, rq, type);
|
||||
if (e->type->ops.request_merged)
|
||||
e->type->ops.request_merged(q, rq, type);
|
||||
|
||||
if (type == ELEVATOR_BACK_MERGE)
|
||||
elv_rqhash_reposition(q, rq);
|
||||
|
@ -526,176 +394,20 @@ void elv_merge_requests(struct request_queue *q, struct request *rq,
|
|||
struct request *next)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
bool next_sorted = false;
|
||||
|
||||
if (e->uses_mq && e->type->ops.mq.requests_merged)
|
||||
e->type->ops.mq.requests_merged(q, rq, next);
|
||||
else if (e->type->ops.sq.elevator_merge_req_fn) {
|
||||
next_sorted = (__force bool)(next->rq_flags & RQF_SORTED);
|
||||
if (next_sorted)
|
||||
e->type->ops.sq.elevator_merge_req_fn(q, rq, next);
|
||||
}
|
||||
if (e->type->ops.requests_merged)
|
||||
e->type->ops.requests_merged(q, rq, next);
|
||||
|
||||
elv_rqhash_reposition(q, rq);
|
||||
|
||||
if (next_sorted) {
|
||||
elv_rqhash_del(q, next);
|
||||
q->nr_sorted--;
|
||||
}
|
||||
|
||||
q->last_merge = rq;
|
||||
}
|
||||
|
||||
void elv_bio_merged(struct request_queue *q, struct request *rq,
|
||||
struct bio *bio)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (WARN_ON_ONCE(e->uses_mq))
|
||||
return;
|
||||
|
||||
if (e->type->ops.sq.elevator_bio_merged_fn)
|
||||
e->type->ops.sq.elevator_bio_merged_fn(q, rq, bio);
|
||||
}
|
||||
|
||||
void elv_requeue_request(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
/*
|
||||
* it already went through dequeue, we need to decrement the
|
||||
* in_flight count again
|
||||
*/
|
||||
if (blk_account_rq(rq)) {
|
||||
q->in_flight[rq_is_sync(rq)]--;
|
||||
if (rq->rq_flags & RQF_SORTED)
|
||||
elv_deactivate_rq(q, rq);
|
||||
}
|
||||
|
||||
rq->rq_flags &= ~RQF_STARTED;
|
||||
|
||||
blk_pm_requeue_request(rq);
|
||||
|
||||
__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
|
||||
}
|
||||
|
||||
void elv_drain_elevator(struct request_queue *q)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
static int printed;
|
||||
|
||||
if (WARN_ON_ONCE(e->uses_mq))
|
||||
return;
|
||||
|
||||
lockdep_assert_held(q->queue_lock);
|
||||
|
||||
while (e->type->ops.sq.elevator_dispatch_fn(q, 1))
|
||||
;
|
||||
if (q->nr_sorted && !blk_queue_is_zoned(q) && printed++ < 10 ) {
|
||||
printk(KERN_ERR "%s: forced dispatching is broken "
|
||||
"(nr_sorted=%u), please report this\n",
|
||||
q->elevator->type->elevator_name, q->nr_sorted);
|
||||
}
|
||||
}
|
||||
|
||||
void __elv_add_request(struct request_queue *q, struct request *rq, int where)
|
||||
{
|
||||
trace_block_rq_insert(q, rq);
|
||||
|
||||
blk_pm_add_request(q, rq);
|
||||
|
||||
rq->q = q;
|
||||
|
||||
if (rq->rq_flags & RQF_SOFTBARRIER) {
|
||||
/* barriers are scheduling boundary, update end_sector */
|
||||
if (!blk_rq_is_passthrough(rq)) {
|
||||
q->end_sector = rq_end_sector(rq);
|
||||
q->boundary_rq = rq;
|
||||
}
|
||||
} else if (!(rq->rq_flags & RQF_ELVPRIV) &&
|
||||
(where == ELEVATOR_INSERT_SORT ||
|
||||
where == ELEVATOR_INSERT_SORT_MERGE))
|
||||
where = ELEVATOR_INSERT_BACK;
|
||||
|
||||
switch (where) {
|
||||
case ELEVATOR_INSERT_REQUEUE:
|
||||
case ELEVATOR_INSERT_FRONT:
|
||||
rq->rq_flags |= RQF_SOFTBARRIER;
|
||||
list_add(&rq->queuelist, &q->queue_head);
|
||||
break;
|
||||
|
||||
case ELEVATOR_INSERT_BACK:
|
||||
rq->rq_flags |= RQF_SOFTBARRIER;
|
||||
elv_drain_elevator(q);
|
||||
list_add_tail(&rq->queuelist, &q->queue_head);
|
||||
/*
|
||||
* We kick the queue here for the following reasons.
|
||||
* - The elevator might have returned NULL previously
|
||||
* to delay requests and returned them now. As the
|
||||
* queue wasn't empty before this request, ll_rw_blk
|
||||
* won't run the queue on return, resulting in hang.
|
||||
* - Usually, back inserted requests won't be merged
|
||||
* with anything. There's no point in delaying queue
|
||||
* processing.
|
||||
*/
|
||||
__blk_run_queue(q);
|
||||
break;
|
||||
|
||||
case ELEVATOR_INSERT_SORT_MERGE:
|
||||
/*
|
||||
* If we succeed in merging this request with one in the
|
||||
* queue already, we are done - rq has now been freed,
|
||||
* so no need to do anything further.
|
||||
*/
|
||||
if (elv_attempt_insert_merge(q, rq))
|
||||
break;
|
||||
/* fall through */
|
||||
case ELEVATOR_INSERT_SORT:
|
||||
BUG_ON(blk_rq_is_passthrough(rq));
|
||||
rq->rq_flags |= RQF_SORTED;
|
||||
q->nr_sorted++;
|
||||
if (rq_mergeable(rq)) {
|
||||
elv_rqhash_add(q, rq);
|
||||
if (!q->last_merge)
|
||||
q->last_merge = rq;
|
||||
}
|
||||
|
||||
/*
|
||||
* Some ioscheds (cfq) run q->request_fn directly, so
|
||||
* rq cannot be accessed after calling
|
||||
* elevator_add_req_fn.
|
||||
*/
|
||||
q->elevator->type->ops.sq.elevator_add_req_fn(q, rq);
|
||||
break;
|
||||
|
||||
case ELEVATOR_INSERT_FLUSH:
|
||||
rq->rq_flags |= RQF_SOFTBARRIER;
|
||||
blk_insert_flush(rq);
|
||||
break;
|
||||
default:
|
||||
printk(KERN_ERR "%s: bad insertion point %d\n",
|
||||
__func__, where);
|
||||
BUG();
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL(__elv_add_request);
|
||||
|
||||
void elv_add_request(struct request_queue *q, struct request *rq, int where)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(q->queue_lock, flags);
|
||||
__elv_add_request(q, rq, where);
|
||||
spin_unlock_irqrestore(q->queue_lock, flags);
|
||||
}
|
||||
EXPORT_SYMBOL(elv_add_request);
|
||||
|
||||
struct request *elv_latter_request(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e->uses_mq && e->type->ops.mq.next_request)
|
||||
return e->type->ops.mq.next_request(q, rq);
|
||||
else if (!e->uses_mq && e->type->ops.sq.elevator_latter_req_fn)
|
||||
return e->type->ops.sq.elevator_latter_req_fn(q, rq);
|
||||
if (e->type->ops.next_request)
|
||||
return e->type->ops.next_request(q, rq);
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
@ -704,68 +416,12 @@ struct request *elv_former_request(struct request_queue *q, struct request *rq)
|
|||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e->uses_mq && e->type->ops.mq.former_request)
|
||||
return e->type->ops.mq.former_request(q, rq);
|
||||
if (!e->uses_mq && e->type->ops.sq.elevator_former_req_fn)
|
||||
return e->type->ops.sq.elevator_former_req_fn(q, rq);
|
||||
if (e->type->ops.former_request)
|
||||
return e->type->ops.former_request(q, rq);
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
int elv_set_request(struct request_queue *q, struct request *rq,
|
||||
struct bio *bio, gfp_t gfp_mask)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (WARN_ON_ONCE(e->uses_mq))
|
||||
return 0;
|
||||
|
||||
if (e->type->ops.sq.elevator_set_req_fn)
|
||||
return e->type->ops.sq.elevator_set_req_fn(q, rq, bio, gfp_mask);
|
||||
return 0;
|
||||
}
|
||||
|
||||
void elv_put_request(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (WARN_ON_ONCE(e->uses_mq))
|
||||
return;
|
||||
|
||||
if (e->type->ops.sq.elevator_put_req_fn)
|
||||
e->type->ops.sq.elevator_put_req_fn(rq);
|
||||
}
|
||||
|
||||
int elv_may_queue(struct request_queue *q, unsigned int op)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (WARN_ON_ONCE(e->uses_mq))
|
||||
return 0;
|
||||
|
||||
if (e->type->ops.sq.elevator_may_queue_fn)
|
||||
return e->type->ops.sq.elevator_may_queue_fn(q, op);
|
||||
|
||||
return ELV_MQUEUE_MAY;
|
||||
}
|
||||
|
||||
void elv_completed_request(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (WARN_ON_ONCE(e->uses_mq))
|
||||
return;
|
||||
|
||||
/*
|
||||
* request is released from the driver, io must be done
|
||||
*/
|
||||
if (blk_account_rq(rq)) {
|
||||
q->in_flight[rq_is_sync(rq)]--;
|
||||
if ((rq->rq_flags & RQF_SORTED) &&
|
||||
e->type->ops.sq.elevator_completed_req_fn)
|
||||
e->type->ops.sq.elevator_completed_req_fn(q, rq);
|
||||
}
|
||||
}
|
||||
|
||||
#define to_elv(atr) container_of((atr), struct elv_fs_entry, attr)
|
||||
|
||||
static ssize_t
|
||||
|
@ -832,8 +488,6 @@ int elv_register_queue(struct request_queue *q)
|
|||
}
|
||||
kobject_uevent(&e->kobj, KOBJ_ADD);
|
||||
e->registered = 1;
|
||||
if (!e->uses_mq && e->type->ops.sq.elevator_registered_fn)
|
||||
e->type->ops.sq.elevator_registered_fn(q);
|
||||
}
|
||||
return error;
|
||||
}
|
||||
|
@ -873,7 +527,7 @@ int elv_register(struct elevator_type *e)
|
|||
|
||||
/* register, don't allow duplicate names */
|
||||
spin_lock(&elv_list_lock);
|
||||
if (elevator_find(e->elevator_name, e->uses_mq)) {
|
||||
if (elevator_find(e->elevator_name)) {
|
||||
spin_unlock(&elv_list_lock);
|
||||
kmem_cache_destroy(e->icq_cache);
|
||||
return -EBUSY;
|
||||
|
@ -881,12 +535,6 @@ int elv_register(struct elevator_type *e)
|
|||
list_add_tail(&e->list, &elv_list);
|
||||
spin_unlock(&elv_list_lock);
|
||||
|
||||
/* print pretty message */
|
||||
if (elevator_match(e, chosen_elevator) ||
|
||||
(!*chosen_elevator &&
|
||||
elevator_match(e, CONFIG_DEFAULT_IOSCHED)))
|
||||
def = " (default)";
|
||||
|
||||
printk(KERN_INFO "io scheduler %s registered%s\n", e->elevator_name,
|
||||
def);
|
||||
return 0;
|
||||
|
@ -989,71 +637,17 @@ out_unlock:
|
|||
*/
|
||||
static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
|
||||
{
|
||||
struct elevator_queue *old = q->elevator;
|
||||
bool old_registered = false;
|
||||
int err;
|
||||
|
||||
lockdep_assert_held(&q->sysfs_lock);
|
||||
|
||||
if (q->mq_ops) {
|
||||
blk_mq_freeze_queue(q);
|
||||
blk_mq_quiesce_queue(q);
|
||||
blk_mq_freeze_queue(q);
|
||||
blk_mq_quiesce_queue(q);
|
||||
|
||||
err = elevator_switch_mq(q, new_e);
|
||||
err = elevator_switch_mq(q, new_e);
|
||||
|
||||
blk_mq_unquiesce_queue(q);
|
||||
blk_mq_unfreeze_queue(q);
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
/*
|
||||
* Turn on BYPASS and drain all requests w/ elevator private data.
|
||||
* Block layer doesn't call into a quiesced elevator - all requests
|
||||
* are directly put on the dispatch list without elevator data
|
||||
* using INSERT_BACK. All requests have SOFTBARRIER set and no
|
||||
* merge happens either.
|
||||
*/
|
||||
if (old) {
|
||||
old_registered = old->registered;
|
||||
|
||||
blk_queue_bypass_start(q);
|
||||
|
||||
/* unregister and clear all auxiliary data of the old elevator */
|
||||
if (old_registered)
|
||||
elv_unregister_queue(q);
|
||||
|
||||
ioc_clear_queue(q);
|
||||
}
|
||||
|
||||
/* allocate, init and register new elevator */
|
||||
err = new_e->ops.sq.elevator_init_fn(q, new_e);
|
||||
if (err)
|
||||
goto fail_init;
|
||||
|
||||
err = elv_register_queue(q);
|
||||
if (err)
|
||||
goto fail_register;
|
||||
|
||||
/* done, kill the old one and finish */
|
||||
if (old) {
|
||||
elevator_exit(q, old);
|
||||
blk_queue_bypass_end(q);
|
||||
}
|
||||
|
||||
blk_add_trace_msg(q, "elv switch: %s", new_e->elevator_name);
|
||||
|
||||
return 0;
|
||||
|
||||
fail_register:
|
||||
elevator_exit(q, q->elevator);
|
||||
fail_init:
|
||||
/* switch failed, restore and re-register old elevator */
|
||||
if (old) {
|
||||
q->elevator = old;
|
||||
elv_register_queue(q);
|
||||
blk_queue_bypass_end(q);
|
||||
}
|
||||
blk_mq_unquiesce_queue(q);
|
||||
blk_mq_unfreeze_queue(q);
|
||||
|
||||
return err;
|
||||
}
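
With the legacy path gone, switching schedulers always runs through the blk-mq branch shown above. The essential shape of that bracket, condensed from the hunk as a sketch (error paths and locking context elided):

/*
 * Sketch of the freeze/quiesce bracket used when changing scheduler state
 * on a blk-mq queue; mirrors the simplified elevator_switch() above.
 */
static int switch_sched_sketch(struct request_queue *q,
			       struct elevator_type *new_e)
{
	int err;

	blk_mq_freeze_queue(q);		/* drain in-flight requests */
	blk_mq_quiesce_queue(q);	/* stop new dispatch activity */

	err = elevator_switch_mq(q, new_e);

	blk_mq_unquiesce_queue(q);
	blk_mq_unfreeze_queue(q);
	return err;
}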
|
||||
|
@ -1073,7 +667,7 @@ static int __elevator_change(struct request_queue *q, const char *name)
|
|||
/*
|
||||
* Special case for mq, turn off scheduling
|
||||
*/
|
||||
if (q->mq_ops && !strncmp(name, "none", 4))
|
||||
if (!strncmp(name, "none", 4))
|
||||
return elevator_switch(q, NULL);
|
||||
|
||||
strlcpy(elevator_name, name, sizeof(elevator_name));
|
||||
|
@ -1091,8 +685,7 @@ static int __elevator_change(struct request_queue *q, const char *name)
|
|||
|
||||
static inline bool elv_support_iosched(struct request_queue *q)
|
||||
{
|
||||
if (q->mq_ops && q->tag_set && (q->tag_set->flags &
|
||||
BLK_MQ_F_NO_SCHED))
|
||||
if (q->tag_set && (q->tag_set->flags & BLK_MQ_F_NO_SCHED))
|
||||
return false;
|
||||
return true;
|
||||
}
|
||||
|
@ -1102,7 +695,7 @@ ssize_t elv_iosched_store(struct request_queue *q, const char *name,
|
|||
{
|
||||
int ret;
|
||||
|
||||
if (!(q->mq_ops || q->request_fn) || !elv_support_iosched(q))
|
||||
if (!queue_is_mq(q) || !elv_support_iosched(q))
|
||||
return count;
|
||||
|
||||
ret = __elevator_change(q, name);
|
||||
|
@ -1117,10 +710,9 @@ ssize_t elv_iosched_show(struct request_queue *q, char *name)
|
|||
struct elevator_queue *e = q->elevator;
|
||||
struct elevator_type *elv = NULL;
|
||||
struct elevator_type *__e;
|
||||
bool uses_mq = q->mq_ops != NULL;
|
||||
int len = 0;
|
||||
|
||||
if (!queue_is_rq_based(q))
|
||||
if (!queue_is_mq(q))
|
||||
return sprintf(name, "none\n");
|
||||
|
||||
if (!q->elevator)
|
||||
|
@ -1130,19 +722,16 @@ ssize_t elv_iosched_show(struct request_queue *q, char *name)
|
|||
|
||||
spin_lock(&elv_list_lock);
|
||||
list_for_each_entry(__e, &elv_list, list) {
|
||||
if (elv && elevator_match(elv, __e->elevator_name) &&
|
||||
(__e->uses_mq == uses_mq)) {
|
||||
if (elv && elevator_match(elv, __e->elevator_name)) {
|
||||
len += sprintf(name+len, "[%s] ", elv->elevator_name);
|
||||
continue;
|
||||
}
|
||||
if (__e->uses_mq && q->mq_ops && elv_support_iosched(q))
|
||||
len += sprintf(name+len, "%s ", __e->elevator_name);
|
||||
else if (!__e->uses_mq && !q->mq_ops)
|
||||
if (elv_support_iosched(q))
|
||||
len += sprintf(name+len, "%s ", __e->elevator_name);
|
||||
}
|
||||
spin_unlock(&elv_list_lock);
|
||||
|
||||
if (q->mq_ops && q->elevator)
|
||||
if (q->elevator)
|
||||
len += sprintf(name+len, "none");
|
||||
|
||||
len += sprintf(len+name, "\n");
|
||||
|
|
|
@ -47,51 +47,64 @@ static void disk_release_events(struct gendisk *disk);
|
|||
|
||||
void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
|
||||
{
|
||||
if (q->mq_ops)
|
||||
if (queue_is_mq(q))
|
||||
return;
|
||||
|
||||
atomic_inc(&part->in_flight[rw]);
|
||||
part_stat_local_inc(part, in_flight[rw]);
|
||||
if (part->partno)
|
||||
atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
|
||||
part_stat_local_inc(&part_to_disk(part)->part0, in_flight[rw]);
|
||||
}
|
||||
|
||||
void part_dec_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
|
||||
{
|
||||
if (q->mq_ops)
|
||||
if (queue_is_mq(q))
|
||||
return;
|
||||
|
||||
atomic_dec(&part->in_flight[rw]);
|
||||
part_stat_local_dec(part, in_flight[rw]);
|
||||
if (part->partno)
|
||||
atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
|
||||
part_stat_local_dec(&part_to_disk(part)->part0, in_flight[rw]);
|
||||
}
|
||||
|
||||
void part_in_flight(struct request_queue *q, struct hd_struct *part,
|
||||
unsigned int inflight[2])
|
||||
unsigned int part_in_flight(struct request_queue *q, struct hd_struct *part)
|
||||
{
|
||||
if (q->mq_ops) {
|
||||
blk_mq_in_flight(q, part, inflight);
|
||||
return;
|
||||
int cpu;
|
||||
unsigned int inflight;
|
||||
|
||||
if (queue_is_mq(q)) {
|
||||
return blk_mq_in_flight(q, part);
|
||||
}
|
||||
|
||||
inflight[0] = atomic_read(&part->in_flight[0]) +
|
||||
atomic_read(&part->in_flight[1]);
|
||||
if (part->partno) {
|
||||
part = &part_to_disk(part)->part0;
|
||||
inflight[1] = atomic_read(&part->in_flight[0]) +
|
||||
atomic_read(&part->in_flight[1]);
|
||||
inflight = 0;
|
||||
for_each_possible_cpu(cpu) {
|
||||
inflight += part_stat_local_read_cpu(part, in_flight[0], cpu) +
|
||||
part_stat_local_read_cpu(part, in_flight[1], cpu);
|
||||
}
|
||||
if ((int)inflight < 0)
|
||||
inflight = 0;
|
||||
|
||||
return inflight;
|
||||
}
|
||||
|
||||
void part_in_flight_rw(struct request_queue *q, struct hd_struct *part,
|
||||
unsigned int inflight[2])
|
||||
{
|
||||
if (q->mq_ops) {
|
||||
int cpu;
|
||||
|
||||
if (queue_is_mq(q)) {
|
||||
blk_mq_in_flight_rw(q, part, inflight);
|
||||
return;
|
||||
}
|
||||
|
||||
inflight[0] = atomic_read(&part->in_flight[0]);
|
||||
inflight[1] = atomic_read(&part->in_flight[1]);
|
||||
inflight[0] = 0;
|
||||
inflight[1] = 0;
|
||||
for_each_possible_cpu(cpu) {
|
||||
inflight[0] += part_stat_local_read_cpu(part, in_flight[0], cpu);
|
||||
inflight[1] += part_stat_local_read_cpu(part, in_flight[1], cpu);
|
||||
}
|
||||
if ((int)inflight[0] < 0)
|
||||
inflight[0] = 0;
|
||||
if ((int)inflight[1] < 0)
|
||||
inflight[1] = 0;
|
||||
}
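
The inflight accounting above replaces a shared atomic counter with per-cpu counters that are summed at read time; the sum is clamped at zero because an increment and its matching decrement may land on different CPUs. A small user-space analogue (illustrative only, not kernel code):

#include <stdio.h>

#define NR_CPUS 4

static long in_flight[NR_CPUS];		/* one counter per "cpu" */

static void start_io(int cpu) { in_flight[cpu]++; }
static void end_io(int cpu)   { in_flight[cpu]--; }

/* sum the per-cpu counters; a negative total just means "nothing in flight" */
static unsigned int read_in_flight(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += in_flight[cpu];
	return sum < 0 ? 0 : (unsigned int)sum;
}

int main(void)
{
	start_io(0);			/* submitted on cpu 0 */
	end_io(2);			/* completed on cpu 2 */
	printf("in flight: %u\n", read_in_flight());
	return 0;
}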
|
||||
|
||||
struct hd_struct *__disk_get_part(struct gendisk *disk, int partno)
|
||||
|
@ -1325,8 +1338,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
|
|||
struct disk_part_iter piter;
|
||||
struct hd_struct *hd;
|
||||
char buf[BDEVNAME_SIZE];
|
||||
unsigned int inflight[2];
|
||||
int cpu;
|
||||
unsigned int inflight;
|
||||
|
||||
/*
|
||||
if (&disk_to_dev(gp)->kobj.entry == block_class.devices.next)
|
||||
|
@ -1338,10 +1350,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
|
|||
|
||||
disk_part_iter_init(&piter, gp, DISK_PITER_INCL_EMPTY_PART0);
|
||||
while ((hd = disk_part_iter_next(&piter))) {
|
||||
cpu = part_stat_lock();
|
||||
part_round_stats(gp->queue, cpu, hd);
|
||||
part_stat_unlock();
|
||||
part_in_flight(gp->queue, hd, inflight);
|
||||
inflight = part_in_flight(gp->queue, hd);
|
||||
seq_printf(seqf, "%4d %7d %s "
|
||||
"%lu %lu %lu %u "
|
||||
"%lu %lu %lu %u "
|
||||
|
@ -1357,7 +1366,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
|
|||
part_stat_read(hd, merges[STAT_WRITE]),
|
||||
part_stat_read(hd, sectors[STAT_WRITE]),
|
||||
(unsigned int)part_stat_read_msecs(hd, STAT_WRITE),
|
||||
inflight[0],
|
||||
inflight,
|
||||
jiffies_to_msecs(part_stat_read(hd, io_ticks)),
|
||||
jiffies_to_msecs(part_stat_read(hd, time_in_queue)),
|
||||
part_stat_read(hd, ios[STAT_DISCARD]),
|
||||
|
|
|
@ -195,7 +195,7 @@ struct kyber_hctx_data {
|
|||
unsigned int batching;
|
||||
struct kyber_ctx_queue *kcqs;
|
||||
struct sbitmap kcq_map[KYBER_NUM_DOMAINS];
|
||||
wait_queue_entry_t domain_wait[KYBER_NUM_DOMAINS];
|
||||
struct sbq_wait domain_wait[KYBER_NUM_DOMAINS];
|
||||
struct sbq_wait_state *domain_ws[KYBER_NUM_DOMAINS];
|
||||
atomic_t wait_index[KYBER_NUM_DOMAINS];
|
||||
};
|
||||
|
@ -501,10 +501,11 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
|
|||
|
||||
for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
|
||||
INIT_LIST_HEAD(&khd->rqs[i]);
|
||||
init_waitqueue_func_entry(&khd->domain_wait[i],
|
||||
khd->domain_wait[i].sbq = NULL;
|
||||
init_waitqueue_func_entry(&khd->domain_wait[i].wait,
|
||||
kyber_domain_wake);
|
||||
khd->domain_wait[i].private = hctx;
|
||||
INIT_LIST_HEAD(&khd->domain_wait[i].entry);
|
||||
khd->domain_wait[i].wait.private = hctx;
|
||||
INIT_LIST_HEAD(&khd->domain_wait[i].wait.entry);
|
||||
atomic_set(&khd->wait_index[i], 0);
|
||||
}
|
||||
|
||||
|
@ -576,7 +577,7 @@ static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
|
|||
{
|
||||
struct kyber_hctx_data *khd = hctx->sched_data;
|
||||
struct blk_mq_ctx *ctx = blk_mq_get_ctx(hctx->queue);
|
||||
struct kyber_ctx_queue *kcq = &khd->kcqs[ctx->index_hw];
|
||||
struct kyber_ctx_queue *kcq = &khd->kcqs[ctx->index_hw[hctx->type]];
|
||||
unsigned int sched_domain = kyber_sched_domain(bio->bi_opf);
|
||||
struct list_head *rq_list = &kcq->rq_list[sched_domain];
|
||||
bool merged;
|
||||
|
@ -602,7 +603,7 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx *hctx,
|
|||
|
||||
list_for_each_entry_safe(rq, next, rq_list, queuelist) {
|
||||
unsigned int sched_domain = kyber_sched_domain(rq->cmd_flags);
|
||||
struct kyber_ctx_queue *kcq = &khd->kcqs[rq->mq_ctx->index_hw];
|
||||
struct kyber_ctx_queue *kcq = &khd->kcqs[rq->mq_ctx->index_hw[hctx->type]];
|
||||
struct list_head *head = &kcq->rq_list[sched_domain];
|
||||
|
||||
spin_lock(&kcq->lock);
|
||||
|
@ -611,7 +612,7 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx *hctx,
|
|||
else
|
||||
list_move_tail(&rq->queuelist, head);
|
||||
sbitmap_set_bit(&khd->kcq_map[sched_domain],
|
||||
rq->mq_ctx->index_hw);
|
||||
rq->mq_ctx->index_hw[hctx->type]);
|
||||
blk_mq_sched_request_inserted(rq);
|
||||
spin_unlock(&kcq->lock);
|
||||
}
|
||||
|
@ -698,12 +699,13 @@ static void kyber_flush_busy_kcqs(struct kyber_hctx_data *khd,
|
|||
flush_busy_kcq, &data);
|
||||
}
|
||||
|
||||
static int kyber_domain_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
|
||||
static int kyber_domain_wake(wait_queue_entry_t *wqe, unsigned mode, int flags,
|
||||
void *key)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = READ_ONCE(wait->private);
|
||||
struct blk_mq_hw_ctx *hctx = READ_ONCE(wqe->private);
|
||||
struct sbq_wait *wait = container_of(wqe, struct sbq_wait, wait);
|
||||
|
||||
list_del_init(&wait->entry);
|
||||
sbitmap_del_wait_queue(wait);
|
||||
blk_mq_run_hw_queue(hctx, true);
|
||||
return 1;
|
||||
}
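
The kyber conversion above swaps a bare wait_queue_entry_t for struct sbq_wait plus the new sbitmap add/del wait helpers, so sbitmap can keep an accurate waiter count. The usage pattern, condensed from the hunks above (sketch only; the my_* names are placeholders):

static int my_wake_fn(wait_queue_entry_t *wqe, unsigned mode, int flags, void *key)
{
	struct sbq_wait *w = container_of(wqe, struct sbq_wait, wait);

	sbitmap_del_wait_queue(w);	/* replaces list_del_init(&wqe->entry) */
	/* ... kick whatever was waiting for a bit ... */
	return 1;
}

static void my_wait_init(struct sbq_wait *w, void *ctx)
{
	w->sbq = NULL;			/* not queued on any sbitmap yet */
	init_waitqueue_func_entry(&w->wait, my_wake_fn);
	w->wait.private = ctx;
	INIT_LIST_HEAD(&w->wait.entry);
}

static void my_wait_for_bit(struct sbitmap_queue *tokens,
			    struct sbq_wait_state *ws, struct sbq_wait *w)
{
	/* replaces add_wait_queue(&ws->wait, ...); lets sbitmap count waiters */
	sbitmap_add_wait_queue(tokens, ws, w);
}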
|
||||
|
@ -714,7 +716,7 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
|
|||
{
|
||||
unsigned int sched_domain = khd->cur_domain;
|
||||
struct sbitmap_queue *domain_tokens = &kqd->domain_tokens[sched_domain];
|
||||
wait_queue_entry_t *wait = &khd->domain_wait[sched_domain];
|
||||
struct sbq_wait *wait = &khd->domain_wait[sched_domain];
|
||||
struct sbq_wait_state *ws;
|
||||
int nr;
|
||||
|
||||
|
@ -725,11 +727,11 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
|
|||
* run when one becomes available. Note that this is serialized on
|
||||
* khd->lock, but we still need to be careful about the waker.
|
||||
*/
|
||||
if (nr < 0 && list_empty_careful(&wait->entry)) {
|
||||
if (nr < 0 && list_empty_careful(&wait->wait.entry)) {
|
||||
ws = sbq_wait_ptr(domain_tokens,
|
||||
&khd->wait_index[sched_domain]);
|
||||
khd->domain_ws[sched_domain] = ws;
|
||||
add_wait_queue(&ws->wait, wait);
|
||||
sbitmap_add_wait_queue(domain_tokens, ws, wait);
|
||||
|
||||
/*
|
||||
* Try again in case a token was freed before we got on the wait
|
||||
|
@ -745,10 +747,10 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
|
|||
* between the !list_empty_careful() check and us grabbing the lock, but
|
||||
* list_del_init() is okay with that.
|
||||
*/
|
||||
if (nr >= 0 && !list_empty_careful(&wait->entry)) {
|
||||
if (nr >= 0 && !list_empty_careful(&wait->wait.entry)) {
|
||||
ws = khd->domain_ws[sched_domain];
|
||||
spin_lock_irq(&ws->wait.lock);
|
||||
list_del_init(&wait->entry);
|
||||
sbitmap_del_wait_queue(wait);
|
||||
spin_unlock_irq(&ws->wait.lock);
|
||||
}
|
||||
|
||||
|
@ -951,7 +953,7 @@ static int kyber_##name##_waiting_show(void *data, struct seq_file *m) \
|
|||
{ \
|
||||
struct blk_mq_hw_ctx *hctx = data; \
|
||||
struct kyber_hctx_data *khd = hctx->sched_data; \
|
||||
wait_queue_entry_t *wait = &khd->domain_wait[domain]; \
|
||||
wait_queue_entry_t *wait = &khd->domain_wait[domain].wait; \
|
||||
\
|
||||
seq_printf(m, "%d\n", !list_empty_careful(&wait->entry)); \
|
||||
return 0; \
|
||||
|
@ -1017,7 +1019,7 @@ static const struct blk_mq_debugfs_attr kyber_hctx_debugfs_attrs[] = {
|
|||
#endif
|
||||
|
||||
static struct elevator_type kyber_sched = {
|
||||
.ops.mq = {
|
||||
.ops = {
|
||||
.init_sched = kyber_init_sched,
|
||||
.exit_sched = kyber_exit_sched,
|
||||
.init_hctx = kyber_init_hctx,
|
||||
|
@ -1032,7 +1034,6 @@ static struct elevator_type kyber_sched = {
|
|||
.dispatch_request = kyber_dispatch_request,
|
||||
.has_work = kyber_has_work,
|
||||
},
|
||||
.uses_mq = true,
|
||||
#ifdef CONFIG_BLK_DEBUG_FS
|
||||
.queue_debugfs_attrs = kyber_queue_debugfs_attrs,
|
||||
.hctx_debugfs_attrs = kyber_hctx_debugfs_attrs,
|
||||
|
|
|
@ -373,9 +373,16 @@ done:
|
|||
|
||||
/*
|
||||
* One confusing aspect here is that we get called for a specific
|
||||
* hardware queue, but we return a request that may not be for a
|
||||
* hardware queue, but we may return a request that is for a
|
||||
* different hardware queue. This is because mq-deadline has shared
|
||||
* state for all hardware queues, in terms of sorting, FIFOs, etc.
|
||||
*
|
||||
* For a zoned block device, __dd_dispatch_request() may return NULL
|
||||
* if all the queued write requests are directed at zones that are already
|
||||
* locked due to on-going write requests. In this case, make sure to mark
|
||||
* the queue as needing a restart to ensure that the queue is run again
|
||||
* and the pending writes dispatched once the target zones for the ongoing
|
||||
* write requests are unlocked in dd_finish_request().
|
||||
*/
|
||||
static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
|
@ -384,6 +391,9 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
|
|||
|
||||
spin_lock(&dd->lock);
|
||||
rq = __dd_dispatch_request(dd);
|
||||
if (!rq && blk_queue_is_zoned(hctx->queue) &&
|
||||
!list_empty(&dd->fifo_list[WRITE]))
|
||||
blk_mq_sched_mark_restart_hctx(hctx);
|
||||
spin_unlock(&dd->lock);
|
||||
|
||||
return rq;
|
||||
|
@ -761,7 +771,7 @@ static const struct blk_mq_debugfs_attr deadline_queue_debugfs_attrs[] = {
|
|||
#endif
|
||||
|
||||
static struct elevator_type mq_deadline = {
|
||||
.ops.mq = {
|
||||
.ops = {
|
||||
.insert_requests = dd_insert_requests,
|
||||
.dispatch_request = dd_dispatch_request,
|
||||
.prepare_request = dd_prepare_request,
|
||||
|
@ -777,7 +787,6 @@ static struct elevator_type mq_deadline = {
|
|||
.exit_sched = dd_exit_queue,
|
||||
},
|
||||
|
||||
.uses_mq = true,
|
||||
#ifdef CONFIG_BLK_DEBUG_FS
|
||||
.queue_debugfs_attrs = deadline_queue_debugfs_attrs,
|
||||
#endif
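
The zoned-device comment and dispatch hunk above describe a simple protocol: at most one write per zone in flight, skip writes whose target zone is locked, remember that a restart is needed when nothing could be dispatched, and unlock plus re-run on completion. A user-space analogue of that protocol (illustrative only; names are made up):

#include <stdbool.h>
#include <stdio.h>

#define NR_ZONES 8

static bool zone_locked[NR_ZONES];
static bool need_restart;		/* stand-in for the hctx restart mark */

/* try to start a write; at most one write per zone at a time */
static bool dispatch_write(int zone)
{
	if (zone_locked[zone]) {
		need_restart = true;	/* re-run once a zone unlocks */
		return false;
	}
	zone_locked[zone] = true;
	return true;
}

/* completion unlocks the target zone and re-runs dispatch if needed */
static void complete_write(int zone)
{
	zone_locked[zone] = false;
	if (need_restart) {
		need_restart = false;
		printf("re-running dispatch after zone %d unlocked\n", zone);
	}
}

int main(void)
{
	dispatch_write(3);		/* first write locks the zone */
	dispatch_write(3);		/* second one is held back */
	complete_write(3);		/* unlock triggers the re-run */
	return 0;
}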
@ -1,124 +0,0 @@
/*
 * elevator noop
 */
#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/bio.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/init.h>

struct noop_data {
	struct list_head queue;
};

static void noop_merged_requests(struct request_queue *q, struct request *rq,
				 struct request *next)
{
	list_del_init(&next->queuelist);
}

static int noop_dispatch(struct request_queue *q, int force)
{
	struct noop_data *nd = q->elevator->elevator_data;
	struct request *rq;

	rq = list_first_entry_or_null(&nd->queue, struct request, queuelist);
	if (rq) {
		list_del_init(&rq->queuelist);
		elv_dispatch_sort(q, rq);
		return 1;
	}
	return 0;
}

static void noop_add_request(struct request_queue *q, struct request *rq)
{
	struct noop_data *nd = q->elevator->elevator_data;

	list_add_tail(&rq->queuelist, &nd->queue);
}

static struct request *
noop_former_request(struct request_queue *q, struct request *rq)
{
	struct noop_data *nd = q->elevator->elevator_data;

	if (rq->queuelist.prev == &nd->queue)
		return NULL;
	return list_prev_entry(rq, queuelist);
}

static struct request *
noop_latter_request(struct request_queue *q, struct request *rq)
{
	struct noop_data *nd = q->elevator->elevator_data;

	if (rq->queuelist.next == &nd->queue)
		return NULL;
	return list_next_entry(rq, queuelist);
}

static int noop_init_queue(struct request_queue *q, struct elevator_type *e)
{
	struct noop_data *nd;
	struct elevator_queue *eq;

	eq = elevator_alloc(q, e);
	if (!eq)
		return -ENOMEM;

	nd = kmalloc_node(sizeof(*nd), GFP_KERNEL, q->node);
	if (!nd) {
		kobject_put(&eq->kobj);
		return -ENOMEM;
	}
	eq->elevator_data = nd;

	INIT_LIST_HEAD(&nd->queue);

	spin_lock_irq(q->queue_lock);
	q->elevator = eq;
	spin_unlock_irq(q->queue_lock);
	return 0;
}

static void noop_exit_queue(struct elevator_queue *e)
{
	struct noop_data *nd = e->elevator_data;

	BUG_ON(!list_empty(&nd->queue));
	kfree(nd);
}

static struct elevator_type elevator_noop = {
	.ops.sq = {
		.elevator_merge_req_fn		= noop_merged_requests,
		.elevator_dispatch_fn		= noop_dispatch,
		.elevator_add_req_fn		= noop_add_request,
		.elevator_former_req_fn		= noop_former_request,
		.elevator_latter_req_fn		= noop_latter_request,
		.elevator_init_fn		= noop_init_queue,
		.elevator_exit_fn		= noop_exit_queue,
	},
	.elevator_name = "noop",
	.elevator_owner = THIS_MODULE,
};

static int __init noop_init(void)
{
	return elv_register(&elevator_noop);
}

static void __exit noop_exit(void)
{
	elv_unregister(&elevator_noop);
}

module_init(noop_init);
module_exit(noop_exit);


MODULE_AUTHOR("Jens Axboe");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("No-op IO scheduler");

@ -120,13 +120,9 @@ ssize_t part_stat_show(struct device *dev,
|
|||
{
|
||||
struct hd_struct *p = dev_to_part(dev);
|
||||
struct request_queue *q = part_to_disk(p)->queue;
|
||||
unsigned int inflight[2];
|
||||
int cpu;
|
||||
unsigned int inflight;
|
||||
|
||||
cpu = part_stat_lock();
|
||||
part_round_stats(q, cpu, p);
|
||||
part_stat_unlock();
|
||||
part_in_flight(q, p, inflight);
|
||||
inflight = part_in_flight(q, p);
|
||||
return sprintf(buf,
|
||||
"%8lu %8lu %8llu %8u "
|
||||
"%8lu %8lu %8llu %8u "
|
||||
|
@ -141,7 +137,7 @@ ssize_t part_stat_show(struct device *dev,
|
|||
part_stat_read(p, merges[STAT_WRITE]),
|
||||
(unsigned long long)part_stat_read(p, sectors[STAT_WRITE]),
|
||||
(unsigned int)part_stat_read_msecs(p, STAT_WRITE),
|
||||
inflight[0],
|
||||
inflight,
|
||||
jiffies_to_msecs(part_stat_read(p, io_ticks)),
|
||||
jiffies_to_msecs(part_stat_read(p, time_in_queue)),
|
||||
part_stat_read(p, ios[STAT_DISCARD]),
|
||||
|
@ -249,9 +245,10 @@ struct device_type part_type = {
|
|||
.uevent = part_uevent,
|
||||
};
|
||||
|
||||
static void delete_partition_rcu_cb(struct rcu_head *head)
|
||||
static void delete_partition_work_fn(struct work_struct *work)
|
||||
{
|
||||
struct hd_struct *part = container_of(head, struct hd_struct, rcu_head);
|
||||
struct hd_struct *part = container_of(to_rcu_work(work), struct hd_struct,
|
||||
rcu_work);
|
||||
|
||||
part->start_sect = 0;
|
||||
part->nr_sects = 0;
|
||||
|
@ -262,7 +259,8 @@ static void delete_partition_rcu_cb(struct rcu_head *head)
|
|||
void __delete_partition(struct percpu_ref *ref)
|
||||
{
|
||||
struct hd_struct *part = container_of(ref, struct hd_struct, ref);
|
||||
call_rcu(&part->rcu_head, delete_partition_rcu_cb);
|
||||
INIT_RCU_WORK(&part->rcu_work, delete_partition_work_fn);
|
||||
queue_rcu_work(system_wq, &part->rcu_work);
|
||||
}
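
The partition teardown above switches from call_rcu() to the rcu_work API because the release work now runs in process context after the grace period, where sleeping is allowed. The conversion pattern, condensed from the hunk (sketch; the my_* names are placeholders):

struct my_obj {
	struct rcu_work rcu_work;	/* replaces struct rcu_head rcu_head */
	/* ... */
};

/* runs from a workqueue once a grace period has elapsed; may sleep */
static void my_free_fn(struct work_struct *work)
{
	struct my_obj *obj = container_of(to_rcu_work(work), struct my_obj,
					  rcu_work);

	/* sleeping release work is fine here, unlike in a call_rcu() callback */
	my_release(obj);
}

static void my_delete(struct my_obj *obj)
{
	/* replaces call_rcu(&obj->rcu_head, my_free_cb) */
	INIT_RCU_WORK(&obj->rcu_work, my_free_fn);
	queue_rcu_work(system_wq, &obj->rcu_work);
}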
|
||||
|
||||
/*
|
||||
|
|
|
@ -919,8 +919,6 @@ static void ata_eh_set_pending(struct ata_port *ap, int fastdrain)
|
|||
void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
|
||||
{
|
||||
struct ata_port *ap = qc->ap;
|
||||
struct request_queue *q = qc->scsicmd->device->request_queue;
|
||||
unsigned long flags;
|
||||
|
||||
WARN_ON(!ap->ops->error_handler);
|
||||
|
||||
|
@ -932,9 +930,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
|
|||
* Note that ATA_QCFLAG_FAILED is unconditionally set after
|
||||
* this function completes.
|
||||
*/
|
||||
spin_lock_irqsave(q->queue_lock, flags);
|
||||
blk_abort_request(qc->scsicmd->request);
|
||||
spin_unlock_irqrestore(q->queue_lock, flags);
|
||||
}
|
||||
|
||||
/**
|
||||
|
|
|
@ -100,6 +100,10 @@ enum {
|
|||
MAX_TAINT = 1000, /* cap on aoetgt taint */
|
||||
};
|
||||
|
||||
struct aoe_req {
|
||||
unsigned long nr_bios;
|
||||
};
|
||||
|
||||
struct buf {
|
||||
ulong nframesout;
|
||||
struct bio *bio;
|
||||
|
|
|
@ -387,6 +387,7 @@ aoeblk_gdalloc(void *vp)
|
|||
|
||||
set = &d->tag_set;
|
||||
set->ops = &aoeblk_mq_ops;
|
||||
set->cmd_size = sizeof(struct aoe_req);
|
||||
set->nr_hw_queues = 1;
|
||||
set->queue_depth = 128;
|
||||
set->numa_node = NUMA_NO_NODE;
|
||||
|
|
|
@ -822,17 +822,6 @@ out:
|
|||
spin_unlock_irqrestore(&d->lock, flags);
|
||||
}
|
||||
|
||||
static unsigned long
|
||||
rqbiocnt(struct request *r)
|
||||
{
|
||||
struct bio *bio;
|
||||
unsigned long n = 0;
|
||||
|
||||
__rq_for_each_bio(bio, r)
|
||||
n++;
|
||||
return n;
|
||||
}
|
||||
|
||||
static void
|
||||
bufinit(struct buf *buf, struct request *rq, struct bio *bio)
|
||||
{
|
||||
|
@ -847,6 +836,7 @@ nextbuf(struct aoedev *d)
|
|||
{
|
||||
struct request *rq;
|
||||
struct request_queue *q;
|
||||
struct aoe_req *req;
|
||||
struct buf *buf;
|
||||
struct bio *bio;
|
||||
|
||||
|
@ -865,7 +855,11 @@ nextbuf(struct aoedev *d)
|
|||
blk_mq_start_request(rq);
|
||||
d->ip.rq = rq;
|
||||
d->ip.nxbio = rq->bio;
|
||||
rq->special = (void *) rqbiocnt(rq);
|
||||
|
||||
req = blk_mq_rq_to_pdu(rq);
|
||||
req->nr_bios = 0;
|
||||
__rq_for_each_bio(bio, rq)
|
||||
req->nr_bios++;
|
||||
}
|
||||
buf = mempool_alloc(d->bufpool, GFP_ATOMIC);
|
||||
if (buf == NULL) {
|
||||
|
@ -1069,16 +1063,13 @@ aoe_end_request(struct aoedev *d, struct request *rq, int fastfail)
|
|||
static void
|
||||
aoe_end_buf(struct aoedev *d, struct buf *buf)
|
||||
{
|
||||
struct request *rq;
|
||||
unsigned long n;
|
||||
struct request *rq = buf->rq;
|
||||
struct aoe_req *req = blk_mq_rq_to_pdu(rq);
|
||||
|
||||
if (buf == d->ip.buf)
|
||||
d->ip.buf = NULL;
|
||||
rq = buf->rq;
|
||||
mempool_free(buf, d->bufpool);
|
||||
n = (unsigned long) rq->special;
|
||||
rq->special = (void *) --n;
|
||||
if (n == 0)
|
||||
if (--req->nr_bios == 0)
|
||||
aoe_end_request(d, rq, 0);
|
||||
}
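
The aoe changes above follow the standard recipe for retiring rq->special: declare a per-request PDU, size the tag set for it with cmd_size, and fetch it with blk_mq_rq_to_pdu(). A condensed sketch of that recipe (the my_* names are placeholders):

struct my_req {
	unsigned long nr_bios;		/* state that used to live in rq->special */
};

static int my_init_tag_set(struct blk_mq_tag_set *set)
{
	set->cmd_size = sizeof(struct my_req);	/* blk-mq allocates one per request */
	/* ... ops, nr_hw_queues, queue_depth as in aoeblk_gdalloc() above ... */
	return blk_mq_alloc_tag_set(set);
}

static void my_start(struct request *rq)
{
	struct my_req *req = blk_mq_rq_to_pdu(rq);	/* replaces rq->special */
	struct bio *bio;

	req->nr_bios = 0;
	__rq_for_each_bio(bio, rq)
		req->nr_bios++;
}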
|
||||
|
||||
|
|
|
@ -160,21 +160,22 @@ static void
|
|||
aoe_failip(struct aoedev *d)
|
||||
{
|
||||
struct request *rq;
|
||||
struct aoe_req *req;
|
||||
struct bio *bio;
|
||||
unsigned long n;
|
||||
|
||||
aoe_failbuf(d, d->ip.buf);
|
||||
|
||||
rq = d->ip.rq;
|
||||
if (rq == NULL)
|
||||
return;
|
||||
|
||||
req = blk_mq_rq_to_pdu(rq);
|
||||
while ((bio = d->ip.nxbio)) {
|
||||
bio->bi_status = BLK_STS_IOERR;
|
||||
d->ip.nxbio = bio->bi_next;
|
||||
n = (unsigned long) rq->special;
|
||||
rq->special = (void *) --n;
|
||||
req->nr_bios--;
|
||||
}
|
||||
if ((unsigned long) rq->special == 0)
|
||||
|
||||
if (!req->nr_bios)
|
||||
aoe_end_request(d, rq, 0);
|
||||
}
|
||||
|
||||
|
|
|
@ -24,7 +24,7 @@ static void discover_timer(struct timer_list *t)
|
|||
aoecmd_cfg(0xffff, 0xff);
|
||||
}
|
||||
|
||||
static void
|
||||
static void __exit
|
||||
aoe_exit(void)
|
||||
{
|
||||
del_timer_sync(&timer);
|
||||
|
|
|
@ -1471,6 +1471,15 @@ static void setup_req_params( int drive )
|
|||
ReqTrack, ReqSector, (unsigned long)ReqData ));
|
||||
}
|
||||
|
||||
static void ataflop_commit_rqs(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
spin_lock_irq(&ataflop_lock);
|
||||
atari_disable_irq(IRQ_MFP_FDC);
|
||||
finish_fdc();
|
||||
atari_enable_irq(IRQ_MFP_FDC);
|
||||
spin_unlock_irq(&ataflop_lock);
|
||||
}
|
||||
|
||||
static blk_status_t ataflop_queue_rq(struct blk_mq_hw_ctx *hctx,
|
||||
const struct blk_mq_queue_data *bd)
|
||||
{
|
||||
|
@ -1947,6 +1956,7 @@ static const struct block_device_operations floppy_fops = {
|
|||
|
||||
static const struct blk_mq_ops ataflop_mq_ops = {
|
||||
.queue_rq = ataflop_queue_rq,
|
||||
.commit_rqs = ataflop_commit_rqs,
|
||||
};
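
ataflop gains a ->commit_rqs() hook here. The contract is that a driver which only kicks its hardware when ->queue_rq() sees the last request of a batch can rely on ->commit_rqs() being called when a batch ends without such a final request. A minimal shape of that pairing (hedged sketch; the my_* names are placeholders, request start/completion bookkeeping elided):

static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
				const struct blk_mq_queue_data *bd)
{
	my_stage_command(bd->rq);	/* queue the command internally */
	if (bd->last)
		my_kick_hardware();	/* last of the batch: kick now */
	return BLK_STS_OK;
}

static void my_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
	/* the batch ended without a bd->last kick; flush staged commands */
	my_kick_hardware();
}

static const struct blk_mq_ops my_mq_ops = {
	.queue_rq	= my_queue_rq,
	.commit_rqs	= my_commit_rqs,
};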
|
||||
|
||||
static struct kobject *floppy_find(dev_t dev, int *part, void *data)
|
||||
|
@ -1982,6 +1992,7 @@ static int __init atari_floppy_init (void)
|
|||
&ataflop_mq_ops, 2,
|
||||
BLK_MQ_F_SHOULD_MERGE);
|
||||
if (IS_ERR(unit[i].disk->queue)) {
|
||||
put_disk(unit[i].disk);
|
||||
ret = PTR_ERR(unit[i].disk->queue);
|
||||
unit[i].disk->queue = NULL;
|
||||
goto err;
|
||||
|
@ -2033,18 +2044,13 @@ static int __init atari_floppy_init (void)
|
|||
return 0;
|
||||
|
||||
err:
|
||||
do {
|
||||
while (--i >= 0) {
|
||||
struct gendisk *disk = unit[i].disk;
|
||||
|
||||
if (disk) {
|
||||
if (disk->queue) {
|
||||
blk_cleanup_queue(disk->queue);
|
||||
disk->queue = NULL;
|
||||
}
|
||||
blk_mq_free_tag_set(&unit[i].tag_set);
|
||||
put_disk(unit[i].disk);
|
||||
}
|
||||
} while (i--);
|
||||
blk_cleanup_queue(disk->queue);
|
||||
blk_mq_free_tag_set(&unit[i].tag_set);
|
||||
put_disk(unit[i].disk);
|
||||
}
|
||||
|
||||
unregister_blkdev(FLOPPY_MAJOR, "fd");
|
||||
return ret;
|
||||
|
|
|
@ -2792,7 +2792,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
|
|||
|
||||
drbd_init_set_defaults(device);
|
||||
|
||||
q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE, &resource->req_lock);
|
||||
q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
|
||||
if (!q)
|
||||
goto out_no_q;
|
||||
device->rq_queue = q;
|
||||
|
|
|
@ -2231,7 +2231,6 @@ static void request_done(int uptodate)
|
|||
{
|
||||
struct request *req = current_req;
|
||||
struct request_queue *q;
|
||||
unsigned long flags;
|
||||
int block;
|
||||
char msg[sizeof("request done ") + sizeof(int) * 3];
|
||||
|
||||
|
@ -2254,10 +2253,7 @@ static void request_done(int uptodate)
|
|||
if (block > _floppy->sect)
|
||||
DRS->maxtrack = 1;
|
||||
|
||||
/* unlock chained buffers */
|
||||
spin_lock_irqsave(q->queue_lock, flags);
|
||||
floppy_end_request(req, 0);
|
||||
spin_unlock_irqrestore(q->queue_lock, flags);
|
||||
} else {
|
||||
if (rq_data_dir(req) == WRITE) {
|
||||
/* record write error information */
|
||||
|
@ -2269,9 +2265,7 @@ static void request_done(int uptodate)
|
|||
DRWE->last_error_sector = blk_rq_pos(req);
|
||||
DRWE->last_error_generation = DRS->generation;
|
||||
}
|
||||
spin_lock_irqsave(q->queue_lock, flags);
|
||||
floppy_end_request(req, BLK_STS_IOERR);
|
||||
spin_unlock_irqrestore(q->queue_lock, flags);
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
@ -77,13 +77,14 @@
|
|||
#include <linux/falloc.h>
|
||||
#include <linux/uio.h>
|
||||
#include <linux/ioprio.h>
|
||||
#include <linux/blk-cgroup.h>
|
||||
|
||||
#include "loop.h"
|
||||
|
||||
#include <linux/uaccess.h>
|
||||
|
||||
static DEFINE_IDR(loop_index_idr);
|
||||
static DEFINE_MUTEX(loop_index_mutex);
|
||||
static DEFINE_MUTEX(loop_ctl_mutex);
|
||||
|
||||
static int max_part;
|
||||
static int part_shift;
|
||||
|
@ -630,18 +631,7 @@ static void loop_reread_partitions(struct loop_device *lo,
|
|||
{
|
||||
int rc;
|
||||
|
||||
/*
|
||||
* bd_mutex has been held already in release path, so don't
|
||||
* acquire it if this function is called in such case.
|
||||
*
|
||||
* If the reread partition isn't from release path, lo_refcnt
|
||||
* must be at least one and it can only become zero when the
|
||||
* current holder is released.
|
||||
*/
|
||||
if (!atomic_read(&lo->lo_refcnt))
|
||||
rc = __blkdev_reread_part(bdev);
|
||||
else
|
||||
rc = blkdev_reread_part(bdev);
|
||||
rc = blkdev_reread_part(bdev);
|
||||
if (rc)
|
||||
pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n",
|
||||
__func__, lo->lo_number, lo->lo_file_name, rc);
|
||||
|
@ -688,26 +678,30 @@ static int loop_validate_file(struct file *file, struct block_device *bdev)
|
|||
static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
|
||||
unsigned int arg)
|
||||
{
|
||||
struct file *file, *old_file;
|
||||
struct file *file = NULL, *old_file;
|
||||
int error;
|
||||
bool partscan;
|
||||
|
||||
error = mutex_lock_killable(&loop_ctl_mutex);
|
||||
if (error)
|
||||
return error;
|
||||
error = -ENXIO;
|
||||
if (lo->lo_state != Lo_bound)
|
||||
goto out;
|
||||
goto out_err;
|
||||
|
||||
/* the loop device has to be read-only */
|
||||
error = -EINVAL;
|
||||
if (!(lo->lo_flags & LO_FLAGS_READ_ONLY))
|
||||
goto out;
|
||||
goto out_err;
|
||||
|
||||
error = -EBADF;
|
||||
file = fget(arg);
|
||||
if (!file)
|
||||
goto out;
|
||||
goto out_err;
|
||||
|
||||
error = loop_validate_file(file, bdev);
|
||||
if (error)
|
||||
goto out_putf;
|
||||
goto out_err;
|
||||
|
||||
old_file = lo->lo_backing_file;
|
||||
|
||||
|
@ -715,7 +709,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
|
|||
|
||||
/* size of the new backing store needs to be the same */
|
||||
if (get_loop_size(lo, file) != get_loop_size(lo, old_file))
|
||||
goto out_putf;
|
||||
goto out_err;
|
||||
|
||||
/* and ... switch */
|
||||
blk_mq_freeze_queue(lo->lo_queue);
|
||||
|
@ -726,15 +720,22 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
|
|||
lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
|
||||
loop_update_dio(lo);
|
||||
blk_mq_unfreeze_queue(lo->lo_queue);
|
||||
|
||||
partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
/*
|
||||
* We must drop file reference outside of loop_ctl_mutex as dropping
|
||||
* the file ref can take bd_mutex which creates circular locking
|
||||
* dependency.
|
||||
*/
|
||||
fput(old_file);
|
||||
if (lo->lo_flags & LO_FLAGS_PARTSCAN)
|
||||
if (partscan)
|
||||
loop_reread_partitions(lo, bdev);
|
||||
return 0;
|
||||
|
||||
out_putf:
|
||||
fput(file);
|
||||
out:
|
||||
out_err:
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
if (file)
|
||||
fput(file);
|
||||
return error;
|
||||
}
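
A pattern repeated through the loop changes above: decide what follow-up work is needed while holding loop_ctl_mutex, but perform it (dropping the old file reference, rescanning partitions) only after the mutex is released, since both can take bd_mutex and would otherwise invert the lock order. A condensed sketch of that shape (details and error handling elided):

static int my_change_fd_sketch(struct loop_device *lo, struct block_device *bdev)
{
	struct file *old_file;
	bool partscan;
	int err;

	err = mutex_lock_killable(&loop_ctl_mutex);
	if (err)
		return err;

	/* ... validate and swap the backing file under the mutex ... */
	old_file = lo->lo_backing_file;
	partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;	/* remember the decision */
	mutex_unlock(&loop_ctl_mutex);

	/* both of these may take bd_mutex, so they must run unlocked */
	fput(old_file);
	if (partscan)
		loop_reread_partitions(lo, bdev);
	return 0;
}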
|
||||
|
||||
|
@ -909,6 +910,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
|
|||
int lo_flags = 0;
|
||||
int error;
|
||||
loff_t size;
|
||||
bool partscan;
|
||||
|
||||
/* This is safe, since we have a reference from open(). */
|
||||
__module_get(THIS_MODULE);
|
||||
|
@ -918,13 +920,17 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
|
|||
if (!file)
|
||||
goto out;
|
||||
|
||||
error = mutex_lock_killable(&loop_ctl_mutex);
|
||||
if (error)
|
||||
goto out_putf;
|
||||
|
||||
error = -EBUSY;
|
||||
if (lo->lo_state != Lo_unbound)
|
||||
goto out_putf;
|
||||
goto out_unlock;
|
||||
|
||||
error = loop_validate_file(file, bdev);
|
||||
if (error)
|
||||
goto out_putf;
|
||||
goto out_unlock;
|
||||
|
||||
mapping = file->f_mapping;
|
||||
inode = mapping->host;
|
||||
|
@ -936,10 +942,10 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
|
|||
error = -EFBIG;
|
||||
size = get_loop_size(lo, file);
|
||||
if ((loff_t)(sector_t)size != size)
|
||||
goto out_putf;
|
||||
goto out_unlock;
|
||||
error = loop_prepare_queue(lo);
|
||||
if (error)
|
||||
goto out_putf;
|
||||
goto out_unlock;
|
||||
|
||||
error = 0;
|
||||
|
||||
|
@ -971,18 +977,22 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
|
|||
lo->lo_state = Lo_bound;
|
||||
if (part_shift)
|
||||
lo->lo_flags |= LO_FLAGS_PARTSCAN;
|
||||
if (lo->lo_flags & LO_FLAGS_PARTSCAN)
|
||||
loop_reread_partitions(lo, bdev);
|
||||
partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
|
||||
|
||||
/* Grab the block_device to prevent its destruction after we
|
||||
* put /dev/loopXX inode. Later in loop_clr_fd() we bdput(bdev).
|
||||
* put /dev/loopXX inode. Later in __loop_clr_fd() we bdput(bdev).
|
||||
*/
|
||||
bdgrab(bdev);
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
if (partscan)
|
||||
loop_reread_partitions(lo, bdev);
|
||||
return 0;
|
||||
|
||||
out_putf:
|
||||
out_unlock:
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
out_putf:
|
||||
fput(file);
|
||||
out:
|
||||
out:
|
||||
/* This is safe: open() is still holding a reference. */
|
||||
module_put(THIS_MODULE);
|
||||
return error;
|
||||
|
@ -1025,39 +1035,31 @@ loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer,
|
|||
return err;
|
||||
}
|
||||
|
||||
static int loop_clr_fd(struct loop_device *lo)
|
||||
static int __loop_clr_fd(struct loop_device *lo, bool release)
|
||||
{
|
||||
struct file *filp = lo->lo_backing_file;
|
||||
struct file *filp = NULL;
|
||||
gfp_t gfp = lo->old_gfp_mask;
|
||||
struct block_device *bdev = lo->lo_device;
|
||||
int err = 0;
|
||||
bool partscan = false;
|
||||
int lo_number;
|
||||
|
||||
if (lo->lo_state != Lo_bound)
|
||||
return -ENXIO;
|
||||
|
||||
/*
|
||||
* If we've explicitly asked to tear down the loop device,
|
||||
* and it has an elevated reference count, set it for auto-teardown when
|
||||
* the last reference goes away. This stops $!~#$@ udev from
|
||||
* preventing teardown because it decided that it needs to run blkid on
|
||||
* the loopback device whenever they appear. xfstests is notorious for
|
||||
* failing tests because blkid via udev races with a losetup
|
||||
* <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
|
||||
* command to fail with EBUSY.
|
||||
*/
|
||||
if (atomic_read(&lo->lo_refcnt) > 1) {
|
||||
lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
return 0;
|
||||
mutex_lock(&loop_ctl_mutex);
|
||||
if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
|
||||
err = -ENXIO;
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
if (filp == NULL)
|
||||
return -EINVAL;
|
||||
filp = lo->lo_backing_file;
|
||||
if (filp == NULL) {
|
||||
err = -EINVAL;
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
/* freeze request queue during the transition */
|
||||
blk_mq_freeze_queue(lo->lo_queue);
|
||||
|
||||
spin_lock_irq(&lo->lo_lock);
|
||||
lo->lo_state = Lo_rundown;
|
||||
lo->lo_backing_file = NULL;
|
||||
spin_unlock_irq(&lo->lo_lock);
|
||||
|
||||
|
@ -1093,21 +1095,73 @@ static int loop_clr_fd(struct loop_device *lo)
|
|||
	module_put(THIS_MODULE);
	blk_mq_unfreeze_queue(lo->lo_queue);

	if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
		loop_reread_partitions(lo, bdev);
	partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev;
	lo_number = lo->lo_number;
	lo->lo_flags = 0;
	if (!part_shift)
		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
	loop_unprepare_queue(lo);
	mutex_unlock(&lo->lo_ctl_mutex);
out_unlock:
	mutex_unlock(&loop_ctl_mutex);
	if (partscan) {
		/*
		 * bd_mutex has been held already in release path, so don't
		 * acquire it if this function is called in such case.
		 *
		 * If the reread partition isn't from release path, lo_refcnt
		 * must be at least one and it can only become zero when the
		 * current holder is released.
		 */
		if (release)
			err = __blkdev_reread_part(bdev);
		else
			err = blkdev_reread_part(bdev);
		pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
			__func__, lo_number, err);
		/* Device is gone, no point in returning error */
		err = 0;
	}
	/*
	 * Need not hold lo_ctl_mutex to fput backing file.
	 * Calling fput holding lo_ctl_mutex triggers a circular
	 * Need not hold loop_ctl_mutex to fput backing file.
	 * Calling fput holding loop_ctl_mutex triggers a circular
	 * lock dependency possibility warning as fput can take
	 * bd_mutex which is usually taken before lo_ctl_mutex.
	 * bd_mutex which is usually taken before loop_ctl_mutex.
	 */
	fput(filp);
	return 0;
	if (filp)
		fput(filp);
	return err;
}
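
The comment above is the heart of the new split: whether a partition rescan is needed is decided and latched while loop_ctl_mutex is held, but the rescan itself (which takes bd_mutex) only runs after the mutex is dropped, so bd_mutex is never acquired underneath loop_ctl_mutex. A minimal userspace sketch of that "decide under lock A, do the B-taking work after unlocking A" pattern, with hypothetical names and pthreads standing in for the kernel mutexes:

/* Sketch only: illustrates the lock-ordering pattern, not the loop driver. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t ctl_lock = PTHREAD_MUTEX_INITIALIZER;   /* plays loop_ctl_mutex */
static pthread_mutex_t bdev_lock = PTHREAD_MUTEX_INITIALIZER;  /* plays bd_mutex */
static bool want_rescan;

static void rescan_partitions(void)
{
	pthread_mutex_lock(&bdev_lock);         /* never taken while ctl_lock is held */
	printf("rescanning partitions\n");
	pthread_mutex_unlock(&bdev_lock);
}

static void clear_device(void)
{
	bool partscan;

	pthread_mutex_lock(&ctl_lock);
	/* ... tear down device state ... */
	partscan = want_rescan;                 /* capture the decision under the lock */
	pthread_mutex_unlock(&ctl_lock);

	if (partscan)                           /* deferred, lock-free of ctl_lock */
		rescan_partitions();
}

int main(void)
{
	want_rescan = true;
	clear_device();
	return 0;
}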

static int loop_clr_fd(struct loop_device *lo)
{
	int err;

	err = mutex_lock_killable(&loop_ctl_mutex);
	if (err)
		return err;
	if (lo->lo_state != Lo_bound) {
		mutex_unlock(&loop_ctl_mutex);
		return -ENXIO;
	}
	/*
	 * If we've explicitly asked to tear down the loop device,
	 * and it has an elevated reference count, set it for auto-teardown when
	 * the last reference goes away. This stops $!~#$@ udev from
	 * preventing teardown because it decided that it needs to run blkid on
	 * the loopback device whenever they appear. xfstests is notorious for
	 * failing tests because blkid via udev races with a losetup
	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
	 * command to fail with EBUSY.
	 */
	if (atomic_read(&lo->lo_refcnt) > 1) {
		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
		mutex_unlock(&loop_ctl_mutex);
		return 0;
	}
	lo->lo_state = Lo_rundown;
	mutex_unlock(&loop_ctl_mutex);

	return __loop_clr_fd(lo, false);
}
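
loop_clr_fd() now only flips state: with more than one reference outstanding it merely sets LO_FLAGS_AUTOCLEAR and returns, and the real teardown is left to the last release. A small userspace sketch of that deferred-teardown idea, assuming nothing more than a reference count and an autoclear flag (names are illustrative, not the loop driver's):

/* Sketch only: deferred (auto-clear) teardown on last reference, not kernel code. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct dev {
	atomic_int refcnt;       /* open references */
	atomic_bool autoclear;   /* tear down on last put */
	bool bound;
};

/* Explicit teardown request (the "losetup -d" equivalent). */
static int dev_clear(struct dev *d)
{
	if (!d->bound)
		return -1;
	if (atomic_load(&d->refcnt) > 1) {
		atomic_store(&d->autoclear, true);   /* someone else still holds it: defer */
		return 0;
	}
	d->bound = false;                            /* real teardown */
	return 0;
}

/* Drop one reference; the last put performs the deferred teardown. */
static void dev_put(struct dev *d)
{
	if (atomic_fetch_sub(&d->refcnt, 1) == 1 && atomic_load(&d->autoclear))
		d->bound = false;
}

int main(void)
{
	struct dev d = { .refcnt = 2, .autoclear = false, .bound = true };

	dev_clear(&d);                  /* another holder exists: only marks autoclear */
	dev_put(&d);                    /* drops to 1, still bound */
	dev_put(&d);                    /* last reference: deferred teardown runs */
	printf("bound=%d\n", d.bound);  /* 0 */
	return 0;
}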
||||
static int
|
||||
|
@ -1116,47 +1170,58 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
|
|||
int err;
|
||||
struct loop_func_table *xfer;
|
||||
kuid_t uid = current_uid();
|
||||
struct block_device *bdev;
|
||||
bool partscan = false;
|
||||
|
||||
err = mutex_lock_killable(&loop_ctl_mutex);
|
||||
if (err)
|
||||
return err;
|
||||
if (lo->lo_encrypt_key_size &&
|
||||
!uid_eq(lo->lo_key_owner, uid) &&
|
||||
!capable(CAP_SYS_ADMIN))
|
||||
return -EPERM;
|
||||
if (lo->lo_state != Lo_bound)
|
||||
return -ENXIO;
|
||||
if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE)
|
||||
return -EINVAL;
|
||||
!capable(CAP_SYS_ADMIN)) {
|
||||
err = -EPERM;
|
||||
goto out_unlock;
|
||||
}
|
||||
if (lo->lo_state != Lo_bound) {
|
||||
err = -ENXIO;
|
||||
goto out_unlock;
|
||||
}
|
||||
if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) {
|
||||
err = -EINVAL;
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
/* I/O need to be drained during transfer transition */
|
||||
blk_mq_freeze_queue(lo->lo_queue);
|
||||
|
||||
err = loop_release_xfer(lo);
|
||||
if (err)
|
||||
goto exit;
|
||||
goto out_unfreeze;
|
||||
|
||||
if (info->lo_encrypt_type) {
|
||||
unsigned int type = info->lo_encrypt_type;
|
||||
|
||||
if (type >= MAX_LO_CRYPT) {
|
||||
err = -EINVAL;
|
||||
goto exit;
|
||||
goto out_unfreeze;
|
||||
}
|
||||
xfer = xfer_funcs[type];
|
||||
if (xfer == NULL) {
|
||||
err = -EINVAL;
|
||||
goto exit;
|
||||
goto out_unfreeze;
|
||||
}
|
||||
} else
|
||||
xfer = NULL;
|
||||
|
||||
err = loop_init_xfer(lo, xfer, info);
|
||||
if (err)
|
||||
goto exit;
|
||||
goto out_unfreeze;
|
||||
|
||||
if (lo->lo_offset != info->lo_offset ||
|
||||
lo->lo_sizelimit != info->lo_sizelimit) {
|
||||
if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit)) {
|
||||
err = -EFBIG;
|
||||
goto exit;
|
||||
goto out_unfreeze;
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -1188,15 +1253,20 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
|
|||
/* update dio if lo_offset or transfer is changed */
|
||||
__loop_update_dio(lo, lo->use_dio);
|
||||
|
||||
exit:
|
||||
out_unfreeze:
|
||||
blk_mq_unfreeze_queue(lo->lo_queue);
|
||||
|
||||
if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) &&
|
||||
!(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
|
||||
lo->lo_flags |= LO_FLAGS_PARTSCAN;
|
||||
lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
|
||||
loop_reread_partitions(lo, lo->lo_device);
|
||||
bdev = lo->lo_device;
|
||||
partscan = true;
|
||||
}
|
||||
out_unlock:
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
if (partscan)
|
||||
loop_reread_partitions(lo, bdev);
|
||||
|
||||
return err;
|
||||
}
|
||||
|
@ -1204,12 +1274,15 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
|
|||
static int
|
||||
loop_get_status(struct loop_device *lo, struct loop_info64 *info)
|
||||
{
|
||||
struct file *file;
|
||||
struct path path;
|
||||
struct kstat stat;
|
||||
int ret;
|
||||
|
||||
ret = mutex_lock_killable(&loop_ctl_mutex);
|
||||
if (ret)
|
||||
return ret;
|
||||
if (lo->lo_state != Lo_bound) {
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
return -ENXIO;
|
||||
}
|
||||
|
||||
|
@ -1228,17 +1301,17 @@ loop_get_status(struct loop_device *lo, struct loop_info64 *info)
|
|||
lo->lo_encrypt_key_size);
|
||||
}
|
||||
|
||||
/* Drop lo_ctl_mutex while we call into the filesystem. */
|
||||
file = get_file(lo->lo_backing_file);
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
ret = vfs_getattr(&file->f_path, &stat, STATX_INO,
|
||||
AT_STATX_SYNC_AS_STAT);
|
||||
/* Drop loop_ctl_mutex while we call into the filesystem. */
|
||||
path = lo->lo_backing_file->f_path;
|
||||
path_get(&path);
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
ret = vfs_getattr(&path, &stat, STATX_INO, AT_STATX_SYNC_AS_STAT);
|
||||
if (!ret) {
|
||||
info->lo_device = huge_encode_dev(stat.dev);
|
||||
info->lo_inode = stat.ino;
|
||||
info->lo_rdevice = huge_encode_dev(stat.rdev);
|
||||
}
|
||||
fput(file);
|
||||
path_put(&path);
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
@ -1322,10 +1395,8 @@ loop_get_status_old(struct loop_device *lo, struct loop_info __user *arg) {
|
|||
struct loop_info64 info64;
|
||||
int err;
|
||||
|
||||
if (!arg) {
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
if (!arg)
|
||||
return -EINVAL;
|
||||
}
|
||||
err = loop_get_status(lo, &info64);
|
||||
if (!err)
|
||||
err = loop_info64_to_old(&info64, &info);
|
||||
|
@ -1340,10 +1411,8 @@ loop_get_status64(struct loop_device *lo, struct loop_info64 __user *arg) {
|
|||
struct loop_info64 info64;
|
||||
int err;
|
||||
|
||||
if (!arg) {
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
if (!arg)
|
||||
return -EINVAL;
|
||||
}
|
||||
err = loop_get_status(lo, &info64);
|
||||
if (!err && copy_to_user(arg, &info64, sizeof(info64)))
|
||||
err = -EFAULT;
|
||||
|
@ -1393,70 +1462,73 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
|
|||
return 0;
|
||||
}
|
||||
|
||||
static int lo_simple_ioctl(struct loop_device *lo, unsigned int cmd,
|
||||
unsigned long arg)
|
||||
{
|
||||
int err;
|
||||
|
||||
err = mutex_lock_killable(&loop_ctl_mutex);
|
||||
if (err)
|
||||
return err;
|
||||
switch (cmd) {
|
||||
case LOOP_SET_CAPACITY:
|
||||
err = loop_set_capacity(lo);
|
||||
break;
|
||||
case LOOP_SET_DIRECT_IO:
|
||||
err = loop_set_dio(lo, arg);
|
||||
break;
|
||||
case LOOP_SET_BLOCK_SIZE:
|
||||
err = loop_set_block_size(lo, arg);
|
||||
break;
|
||||
default:
|
||||
err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
|
||||
}
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int lo_ioctl(struct block_device *bdev, fmode_t mode,
|
||||
unsigned int cmd, unsigned long arg)
|
||||
{
|
||||
struct loop_device *lo = bdev->bd_disk->private_data;
|
||||
int err;
|
||||
|
||||
err = mutex_lock_killable_nested(&lo->lo_ctl_mutex, 1);
|
||||
if (err)
|
||||
goto out_unlocked;
|
||||
|
||||
switch (cmd) {
|
||||
case LOOP_SET_FD:
|
||||
err = loop_set_fd(lo, mode, bdev, arg);
|
||||
break;
|
||||
return loop_set_fd(lo, mode, bdev, arg);
|
||||
case LOOP_CHANGE_FD:
|
||||
err = loop_change_fd(lo, bdev, arg);
|
||||
break;
|
||||
return loop_change_fd(lo, bdev, arg);
|
||||
case LOOP_CLR_FD:
|
||||
/* loop_clr_fd would have unlocked lo_ctl_mutex on success */
|
||||
err = loop_clr_fd(lo);
|
||||
if (!err)
|
||||
goto out_unlocked;
|
||||
break;
|
||||
return loop_clr_fd(lo);
|
||||
case LOOP_SET_STATUS:
|
||||
err = -EPERM;
|
||||
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
|
||||
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
|
||||
err = loop_set_status_old(lo,
|
||||
(struct loop_info __user *)arg);
|
||||
}
|
||||
break;
|
||||
case LOOP_GET_STATUS:
|
||||
err = loop_get_status_old(lo, (struct loop_info __user *) arg);
|
||||
/* loop_get_status() unlocks lo_ctl_mutex */
|
||||
goto out_unlocked;
|
||||
return loop_get_status_old(lo, (struct loop_info __user *) arg);
|
||||
case LOOP_SET_STATUS64:
|
||||
err = -EPERM;
|
||||
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
|
||||
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
|
||||
err = loop_set_status64(lo,
|
||||
(struct loop_info64 __user *) arg);
|
||||
}
|
||||
break;
|
||||
case LOOP_GET_STATUS64:
|
||||
err = loop_get_status64(lo, (struct loop_info64 __user *) arg);
|
||||
/* loop_get_status() unlocks lo_ctl_mutex */
|
||||
goto out_unlocked;
|
||||
return loop_get_status64(lo, (struct loop_info64 __user *) arg);
|
||||
case LOOP_SET_CAPACITY:
|
||||
err = -EPERM;
|
||||
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
|
||||
err = loop_set_capacity(lo);
|
||||
break;
|
||||
case LOOP_SET_DIRECT_IO:
|
||||
err = -EPERM;
|
||||
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
|
||||
err = loop_set_dio(lo, arg);
|
||||
break;
|
||||
case LOOP_SET_BLOCK_SIZE:
|
||||
err = -EPERM;
|
||||
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
|
||||
err = loop_set_block_size(lo, arg);
|
||||
break;
|
||||
if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
|
||||
return -EPERM;
|
||||
/* Fall through */
|
||||
default:
|
||||
err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
|
||||
err = lo_simple_ioctl(lo, cmd, arg);
|
||||
break;
|
||||
}
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
|
||||
out_unlocked:
|
||||
return err;
|
||||
}
|
||||
|
||||
|
@ -1570,10 +1642,8 @@ loop_get_status_compat(struct loop_device *lo,
|
|||
struct loop_info64 info64;
|
||||
int err;
|
||||
|
||||
if (!arg) {
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
if (!arg)
|
||||
return -EINVAL;
|
||||
}
|
||||
err = loop_get_status(lo, &info64);
|
||||
if (!err)
|
||||
err = loop_info64_to_compat(&info64, arg);
|
||||
|
@ -1588,20 +1658,12 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
|
|||
|
||||
switch(cmd) {
|
||||
case LOOP_SET_STATUS:
|
||||
err = mutex_lock_killable(&lo->lo_ctl_mutex);
|
||||
if (!err) {
|
||||
err = loop_set_status_compat(lo,
|
||||
(const struct compat_loop_info __user *)arg);
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
}
|
||||
err = loop_set_status_compat(lo,
|
||||
(const struct compat_loop_info __user *)arg);
|
||||
break;
|
||||
case LOOP_GET_STATUS:
|
||||
err = mutex_lock_killable(&lo->lo_ctl_mutex);
|
||||
if (!err) {
|
||||
err = loop_get_status_compat(lo,
|
||||
(struct compat_loop_info __user *)arg);
|
||||
/* loop_get_status() unlocks lo_ctl_mutex */
|
||||
}
|
||||
err = loop_get_status_compat(lo,
|
||||
(struct compat_loop_info __user *)arg);
|
||||
break;
|
||||
case LOOP_SET_CAPACITY:
|
||||
case LOOP_CLR_FD:
|
||||
|
@ -1625,9 +1687,11 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
|
|||
static int lo_open(struct block_device *bdev, fmode_t mode)
|
||||
{
|
||||
struct loop_device *lo;
|
||||
int err = 0;
|
||||
int err;
|
||||
|
||||
mutex_lock(&loop_index_mutex);
|
||||
err = mutex_lock_killable(&loop_ctl_mutex);
|
||||
if (err)
|
||||
return err;
|
||||
lo = bdev->bd_disk->private_data;
|
||||
if (!lo) {
|
||||
err = -ENXIO;
|
||||
|
@ -1636,26 +1700,30 @@ static int lo_open(struct block_device *bdev, fmode_t mode)
|
|||
|
||||
atomic_inc(&lo->lo_refcnt);
|
||||
out:
|
||||
mutex_unlock(&loop_index_mutex);
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
return err;
|
||||
}
|
||||
|
||||
static void __lo_release(struct loop_device *lo)
|
||||
static void lo_release(struct gendisk *disk, fmode_t mode)
|
||||
{
|
||||
int err;
|
||||
struct loop_device *lo;
|
||||
|
||||
mutex_lock(&loop_ctl_mutex);
|
||||
lo = disk->private_data;
|
||||
if (atomic_dec_return(&lo->lo_refcnt))
|
||||
return;
|
||||
goto out_unlock;
|
||||
|
||||
mutex_lock(&lo->lo_ctl_mutex);
|
||||
if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) {
|
||||
if (lo->lo_state != Lo_bound)
|
||||
goto out_unlock;
|
||||
lo->lo_state = Lo_rundown;
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
/*
|
||||
* In autoclear mode, stop the loop thread
|
||||
* and remove configuration after last close.
|
||||
*/
|
||||
err = loop_clr_fd(lo);
|
||||
if (!err)
|
||||
return;
|
||||
__loop_clr_fd(lo, true);
|
||||
return;
|
||||
} else if (lo->lo_state == Lo_bound) {
|
||||
/*
|
||||
* Otherwise keep thread (if running) and config,
|
||||
|
@ -1665,14 +1733,8 @@ static void __lo_release(struct loop_device *lo)
|
|||
blk_mq_unfreeze_queue(lo->lo_queue);
|
||||
}
|
||||
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
}
|
||||
|
||||
static void lo_release(struct gendisk *disk, fmode_t mode)
|
||||
{
|
||||
mutex_lock(&loop_index_mutex);
|
||||
__lo_release(disk->private_data);
|
||||
mutex_unlock(&loop_index_mutex);
|
||||
out_unlock:
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
}
|
||||
|
||||
static const struct block_device_operations lo_fops = {
|
||||
|
@ -1711,10 +1773,10 @@ static int unregister_transfer_cb(int id, void *ptr, void *data)
|
|||
struct loop_device *lo = ptr;
|
||||
struct loop_func_table *xfer = data;
|
||||
|
||||
mutex_lock(&lo->lo_ctl_mutex);
|
||||
mutex_lock(&loop_ctl_mutex);
|
||||
if (lo->lo_encryption == xfer)
|
||||
loop_release_xfer(lo);
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
@ -1759,8 +1821,8 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
|
|||
|
||||
/* always use the first bio's css */
|
||||
#ifdef CONFIG_BLK_CGROUP
|
||||
if (cmd->use_aio && rq->bio && rq->bio->bi_css) {
|
||||
cmd->css = rq->bio->bi_css;
|
||||
if (cmd->use_aio && rq->bio && rq->bio->bi_blkg) {
|
||||
cmd->css = &bio_blkcg(rq->bio)->css;
|
||||
css_get(cmd->css);
|
||||
} else
|
||||
#endif
|
||||
|
@ -1853,7 +1915,7 @@ static int loop_add(struct loop_device **l, int i)
|
|||
goto out_free_idr;
|
||||
|
||||
lo->lo_queue = blk_mq_init_queue(&lo->tag_set);
|
||||
if (IS_ERR_OR_NULL(lo->lo_queue)) {
|
||||
if (IS_ERR(lo->lo_queue)) {
|
||||
err = PTR_ERR(lo->lo_queue);
|
||||
goto out_cleanup_tags;
|
||||
}
|
||||
|
@ -1895,7 +1957,6 @@ static int loop_add(struct loop_device **l, int i)
|
|||
if (!part_shift)
|
||||
disk->flags |= GENHD_FL_NO_PART_SCAN;
|
||||
disk->flags |= GENHD_FL_EXT_DEVT;
|
||||
mutex_init(&lo->lo_ctl_mutex);
|
||||
atomic_set(&lo->lo_refcnt, 0);
|
||||
lo->lo_number = i;
|
||||
spin_lock_init(&lo->lo_lock);
|
||||
|
@ -1974,7 +2035,7 @@ static struct kobject *loop_probe(dev_t dev, int *part, void *data)
|
|||
struct kobject *kobj;
|
||||
int err;
|
||||
|
||||
mutex_lock(&loop_index_mutex);
|
||||
mutex_lock(&loop_ctl_mutex);
|
||||
err = loop_lookup(&lo, MINOR(dev) >> part_shift);
|
||||
if (err < 0)
|
||||
err = loop_add(&lo, MINOR(dev) >> part_shift);
|
||||
|
@ -1982,7 +2043,7 @@ static struct kobject *loop_probe(dev_t dev, int *part, void *data)
|
|||
kobj = NULL;
|
||||
else
|
||||
kobj = get_disk_and_module(lo->lo_disk);
|
||||
mutex_unlock(&loop_index_mutex);
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
|
||||
*part = 0;
|
||||
return kobj;
|
||||
|
@ -1992,9 +2053,13 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd,
|
|||
unsigned long parm)
|
||||
{
|
||||
struct loop_device *lo;
|
||||
int ret = -ENOSYS;
|
||||
int ret;
|
||||
|
||||
mutex_lock(&loop_index_mutex);
|
||||
ret = mutex_lock_killable(&loop_ctl_mutex);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
ret = -ENOSYS;
|
||||
switch (cmd) {
|
||||
case LOOP_CTL_ADD:
|
||||
ret = loop_lookup(&lo, parm);
|
||||
|
@ -2008,21 +2073,15 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd,
|
|||
ret = loop_lookup(&lo, parm);
|
||||
if (ret < 0)
|
||||
break;
|
||||
ret = mutex_lock_killable(&lo->lo_ctl_mutex);
|
||||
if (ret)
|
||||
break;
|
||||
if (lo->lo_state != Lo_unbound) {
|
||||
ret = -EBUSY;
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
break;
|
||||
}
|
||||
if (atomic_read(&lo->lo_refcnt) > 0) {
|
||||
ret = -EBUSY;
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
break;
|
||||
}
|
||||
lo->lo_disk->private_data = NULL;
|
||||
mutex_unlock(&lo->lo_ctl_mutex);
|
||||
idr_remove(&loop_index_idr, lo->lo_number);
|
||||
loop_remove(lo);
|
||||
break;
|
||||
|
@ -2032,7 +2091,7 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd,
|
|||
break;
|
||||
ret = loop_add(&lo, -1);
|
||||
}
|
||||
mutex_unlock(&loop_index_mutex);
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
@ -2116,10 +2175,10 @@ static int __init loop_init(void)
|
|||
THIS_MODULE, loop_probe, NULL, NULL);
|
||||
|
||||
/* pre-create number of devices given by config or max_loop */
|
||||
mutex_lock(&loop_index_mutex);
|
||||
mutex_lock(&loop_ctl_mutex);
|
||||
for (i = 0; i < nr; i++)
|
||||
loop_add(&lo, i);
|
||||
mutex_unlock(&loop_index_mutex);
|
||||
mutex_unlock(&loop_ctl_mutex);
|
||||
|
||||
printk(KERN_INFO "loop: module loaded\n");
|
||||
return 0;
|
||||
|
|
|
@ -54,7 +54,6 @@ struct loop_device {
|
|||
|
||||
spinlock_t lo_lock;
|
||||
int lo_state;
|
||||
struct mutex lo_ctl_mutex;
|
||||
struct kthread_worker worker;
|
||||
struct task_struct *worker_task;
|
||||
bool use_dio;
|
||||
|
|
|
@@ -168,41 +168,6 @@ static bool mtip_check_surprise_removal(struct pci_dev *pdev)
	return false; /* device present */
}

/* we have to use runtime tag to setup command header */
static void mtip_init_cmd_header(struct request *rq)
{
	struct driver_data *dd = rq->q->queuedata;
	struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);

	/* Point the command headers at the command tables. */
	cmd->command_header = dd->port->command_list +
		(sizeof(struct mtip_cmd_hdr) * rq->tag);
	cmd->command_header_dma = dd->port->command_list_dma +
		(sizeof(struct mtip_cmd_hdr) * rq->tag);

	if (test_bit(MTIP_PF_HOST_CAP_64, &dd->port->flags))
		cmd->command_header->ctbau = __force_bit2int cpu_to_le32((cmd->command_dma >> 16) >> 16);

	cmd->command_header->ctba = __force_bit2int cpu_to_le32(cmd->command_dma & 0xFFFFFFFF);
}
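
The helper being removed here documents the layout the driver relies on: the command header for hardware tag N sits at command_list + N * sizeof(struct mtip_cmd_hdr), and a 64-bit bus address is split into a low and a high 32-bit word (the ((x >> 16) >> 16) form stays well defined even when the DMA address type is only 32 bits wide). A standalone sketch of that addressing, using made-up userspace types rather than the driver's structures:

/* Sketch only: per-tag descriptor addressing with a hypothetical header layout. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct cmd_hdr {                /* one fixed-size header slot per hardware tag */
	uint32_t ctba;          /* lower 32 bits of the command table bus address */
	uint32_t ctbau;         /* upper 32 bits */
};

static struct cmd_hdr *hdr_for_tag(struct cmd_hdr *command_list, unsigned int tag)
{
	return &command_list[tag];      /* base + tag * sizeof(struct cmd_hdr) */
}

static void point_header_at_table(struct cmd_hdr *hdr, uint64_t table_dma)
{
	hdr->ctba  = (uint32_t)(table_dma & 0xFFFFFFFF);
	hdr->ctbau = (uint32_t)((table_dma >> 16) >> 16);   /* upper half, split shift as in the driver */
}

int main(void)
{
	struct cmd_hdr *list = calloc(32, sizeof(*list));   /* 32 tags */

	point_header_at_table(hdr_for_tag(list, 7), 0x12345678abcdULL);
	printf("tag 7: ctba=%#x ctbau=%#x\n", list[7].ctba, list[7].ctbau);
	free(list);
	return 0;
}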
|
||||
|
||||
static struct mtip_cmd *mtip_get_int_command(struct driver_data *dd)
|
||||
{
|
||||
struct request *rq;
|
||||
|
||||
if (mtip_check_surprise_removal(dd->pdev))
|
||||
return NULL;
|
||||
|
||||
rq = blk_mq_alloc_request(dd->queue, REQ_OP_DRV_IN, BLK_MQ_REQ_RESERVED);
|
||||
if (IS_ERR(rq))
|
||||
return NULL;
|
||||
|
||||
/* Internal cmd isn't submitted via .queue_rq */
|
||||
mtip_init_cmd_header(rq);
|
||||
|
||||
return blk_mq_rq_to_pdu(rq);
|
||||
}
|
||||
|
||||
static struct mtip_cmd *mtip_cmd_from_tag(struct driver_data *dd,
|
||||
unsigned int tag)
|
||||
{
|
||||
|
@ -1023,13 +988,14 @@ static int mtip_exec_internal_command(struct mtip_port *port,
|
|||
return -EFAULT;
|
||||
}
|
||||
|
||||
int_cmd = mtip_get_int_command(dd);
|
||||
if (!int_cmd) {
|
||||
if (mtip_check_surprise_removal(dd->pdev))
|
||||
return -EFAULT;
|
||||
|
||||
rq = blk_mq_alloc_request(dd->queue, REQ_OP_DRV_IN, BLK_MQ_REQ_RESERVED);
|
||||
if (IS_ERR(rq)) {
|
||||
dbg_printk(MTIP_DRV_NAME "Unable to allocate tag for PIO cmd\n");
|
||||
return -EFAULT;
|
||||
}
|
||||
rq = blk_mq_rq_from_pdu(int_cmd);
|
||||
rq->special = &icmd;
|
||||
|
||||
set_bit(MTIP_PF_IC_ACTIVE_BIT, &port->flags);
|
||||
|
||||
|
@ -1050,6 +1016,8 @@ static int mtip_exec_internal_command(struct mtip_port *port,
|
|||
}
|
||||
|
||||
/* Copy the command to the command table */
|
||||
int_cmd = blk_mq_rq_to_pdu(rq);
|
||||
int_cmd->icmd = &icmd;
|
||||
memcpy(int_cmd->command, fis, fis_len*4);
|
||||
|
||||
rq->timeout = timeout;
|
||||
|
@ -1423,23 +1391,19 @@ static int mtip_get_smart_attr(struct mtip_port *port, unsigned int id,
|
|||
* @dd pointer to driver_data structure
|
||||
* @lba starting lba
|
||||
* @len # of 512b sectors to trim
|
||||
*
|
||||
* return value
|
||||
* -ENOMEM Out of dma memory
|
||||
* -EINVAL Invalid parameters passed in, trim not supported
|
||||
* -EIO Error submitting trim request to hw
|
||||
*/
|
||||
static int mtip_send_trim(struct driver_data *dd, unsigned int lba,
|
||||
unsigned int len)
|
||||
static blk_status_t mtip_send_trim(struct driver_data *dd, unsigned int lba,
|
||||
unsigned int len)
|
||||
{
|
||||
int i, rv = 0;
|
||||
u64 tlba, tlen, sect_left;
|
||||
struct mtip_trim_entry *buf;
|
||||
dma_addr_t dma_addr;
|
||||
struct host_to_dev_fis fis;
|
||||
blk_status_t ret = BLK_STS_OK;
|
||||
int i;
|
||||
|
||||
if (!len || dd->trim_supp == false)
|
||||
return -EINVAL;
|
||||
return BLK_STS_IOERR;
|
||||
|
||||
/* Trim request too big */
|
||||
WARN_ON(len > (MTIP_MAX_TRIM_ENTRY_LEN * MTIP_MAX_TRIM_ENTRIES));
|
||||
|
@ -1454,7 +1418,7 @@ static int mtip_send_trim(struct driver_data *dd, unsigned int lba,
|
|||
buf = dmam_alloc_coherent(&dd->pdev->dev, ATA_SECT_SIZE, &dma_addr,
|
||||
GFP_KERNEL);
|
||||
if (!buf)
|
||||
return -ENOMEM;
|
||||
return BLK_STS_RESOURCE;
|
||||
memset(buf, 0, ATA_SECT_SIZE);
|
||||
|
||||
for (i = 0, sect_left = len, tlba = lba;
|
||||
|
@ -1463,8 +1427,8 @@ static int mtip_send_trim(struct driver_data *dd, unsigned int lba,
|
|||
tlen = (sect_left >= MTIP_MAX_TRIM_ENTRY_LEN ?
|
||||
MTIP_MAX_TRIM_ENTRY_LEN :
|
||||
sect_left);
|
||||
buf[i].lba = __force_bit2int cpu_to_le32(tlba);
|
||||
buf[i].range = __force_bit2int cpu_to_le16(tlen);
|
||||
buf[i].lba = cpu_to_le32(tlba);
|
||||
buf[i].range = cpu_to_le16(tlen);
|
||||
tlba += tlen;
|
||||
sect_left -= tlen;
|
||||
}
|
||||
|
@ -1486,10 +1450,10 @@ static int mtip_send_trim(struct driver_data *dd, unsigned int lba,
|
|||
ATA_SECT_SIZE,
|
||||
0,
|
||||
MTIP_TRIM_TIMEOUT_MS) < 0)
|
||||
rv = -EIO;
|
||||
ret = BLK_STS_IOERR;
|
||||
|
||||
dmam_free_coherent(&dd->pdev->dev, ATA_SECT_SIZE, buf, dma_addr);
|
||||
return rv;
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -1585,23 +1549,20 @@ static inline void fill_command_sg(struct driver_data *dd,
|
|||
int n;
|
||||
unsigned int dma_len;
|
||||
struct mtip_cmd_sg *command_sg;
|
||||
struct scatterlist *sg = command->sg;
|
||||
struct scatterlist *sg;
|
||||
|
||||
command_sg = command->command + AHCI_CMD_TBL_HDR_SZ;
|
||||
|
||||
for (n = 0; n < nents; n++) {
|
||||
for_each_sg(command->sg, sg, nents, n) {
|
||||
dma_len = sg_dma_len(sg);
|
||||
if (dma_len > 0x400000)
|
||||
dev_err(&dd->pdev->dev,
|
||||
"DMA segment length truncated\n");
|
||||
command_sg->info = __force_bit2int
|
||||
cpu_to_le32((dma_len-1) & 0x3FFFFF);
|
||||
command_sg->dba = __force_bit2int
|
||||
cpu_to_le32(sg_dma_address(sg));
|
||||
command_sg->dba_upper = __force_bit2int
|
||||
command_sg->info = cpu_to_le32((dma_len-1) & 0x3FFFFF);
|
||||
command_sg->dba = cpu_to_le32(sg_dma_address(sg));
|
||||
command_sg->dba_upper =
|
||||
cpu_to_le32((sg_dma_address(sg) >> 16) >> 16);
|
||||
command_sg++;
|
||||
sg++;
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -2171,7 +2132,6 @@ static int mtip_hw_ioctl(struct driver_data *dd, unsigned int cmd,
|
|||
* @dd Pointer to the driver data structure.
|
||||
* @start First sector to read.
|
||||
* @nsect Number of sectors to read.
|
||||
* @nents Number of entries in scatter list for the read command.
|
||||
* @tag The tag of this read command.
|
||||
* @callback Pointer to the function that should be called
|
||||
* when the read completes.
|
||||
|
@ -2183,16 +2143,20 @@ static int mtip_hw_ioctl(struct driver_data *dd, unsigned int cmd,
|
|||
* None
|
||||
*/
|
||||
static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq,
|
||||
struct mtip_cmd *command, int nents,
|
||||
struct mtip_cmd *command,
|
||||
struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
struct mtip_cmd_hdr *hdr =
|
||||
dd->port->command_list + sizeof(struct mtip_cmd_hdr) * rq->tag;
|
||||
struct host_to_dev_fis *fis;
|
||||
struct mtip_port *port = dd->port;
|
||||
int dma_dir = rq_data_dir(rq) == READ ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
|
||||
u64 start = blk_rq_pos(rq);
|
||||
unsigned int nsect = blk_rq_sectors(rq);
|
||||
unsigned int nents;
|
||||
|
||||
/* Map the scatter list for DMA access */
|
||||
nents = blk_rq_map_sg(hctx->queue, rq, command->sg);
|
||||
nents = dma_map_sg(&dd->pdev->dev, command->sg, nents, dma_dir);
|
||||
|
||||
prefetch(&port->flags);
|
||||
|
@ -2233,10 +2197,11 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq,
|
|||
fis->device |= 1 << 7;
|
||||
|
||||
/* Populate the command header */
|
||||
command->command_header->opts =
|
||||
__force_bit2int cpu_to_le32(
|
||||
(nents << 16) | 5 | AHCI_CMD_PREFETCH);
|
||||
command->command_header->byte_count = 0;
|
||||
hdr->ctba = cpu_to_le32(command->command_dma & 0xFFFFFFFF);
|
||||
if (test_bit(MTIP_PF_HOST_CAP_64, &dd->port->flags))
|
||||
hdr->ctbau = cpu_to_le32((command->command_dma >> 16) >> 16);
|
||||
hdr->opts = cpu_to_le32((nents << 16) | 5 | AHCI_CMD_PREFETCH);
|
||||
hdr->byte_count = 0;
|
||||
|
||||
command->direction = dma_dir;
|
||||
|
||||
|
@ -2715,12 +2680,12 @@ static void mtip_softirq_done_fn(struct request *rq)
|
|||
cmd->direction);
|
||||
|
||||
if (unlikely(cmd->unaligned))
|
||||
up(&dd->port->cmd_slot_unal);
|
||||
atomic_inc(&dd->port->cmd_slot_unal);
|
||||
|
||||
blk_mq_end_request(rq, cmd->status);
|
||||
}
|
||||
|
||||
static void mtip_abort_cmd(struct request *req, void *data, bool reserved)
|
||||
static bool mtip_abort_cmd(struct request *req, void *data, bool reserved)
|
||||
{
|
||||
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(req);
|
||||
struct driver_data *dd = data;
|
||||
|
@ -2730,14 +2695,16 @@ static void mtip_abort_cmd(struct request *req, void *data, bool reserved)
|
|||
clear_bit(req->tag, dd->port->cmds_to_issue);
|
||||
cmd->status = BLK_STS_IOERR;
|
||||
mtip_softirq_done_fn(req);
|
||||
return true;
|
||||
}
|
||||
|
||||
static void mtip_queue_cmd(struct request *req, void *data, bool reserved)
|
||||
static bool mtip_queue_cmd(struct request *req, void *data, bool reserved)
|
||||
{
|
||||
struct driver_data *dd = data;
|
||||
|
||||
set_bit(req->tag, dd->port->cmds_to_issue);
|
||||
blk_abort_request(req);
|
||||
return true;
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -2803,10 +2770,7 @@ restart_eh:
|
|||
|
||||
blk_mq_quiesce_queue(dd->queue);
|
||||
|
||||
spin_lock(dd->queue->queue_lock);
|
||||
blk_mq_tagset_busy_iter(&dd->tags,
|
||||
mtip_queue_cmd, dd);
|
||||
spin_unlock(dd->queue->queue_lock);
|
||||
blk_mq_tagset_busy_iter(&dd->tags, mtip_queue_cmd, dd);
|
||||
|
||||
set_bit(MTIP_PF_ISSUE_CMDS_BIT, &dd->port->flags);
|
||||
|
||||
|
@ -3026,7 +2990,7 @@ static int mtip_hw_init(struct driver_data *dd)
|
|||
else
|
||||
dd->unal_qdepth = 0;
|
||||
|
||||
sema_init(&dd->port->cmd_slot_unal, dd->unal_qdepth);
|
||||
atomic_set(&dd->port->cmd_slot_unal, dd->unal_qdepth);
|
||||
|
||||
/* Spinlock to prevent concurrent issue */
|
||||
for (i = 0; i < MTIP_MAX_SLOT_GROUPS; i++)
|
||||
|
@ -3531,58 +3495,24 @@ static inline bool is_se_active(struct driver_data *dd)
|
|||
return false;
|
||||
}
|
||||
|
||||
/*
|
||||
* Block layer make request function.
|
||||
*
|
||||
* This function is called by the kernel to process a BIO for
|
||||
* the P320 device.
|
||||
*
|
||||
* @queue Pointer to the request queue. Unused other than to obtain
|
||||
* the driver data structure.
|
||||
* @rq Pointer to the request.
|
||||
*
|
||||
*/
|
||||
static int mtip_submit_request(struct blk_mq_hw_ctx *hctx, struct request *rq)
|
||||
static inline bool is_stopped(struct driver_data *dd, struct request *rq)
|
||||
{
|
||||
struct driver_data *dd = hctx->queue->queuedata;
|
||||
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
|
||||
unsigned int nents;
|
||||
if (likely(!(dd->dd_flag & MTIP_DDF_STOP_IO)))
|
||||
return false;
|
||||
|
||||
if (is_se_active(dd))
|
||||
return -ENODATA;
|
||||
if (test_bit(MTIP_DDF_REMOVE_PENDING_BIT, &dd->dd_flag))
|
||||
return true;
|
||||
if (test_bit(MTIP_DDF_OVER_TEMP_BIT, &dd->dd_flag))
|
||||
return true;
|
||||
if (test_bit(MTIP_DDF_WRITE_PROTECT_BIT, &dd->dd_flag) &&
|
||||
rq_data_dir(rq))
|
||||
return true;
|
||||
if (test_bit(MTIP_DDF_SEC_LOCK_BIT, &dd->dd_flag))
|
||||
return true;
|
||||
if (test_bit(MTIP_DDF_REBUILD_FAILED_BIT, &dd->dd_flag))
|
||||
return true;
|
||||
|
||||
if (unlikely(dd->dd_flag & MTIP_DDF_STOP_IO)) {
|
||||
if (unlikely(test_bit(MTIP_DDF_REMOVE_PENDING_BIT,
|
||||
&dd->dd_flag))) {
|
||||
return -ENXIO;
|
||||
}
|
||||
if (unlikely(test_bit(MTIP_DDF_OVER_TEMP_BIT, &dd->dd_flag))) {
|
||||
return -ENODATA;
|
||||
}
|
||||
if (unlikely(test_bit(MTIP_DDF_WRITE_PROTECT_BIT,
|
||||
&dd->dd_flag) &&
|
||||
rq_data_dir(rq))) {
|
||||
return -ENODATA;
|
||||
}
|
||||
if (unlikely(test_bit(MTIP_DDF_SEC_LOCK_BIT, &dd->dd_flag) ||
|
||||
test_bit(MTIP_DDF_REBUILD_FAILED_BIT, &dd->dd_flag)))
|
||||
return -ENODATA;
|
||||
}
|
||||
|
||||
if (req_op(rq) == REQ_OP_DISCARD) {
|
||||
int err;
|
||||
|
||||
err = mtip_send_trim(dd, blk_rq_pos(rq), blk_rq_sectors(rq));
|
||||
blk_mq_end_request(rq, err ? BLK_STS_IOERR : BLK_STS_OK);
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Create the scatter list for this request. */
|
||||
nents = blk_rq_map_sg(hctx->queue, rq, cmd->sg);
|
||||
|
||||
/* Issue the read/write. */
|
||||
mtip_hw_submit_io(dd, rq, cmd, nents, hctx);
|
||||
return 0;
|
||||
return false;
|
||||
}
|
||||
|
||||
static bool mtip_check_unal_depth(struct blk_mq_hw_ctx *hctx,
|
||||
|
@@ -3603,7 +3533,7 @@ static bool mtip_check_unal_depth(struct blk_mq_hw_ctx *hctx,
		cmd->unaligned = 1;
	}

	if (cmd->unaligned && down_trylock(&dd->port->cmd_slot_unal))
	if (cmd->unaligned && atomic_dec_if_positive(&dd->port->cmd_slot_unal) >= 0)
		return true;

	return false;
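
This is the core of the semaphore-to-atomic change: down_trylock()/up() on cmd_slot_unal become atomic_dec_if_positive()/atomic_inc(), a plain counter usable from any context. C11 has no direct atomic_dec_if_positive(), so a userspace sketch of the same try-acquire/release pair needs a small compare-and-swap loop (illustrative code, not the driver):

/* Sketch only: atomic counter as a try-semaphore for a bounded slot pool. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int slots = 2;            /* e.g. queue depth reserved for unaligned IO */

/* Decrement only if the counter is positive; mirrors atomic_dec_if_positive(). */
static bool try_get_slot(void)
{
	int v = atomic_load(&slots);

	while (v > 0) {
		if (atomic_compare_exchange_weak(&slots, &v, v - 1))
			return true;    /* got a slot */
	}
	return false;                   /* depth exhausted, caller must back off */
}

static void put_slot(void)
{
	atomic_fetch_add(&slots, 1);    /* the completion path's atomic_inc() */
}

int main(void)
{
	printf("%d %d %d\n", try_get_slot(), try_get_slot(), try_get_slot());  /* 1 1 0 */
	put_slot();
	printf("%d\n", try_get_slot());                                        /* 1 again */
	return 0;
}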
@ -3613,32 +3543,33 @@ static blk_status_t mtip_issue_reserved_cmd(struct blk_mq_hw_ctx *hctx,
|
|||
struct request *rq)
|
||||
{
|
||||
struct driver_data *dd = hctx->queue->queuedata;
|
||||
struct mtip_int_cmd *icmd = rq->special;
|
||||
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
|
||||
struct mtip_int_cmd *icmd = cmd->icmd;
|
||||
struct mtip_cmd_hdr *hdr =
|
||||
dd->port->command_list + sizeof(struct mtip_cmd_hdr) * rq->tag;
|
||||
struct mtip_cmd_sg *command_sg;
|
||||
|
||||
if (mtip_commands_active(dd->port))
|
||||
return BLK_STS_RESOURCE;
|
||||
return BLK_STS_DEV_RESOURCE;
|
||||
|
||||
hdr->ctba = cpu_to_le32(cmd->command_dma & 0xFFFFFFFF);
|
||||
if (test_bit(MTIP_PF_HOST_CAP_64, &dd->port->flags))
|
||||
hdr->ctbau = cpu_to_le32((cmd->command_dma >> 16) >> 16);
|
||||
/* Populate the SG list */
|
||||
cmd->command_header->opts =
|
||||
__force_bit2int cpu_to_le32(icmd->opts | icmd->fis_len);
|
||||
hdr->opts = cpu_to_le32(icmd->opts | icmd->fis_len);
|
||||
if (icmd->buf_len) {
|
||||
command_sg = cmd->command + AHCI_CMD_TBL_HDR_SZ;
|
||||
|
||||
command_sg->info =
|
||||
__force_bit2int cpu_to_le32((icmd->buf_len-1) & 0x3FFFFF);
|
||||
command_sg->dba =
|
||||
__force_bit2int cpu_to_le32(icmd->buffer & 0xFFFFFFFF);
|
||||
command_sg->info = cpu_to_le32((icmd->buf_len-1) & 0x3FFFFF);
|
||||
command_sg->dba = cpu_to_le32(icmd->buffer & 0xFFFFFFFF);
|
||||
command_sg->dba_upper =
|
||||
__force_bit2int cpu_to_le32((icmd->buffer >> 16) >> 16);
|
||||
cpu_to_le32((icmd->buffer >> 16) >> 16);
|
||||
|
||||
cmd->command_header->opts |=
|
||||
__force_bit2int cpu_to_le32((1 << 16));
|
||||
hdr->opts |= cpu_to_le32((1 << 16));
|
||||
}
|
||||
|
||||
/* Populate the command header */
|
||||
cmd->command_header->byte_count = 0;
|
||||
hdr->byte_count = 0;
|
||||
|
||||
blk_mq_start_request(rq);
|
||||
mtip_issue_non_ncq_command(dd->port, rq->tag);
|
||||
|
@ -3648,23 +3579,25 @@ static blk_status_t mtip_issue_reserved_cmd(struct blk_mq_hw_ctx *hctx,
|
|||
static blk_status_t mtip_queue_rq(struct blk_mq_hw_ctx *hctx,
|
||||
const struct blk_mq_queue_data *bd)
|
||||
{
|
||||
struct driver_data *dd = hctx->queue->queuedata;
|
||||
struct request *rq = bd->rq;
|
||||
int ret;
|
||||
|
||||
mtip_init_cmd_header(rq);
|
||||
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
|
||||
|
||||
if (blk_rq_is_passthrough(rq))
|
||||
return mtip_issue_reserved_cmd(hctx, rq);
|
||||
|
||||
if (unlikely(mtip_check_unal_depth(hctx, rq)))
|
||||
return BLK_STS_RESOURCE;
|
||||
return BLK_STS_DEV_RESOURCE;
|
||||
|
||||
if (is_se_active(dd) || is_stopped(dd, rq))
|
||||
return BLK_STS_IOERR;
|
||||
|
||||
blk_mq_start_request(rq);
|
||||
|
||||
ret = mtip_submit_request(hctx, rq);
|
||||
if (likely(!ret))
|
||||
return BLK_STS_OK;
|
||||
return BLK_STS_IOERR;
|
||||
if (req_op(rq) == REQ_OP_DISCARD)
|
||||
return mtip_send_trim(dd, blk_rq_pos(rq), blk_rq_sectors(rq));
|
||||
mtip_hw_submit_io(dd, rq, cmd, hctx);
|
||||
return BLK_STS_OK;
|
||||
}
|
||||
|
||||
static void mtip_free_cmd(struct blk_mq_tag_set *set, struct request *rq,
|
||||
|
@ -3920,12 +3853,13 @@ protocol_init_error:
|
|||
return rv;
|
||||
}
|
||||
|
||||
static void mtip_no_dev_cleanup(struct request *rq, void *data, bool reserv)
|
||||
static bool mtip_no_dev_cleanup(struct request *rq, void *data, bool reserv)
|
||||
{
|
||||
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
|
||||
|
||||
cmd->status = BLK_STS_IOERR;
|
||||
blk_mq_complete_request(rq);
|
||||
return true;
|
||||
}
|
||||
|
||||
/*
|
||||
|
|
|
@@ -126,8 +126,6 @@
#define MTIP_DFS_MAX_BUF_SIZE 1024

#define __force_bit2int (unsigned int __force)

enum {
	/* below are bit numbers in 'flags' defined in mtip_port */
	MTIP_PF_IC_ACTIVE_BIT = 0, /* pio/ioctl */

@@ -174,10 +172,10 @@ enum {
struct smart_attr {
	u8 attr_id;
	u16 flags;
	__le16 flags;
	u8 cur;
	u8 worst;
	u32 data;
	__le32 data;
	u8 res[3];
} __packed;

@@ -200,9 +198,9 @@ struct mtip_work {
#define MTIP_MAX_TRIM_ENTRY_LEN 0xfff8

struct mtip_trim_entry {
	u32 lba; /* starting lba of region */
	u16 rsvd; /* unused */
	u16 range; /* # of 512b blocks to trim */
	__le32 lba; /* starting lba of region */
	__le16 rsvd; /* unused */
	__le16 range; /* # of 512b blocks to trim */
} __packed;

struct mtip_trim {
|
||||
|
@@ -278,24 +276,24 @@ struct mtip_cmd_hdr {
	 * - Bit 5 Unused in this implementation.
	 * - Bits 4:0 Length of the command FIS in DWords (DWord = 4 bytes).
	 */
	unsigned int opts;
	__le32 opts;
	/* This field is unsed when using NCQ. */
	union {
		unsigned int byte_count;
		unsigned int status;
		__le32 byte_count;
		__le32 status;
	};
	/*
	 * Lower 32 bits of the command table address associated with this
	 * header. The command table addresses must be 128 byte aligned.
	 */
	unsigned int ctba;
	__le32 ctba;
	/*
	 * If 64 bit addressing is used this field is the upper 32 bits
	 * of the command table address associated with this command.
	 */
	unsigned int ctbau;
	__le32 ctbau;
	/* Reserved and unused. */
	unsigned int res[4];
	u32 res[4];
};

/* Command scatter gather structure (PRD). */

@@ -305,31 +303,28 @@ struct mtip_cmd_sg {
	 * address must be 8 byte aligned signified by bits 2:0 being
	 * set to 0.
	 */
	unsigned int dba;
	__le32 dba;
	/*
	 * When 64 bit addressing is used this field is the upper
	 * 32 bits of the data buffer address.
	 */
	unsigned int dba_upper;
	__le32 dba_upper;
	/* Unused. */
	unsigned int reserved;
	__le32 reserved;
	/*
	 * Bit 31: interrupt when this data block has been transferred.
	 * Bits 30..22: reserved
	 * Bits 21..0: byte count (minus 1). For P320 the byte count must be
	 * 8 byte aligned signified by bits 2:0 being set to 1.
	 */
	unsigned int info;
	__le32 info;
};
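
Typing the hardware-visible fields as __le16/__le32 lets sparse verify that every store goes through cpu_to_le*() exactly once, which is what made the __force_bit2int casts removable elsewhere in this series. A userspace analogue of the same discipline, assuming the glibc/BSD endian.h helpers (htole32()/le32toh()) and a made-up descriptor layout:

/* Sketch only: explicit little-endian fields in a packed, hypothetical DMA descriptor. */
#include <endian.h>
#include <stdint.h>
#include <stdio.h>

struct prd_entry {              /* little-endian as seen by the device */
	uint32_t dba;           /* lower 32 bits of buffer address */
	uint32_t dba_upper;     /* upper 32 bits */
	uint32_t reserved;
	uint32_t info;          /* bit 31: IRQ on completion, bits 21:0: byte count - 1 */
} __attribute__((packed));

static void fill_prd(struct prd_entry *sg, uint64_t dma, uint32_t len)
{
	sg->dba       = htole32((uint32_t)(dma & 0xFFFFFFFF));
	sg->dba_upper = htole32((uint32_t)((dma >> 16) >> 16));
	sg->reserved  = 0;
	sg->info      = htole32((len - 1) & 0x3FFFFF);
}

int main(void)
{
	struct prd_entry e;

	fill_prd(&e, 0xabcd00001000ULL, 4096);
	printf("info (cpu order) = %#x\n", le32toh(e.info));    /* 0xfff */
	return 0;
}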
struct mtip_port;
|
||||
|
||||
struct mtip_int_cmd;
|
||||
|
||||
/* Structure used to describe a command. */
|
||||
struct mtip_cmd {
|
||||
|
||||
struct mtip_cmd_hdr *command_header; /* ptr to command header entry */
|
||||
|
||||
dma_addr_t command_header_dma; /* corresponding physical address */
|
||||
|
||||
void *command; /* ptr to command table entry */
|
||||
|
||||
dma_addr_t command_dma; /* corresponding physical address */
|
||||
|
@ -338,7 +333,10 @@ struct mtip_cmd {
|
|||
|
||||
int unaligned; /* command is unaligned on 4k boundary */
|
||||
|
||||
struct scatterlist sg[MTIP_MAX_SG]; /* Scatter list entries */
|
||||
union {
|
||||
struct scatterlist sg[MTIP_MAX_SG]; /* Scatter list entries */
|
||||
struct mtip_int_cmd *icmd;
|
||||
};
|
||||
|
||||
int retries; /* The number of retries left for this command. */
|
||||
|
||||
|
@ -435,8 +433,8 @@ struct mtip_port {
|
|||
*/
|
||||
unsigned long ic_pause_timer;
|
||||
|
||||
/* Semaphore to control queue depth of unaligned IOs */
|
||||
struct semaphore cmd_slot_unal;
|
||||
/* Counter to control queue depth of unaligned IOs */
|
||||
atomic_t cmd_slot_unal;
|
||||
|
||||
/* Spinlock for working around command-issue bug. */
|
||||
spinlock_t cmd_issue_lock[MTIP_MAX_SLOT_GROUPS];
|
||||
|
|
|
@ -734,12 +734,13 @@ static void recv_work(struct work_struct *work)
|
|||
kfree(args);
|
||||
}
|
||||
|
||||
static void nbd_clear_req(struct request *req, void *data, bool reserved)
|
||||
static bool nbd_clear_req(struct request *req, void *data, bool reserved)
|
||||
{
|
||||
struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
|
||||
|
||||
cmd->status = BLK_STS_IOERR;
|
||||
blk_mq_complete_request(req);
|
||||
return true;
|
||||
}
|
||||
|
||||
static void nbd_clear_que(struct nbd_device *nbd)
|
||||
|
|
|
@ -49,6 +49,7 @@ struct nullb_device {
|
|||
unsigned long completion_nsec; /* time in ns to complete a request */
|
||||
unsigned long cache_size; /* disk cache size in MB */
|
||||
unsigned long zone_size; /* zone size in MB if device is zoned */
|
||||
unsigned int zone_nr_conv; /* number of conventional zones */
|
||||
unsigned int submit_queues; /* number of submission queues */
|
||||
unsigned int home_node; /* home node for the device */
|
||||
unsigned int queue_mode; /* block interface */
|
||||
|
|
|
@ -188,6 +188,10 @@ static unsigned long g_zone_size = 256;
|
|||
module_param_named(zone_size, g_zone_size, ulong, S_IRUGO);
|
||||
MODULE_PARM_DESC(zone_size, "Zone size in MB when block device is zoned. Must be power-of-two: Default: 256");
|
||||
|
||||
static unsigned int g_zone_nr_conv;
|
||||
module_param_named(zone_nr_conv, g_zone_nr_conv, uint, 0444);
|
||||
MODULE_PARM_DESC(zone_nr_conv, "Number of conventional zones when block device is zoned. Default: 0");
|
||||
|
||||
static struct nullb_device *null_alloc_dev(void);
|
||||
static void null_free_dev(struct nullb_device *dev);
|
||||
static void null_del_dev(struct nullb *nullb);
|
||||
|
@ -293,6 +297,7 @@ NULLB_DEVICE_ATTR(mbps, uint);
|
|||
NULLB_DEVICE_ATTR(cache_size, ulong);
|
||||
NULLB_DEVICE_ATTR(zoned, bool);
|
||||
NULLB_DEVICE_ATTR(zone_size, ulong);
|
||||
NULLB_DEVICE_ATTR(zone_nr_conv, uint);
|
||||
|
||||
static ssize_t nullb_device_power_show(struct config_item *item, char *page)
|
||||
{
|
||||
|
@ -407,6 +412,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
|
|||
&nullb_device_attr_badblocks,
|
||||
&nullb_device_attr_zoned,
|
||||
&nullb_device_attr_zone_size,
|
||||
&nullb_device_attr_zone_nr_conv,
|
||||
NULL,
|
||||
};
|
||||
|
||||
|
@ -520,6 +526,7 @@ static struct nullb_device *null_alloc_dev(void)
|
|||
dev->use_per_node_hctx = g_use_per_node_hctx;
|
||||
dev->zoned = g_zoned;
|
||||
dev->zone_size = g_zone_size;
|
||||
dev->zone_nr_conv = g_zone_nr_conv;
|
||||
return dev;
|
||||
}
|
||||
|
||||
|
@ -635,14 +642,9 @@ static void null_cmd_end_timer(struct nullb_cmd *cmd)
|
|||
hrtimer_start(&cmd->timer, kt, HRTIMER_MODE_REL);
|
||||
}
|
||||
|
||||
static void null_softirq_done_fn(struct request *rq)
|
||||
static void null_complete_rq(struct request *rq)
|
||||
{
|
||||
struct nullb *nullb = rq->q->queuedata;
|
||||
|
||||
if (nullb->dev->queue_mode == NULL_Q_MQ)
|
||||
end_cmd(blk_mq_rq_to_pdu(rq));
|
||||
else
|
||||
end_cmd(rq->special);
|
||||
end_cmd(blk_mq_rq_to_pdu(rq));
|
||||
}
|
||||
|
||||
static struct nullb_page *null_alloc_page(gfp_t gfp_flags)
|
||||
|
@ -1350,7 +1352,7 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
|
|||
|
||||
static const struct blk_mq_ops null_mq_ops = {
|
||||
.queue_rq = null_queue_rq,
|
||||
.complete = null_softirq_done_fn,
|
||||
.complete = null_complete_rq,
|
||||
.timeout = null_timeout_rq,
|
||||
};
|
||||
|
||||
|
@ -1657,8 +1659,7 @@ static int null_add_dev(struct nullb_device *dev)
|
|||
}
|
||||
null_init_queues(nullb);
|
||||
} else if (dev->queue_mode == NULL_Q_BIO) {
|
||||
nullb->q = blk_alloc_queue_node(GFP_KERNEL, dev->home_node,
|
||||
NULL);
|
||||
nullb->q = blk_alloc_queue_node(GFP_KERNEL, dev->home_node);
|
||||
if (!nullb->q) {
|
||||
rv = -ENOMEM;
|
||||
goto out_cleanup_queues;
|
||||
|
|
|
@@ -29,7 +29,25 @@ int null_zone_init(struct nullb_device *dev)
	if (!dev->zones)
		return -ENOMEM;

	for (i = 0; i < dev->nr_zones; i++) {
	if (dev->zone_nr_conv >= dev->nr_zones) {
		dev->zone_nr_conv = dev->nr_zones - 1;
		pr_info("null_blk: changed the number of conventional zones to %u",
			dev->zone_nr_conv);
	}

	for (i = 0; i < dev->zone_nr_conv; i++) {
		struct blk_zone *zone = &dev->zones[i];

		zone->start = sector;
		zone->len = dev->zone_size_sects;
		zone->wp = zone->start + zone->len;
		zone->type = BLK_ZONE_TYPE_CONVENTIONAL;
		zone->cond = BLK_ZONE_COND_NOT_WP;

		sector += dev->zone_size_sects;
	}

	for (i = dev->zone_nr_conv; i < dev->nr_zones; i++) {
		struct blk_zone *zone = &dev->zones[i];

		zone->start = zone->wp = sector;

@@ -98,6 +116,8 @@ void null_zone_write(struct nullb_cmd *cmd, sector_t sector,
		if (zone->wp == zone->start + zone->len)
			zone->cond = BLK_ZONE_COND_FULL;
		break;
	case BLK_ZONE_COND_NOT_WP:
		break;
	default:
		/* Invalid zone condition */
		cmd->error = BLK_STS_IOERR;

@@ -111,6 +131,11 @@ void null_zone_reset(struct nullb_cmd *cmd, sector_t sector)
	unsigned int zno = null_zone_no(dev, sector);
	struct blk_zone *zone = &dev->zones[zno];

	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL) {
		cmd->error = BLK_STS_IOERR;
		return;
	}

	zone->cond = BLK_ZONE_COND_EMPTY;
	zone->wp = zone->start;
}
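
The new zone_nr_conv handling above clamps the request so at least one sequential zone remains, lays out the first zone_nr_conv zones as conventional (condition NOT_WP, write pointer parked at the zone end), and leaves the remainder as empty sequential zones. A self-contained sketch of that layout with simplified types (not the null_blk structures):

/* Sketch only: conventional-then-sequential zone layout with illustrative types. */
#include <stdio.h>

enum ztype { ZONE_CONV, ZONE_SEQ };

struct zone {
	unsigned long long start, wp, len;
	enum ztype type;
};

static void init_zones(struct zone *z, unsigned int nr, unsigned int nr_conv,
		       unsigned long long zone_sects)
{
	unsigned long long sector = 0;
	unsigned int i;

	if (nr_conv >= nr)              /* keep at least one sequential zone */
		nr_conv = nr - 1;

	for (i = 0; i < nr; i++, sector += zone_sects) {
		z[i].start = sector;
		z[i].len = zone_sects;
		if (i < nr_conv) {      /* conventional: randomly writable, wp unused */
			z[i].type = ZONE_CONV;
			z[i].wp = sector + zone_sects;
		} else {                /* sequential: starts empty, wp at zone start */
			z[i].type = ZONE_SEQ;
			z[i].wp = sector;
		}
	}
}

int main(void)
{
	struct zone z[4];

	init_zones(z, 4, 1, 524288);    /* 4 zones of 256 MB, first one conventional */
	printf("zone1 wp=%llu type=%d\n", z[1].wp, z[1].type);
	return 0;
}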
|
||||
|
|
|
@ -242,6 +242,11 @@ struct pd_unit {
|
|||
|
||||
static struct pd_unit pd[PD_UNITS];
|
||||
|
||||
struct pd_req {
|
||||
/* for REQ_OP_DRV_IN: */
|
||||
enum action (*func)(struct pd_unit *disk);
|
||||
};
|
||||
|
||||
static char pd_scratch[512]; /* scratch block buffer */
|
||||
|
||||
static char *pd_errs[17] = { "ERR", "INDEX", "ECC", "DRQ", "SEEK", "WRERR",
|
||||
|
@ -502,8 +507,9 @@ static enum action do_pd_io_start(void)
|
|||
|
||||
static enum action pd_special(void)
|
||||
{
|
||||
enum action (*func)(struct pd_unit *) = pd_req->special;
|
||||
return func(pd_current);
|
||||
struct pd_req *req = blk_mq_rq_to_pdu(pd_req);
|
||||
|
||||
return req->func(pd_current);
|
||||
}
|
||||
|
||||
static int pd_next_buf(void)
|
||||
|
@ -767,12 +773,14 @@ static int pd_special_command(struct pd_unit *disk,
|
|||
enum action (*func)(struct pd_unit *disk))
|
||||
{
|
||||
struct request *rq;
|
||||
struct pd_req *req;
|
||||
|
||||
rq = blk_get_request(disk->gd->queue, REQ_OP_DRV_IN, 0);
|
||||
if (IS_ERR(rq))
|
||||
return PTR_ERR(rq);
|
||||
req = blk_mq_rq_to_pdu(rq);
|
||||
|
||||
rq->special = func;
|
||||
req->func = func;
|
||||
blk_execute_rq(disk->gd->queue, disk->gd, rq, 0);
|
||||
blk_put_request(rq);
|
||||
return 0;
|
||||
|
@ -892,9 +900,21 @@ static void pd_probe_drive(struct pd_unit *disk)
|
|||
disk->gd = p;
|
||||
p->private_data = disk;
|
||||
|
||||
p->queue = blk_mq_init_sq_queue(&disk->tag_set, &pd_mq_ops, 2,
|
||||
BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING);
|
||||
memset(&disk->tag_set, 0, sizeof(disk->tag_set));
|
||||
disk->tag_set.ops = &pd_mq_ops;
|
||||
disk->tag_set.cmd_size = sizeof(struct pd_req);
|
||||
disk->tag_set.nr_hw_queues = 1;
|
||||
disk->tag_set.nr_maps = 1;
|
||||
disk->tag_set.queue_depth = 2;
|
||||
disk->tag_set.numa_node = NUMA_NO_NODE;
|
||||
disk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
|
||||
|
||||
if (blk_mq_alloc_tag_set(&disk->tag_set))
|
||||
return;
|
||||
|
||||
p->queue = blk_mq_init_queue(&disk->tag_set);
|
||||
if (IS_ERR(p->queue)) {
|
||||
blk_mq_free_tag_set(&disk->tag_set);
|
||||
p->queue = NULL;
|
||||
return;
|
||||
}
|
||||
|
|
|
@ -2203,9 +2203,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
|
|||
* Some CDRW drives can not handle writes larger than one packet,
|
||||
* even if the size is a multiple of the packet size.
|
||||
*/
|
||||
spin_lock_irq(q->queue_lock);
|
||||
blk_queue_max_hw_sectors(q, pd->settings.size);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
set_bit(PACKET_WRITABLE, &pd->flags);
|
||||
} else {
|
||||
pkt_set_speed(pd, MAX_SPEED, MAX_SPEED);
|
||||
|
|
|
@ -181,6 +181,7 @@ struct skd_request_context {
|
|||
struct fit_completion_entry_v1 completion;
|
||||
|
||||
struct fit_comp_error_info err_info;
|
||||
int retries;
|
||||
|
||||
blk_status_t status;
|
||||
};
|
||||
|
@ -382,11 +383,12 @@ static void skd_log_skreq(struct skd_device *skdev,
|
|||
* READ/WRITE REQUESTS
|
||||
*****************************************************************************
|
||||
*/
|
||||
static void skd_inc_in_flight(struct request *rq, void *data, bool reserved)
|
||||
static bool skd_inc_in_flight(struct request *rq, void *data, bool reserved)
|
||||
{
|
||||
int *count = data;
|
||||
|
||||
count++;
|
||||
return true;
|
||||
}
|
||||
|
||||
static int skd_in_flight(struct skd_device *skdev)
|
||||
|
@ -494,6 +496,11 @@ static blk_status_t skd_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
|
|||
if (unlikely(skdev->state != SKD_DRVR_STATE_ONLINE))
|
||||
return skd_fail_all(q) ? BLK_STS_IOERR : BLK_STS_RESOURCE;
|
||||
|
||||
if (!(req->rq_flags & RQF_DONTPREP)) {
|
||||
skreq->retries = 0;
|
||||
req->rq_flags |= RQF_DONTPREP;
|
||||
}
|
||||
|
||||
blk_mq_start_request(req);
|
||||
|
||||
WARN_ONCE(tag >= skd_max_queue_depth, "%#x > %#x (nr_requests = %lu)\n",
|
||||
|
@ -1425,7 +1432,7 @@ static void skd_resolve_req_exception(struct skd_device *skdev,
|
|||
break;
|
||||
|
||||
case SKD_CHECK_STATUS_REQUEUE_REQUEST:
|
||||
if ((unsigned long) ++req->special < SKD_MAX_RETRIES) {
|
||||
if (++skreq->retries < SKD_MAX_RETRIES) {
|
||||
skd_log_skreq(skdev, skreq, "retry");
|
||||
blk_mq_requeue_request(req, true);
|
||||
break;
|
||||
|
@ -1887,13 +1894,13 @@ static void skd_isr_fwstate(struct skd_device *skdev)
|
|||
skd_skdev_state_to_str(skdev->state), skdev->state);
|
||||
}
|
||||
|
||||
static void skd_recover_request(struct request *req, void *data, bool reserved)
|
||||
static bool skd_recover_request(struct request *req, void *data, bool reserved)
|
||||
{
|
||||
struct skd_device *const skdev = data;
|
||||
struct skd_request_context *skreq = blk_mq_rq_to_pdu(req);
|
||||
|
||||
if (skreq->state != SKD_REQ_STATE_BUSY)
|
||||
return;
|
||||
return true;
|
||||
|
||||
skd_log_skreq(skdev, skreq, "recover");
|
||||
|
||||
|
@ -1904,6 +1911,7 @@ static void skd_recover_request(struct request *req, void *data, bool reserved)
|
|||
skreq->state = SKD_REQ_STATE_IDLE;
|
||||
skreq->status = BLK_STS_IOERR;
|
||||
blk_mq_complete_request(req);
|
||||
return true;
|
||||
}
|
||||
|
||||
static void skd_recover_requests(struct skd_device *skdev)
|
||||
|
|
|
@ -6,7 +6,7 @@
|
|||
#include <linux/module.h>
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/types.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-mq.h>
|
||||
#include <linux/hdreg.h>
|
||||
#include <linux/genhd.h>
|
||||
#include <linux/cdrom.h>
|
||||
|
@ -45,6 +45,8 @@ MODULE_VERSION(DRV_MODULE_VERSION);
|
|||
#define WAITING_FOR_GEN_CMD 0x04
|
||||
#define WAITING_FOR_ANY -1
|
||||
|
||||
#define VDC_MAX_RETRIES 10
|
||||
|
||||
static struct workqueue_struct *sunvdc_wq;
|
||||
|
||||
struct vdc_req_entry {
|
||||
|
@ -66,9 +68,10 @@ struct vdc_port {
|
|||
|
||||
u64 max_xfer_size;
|
||||
u32 vdisk_block_size;
|
||||
u32 drain;
|
||||
|
||||
u64 ldc_timeout;
|
||||
struct timer_list ldc_reset_timer;
|
||||
struct delayed_work ldc_reset_timer_work;
|
||||
struct work_struct ldc_reset_work;
|
||||
|
||||
/* The server fills these in for us in the disk attribute
|
||||
|
@ -80,12 +83,14 @@ struct vdc_port {
|
|||
u8 vdisk_mtype;
|
||||
u32 vdisk_phys_blksz;
|
||||
|
||||
struct blk_mq_tag_set tag_set;
|
||||
|
||||
char disk_name[32];
|
||||
};
|
||||
|
||||
static void vdc_ldc_reset(struct vdc_port *port);
|
||||
static void vdc_ldc_reset_work(struct work_struct *work);
|
||||
static void vdc_ldc_reset_timer(struct timer_list *t);
|
||||
static void vdc_ldc_reset_timer_work(struct work_struct *work);
|
||||
|
||||
static inline struct vdc_port *to_vdc_port(struct vio_driver_state *vio)
|
||||
{
|
||||
|
@ -175,11 +180,8 @@ static void vdc_blk_queue_start(struct vdc_port *port)
|
|||
* handshake completes, so check for initial handshake before we've
|
||||
* allocated a disk.
|
||||
*/
|
||||
if (port->disk && blk_queue_stopped(port->disk->queue) &&
|
||||
vdc_tx_dring_avail(dr) * 100 / VDC_TX_RING_SIZE >= 50) {
|
||||
blk_start_queue(port->disk->queue);
|
||||
}
|
||||
|
||||
if (port->disk && vdc_tx_dring_avail(dr) * 100 / VDC_TX_RING_SIZE >= 50)
|
||||
blk_mq_start_hw_queues(port->disk->queue);
|
||||
}
|
||||
|
||||
static void vdc_finish(struct vio_driver_state *vio, int err, int waiting_for)
|
||||
|
@ -197,7 +199,7 @@ static void vdc_handshake_complete(struct vio_driver_state *vio)
|
|||
{
|
||||
struct vdc_port *port = to_vdc_port(vio);
|
||||
|
||||
del_timer(&port->ldc_reset_timer);
|
||||
cancel_delayed_work(&port->ldc_reset_timer_work);
|
||||
vdc_finish(vio, 0, WAITING_FOR_LINK_UP);
|
||||
vdc_blk_queue_start(port);
|
||||
}
|
||||
|
@ -320,7 +322,7 @@ static void vdc_end_one(struct vdc_port *port, struct vio_dring_state *dr,
|
|||
|
||||
rqe->req = NULL;
|
||||
|
||||
__blk_end_request(req, (desc->status ? BLK_STS_IOERR : 0), desc->size);
|
||||
blk_mq_end_request(req, desc->status ? BLK_STS_IOERR : 0);
|
||||
|
||||
vdc_blk_queue_start(port);
|
||||
}
|
||||
|
@ -431,6 +433,7 @@ static int __vdc_tx_trigger(struct vdc_port *port)
|
|||
.end_idx = dr->prod,
|
||||
};
|
||||
int err, delay;
|
||||
int retries = 0;
|
||||
|
||||
hdr.seq = dr->snd_nxt;
|
||||
delay = 1;
|
||||
|
@ -443,6 +446,8 @@ static int __vdc_tx_trigger(struct vdc_port *port)
|
|||
udelay(delay);
|
||||
if ((delay <<= 1) > 128)
|
||||
delay = 128;
|
||||
if (retries++ > VDC_MAX_RETRIES)
|
||||
break;
|
||||
} while (err == -EAGAIN);
|
||||
|
||||
if (err == -ENOTCONN)
|
||||
|
@ -525,29 +530,40 @@ static int __send_request(struct request *req)
|
|||
return err;
|
||||
}
|
||||
|
||||
static void do_vdc_request(struct request_queue *rq)
|
||||
static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
|
||||
const struct blk_mq_queue_data *bd)
|
||||
{
|
||||
struct request *req;
|
||||
struct vdc_port *port = hctx->queue->queuedata;
|
||||
struct vio_dring_state *dr;
|
||||
unsigned long flags;
|
||||
|
||||
while ((req = blk_peek_request(rq)) != NULL) {
|
||||
struct vdc_port *port;
|
||||
struct vio_dring_state *dr;
|
||||
dr = &port->vio.drings[VIO_DRIVER_TX_RING];
|
||||
|
||||
port = req->rq_disk->private_data;
|
||||
dr = &port->vio.drings[VIO_DRIVER_TX_RING];
|
||||
if (unlikely(vdc_tx_dring_avail(dr) < 1))
|
||||
goto wait;
|
||||
blk_mq_start_request(bd->rq);
|
||||
|
||||
blk_start_request(req);
|
||||
spin_lock_irqsave(&port->vio.lock, flags);
|
||||
|
||||
if (__send_request(req) < 0) {
|
||||
blk_requeue_request(rq, req);
|
||||
wait:
|
||||
/* Avoid pointless unplugs. */
|
||||
blk_stop_queue(rq);
|
||||
break;
|
||||
}
|
||||
/*
|
||||
* Doing drain, just end the request in error
|
||||
*/
|
||||
if (unlikely(port->drain)) {
|
||||
spin_unlock_irqrestore(&port->vio.lock, flags);
|
||||
return BLK_STS_IOERR;
|
||||
}
|
||||
|
||||
if (unlikely(vdc_tx_dring_avail(dr) < 1)) {
|
||||
spin_unlock_irqrestore(&port->vio.lock, flags);
|
||||
blk_mq_stop_hw_queue(hctx);
|
||||
return BLK_STS_DEV_RESOURCE;
|
||||
}
|
||||
|
||||
if (__send_request(bd->rq) < 0) {
|
||||
spin_unlock_irqrestore(&port->vio.lock, flags);
|
||||
return BLK_STS_IOERR;
|
||||
}
|
||||
|
||||
spin_unlock_irqrestore(&port->vio.lock, flags);
|
||||
return BLK_STS_OK;
|
||||
}
|
||||
|
||||
static int generic_request(struct vdc_port *port, u8 op, void *buf, int len)
|
||||
|
@ -759,6 +775,31 @@ static void vdc_port_down(struct vdc_port *port)
|
|||
vio_ldc_free(&port->vio);
|
||||
}
|
||||
|
||||
static const struct blk_mq_ops vdc_mq_ops = {
|
||||
.queue_rq = vdc_queue_rq,
|
||||
};
|
||||
|
||||
static void cleanup_queue(struct request_queue *q)
|
||||
{
|
||||
struct vdc_port *port = q->queuedata;
|
||||
|
||||
blk_cleanup_queue(q);
|
||||
blk_mq_free_tag_set(&port->tag_set);
|
||||
}
|
||||
|
||||
static struct request_queue *init_queue(struct vdc_port *port)
|
||||
{
|
||||
struct request_queue *q;
|
||||
|
||||
q = blk_mq_init_sq_queue(&port->tag_set, &vdc_mq_ops, VDC_TX_RING_SIZE,
|
||||
BLK_MQ_F_SHOULD_MERGE);
|
||||
if (IS_ERR(q))
|
||||
return q;
|
||||
|
||||
q->queuedata = port;
|
||||
return q;
|
||||
}
|
||||
|
||||
static int probe_disk(struct vdc_port *port)
|
||||
{
|
||||
struct request_queue *q;
|
||||
|
@ -796,17 +837,17 @@ static int probe_disk(struct vdc_port *port)
|
|||
(u64)geom.num_sec);
|
||||
}
|
||||
|
||||
q = blk_init_queue(do_vdc_request, &port->vio.lock);
|
||||
if (!q) {
|
||||
q = init_queue(port);
|
||||
if (IS_ERR(q)) {
|
||||
printk(KERN_ERR PFX "%s: Could not allocate queue.\n",
|
||||
port->vio.name);
|
||||
return -ENOMEM;
|
||||
return PTR_ERR(q);
|
||||
}
|
||||
g = alloc_disk(1 << PARTITION_SHIFT);
|
||||
if (!g) {
|
||||
printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
|
||||
port->vio.name);
|
||||
blk_cleanup_queue(q);
|
||||
cleanup_queue(q);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
|
@ -981,7 +1022,7 @@ static int vdc_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
|
|||
*/
|
||||
ldc_timeout = mdesc_get_property(hp, vdev->mp, "vdc-timeout", NULL);
|
||||
port->ldc_timeout = ldc_timeout ? *ldc_timeout : 0;
|
||||
timer_setup(&port->ldc_reset_timer, vdc_ldc_reset_timer, 0);
|
||||
INIT_DELAYED_WORK(&port->ldc_reset_timer_work, vdc_ldc_reset_timer_work);
|
||||
INIT_WORK(&port->ldc_reset_work, vdc_ldc_reset_work);
|
||||
|
||||
err = vio_driver_init(&port->vio, vdev, VDEV_DISK,
|
||||
|
@ -1034,18 +1075,14 @@ static int vdc_port_remove(struct vio_dev *vdev)
|
|||
struct vdc_port *port = dev_get_drvdata(&vdev->dev);
|
||||
|
||||
if (port) {
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&port->vio.lock, flags);
|
||||
blk_stop_queue(port->disk->queue);
|
||||
spin_unlock_irqrestore(&port->vio.lock, flags);
|
||||
blk_mq_stop_hw_queues(port->disk->queue);
|
||||
|
||||
flush_work(&port->ldc_reset_work);
|
||||
del_timer_sync(&port->ldc_reset_timer);
|
||||
cancel_delayed_work_sync(&port->ldc_reset_timer_work);
|
||||
del_timer_sync(&port->vio.timer);
|
||||
|
||||
del_gendisk(port->disk);
|
||||
blk_cleanup_queue(port->disk->queue);
|
||||
cleanup_queue(port->disk->queue);
|
||||
put_disk(port->disk);
|
||||
port->disk = NULL;
|
||||
|
||||
|
@ -1080,32 +1117,46 @@ static void vdc_requeue_inflight(struct vdc_port *port)
|
|||
}
|
||||
|
||||
rqe->req = NULL;
|
||||
blk_requeue_request(port->disk->queue, req);
|
||||
blk_mq_requeue_request(req, false);
|
||||
}
|
||||
}
|
||||
|
||||
static void vdc_queue_drain(struct vdc_port *port)
|
||||
{
|
||||
struct request *req;
|
||||
struct request_queue *q = port->disk->queue;
|
||||
|
||||
while ((req = blk_fetch_request(port->disk->queue)) != NULL)
|
||||
__blk_end_request_all(req, BLK_STS_IOERR);
|
||||
/*
|
||||
* Mark the queue as draining, then freeze/quiesce to ensure
|
||||
* that all existing requests are seen in ->queue_rq() and killed
|
||||
*/
|
||||
port->drain = 1;
|
||||
spin_unlock_irq(&port->vio.lock);
|
||||
|
||||
blk_mq_freeze_queue(q);
|
||||
blk_mq_quiesce_queue(q);
|
||||
|
||||
spin_lock_irq(&port->vio.lock);
|
||||
port->drain = 0;
|
||||
blk_mq_unquiesce_queue(q);
|
||||
blk_mq_unfreeze_queue(q);
|
||||
}
|
||||
|
||||
static void vdc_ldc_reset_timer(struct timer_list *t)
|
||||
static void vdc_ldc_reset_timer_work(struct work_struct *work)
|
||||
{
|
||||
struct vdc_port *port = from_timer(port, t, ldc_reset_timer);
|
||||
struct vio_driver_state *vio = &port->vio;
|
||||
unsigned long flags;
|
||||
struct vdc_port *port;
|
||||
struct vio_driver_state *vio;
|
||||
|
||||
spin_lock_irqsave(&vio->lock, flags);
|
||||
port = container_of(work, struct vdc_port, ldc_reset_timer_work.work);
|
||||
vio = &port->vio;
|
||||
|
||||
spin_lock_irq(&vio->lock);
|
||||
if (!(port->vio.hs_state & VIO_HS_COMPLETE)) {
|
||||
pr_warn(PFX "%s ldc down %llu seconds, draining queue\n",
|
||||
port->disk_name, port->ldc_timeout);
|
||||
vdc_queue_drain(port);
|
||||
vdc_blk_queue_start(port);
|
||||
}
|
||||
spin_unlock_irqrestore(&vio->lock, flags);
|
||||
spin_unlock_irq(&vio->lock);
|
||||
}
|
||||
|
||||
static void vdc_ldc_reset_work(struct work_struct *work)
|
||||
|
@ -1129,7 +1180,7 @@ static void vdc_ldc_reset(struct vdc_port *port)
|
|||
assert_spin_locked(&port->vio.lock);
|
||||
|
||||
pr_warn(PFX "%s ldc link reset\n", port->disk_name);
|
||||
blk_stop_queue(port->disk->queue);
|
||||
blk_mq_stop_hw_queues(port->disk->queue);
|
||||
vdc_requeue_inflight(port);
|
||||
vdc_port_down(port);
|
||||
|
||||
|
@ -1146,7 +1197,7 @@ static void vdc_ldc_reset(struct vdc_port *port)
|
|||
}
|
||||
|
||||
if (port->ldc_timeout)
|
||||
mod_timer(&port->ldc_reset_timer,
|
||||
mod_delayed_work(system_wq, &port->ldc_reset_timer_work,
|
||||
round_jiffies(jiffies + HZ * port->ldc_timeout));
|
||||
mod_timer(&port->vio.timer, round_jiffies(jiffies + HZ));
|
||||
return;
|
||||
|
|
|
@ -243,7 +243,6 @@ struct carm_port {
|
|||
unsigned int port_no;
|
||||
struct gendisk *disk;
|
||||
struct carm_host *host;
|
||||
struct blk_mq_tag_set tag_set;
|
||||
|
||||
/* attached device characteristics */
|
||||
u64 capacity;
|
||||
|
@ -254,13 +253,10 @@ struct carm_port {
|
|||
};
|
||||
|
||||
struct carm_request {
|
||||
unsigned int tag;
|
||||
int n_elem;
|
||||
unsigned int msg_type;
|
||||
unsigned int msg_subtype;
|
||||
unsigned int msg_bucket;
|
||||
struct request *rq;
|
||||
struct carm_port *port;
|
||||
struct scatterlist sg[CARM_MAX_REQ_SG];
|
||||
};
|
||||
|
||||
|
@ -291,9 +287,6 @@ struct carm_host {
|
|||
unsigned int wait_q_cons;
|
||||
struct request_queue *wait_q[CARM_MAX_WAIT_Q];
|
||||
|
||||
unsigned int n_msgs;
|
||||
u64 msg_alloc;
|
||||
struct carm_request req[CARM_MAX_REQ];
|
||||
void *msg_base;
|
||||
dma_addr_t msg_dma;
|
||||
|
||||
|
@ -478,10 +471,10 @@ static inline dma_addr_t carm_ref_msg_dma(struct carm_host *host,
|
|||
}
|
||||
|
||||
static int carm_send_msg(struct carm_host *host,
|
||||
struct carm_request *crq)
|
||||
struct carm_request *crq, unsigned tag)
|
||||
{
|
||||
void __iomem *mmio = host->mmio;
|
||||
u32 msg = (u32) carm_ref_msg_dma(host, crq->tag);
|
||||
u32 msg = (u32) carm_ref_msg_dma(host, tag);
|
||||
u32 cm_bucket = crq->msg_bucket;
|
||||
u32 tmp;
|
||||
int rc = 0;
|
||||
|
@ -506,99 +499,24 @@ static int carm_send_msg(struct carm_host *host,
|
|||
return rc;
|
||||
}
|
||||
|
||||
static struct carm_request *carm_get_request(struct carm_host *host)
|
||||
{
|
||||
unsigned int i;
|
||||
|
||||
/* obey global hardware limit on S/G entries */
|
||||
if (host->hw_sg_used >= (CARM_MAX_HOST_SG - CARM_MAX_REQ_SG))
|
||||
return NULL;
|
||||
|
||||
for (i = 0; i < max_queue; i++)
|
||||
if ((host->msg_alloc & (1ULL << i)) == 0) {
|
||||
struct carm_request *crq = &host->req[i];
|
||||
crq->port = NULL;
|
||||
crq->n_elem = 0;
|
||||
|
||||
host->msg_alloc |= (1ULL << i);
|
||||
host->n_msgs++;
|
||||
|
||||
assert(host->n_msgs <= CARM_MAX_REQ);
|
||||
sg_init_table(crq->sg, CARM_MAX_REQ_SG);
|
||||
return crq;
|
||||
}
|
||||
|
||||
DPRINTK("no request available, returning NULL\n");
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static int carm_put_request(struct carm_host *host, struct carm_request *crq)
|
||||
{
|
||||
assert(crq->tag < max_queue);
|
||||
|
||||
if (unlikely((host->msg_alloc & (1ULL << crq->tag)) == 0))
|
||||
return -EINVAL; /* tried to clear a tag that was not active */
|
||||
|
||||
assert(host->hw_sg_used >= crq->n_elem);
|
||||
|
||||
host->msg_alloc &= ~(1ULL << crq->tag);
|
||||
host->hw_sg_used -= crq->n_elem;
|
||||
host->n_msgs--;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static struct carm_request *carm_get_special(struct carm_host *host)
|
||||
{
|
||||
unsigned long flags;
|
||||
struct carm_request *crq = NULL;
|
||||
struct request *rq;
|
||||
int tries = 5000;
|
||||
|
||||
while (tries-- > 0) {
|
||||
spin_lock_irqsave(&host->lock, flags);
|
||||
crq = carm_get_request(host);
|
||||
spin_unlock_irqrestore(&host->lock, flags);
|
||||
|
||||
if (crq)
|
||||
break;
|
||||
msleep(10);
|
||||
}
|
||||
|
||||
if (!crq)
|
||||
return NULL;
|
||||
|
||||
rq = blk_get_request(host->oob_q, REQ_OP_DRV_OUT, 0);
|
||||
if (IS_ERR(rq)) {
|
||||
spin_lock_irqsave(&host->lock, flags);
|
||||
carm_put_request(host, crq);
|
||||
spin_unlock_irqrestore(&host->lock, flags);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
crq->rq = rq;
|
||||
return crq;
|
||||
}
|
||||
|
||||
static int carm_array_info (struct carm_host *host, unsigned int array_idx)
|
||||
{
|
||||
struct carm_msg_ioctl *ioc;
|
||||
unsigned int idx;
|
||||
u32 msg_data;
|
||||
dma_addr_t msg_dma;
|
||||
struct carm_request *crq;
|
||||
struct request *rq;
|
||||
int rc;
|
||||
|
||||
crq = carm_get_special(host);
|
||||
if (!crq) {
|
||||
rq = blk_mq_alloc_request(host->oob_q, REQ_OP_DRV_OUT, 0);
|
||||
if (IS_ERR(rq)) {
|
||||
rc = -ENOMEM;
|
||||
goto err_out;
|
||||
}
|
||||
crq = blk_mq_rq_to_pdu(rq);
|
||||
|
||||
idx = crq->tag;
|
||||
|
||||
ioc = carm_ref_msg(host, idx);
|
||||
msg_dma = carm_ref_msg_dma(host, idx);
|
||||
ioc = carm_ref_msg(host, rq->tag);
|
||||
msg_dma = carm_ref_msg_dma(host, rq->tag);
|
||||
msg_data = (u32) (msg_dma + sizeof(struct carm_array_info));
|
||||
|
||||
crq->msg_type = CARM_MSG_ARRAY;
|
||||
|
@ -612,7 +530,7 @@ static int carm_array_info (struct carm_host *host, unsigned int array_idx)
|
|||
ioc->type = CARM_MSG_ARRAY;
|
||||
ioc->subtype = CARM_ARRAY_INFO;
|
||||
ioc->array_id = (u8) array_idx;
|
||||
ioc->handle = cpu_to_le32(TAG_ENCODE(idx));
|
||||
ioc->handle = cpu_to_le32(TAG_ENCODE(rq->tag));
|
||||
ioc->data_addr = cpu_to_le32(msg_data);
|
||||
|
||||
spin_lock_irq(&host->lock);
|
||||
|
@ -620,9 +538,8 @@ static int carm_array_info (struct carm_host *host, unsigned int array_idx)
|
|||
host->state == HST_DEV_SCAN);
|
||||
spin_unlock_irq(&host->lock);
|
||||
|
||||
DPRINTK("blk_execute_rq_nowait, tag == %u\n", idx);
|
||||
crq->rq->special = crq;
|
||||
blk_execute_rq_nowait(host->oob_q, NULL, crq->rq, true, NULL);
|
||||
DPRINTK("blk_execute_rq_nowait, tag == %u\n", rq->tag);
|
||||
blk_execute_rq_nowait(host->oob_q, NULL, rq, true, NULL);
|
||||
|
||||
return 0;
|
||||
|
||||
|
@ -637,21 +554,21 @@ typedef unsigned int (*carm_sspc_t)(struct carm_host *, unsigned int, void *);
|
|||
|
||||
static int carm_send_special (struct carm_host *host, carm_sspc_t func)
|
||||
{
|
||||
struct request *rq;
|
||||
struct carm_request *crq;
|
||||
struct carm_msg_ioctl *ioc;
|
||||
void *mem;
|
||||
unsigned int idx, msg_size;
|
||||
unsigned int msg_size;
|
||||
int rc;
|
||||
|
||||
crq = carm_get_special(host);
|
||||
if (!crq)
|
||||
rq = blk_mq_alloc_request(host->oob_q, REQ_OP_DRV_OUT, 0);
|
||||
if (IS_ERR(rq))
|
||||
return -ENOMEM;
|
||||
crq = blk_mq_rq_to_pdu(rq);
|
||||
|
||||
idx = crq->tag;
|
||||
mem = carm_ref_msg(host, rq->tag);
|
||||
|
||||
mem = carm_ref_msg(host, idx);
|
||||
|
||||
msg_size = func(host, idx, mem);
|
||||
msg_size = func(host, rq->tag, mem);
|
||||
|
||||
ioc = mem;
|
||||
crq->msg_type = ioc->type;
|
||||
|
@ -660,9 +577,8 @@ static int carm_send_special (struct carm_host *host, carm_sspc_t func)
|
|||
BUG_ON(rc < 0);
|
||||
crq->msg_bucket = (u32) rc;
|
||||
|
||||
DPRINTK("blk_execute_rq_nowait, tag == %u\n", idx);
|
||||
crq->rq->special = crq;
|
||||
blk_execute_rq_nowait(host->oob_q, NULL, crq->rq, true, NULL);
|
||||
DPRINTK("blk_execute_rq_nowait, tag == %u\n", rq->tag);
|
||||
blk_execute_rq_nowait(host->oob_q, NULL, rq, true, NULL);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -744,19 +660,6 @@ static unsigned int carm_fill_get_fw_ver(struct carm_host *host,
|
|||
sizeof(struct carm_fw_ver);
|
||||
}
|
||||
|
||||
static inline void carm_end_request_queued(struct carm_host *host,
|
||||
struct carm_request *crq,
|
||||
blk_status_t error)
|
||||
{
|
||||
struct request *req = crq->rq;
|
||||
int rc;
|
||||
|
||||
blk_mq_end_request(req, error);
|
||||
|
||||
rc = carm_put_request(host, crq);
|
||||
assert(rc == 0);
|
||||
}
|
||||
|
||||
static inline void carm_push_q (struct carm_host *host, struct request_queue *q)
|
||||
{
|
||||
unsigned int idx = host->wait_q_prod % CARM_MAX_WAIT_Q;
|
||||
|
@ -791,101 +694,50 @@ static inline void carm_round_robin(struct carm_host *host)
|
|||
}
|
||||
}
|
||||
|
||||
static inline void carm_end_rq(struct carm_host *host, struct carm_request *crq,
|
||||
blk_status_t error)
|
||||
static inline enum dma_data_direction carm_rq_dir(struct request *rq)
|
||||
{
|
||||
carm_end_request_queued(host, crq, error);
|
||||
if (max_queue == 1)
|
||||
carm_round_robin(host);
|
||||
else if ((host->n_msgs <= CARM_MSG_LOW_WATER) &&
|
||||
(host->hw_sg_used <= CARM_SG_LOW_WATER)) {
|
||||
carm_round_robin(host);
|
||||
}
|
||||
}
|
||||
|
||||
static blk_status_t carm_oob_queue_rq(struct blk_mq_hw_ctx *hctx,
|
||||
const struct blk_mq_queue_data *bd)
|
||||
{
|
||||
struct request_queue *q = hctx->queue;
|
||||
struct carm_host *host = q->queuedata;
|
||||
struct carm_request *crq;
|
||||
int rc;
|
||||
|
||||
blk_mq_start_request(bd->rq);
|
||||
|
||||
spin_lock_irq(&host->lock);
|
||||
|
||||
crq = bd->rq->special;
|
||||
assert(crq != NULL);
|
||||
assert(crq->rq == bd->rq);
|
||||
|
||||
crq->n_elem = 0;
|
||||
|
||||
DPRINTK("send req\n");
|
||||
rc = carm_send_msg(host, crq);
|
||||
if (rc) {
|
||||
carm_push_q(host, q);
|
||||
spin_unlock_irq(&host->lock);
|
||||
return BLK_STS_DEV_RESOURCE;
|
||||
}
|
||||
|
||||
spin_unlock_irq(&host->lock);
|
||||
return BLK_STS_OK;
|
||||
return op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
|
||||
}
|
||||
|
||||
static blk_status_t carm_queue_rq(struct blk_mq_hw_ctx *hctx,
|
||||
const struct blk_mq_queue_data *bd)
|
||||
{
|
||||
struct request_queue *q = hctx->queue;
|
||||
struct request *rq = bd->rq;
|
||||
struct carm_port *port = q->queuedata;
|
||||
struct carm_host *host = port->host;
|
||||
struct carm_request *crq = blk_mq_rq_to_pdu(rq);
|
||||
struct carm_msg_rw *msg;
|
||||
struct carm_request *crq;
|
||||
struct request *rq = bd->rq;
|
||||
struct scatterlist *sg;
|
||||
int writing = 0, pci_dir, i, n_elem, rc;
|
||||
u32 tmp;
|
||||
int i, n_elem = 0, rc;
|
||||
unsigned int msg_size;
|
||||
u32 tmp;
|
||||
|
||||
crq->n_elem = 0;
|
||||
sg_init_table(crq->sg, CARM_MAX_REQ_SG);
|
||||
|
||||
blk_mq_start_request(rq);
|
||||
|
||||
spin_lock_irq(&host->lock);
|
||||
|
||||
crq = carm_get_request(host);
|
||||
if (!crq) {
|
||||
carm_push_q(host, q);
|
||||
spin_unlock_irq(&host->lock);
|
||||
return BLK_STS_DEV_RESOURCE;
|
||||
}
|
||||
crq->rq = rq;
|
||||
|
||||
if (rq_data_dir(rq) == WRITE) {
|
||||
writing = 1;
|
||||
pci_dir = DMA_TO_DEVICE;
|
||||
} else {
|
||||
pci_dir = DMA_FROM_DEVICE;
|
||||
}
|
||||
if (req_op(rq) == REQ_OP_DRV_OUT)
|
||||
goto send_msg;
|
||||
|
||||
/* get scatterlist from block layer */
|
||||
sg = &crq->sg[0];
|
||||
n_elem = blk_rq_map_sg(q, rq, sg);
|
||||
if (n_elem <= 0) {
|
||||
/* request with no s/g entries? */
|
||||
carm_end_rq(host, crq, BLK_STS_IOERR);
|
||||
spin_unlock_irq(&host->lock);
|
||||
return BLK_STS_IOERR;
|
||||
}
|
||||
if (n_elem <= 0)
|
||||
goto out_ioerr;
|
||||
|
||||
/* map scatterlist to PCI bus addresses */
|
||||
n_elem = dma_map_sg(&host->pdev->dev, sg, n_elem, pci_dir);
|
||||
if (n_elem <= 0) {
|
||||
/* request with no s/g entries? */
|
||||
carm_end_rq(host, crq, BLK_STS_IOERR);
|
||||
spin_unlock_irq(&host->lock);
|
||||
return BLK_STS_IOERR;
|
||||
}
|
||||
n_elem = dma_map_sg(&host->pdev->dev, sg, n_elem, carm_rq_dir(rq));
|
||||
if (n_elem <= 0)
|
||||
goto out_ioerr;
|
||||
|
||||
/* obey global hardware limit on S/G entries */
|
||||
if (host->hw_sg_used >= CARM_MAX_HOST_SG - n_elem)
|
||||
goto out_resource;
|
||||
|
||||
crq->n_elem = n_elem;
|
||||
crq->port = port;
|
||||
host->hw_sg_used += n_elem;
|
||||
|
||||
/*
|
||||
|
@ -893,9 +745,9 @@ static blk_status_t carm_queue_rq(struct blk_mq_hw_ctx *hctx,
|
|||
*/
|
||||
|
||||
VPRINTK("build msg\n");
|
||||
msg = (struct carm_msg_rw *) carm_ref_msg(host, crq->tag);
|
||||
msg = (struct carm_msg_rw *) carm_ref_msg(host, rq->tag);
|
||||
|
||||
if (writing) {
|
||||
if (rq_data_dir(rq) == WRITE) {
|
||||
msg->type = CARM_MSG_WRITE;
|
||||
crq->msg_type = CARM_MSG_WRITE;
|
||||
} else {
|
||||
|
@ -906,7 +758,7 @@ static blk_status_t carm_queue_rq(struct blk_mq_hw_ctx *hctx,
|
|||
msg->id = port->port_no;
|
||||
msg->sg_count = n_elem;
|
||||
msg->sg_type = SGT_32BIT;
|
||||
msg->handle = cpu_to_le32(TAG_ENCODE(crq->tag));
|
||||
msg->handle = cpu_to_le32(TAG_ENCODE(rq->tag));
|
||||
msg->lba = cpu_to_le32(blk_rq_pos(rq) & 0xffffffff);
|
||||
tmp = (blk_rq_pos(rq) >> 16) >> 16;
|
||||
msg->lba_high = cpu_to_le16( (u16) tmp );
|
||||
|
@ -923,22 +775,28 @@ static blk_status_t carm_queue_rq(struct blk_mq_hw_ctx *hctx,
|
|||
rc = carm_lookup_bucket(msg_size);
|
||||
BUG_ON(rc < 0);
|
||||
crq->msg_bucket = (u32) rc;
|
||||
|
||||
send_msg:
|
||||
/*
|
||||
* queue read/write message to hardware
|
||||
*/
|
||||
|
||||
VPRINTK("send msg, tag == %u\n", crq->tag);
|
||||
rc = carm_send_msg(host, crq);
|
||||
VPRINTK("send msg, tag == %u\n", rq->tag);
|
||||
rc = carm_send_msg(host, crq, rq->tag);
|
||||
if (rc) {
|
||||
carm_put_request(host, crq);
|
||||
carm_push_q(host, q);
|
||||
spin_unlock_irq(&host->lock);
|
||||
return BLK_STS_DEV_RESOURCE;
|
||||
host->hw_sg_used -= n_elem;
|
||||
goto out_resource;
|
||||
}
|
||||
|
||||
spin_unlock_irq(&host->lock);
|
||||
return BLK_STS_OK;
|
||||
out_resource:
|
||||
dma_unmap_sg(&host->pdev->dev, &crq->sg[0], n_elem, carm_rq_dir(rq));
|
||||
carm_push_q(host, q);
|
||||
spin_unlock_irq(&host->lock);
|
||||
return BLK_STS_DEV_RESOURCE;
|
||||
out_ioerr:
|
||||
carm_round_robin(host);
|
||||
spin_unlock_irq(&host->lock);
|
||||
return BLK_STS_IOERR;
|
||||
}
|
||||
|
||||
static void carm_handle_array_info(struct carm_host *host,
|
||||
|
@ -954,8 +812,6 @@ static void carm_handle_array_info(struct carm_host *host,
|
|||
|
||||
DPRINTK("ENTER\n");
|
||||
|
||||
carm_end_rq(host, crq, error);
|
||||
|
||||
if (error)
|
||||
goto out;
|
||||
if (le32_to_cpu(desc->array_status) & ARRAY_NO_EXIST)
|
||||
|
@ -1011,8 +867,6 @@ static void carm_handle_scan_chan(struct carm_host *host,
|
|||
|
||||
DPRINTK("ENTER\n");
|
||||
|
||||
carm_end_rq(host, crq, error);
|
||||
|
||||
if (error) {
|
||||
new_state = HST_ERROR;
|
||||
goto out;
|
||||
|
@ -1040,8 +894,6 @@ static void carm_handle_generic(struct carm_host *host,
|
|||
{
|
||||
DPRINTK("ENTER\n");
|
||||
|
||||
carm_end_rq(host, crq, error);
|
||||
|
||||
assert(host->state == cur_state);
|
||||
if (error)
|
||||
host->state = HST_ERROR;
|
||||
|
@ -1050,28 +902,12 @@ static void carm_handle_generic(struct carm_host *host,
|
|||
schedule_work(&host->fsm_task);
|
||||
}
|
||||
|
||||
static inline void carm_handle_rw(struct carm_host *host,
|
||||
struct carm_request *crq, blk_status_t error)
|
||||
{
|
||||
int pci_dir;
|
||||
|
||||
VPRINTK("ENTER\n");
|
||||
|
||||
if (rq_data_dir(crq->rq) == WRITE)
|
||||
pci_dir = DMA_TO_DEVICE;
|
||||
else
|
||||
pci_dir = DMA_FROM_DEVICE;
|
||||
|
||||
dma_unmap_sg(&host->pdev->dev, &crq->sg[0], crq->n_elem, pci_dir);
|
||||
|
||||
carm_end_rq(host, crq, error);
|
||||
}
|
||||
|
||||
static inline void carm_handle_resp(struct carm_host *host,
|
||||
__le32 ret_handle_le, u32 status)
|
||||
{
|
||||
u32 handle = le32_to_cpu(ret_handle_le);
|
||||
unsigned int msg_idx;
|
||||
struct request *rq;
|
||||
struct carm_request *crq;
|
||||
blk_status_t error = (status == RMSG_OK) ? 0 : BLK_STS_IOERR;
|
||||
u8 *mem;
|
||||
|
@ -1087,13 +923,15 @@ static inline void carm_handle_resp(struct carm_host *host,
|
|||
msg_idx = TAG_DECODE(handle);
|
||||
VPRINTK("tag == %u\n", msg_idx);
|
||||
|
||||
crq = &host->req[msg_idx];
|
||||
rq = blk_mq_tag_to_rq(host->tag_set.tags[0], msg_idx);
|
||||
crq = blk_mq_rq_to_pdu(rq);
|
||||
|
||||
/* fast path */
|
||||
if (likely(crq->msg_type == CARM_MSG_READ ||
|
||||
crq->msg_type == CARM_MSG_WRITE)) {
|
||||
carm_handle_rw(host, crq, error);
|
||||
return;
|
||||
dma_unmap_sg(&host->pdev->dev, &crq->sg[0], crq->n_elem,
|
||||
carm_rq_dir(rq));
|
||||
goto done;
|
||||
}
|
||||
|
||||
mem = carm_ref_msg(host, msg_idx);
|
||||
|
@ -1103,7 +941,7 @@ static inline void carm_handle_resp(struct carm_host *host,
|
|||
switch (crq->msg_subtype) {
|
||||
case CARM_IOC_SCAN_CHAN:
|
||||
carm_handle_scan_chan(host, crq, mem, error);
|
||||
break;
|
||||
goto done;
|
||||
default:
|
||||
/* unknown / invalid response */
|
||||
goto err_out;
|
||||
|
@ -1116,11 +954,11 @@ static inline void carm_handle_resp(struct carm_host *host,
|
|||
case MISC_ALLOC_MEM:
|
||||
carm_handle_generic(host, crq, error,
|
||||
HST_ALLOC_BUF, HST_SYNC_TIME);
|
||||
break;
|
||||
goto done;
|
||||
case MISC_SET_TIME:
|
||||
carm_handle_generic(host, crq, error,
|
||||
HST_SYNC_TIME, HST_GET_FW_VER);
|
||||
break;
|
||||
goto done;
|
||||
case MISC_GET_FW_VER: {
|
||||
struct carm_fw_ver *ver = (struct carm_fw_ver *)
|
||||
(mem + sizeof(struct carm_msg_get_fw_ver));
|
||||
|
@ -1130,7 +968,7 @@ static inline void carm_handle_resp(struct carm_host *host,
|
|||
}
|
||||
carm_handle_generic(host, crq, error,
|
||||
HST_GET_FW_VER, HST_PORT_SCAN);
|
||||
break;
|
||||
goto done;
|
||||
}
|
||||
default:
|
||||
/* unknown / invalid response */
|
||||
|
@ -1161,7 +999,13 @@ static inline void carm_handle_resp(struct carm_host *host,
|
|||
err_out:
|
||||
printk(KERN_WARNING DRV_NAME "(%s): BUG: unhandled message type %d/%d\n",
|
||||
pci_name(host->pdev), crq->msg_type, crq->msg_subtype);
|
||||
carm_end_rq(host, crq, BLK_STS_IOERR);
|
||||
error = BLK_STS_IOERR;
|
||||
done:
|
||||
host->hw_sg_used -= crq->n_elem;
|
||||
blk_mq_end_request(blk_mq_rq_from_pdu(crq), error);
|
||||
|
||||
if (host->hw_sg_used <= CARM_SG_LOW_WATER)
|
||||
carm_round_robin(host);
|
||||
}
|
||||
|
||||
static inline void carm_handle_responses(struct carm_host *host)
|
||||
|
@ -1491,78 +1335,56 @@ static int carm_init_host(struct carm_host *host)
|
|||
return 0;
|
||||
}
|
||||
|
||||
static const struct blk_mq_ops carm_oob_mq_ops = {
|
||||
.queue_rq = carm_oob_queue_rq,
|
||||
};
|
||||
|
||||
static const struct blk_mq_ops carm_mq_ops = {
|
||||
.queue_rq = carm_queue_rq,
|
||||
};
|
||||
|
||||
static int carm_init_disks(struct carm_host *host)
|
||||
static int carm_init_disk(struct carm_host *host, unsigned int port_no)
|
||||
{
|
||||
unsigned int i;
|
||||
int rc = 0;
|
||||
struct carm_port *port = &host->port[port_no];
|
||||
struct gendisk *disk;
|
||||
struct request_queue *q;
|
||||
|
||||
for (i = 0; i < CARM_MAX_PORTS; i++) {
|
||||
struct gendisk *disk;
|
||||
struct request_queue *q;
|
||||
struct carm_port *port;
|
||||
port->host = host;
|
||||
port->port_no = port_no;
|
||||
|
||||
port = &host->port[i];
|
||||
port->host = host;
|
||||
port->port_no = i;
|
||||
disk = alloc_disk(CARM_MINORS_PER_MAJOR);
|
||||
if (!disk)
|
||||
return -ENOMEM;
|
||||
|
||||
disk = alloc_disk(CARM_MINORS_PER_MAJOR);
|
||||
if (!disk) {
|
||||
rc = -ENOMEM;
|
||||
break;
|
||||
}
|
||||
port->disk = disk;
|
||||
sprintf(disk->disk_name, DRV_NAME "/%u",
|
||||
(unsigned int)host->id * CARM_MAX_PORTS + port_no);
|
||||
disk->major = host->major;
|
||||
disk->first_minor = port_no * CARM_MINORS_PER_MAJOR;
|
||||
disk->fops = &carm_bd_ops;
|
||||
disk->private_data = port;
|
||||
|
||||
port->disk = disk;
|
||||
sprintf(disk->disk_name, DRV_NAME "/%u",
|
||||
(unsigned int) (host->id * CARM_MAX_PORTS) + i);
|
||||
disk->major = host->major;
|
||||
disk->first_minor = i * CARM_MINORS_PER_MAJOR;
|
||||
disk->fops = &carm_bd_ops;
|
||||
disk->private_data = port;
|
||||
q = blk_mq_init_queue(&host->tag_set);
|
||||
if (IS_ERR(q))
|
||||
return PTR_ERR(q);
|
||||
|
||||
q = blk_mq_init_sq_queue(&port->tag_set, &carm_mq_ops,
|
||||
max_queue, BLK_MQ_F_SHOULD_MERGE);
|
||||
if (IS_ERR(q)) {
|
||||
rc = PTR_ERR(q);
|
||||
break;
|
||||
}
|
||||
disk->queue = q;
|
||||
blk_queue_max_segments(q, CARM_MAX_REQ_SG);
|
||||
blk_queue_segment_boundary(q, CARM_SG_BOUNDARY);
|
||||
blk_queue_max_segments(q, CARM_MAX_REQ_SG);
|
||||
blk_queue_segment_boundary(q, CARM_SG_BOUNDARY);
|
||||
|
||||
q->queuedata = port;
|
||||
}
|
||||
|
||||
return rc;
|
||||
q->queuedata = port;
|
||||
disk->queue = q;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void carm_free_disks(struct carm_host *host)
|
||||
static void carm_free_disk(struct carm_host *host, unsigned int port_no)
|
||||
{
|
||||
unsigned int i;
|
||||
struct carm_port *port = &host->port[port_no];
|
||||
struct gendisk *disk = port->disk;
|
||||
|
||||
for (i = 0; i < CARM_MAX_PORTS; i++) {
|
||||
struct carm_port *port = &host->port[i];
|
||||
struct gendisk *disk = port->disk;
|
||||
if (!disk)
|
||||
return;
|
||||
|
||||
if (disk) {
|
||||
struct request_queue *q = disk->queue;
|
||||
|
||||
if (disk->flags & GENHD_FL_UP)
|
||||
del_gendisk(disk);
|
||||
if (q) {
|
||||
blk_mq_free_tag_set(&port->tag_set);
|
||||
blk_cleanup_queue(q);
|
||||
}
|
||||
put_disk(disk);
|
||||
}
|
||||
}
|
||||
if (disk->flags & GENHD_FL_UP)
|
||||
del_gendisk(disk);
|
||||
if (disk->queue)
|
||||
blk_cleanup_queue(disk->queue);
|
||||
put_disk(disk);
|
||||
}
|
||||
|
||||
static int carm_init_shm(struct carm_host *host)
|
||||
|
@ -1618,9 +1440,6 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
|
|||
INIT_WORK(&host->fsm_task, carm_fsm_task);
|
||||
init_completion(&host->probe_comp);
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(host->req); i++)
|
||||
host->req[i].tag = i;
|
||||
|
||||
host->mmio = ioremap(pci_resource_start(pdev, 0),
|
||||
pci_resource_len(pdev, 0));
|
||||
if (!host->mmio) {
|
||||
|
@ -1637,14 +1456,26 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
|
|||
goto err_out_iounmap;
|
||||
}
|
||||
|
||||
q = blk_mq_init_sq_queue(&host->tag_set, &carm_oob_mq_ops, 1,
|
||||
BLK_MQ_F_NO_SCHED);
|
||||
memset(&host->tag_set, 0, sizeof(host->tag_set));
|
||||
host->tag_set.ops = &carm_mq_ops;
|
||||
host->tag_set.cmd_size = sizeof(struct carm_request);
|
||||
host->tag_set.nr_hw_queues = 1;
|
||||
host->tag_set.nr_maps = 1;
|
||||
host->tag_set.queue_depth = max_queue;
|
||||
host->tag_set.numa_node = NUMA_NO_NODE;
|
||||
host->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
|
||||
|
||||
rc = blk_mq_alloc_tag_set(&host->tag_set);
|
||||
if (rc)
|
||||
goto err_out_dma_free;
|
||||
|
||||
q = blk_mq_init_queue(&host->tag_set);
|
||||
if (IS_ERR(q)) {
|
||||
printk(KERN_ERR DRV_NAME "(%s): OOB queue alloc failure\n",
|
||||
pci_name(pdev));
|
||||
rc = PTR_ERR(q);
|
||||
blk_mq_free_tag_set(&host->tag_set);
|
||||
goto err_out_dma_free;
|
||||
}
|
||||
|
||||
host->oob_q = q;
|
||||
q->queuedata = host;
|
||||
|
||||
|
@ -1667,9 +1498,11 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
|
|||
if (host->flags & FL_DYN_MAJOR)
|
||||
host->major = rc;
|
||||
|
||||
rc = carm_init_disks(host);
|
||||
if (rc)
|
||||
goto err_out_blkdev_disks;
|
||||
for (i = 0; i < CARM_MAX_PORTS; i++) {
|
||||
rc = carm_init_disk(host, i);
|
||||
if (rc)
|
||||
goto err_out_blkdev_disks;
|
||||
}
|
||||
|
||||
pci_set_master(pdev);
|
||||
|
||||
|
@ -1699,7 +1532,8 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
|
|||
err_out_free_irq:
|
||||
free_irq(pdev->irq, host);
|
||||
err_out_blkdev_disks:
|
||||
carm_free_disks(host);
|
||||
for (i = 0; i < CARM_MAX_PORTS; i++)
|
||||
carm_free_disk(host, i);
|
||||
unregister_blkdev(host->major, host->name);
|
||||
err_out_free_majors:
|
||||
if (host->major == 160)
|
||||
|
@ -1724,6 +1558,7 @@ err_out:
|
|||
static void carm_remove_one (struct pci_dev *pdev)
|
||||
{
|
||||
struct carm_host *host = pci_get_drvdata(pdev);
|
||||
unsigned int i;
|
||||
|
||||
if (!host) {
|
||||
printk(KERN_ERR PFX "BUG: no host data for PCI(%s)\n",
|
||||
|
@ -1732,7 +1567,8 @@ static void carm_remove_one (struct pci_dev *pdev)
|
|||
}
|
||||
|
||||
free_irq(pdev->irq, host);
|
||||
carm_free_disks(host);
|
||||
for (i = 0; i < CARM_MAX_PORTS; i++)
|
||||
carm_free_disk(host, i);
|
||||
unregister_blkdev(host->major, host->name);
|
||||
if (host->major == 160)
|
||||
clear_bit(0, &carm_major_alloc);
|
||||
|
|
|
@ -888,8 +888,7 @@ static int mm_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
|
|||
card->biotail = &card->bio;
|
||||
spin_lock_init(&card->lock);
|
||||
|
||||
card->queue = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE,
|
||||
&card->lock);
|
||||
card->queue = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
|
||||
if (!card->queue)
|
||||
goto failed_alloc;
|
||||
|
||||
|
|
|
@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
|
|||
spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
|
||||
}
|
||||
|
||||
static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
struct virtio_blk *vblk = hctx->queue->queuedata;
|
||||
struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
|
||||
bool kick;
|
||||
|
||||
spin_lock_irq(&vq->lock);
|
||||
kick = virtqueue_kick_prepare(vq->vq);
|
||||
spin_unlock_irq(&vq->lock);
|
||||
|
||||
if (kick)
|
||||
virtqueue_notify(vq->vq);
|
||||
}
|
||||
|
||||
static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
|
||||
const struct blk_mq_queue_data *bd)
|
||||
{
|
||||
|
@ -624,7 +638,7 @@ static int virtblk_map_queues(struct blk_mq_tag_set *set)
|
|||
{
|
||||
struct virtio_blk *vblk = set->driver_data;
|
||||
|
||||
return blk_mq_virtio_map_queues(set, vblk->vdev, 0);
|
||||
return blk_mq_virtio_map_queues(&set->map[0], vblk->vdev, 0);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_VIRTIO_BLK_SCSI
|
||||
|
@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
|
|||
|
||||
static const struct blk_mq_ops virtio_mq_ops = {
|
||||
.queue_rq = virtio_queue_rq,
|
||||
.commit_rqs = virtio_commit_rqs,
|
||||
.complete = virtblk_request_done,
|
||||
.init_request = virtblk_init_request,
|
||||
#ifdef CONFIG_VIRTIO_BLK_SCSI
|
||||
|
|
|
@ -94,7 +94,7 @@ int ide_queue_pc_tail(ide_drive_t *drive, struct gendisk *disk,
|
|||
|
||||
rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, 0);
|
||||
ide_req(rq)->type = ATA_PRIV_MISC;
|
||||
rq->special = (char *)pc;
|
||||
ide_req(rq)->special = pc;
|
||||
|
||||
if (buf && bufflen) {
|
||||
error = blk_rq_map_kern(drive->queue, rq, buf, bufflen,
|
||||
|
@ -172,8 +172,8 @@ EXPORT_SYMBOL_GPL(ide_create_request_sense_cmd);
|
|||
void ide_prep_sense(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
struct request_sense *sense = &drive->sense_data;
|
||||
struct request *sense_rq = drive->sense_rq;
|
||||
struct scsi_request *req = scsi_req(sense_rq);
|
||||
struct request *sense_rq;
|
||||
struct scsi_request *req;
|
||||
unsigned int cmd_len, sense_len;
|
||||
int err;
|
||||
|
||||
|
@ -196,9 +196,16 @@ void ide_prep_sense(ide_drive_t *drive, struct request *rq)
|
|||
if (ata_sense_request(rq) || drive->sense_rq_armed)
|
||||
return;
|
||||
|
||||
sense_rq = drive->sense_rq;
|
||||
if (!sense_rq) {
|
||||
sense_rq = blk_mq_alloc_request(drive->queue, REQ_OP_DRV_IN,
|
||||
BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
|
||||
drive->sense_rq = sense_rq;
|
||||
}
|
||||
req = scsi_req(sense_rq);
|
||||
|
||||
memset(sense, 0, sizeof(*sense));
|
||||
|
||||
blk_rq_init(rq->q, sense_rq);
|
||||
scsi_req_init(req);
|
||||
|
||||
err = blk_rq_map_kern(drive->queue, sense_rq, sense, sense_len,
|
||||
|
@ -207,6 +214,8 @@ void ide_prep_sense(ide_drive_t *drive, struct request *rq)
|
|||
if (printk_ratelimit())
|
||||
printk(KERN_WARNING PFX "%s: failed to map sense "
|
||||
"buffer\n", drive->name);
|
||||
blk_mq_free_request(sense_rq);
|
||||
drive->sense_rq = NULL;
|
||||
return;
|
||||
}
|
||||
|
||||
|
@ -226,6 +235,8 @@ EXPORT_SYMBOL_GPL(ide_prep_sense);
|
|||
|
||||
int ide_queue_sense_rq(ide_drive_t *drive, void *special)
|
||||
{
|
||||
struct request *sense_rq = drive->sense_rq;
|
||||
|
||||
/* deferred failure from ide_prep_sense() */
|
||||
if (!drive->sense_rq_armed) {
|
||||
printk(KERN_WARNING PFX "%s: error queuing a sense request\n",
|
||||
|
@ -233,12 +244,12 @@ int ide_queue_sense_rq(ide_drive_t *drive, void *special)
|
|||
return -ENOMEM;
|
||||
}
|
||||
|
||||
drive->sense_rq->special = special;
|
||||
ide_req(sense_rq)->special = special;
|
||||
drive->sense_rq_armed = false;
|
||||
|
||||
drive->hwif->rq = NULL;
|
||||
|
||||
elv_add_request(drive->queue, drive->sense_rq, ELEVATOR_INSERT_FRONT);
|
||||
ide_insert_request_head(drive, sense_rq);
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(ide_queue_sense_rq);
|
||||
|
@ -270,10 +281,8 @@ void ide_retry_pc(ide_drive_t *drive)
|
|||
*/
|
||||
drive->hwif->rq = NULL;
|
||||
ide_requeue_and_plug(drive, failed_rq);
|
||||
if (ide_queue_sense_rq(drive, pc)) {
|
||||
blk_start_request(failed_rq);
|
||||
if (ide_queue_sense_rq(drive, pc))
|
||||
ide_complete_rq(drive, BLK_STS_IOERR, blk_rq_bytes(failed_rq));
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(ide_retry_pc);
|
||||
|
||||
|
|
|
@ -211,12 +211,12 @@ static void cdrom_analyze_sense_data(ide_drive_t *drive,
|
|||
static void ide_cd_complete_failed_rq(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
/*
|
||||
* For ATA_PRIV_SENSE, "rq->special" points to the original
|
||||
* For ATA_PRIV_SENSE, "ide_req(rq)->special" points to the original
|
||||
* failed request. Also, the sense data should be read
|
||||
* directly from rq which might be different from the original
|
||||
* sense buffer if it got copied during mapping.
|
||||
*/
|
||||
struct request *failed = (struct request *)rq->special;
|
||||
struct request *failed = ide_req(rq)->special;
|
||||
void *sense = bio_data(rq->bio);
|
||||
|
||||
if (failed) {
|
||||
|
@ -258,11 +258,22 @@ static int ide_cd_breathe(ide_drive_t *drive, struct request *rq)
|
|||
/*
|
||||
* take a breather
|
||||
*/
|
||||
blk_delay_queue(drive->queue, 1);
|
||||
blk_mq_requeue_request(rq, false);
|
||||
blk_mq_delay_kick_requeue_list(drive->queue, 1);
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
static void ide_cd_free_sense(ide_drive_t *drive)
|
||||
{
|
||||
if (!drive->sense_rq)
|
||||
return;
|
||||
|
||||
blk_mq_free_request(drive->sense_rq);
|
||||
drive->sense_rq = NULL;
|
||||
drive->sense_rq_armed = false;
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns:
|
||||
* 0: if the request should be continued.
|
||||
|
@ -516,6 +527,82 @@ static bool ide_cd_error_cmd(ide_drive_t *drive, struct ide_cmd *cmd)
|
|||
return false;
|
||||
}
|
||||
|
||||
/* standard prep_rq that builds 10 byte cmds */
|
||||
static bool ide_cdrom_prep_fs(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
int hard_sect = queue_logical_block_size(q);
|
||||
long block = (long)blk_rq_pos(rq) / (hard_sect >> 9);
|
||||
unsigned long blocks = blk_rq_sectors(rq) / (hard_sect >> 9);
|
||||
struct scsi_request *req = scsi_req(rq);
|
||||
|
||||
if (rq_data_dir(rq) == READ)
|
||||
req->cmd[0] = GPCMD_READ_10;
|
||||
else
|
||||
req->cmd[0] = GPCMD_WRITE_10;
|
||||
|
||||
/*
|
||||
* fill in lba
|
||||
*/
|
||||
req->cmd[2] = (block >> 24) & 0xff;
|
||||
req->cmd[3] = (block >> 16) & 0xff;
|
||||
req->cmd[4] = (block >> 8) & 0xff;
|
||||
req->cmd[5] = block & 0xff;
|
||||
|
||||
/*
|
||||
* and transfer length
|
||||
*/
|
||||
req->cmd[7] = (blocks >> 8) & 0xff;
|
||||
req->cmd[8] = blocks & 0xff;
|
||||
req->cmd_len = 10;
|
||||
return true;
|
||||
}
|
||||
|
||||
/*
|
||||
* Most of the SCSI commands are supported directly by ATAPI devices.
|
||||
* This transform handles the few exceptions.
|
||||
*/
|
||||
static bool ide_cdrom_prep_pc(struct request *rq)
|
||||
{
|
||||
u8 *c = scsi_req(rq)->cmd;
|
||||
|
||||
/* transform 6-byte read/write commands to the 10-byte version */
|
||||
if (c[0] == READ_6 || c[0] == WRITE_6) {
|
||||
c[8] = c[4];
|
||||
c[5] = c[3];
|
||||
c[4] = c[2];
|
||||
c[3] = c[1] & 0x1f;
|
||||
c[2] = 0;
|
||||
c[1] &= 0xe0;
|
||||
c[0] += (READ_10 - READ_6);
|
||||
scsi_req(rq)->cmd_len = 10;
|
||||
return true;
|
||||
}
|
||||
|
||||
/*
|
||||
* it's silly to pretend we understand 6-byte sense commands, just
|
||||
* reject with ILLEGAL_REQUEST and the caller should take the
|
||||
* appropriate action
|
||||
*/
|
||||
if (c[0] == MODE_SENSE || c[0] == MODE_SELECT) {
|
||||
scsi_req(rq)->result = ILLEGAL_REQUEST;
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool ide_cdrom_prep_rq(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
if (!blk_rq_is_passthrough(rq)) {
|
||||
scsi_req_init(scsi_req(rq));
|
||||
|
||||
return ide_cdrom_prep_fs(drive->queue, rq);
|
||||
} else if (blk_rq_is_scsi(rq))
|
||||
return ide_cdrom_prep_pc(rq);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
|
||||
{
|
||||
ide_hwif_t *hwif = drive->hwif;
|
||||
|
@ -675,7 +762,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
|
|||
out_end:
|
||||
if (blk_rq_is_scsi(rq) && rc == 0) {
|
||||
scsi_req(rq)->resid_len = 0;
|
||||
blk_end_request_all(rq, BLK_STS_OK);
|
||||
blk_mq_end_request(rq, BLK_STS_OK);
|
||||
hwif->rq = NULL;
|
||||
} else {
|
||||
if (sense && uptodate)
|
||||
|
@ -705,6 +792,8 @@ out_end:
|
|||
if (sense && rc == 2)
|
||||
ide_error(drive, "request sense failure", stat);
|
||||
}
|
||||
|
||||
ide_cd_free_sense(drive);
|
||||
return ide_stopped;
|
||||
}
|
||||
|
||||
|
@ -729,7 +818,7 @@ static ide_startstop_t cdrom_start_rw(ide_drive_t *drive, struct request *rq)
|
|||
* We may be retrying this request after an error. Fix up any
|
||||
* weirdness which might be present in the request packet.
|
||||
*/
|
||||
q->prep_rq_fn(q, rq);
|
||||
ide_cdrom_prep_rq(drive, rq);
|
||||
}
|
||||
|
||||
/* fs requests *must* be hardware frame aligned */
|
||||
|
@ -1323,82 +1412,6 @@ static int ide_cdrom_probe_capabilities(ide_drive_t *drive)
|
|||
return nslots;
|
||||
}
|
||||
|
||||
/* standard prep_rq_fn that builds 10 byte cmds */
|
||||
static int ide_cdrom_prep_fs(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
int hard_sect = queue_logical_block_size(q);
|
||||
long block = (long)blk_rq_pos(rq) / (hard_sect >> 9);
|
||||
unsigned long blocks = blk_rq_sectors(rq) / (hard_sect >> 9);
|
||||
struct scsi_request *req = scsi_req(rq);
|
||||
|
||||
q->initialize_rq_fn(rq);
|
||||
|
||||
if (rq_data_dir(rq) == READ)
|
||||
req->cmd[0] = GPCMD_READ_10;
|
||||
else
|
||||
req->cmd[0] = GPCMD_WRITE_10;
|
||||
|
||||
/*
|
||||
* fill in lba
|
||||
*/
|
||||
req->cmd[2] = (block >> 24) & 0xff;
|
||||
req->cmd[3] = (block >> 16) & 0xff;
|
||||
req->cmd[4] = (block >> 8) & 0xff;
|
||||
req->cmd[5] = block & 0xff;
|
||||
|
||||
/*
|
||||
* and transfer length
|
||||
*/
|
||||
req->cmd[7] = (blocks >> 8) & 0xff;
|
||||
req->cmd[8] = blocks & 0xff;
|
||||
req->cmd_len = 10;
|
||||
return BLKPREP_OK;
|
||||
}
|
||||
|
||||
/*
|
||||
* Most of the SCSI commands are supported directly by ATAPI devices.
|
||||
* This transform handles the few exceptions.
|
||||
*/
|
||||
static int ide_cdrom_prep_pc(struct request *rq)
|
||||
{
|
||||
u8 *c = scsi_req(rq)->cmd;
|
||||
|
||||
/* transform 6-byte read/write commands to the 10-byte version */
|
||||
if (c[0] == READ_6 || c[0] == WRITE_6) {
|
||||
c[8] = c[4];
|
||||
c[5] = c[3];
|
||||
c[4] = c[2];
|
||||
c[3] = c[1] & 0x1f;
|
||||
c[2] = 0;
|
||||
c[1] &= 0xe0;
|
||||
c[0] += (READ_10 - READ_6);
|
||||
scsi_req(rq)->cmd_len = 10;
|
||||
return BLKPREP_OK;
|
||||
}
|
||||
|
||||
/*
|
||||
* it's silly to pretend we understand 6-byte sense commands, just
|
||||
* reject with ILLEGAL_REQUEST and the caller should take the
|
||||
* appropriate action
|
||||
*/
|
||||
if (c[0] == MODE_SENSE || c[0] == MODE_SELECT) {
|
||||
scsi_req(rq)->result = ILLEGAL_REQUEST;
|
||||
return BLKPREP_KILL;
|
||||
}
|
||||
|
||||
return BLKPREP_OK;
|
||||
}
|
||||
|
||||
static int ide_cdrom_prep_fn(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
if (!blk_rq_is_passthrough(rq))
|
||||
return ide_cdrom_prep_fs(q, rq);
|
||||
else if (blk_rq_is_scsi(rq))
|
||||
return ide_cdrom_prep_pc(rq);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
struct cd_list_entry {
|
||||
const char *id_model;
|
||||
const char *id_firmware;
|
||||
|
@ -1508,7 +1521,7 @@ static int ide_cdrom_setup(ide_drive_t *drive)
|
|||
|
||||
ide_debug_log(IDE_DBG_PROBE, "enter");
|
||||
|
||||
blk_queue_prep_rq(q, ide_cdrom_prep_fn);
|
||||
drive->prep_rq = ide_cdrom_prep_rq;
|
||||
blk_queue_dma_alignment(q, 31);
|
||||
blk_queue_update_dma_pad(q, 15);
|
||||
|
||||
|
@ -1569,7 +1582,7 @@ static void ide_cd_release(struct device *dev)
|
|||
if (devinfo->handle == drive)
|
||||
unregister_cdrom(devinfo);
|
||||
drive->driver_data = NULL;
|
||||
blk_queue_prep_rq(drive->queue, NULL);
|
||||
drive->prep_rq = NULL;
|
||||
g->private_data = NULL;
|
||||
put_disk(g);
|
||||
kfree(info);
|
||||
|
|
|
@ -171,7 +171,7 @@ int ide_devset_execute(ide_drive_t *drive, const struct ide_devset *setting,
|
|||
scsi_req(rq)->cmd_len = 5;
|
||||
scsi_req(rq)->cmd[0] = REQ_DEVSET_EXEC;
|
||||
*(int *)&scsi_req(rq)->cmd[1] = arg;
|
||||
rq->special = setting->set;
|
||||
ide_req(rq)->special = setting->set;
|
||||
|
||||
blk_execute_rq(q, NULL, rq, 0);
|
||||
ret = scsi_req(rq)->result;
|
||||
|
@ -182,7 +182,7 @@ int ide_devset_execute(ide_drive_t *drive, const struct ide_devset *setting,
|
|||
|
||||
ide_startstop_t ide_do_devset(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
int err, (*setfunc)(ide_drive_t *, int) = rq->special;
|
||||
int err, (*setfunc)(ide_drive_t *, int) = ide_req(rq)->special;
|
||||
|
||||
err = setfunc(drive, *(int *)&scsi_req(rq)->cmd[1]);
|
||||
if (err)
|
||||
|
|
|
@ -427,16 +427,15 @@ static void ide_disk_unlock_native_capacity(ide_drive_t *drive)
|
|||
drive->dev_flags |= IDE_DFLAG_NOHPA; /* disable HPA on resume */
|
||||
}
|
||||
|
||||
static int idedisk_prep_fn(struct request_queue *q, struct request *rq)
|
||||
static bool idedisk_prep_rq(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
ide_drive_t *drive = q->queuedata;
|
||||
struct ide_cmd *cmd;
|
||||
|
||||
if (req_op(rq) != REQ_OP_FLUSH)
|
||||
return BLKPREP_OK;
|
||||
return true;
|
||||
|
||||
if (rq->special) {
|
||||
cmd = rq->special;
|
||||
if (ide_req(rq)->special) {
|
||||
cmd = ide_req(rq)->special;
|
||||
memset(cmd, 0, sizeof(*cmd));
|
||||
} else {
|
||||
cmd = kzalloc(sizeof(*cmd), GFP_ATOMIC);
|
||||
|
@ -456,10 +455,10 @@ static int idedisk_prep_fn(struct request_queue *q, struct request *rq)
|
|||
rq->cmd_flags &= ~REQ_OP_MASK;
|
||||
rq->cmd_flags |= REQ_OP_DRV_OUT;
|
||||
ide_req(rq)->type = ATA_PRIV_TASKFILE;
|
||||
rq->special = cmd;
|
||||
ide_req(rq)->special = cmd;
|
||||
cmd->rq = rq;
|
||||
|
||||
return BLKPREP_OK;
|
||||
return true;
|
||||
}
|
||||
|
||||
ide_devset_get(multcount, mult_count);
|
||||
|
@ -548,7 +547,7 @@ static void update_flush(ide_drive_t *drive)
|
|||
|
||||
if (barrier) {
|
||||
wc = true;
|
||||
blk_queue_prep_rq(drive->queue, idedisk_prep_fn);
|
||||
drive->prep_rq = idedisk_prep_rq;
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
@ -125,7 +125,7 @@ ide_startstop_t ide_error(ide_drive_t *drive, const char *msg, u8 stat)
|
|||
/* retry only "normal" I/O: */
|
||||
if (blk_rq_is_passthrough(rq)) {
|
||||
if (ata_taskfile_request(rq)) {
|
||||
struct ide_cmd *cmd = rq->special;
|
||||
struct ide_cmd *cmd = ide_req(rq)->special;
|
||||
|
||||
if (cmd)
|
||||
ide_complete_cmd(drive, cmd, stat, err);
|
||||
|
|
|
@ -276,7 +276,7 @@ static ide_startstop_t ide_floppy_do_request(ide_drive_t *drive,
|
|||
switch (ide_req(rq)->type) {
|
||||
case ATA_PRIV_MISC:
|
||||
case ATA_PRIV_SENSE:
|
||||
pc = (struct ide_atapi_pc *)rq->special;
|
||||
pc = (struct ide_atapi_pc *)ide_req(rq)->special;
|
||||
break;
|
||||
default:
|
||||
BUG();
|
||||
|
|
|
@ -67,7 +67,15 @@ int ide_end_rq(ide_drive_t *drive, struct request *rq, blk_status_t error,
|
|||
ide_dma_on(drive);
|
||||
}
|
||||
|
||||
return blk_end_request(rq, error, nr_bytes);
|
||||
if (!blk_update_request(rq, error, nr_bytes)) {
|
||||
if (rq == drive->sense_rq)
|
||||
drive->sense_rq = NULL;
|
||||
|
||||
__blk_mq_end_request(rq, error);
|
||||
return 0;
|
||||
}
|
||||
|
||||
return 1;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(ide_end_rq);
|
||||
|
||||
|
@ -103,7 +111,7 @@ void ide_complete_cmd(ide_drive_t *drive, struct ide_cmd *cmd, u8 stat, u8 err)
|
|||
}
|
||||
|
||||
if (rq && ata_taskfile_request(rq)) {
|
||||
struct ide_cmd *orig_cmd = rq->special;
|
||||
struct ide_cmd *orig_cmd = ide_req(rq)->special;
|
||||
|
||||
if (cmd->tf_flags & IDE_TFLAG_DYN)
|
||||
kfree(orig_cmd);
|
||||
|
@ -253,7 +261,7 @@ EXPORT_SYMBOL_GPL(ide_init_sg_cmd);
|
|||
static ide_startstop_t execute_drive_cmd (ide_drive_t *drive,
|
||||
struct request *rq)
|
||||
{
|
||||
struct ide_cmd *cmd = rq->special;
|
||||
struct ide_cmd *cmd = ide_req(rq)->special;
|
||||
|
||||
if (cmd) {
|
||||
if (cmd->protocol == ATA_PROT_PIO) {
|
||||
|
@ -307,8 +315,6 @@ static ide_startstop_t start_request (ide_drive_t *drive, struct request *rq)
|
|||
{
|
||||
ide_startstop_t startstop;
|
||||
|
||||
BUG_ON(!(rq->rq_flags & RQF_STARTED));
|
||||
|
||||
#ifdef DEBUG
|
||||
printk("%s: start_request: current=0x%08lx\n",
|
||||
drive->hwif->name, (unsigned long) rq);
|
||||
|
@ -320,6 +326,9 @@ static ide_startstop_t start_request (ide_drive_t *drive, struct request *rq)
|
|||
goto kill_rq;
|
||||
}
|
||||
|
||||
if (drive->prep_rq && !drive->prep_rq(drive, rq))
|
||||
return ide_stopped;
|
||||
|
||||
if (ata_pm_request(rq))
|
||||
ide_check_pm_state(drive, rq);
|
||||
|
||||
|
@ -343,7 +352,7 @@ static ide_startstop_t start_request (ide_drive_t *drive, struct request *rq)
|
|||
if (ata_taskfile_request(rq))
|
||||
return execute_drive_cmd(drive, rq);
|
||||
else if (ata_pm_request(rq)) {
|
||||
struct ide_pm_state *pm = rq->special;
|
||||
struct ide_pm_state *pm = ide_req(rq)->special;
|
||||
#ifdef DEBUG_PM
|
||||
printk("%s: start_power_step(step: %d)\n",
|
||||
drive->name, pm->pm_step);
|
||||
|
@ -430,44 +439,42 @@ static inline void ide_unlock_host(struct ide_host *host)
|
|||
}
|
||||
}
|
||||
|
||||
static void __ide_requeue_and_plug(struct request_queue *q, struct request *rq)
|
||||
{
|
||||
if (rq)
|
||||
blk_requeue_request(q, rq);
|
||||
if (rq || blk_peek_request(q)) {
|
||||
/* Use 3ms as that was the old plug delay */
|
||||
blk_delay_queue(q, 3);
|
||||
}
|
||||
}
|
||||
|
||||
void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
struct request_queue *q = drive->queue;
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(q->queue_lock, flags);
|
||||
__ide_requeue_and_plug(q, rq);
|
||||
spin_unlock_irqrestore(q->queue_lock, flags);
|
||||
/* Use 3ms as that was the old plug delay */
|
||||
if (rq) {
|
||||
blk_mq_requeue_request(rq, false);
|
||||
blk_mq_delay_kick_requeue_list(q, 3);
|
||||
} else
|
||||
blk_mq_delay_run_hw_queue(q->queue_hw_ctx[0], 3);
|
||||
}
|
||||
|
||||
/*
|
||||
* Issue a new request to a device.
|
||||
*/
|
||||
void do_ide_request(struct request_queue *q)
|
||||
blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *hctx,
|
||||
const struct blk_mq_queue_data *bd)
|
||||
{
|
||||
ide_drive_t *drive = q->queuedata;
|
||||
ide_drive_t *drive = hctx->queue->queuedata;
|
||||
ide_hwif_t *hwif = drive->hwif;
|
||||
struct ide_host *host = hwif->host;
|
||||
struct request *rq = NULL;
|
||||
struct request *rq = bd->rq;
|
||||
ide_startstop_t startstop;
|
||||
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
if (!blk_rq_is_passthrough(rq) && !(rq->rq_flags & RQF_DONTPREP)) {
|
||||
rq->rq_flags |= RQF_DONTPREP;
|
||||
ide_req(rq)->special = NULL;
|
||||
}
|
||||
|
||||
/* HLD do_request() callback might sleep, make sure it's okay */
|
||||
might_sleep();
|
||||
|
||||
if (ide_lock_host(host, hwif))
|
||||
goto plug_device_2;
|
||||
return BLK_STS_DEV_RESOURCE;
|
||||
|
||||
blk_mq_start_request(rq);
|
||||
|
||||
spin_lock_irq(&hwif->lock);
|
||||
|
||||
|
@ -503,21 +510,16 @@ repeat:
|
|||
hwif->cur_dev = drive;
|
||||
drive->dev_flags &= ~(IDE_DFLAG_SLEEPING | IDE_DFLAG_PARKED);
|
||||
|
||||
spin_unlock_irq(&hwif->lock);
|
||||
spin_lock_irq(q->queue_lock);
|
||||
/*
|
||||
* we know that the queue isn't empty, but this can happen
|
||||
* if the q->prep_rq_fn() decides to kill a request
|
||||
* if ->prep_rq() decides to kill a request
|
||||
*/
|
||||
if (!rq)
|
||||
rq = blk_fetch_request(drive->queue);
|
||||
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
spin_lock_irq(&hwif->lock);
|
||||
|
||||
if (!rq) {
|
||||
ide_unlock_port(hwif);
|
||||
goto out;
|
||||
rq = bd->rq;
|
||||
if (!rq) {
|
||||
ide_unlock_port(hwif);
|
||||
goto out;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -551,23 +553,24 @@ repeat:
|
|||
if (startstop == ide_stopped) {
|
||||
rq = hwif->rq;
|
||||
hwif->rq = NULL;
|
||||
goto repeat;
|
||||
if (rq)
|
||||
goto repeat;
|
||||
ide_unlock_port(hwif);
|
||||
goto out;
|
||||
}
|
||||
} else
|
||||
goto plug_device;
|
||||
} else {
|
||||
plug_device:
|
||||
spin_unlock_irq(&hwif->lock);
|
||||
ide_unlock_host(host);
|
||||
ide_requeue_and_plug(drive, rq);
|
||||
return BLK_STS_OK;
|
||||
}
|
||||
|
||||
out:
|
||||
spin_unlock_irq(&hwif->lock);
|
||||
if (rq == NULL)
|
||||
ide_unlock_host(host);
|
||||
spin_lock_irq(q->queue_lock);
|
||||
return;
|
||||
|
||||
plug_device:
|
||||
spin_unlock_irq(&hwif->lock);
|
||||
ide_unlock_host(host);
|
||||
plug_device_2:
|
||||
spin_lock_irq(q->queue_lock);
|
||||
__ide_requeue_and_plug(q, rq);
|
||||
return BLK_STS_OK;
|
||||
}
|
||||
|
||||
static int drive_is_ready(ide_drive_t *drive)
|
||||
|
@ -887,3 +890,16 @@ void ide_pad_transfer(ide_drive_t *drive, int write, int len)
|
|||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(ide_pad_transfer);
|
||||
|
||||
void ide_insert_request_head(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
ide_hwif_t *hwif = drive->hwif;
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&hwif->lock, flags);
|
||||
list_add_tail(&rq->queuelist, &drive->rq_list);
|
||||
spin_unlock_irqrestore(&hwif->lock, flags);
|
||||
|
||||
kblockd_schedule_work(&drive->rq_work);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(ide_insert_request_head);
|
||||
|
|
|
@ -27,7 +27,7 @@ static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
|
|||
spin_unlock_irq(&hwif->lock);
|
||||
|
||||
if (start_queue)
|
||||
blk_run_queue(q);
|
||||
blk_mq_run_hw_queues(q, true);
|
||||
return;
|
||||
}
|
||||
spin_unlock_irq(&hwif->lock);
|
||||
|
@ -36,7 +36,7 @@ static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
|
|||
scsi_req(rq)->cmd[0] = REQ_PARK_HEADS;
|
||||
scsi_req(rq)->cmd_len = 1;
|
||||
ide_req(rq)->type = ATA_PRIV_MISC;
|
||||
rq->special = &timeout;
|
||||
ide_req(rq)->special = &timeout;
|
||||
blk_execute_rq(q, NULL, rq, 1);
|
||||
rc = scsi_req(rq)->result ? -EIO : 0;
|
||||
blk_put_request(rq);
|
||||
|
@ -54,7 +54,7 @@ static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
|
|||
scsi_req(rq)->cmd[0] = REQ_UNPARK_HEADS;
|
||||
scsi_req(rq)->cmd_len = 1;
|
||||
ide_req(rq)->type = ATA_PRIV_MISC;
|
||||
elv_add_request(q, rq, ELEVATOR_INSERT_FRONT);
|
||||
ide_insert_request_head(drive, rq);
|
||||
|
||||
out:
|
||||
return;
|
||||
|
@ -67,7 +67,7 @@ ide_startstop_t ide_do_park_unpark(ide_drive_t *drive, struct request *rq)
|
|||
|
||||
memset(&cmd, 0, sizeof(cmd));
|
||||
if (scsi_req(rq)->cmd[0] == REQ_PARK_HEADS) {
|
||||
drive->sleep = *(unsigned long *)rq->special;
|
||||
drive->sleep = *(unsigned long *)ide_req(rq)->special;
|
||||
drive->dev_flags |= IDE_DFLAG_SLEEPING;
|
||||
tf->command = ATA_CMD_IDLEIMMEDIATE;
|
||||
tf->feature = 0x44;
|
||||
|
|
|
@ -21,7 +21,7 @@ int generic_ide_suspend(struct device *dev, pm_message_t mesg)
|
|||
memset(&rqpm, 0, sizeof(rqpm));
|
||||
rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, 0);
|
||||
ide_req(rq)->type = ATA_PRIV_PM_SUSPEND;
|
||||
rq->special = &rqpm;
|
||||
ide_req(rq)->special = &rqpm;
|
||||
rqpm.pm_step = IDE_PM_START_SUSPEND;
|
||||
if (mesg.event == PM_EVENT_PRETHAW)
|
||||
mesg.event = PM_EVENT_FREEZE;
|
||||
|
@ -40,32 +40,17 @@ int generic_ide_suspend(struct device *dev, pm_message_t mesg)
|
|||
return ret;
|
||||
}
|
||||
|
||||
static void ide_end_sync_rq(struct request *rq, blk_status_t error)
|
||||
{
|
||||
complete(rq->end_io_data);
|
||||
}
|
||||
|
||||
static int ide_pm_execute_rq(struct request *rq)
|
||||
{
|
||||
struct request_queue *q = rq->q;
|
||||
DECLARE_COMPLETION_ONSTACK(wait);
|
||||
|
||||
rq->end_io_data = &wait;
|
||||
rq->end_io = ide_end_sync_rq;
|
||||
|
||||
spin_lock_irq(q->queue_lock);
|
||||
if (unlikely(blk_queue_dying(q))) {
|
||||
rq->rq_flags |= RQF_QUIET;
|
||||
scsi_req(rq)->result = -ENXIO;
|
||||
__blk_end_request_all(rq, BLK_STS_OK);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
blk_mq_end_request(rq, BLK_STS_OK);
|
||||
return -ENXIO;
|
||||
}
|
||||
__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT);
|
||||
__blk_run_queue_uncond(q);
|
||||
spin_unlock_irq(q->queue_lock);
|
||||
|
||||
wait_for_completion_io(&wait);
|
||||
blk_execute_rq(q, NULL, rq, true);
|
||||
|
||||
return scsi_req(rq)->result ? -EIO : 0;
|
||||
}
|
||||
|
@ -79,6 +64,8 @@ int generic_ide_resume(struct device *dev)
|
|||
struct ide_pm_state rqpm;
|
||||
int err;
|
||||
|
||||
blk_mq_start_stopped_hw_queues(drive->queue, true);
|
||||
|
||||
if (ide_port_acpi(hwif)) {
|
||||
/* call ACPI _PS0 / _STM only once */
|
||||
if ((drive->dn & 1) == 0 || pair == NULL) {
|
||||
|
@ -92,7 +79,7 @@ int generic_ide_resume(struct device *dev)
|
|||
memset(&rqpm, 0, sizeof(rqpm));
|
||||
rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, BLK_MQ_REQ_PREEMPT);
|
||||
ide_req(rq)->type = ATA_PRIV_PM_RESUME;
|
||||
rq->special = &rqpm;
|
||||
ide_req(rq)->special = &rqpm;
|
||||
rqpm.pm_step = IDE_PM_START_RESUME;
|
||||
rqpm.pm_state = PM_EVENT_ON;
|
||||
|
||||
|
@ -111,7 +98,7 @@ int generic_ide_resume(struct device *dev)
|
|||
|
||||
void ide_complete_power_step(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
struct ide_pm_state *pm = rq->special;
|
||||
struct ide_pm_state *pm = ide_req(rq)->special;
|
||||
|
||||
#ifdef DEBUG_PM
|
||||
printk(KERN_INFO "%s: complete_power_step(step: %d)\n",
|
||||
|
@ -141,7 +128,7 @@ void ide_complete_power_step(ide_drive_t *drive, struct request *rq)
|
|||
|
||||
ide_startstop_t ide_start_power_step(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
struct ide_pm_state *pm = rq->special;
|
||||
struct ide_pm_state *pm = ide_req(rq)->special;
|
||||
struct ide_cmd cmd = { };
|
||||
|
||||
switch (pm->pm_step) {
|
||||
|
@ -213,8 +200,7 @@ out_do_tf:
|
|||
void ide_complete_pm_rq(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
struct request_queue *q = drive->queue;
|
||||
struct ide_pm_state *pm = rq->special;
|
||||
unsigned long flags;
|
||||
struct ide_pm_state *pm = ide_req(rq)->special;
|
||||
|
||||
ide_complete_power_step(drive, rq);
|
||||
if (pm->pm_step != IDE_PM_COMPLETED)
|
||||
|
@ -224,22 +210,19 @@ void ide_complete_pm_rq(ide_drive_t *drive, struct request *rq)
|
|||
printk("%s: completing PM request, %s\n", drive->name,
|
||||
(ide_req(rq)->type == ATA_PRIV_PM_SUSPEND) ? "suspend" : "resume");
|
||||
#endif
|
||||
spin_lock_irqsave(q->queue_lock, flags);
|
||||
if (ide_req(rq)->type == ATA_PRIV_PM_SUSPEND)
|
||||
blk_stop_queue(q);
|
||||
blk_mq_stop_hw_queues(q);
|
||||
else
|
||||
drive->dev_flags &= ~IDE_DFLAG_BLOCKED;
|
||||
spin_unlock_irqrestore(q->queue_lock, flags);
|
||||
|
||||
drive->hwif->rq = NULL;
|
||||
|
||||
if (blk_end_request(rq, BLK_STS_OK, 0))
|
||||
BUG();
|
||||
blk_mq_end_request(rq, BLK_STS_OK);
|
||||
}
|
||||
|
||||
void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
|
||||
{
|
||||
struct ide_pm_state *pm = rq->special;
|
||||
struct ide_pm_state *pm = ide_req(rq)->special;
|
||||
|
||||
if (blk_rq_is_private(rq) &&
|
||||
ide_req(rq)->type == ATA_PRIV_PM_SUSPEND &&
|
||||
|
@ -260,7 +243,6 @@ void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
|
|||
ide_hwif_t *hwif = drive->hwif;
|
||||
const struct ide_tp_ops *tp_ops = hwif->tp_ops;
|
||||
struct request_queue *q = drive->queue;
|
||||
unsigned long flags;
|
||||
int rc;
|
||||
#ifdef DEBUG_PM
|
||||
printk("%s: Wakeup request inited, waiting for !BSY...\n", drive->name);
|
||||
|
@ -274,8 +256,6 @@ void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
|
|||
if (rc)
|
||||
printk(KERN_WARNING "%s: drive not ready on wakeup\n", drive->name);
|
||||
|
||||
spin_lock_irqsave(q->queue_lock, flags);
|
||||
blk_start_queue(q);
|
||||
spin_unlock_irqrestore(q->queue_lock, flags);
|
||||
blk_mq_start_hw_queues(q);
|
||||
}
|
||||
}
|
||||
|
|
|
@@ -746,10 +746,16 @@ static void ide_initialize_rq(struct request *rq)
{
        struct ide_request *req = blk_mq_rq_to_pdu(rq);

        req->special = NULL;
        scsi_req_init(&req->sreq);
        req->sreq.sense = req->sense;
}

static const struct blk_mq_ops ide_mq_ops = {
        .queue_rq = ide_queue_rq,
        .initialize_rq_fn = ide_initialize_rq,
};

/*
 * init request queue
 */

@@ -759,6 +765,7 @@ static int ide_init_queue(ide_drive_t *drive)
        ide_hwif_t *hwif = drive->hwif;
        int max_sectors = 256;
        int max_sg_entries = PRD_ENTRIES;
        struct blk_mq_tag_set *set;

        /*
         * Our default set up assumes the normal IDE case,

@@ -767,19 +774,26 @@ static int ide_init_queue(ide_drive_t *drive)
         * limits and LBA48 we could raise it but as yet
         * do not.
         */
        q = blk_alloc_queue_node(GFP_KERNEL, hwif_to_node(hwif), NULL);
        if (!q)

        set = &drive->tag_set;
        set->ops = &ide_mq_ops;
        set->nr_hw_queues = 1;
        set->queue_depth = 32;
        set->reserved_tags = 1;
        set->cmd_size = sizeof(struct ide_request);
        set->numa_node = hwif_to_node(hwif);
        set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
        if (blk_mq_alloc_tag_set(set))
                return 1;

        q->request_fn = do_ide_request;
        q->initialize_rq_fn = ide_initialize_rq;
        q->cmd_size = sizeof(struct ide_request);
        blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
        if (blk_init_allocated_queue(q) < 0) {
                blk_cleanup_queue(q);
        q = blk_mq_init_queue(set);
        if (IS_ERR(q)) {
                blk_mq_free_tag_set(set);
                return 1;
        }

        blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, q);

        q->queuedata = drive;
        blk_queue_segment_boundary(q, 0xffff);

@@ -965,8 +979,12 @@ static void drive_release_dev (struct device *dev)

        ide_proc_unregister_device(drive);

        if (drive->sense_rq)
                blk_mq_free_request(drive->sense_rq);

        blk_cleanup_queue(drive->queue);
        drive->queue = NULL;
        blk_mq_free_tag_set(&drive->tag_set);

        drive->dev_flags &= ~IDE_DFLAG_PRESENT;

@@ -1133,6 +1151,28 @@ static void ide_port_cable_detect(ide_hwif_t *hwif)
        }
}

/*
 * Deferred request list insertion handler
 */
static void drive_rq_insert_work(struct work_struct *work)
{
        ide_drive_t *drive = container_of(work, ide_drive_t, rq_work);
        ide_hwif_t *hwif = drive->hwif;
        struct request *rq;
        LIST_HEAD(list);

        spin_lock_irq(&hwif->lock);
        if (!list_empty(&drive->rq_list))
                list_splice_init(&drive->rq_list, &list);
        spin_unlock_irq(&hwif->lock);

        while (!list_empty(&list)) {
                rq = list_first_entry(&list, struct request, queuelist);
                list_del_init(&rq->queuelist);
                blk_execute_rq_nowait(drive->queue, rq->rq_disk, rq, true, NULL);
        }
}

static const u8 ide_hwif_to_major[] =
        { IDE0_MAJOR, IDE1_MAJOR, IDE2_MAJOR, IDE3_MAJOR, IDE4_MAJOR,
          IDE5_MAJOR, IDE6_MAJOR, IDE7_MAJOR, IDE8_MAJOR, IDE9_MAJOR };

@@ -1145,12 +1185,10 @@ static void ide_port_init_devices_data(ide_hwif_t *hwif)
        ide_port_for_each_dev(i, drive, hwif) {
                u8 j = (hwif->index * MAX_DRIVES) + i;
                u16 *saved_id = drive->id;
                struct request *saved_sense_rq = drive->sense_rq;

                memset(drive, 0, sizeof(*drive));
                memset(saved_id, 0, SECTOR_SIZE);
                drive->id = saved_id;
                drive->sense_rq = saved_sense_rq;

                drive->media = ide_disk;
                drive->select = (i << 4) | ATA_DEVICE_OBS;

@@ -1166,6 +1204,9 @@ static void ide_port_init_devices_data(ide_hwif_t *hwif)

                INIT_LIST_HEAD(&drive->list);
                init_completion(&drive->gendev_rel_comp);

                INIT_WORK(&drive->rq_work, drive_rq_insert_work);
                INIT_LIST_HEAD(&drive->rq_list);
        }
}

@@ -1255,7 +1296,6 @@ static void ide_port_free_devices(ide_hwif_t *hwif)
        int i;

        ide_port_for_each_dev(i, drive, hwif) {
                kfree(drive->sense_rq);
                kfree(drive->id);
                kfree(drive);
        }

@@ -1283,17 +1323,10 @@ static int ide_port_alloc_devices(ide_hwif_t *hwif, int node)
                if (drive->id == NULL)
                        goto out_free_drive;

                drive->sense_rq = kmalloc(sizeof(struct request) +
                                sizeof(struct ide_request), GFP_KERNEL);
                if (!drive->sense_rq)
                        goto out_free_id;

                hwif->devices[i] = drive;
        }
        return 0;

out_free_id:
        kfree(drive->id);
out_free_drive:
        kfree(drive);
out_nomem:

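Note on the ide_init_queue() hunks above: the legacy path allocated a request queue and pointed q->request_fn at do_ide_request(), while the blk-mq path describes the driver through a struct blk_mq_tag_set and derives the queue from it. The sketch below is illustrative only and is not code from this series; my_drive, my_request and my_mq_ops are hypothetical names standing in for the driver's own types and dispatch hook.

        #include <linux/blk-mq.h>
        #include <linux/blkdev.h>
        #include <linux/err.h>

        struct my_request {                     /* per-request driver PDU, analogous to struct ide_request */
                int dummy;
        };

        struct my_drive {
                struct blk_mq_tag_set tag_set;
                struct request_queue *queue;
        };

        static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
                                        const struct blk_mq_queue_data *bd)
        {
                /* hand bd->rq to the hardware; this stub only does the bookkeeping */
                blk_mq_start_request(bd->rq);
                return BLK_STS_OK;
        }

        static const struct blk_mq_ops my_mq_ops = {
                .queue_rq = my_queue_rq,
        };

        static int my_init_queue(struct my_drive *drv)
        {
                struct blk_mq_tag_set *set = &drv->tag_set;
                struct request_queue *q;

                set->ops = &my_mq_ops;
                set->nr_hw_queues = 1;                  /* one hardware context */
                set->queue_depth = 32;                  /* bounded number of in-flight requests */
                set->cmd_size = sizeof(struct my_request);
                set->numa_node = NUMA_NO_NODE;
                set->flags = BLK_MQ_F_SHOULD_MERGE;
                if (blk_mq_alloc_tag_set(set))
                        return -ENOMEM;

                q = blk_mq_init_queue(set);             /* the queue is derived from the set */
                if (IS_ERR(q)) {
                        blk_mq_free_tag_set(set);
                        return PTR_ERR(q);
                }

                q->queuedata = drv;
                drv->queue = q;
                return 0;
        }

Teardown mirrors setup, as in the drive_release_dev() hunk: clean up the queue first, then free the tag set.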
@@ -639,7 +639,7 @@ static ide_startstop_t idetape_do_request(ide_drive_t *drive,
                goto out;
        }
        if (req->cmd[13] & REQ_IDETAPE_PC1) {
                pc = (struct ide_atapi_pc *)rq->special;
                pc = (struct ide_atapi_pc *)ide_req(rq)->special;
                req->cmd[13] &= ~(REQ_IDETAPE_PC1);
                req->cmd[13] |= REQ_IDETAPE_PC2;
                goto out;

@@ -440,7 +440,7 @@ int ide_raw_taskfile(ide_drive_t *drive, struct ide_cmd *cmd, u8 *buf,
                goto put_req;
        }

        rq->special = cmd;
        ide_req(rq)->special = cmd;
        cmd->rq = rq;

        blk_execute_rq(drive->queue, NULL, rq, 0);

@@ -389,7 +389,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
                goto err_dev;
        }

        tqueue = blk_alloc_queue_node(GFP_KERNEL, dev->q->node, NULL);
        tqueue = blk_alloc_queue_node(GFP_KERNEL, dev->q->node);
        if (!tqueue) {
                ret = -ENOMEM;
                goto err_disk;

@@ -974,7 +974,7 @@ static int nvm_get_bb_meta(struct nvm_dev *dev, sector_t slba,
        struct ppa_addr ppa;
        u8 *blks;
        int ch, lun, nr_blks;
        int ret;
        int ret = 0;

        ppa.ppa = slba;
        ppa = dev_to_generic_addr(dev, ppa);

@@ -1140,20 +1140,26 @@ EXPORT_SYMBOL(nvm_alloc_dev);

int nvm_register(struct nvm_dev *dev)
{
        int ret;
        int ret, exp_pool_size;

        if (!dev->q || !dev->ops)
                return -EINVAL;

        dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
        if (!dev->dma_pool) {
                pr_err("nvm: could not create dma pool\n");
                return -ENOMEM;
        }

        ret = nvm_init(dev);
        if (ret)
                goto err_init;
                return ret;

        exp_pool_size = max_t(int, PAGE_SIZE,
                        (NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
        exp_pool_size = round_up(exp_pool_size, PAGE_SIZE);

        dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist",
                        exp_pool_size);
        if (!dev->dma_pool) {
                pr_err("nvm: could not create dma pool\n");
                nvm_free(dev);
                return -ENOMEM;
        }

        /* register device with a supported media manager */
        down_write(&nvm_lock);

@@ -1161,9 +1167,6 @@ int nvm_register(struct nvm_dev *dev)
        up_write(&nvm_lock);

        return 0;
err_init:
        dev->ops->destroy_dma_pool(dev->dma_pool);
        return ret;
}
EXPORT_SYMBOL(nvm_register);

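Worked example for the nvm_register() sizing above (illustrative numbers only, not values asserted by the patch): if NVM_MAX_VLBA were 64 and dev->geo.sos were 16 bytes of OOB metadata per sector, the PPA-list pool entry size would be max(PAGE_SIZE, 64 * (8 + 16)) = max(4096, 1536) = 4096 bytes; with sos = 64 it would be 64 * 72 = 4608, rounded up to 8192. The point of the change is that the DMA pool can now be created only after nvm_init() has read the geometry, since the entry size depends on geo.sos.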
@@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd)
        if (rqd->nr_ppas == 1)
                return 0;

        rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
        rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
        rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
        rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);

        return 0;
}

@@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
{
        unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);

        if (secs_avail >= pblk->min_write_pgs)
        if (secs_avail >= pblk->min_write_pgs_data)
                pblk_write_kick(pblk);
}

@@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
        struct pblk_line_meta *lm = &pblk->lm;
        struct pblk_line_mgmt *l_mg = &pblk->l_mg;
        struct list_head *move_list = NULL;
        int vsc = le32_to_cpu(*line->vsc);
        int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
                        * (pblk->min_write_pgs - pblk->min_write_pgs_data);
        int vsc = le32_to_cpu(*line->vsc) + packed_meta;

        lockdep_assert_held(&line->lock);

@@ -531,7 +533,7 @@ void pblk_check_chunk_state_update(struct pblk *pblk, struct nvm_rq *rqd)
                if (caddr == 0)
                        trace_pblk_chunk_state(pblk_disk_name(pblk),
                                        ppa, NVM_CHK_ST_OPEN);
                else if (caddr == chunk->cnlb)
                else if (caddr == (chunk->cnlb - 1))
                        trace_pblk_chunk_state(pblk_disk_name(pblk),
                                        ppa, NVM_CHK_ST_CLOSED);
        }

@@ -620,12 +622,15 @@ out:
}

int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
                   unsigned long secs_to_flush)
                   unsigned long secs_to_flush, bool skip_meta)
{
        int max = pblk->sec_per_write;
        int min = pblk->min_write_pgs;
        int secs_to_sync = 0;

        if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
                min = max = pblk->min_write_pgs_data;

        if (secs_avail >= max)
                secs_to_sync = max;
        else if (secs_avail >= min)

@@ -796,10 +801,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
        rqd.is_seq = 1;

        for (i = 0; i < lm->smeta_sec; i++, paddr++) {
                struct pblk_sec_meta *meta_list = rqd.meta_list;
                struct pblk_sec_meta *meta = pblk_get_meta(pblk,
                                                           rqd.meta_list, i);

                rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
                meta_list[i].lba = lba_list[paddr] = addr_empty;
                meta->lba = lba_list[paddr] = addr_empty;
        }

        ret = pblk_submit_io_sync_sem(pblk, &rqd);

@@ -845,13 +851,13 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
        if (!meta_list)
                return -ENOMEM;

        ppa_list = meta_list + pblk_dma_meta_size;
        dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
        ppa_list = meta_list + pblk_dma_meta_size(pblk);
        dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);

next_rq:
        memset(&rqd, 0, sizeof(struct nvm_rq));

        rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
        rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
        rq_len = rq_ppas * geo->csecs;

        bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,

@@ -1276,6 +1282,7 @@ static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
        return 0;
}

/* Line allocations in the recovery path are always single threaded */
int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line)
{
        struct pblk_line_mgmt *l_mg = &pblk->l_mg;

@@ -1295,15 +1302,22 @@ int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line)

        ret = pblk_line_alloc_bitmaps(pblk, line);
        if (ret)
                return ret;
                goto fail;

        if (!pblk_line_init_bb(pblk, line, 0)) {
                list_add(&line->list, &l_mg->free_list);
                return -EINTR;
                ret = -EINTR;
                goto fail;
        }

        pblk_rl_free_lines_dec(&pblk->rl, line, true);
        return 0;

fail:
        spin_lock(&l_mg->free_lock);
        list_add(&line->list, &l_mg->free_list);
        spin_unlock(&l_mg->free_lock);

        return ret;
}

void pblk_line_recov_close(struct pblk *pblk, struct pblk_line *line)

@@ -2160,3 +2174,38 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
        }
        spin_unlock(&pblk->trans_lock);
}

void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd)
{
        void *buffer;

        if (pblk_is_oob_meta_supported(pblk)) {
                /* Just use OOB metadata buffer as always */
                buffer = rqd->meta_list;
        } else {
                /* We need to reuse last page of request (packed metadata)
                 * in similar way as traditional oob metadata
                 */
                buffer = page_to_virt(
                        rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
        }

        return buffer;
}

void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
{
        void *meta_list = rqd->meta_list;
        void *page;
        int i = 0;

        if (pblk_is_oob_meta_supported(pblk))
                return;

        page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
        /* We need to fill oob meta buffer with data from packed metadata */
        for (; i < rqd->nr_ppas; i++)
                memcpy(pblk_get_meta(pblk, meta_list, i),
                        page + (i * sizeof(struct pblk_sec_meta)),
                        sizeof(struct pblk_sec_meta));
}

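Note on the pblk hunks above: rqd->meta_list is now treated as an opaque buffer and every per-sector access goes through pblk_get_meta() instead of indexing a struct pblk_sec_meta array directly, because the slot width follows the drive's OOB metadata size. The sketch below shows the kind of stride-based accessor this implies; it is an illustration only, the struct layout and names here are assumptions and not the helper from this series.

        #include <linux/kernel.h>
        #include <linux/types.h>

        /* assumed minimal per-sector metadata layout, for the sketch only */
        struct example_sec_meta {
                __le64 reserved;
                __le64 lba;
        };

        /*
         * Return entry i of a metadata buffer whose per-sector slot is the
         * device's OOB area size, but never smaller than the struct we need.
         */
        static inline struct example_sec_meta *example_get_meta(void *meta_list,
                                                                int oob_meta_size,
                                                                int i)
        {
                int slot = max_t(int, sizeof(struct example_sec_meta), oob_meta_size);

                return meta_list + (size_t)slot * i;
        }

With a stride like this, drives whose OOB area is larger than the structure still get one correctly spaced slot per sector, and drives without OOB metadata can point the same accessor at the packed-metadata page returned by pblk_get_meta_for_writes().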
@@ -207,9 +207,6 @@ static int pblk_rwb_init(struct pblk *pblk)
        return pblk_rb_init(&pblk->rwb, buffer_size, threshold, geo->csecs);
}

/* Minimum pages needed within a lun */
#define ADDR_POOL_SIZE 64

static int pblk_set_addrf_12(struct pblk *pblk, struct nvm_geo *geo,
                             struct nvm_addrf_12 *dst)
{

@@ -350,23 +347,19 @@ fail_destroy_ws:

static int pblk_get_global_caches(void)
{
        int ret;
        int ret = 0;

        mutex_lock(&pblk_caches.mutex);

        if (kref_read(&pblk_caches.kref) > 0) {
                kref_get(&pblk_caches.kref);
                mutex_unlock(&pblk_caches.mutex);
                return 0;
        }
        if (kref_get_unless_zero(&pblk_caches.kref))
                goto out;

        ret = pblk_create_global_caches();

        if (!ret)
                kref_get(&pblk_caches.kref);
                kref_init(&pblk_caches.kref);

out:
        mutex_unlock(&pblk_caches.mutex);

        return ret;
}

@@ -406,12 +399,45 @@ static int pblk_core_init(struct pblk *pblk)
        pblk->nr_flush_rst = 0;

        pblk->min_write_pgs = geo->ws_opt;
        pblk->min_write_pgs_data = pblk->min_write_pgs;
        max_write_ppas = pblk->min_write_pgs * geo->all_luns;
        pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
        pblk->max_write_pgs = min_t(int, pblk->max_write_pgs,
                queue_max_hw_sectors(dev->q) / (geo->csecs >> SECTOR_SHIFT));
        pblk_set_sec_per_write(pblk, pblk->min_write_pgs);

        pblk->oob_meta_size = geo->sos;
        if (!pblk_is_oob_meta_supported(pblk)) {
                /* For drives which does not have OOB metadata feature
                 * in order to support recovery feature we need to use
                 * so called packed metadata. Packed metada will store
                 * the same information as OOB metadata (l2p table mapping,
                 * but in the form of the single page at the end of
                 * every write request.
                 */
                if (pblk->min_write_pgs
                        * sizeof(struct pblk_sec_meta) > PAGE_SIZE) {
                        /* We want to keep all the packed metadata on single
                         * page per write requests. So we need to ensure that
                         * it will fit.
                         *
                         * This is more like sanity check, since there is
                         * no device with such a big minimal write size
                         * (above 1 metabytes).
                         */
                        pblk_err(pblk, "Not supported min write size\n");
                        return -EINVAL;
                }
                /* For packed meta approach we do some simplification.
                 * On read path we always issue requests which size
                 * equal to max_write_pgs, with all pages filled with
                 * user payload except of last one page which will be
                 * filled with packed metadata.
                 */
                pblk->max_write_pgs = pblk->min_write_pgs;
                pblk->min_write_pgs_data = pblk->min_write_pgs - 1;
        }

        pblk->pad_dist = kcalloc(pblk->min_write_pgs - 1, sizeof(atomic64_t),
                                 GFP_KERNEL);
        if (!pblk->pad_dist)

@@ -635,40 +661,61 @@ static unsigned int calc_emeta_len(struct pblk *pblk)
        return (lm->emeta_len[1] + lm->emeta_len[2] + lm->emeta_len[3]);
}

static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
{
        struct nvm_tgt_dev *dev = pblk->dev;
        struct pblk_line_mgmt *l_mg = &pblk->l_mg;
        struct pblk_line_meta *lm = &pblk->lm;
        struct nvm_geo *geo = &dev->geo;
        sector_t provisioned;
        int sec_meta, blk_meta;
        int sec_meta, blk_meta, clba;
        int minimum;

        if (geo->op == NVM_TARGET_DEFAULT_OP)
                pblk->op = PBLK_DEFAULT_OP;
        else
                pblk->op = geo->op;

        provisioned = nr_free_blks;
        minimum = pblk_get_min_chks(pblk);
        provisioned = nr_free_chks;
        provisioned *= (100 - pblk->op);
        sector_div(provisioned, 100);

        pblk->op_blks = nr_free_blks - provisioned;
        if ((nr_free_chks - provisioned) < minimum) {
                if (geo->op != NVM_TARGET_DEFAULT_OP) {
                        pblk_err(pblk, "OP too small to create a sane instance\n");
                        return -EINTR;
                }

                /* If the user did not specify an OP value, and PBLK_DEFAULT_OP
                 * is not enough, calculate and set sane value
                 */

                provisioned = nr_free_chks - minimum;
                pblk->op = (100 * minimum) / nr_free_chks;
                pblk_info(pblk, "Default OP insufficient, adjusting OP to %d\n",
                          pblk->op);
        }

        pblk->op_blks = nr_free_chks - provisioned;

        /* Internally pblk manages all free blocks, but all calculations based
         * on user capacity consider only provisioned blocks
         */
        pblk->rl.total_blocks = nr_free_blks;
        pblk->rl.nr_secs = nr_free_blks * geo->clba;
        pblk->rl.total_blocks = nr_free_chks;
        pblk->rl.nr_secs = nr_free_chks * geo->clba;

        /* Consider sectors used for metadata */
        sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
        blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);

        pblk->capacity = (provisioned - blk_meta) * geo->clba;
        clba = (geo->clba / pblk->min_write_pgs) * pblk->min_write_pgs_data;
        pblk->capacity = (provisioned - blk_meta) * clba;

        atomic_set(&pblk->rl.free_blocks, nr_free_blks);
        atomic_set(&pblk->rl.free_user_blocks, nr_free_blks);
        atomic_set(&pblk->rl.free_blocks, nr_free_chks);
        atomic_set(&pblk->rl.free_user_blocks, nr_free_chks);

        return 0;
}

static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,

@@ -984,7 +1031,7 @@ static int pblk_lines_init(struct pblk *pblk)
        struct pblk_line_mgmt *l_mg = &pblk->l_mg;
        struct pblk_line *line;
        void *chunk_meta;
        long nr_free_chks = 0;
        int nr_free_chks = 0;
        int i, ret;

        ret = pblk_line_meta_init(pblk);

@@ -1031,7 +1078,9 @@ static int pblk_lines_init(struct pblk *pblk)
                goto fail_free_lines;
        }

        pblk_set_provision(pblk, nr_free_chks);
        ret = pblk_set_provision(pblk, nr_free_chks);
        if (ret)
                goto fail_free_lines;

        vfree(chunk_meta);
        return 0;

@@ -1041,7 +1090,7 @@ fail_free_lines:
                pblk_line_meta_free(l_mg, &pblk->lines[i]);
        kfree(pblk->lines);
fail_free_chunk_meta:
        kfree(chunk_meta);
        vfree(chunk_meta);
fail_free_luns:
        kfree(pblk->luns);
fail_free_meta:

@@ -1154,6 +1203,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
                return ERR_PTR(-EINVAL);
        }

        if (geo->ext) {
                pblk_err(pblk, "extended metadata not supported\n");
                kfree(pblk);
                return ERR_PTR(-EINVAL);
        }

        spin_lock_init(&pblk->resubmit_lock);
        spin_lock_init(&pblk->trans_lock);
        spin_lock_init(&pblk->lock);

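Worked example for the pblk_set_provision() and pblk_core_init() hunks above (illustrative numbers): with nr_free_chks = 1000 and pblk->op = 11, provisioned = 1000 * (100 - 11) / 100 = 890 chunks become user capacity and op_blks = 110 remain as over-provisioning; if those 110 fell below pblk_get_min_chks(), the new code fails with -EINTR when the OP was user-specified, otherwise it raises pblk->op to 100 * minimum / nr_free_chks. Likewise for packed metadata: if OOB metadata is unsupported and geo->ws_opt is, say, 8 pages, min_write_pgs_data becomes 7, so one page in every 8-page write carries the packed metadata and per-chunk user capacity is scaled through clba = (geo->clba / 8) * 7.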
@@ -22,7 +22,7 @@
static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
                              struct ppa_addr *ppa_list,
                              unsigned long *lun_bitmap,
                              struct pblk_sec_meta *meta_list,
                              void *meta_list,
                              unsigned int valid_secs)
{
        struct pblk_line *line = pblk_line_get_data(pblk);

@@ -33,6 +33,9 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
        int nr_secs = pblk->min_write_pgs;
        int i;

        if (!line)
                return -ENOSPC;

        if (pblk_line_is_full(line)) {
                struct pblk_line *prev_line = line;

@@ -42,8 +45,11 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
                line = pblk_line_replace_data(pblk);
                pblk_line_close_meta(pblk, prev_line);

                if (!line)
                        return -EINTR;
                if (!line) {
                        pblk_pipeline_stop(pblk);
                        return -ENOSPC;
                }

        }

        emeta = line->emeta;

@@ -52,6 +58,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
        paddr = pblk_alloc_page(pblk, line, nr_secs);

        for (i = 0; i < nr_secs; i++, paddr++) {
                struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
                __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);

                /* ppa to be sent to the device */

@@ -68,14 +75,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
                        kref_get(&line->ref);
                        w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
                        w_ctx->ppa = ppa_list[i];
                        meta_list[i].lba = cpu_to_le64(w_ctx->lba);
                        meta->lba = cpu_to_le64(w_ctx->lba);
                        lba_list[paddr] = cpu_to_le64(w_ctx->lba);
                        if (lba_list[paddr] != addr_empty)
                                line->nr_valid_lbas++;
                        else
                                atomic64_inc(&pblk->pad_wa);
                } else {
                        lba_list[paddr] = meta_list[i].lba = addr_empty;
                        lba_list[paddr] = addr_empty;
                        meta->lba = addr_empty;
                        __pblk_map_invalidate(pblk, line, paddr);
                }
        }

@@ -84,50 +92,57 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
        return 0;
}

void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
                 unsigned long *lun_bitmap, unsigned int valid_secs,
                 unsigned int off)
{
        struct pblk_sec_meta *meta_list = rqd->meta_list;
        void *meta_list = pblk_get_meta_for_writes(pblk, rqd);
        void *meta_buffer;
        struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
        unsigned int map_secs;
        int min = pblk->min_write_pgs;
        int i;
        int ret;

        for (i = off; i < rqd->nr_ppas; i += min) {
                map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
                if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
                                        lun_bitmap, &meta_list[i], map_secs)) {
                        bio_put(rqd->bio);
                        pblk_free_rqd(pblk, rqd, PBLK_WRITE);
                        pblk_pipeline_stop(pblk);
                }
                meta_buffer = pblk_get_meta(pblk, meta_list, i);

                ret = pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
                                        lun_bitmap, meta_buffer, map_secs);
                if (ret)
                        return ret;
        }

        return 0;
}

/* only if erase_ppa is set, acquire erase semaphore */
void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
                       unsigned int sentry, unsigned long *lun_bitmap,
                       unsigned int valid_secs, struct ppa_addr *erase_ppa)
{
        struct nvm_tgt_dev *dev = pblk->dev;
        struct nvm_geo *geo = &dev->geo;
        struct pblk_line_meta *lm = &pblk->lm;
        struct pblk_sec_meta *meta_list = rqd->meta_list;
        void *meta_list = pblk_get_meta_for_writes(pblk, rqd);
        void *meta_buffer;
        struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
        struct pblk_line *e_line, *d_line;
        unsigned int map_secs;
        int min = pblk->min_write_pgs;
        int i, erase_lun;
        int ret;

        for (i = 0; i < rqd->nr_ppas; i += min) {
                map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
                if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
                                        lun_bitmap, &meta_list[i], map_secs)) {
                        bio_put(rqd->bio);
                        pblk_free_rqd(pblk, rqd, PBLK_WRITE);
                        pblk_pipeline_stop(pblk);
                }
                meta_buffer = pblk_get_meta(pblk, meta_list, i);

                ret = pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
                                        lun_bitmap, meta_buffer, map_secs);
                if (ret)
                        return ret;

                erase_lun = pblk_ppa_to_pos(geo, ppa_list[i]);

@@ -163,7 +178,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
         */
        e_line = pblk_line_get_erase(pblk);
        if (!e_line)
                return;
                return -ENOSPC;

        /* Erase blocks that are bad in this line but might not be in next */
        if (unlikely(pblk_ppa_empty(*erase_ppa)) &&

@@ -174,7 +189,7 @@ retry:
                bit = find_next_bit(d_line->blk_bitmap,
                                lm->blk_per_line, bit + 1);
                if (bit >= lm->blk_per_line)
                        return;
                        return 0;

                spin_lock(&e_line->lock);
                if (test_bit(bit, e_line->erase_bitmap)) {

@@ -188,4 +203,6 @@ retry:
                *erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
                erase_ppa->a.blk = e_line->id;
        }

        return 0;
}

@@ -147,7 +147,7 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,

        /*
         * Initialize rate-limiter, which controls access to the write buffer
         * but user and GC I/O
         * by user and GC I/O
         */
        pblk_rl_init(&pblk->rl, rb->nr_entries);

@@ -552,6 +552,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
                to_read = count;
        }

        /* Add space for packed metadata if in use*/
        pad += (pblk->min_write_pgs - pblk->min_write_pgs_data);

        c_ctx->sentry = pos;
        c_ctx->nr_valid = to_read;
        c_ctx->nr_padded = pad;

@@ -43,7 +43,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
                                 struct bio *bio, sector_t blba,
                                 unsigned long *read_bitmap)
{
        struct pblk_sec_meta *meta_list = rqd->meta_list;
        void *meta_list = rqd->meta_list;
        struct ppa_addr ppas[NVM_MAX_VLBA];
        int nr_secs = rqd->nr_ppas;
        bool advanced_bio = false;

@@ -53,12 +53,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,

        for (i = 0; i < nr_secs; i++) {
                struct ppa_addr p = ppas[i];
                struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
                sector_t lba = blba + i;

retry:
                if (pblk_ppa_empty(p)) {
                        __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);

                        WARN_ON(test_and_set_bit(i, read_bitmap));
                        meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
                        meta->lba = addr_empty;

                        if (unlikely(!advanced_bio)) {
                                bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);

@@ -78,7 +81,7 @@ retry:
                        goto retry;
                }
                WARN_ON(test_and_set_bit(i, read_bitmap));
                meta_list[i].lba = cpu_to_le64(lba);
                meta->lba = cpu_to_le64(lba);
                advanced_bio = true;
#ifdef CONFIG_NVM_PBLK_DEBUG
                atomic_long_inc(&pblk->cache_reads);

@@ -105,12 +108,16 @@ next:
static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
                                sector_t blba)
{
        struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
        void *meta_list = rqd->meta_list;
        int nr_lbas = rqd->nr_ppas;
        int i;

        if (!pblk_is_oob_meta_supported(pblk))
                return;

        for (i = 0; i < nr_lbas; i++) {
                u64 lba = le64_to_cpu(meta_lba_list[i].lba);
                struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
                u64 lba = le64_to_cpu(meta->lba);

                if (lba == ADDR_EMPTY)
                        continue;

@@ -134,17 +141,22 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
                                 u64 *lba_list, int nr_lbas)
{
        struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
        void *meta_lba_list = rqd->meta_list;
        int i, j;

        if (!pblk_is_oob_meta_supported(pblk))
                return;

        for (i = 0, j = 0; i < nr_lbas; i++) {
                struct pblk_sec_meta *meta = pblk_get_meta(pblk,
                                                           meta_lba_list, j);
                u64 lba = lba_list[i];
                u64 meta_lba;

                if (lba == ADDR_EMPTY)
                        continue;

                meta_lba = le64_to_cpu(meta_lba_list[j].lba);
                meta_lba = le64_to_cpu(meta->lba);

                if (lba != meta_lba) {
#ifdef CONFIG_NVM_PBLK_DEBUG

@@ -216,15 +228,15 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
        struct pblk *pblk = rqd->private;
        struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
        struct pblk_pr_ctx *pr_ctx = r_ctx->private;
        struct pblk_sec_meta *meta;
        struct bio *new_bio = rqd->bio;
        struct bio *bio = pr_ctx->orig_bio;
        struct bio_vec src_bv, dst_bv;
        struct pblk_sec_meta *meta_list = rqd->meta_list;
        void *meta_list = rqd->meta_list;
        int bio_init_idx = pr_ctx->bio_init_idx;
        unsigned long *read_bitmap = pr_ctx->bitmap;
        int nr_secs = pr_ctx->orig_nr_secs;
        int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
        __le64 *lba_list_mem, *lba_list_media;
        void *src_p, *dst_p;
        int hole, i;

@@ -237,13 +249,10 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
                rqd->ppa_list[0] = ppa;
        }

        /* Re-use allocated memory for intermediate lbas */
        lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
        lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);

        for (i = 0; i < nr_secs; i++) {
                lba_list_media[i] = meta_list[i].lba;
                meta_list[i].lba = lba_list_mem[i];
                meta = pblk_get_meta(pblk, meta_list, i);
                pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
                meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
        }

        /* Fill the holes in the original bio */

@@ -255,7 +264,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
                line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
                kref_put(&line->ref, pblk_line_put);

                meta_list[hole].lba = lba_list_media[i];
                meta = pblk_get_meta(pblk, meta_list, hole);
                meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);

                src_bv = new_bio->bi_io_vec[i++];
                dst_bv = bio->bi_io_vec[bio_init_idx + hole];

@@ -291,17 +301,13 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
                                   unsigned long *read_bitmap,
                                   int nr_holes)
{
        struct pblk_sec_meta *meta_list = rqd->meta_list;
        void *meta_list = rqd->meta_list;
        struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
        struct pblk_pr_ctx *pr_ctx;
        struct bio *new_bio, *bio = r_ctx->private;
        __le64 *lba_list_mem;
        int nr_secs = rqd->nr_ppas;
        int i;

        /* Re-use allocated memory for intermediate lbas */
        lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);

        new_bio = bio_alloc(GFP_KERNEL, nr_holes);

        if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))

@@ -312,12 +318,15 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
                goto fail_free_pages;
        }

        pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
        pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
        if (!pr_ctx)
                goto fail_free_pages;

        for (i = 0; i < nr_secs; i++)
                lba_list_mem[i] = meta_list[i].lba;
        for (i = 0; i < nr_secs; i++) {
                struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);

                pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
        }

        new_bio->bi_iter.bi_sector = 0; /* internal bio */
        bio_set_op_attrs(new_bio, REQ_OP_READ, 0);

@@ -325,7 +334,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
        rqd->bio = new_bio;
        rqd->nr_ppas = nr_holes;

        pr_ctx->ppa_ptr = NULL;
        pr_ctx->orig_bio = bio;
        bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
        pr_ctx->bio_init_idx = bio_init_idx;

@@ -383,7 +391,7 @@ err:
static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
                         sector_t lba, unsigned long *read_bitmap)
{
        struct pblk_sec_meta *meta_list = rqd->meta_list;
        struct pblk_sec_meta *meta = pblk_get_meta(pblk, rqd->meta_list, 0);
        struct ppa_addr ppa;

        pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);

@@ -394,8 +402,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,

retry:
        if (pblk_ppa_empty(ppa)) {
                __le64 addr_empty = cpu_to_le64(ADDR_EMPTY);

                WARN_ON(test_and_set_bit(0, read_bitmap));
                meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
                meta->lba = addr_empty;
                return;
        }

@@ -409,7 +419,7 @@ retry:
        }

        WARN_ON(test_and_set_bit(0, read_bitmap));
        meta_list[0].lba = cpu_to_le64(lba);
        meta->lba = cpu_to_le64(lba);

#ifdef CONFIG_NVM_PBLK_DEBUG
        atomic_long_inc(&pblk->cache_reads);