for-4.21/block-20181221

-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlwb7R8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjiID/97oDjMhNT7rwpuMbHw855h62j1hEN/m+N3
 FI0uxivYoYZLD+eJRnMcBwHlKjrCX8iJQAcv9ffI3ThtFW7dnZT3atUacaZVR/Dt
 IrxdymdBP3qsmuaId5NYBug7rJ+AiqFJKjEvCcSPu5X397J4I3SEbzhfvYLJ/aZX
 16o0HJlVVIrcbmq1IP4HwiIIOaKXvPaw04L4z4fpeynRSWG7EAi8NLSnhlR4Rxbb
 BTiMkCTsjRCFdyO6da4fvNQKWmPGPa3bJkYy3qR99cvJCeIbQjRyCloQlWNJRRgi
 3eJpCHVxqFmN0/+DNTJVQEEr4H8o0AVucrLVct1Jc4pessenkpoUniP8vELqwlng
 Z2VHLkhTfCEmvFlk82grrYdNvGATRsrbswt/PlP4T7rBfr1IpDk8kXDWF59EL2dy
 ly35Sk3wJGHBl8qa+vEPXOAnaWdqJXuVGpwB4ifOIatOls8mOxwfZjiRc7x05/fC
 1O4rR2IfLwRqwoYHs0AJ+h6ohOSn1mkGezl2Tch1VSFcJUOHmuYvraTaUi6hblpA
 SslaAoEhO39hRBL0HsvsMeqVWM9uzqvFkLDCfNPdiA81H1258CIbo4vF8z6czCIS
 eeXnTJxVhPVbZgb3a1a93SPwM6KIDZFoIijyd+NqjpU94thlnhYD0QEcKJIKH7os
 2p4aHs6ktw==
 =TRdW
 -----END PGP SIGNATURE-----

Merge tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "This is the main pull request for block/storage for 4.21.

  Larger than usual, it was a busy round with lots of goodies queued up.
  Most notable is the removal of the old IO stack, which has been a long
  time coming. No new features for a while, everything coming in this
  week has all been fixes for things that were previously merged.

  This contains:

   - Use atomic counters instead of semaphores for mtip32xx (Arnd)

   - Cleanup of the mtip32xx request setup (Christoph)

   - Fix for circular locking dependency in loop (Jan, Tetsuo)

   - bcache (Coly, Guoju, Shenghui)
      * Optimizations for writeback caching
      * Various fixes and improvements

   - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
      * host and target support for NVMe over TCP
      * Error log page support
      * Support for separate read/write/poll queues
      * Much improved polling
      * discard OOM fallback
      * Tracepoint improvements

   - lightnvm (Hans, Hua, Igor, Matias, Javier)
      * Igor added packed metadata to pblk. Now drives without metadata
        per LBA can be used as well.
      * Fix from Geert on uninitialized value on chunk metadata reads.
      * Fixes from Hans and Javier to pblk recovery and write path.
      * Fix from Hua Su to fix a race condition in the pblk recovery
        code.
      * Scan optimization added to pblk recovery from Zhoujie.
      * Small geometry cleanup from me.

   - Conversion of the last few drivers that used the legacy path to
     blk-mq (me)

   - Removal of legacy IO path in SCSI (me, Christoph)

   - Removal of legacy IO stack and schedulers (me)

   - Support for much better polling, now without interrupts at all.
     blk-mq adds support for multiple queue maps, which enables us to
     have a map per type. This in turn enables nvme to have separate
     completion queues for polling, which can then be interrupt-less.
     Also means we're ready for async polled IO, which is hopefully
     coming in the next release.

   - Killing of (now) unused block exports (Christoph)

   - Unification of the blk-rq-qos and blk-wbt wait handling (Josef)

   - Support for zoned testing with null_blk (Masato)

   - sx8 conversion to per-host tag sets (Christoph)

   - IO priority improvements (Damien)

   - mq-deadline zoned fix (Damien)

   - Ref count blkcg series (Dennis)

   - Lots of blk-mq improvements and speedups (me)

   - sbitmap scalability improvements (me)

   - Make core inflight IO accounting per-cpu (Mikulas)

   - Export timeout setting in sysfs (Weiping)

   - Cleanup the direct issue path (Jianchao)

   - Export blk-wbt internals in block debugfs for easier debugging
     (Ming)

   - Lots of other fixes and improvements"

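The bullets above on separate read/write/poll queues and on multiple queue maps describe the new blk-mq per-type map support. As a rough, purely illustrative sketch (not code from this merge), a 4.21-era driver's .map_queues callback could lay out per-type maps as below; example_map_queues, example_dev and its nr_queues[] field are hypothetical, while blk_mq_tag_set, blk_mq_queue_map and blk_mq_map_queues() are the blk-mq interfaces this series extends.

        /* Hypothetical .map_queues callback sketching the multiple-map setup. */
        static int example_map_queues(struct blk_mq_tag_set *set)
        {
                struct example_dev *dev = set->driver_data;  /* hypothetical driver data */
                unsigned int offset = 0;
                int i;

                for (i = 0; i < set->nr_maps; i++) {
                        struct blk_mq_queue_map *map = &set->map[i];

                        /* per-type queue count chosen by the driver */
                        map->nr_queues = dev->nr_queues[i];
                        map->queue_offset = offset;
                        offset += map->nr_queues;
                        blk_mq_map_queues(map);  /* default CPU-to-queue spreading */
                }
                return 0;
        }

A poll map with a non-zero nr_queues is what lets the core enable interrupt-less polled IO (see "blk-mq: enable IO poll if .nr_queues of type poll > 0" in the shortlog below).
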
* tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
  kyber: use sbitmap add_wait_queue/list_del wait helpers
  sbitmap: add helpers for add/del wait queue handling
  block: save irq state in blkg_lookup_create()
  dm: don't reuse bio for flushes
  nvme-pci: trace SQ status on completions
  nvme-rdma: implement polling queue map
  nvme-fabrics: allow user to pass in nr_poll_queues
  nvme-fabrics: allow nvmf_connect_io_queue to poll
  nvme-core: optionally poll sync commands
  block: make request_to_qc_t public
  nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
  nvme-tcp: fix endianess annotations
  nvmet-tcp: fix endianess annotations
  nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
  nvme-pci: only set nr_maps to 2 if poll queues are supported
  nvmet: use a macro for default error location
  nvmet: fix comparison of a u16 with -1
  blk-mq: enable IO poll if .nr_queues of type poll > 0
  blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
  blk-mq: skip zero-queue maps in blk_mq_map_swqueue
  ...
Linus Torvalds 2018-12-28 13:19:59 -08:00
commit 0e9da3fbf7
246 changed files with 10469 additions and 14370 deletions

Documentation/ABI/testing/sysfs-block

@ -244,7 +244,7 @@ Description:
What: /sys/block/<disk>/queue/zoned
Date: September 2016
Contact: Damien Le Moal <damien.lemoal@hgst.com>
Contact: Damien Le Moal <damien.lemoal@wdc.com>
Description:
zoned indicates if the device is a zoned block device
and the zone model of the device if it is indeed zoned.
@ -259,6 +259,14 @@ Description:
zone commands, they will be treated as regular block
devices and zoned will report "none".
What: /sys/block/<disk>/queue/nr_zones
Date: November 2018
Contact: Damien Le Moal <damien.lemoal@wdc.com>
Description:
nr_zones indicates the total number of zones of a zoned block
device ("host-aware" or "host-managed" zone model). For regular
block devices, the value is always 0.
What: /sys/block/<disk>/queue/chunk_sectors
Date: September 2016
Contact: Hannes Reinecke <hare@suse.com>
@ -268,6 +276,6 @@ Description:
indicates the size in 512B sectors of the RAID volume
stripe segment. For a zoned block device, either
host-aware or host-managed, chunk_sectors indicates the
size of 512B sectors of the zones of the device, with
size in 512B sectors of the zones of the device, with
the eventual exception of the last zone of the device
which may be smaller.

Documentation/admin-guide/cgroup-v2.rst

@ -1879,8 +1879,10 @@ following two functions.
wbc_init_bio(@wbc, @bio)
Should be called for each bio carrying writeback data and
associates the bio with the inode's owner cgroup. Can be
called anytime between bio allocation and submission.
associates the bio with the inode's owner cgroup and the
corresponding request queue. This must be called after
a queue (device) has been associated with the bio and
before submission.
wbc_account_io(@wbc, @page, @bytes)
Should be called for each data segment being written out.
@ -1899,7 +1901,7 @@ the configuration, the bio may be executed at a lower priority and if
the writeback session is holding shared resources, e.g. a journal
entry, may lead to priority inversion. There is no one easy solution
for the problem. Filesystems can try to work around specific problem
cases by skipping wbc_init_bio() or using bio_associate_blkcg()
cases by skipping wbc_init_bio() and using bio_associate_blkg()
directly.

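The hunk above documents that wbc_init_bio() now also ties the bio to the request queue, so it must be called only after the bio has been pointed at its device. A minimal, assumption-laden sketch of that ordering in a filesystem writeback path follows; the example_* name and the fixed single-page bio are made up for illustration, while wbc_init_bio(), wbc_account_io(), bio_set_dev() and submit_bio() are the interfaces the documentation refers to.

        /* Illustrative writeback submission showing the required ordering. */
        static void example_submit_writeback_page(struct writeback_control *wbc,
                                                  struct block_device *bdev,
                                                  struct page *page, sector_t sector)
        {
                struct bio *bio = bio_alloc(GFP_NOFS, 1);

                bio_set_dev(bio, bdev);          /* attach the device/queue first */
                bio->bi_iter.bi_sector = sector;
                bio->bi_opf = REQ_OP_WRITE;
                bio_add_page(bio, page, PAGE_SIZE, 0);

                wbc_init_bio(wbc, bio);          /* now the owning blkg can be resolved */
                wbc_account_io(wbc, page, PAGE_SIZE);
                submit_bio(bio);
        }
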
Documentation/block/biodoc.txt

@ -65,7 +65,6 @@ Description of Contents:
3.2.3 I/O completion
3.2.4 Implications for drivers that do not interpret bios (don't handle
multiple segments)
3.2.5 Request command tagging
3.3 I/O submission
4. The I/O scheduler
5. Scalability related changes
@ -708,93 +707,6 @@ is crossed on completion of a transfer. (The end*request* functions should
be used if only if the request has come down from block/bio path, not for
direct access requests which only specify rq->buffer without a valid rq->bio)
3.2.5 Generic request command tagging
3.2.5.1 Tag helpers
Block now offers some simple generic functionality to help support command
queueing (typically known as tagged command queueing), ie manage more than
one outstanding command on a queue at any given time.
blk_queue_init_tags(struct request_queue *q, int depth)
Initialize internal command tagging structures for a maximum
depth of 'depth'.
blk_queue_free_tags((struct request_queue *q)
Teardown tag info associated with the queue. This will be done
automatically by block if blk_queue_cleanup() is called on a queue
that is using tagging.
The above are initialization and exit management, the main helpers during
normal operations are:
blk_queue_start_tag(struct request_queue *q, struct request *rq)
Start tagged operation for this request. A free tag number between
0 and 'depth' is assigned to the request (rq->tag holds this number),
and 'rq' is added to the internal tag management. If the maximum depth
for this queue is already achieved (or if the tag wasn't started for
some other reason), 1 is returned. Otherwise 0 is returned.
blk_queue_end_tag(struct request_queue *q, struct request *rq)
End tagged operation on this request. 'rq' is removed from the internal
book keeping structures.
To minimize struct request and queue overhead, the tag helpers utilize some
of the same request members that are used for normal request queue management.
This means that a request cannot both be an active tag and be on the queue
list at the same time. blk_queue_start_tag() will remove the request, but
the driver must remember to call blk_queue_end_tag() before signalling
completion of the request to the block layer. This means ending tag
operations before calling end_that_request_last()! For an example of a user
of these helpers, see the IDE tagged command queueing support.
3.2.5.2 Tag info
Some block functions exist to query current tag status or to go from a
tag number to the associated request. These are, in no particular order:
blk_queue_tagged(q)
Returns 1 if the queue 'q' is using tagging, 0 if not.
blk_queue_tag_request(q, tag)
Returns a pointer to the request associated with tag 'tag'.
blk_queue_tag_depth(q)
Return current queue depth.
blk_queue_tag_queue(q)
Returns 1 if the queue can accept a new queued command, 0 if we are
at the maximum depth already.
blk_queue_rq_tagged(rq)
Returns 1 if the request 'rq' is tagged.
3.2.5.2 Internal structure
Internally, block manages tags in the blk_queue_tag structure:
struct blk_queue_tag {
struct request **tag_index; /* array or pointers to rq */
unsigned long *tag_map; /* bitmap of free tags */
struct list_head busy_list; /* fifo list of busy tags */
int busy; /* queue depth */
int max_depth; /* max queue depth */
};
Most of the above is simple and straight forward, however busy_list may need
a bit of explaining. Normally we don't care too much about request ordering,
but in the event of any barrier requests in the tag queue we need to ensure
that requests are restarted in the order they were queue.
3.3 I/O Submission
The routine submit_bio() is used to submit a single io. Higher level i/o

Documentation/block/cfq-iosched.txt

@ -1,291 +0,0 @@
CFQ (Complete Fairness Queueing)
===============================
The main aim of CFQ scheduler is to provide a fair allocation of the disk
I/O bandwidth for all the processes which requests an I/O operation.
CFQ maintains the per process queue for the processes which request I/O
operation(synchronous requests). In case of asynchronous requests, all the
requests from all the processes are batched together according to their
process's I/O priority.
CFQ ioscheduler tunables
========================
slice_idle
----------
This specifies how long CFQ should idle for next request on certain cfq queues
(for sequential workloads) and service trees (for random workloads) before
queue is expired and CFQ selects next queue to dispatch from.
By default slice_idle is a non-zero value. That means by default we idle on
queues/service trees. This can be very helpful on highly seeky media like
single spindle SATA/SAS disks where we can cut down on overall number of
seeks and see improved throughput.
Setting slice_idle to 0 will remove all the idling on queues/service tree
level and one should see an overall improved throughput on faster storage
devices like multiple SATA/SAS disks in hardware RAID configuration. The down
side is that isolation provided from WRITES also goes down and notion of
IO priority becomes weaker.
So depending on storage and workload, it might be useful to set slice_idle=0.
In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
keeping slice_idle enabled should be useful. For any configurations where
there are multiple spindles behind single LUN (Host based hardware RAID
controller or for storage arrays), setting slice_idle=0 might end up in better
throughput and acceptable latencies.
back_seek_max
-------------
This specifies, given in Kbytes, the maximum "distance" for backward seeking.
The distance is the amount of space from the current head location to the
sectors that are backward in terms of distance.
This parameter allows the scheduler to anticipate requests in the "backward"
direction and consider them as being the "next" if they are within this
distance from the current head location.
back_seek_penalty
-----------------
This parameter is used to compute the cost of backward seeking. If the
backward distance of request is just 1/back_seek_penalty from a "front"
request, then the seeking cost of two requests is considered equivalent.
So scheduler will not bias toward one or the other request (otherwise scheduler
will bias toward front request). Default value of back_seek_penalty is 2.
fifo_expire_async
-----------------
This parameter is used to set the timeout of asynchronous requests. Default
value of this is 248ms.
fifo_expire_sync
----------------
This parameter is used to set the timeout of synchronous requests. Default
value of this is 124ms. In case to favor synchronous requests over asynchronous
one, this value should be decreased relative to fifo_expire_async.
group_idle
-----------
This parameter forces idling at the CFQ group level instead of CFQ
queue level. This was introduced after a bottleneck was observed
in higher end storage due to idle on sequential queue and allow dispatch
from a single queue. The idea with this parameter is that it can be run with
slice_idle=0 and group_idle=8, so that idling does not happen on individual
queues in the group but happens overall on the group and thus still keeps the
IO controller working.
Not idling on individual queues in the group will dispatch requests from
multiple queues in the group at the same time and achieve higher throughput
on higher end storage.
Default value for this parameter is 8ms.
low_latency
-----------
This parameter is used to enable/disable the low latency mode of the CFQ
scheduler. If enabled, CFQ tries to recompute the slice time for each process
based on the target_latency set for the system. This favors fairness over
throughput. Disabling low latency (setting it to 0) ignores target latency,
allowing each process in the system to get a full time slice.
By default low latency mode is enabled.
target_latency
--------------
This parameter is used to calculate the time slice for a process if cfq's
latency mode is enabled. It will ensure that sync requests have an estimated
latency. But if sequential workload is higher(e.g. sequential read),
then to meet the latency constraints, throughput may decrease because of less
time for each process to issue I/O request before the cfq queue is switched.
Though this can be overcome by disabling the latency_mode, it may increase
the read latency for some applications. This parameter allows for changing
target_latency through the sysfs interface which can provide the balanced
throughput and read latency.
Default value for target_latency is 300ms.
slice_async
-----------
This parameter is same as of slice_sync but for asynchronous queue. The
default value is 40ms.
slice_async_rq
--------------
This parameter is used to limit the dispatching of asynchronous request to
device request queue in queue's slice time. The maximum number of request that
are allowed to be dispatched also depends upon the io priority. Default value
for this is 2.
slice_sync
----------
When a queue is selected for execution, the queues IO requests are only
executed for a certain amount of time(time_slice) before switching to another
queue. This parameter is used to calculate the time slice of synchronous
queue.
time_slice is computed using the below equation:-
time_slice = slice_sync + (slice_sync/5 * (4 - prio)). To increase the
time_slice of synchronous queue, increase the value of slice_sync. Default
value is 100ms.
quantum
-------
This specifies the number of request dispatched to the device queue. In a
queue's time slice, a request will not be dispatched if the number of request
in the device exceeds this parameter. This parameter is used for synchronous
request.
In case of storage with several disk, this setting can limit the parallel
processing of request. Therefore, increasing the value can improve the
performance although this can cause the latency of some I/O to increase due
to more number of requests.
CFQ Group scheduling
====================
CFQ supports blkio cgroup and has "blkio." prefixed files in each
blkio cgroup directory. It is weight-based and there are four knobs
for configuration - weight[_device] and leaf_weight[_device].
Internal cgroup nodes (the ones with children) can also have tasks in
them, so the former two configure how much proportion the cgroup as a
whole is entitled to at its parent's level while the latter two
configure how much proportion the tasks in the cgroup have compared to
its direct children.
Another way to think about it is assuming that each internal node has
an implicit leaf child node which hosts all the tasks whose weight is
configured by leaf_weight[_device]. Let's assume a blkio hierarchy
composed of five cgroups - root, A, B, AA and AB - with the following
weights where the names represent the hierarchy.
weight leaf_weight
root : 125 125
A : 500 750
B : 250 500
AA : 500 500
AB : 1000 500
root never has a parent making its weight is meaningless. For backward
compatibility, weight is always kept in sync with leaf_weight. B, AA
and AB have no child and thus its tasks have no children cgroup to
compete with. They always get 100% of what the cgroup won at the
parent level. Considering only the weights which matter, the hierarchy
looks like the following.
root
/ | \
A B leaf
500 250 125
/ | \
AA AB leaf
500 1000 750
If all cgroups have active IOs and competing with each other, disk
time will be distributed like the following.
Distribution below root. The total active weight at this level is
A:500 + B:250 + C:125 = 875.
root-leaf : 125 / 875 =~ 14%
A : 500 / 875 =~ 57%
B(-leaf) : 250 / 875 =~ 28%
A has children and further distributes its 57% among the children and
the implicit leaf node. The total active weight at this level is
AA:500 + AB:1000 + A-leaf:750 = 2250.
A-leaf : ( 750 / 2250) * A =~ 19%
AA(-leaf) : ( 500 / 2250) * A =~ 12%
AB(-leaf) : (1000 / 2250) * A =~ 25%
CFQ IOPS Mode for group scheduling
===================================
Basic CFQ design is to provide priority based time slices. Higher priority
process gets bigger time slice and lower priority process gets smaller time
slice. Measuring time becomes harder if storage is fast and supports NCQ and
it would be better to dispatch multiple requests from multiple cfq queues in
request queue at a time. In such scenario, it is not possible to measure time
consumed by single queue accurately.
What is possible though is to measure number of requests dispatched from a
single queue and also allow dispatch from multiple cfq queue at the same time.
This effectively becomes the fairness in terms of IOPS (IO operations per
second).
If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
to IOPS mode and starts providing fairness in terms of number of requests
dispatched. Note that this mode switching takes effect only for group
scheduling. For non-cgroup users nothing should change.
CFQ IO scheduler Idling Theory
===============================
Idling on a queue is primarily about waiting for the next request to come
on same queue after completion of a request. In this process CFQ will not
dispatch requests from other cfq queues even if requests are pending there.
The rationale behind idling is that it can cut down on number of seeks
on rotational media. For example, if a process is doing dependent
sequential reads (next read will come on only after completion of previous
one), then not dispatching request from other queue should help as we
did not move the disk head and kept on dispatching sequential IO from
one queue.
CFQ has following service trees and various queues are put on these trees.
sync-idle sync-noidle async
All cfq queues doing synchronous sequential IO go on to sync-idle tree.
On this tree we idle on each queue individually.
All synchronous non-sequential queues go on sync-noidle tree. Also any
synchronous write request which is not marked with REQ_IDLE goes on this
service tree. On this tree we do not idle on individual queues instead idle
on the whole group of queues or the tree. So if there are 4 queues waiting
for IO to dispatch we will idle only once last queue has dispatched the IO
and there is no more IO on this service tree.
All async writes go on async service tree. There is no idling on async
queues.
CFQ has some optimizations for SSDs and if it detects a non-rotational
media which can support higher queue depth (multiple requests at in
flight at a time), then it cuts down on idling of individual queues and
all the queues move to sync-noidle tree and only tree idle remains. This
tree idling provides isolation with buffered write queues on async tree.
FAQ
===
Q1. Why to idle at all on queues not marked with REQ_IDLE.
A1. We only do tree idle (all queues on sync-noidle tree) on queues not marked
with REQ_IDLE. This helps in providing isolation with all the sync-idle
queues. Otherwise in presence of many sequential readers, other
synchronous IO might not get fair share of disk.
For example, if there are 10 sequential readers doing IO and they get
100ms each. If a !REQ_IDLE request comes in, it will be scheduled
roughly after 1 second. If after completion of !REQ_IDLE request we
do not idle, and after a couple of milli seconds a another !REQ_IDLE
request comes in, again it will be scheduled after 1second. Repeat it
and notice how a workload can lose its disk share and suffer due to
multiple sequential readers.
fsync can generate dependent IO where bunch of data is written in the
context of fsync, and later some journaling data is written. Journaling
data comes in only after fsync has finished its IO (atleast for ext4
that seemed to be the case). Now if one decides not to idle on fsync
thread due to !REQ_IDLE, then next journaling write will not get
scheduled for another second. A process doing small fsync, will suffer
badly in presence of multiple sequential readers.
Hence doing tree idling on threads using !REQ_IDLE flag on requests
provides isolation from multiple sequential readers and at the same
time we do not idle on individual threads.
Q2. When to specify REQ_IDLE
A2. I would think whenever one is doing synchronous write and expecting
more writes to be dispatched from same context soon, should be able
to specify REQ_IDLE on writes and that probably should work well for
most of the cases.

Documentation/block/queue-sysfs.txt

@ -64,7 +64,7 @@ guess, the kernel will put the process issuing IO to sleep for an amount
of time, before entering a classic poll loop. This mode might be a
little slower than pure classic polling, but it will be more efficient.
If set to a value larger than 0, the kernel will put the process issuing
IO to sleep for this amont of microseconds before entering classic
IO to sleep for this amount of microseconds before entering classic
polling.
iostats (RW)
@ -194,4 +194,31 @@ blk-throttle makes decision based on the samplings. Lower time means cgroups
have more smooth throughput, but higher CPU overhead. This exists only when
CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
zoned (RO)
----------
This indicates if the device is a zoned block device and the zone model of the
device if it is indeed zoned. The possible values indicated by zoned are
"none" for regular block devices and "host-aware" or "host-managed" for zoned
block devices. The characteristics of host-aware and host-managed zoned block
devices are described in the ZBC (Zoned Block Commands) and ZAC
(Zoned Device ATA Command Set) standards. These standards also define the
"drive-managed" zone model. However, since drive-managed zoned block devices
do not support zone commands, they will be treated as regular block devices
and zoned will report "none".
nr_zones (RO)
-------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), this indicates the total number of zones of the device.
This is always 0 for regular block devices.
chunk_sectors (RO)
------------------
This has different meaning depending on the type of the block device.
For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
of the RAID volume stripe segment. For a zoned block device, either host-aware
or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
of the device, with the eventual exception of the last zone of the device which
may be smaller.
Jens Axboe <jens.axboe@oracle.com>, February 2009

Documentation/admin-guide/kernel-parameters.txt

@ -97,11 +97,6 @@ parameters may be changed at runtime by the command
allowing boot to proceed. none ignores them, expecting
user space to do the scan.
scsi_mod.use_blk_mq=
[SCSI] use blk-mq I/O path by default
See SCSI_MQ_DEFAULT in drivers/scsi/Kconfig.
Format: <y/n>
sim710= [SCSI,HW]
See header of drivers/scsi/sim710.c.

block/Kconfig

@ -155,12 +155,6 @@ config BLK_CGROUP_IOLATENCY
Note, this is an experimental interface and could be changed someday.
config BLK_WBT_SQ
bool "Single queue writeback throttling"
depends on BLK_WBT
---help---
Enable writeback throttling by default on legacy single queue devices
config BLK_WBT_MQ
bool "Multiqueue writeback throttling"
default y

block/Kconfig.iosched

@ -3,67 +3,6 @@ if BLOCK
menu "IO Schedulers"
config IOSCHED_NOOP
bool
default y
---help---
The no-op I/O scheduler is a minimal scheduler that does basic merging
and sorting. Its main uses include non-disk based block devices like
memory devices, and specialised software or hardware environments
that do their own scheduling and require only minimal assistance from
the kernel.
config IOSCHED_DEADLINE
tristate "Deadline I/O scheduler"
default y
---help---
The deadline I/O scheduler is simple and compact. It will provide
CSCAN service with FIFO expiration of requests, switching to
a new point in the service tree and doing a batch of IO from there
in case of expiry.
config IOSCHED_CFQ
tristate "CFQ I/O scheduler"
default y
---help---
The CFQ I/O scheduler tries to distribute bandwidth equally
among all processes in the system. It should provide a fair
and low latency working environment, suitable for both desktop
and server systems.
This is the default I/O scheduler.
config CFQ_GROUP_IOSCHED
bool "CFQ Group Scheduling support"
depends on IOSCHED_CFQ && BLK_CGROUP
---help---
Enable group IO scheduling in CFQ.
choice
prompt "Default I/O scheduler"
default DEFAULT_CFQ
help
Select the I/O scheduler which will be used by default for all
block devices.
config DEFAULT_DEADLINE
bool "Deadline" if IOSCHED_DEADLINE=y
config DEFAULT_CFQ
bool "CFQ" if IOSCHED_CFQ=y
config DEFAULT_NOOP
bool "No-op"
endchoice
config DEFAULT_IOSCHED
string
default "deadline" if DEFAULT_DEADLINE
default "cfq" if DEFAULT_CFQ
default "noop" if DEFAULT_NOOP
config MQ_IOSCHED_DEADLINE
tristate "MQ deadline I/O scheduler"
default y

block/Makefile

@ -3,7 +3,7 @@
# Makefile for the kernel block layer
#
obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
blk-flush.o blk-settings.o blk-ioc.o blk-map.o \
blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \
blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
@ -18,9 +18,6 @@ obj-$(CONFIG_BLK_DEV_BSGLIB) += bsg-lib.o
obj-$(CONFIG_BLK_CGROUP) += blk-cgroup.o
obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-throttle.o
obj-$(CONFIG_BLK_CGROUP_IOLATENCY) += blk-iolatency.o
obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
obj-$(CONFIG_MQ_IOSCHED_DEADLINE) += mq-deadline.o
obj-$(CONFIG_MQ_IOSCHED_KYBER) += kyber-iosched.o
bfq-y := bfq-iosched.o bfq-wf2q.o bfq-cgroup.o

block/bfq-cgroup.c

@ -334,7 +334,7 @@ static void bfqg_stats_xfer_dead(struct bfq_group *bfqg)
parent = bfqg_parent(bfqg);
lockdep_assert_held(bfqg_to_blkg(bfqg)->q->queue_lock);
lockdep_assert_held(&bfqg_to_blkg(bfqg)->q->queue_lock);
if (unlikely(!parent))
return;
@ -642,7 +642,7 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
uint64_t serial_nr;
rcu_read_lock();
serial_nr = bio_blkcg(bio)->css.serial_nr;
serial_nr = __bio_blkcg(bio)->css.serial_nr;
/*
* Check whether blkcg has changed. The condition may trigger
@ -651,7 +651,7 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
if (unlikely(!bfqd) || likely(bic->blkcg_serial_nr == serial_nr))
goto out;
bfqg = __bfq_bic_change_cgroup(bfqd, bic, bio_blkcg(bio));
bfqg = __bfq_bic_change_cgroup(bfqd, bic, __bio_blkcg(bio));
/*
* Update blkg_path for bfq_log_* functions. We cache this
* path, and update it here, for the following

block/bfq-iosched.c

@ -399,9 +399,9 @@ static struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
unsigned long flags;
struct bfq_io_cq *icq;
spin_lock_irqsave(q->queue_lock, flags);
spin_lock_irqsave(&q->queue_lock, flags);
icq = icq_to_bic(ioc_lookup_icq(ioc, q));
spin_unlock_irqrestore(q->queue_lock, flags);
spin_unlock_irqrestore(&q->queue_lock, flags);
return icq;
}
@ -4066,7 +4066,7 @@ static void bfq_update_dispatch_stats(struct request_queue *q,
* In addition, the following queue lock guarantees that
* bfqq_group(bfqq) exists as well.
*/
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
if (idle_timer_disabled)
/*
* Since the idle timer has been disabled,
@ -4085,7 +4085,7 @@ static void bfq_update_dispatch_stats(struct request_queue *q,
bfqg_stats_set_start_empty_time(bfqg);
bfqg_stats_update_io_remove(bfqg, rq->cmd_flags);
}
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
}
#else
static inline void bfq_update_dispatch_stats(struct request_queue *q,
@ -4416,7 +4416,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
rcu_read_lock();
bfqg = bfq_find_set_group(bfqd, bio_blkcg(bio));
bfqg = bfq_find_set_group(bfqd, __bio_blkcg(bio));
if (!bfqg) {
bfqq = &bfqd->oom_bfqq;
goto out;
@ -4669,11 +4669,11 @@ static void bfq_update_insert_stats(struct request_queue *q,
* In addition, the following queue lock guarantees that
* bfqq_group(bfqq) exists as well.
*/
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
bfqg_stats_update_io_add(bfqq_group(bfqq), bfqq, cmd_flags);
if (idle_timer_disabled)
bfqg_stats_update_idle_time(bfqq_group(bfqq));
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
}
#else
static inline void bfq_update_insert_stats(struct request_queue *q,
@ -5414,9 +5414,9 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
}
eq->elevator_data = bfqd;
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
q->elevator = eq;
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
/*
* Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
@ -5756,7 +5756,7 @@ static struct elv_fs_entry bfq_attrs[] = {
};
static struct elevator_type iosched_bfq_mq = {
.ops.mq = {
.ops = {
.limit_depth = bfq_limit_depth,
.prepare_request = bfq_prepare_request,
.requeue_request = bfq_finish_requeue_request,
@ -5777,7 +5777,6 @@ static struct elevator_type iosched_bfq_mq = {
.exit_sched = bfq_exit_queue,
},
.uses_mq = true,
.icq_size = sizeof(struct bfq_io_cq),
.icq_align = __alignof__(struct bfq_io_cq),
.elevator_attrs = bfq_attrs,

block/bio-integrity.c

@ -390,7 +390,6 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
bip->bip_iter.bi_sector += bytes_done >> 9;
bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}
EXPORT_SYMBOL(bio_integrity_advance);
/**
* bio_integrity_trim - Trim integrity vector
@ -460,7 +459,6 @@ void bioset_integrity_free(struct bio_set *bs)
mempool_exit(&bs->bio_integrity_pool);
mempool_exit(&bs->bvec_integrity_pool);
}
EXPORT_SYMBOL(bioset_integrity_free);
void __init bio_integrity_init(void)
{

block/bio.c

@ -244,7 +244,7 @@ fallback:
void bio_uninit(struct bio *bio)
{
bio_disassociate_task(bio);
bio_disassociate_blkg(bio);
}
EXPORT_SYMBOL(bio_uninit);
@ -571,14 +571,13 @@ void bio_put(struct bio *bio)
}
EXPORT_SYMBOL(bio_put);
inline int bio_phys_segments(struct request_queue *q, struct bio *bio)
int bio_phys_segments(struct request_queue *q, struct bio *bio)
{
if (unlikely(!bio_flagged(bio, BIO_SEG_VALID)))
blk_recount_segments(q, bio);
return bio->bi_phys_segments;
}
EXPORT_SYMBOL(bio_phys_segments);
/**
* __bio_clone_fast - clone a bio that shares the original bio's biovec
@ -610,7 +609,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
bio->bi_iter = bio_src->bi_iter;
bio->bi_io_vec = bio_src->bi_io_vec;
bio_clone_blkcg_association(bio, bio_src);
bio_clone_blkg_association(bio, bio_src);
blkcg_bio_issue_init(bio);
}
EXPORT_SYMBOL(__bio_clone_fast);
@ -901,7 +901,6 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
return 0;
}
EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
static void submit_bio_wait_endio(struct bio *bio)
{
@ -1592,7 +1591,6 @@ void bio_set_pages_dirty(struct bio *bio)
set_page_dirty_lock(bvec->bv_page);
}
}
EXPORT_SYMBOL_GPL(bio_set_pages_dirty);
static void bio_release_pages(struct bio *bio)
{
@ -1662,17 +1660,33 @@ defer:
spin_unlock_irqrestore(&bio_dirty_lock, flags);
schedule_work(&bio_dirty_work);
}
EXPORT_SYMBOL_GPL(bio_check_pages_dirty);
void update_io_ticks(struct hd_struct *part, unsigned long now)
{
unsigned long stamp;
again:
stamp = READ_ONCE(part->stamp);
if (unlikely(stamp != now)) {
if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) {
__part_stat_add(part, io_ticks, 1);
}
}
if (part->partno) {
part = &part_to_disk(part)->part0;
goto again;
}
}
void generic_start_io_acct(struct request_queue *q, int op,
unsigned long sectors, struct hd_struct *part)
{
const int sgrp = op_stat_group(op);
int cpu = part_stat_lock();
part_round_stats(q, cpu, part);
part_stat_inc(cpu, part, ios[sgrp]);
part_stat_add(cpu, part, sectors[sgrp], sectors);
part_stat_lock();
update_io_ticks(part, jiffies);
part_stat_inc(part, ios[sgrp]);
part_stat_add(part, sectors[sgrp], sectors);
part_inc_in_flight(q, part, op_is_write(op));
part_stat_unlock();
@ -1682,12 +1696,15 @@ EXPORT_SYMBOL(generic_start_io_acct);
void generic_end_io_acct(struct request_queue *q, int req_op,
struct hd_struct *part, unsigned long start_time)
{
unsigned long duration = jiffies - start_time;
unsigned long now = jiffies;
unsigned long duration = now - start_time;
const int sgrp = op_stat_group(req_op);
int cpu = part_stat_lock();
part_stat_add(cpu, part, nsecs[sgrp], jiffies_to_nsecs(duration));
part_round_stats(q, cpu, part);
part_stat_lock();
update_io_ticks(part, now);
part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
part_stat_add(part, time_in_queue, duration);
part_dec_in_flight(q, part, op_is_write(req_op));
part_stat_unlock();
@ -1957,102 +1974,133 @@ EXPORT_SYMBOL(bioset_init_from_src);
#ifdef CONFIG_BLK_CGROUP
#ifdef CONFIG_MEMCG
/**
* bio_associate_blkcg_from_page - associate a bio with the page's blkcg
* bio_disassociate_blkg - puts back the blkg reference if associated
* @bio: target bio
* @page: the page to lookup the blkcg from
*
* Associate @bio with the blkcg from @page's owning memcg. This works like
* every other associate function wrt references.
* Helper to disassociate the blkg from @bio if a blkg is associated.
*/
int bio_associate_blkcg_from_page(struct bio *bio, struct page *page)
void bio_disassociate_blkg(struct bio *bio)
{
struct cgroup_subsys_state *blkcg_css;
if (unlikely(bio->bi_css))
return -EBUSY;
if (!page->mem_cgroup)
return 0;
blkcg_css = cgroup_get_e_css(page->mem_cgroup->css.cgroup,
&io_cgrp_subsys);
bio->bi_css = blkcg_css;
return 0;
}
#endif /* CONFIG_MEMCG */
/**
* bio_associate_blkcg - associate a bio with the specified blkcg
* @bio: target bio
* @blkcg_css: css of the blkcg to associate
*
* Associate @bio with the blkcg specified by @blkcg_css. Block layer will
* treat @bio as if it were issued by a task which belongs to the blkcg.
*
* This function takes an extra reference of @blkcg_css which will be put
* when @bio is released. The caller must own @bio and is responsible for
* synchronizing calls to this function.
*/
int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
{
if (unlikely(bio->bi_css))
return -EBUSY;
css_get(blkcg_css);
bio->bi_css = blkcg_css;
return 0;
}
EXPORT_SYMBOL_GPL(bio_associate_blkcg);
/**
* bio_associate_blkg - associate a bio with the specified blkg
* @bio: target bio
* @blkg: the blkg to associate
*
* Associate @bio with the blkg specified by @blkg. This is the queue specific
* blkcg information associated with the @bio, a reference will be taken on the
* @blkg and will be freed when the bio is freed.
*/
int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
{
if (unlikely(bio->bi_blkg))
return -EBUSY;
if (!blkg_try_get(blkg))
return -ENODEV;
bio->bi_blkg = blkg;
return 0;
}
/**
* bio_disassociate_task - undo bio_associate_current()
* @bio: target bio
*/
void bio_disassociate_task(struct bio *bio)
{
if (bio->bi_ioc) {
put_io_context(bio->bi_ioc);
bio->bi_ioc = NULL;
}
if (bio->bi_css) {
css_put(bio->bi_css);
bio->bi_css = NULL;
}
if (bio->bi_blkg) {
blkg_put(bio->bi_blkg);
bio->bi_blkg = NULL;
}
}
EXPORT_SYMBOL_GPL(bio_disassociate_blkg);
/**
* bio_clone_blkcg_association - clone blkcg association from src to dst bio
* __bio_associate_blkg - associate a bio with the a blkg
* @bio: target bio
* @blkg: the blkg to associate
*
* This tries to associate @bio with the specified @blkg. Association failure
* is handled by walking up the blkg tree. Therefore, the blkg associated can
* be anything between @blkg and the root_blkg. This situation only happens
* when a cgroup is dying and then the remaining bios will spill to the closest
* alive blkg.
*
* A reference will be taken on the @blkg and will be released when @bio is
* freed.
*/
static void __bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
{
bio_disassociate_blkg(bio);
bio->bi_blkg = blkg_tryget_closest(blkg);
}
/**
* bio_associate_blkg_from_css - associate a bio with a specified css
* @bio: target bio
* @css: target css
*
* Associate @bio with the blkg found by combining the css's blkg and the
* request_queue of the @bio. This falls back to the queue's root_blkg if
* the association fails with the css.
*/
void bio_associate_blkg_from_css(struct bio *bio,
struct cgroup_subsys_state *css)
{
struct request_queue *q = bio->bi_disk->queue;
struct blkcg_gq *blkg;
rcu_read_lock();
if (!css || !css->parent)
blkg = q->root_blkg;
else
blkg = blkg_lookup_create(css_to_blkcg(css), q);
__bio_associate_blkg(bio, blkg);
rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
#ifdef CONFIG_MEMCG
/**
* bio_associate_blkg_from_page - associate a bio with the page's blkg
* @bio: target bio
* @page: the page to lookup the blkcg from
*
* Associate @bio with the blkg from @page's owning memcg and the respective
* request_queue. If cgroup_e_css returns %NULL, fall back to the queue's
* root_blkg.
*/
void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
{
struct cgroup_subsys_state *css;
if (!page->mem_cgroup)
return;
rcu_read_lock();
css = cgroup_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
bio_associate_blkg_from_css(bio, css);
rcu_read_unlock();
}
#endif /* CONFIG_MEMCG */
/**
* bio_associate_blkg - associate a bio with a blkg
* @bio: target bio
*
* Associate @bio with the blkg found from the bio's css and request_queue.
* If one is not found, bio_lookup_blkg() creates the blkg. If a blkg is
* already associated, the css is reused and association redone as the
* request_queue may have changed.
*/
void bio_associate_blkg(struct bio *bio)
{
struct cgroup_subsys_state *css;
rcu_read_lock();
if (bio->bi_blkg)
css = &bio_blkcg(bio)->css;
else
css = blkcg_css();
bio_associate_blkg_from_css(bio, css);
rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(bio_associate_blkg);
/**
* bio_clone_blkg_association - clone blkg association from src to dst bio
* @dst: destination bio
* @src: source bio
*/
void bio_clone_blkcg_association(struct bio *dst, struct bio *src)
void bio_clone_blkg_association(struct bio *dst, struct bio *src)
{
if (src->bi_css)
WARN_ON(bio_associate_blkcg(dst, src->bi_css));
if (src->bi_blkg)
__bio_associate_blkg(dst, src->bi_blkg);
}
EXPORT_SYMBOL_GPL(bio_clone_blkcg_association);
EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
#endif /* CONFIG_BLK_CGROUP */
static void __init biovec_init_slabs(void)

block/blk-cgroup.c

@ -76,14 +76,42 @@ static void blkg_free(struct blkcg_gq *blkg)
if (blkg->pd[i])
blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
if (blkg->blkcg != &blkcg_root)
blk_exit_rl(blkg->q, &blkg->rl);
blkg_rwstat_exit(&blkg->stat_ios);
blkg_rwstat_exit(&blkg->stat_bytes);
kfree(blkg);
}
static void __blkg_release(struct rcu_head *rcu)
{
struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);
percpu_ref_exit(&blkg->refcnt);
/* release the blkcg and parent blkg refs this blkg has been holding */
css_put(&blkg->blkcg->css);
if (blkg->parent)
blkg_put(blkg->parent);
wb_congested_put(blkg->wb_congested);
blkg_free(blkg);
}
/*
* A group is RCU protected, but having an rcu lock does not mean that one
* can access all the fields of blkg and assume these are valid. For
* example, don't try to follow throtl_data and request queue links.
*
* Having a reference to blkg under an rcu allows accesses to only values
* local to groups like group stats and group rate limits.
*/
static void blkg_release(struct percpu_ref *ref)
{
struct blkcg_gq *blkg = container_of(ref, struct blkcg_gq, refcnt);
call_rcu(&blkg->rcu_head, __blkg_release);
}
/**
* blkg_alloc - allocate a blkg
* @blkcg: block cgroup the new blkg is associated with
@ -110,14 +138,6 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
blkg->q = q;
INIT_LIST_HEAD(&blkg->q_node);
blkg->blkcg = blkcg;
atomic_set(&blkg->refcnt, 1);
/* root blkg uses @q->root_rl, init rl only for !root blkgs */
if (blkcg != &blkcg_root) {
if (blk_init_rl(&blkg->rl, q, gfp_mask))
goto err_free;
blkg->rl.blkg = blkg;
}
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
@ -157,7 +177,7 @@ struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg,
blkg = radix_tree_lookup(&blkcg->blkg_tree, q->id);
if (blkg && blkg->q == q) {
if (update_hint) {
lockdep_assert_held(q->queue_lock);
lockdep_assert_held(&q->queue_lock);
rcu_assign_pointer(blkcg->blkg_hint, blkg);
}
return blkg;
@ -180,7 +200,13 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
int i, ret;
WARN_ON_ONCE(!rcu_read_lock_held());
lockdep_assert_held(q->queue_lock);
lockdep_assert_held(&q->queue_lock);
/* request_queue is dying, do not create/recreate a blkg */
if (blk_queue_dying(q)) {
ret = -ENODEV;
goto err_free_blkg;
}
/* blkg holds a reference to blkcg */
if (!css_tryget_online(&blkcg->css)) {
@ -217,6 +243,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
blkg_get(blkg->parent);
}
ret = percpu_ref_init(&blkg->refcnt, blkg_release, 0,
GFP_NOWAIT | __GFP_NOWARN);
if (ret)
goto err_cancel_ref;
/* invoke per-policy init */
for (i = 0; i < BLKCG_MAX_POLS; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
@ -249,6 +280,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
blkg_put(blkg);
return ERR_PTR(ret);
err_cancel_ref:
percpu_ref_exit(&blkg->refcnt);
err_put_congested:
wb_congested_put(wb_congested);
err_put_css:
@ -259,7 +292,7 @@ err_free_blkg:
}
/**
* blkg_lookup_create - lookup blkg, try to create one if not there
* __blkg_lookup_create - lookup blkg, try to create one if not there
* @blkcg: blkcg of interest
* @q: request_queue of interest
*
@ -268,24 +301,16 @@ err_free_blkg:
* that all non-root blkg's have access to the parent blkg. This function
* should be called under RCU read lock and @q->queue_lock.
*
* Returns pointer to the looked up or created blkg on success, ERR_PTR()
* value on error. If @q is dead, returns ERR_PTR(-EINVAL). If @q is not
* dead and bypassing, returns ERR_PTR(-EBUSY).
* Returns the blkg or the closest blkg if blkg_create() fails as it walks
* down from root.
*/
struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
struct request_queue *q)
struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
struct request_queue *q)
{
struct blkcg_gq *blkg;
WARN_ON_ONCE(!rcu_read_lock_held());
lockdep_assert_held(q->queue_lock);
/*
* This could be the first entry point of blkcg implementation and
* we shouldn't allow anything to go through for a bypassing queue.
*/
if (unlikely(blk_queue_bypass(q)))
return ERR_PTR(blk_queue_dying(q) ? -ENODEV : -EBUSY);
lockdep_assert_held(&q->queue_lock);
blkg = __blkg_lookup(blkcg, q, true);
if (blkg)
@ -293,30 +318,64 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
/*
* Create blkgs walking down from blkcg_root to @blkcg, so that all
* non-root blkgs have access to their parents.
* non-root blkgs have access to their parents. Returns the closest
* blkg to the intended blkg should blkg_create() fail.
*/
while (true) {
struct blkcg *pos = blkcg;
struct blkcg *parent = blkcg_parent(blkcg);
struct blkcg_gq *ret_blkg = q->root_blkg;
while (parent && !__blkg_lookup(parent, q, false)) {
while (parent) {
blkg = __blkg_lookup(parent, q, false);
if (blkg) {
/* remember closest blkg */
ret_blkg = blkg;
break;
}
pos = parent;
parent = blkcg_parent(parent);
}
blkg = blkg_create(pos, q, NULL);
if (pos == blkcg || IS_ERR(blkg))
if (IS_ERR(blkg))
return ret_blkg;
if (pos == blkcg)
return blkg;
}
}
/**
* blkg_lookup_create - find or create a blkg
* @blkcg: target block cgroup
* @q: target request_queue
*
* This looks up or creates the blkg representing the unique pair
* of the blkcg and the request_queue.
*/
struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
struct request_queue *q)
{
struct blkcg_gq *blkg = blkg_lookup(blkcg, q);
if (unlikely(!blkg)) {
unsigned long flags;
spin_lock_irqsave(&q->queue_lock, flags);
blkg = __blkg_lookup_create(blkcg, q);
spin_unlock_irqrestore(&q->queue_lock, flags);
}
return blkg;
}
static void blkg_destroy(struct blkcg_gq *blkg)
{
struct blkcg *blkcg = blkg->blkcg;
struct blkcg_gq *parent = blkg->parent;
int i;
lockdep_assert_held(blkg->q->queue_lock);
lockdep_assert_held(&blkg->q->queue_lock);
lockdep_assert_held(&blkcg->lock);
/* Something wrong if we are trying to remove same group twice */
@ -353,7 +412,7 @@ static void blkg_destroy(struct blkcg_gq *blkg)
* Put the reference taken at the time of creation so that when all
* queues are gone, group can be destroyed.
*/
blkg_put(blkg);
percpu_ref_kill(&blkg->refcnt);
}
/**
@ -366,8 +425,7 @@ static void blkg_destroy_all(struct request_queue *q)
{
struct blkcg_gq *blkg, *n;
lockdep_assert_held(q->queue_lock);
spin_lock_irq(&q->queue_lock);
list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) {
struct blkcg *blkcg = blkg->blkcg;
@ -377,7 +435,7 @@ static void blkg_destroy_all(struct request_queue *q)
}
q->root_blkg = NULL;
q->root_rl.blkg = NULL;
spin_unlock_irq(&q->queue_lock);
}
/*
@ -403,41 +461,6 @@ void __blkg_release_rcu(struct rcu_head *rcu_head)
}
EXPORT_SYMBOL_GPL(__blkg_release_rcu);
/*
* The next function used by blk_queue_for_each_rl(). It's a bit tricky
* because the root blkg uses @q->root_rl instead of its own rl.
*/
struct request_list *__blk_queue_next_rl(struct request_list *rl,
struct request_queue *q)
{
struct list_head *ent;
struct blkcg_gq *blkg;
/*
* Determine the current blkg list_head. The first entry is
* root_rl which is off @q->blkg_list and mapped to the head.
*/
if (rl == &q->root_rl) {
ent = &q->blkg_list;
/* There are no more block groups, hence no request lists */
if (list_empty(ent))
return NULL;
} else {
blkg = container_of(rl, struct blkcg_gq, rl);
ent = &blkg->q_node;
}
/* walk to the next list_head, skip root blkcg */
ent = ent->next;
if (ent == &q->root_blkg->q_node)
ent = ent->next;
if (ent == &q->blkg_list)
return NULL;
blkg = container_of(ent, struct blkcg_gq, q_node);
return &blkg->rl;
}
static int blkcg_reset_stats(struct cgroup_subsys_state *css,
struct cftype *cftype, u64 val)
{
@ -477,7 +500,6 @@ const char *blkg_dev_name(struct blkcg_gq *blkg)
return dev_name(blkg->q->backing_dev_info->dev);
return NULL;
}
EXPORT_SYMBOL_GPL(blkg_dev_name);
/**
* blkcg_print_blkgs - helper for printing per-blkg data
@ -508,10 +530,10 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg,
rcu_read_lock();
hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
spin_lock_irq(blkg->q->queue_lock);
spin_lock_irq(&blkg->q->queue_lock);
if (blkcg_policy_enabled(blkg->q, pol))
total += prfill(sf, blkg->pd[pol->plid], data);
spin_unlock_irq(blkg->q->queue_lock);
spin_unlock_irq(&blkg->q->queue_lock);
}
rcu_read_unlock();
@ -709,7 +731,7 @@ u64 blkg_stat_recursive_sum(struct blkcg_gq *blkg,
struct cgroup_subsys_state *pos_css;
u64 sum = 0;
lockdep_assert_held(blkg->q->queue_lock);
lockdep_assert_held(&blkg->q->queue_lock);
rcu_read_lock();
blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
@ -752,7 +774,7 @@ struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
struct blkg_rwstat sum = { };
int i;
lockdep_assert_held(blkg->q->queue_lock);
lockdep_assert_held(&blkg->q->queue_lock);
rcu_read_lock();
blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
@ -783,18 +805,10 @@ static struct blkcg_gq *blkg_lookup_check(struct blkcg *blkcg,
struct request_queue *q)
{
WARN_ON_ONCE(!rcu_read_lock_held());
lockdep_assert_held(q->queue_lock);
lockdep_assert_held(&q->queue_lock);
if (!blkcg_policy_enabled(q, pol))
return ERR_PTR(-EOPNOTSUPP);
/*
* This could be the first entry point of blkcg implementation and
* we shouldn't allow anything to go through for a bypassing queue.
*/
if (unlikely(blk_queue_bypass(q)))
return ERR_PTR(blk_queue_dying(q) ? -ENODEV : -EBUSY);
return __blkg_lookup(blkcg, q, true /* update_hint */);
}
@ -812,7 +826,7 @@ static struct blkcg_gq *blkg_lookup_check(struct blkcg *blkcg,
*/
int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
char *input, struct blkg_conf_ctx *ctx)
__acquires(rcu) __acquires(disk->queue->queue_lock)
__acquires(rcu) __acquires(&disk->queue->queue_lock)
{
struct gendisk *disk;
struct request_queue *q;
@ -840,7 +854,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
q = disk->queue;
rcu_read_lock();
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
blkg = blkg_lookup_check(blkcg, pol, q);
if (IS_ERR(blkg)) {
@ -867,7 +881,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
}
/* Drop locks to do new blkg allocation with GFP_KERNEL. */
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
rcu_read_unlock();
new_blkg = blkg_alloc(pos, q, GFP_KERNEL);
@ -877,7 +891,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
}
rcu_read_lock();
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
blkg = blkg_lookup_check(pos, pol, q);
if (IS_ERR(blkg)) {
@ -905,7 +919,7 @@ success:
return 0;
fail_unlock:
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
rcu_read_unlock();
fail:
put_disk_and_module(disk);
@ -921,7 +935,6 @@ fail:
}
return ret;
}
EXPORT_SYMBOL_GPL(blkg_conf_prep);
/**
* blkg_conf_finish - finish up per-blkg config update
@ -931,13 +944,12 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep);
* with blkg_conf_prep().
*/
void blkg_conf_finish(struct blkg_conf_ctx *ctx)
__releases(ctx->disk->queue->queue_lock) __releases(rcu)
__releases(&ctx->disk->queue->queue_lock) __releases(rcu)
{
spin_unlock_irq(ctx->disk->queue->queue_lock);
spin_unlock_irq(&ctx->disk->queue->queue_lock);
rcu_read_unlock();
put_disk_and_module(ctx->disk);
}
EXPORT_SYMBOL_GPL(blkg_conf_finish);
static int blkcg_print_stat(struct seq_file *sf, void *v)
{
@ -967,7 +979,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
*/
off += scnprintf(buf+off, size-off, "%s ", dname);
spin_lock_irq(blkg->q->queue_lock);
spin_lock_irq(&blkg->q->queue_lock);
rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
offsetof(struct blkcg_gq, stat_bytes));
@ -981,7 +993,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
wios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
dios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_DISCARD]);
spin_unlock_irq(blkg->q->queue_lock);
spin_unlock_irq(&blkg->q->queue_lock);
if (rbytes || wbytes || rios || wios) {
has_stats = true;
@ -1102,9 +1114,9 @@ void blkcg_destroy_blkgs(struct blkcg *blkcg)
struct blkcg_gq, blkcg_node);
struct request_queue *q = blkg->q;
if (spin_trylock(q->queue_lock)) {
if (spin_trylock(&q->queue_lock)) {
blkg_destroy(blkg);
spin_unlock(q->queue_lock);
spin_unlock(&q->queue_lock);
} else {
spin_unlock_irq(&blkcg->lock);
cpu_relax();
@ -1225,36 +1237,31 @@ int blkcg_init_queue(struct request_queue *q)
/* Make sure the root blkg exists. */
rcu_read_lock();
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
blkg = blkg_create(&blkcg_root, q, new_blkg);
if (IS_ERR(blkg))
goto err_unlock;
q->root_blkg = blkg;
q->root_rl.blkg = blkg;
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
rcu_read_unlock();
if (preloaded)
radix_tree_preload_end();
ret = blk_iolatency_init(q);
if (ret) {
spin_lock_irq(q->queue_lock);
blkg_destroy_all(q);
spin_unlock_irq(q->queue_lock);
return ret;
}
if (ret)
goto err_destroy_all;
ret = blk_throtl_init(q);
if (ret) {
spin_lock_irq(q->queue_lock);
blkg_destroy_all(q);
spin_unlock_irq(q->queue_lock);
}
return ret;
if (ret)
goto err_destroy_all;
return 0;
err_destroy_all:
blkg_destroy_all(q);
return ret;
err_unlock:
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
rcu_read_unlock();
if (preloaded)
radix_tree_preload_end();
@ -1269,7 +1276,7 @@ err_unlock:
*/
void blkcg_drain_queue(struct request_queue *q)
{
lockdep_assert_held(q->queue_lock);
lockdep_assert_held(&q->queue_lock);
/*
* @q could be exiting and already have destroyed all blkgs as
@ -1289,10 +1296,7 @@ void blkcg_drain_queue(struct request_queue *q)
*/
void blkcg_exit_queue(struct request_queue *q)
{
spin_lock_irq(q->queue_lock);
blkg_destroy_all(q);
spin_unlock_irq(q->queue_lock);
blk_throtl_exit(q);
}
@ -1396,10 +1400,8 @@ int blkcg_activate_policy(struct request_queue *q,
if (blkcg_policy_enabled(q, pol))
return 0;
if (q->mq_ops)
if (queue_is_mq(q))
blk_mq_freeze_queue(q);
else
blk_queue_bypass_start(q);
pd_prealloc:
if (!pd_prealloc) {
pd_prealloc = pol->pd_alloc_fn(GFP_KERNEL, q->node);
@ -1409,7 +1411,7 @@ pd_prealloc:
}
}
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
list_for_each_entry(blkg, &q->blkg_list, q_node) {
struct blkg_policy_data *pd;
@ -1421,7 +1423,7 @@ pd_prealloc:
if (!pd)
swap(pd, pd_prealloc);
if (!pd) {
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
goto pd_prealloc;
}
@ -1435,12 +1437,10 @@ pd_prealloc:
__set_bit(pol->plid, q->blkcg_pols);
ret = 0;
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
out_bypass_end:
if (q->mq_ops)
if (queue_is_mq(q))
blk_mq_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
if (pd_prealloc)
pol->pd_free_fn(pd_prealloc);
return ret;
@ -1463,12 +1463,10 @@ void blkcg_deactivate_policy(struct request_queue *q,
if (!blkcg_policy_enabled(q, pol))
return;
if (q->mq_ops)
if (queue_is_mq(q))
blk_mq_freeze_queue(q);
else
blk_queue_bypass_start(q);
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
__clear_bit(pol->plid, q->blkcg_pols);
@ -1481,12 +1479,10 @@ void blkcg_deactivate_policy(struct request_queue *q,
}
}
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
if (q->mq_ops)
if (queue_is_mq(q))
blk_mq_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
}
EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
@ -1748,8 +1744,7 @@ void blkcg_maybe_throttle_current(void)
blkg = blkg_lookup(blkcg, q);
if (!blkg)
goto out;
blkg = blkg_try_get(blkg);
if (!blkg)
if (!blkg_tryget(blkg))
goto out;
rcu_read_unlock();
@ -1761,7 +1756,6 @@ out:
rcu_read_unlock();
blk_put_queue(q);
}
EXPORT_SYMBOL_GPL(blkcg_maybe_throttle_current);
/**
* blkcg_schedule_throttle - this task needs to check for throttling
@ -1795,7 +1789,6 @@ void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay)
current->use_memdelay = use_memdelay;
set_notify_resume(current);
}
EXPORT_SYMBOL_GPL(blkcg_schedule_throttle);
/**
* blkcg_add_delay - add delay to this blkg
@ -1810,7 +1803,6 @@ void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta)
blkcg_scale_delay(blkg, now);
atomic64_add(delta, &blkg->delay_nsec);
}
EXPORT_SYMBOL_GPL(blkcg_add_delay);
module_param(blkcg_debug_stats, bool, 0644);
MODULE_PARM_DESC(blkcg_debug_stats, "True if you want debug stats, false if not");

File diff suppressed because it is too large.


@ -48,8 +48,6 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
struct request *rq, int at_head,
rq_end_io_fn *done)
{
int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
WARN_ON(irqs_disabled());
WARN_ON(!blk_rq_is_passthrough(rq));
@ -60,23 +58,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
* don't check dying flag for MQ because the request won't
* be reused after dying flag is set
*/
if (q->mq_ops) {
blk_mq_sched_insert_request(rq, at_head, true, false);
return;
}
spin_lock_irq(q->queue_lock);
if (unlikely(blk_queue_dying(q))) {
rq->rq_flags |= RQF_QUIET;
__blk_end_request_all(rq, BLK_STS_IOERR);
spin_unlock_irq(q->queue_lock);
return;
}
__elv_add_request(q, rq, where);
__blk_run_queue(q);
spin_unlock_irq(q->queue_lock);
blk_mq_sched_insert_request(rq, at_head, true, false);
}
EXPORT_SYMBOL_GPL(blk_execute_rq_nowait);


@ -93,7 +93,7 @@ enum {
FLUSH_PENDING_TIMEOUT = 5 * HZ,
};
static bool blk_kick_flush(struct request_queue *q,
static void blk_kick_flush(struct request_queue *q,
struct blk_flush_queue *fq, unsigned int flags);
static unsigned int blk_flush_policy(unsigned long fflags, struct request *rq)
@ -132,18 +132,9 @@ static void blk_flush_restore_request(struct request *rq)
rq->end_io = rq->flush.saved_end_io;
}
static bool blk_flush_queue_rq(struct request *rq, bool add_front)
static void blk_flush_queue_rq(struct request *rq, bool add_front)
{
if (rq->q->mq_ops) {
blk_mq_add_to_requeue_list(rq, add_front, true);
return false;
} else {
if (add_front)
list_add(&rq->queuelist, &rq->q->queue_head);
else
list_add_tail(&rq->queuelist, &rq->q->queue_head);
return true;
}
blk_mq_add_to_requeue_list(rq, add_front, true);
}
/**
@ -157,18 +148,17 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
* completion and trigger the next step.
*
* CONTEXT:
* spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
* spin_lock_irq(fq->mq_flush_lock)
*
* RETURNS:
* %true if requests were added to the dispatch queue, %false otherwise.
*/
static bool blk_flush_complete_seq(struct request *rq,
static void blk_flush_complete_seq(struct request *rq,
struct blk_flush_queue *fq,
unsigned int seq, blk_status_t error)
{
struct request_queue *q = rq->q;
struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
bool queued = false, kicked;
unsigned int cmd_flags;
BUG_ON(rq->flush.seq & seq);
@ -191,7 +181,7 @@ static bool blk_flush_complete_seq(struct request *rq,
case REQ_FSEQ_DATA:
list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
queued = blk_flush_queue_rq(rq, true);
blk_flush_queue_rq(rq, true);
break;
case REQ_FSEQ_DONE:
@ -204,42 +194,34 @@ static bool blk_flush_complete_seq(struct request *rq,
BUG_ON(!list_empty(&rq->queuelist));
list_del_init(&rq->flush.list);
blk_flush_restore_request(rq);
if (q->mq_ops)
blk_mq_end_request(rq, error);
else
__blk_end_request_all(rq, error);
blk_mq_end_request(rq, error);
break;
default:
BUG();
}
kicked = blk_kick_flush(q, fq, cmd_flags);
return kicked | queued;
blk_kick_flush(q, fq, cmd_flags);
}
static void flush_end_io(struct request *flush_rq, blk_status_t error)
{
struct request_queue *q = flush_rq->q;
struct list_head *running;
bool queued = false;
struct request *rq, *n;
unsigned long flags = 0;
struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
struct blk_mq_hw_ctx *hctx;
if (q->mq_ops) {
struct blk_mq_hw_ctx *hctx;
/* release the tag's ownership to the req cloned from */
spin_lock_irqsave(&fq->mq_flush_lock, flags);
hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
if (!q->elevator) {
blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
flush_rq->tag = -1;
} else {
blk_mq_put_driver_tag_hctx(hctx, flush_rq);
flush_rq->internal_tag = -1;
}
/* release the tag's ownership to the req cloned from */
spin_lock_irqsave(&fq->mq_flush_lock, flags);
hctx = flush_rq->mq_hctx;
if (!q->elevator) {
blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
flush_rq->tag = -1;
} else {
blk_mq_put_driver_tag_hctx(hctx, flush_rq);
flush_rq->internal_tag = -1;
}
running = &fq->flush_queue[fq->flush_running_idx];
@ -248,35 +230,16 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
/* account completion of the flush request */
fq->flush_running_idx ^= 1;
if (!q->mq_ops)
elv_completed_request(q, flush_rq);
/* and push the waiting requests to the next stage */
list_for_each_entry_safe(rq, n, running, flush.list) {
unsigned int seq = blk_flush_cur_seq(rq);
BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
queued |= blk_flush_complete_seq(rq, fq, seq, error);
blk_flush_complete_seq(rq, fq, seq, error);
}
/*
* Kick the queue to avoid stall for two cases:
* 1. Moving a request silently to empty queue_head may stall the
* queue.
* 2. When flush request is running in non-queueable queue, the
* queue is held. Restart the queue after flush request is finished
* to avoid stall.
* This function is called from request completion path and calling
* directly into request_fn may confuse the driver. Always use
* kblockd.
*/
if (queued || fq->flush_queue_delayed) {
WARN_ON(q->mq_ops);
blk_run_queue_async(q);
}
fq->flush_queue_delayed = 0;
if (q->mq_ops)
spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
}
/**
@ -289,12 +252,10 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
* Please read the comment at the top of this file for more info.
*
* CONTEXT:
* spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
* spin_lock_irq(fq->mq_flush_lock)
*
* RETURNS:
* %true if flush was issued, %false otherwise.
*/
static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
unsigned int flags)
{
struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
@ -304,7 +265,7 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
/* C1 described at the top of this file */
if (fq->flush_pending_idx != fq->flush_running_idx || list_empty(pending))
return false;
return;
/* C2 and C3
*
@ -312,11 +273,10 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
* assigned to empty flushes, and we deadlock if we are expecting
* other requests to make progress. Don't defer for that case.
*/
if (!list_empty(&fq->flush_data_in_flight) &&
!(q->mq_ops && q->elevator) &&
if (!list_empty(&fq->flush_data_in_flight) && q->elevator &&
time_before(jiffies,
fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
return false;
return;
/*
* Issue flush and toggle pending_idx. This makes pending_idx
@ -334,19 +294,15 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
* In case of IO scheduler, flush rq need to borrow scheduler tag
* just for cheating put/get driver tag.
*/
if (q->mq_ops) {
struct blk_mq_hw_ctx *hctx;
flush_rq->mq_ctx = first_rq->mq_ctx;
flush_rq->mq_hctx = first_rq->mq_hctx;
flush_rq->mq_ctx = first_rq->mq_ctx;
if (!q->elevator) {
fq->orig_rq = first_rq;
flush_rq->tag = first_rq->tag;
hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
} else {
flush_rq->internal_tag = first_rq->internal_tag;
}
if (!q->elevator) {
fq->orig_rq = first_rq;
flush_rq->tag = first_rq->tag;
blk_mq_tag_set_rq(flush_rq->mq_hctx, first_rq->tag, flush_rq);
} else {
flush_rq->internal_tag = first_rq->internal_tag;
}
flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
@ -355,62 +311,17 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
flush_rq->rq_disk = first_rq->rq_disk;
flush_rq->end_io = flush_end_io;
return blk_flush_queue_rq(flush_rq, false);
}
static void flush_data_end_io(struct request *rq, blk_status_t error)
{
struct request_queue *q = rq->q;
struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
lockdep_assert_held(q->queue_lock);
/*
* Updating q->in_flight[] here for making this tag usable
* early. Because in blk_queue_start_tag(),
* q->in_flight[BLK_RW_ASYNC] is used to limit async I/O and
* reserve tags for sync I/O.
*
* More importantly this way can avoid the following I/O
* deadlock:
*
* - suppose there are 40 fua requests coming to flush queue
* and queue depth is 31
* - 30 rqs are scheduled then blk_queue_start_tag() can't alloc
* tag for async I/O any more
* - all the 30 rqs are completed before FLUSH_PENDING_TIMEOUT
* and flush_data_end_io() is called
* - the other rqs still can't go ahead if not updating
* q->in_flight[BLK_RW_ASYNC] here, meantime these rqs
* are held in flush data queue and make no progress of
* handling post flush rq
* - only after the post flush rq is handled, all these rqs
* can be completed
*/
elv_completed_request(q, rq);
/* for avoiding double accounting */
rq->rq_flags &= ~RQF_STARTED;
/*
* After populating an empty queue, kick it to avoid stall. Read
* the comment in flush_end_io().
*/
if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
blk_run_queue_async(q);
blk_flush_queue_rq(flush_rq, false);
}
static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
{
struct request_queue *q = rq->q;
struct blk_mq_hw_ctx *hctx;
struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
struct blk_mq_ctx *ctx = rq->mq_ctx;
unsigned long flags;
struct blk_flush_queue *fq = blk_get_flush_queue(q, ctx);
hctx = blk_mq_map_queue(q, ctx->cpu);
if (q->elevator) {
WARN_ON(rq->tag < 0);
blk_mq_put_driver_tag_hctx(hctx, rq);
@ -443,9 +354,6 @@ void blk_insert_flush(struct request *rq)
unsigned int policy = blk_flush_policy(fflags, rq);
struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
if (!q->mq_ops)
lockdep_assert_held(q->queue_lock);
/*
* @policy now records what operations need to be done. Adjust
* REQ_PREFLUSH and FUA for the driver.
@ -468,10 +376,7 @@ void blk_insert_flush(struct request *rq)
* complete the request.
*/
if (!policy) {
if (q->mq_ops)
blk_mq_end_request(rq, 0);
else
__blk_end_request(rq, 0, 0);
blk_mq_end_request(rq, 0);
return;
}
@ -484,10 +389,7 @@ void blk_insert_flush(struct request *rq)
*/
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops)
blk_mq_request_bypass_insert(rq, false);
else
list_add_tail(&rq->queuelist, &q->queue_head);
blk_mq_request_bypass_insert(rq, false);
return;
}
@ -499,17 +401,12 @@ void blk_insert_flush(struct request *rq)
INIT_LIST_HEAD(&rq->flush.list);
rq->rq_flags |= RQF_FLUSH_SEQ;
rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
if (q->mq_ops) {
rq->end_io = mq_flush_data_end_io;
spin_lock_irq(&fq->mq_flush_lock);
blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
spin_unlock_irq(&fq->mq_flush_lock);
return;
}
rq->end_io = flush_data_end_io;
rq->end_io = mq_flush_data_end_io;
spin_lock_irq(&fq->mq_flush_lock);
blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
spin_unlock_irq(&fq->mq_flush_lock);
}
/**
@ -575,8 +472,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
if (!fq)
goto fail;
if (q->mq_ops)
spin_lock_init(&fq->mq_flush_lock);
spin_lock_init(&fq->mq_flush_lock);
rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
fq->flush_rq = kzalloc_node(rq_sz, flags, node);


@ -28,7 +28,6 @@ void get_io_context(struct io_context *ioc)
BUG_ON(atomic_long_read(&ioc->refcount) <= 0);
atomic_long_inc(&ioc->refcount);
}
EXPORT_SYMBOL(get_io_context);
static void icq_free_icq_rcu(struct rcu_head *head)
{
@ -48,10 +47,8 @@ static void ioc_exit_icq(struct io_cq *icq)
if (icq->flags & ICQ_EXITED)
return;
if (et->uses_mq && et->ops.mq.exit_icq)
et->ops.mq.exit_icq(icq);
else if (!et->uses_mq && et->ops.sq.elevator_exit_icq_fn)
et->ops.sq.elevator_exit_icq_fn(icq);
if (et->ops.exit_icq)
et->ops.exit_icq(icq);
icq->flags |= ICQ_EXITED;
}
@ -113,9 +110,9 @@ static void ioc_release_fn(struct work_struct *work)
struct io_cq, ioc_node);
struct request_queue *q = icq->q;
if (spin_trylock(q->queue_lock)) {
if (spin_trylock(&q->queue_lock)) {
ioc_destroy_icq(icq);
spin_unlock(q->queue_lock);
spin_unlock(&q->queue_lock);
} else {
spin_unlock_irqrestore(&ioc->lock, flags);
cpu_relax();
@ -162,7 +159,6 @@ void put_io_context(struct io_context *ioc)
if (free_ioc)
kmem_cache_free(iocontext_cachep, ioc);
}
EXPORT_SYMBOL(put_io_context);
/**
* put_io_context_active - put active reference on ioc
@ -173,7 +169,6 @@ EXPORT_SYMBOL(put_io_context);
*/
void put_io_context_active(struct io_context *ioc)
{
struct elevator_type *et;
unsigned long flags;
struct io_cq *icq;
@ -187,25 +182,12 @@ void put_io_context_active(struct io_context *ioc)
* reverse double locking. Read comment in ioc_release_fn() for
* explanation on the nested locking annotation.
*/
retry:
spin_lock_irqsave_nested(&ioc->lock, flags, 1);
hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) {
if (icq->flags & ICQ_EXITED)
continue;
et = icq->q->elevator->type;
if (et->uses_mq) {
ioc_exit_icq(icq);
} else {
if (spin_trylock(icq->q->queue_lock)) {
ioc_exit_icq(icq);
spin_unlock(icq->q->queue_lock);
} else {
spin_unlock_irqrestore(&ioc->lock, flags);
cpu_relax();
goto retry;
}
}
ioc_exit_icq(icq);
}
spin_unlock_irqrestore(&ioc->lock, flags);
@ -232,7 +214,7 @@ static void __ioc_clear_queue(struct list_head *icq_list)
while (!list_empty(icq_list)) {
struct io_cq *icq = list_entry(icq_list->next,
struct io_cq, q_node);
struct io_cq, q_node);
struct io_context *ioc = icq->ioc;
spin_lock_irqsave(&ioc->lock, flags);
@ -251,16 +233,11 @@ void ioc_clear_queue(struct request_queue *q)
{
LIST_HEAD(icq_list);
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
list_splice_init(&q->icq_list, &icq_list);
spin_unlock_irq(&q->queue_lock);
if (q->mq_ops) {
spin_unlock_irq(q->queue_lock);
__ioc_clear_queue(&icq_list);
} else {
__ioc_clear_queue(&icq_list);
spin_unlock_irq(q->queue_lock);
}
__ioc_clear_queue(&icq_list);
}
int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
@ -336,7 +313,6 @@ struct io_context *get_task_io_context(struct task_struct *task,
return NULL;
}
EXPORT_SYMBOL(get_task_io_context);
/**
* ioc_lookup_icq - lookup io_cq from ioc
@ -350,7 +326,7 @@ struct io_cq *ioc_lookup_icq(struct io_context *ioc, struct request_queue *q)
{
struct io_cq *icq;
lockdep_assert_held(q->queue_lock);
lockdep_assert_held(&q->queue_lock);
/*
* icq's are indexed from @ioc using radix tree and hint pointer,
@ -409,16 +385,14 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q,
INIT_HLIST_NODE(&icq->ioc_node);
/* lock both q and ioc and try to link @icq */
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
spin_lock(&ioc->lock);
if (likely(!radix_tree_insert(&ioc->icq_tree, q->id, icq))) {
hlist_add_head(&icq->ioc_node, &ioc->icq_list);
list_add(&icq->q_node, &q->icq_list);
if (et->uses_mq && et->ops.mq.init_icq)
et->ops.mq.init_icq(icq);
else if (!et->uses_mq && et->ops.sq.elevator_init_icq_fn)
et->ops.sq.elevator_init_icq_fn(icq);
if (et->ops.init_icq)
et->ops.init_icq(icq);
} else {
kmem_cache_free(et->icq_cache, icq);
icq = ioc_lookup_icq(ioc, q);
@ -427,7 +401,7 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q,
}
spin_unlock(&ioc->lock);
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
radix_tree_preload_end();
return icq;
}


@ -262,29 +262,25 @@ static inline void iolat_update_total_lat_avg(struct iolatency_grp *iolat,
stat->rqs.mean);
}
static inline bool iolatency_may_queue(struct iolatency_grp *iolat,
wait_queue_entry_t *wait,
bool first_block)
static void iolat_cleanup_cb(struct rq_wait *rqw, void *private_data)
{
struct rq_wait *rqw = &iolat->rq_wait;
atomic_dec(&rqw->inflight);
wake_up(&rqw->wait);
}
if (first_block && waitqueue_active(&rqw->wait) &&
rqw->wait.head.next != &wait->entry)
return false;
static bool iolat_acquire_inflight(struct rq_wait *rqw, void *private_data)
{
struct iolatency_grp *iolat = private_data;
return rq_wait_inc_below(rqw, iolat->rq_depth.max_depth);
}
static void __blkcg_iolatency_throttle(struct rq_qos *rqos,
struct iolatency_grp *iolat,
spinlock_t *lock, bool issue_as_root,
bool issue_as_root,
bool use_memdelay)
__releases(lock)
__acquires(lock)
{
struct rq_wait *rqw = &iolat->rq_wait;
unsigned use_delay = atomic_read(&lat_to_blkg(iolat)->use_delay);
DEFINE_WAIT(wait);
bool first_block = true;
if (use_delay)
blkcg_schedule_throttle(rqos->q, use_memdelay);
@ -301,27 +297,7 @@ static void __blkcg_iolatency_throttle(struct rq_qos *rqos,
return;
}
if (iolatency_may_queue(iolat, &wait, first_block))
return;
do {
prepare_to_wait_exclusive(&rqw->wait, &wait,
TASK_UNINTERRUPTIBLE);
if (iolatency_may_queue(iolat, &wait, first_block))
break;
first_block = false;
if (lock) {
spin_unlock_irq(lock);
io_schedule();
spin_lock_irq(lock);
} else {
io_schedule();
}
} while (1);
finish_wait(&rqw->wait, &wait);
rq_qos_wait(rqw, iolat, iolat_acquire_inflight, iolat_cleanup_cb);
}
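The open-coded wait loop deleted above is what rq_qos_wait() now encapsulates; a rough sketch of the contract between the two callbacks passed here (not the exact implementation):

/*
 *	while (!acquire_inflight_cb(rqw, private_data)) {
 *		prepare_to_wait_exclusive(&rqw->wait, &wait, TASK_UNINTERRUPTIBLE);
 *		if (acquire_inflight_cb(rqw, private_data))
 *			break;
 *		io_schedule();
 *	}
 *	finish_wait(&rqw->wait, &wait);
 *
 * Here acquire_inflight_cb is iolat_acquire_inflight() above; iolat_cleanup_cb()
 * is the completion-side half that drops an inflight slot and wakes the next
 * exclusive waiter. The spinlock juggling of the old loop is gone because the
 * submission path no longer holds a queue_lock.
 */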
#define SCALE_DOWN_FACTOR 2
@ -478,38 +454,15 @@ static void check_scale_change(struct iolatency_grp *iolat)
scale_change(iolat, direction > 0);
}
static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio,
spinlock_t *lock)
static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio)
{
struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
struct blkcg *blkcg;
struct blkcg_gq *blkg;
struct request_queue *q = rqos->q;
struct blkcg_gq *blkg = bio->bi_blkg;
bool issue_as_root = bio_issue_as_root_blkg(bio);
if (!blk_iolatency_enabled(blkiolat))
return;
rcu_read_lock();
blkcg = bio_blkcg(bio);
bio_associate_blkcg(bio, &blkcg->css);
blkg = blkg_lookup(blkcg, q);
if (unlikely(!blkg)) {
if (!lock)
spin_lock_irq(q->queue_lock);
blkg = blkg_lookup_create(blkcg, q);
if (IS_ERR(blkg))
blkg = NULL;
if (!lock)
spin_unlock_irq(q->queue_lock);
}
if (!blkg)
goto out;
bio_issue_init(&bio->bi_issue, bio_sectors(bio));
bio_associate_blkg(bio, blkg);
out:
rcu_read_unlock();
while (blkg && blkg->parent) {
struct iolatency_grp *iolat = blkg_to_lat(blkg);
if (!iolat) {
@ -518,7 +471,7 @@ out:
}
check_scale_change(iolat);
__blkcg_iolatency_throttle(rqos, iolat, lock, issue_as_root,
__blkcg_iolatency_throttle(rqos, iolat, issue_as_root,
(bio->bi_opf & REQ_SWAP) == REQ_SWAP);
blkg = blkg->parent;
}
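A minimal standalone illustration of the hierarchy walk above (toy types, not kernel code); throttling is evaluated at every level from the bio's blkg up to, but not including, the root:

struct toy_blkg {
	struct toy_blkg *parent;	/* NULL only for the root cgroup */
};

static void toy_throttle_up_the_hierarchy(struct toy_blkg *blkg)
{
	while (blkg && blkg->parent) {
		/* per-level scale check and throttle happen here */
		blkg = blkg->parent;
	}
}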
@ -640,7 +593,7 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
bool enabled = false;
blkg = bio->bi_blkg;
if (!blkg)
if (!blkg || !bio_flagged(bio, BIO_TRACKED))
return;
iolat = blkg_to_lat(bio->bi_blkg);
@ -730,7 +683,7 @@ static void blkiolatency_timer_fn(struct timer_list *t)
* We could be exiting, don't access the pd unless we have a
* ref on the blkg.
*/
if (!blkg_try_get(blkg))
if (!blkg_tryget(blkg))
continue;
iolat = blkg_to_lat(blkg);


@ -389,7 +389,6 @@ void blk_recount_segments(struct request_queue *q, struct bio *bio)
bio_set_flag(bio, BIO_SEG_VALID);
}
EXPORT_SYMBOL(blk_recount_segments);
static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
struct bio *nxt)
@ -596,17 +595,6 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
return ll_new_hw_segment(q, req, bio);
}
/*
* blk-mq uses req->special to carry normal driver per-request payload, it
* does not indicate a prepared command that we cannot merge with.
*/
static bool req_no_special_merge(struct request *req)
{
struct request_queue *q = req->q;
return !q->mq_ops && req->special;
}
static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
struct request *next)
{
@ -632,13 +620,6 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
unsigned int seg_size =
req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
/*
* First check if either of the requests are re-queued
* requests. Can't merge them if they are.
*/
if (req_no_special_merge(req) || req_no_special_merge(next))
return 0;
if (req_gap_back_merge(req, next->bio))
return 0;
@ -703,12 +684,10 @@ static void blk_account_io_merge(struct request *req)
{
if (blk_do_io_stat(req)) {
struct hd_struct *part;
int cpu;
cpu = part_stat_lock();
part_stat_lock();
part = req->part;
part_round_stats(req->q, cpu, part);
part_dec_in_flight(req->q, part, rq_data_dir(req));
hd_struct_put(part);
@ -731,7 +710,8 @@ static inline bool blk_discard_mergable(struct request *req)
return false;
}
enum elv_merge blk_try_req_merge(struct request *req, struct request *next)
static enum elv_merge blk_try_req_merge(struct request *req,
struct request *next)
{
if (blk_discard_mergable(req))
return ELEVATOR_DISCARD_MERGE;
@ -748,9 +728,6 @@ enum elv_merge blk_try_req_merge(struct request *req, struct request *next)
static struct request *attempt_merge(struct request_queue *q,
struct request *req, struct request *next)
{
if (!q->mq_ops)
lockdep_assert_held(q->queue_lock);
if (!rq_mergeable(req) || !rq_mergeable(next))
return NULL;
@ -758,8 +735,7 @@ static struct request *attempt_merge(struct request_queue *q,
return NULL;
if (rq_data_dir(req) != rq_data_dir(next)
|| req->rq_disk != next->rq_disk
|| req_no_special_merge(next))
|| req->rq_disk != next->rq_disk)
return NULL;
if (req_op(req) == REQ_OP_WRITE_SAME &&
@ -773,6 +749,9 @@ static struct request *attempt_merge(struct request_queue *q,
if (req->write_hint != next->write_hint)
return NULL;
if (req->ioprio != next->ioprio)
return NULL;
/*
* If we are allowed to merge, then append bio list
* from next to rq and release next. merge_requests_fn
@ -828,10 +807,6 @@ static struct request *attempt_merge(struct request_queue *q,
*/
blk_account_io_merge(next);
req->ioprio = ioprio_best(req->ioprio, next->ioprio);
if (blk_rq_cpu_valid(next))
req->cpu = next->cpu;
/*
* ownership of bio passed from next to req, return 'next' for
* the caller to free
@ -863,16 +838,11 @@ struct request *attempt_front_merge(struct request_queue *q, struct request *rq)
int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
struct request *next)
{
struct elevator_queue *e = q->elevator;
struct request *free;
if (!e->uses_mq && e->type->ops.sq.elevator_allow_rq_merge_fn)
if (!e->type->ops.sq.elevator_allow_rq_merge_fn(q, rq, next))
return 0;
free = attempt_merge(q, rq, next);
if (free) {
__blk_put_request(q, free);
blk_put_request(free);
return 1;
}
@ -891,8 +861,8 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
if (bio_data_dir(bio) != rq_data_dir(rq))
return false;
/* must be same device and not a special request */
if (rq->rq_disk != bio->bi_disk || req_no_special_merge(rq))
/* must be same device */
if (rq->rq_disk != bio->bi_disk)
return false;
/* only merge integrity protected bio into ditto rq */
@ -911,6 +881,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
if (rq->write_hint != bio->bi_write_hint)
return false;
if (rq->ioprio != bio_prio(bio))
return false;
return true;
}


@ -14,9 +14,10 @@
#include "blk.h"
#include "blk-mq.h"
static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
unsigned int nr_queues, const int cpu)
{
return cpu % nr_queues;
return qmap->queue_offset + (cpu % nr_queues);
}
static int get_first_sibling(unsigned int cpu)
@ -30,10 +31,10 @@ static int get_first_sibling(unsigned int cpu)
return cpu;
}
int blk_mq_map_queues(struct blk_mq_tag_set *set)
int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
unsigned int *map = set->mq_map;
unsigned int nr_queues = set->nr_hw_queues;
unsigned int *map = qmap->mq_map;
unsigned int nr_queues = qmap->nr_queues;
unsigned int cpu, first_sibling;
for_each_possible_cpu(cpu) {
@ -44,11 +45,11 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
* performance optimizations.
*/
if (cpu < nr_queues) {
map[cpu] = cpu_to_queue_index(nr_queues, cpu);
map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
} else {
first_sibling = get_first_sibling(cpu);
if (first_sibling == cpu)
map[cpu] = cpu_to_queue_index(nr_queues, cpu);
map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
else
map[cpu] = map[first_sibling];
}
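A standalone illustration of the new mapping rule: each map owns nr_queues hardware queues starting at queue_offset, and CPUs are spread round-robin across that window. The numbers below are made up:

#include <stdio.h>

static unsigned int toy_cpu_to_queue(unsigned int queue_offset,
				     unsigned int nr_queues, unsigned int cpu)
{
	return queue_offset + (cpu % nr_queues);
}

int main(void)
{
	/* e.g. a 2-queue "read" map placed after 4 default queues */
	for (unsigned int cpu = 0; cpu < 8; cpu++)
		printf("cpu %u -> hw queue %u\n", cpu, toy_cpu_to_queue(4, 2, cpu));
	return 0;
}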
@ -62,12 +63,12 @@ EXPORT_SYMBOL_GPL(blk_mq_map_queues);
* We have no quick way of doing reverse lookups. This is only used at
* queue init time, so runtime isn't important.
*/
int blk_mq_hw_queue_to_node(unsigned int *mq_map, unsigned int index)
int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int index)
{
int i;
for_each_possible_cpu(i) {
if (index == mq_map[i])
if (index == qmap->mq_map[i])
return local_memory_node(cpu_to_node(i));
}


@ -23,6 +23,7 @@
#include "blk-mq.h"
#include "blk-mq-debugfs.h"
#include "blk-mq-tag.h"
#include "blk-rq-qos.h"
static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
{
@ -112,10 +113,8 @@ static int queue_pm_only_show(void *data, struct seq_file *m)
#define QUEUE_FLAG_NAME(name) [QUEUE_FLAG_##name] = #name
static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(QUEUED),
QUEUE_FLAG_NAME(STOPPED),
QUEUE_FLAG_NAME(DYING),
QUEUE_FLAG_NAME(BYPASS),
QUEUE_FLAG_NAME(BIDI),
QUEUE_FLAG_NAME(NOMERGES),
QUEUE_FLAG_NAME(SAME_COMP),
@ -318,7 +317,6 @@ static const char *const cmd_flag_name[] = {
static const char *const rqf_name[] = {
RQF_NAME(SORTED),
RQF_NAME(STARTED),
RQF_NAME(QUEUED),
RQF_NAME(SOFTBARRIER),
RQF_NAME(FLUSH_SEQ),
RQF_NAME(MIXED_MERGE),
@ -424,15 +422,18 @@ struct show_busy_params {
/*
* Note: the state of a request may change while this function is in progress,
* e.g. due to a concurrent blk_mq_finish_request() call.
* e.g. due to a concurrent blk_mq_finish_request() call. Returns true to
* keep iterating requests.
*/
static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
static bool hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
{
const struct show_busy_params *params = data;
if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx)
if (rq->mq_hctx == params->hctx)
__blk_mq_debugfs_rq_show(params->m,
list_entry_rq(&rq->queuelist));
return true;
}
static int hctx_busy_show(void *data, struct seq_file *m)
@ -446,6 +447,21 @@ static int hctx_busy_show(void *data, struct seq_file *m)
return 0;
}
static const char *const hctx_types[] = {
[HCTX_TYPE_DEFAULT] = "default",
[HCTX_TYPE_READ] = "read",
[HCTX_TYPE_POLL] = "poll",
};
static int hctx_type_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
BUILD_BUG_ON(ARRAY_SIZE(hctx_types) != HCTX_MAX_TYPES);
seq_printf(m, "%s\n", hctx_types[hctx->type]);
return 0;
}
static int hctx_ctx_map_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
@ -636,36 +652,43 @@ static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
return 0;
}
static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
__acquires(&ctx->lock)
{
struct blk_mq_ctx *ctx = m->private;
spin_lock(&ctx->lock);
return seq_list_start(&ctx->rq_list, *pos);
#define CTX_RQ_SEQ_OPS(name, type) \
static void *ctx_##name##_rq_list_start(struct seq_file *m, loff_t *pos) \
__acquires(&ctx->lock) \
{ \
struct blk_mq_ctx *ctx = m->private; \
\
spin_lock(&ctx->lock); \
return seq_list_start(&ctx->rq_lists[type], *pos); \
} \
\
static void *ctx_##name##_rq_list_next(struct seq_file *m, void *v, \
loff_t *pos) \
{ \
struct blk_mq_ctx *ctx = m->private; \
\
return seq_list_next(v, &ctx->rq_lists[type], pos); \
} \
\
static void ctx_##name##_rq_list_stop(struct seq_file *m, void *v) \
__releases(&ctx->lock) \
{ \
struct blk_mq_ctx *ctx = m->private; \
\
spin_unlock(&ctx->lock); \
} \
\
static const struct seq_operations ctx_##name##_rq_list_seq_ops = { \
.start = ctx_##name##_rq_list_start, \
.next = ctx_##name##_rq_list_next, \
.stop = ctx_##name##_rq_list_stop, \
.show = blk_mq_debugfs_rq_show, \
}
static void *ctx_rq_list_next(struct seq_file *m, void *v, loff_t *pos)
{
struct blk_mq_ctx *ctx = m->private;
CTX_RQ_SEQ_OPS(default, HCTX_TYPE_DEFAULT);
CTX_RQ_SEQ_OPS(read, HCTX_TYPE_READ);
CTX_RQ_SEQ_OPS(poll, HCTX_TYPE_POLL);
return seq_list_next(v, &ctx->rq_list, pos);
}
static void ctx_rq_list_stop(struct seq_file *m, void *v)
__releases(&ctx->lock)
{
struct blk_mq_ctx *ctx = m->private;
spin_unlock(&ctx->lock);
}
static const struct seq_operations ctx_rq_list_seq_ops = {
.start = ctx_rq_list_start,
.next = ctx_rq_list_next,
.stop = ctx_rq_list_stop,
.show = blk_mq_debugfs_rq_show,
};
static int ctx_dispatched_show(void *data, struct seq_file *m)
{
struct blk_mq_ctx *ctx = data;
@ -798,11 +821,14 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"run", 0600, hctx_run_show, hctx_run_write},
{"active", 0400, hctx_active_show},
{"dispatch_busy", 0400, hctx_dispatch_busy_show},
{"type", 0400, hctx_type_show},
{},
};
static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
{"rq_list", 0400, .seq_ops = &ctx_rq_list_seq_ops},
{"default_rq_list", 0400, .seq_ops = &ctx_default_rq_list_seq_ops},
{"read_rq_list", 0400, .seq_ops = &ctx_read_rq_list_seq_ops},
{"poll_rq_list", 0400, .seq_ops = &ctx_poll_rq_list_seq_ops},
{"dispatched", 0600, ctx_dispatched_show, ctx_dispatched_write},
{"merged", 0600, ctx_merged_show, ctx_merged_write},
{"completed", 0600, ctx_completed_show, ctx_completed_write},
@ -856,6 +882,15 @@ int blk_mq_debugfs_register(struct request_queue *q)
goto err;
}
if (q->rq_qos) {
struct rq_qos *rqos = q->rq_qos;
while (rqos) {
blk_mq_debugfs_register_rqos(rqos);
rqos = rqos->next;
}
}
return 0;
err:
@ -978,6 +1013,50 @@ void blk_mq_debugfs_unregister_sched(struct request_queue *q)
q->sched_debugfs_dir = NULL;
}
void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
{
debugfs_remove_recursive(rqos->debugfs_dir);
rqos->debugfs_dir = NULL;
}
int blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
{
struct request_queue *q = rqos->q;
const char *dir_name = rq_qos_id_to_name(rqos->id);
if (!q->debugfs_dir)
return -ENOENT;
if (rqos->debugfs_dir || !rqos->ops->debugfs_attrs)
return 0;
if (!q->rqos_debugfs_dir) {
q->rqos_debugfs_dir = debugfs_create_dir("rqos",
q->debugfs_dir);
if (!q->rqos_debugfs_dir)
return -ENOMEM;
}
rqos->debugfs_dir = debugfs_create_dir(dir_name,
rqos->q->rqos_debugfs_dir);
if (!rqos->debugfs_dir)
return -ENOMEM;
if (!debugfs_create_files(rqos->debugfs_dir, rqos,
rqos->ops->debugfs_attrs))
goto err;
return 0;
err:
blk_mq_debugfs_unregister_rqos(rqos);
return -ENOMEM;
}
void blk_mq_debugfs_unregister_queue_rqos(struct request_queue *q)
{
debugfs_remove_recursive(q->rqos_debugfs_dir);
q->rqos_debugfs_dir = NULL;
}
int blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
struct blk_mq_hw_ctx *hctx)
{


@ -31,6 +31,10 @@ void blk_mq_debugfs_unregister_sched(struct request_queue *q);
int blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
struct blk_mq_hw_ctx *hctx);
void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
int blk_mq_debugfs_register_rqos(struct rq_qos *rqos);
void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos);
void blk_mq_debugfs_unregister_queue_rqos(struct request_queue *q);
#else
static inline int blk_mq_debugfs_register(struct request_queue *q)
{
@ -78,6 +82,19 @@ static inline int blk_mq_debugfs_register_sched_hctx(struct request_queue *q,
static inline void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx)
{
}
static inline int blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
{
return 0;
}
static inline void blk_mq_debugfs_unregister_rqos(struct rq_qos *rqos)
{
}
static inline void blk_mq_debugfs_unregister_queue_rqos(struct request_queue *q)
{
}
#endif
#ifdef CONFIG_BLK_DEBUG_FS_ZONED


@ -31,26 +31,26 @@
* that maps a queue to the CPUs that have irq affinity for the corresponding
* vector.
*/
int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev,
int offset)
{
const struct cpumask *mask;
unsigned int queue, cpu;
for (queue = 0; queue < set->nr_hw_queues; queue++) {
for (queue = 0; queue < qmap->nr_queues; queue++) {
mask = pci_irq_get_affinity(pdev, queue + offset);
if (!mask)
goto fallback;
for_each_cpu(cpu, mask)
set->mq_map[cpu] = queue;
qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
return 0;
fallback:
WARN_ON_ONCE(set->nr_hw_queues > 1);
blk_mq_clear_mq_map(set);
WARN_ON_ONCE(qmap->nr_queues > 1);
blk_mq_clear_mq_map(qmap);
return 0;
}
EXPORT_SYMBOL_GPL(blk_mq_pci_map_queues);


@ -29,24 +29,24 @@
* @set->nr_hw_queues, or @dev does not provide an affinity mask for a
* vector, we fallback to the naive mapping.
*/
int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map,
struct ib_device *dev, int first_vec)
{
const struct cpumask *mask;
unsigned int queue, cpu;
for (queue = 0; queue < set->nr_hw_queues; queue++) {
for (queue = 0; queue < map->nr_queues; queue++) {
mask = ib_get_vector_affinity(dev, first_vec + queue);
if (!mask)
goto fallback;
for_each_cpu(cpu, mask)
set->mq_map[cpu] = queue;
map->mq_map[cpu] = map->queue_offset + queue;
}
return 0;
fallback:
return blk_mq_map_queues(set);
return blk_mq_map_queues(map);
}
EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);


@ -31,15 +31,22 @@ void blk_mq_sched_free_hctx_data(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(blk_mq_sched_free_hctx_data);
void blk_mq_sched_assign_ioc(struct request *rq, struct bio *bio)
void blk_mq_sched_assign_ioc(struct request *rq)
{
struct request_queue *q = rq->q;
struct io_context *ioc = rq_ioc(bio);
struct io_context *ioc;
struct io_cq *icq;
spin_lock_irq(q->queue_lock);
/*
* May not have an IO context if it's a passthrough request
*/
ioc = current->io_context;
if (!ioc)
return;
spin_lock_irq(&q->queue_lock);
icq = ioc_lookup_icq(ioc, q);
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
if (!icq) {
icq = ioc_create_icq(ioc, q, GFP_ATOMIC);
@ -54,13 +61,14 @@ void blk_mq_sched_assign_ioc(struct request *rq, struct bio *bio)
* Mark a hardware queue as needing a restart. For shared queues, maintain
* a count of how many hardware queues are marked for restart.
*/
static void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
{
if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
return;
set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
}
EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
{
@ -85,14 +93,13 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
do {
struct request *rq;
if (e->type->ops.mq.has_work &&
!e->type->ops.mq.has_work(hctx))
if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
break;
if (!blk_mq_get_dispatch_budget(hctx))
break;
rq = e->type->ops.mq.dispatch_request(hctx);
rq = e->type->ops.dispatch_request(hctx);
if (!rq) {
blk_mq_put_dispatch_budget(hctx);
break;
@ -110,7 +117,7 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx)
{
unsigned idx = ctx->index_hw;
unsigned short idx = ctx->index_hw[hctx->type];
if (++idx == hctx->nr_ctx)
idx = 0;
@ -163,7 +170,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
LIST_HEAD(rq_list);
/* RCU or SRCU read lock is needed before checking quiesced flag */
@ -295,11 +302,14 @@ EXPORT_SYMBOL_GPL(blk_mq_bio_list_merge);
* too much time checking for merges.
*/
static bool blk_mq_attempt_merge(struct request_queue *q,
struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx, struct bio *bio)
{
enum hctx_type type = hctx->type;
lockdep_assert_held(&ctx->lock);
if (blk_mq_bio_list_merge(q, &ctx->rq_list, bio)) {
if (blk_mq_bio_list_merge(q, &ctx->rq_lists[type], bio)) {
ctx->rq_merged++;
return true;
}
@ -311,19 +321,21 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
{
struct elevator_queue *e = q->elevator;
struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx->cpu);
bool ret = false;
enum hctx_type type;
if (e && e->type->ops.mq.bio_merge) {
if (e && e->type->ops.bio_merge) {
blk_mq_put_ctx(ctx);
return e->type->ops.mq.bio_merge(hctx, bio);
return e->type->ops.bio_merge(hctx, bio);
}
type = hctx->type;
if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
!list_empty_careful(&ctx->rq_list)) {
!list_empty_careful(&ctx->rq_lists[type])) {
/* default per sw-queue merge */
spin_lock(&ctx->lock);
ret = blk_mq_attempt_merge(q, ctx, bio);
ret = blk_mq_attempt_merge(q, hctx, ctx, bio);
spin_unlock(&ctx->lock);
}
@ -367,7 +379,7 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
struct blk_mq_ctx *ctx = rq->mq_ctx;
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
/* flush rq in flush machinery need to be dispatched directly */
if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
@ -380,11 +392,11 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
goto run;
if (e && e->type->ops.mq.insert_requests) {
if (e && e->type->ops.insert_requests) {
LIST_HEAD(list);
list_add(&rq->queuelist, &list);
e->type->ops.mq.insert_requests(hctx, &list, at_head);
e->type->ops.insert_requests(hctx, &list, at_head);
} else {
spin_lock(&ctx->lock);
__blk_mq_insert_request(hctx, rq, at_head);
@ -396,27 +408,25 @@ run:
blk_mq_run_hw_queue(hctx, async);
}
void blk_mq_sched_insert_requests(struct request_queue *q,
void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx,
struct list_head *list, bool run_queue_async)
{
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
struct elevator_queue *e = hctx->queue->elevator;
struct elevator_queue *e;
if (e && e->type->ops.mq.insert_requests)
e->type->ops.mq.insert_requests(hctx, list, false);
e = hctx->queue->elevator;
if (e && e->type->ops.insert_requests)
e->type->ops.insert_requests(hctx, list, false);
else {
/*
* try to issue requests directly if the hw queue isn't
* busy in case of 'none' scheduler, and this way may save
* us one extra enqueue & dequeue to sw queue.
*/
if (!hctx->dispatch_busy && !e && !run_queue_async) {
if (!hctx->dispatch_busy && !e && !run_queue_async)
blk_mq_try_issue_list_directly(hctx, list);
if (list_empty(list))
return;
}
blk_mq_insert_requests(hctx, ctx, list);
else
blk_mq_insert_requests(hctx, ctx, list);
}
blk_mq_run_hw_queue(hctx, run_queue_async);
@ -489,15 +499,15 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
goto err;
}
ret = e->ops.mq.init_sched(q, e);
ret = e->ops.init_sched(q, e);
if (ret)
goto err;
blk_mq_debugfs_register_sched(q);
queue_for_each_hw_ctx(q, hctx, i) {
if (e->ops.mq.init_hctx) {
ret = e->ops.mq.init_hctx(hctx, i);
if (e->ops.init_hctx) {
ret = e->ops.init_hctx(hctx, i);
if (ret) {
eq = q->elevator;
blk_mq_exit_sched(q, eq);
@ -523,14 +533,14 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
queue_for_each_hw_ctx(q, hctx, i) {
blk_mq_debugfs_unregister_sched_hctx(hctx);
if (e->type->ops.mq.exit_hctx && hctx->sched_data) {
e->type->ops.mq.exit_hctx(hctx, i);
if (e->type->ops.exit_hctx && hctx->sched_data) {
e->type->ops.exit_hctx(hctx, i);
hctx->sched_data = NULL;
}
}
blk_mq_debugfs_unregister_sched(q);
if (e->type->ops.mq.exit_sched)
e->type->ops.mq.exit_sched(e);
if (e->type->ops.exit_sched)
e->type->ops.exit_sched(e);
blk_mq_sched_tags_teardown(q);
q->elevator = NULL;
}


@ -8,18 +8,19 @@
void blk_mq_sched_free_hctx_data(struct request_queue *q,
void (*exit)(struct blk_mq_hw_ctx *));
void blk_mq_sched_assign_ioc(struct request *rq, struct bio *bio);
void blk_mq_sched_assign_ioc(struct request *rq);
void blk_mq_sched_request_inserted(struct request *rq);
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
struct request **merged_request);
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_insert_request(struct request *rq, bool at_head,
bool run_queue, bool async);
void blk_mq_sched_insert_requests(struct request_queue *q,
void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx,
struct list_head *list, bool run_queue_async);
@ -43,8 +44,8 @@ blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
{
struct elevator_queue *e = q->elevator;
if (e && e->type->ops.mq.allow_merge)
return e->type->ops.mq.allow_merge(q, rq, bio);
if (e && e->type->ops.allow_merge)
return e->type->ops.allow_merge(q, rq, bio);
return true;
}
@ -53,8 +54,8 @@ static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
{
struct elevator_queue *e = rq->q->elevator;
if (e && e->type->ops.mq.completed_request)
e->type->ops.mq.completed_request(rq, now);
if (e && e->type->ops.completed_request)
e->type->ops.completed_request(rq, now);
}
static inline void blk_mq_sched_started_request(struct request *rq)
@ -62,8 +63,8 @@ static inline void blk_mq_sched_started_request(struct request *rq)
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
if (e && e->type->ops.mq.started_request)
e->type->ops.mq.started_request(rq);
if (e && e->type->ops.started_request)
e->type->ops.started_request(rq);
}
static inline void blk_mq_sched_requeue_request(struct request *rq)
@ -71,16 +72,16 @@ static inline void blk_mq_sched_requeue_request(struct request *rq)
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
if (e && e->type->ops.mq.requeue_request)
e->type->ops.mq.requeue_request(rq);
if (e && e->type->ops.requeue_request)
e->type->ops.requeue_request(rq);
}
static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
{
struct elevator_queue *e = hctx->queue->elevator;
if (e && e->type->ops.mq.has_work)
return e->type->ops.mq.has_work(hctx);
if (e && e->type->ops.has_work)
return e->type->ops.has_work(hctx);
return false;
}


@ -15,6 +15,18 @@
static void blk_mq_sysfs_release(struct kobject *kobj)
{
struct blk_mq_ctxs *ctxs = container_of(kobj, struct blk_mq_ctxs, kobj);
free_percpu(ctxs->queue_ctx);
kfree(ctxs);
}
static void blk_mq_ctx_sysfs_release(struct kobject *kobj)
{
struct blk_mq_ctx *ctx = container_of(kobj, struct blk_mq_ctx, kobj);
/* ctx->ctxs won't be released until all ctx are freed */
kobject_put(&ctx->ctxs->kobj);
}
static void blk_mq_hw_sysfs_release(struct kobject *kobj)
@ -203,7 +215,7 @@ static struct kobj_type blk_mq_ktype = {
static struct kobj_type blk_mq_ctx_ktype = {
.sysfs_ops = &blk_mq_sysfs_ops,
.default_attrs = default_ctx_attrs,
.release = blk_mq_sysfs_release,
.release = blk_mq_ctx_sysfs_release,
};
static struct kobj_type blk_mq_hw_ktype = {
@ -235,7 +247,7 @@ static int blk_mq_register_hctx(struct blk_mq_hw_ctx *hctx)
if (!hctx->nr_ctx)
return 0;
ret = kobject_add(&hctx->kobj, &q->mq_kobj, "%u", hctx->queue_num);
ret = kobject_add(&hctx->kobj, q->mq_kobj, "%u", hctx->queue_num);
if (ret)
return ret;
@ -258,8 +270,8 @@ void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_unregister_hctx(hctx);
kobject_uevent(&q->mq_kobj, KOBJ_REMOVE);
kobject_del(&q->mq_kobj);
kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
kobject_del(q->mq_kobj);
kobject_put(&dev->kobj);
q->mq_sysfs_init_done = false;
@ -279,7 +291,7 @@ void blk_mq_sysfs_deinit(struct request_queue *q)
ctx = per_cpu_ptr(q->queue_ctx, cpu);
kobject_put(&ctx->kobj);
}
kobject_put(&q->mq_kobj);
kobject_put(q->mq_kobj);
}
void blk_mq_sysfs_init(struct request_queue *q)
@ -287,10 +299,12 @@ void blk_mq_sysfs_init(struct request_queue *q)
struct blk_mq_ctx *ctx;
int cpu;
kobject_init(&q->mq_kobj, &blk_mq_ktype);
kobject_init(q->mq_kobj, &blk_mq_ktype);
for_each_possible_cpu(cpu) {
ctx = per_cpu_ptr(q->queue_ctx, cpu);
kobject_get(q->mq_kobj);
kobject_init(&ctx->kobj, &blk_mq_ctx_ktype);
}
}
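Read together with the two release callbacks at the top of this file, the kobject_get() added here pins the new blk_mq_ctxs container until every per-cpu ctx kobject has been dropped. The intended teardown order, as inferred from these hunks:

/*
 *	blk_mq_sysfs_init():         kobject_get(q->mq_kobj) once per cpu ctx
 *	blk_mq_sysfs_deinit():       kobject_put(&ctx->kobj) for each cpu, then
 *	                             kobject_put(q->mq_kobj)
 *	blk_mq_ctx_sysfs_release():  drops that ctx's reference on the ctxs kobject
 *	blk_mq_sysfs_release():      final put frees ctxs->queue_ctx and the
 *	                             blk_mq_ctxs container itself
 */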
@ -303,11 +317,11 @@ int __blk_mq_register_dev(struct device *dev, struct request_queue *q)
WARN_ON_ONCE(!q->kobj.parent);
lockdep_assert_held(&q->sysfs_lock);
ret = kobject_add(&q->mq_kobj, kobject_get(&dev->kobj), "%s", "mq");
ret = kobject_add(q->mq_kobj, kobject_get(&dev->kobj), "%s", "mq");
if (ret < 0)
goto out;
kobject_uevent(&q->mq_kobj, KOBJ_ADD);
kobject_uevent(q->mq_kobj, KOBJ_ADD);
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_register_hctx(hctx);
@ -324,8 +338,8 @@ unreg:
while (--i >= 0)
blk_mq_unregister_hctx(q->queue_hw_ctx[i]);
kobject_uevent(&q->mq_kobj, KOBJ_REMOVE);
kobject_del(&q->mq_kobj);
kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
kobject_del(q->mq_kobj);
kobject_put(&dev->kobj);
return ret;
}
@ -340,7 +354,6 @@ int blk_mq_register_dev(struct device *dev, struct request_queue *q)
return ret;
}
EXPORT_SYMBOL_GPL(blk_mq_register_dev);
void blk_mq_sysfs_unregister(struct request_queue *q)
{


@ -110,7 +110,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
struct sbitmap_queue *bt;
struct sbq_wait_state *ws;
DEFINE_WAIT(wait);
DEFINE_SBQ_WAIT(wait);
unsigned int tag_offset;
bool drop_ctx;
int tag;
@ -154,8 +154,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
if (tag != -1)
break;
prepare_to_wait_exclusive(&ws->wait, &wait,
TASK_UNINTERRUPTIBLE);
sbitmap_prepare_to_wait(bt, ws, &wait, TASK_UNINTERRUPTIBLE);
tag = __blk_mq_get_tag(data, bt);
if (tag != -1)
@ -167,16 +166,17 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
bt_prev = bt;
io_schedule();
sbitmap_finish_wait(bt, ws, &wait);
data->ctx = blk_mq_get_ctx(data->q);
data->hctx = blk_mq_map_queue(data->q, data->ctx->cpu);
data->hctx = blk_mq_map_queue(data->q, data->cmd_flags,
data->ctx->cpu);
tags = blk_mq_tags_from_data(data);
if (data->flags & BLK_MQ_REQ_RESERVED)
bt = &tags->breserved_tags;
else
bt = &tags->bitmap_tags;
finish_wait(&ws->wait, &wait);
/*
* If destination hw queue is changed, fake wake up on
* previous queue for compensating the wake up miss, so
@ -191,7 +191,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
if (drop_ctx && data->ctx)
blk_mq_put_ctx(data->ctx);
finish_wait(&ws->wait, &wait);
sbitmap_finish_wait(bt, ws, &wait);
found_tag:
return tag + tag_offset;
@ -235,7 +235,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* test and set the bit before assigning ->rqs[].
*/
if (rq && rq->q == hctx->queue)
iter_data->fn(hctx, rq, iter_data->data, reserved);
return iter_data->fn(hctx, rq, iter_data->data, reserved);
return true;
}
@ -247,7 +247,8 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* @fn: Pointer to the function that will be called for each request
* associated with @hctx that has been assigned a driver tag.
* @fn will be called as follows: @fn(@hctx, rq, @data, @reserved)
* where rq is a pointer to a request.
* where rq is a pointer to a request. Return true to continue
* iterating tags, false to stop.
* @data: Will be passed as third argument to @fn.
* @reserved: Indicates whether @bt is the breserved_tags member or the
* bitmap_tags member of struct blk_mq_tags.
@ -288,7 +289,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
*/
rq = tags->rqs[bitnr];
if (rq && blk_mq_request_started(rq))
iter_data->fn(rq, iter_data->data, reserved);
return iter_data->fn(rq, iter_data->data, reserved);
return true;
}
@ -300,7 +301,8 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* or the bitmap_tags member of struct blk_mq_tags.
* @fn: Pointer to the function that will be called for each started
* request. @fn will be called as follows: @fn(rq, @data,
* @reserved) where rq is a pointer to a request.
* @reserved) where rq is a pointer to a request. Return true
* to continue iterating tags, false to stop.
* @data: Will be passed as second argument to @fn.
* @reserved: Indicates whether @bt is the breserved_tags member or the
* bitmap_tags member of struct blk_mq_tags.
@ -325,7 +327,8 @@ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
* @fn: Pointer to the function that will be called for each started
* request. @fn will be called as follows: @fn(rq, @priv,
* reserved) where rq is a pointer to a request. 'reserved'
* indicates whether or not @rq is a reserved request.
* indicates whether or not @rq is a reserved request. Return
* true to continue iterating tags, false to stop.
* @priv: Will be passed as second argument to @fn.
*/
static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
@ -342,7 +345,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
* @fn: Pointer to the function that will be called for each started
* request. @fn will be called as follows: @fn(rq, @priv,
* reserved) where rq is a pointer to a request. 'reserved'
* indicates whether or not @rq is a reserved request.
* indicates whether or not @rq is a reserved request. Return
* true to continue iterating tags, false to stop.
* @priv: Will be passed as second argument to @fn.
*/
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
@ -526,16 +530,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
*/
u32 blk_mq_unique_tag(struct request *rq)
{
struct request_queue *q = rq->q;
struct blk_mq_hw_ctx *hctx;
int hwq = 0;
if (q->mq_ops) {
hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
hwq = hctx->queue_num;
}
return (hwq << BLK_MQ_UNIQUE_TAG_BITS) |
return (rq->mq_hctx->queue_num << BLK_MQ_UNIQUE_TAG_BITS) |
(rq->tag & BLK_MQ_UNIQUE_TAG_MASK);
}
EXPORT_SYMBOL(blk_mq_unique_tag);
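With bt_iter()/bt_tags_iter() now propagating the callback's return value, busy-tag iteration callbacks can terminate the walk early. A sketch of a callback written against the new convention (the struct and counter here are made up for illustration):

struct request;	/* only passed through, never dereferenced in this toy */

struct stop_after {
	unsigned int seen;
	unsigned int max;
};

static bool count_and_maybe_stop(struct request *rq, void *priv, bool reserved)
{
	struct stop_after *s = priv;

	s->seen++;
	return s->seen < s->max;	/* returning false stops the iteration */
}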


@ -29,7 +29,7 @@
* that maps a queue to the CPUs that have irq affinity for the corresponding
* vector.
*/
int blk_mq_virtio_map_queues(struct blk_mq_tag_set *set,
int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
struct virtio_device *vdev, int first_vec)
{
const struct cpumask *mask;
@ -38,17 +38,17 @@ int blk_mq_virtio_map_queues(struct blk_mq_tag_set *set,
if (!vdev->config->get_vq_affinity)
goto fallback;
for (queue = 0; queue < set->nr_hw_queues; queue++) {
for (queue = 0; queue < qmap->nr_queues; queue++) {
mask = vdev->config->get_vq_affinity(vdev, first_vec + queue);
if (!mask)
goto fallback;
for_each_cpu(cpu, mask)
set->mq_map[cpu] = queue;
qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
return 0;
fallback:
return blk_mq_map_queues(set);
return blk_mq_map_queues(qmap);
}
EXPORT_SYMBOL_GPL(blk_mq_virtio_map_queues);

File diff suppressed because it is too large.


@ -7,17 +7,22 @@
struct blk_mq_tag_set;
struct blk_mq_ctxs {
struct kobject kobj;
struct blk_mq_ctx __percpu *queue_ctx;
};
/**
* struct blk_mq_ctx - State for a software queue facing the submitting CPUs
*/
struct blk_mq_ctx {
struct {
spinlock_t lock;
struct list_head rq_list;
} ____cacheline_aligned_in_smp;
struct list_head rq_lists[HCTX_MAX_TYPES];
} ____cacheline_aligned_in_smp;
unsigned int cpu;
unsigned int index_hw;
unsigned short index_hw[HCTX_MAX_TYPES];
/* incremented at dispatch time */
unsigned long rq_dispatched[2];
@ -27,6 +32,7 @@ struct blk_mq_ctx {
unsigned long ____cacheline_aligned_in_smp rq_completed[2];
struct request_queue *queue;
struct blk_mq_ctxs *ctxs;
struct kobject kobj;
} ____cacheline_aligned_in_smp;
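The single rq_list has become one list per hctx type, indexed by the hardware context's type. The full blk-mq.c diff is suppressed above, so as a sketch consistent with the blk_mq_attempt_merge() hunk earlier, insertion into a software queue now looks roughly like:

/*
 *	enum hctx_type type = hctx->type;
 *
 *	spin_lock(&ctx->lock);
 *	list_add_tail(&rq->queuelist, &ctx->rq_lists[type]);
 *	spin_unlock(&ctx->lock);
 */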
@ -62,20 +68,55 @@ void blk_mq_request_bypass_insert(struct request *rq, bool run_queue);
void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
struct list_head *list);
/* Used by blk_insert_cloned_request() to issue request directly */
blk_status_t blk_mq_request_issue_directly(struct request *rq);
blk_status_t blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
struct request *rq,
blk_qc_t *cookie,
bool bypass, bool last);
void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
struct list_head *list);
/*
* CPU -> queue mappings
*/
extern int blk_mq_hw_queue_to_node(unsigned int *map, unsigned int);
extern int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int);
static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
int cpu)
/*
* blk_mq_map_queue_type() - map (hctx_type,cpu) to hardware queue
* @q: request queue
* @type: the hctx type index
* @cpu: CPU
*/
static inline struct blk_mq_hw_ctx *blk_mq_map_queue_type(struct request_queue *q,
enum hctx_type type,
unsigned int cpu)
{
return q->queue_hw_ctx[q->mq_map[cpu]];
return q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]];
}
/*
* blk_mq_map_queue() - map (cmd_flags,type) to hardware queue
* @q: request queue
* @flags: request command flags
* @cpu: CPU
*/
static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
unsigned int flags,
unsigned int cpu)
{
enum hctx_type type = HCTX_TYPE_DEFAULT;
if ((flags & REQ_HIPRI) &&
q->tag_set->nr_maps > HCTX_TYPE_POLL &&
q->tag_set->map[HCTX_TYPE_POLL].nr_queues &&
test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
type = HCTX_TYPE_POLL;
else if (((flags & REQ_OP_MASK) == REQ_OP_READ) &&
q->tag_set->nr_maps > HCTX_TYPE_READ &&
q->tag_set->map[HCTX_TYPE_READ].nr_queues)
type = HCTX_TYPE_READ;
return blk_mq_map_queue_type(q, type, cpu);
}
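The selection order above (poll wins over read, read wins over default, with a fallback whenever a map has no queues) can be condensed into a small standalone model; the names and boolean parameters are illustrative only:

#include <stdbool.h>

enum toy_hctx_type { TOY_TYPE_DEFAULT, TOY_TYPE_READ, TOY_TYPE_POLL };

static enum toy_hctx_type toy_pick_map(bool hipri, bool is_read,
				       bool have_poll_map, bool have_read_map)
{
	if (hipri && have_poll_map)
		return TOY_TYPE_POLL;	/* REQ_HIPRI and a populated poll map */
	if (is_read && have_read_map)
		return TOY_TYPE_READ;	/* reads get their own map when provided */
	return TOY_TYPE_DEFAULT;	/* everything else */
}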
/*
@ -126,6 +167,7 @@ struct blk_mq_alloc_data {
struct request_queue *q;
blk_mq_req_flags_t flags;
unsigned int shallow_depth;
unsigned int cmd_flags;
/* input & output parameter */
struct blk_mq_ctx *ctx;
@ -150,8 +192,7 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
return hctx->nr_ctx && hctx->tags;
}
void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
unsigned int inflight[2]);
unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part);
void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part,
unsigned int inflight[2]);
@ -195,21 +236,18 @@ static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
static inline void blk_mq_put_driver_tag(struct request *rq)
{
struct blk_mq_hw_ctx *hctx;
if (rq->tag == -1 || rq->internal_tag == -1)
return;
hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
__blk_mq_put_driver_tag(hctx, rq);
__blk_mq_put_driver_tag(rq->mq_hctx, rq);
}
static inline void blk_mq_clear_mq_map(struct blk_mq_tag_set *set)
static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
{
int cpu;
for_each_possible_cpu(cpu)
set->mq_map[cpu] = 0;
qmap->mq_map[cpu] = 0;
}
#endif


@ -89,12 +89,12 @@ int blk_pre_runtime_suspend(struct request_queue *q)
/* Switch q_usage_counter back to per-cpu mode. */
blk_mq_unfreeze_queue(q);
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
if (ret < 0)
pm_runtime_mark_last_busy(q->dev);
else
q->rpm_status = RPM_SUSPENDING;
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
if (ret)
blk_clear_pm_only(q);
@ -121,14 +121,14 @@ void blk_post_runtime_suspend(struct request_queue *q, int err)
if (!q->dev)
return;
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
if (!err) {
q->rpm_status = RPM_SUSPENDED;
} else {
q->rpm_status = RPM_ACTIVE;
pm_runtime_mark_last_busy(q->dev);
}
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
if (err)
blk_clear_pm_only(q);
@ -151,9 +151,9 @@ void blk_pre_runtime_resume(struct request_queue *q)
if (!q->dev)
return;
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
q->rpm_status = RPM_RESUMING;
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
}
EXPORT_SYMBOL(blk_pre_runtime_resume);
@ -176,7 +176,7 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
if (!q->dev)
return;
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
if (!err) {
q->rpm_status = RPM_ACTIVE;
pm_runtime_mark_last_busy(q->dev);
@ -184,7 +184,7 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
} else {
q->rpm_status = RPM_SUSPENDED;
}
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
if (!err)
blk_clear_pm_only(q);
@ -207,10 +207,10 @@ EXPORT_SYMBOL(blk_post_runtime_resume);
*/
void blk_set_runtime_active(struct request_queue *q)
{
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
q->rpm_status = RPM_ACTIVE;
pm_runtime_mark_last_busy(q->dev);
pm_request_autosuspend(q->dev);
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
}
EXPORT_SYMBOL(blk_set_runtime_active);


@ -21,7 +21,7 @@ static inline void blk_pm_mark_last_busy(struct request *rq)
static inline void blk_pm_requeue_request(struct request *rq)
{
lockdep_assert_held(rq->q->queue_lock);
lockdep_assert_held(&rq->q->queue_lock);
if (rq->q->dev && !(rq->rq_flags & RQF_PM))
rq->q->nr_pending--;
@ -30,7 +30,7 @@ static inline void blk_pm_requeue_request(struct request *rq)
static inline void blk_pm_add_request(struct request_queue *q,
struct request *rq)
{
lockdep_assert_held(q->queue_lock);
lockdep_assert_held(&q->queue_lock);
if (q->dev && !(rq->rq_flags & RQF_PM))
q->nr_pending++;
@ -38,7 +38,7 @@ static inline void blk_pm_add_request(struct request_queue *q,
static inline void blk_pm_put_request(struct request *rq)
{
lockdep_assert_held(rq->q->queue_lock);
lockdep_assert_held(&rq->q->queue_lock);
if (rq->q->dev && !(rq->rq_flags & RQF_PM))
--rq->q->nr_pending;


@ -27,75 +27,67 @@ bool rq_wait_inc_below(struct rq_wait *rq_wait, unsigned int limit)
return atomic_inc_below(&rq_wait->inflight, limit);
}
void rq_qos_cleanup(struct request_queue *q, struct bio *bio)
void __rq_qos_cleanup(struct rq_qos *rqos, struct bio *bio)
{
struct rq_qos *rqos;
for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
do {
if (rqos->ops->cleanup)
rqos->ops->cleanup(rqos, bio);
}
rqos = rqos->next;
} while (rqos);
}
void rq_qos_done(struct request_queue *q, struct request *rq)
void __rq_qos_done(struct rq_qos *rqos, struct request *rq)
{
struct rq_qos *rqos;
for (rqos = q->rq_qos; rqos; rqos = rqos->next) {
do {
if (rqos->ops->done)
rqos->ops->done(rqos, rq);
}
rqos = rqos->next;
} while (rqos);
}
void rq_qos_issue(struct request_queue *q, struct request *rq)
void __rq_qos_issue(struct rq_qos *rqos, struct request *rq)
{
struct rq_qos *rqos;
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
do {
if (rqos->ops->issue)
rqos->ops->issue(rqos, rq);
}
rqos = rqos->next;
} while (rqos);
}
void rq_qos_requeue(struct request_queue *q, struct request *rq)
void __rq_qos_requeue(struct rq_qos *rqos, struct request *rq)
{
struct rq_qos *rqos;
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
do {
if (rqos->ops->requeue)
rqos->ops->requeue(rqos, rq);
}
rqos = rqos->next;
} while (rqos);
}
void rq_qos_throttle(struct request_queue *q, struct bio *bio,
spinlock_t *lock)
void __rq_qos_throttle(struct rq_qos *rqos, struct bio *bio)
{
struct rq_qos *rqos;
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
do {
if (rqos->ops->throttle)
rqos->ops->throttle(rqos, bio, lock);
}
rqos->ops->throttle(rqos, bio);
rqos = rqos->next;
} while (rqos);
}
void rq_qos_track(struct request_queue *q, struct request *rq, struct bio *bio)
void __rq_qos_track(struct rq_qos *rqos, struct request *rq, struct bio *bio)
{
struct rq_qos *rqos;
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
do {
if (rqos->ops->track)
rqos->ops->track(rqos, rq, bio);
}
rqos = rqos->next;
} while (rqos);
}
void rq_qos_done_bio(struct request_queue *q, struct bio *bio)
void __rq_qos_done_bio(struct rq_qos *rqos, struct bio *bio)
{
struct rq_qos *rqos;
for(rqos = q->rq_qos; rqos; rqos = rqos->next) {
do {
if (rqos->ops->done_bio)
rqos->ops->done_bio(rqos, bio);
}
rqos = rqos->next;
} while (rqos);
}
/*
@ -184,8 +176,96 @@ void rq_depth_scale_down(struct rq_depth *rqd, bool hard_throttle)
rq_depth_calc_max_depth(rqd);
}
struct rq_qos_wait_data {
struct wait_queue_entry wq;
struct task_struct *task;
struct rq_wait *rqw;
acquire_inflight_cb_t *cb;
void *private_data;
bool got_token;
};
static int rq_qos_wake_function(struct wait_queue_entry *curr,
unsigned int mode, int wake_flags, void *key)
{
struct rq_qos_wait_data *data = container_of(curr,
struct rq_qos_wait_data,
wq);
/*
* If we fail to get a budget, return -1 to interrupt the wake up loop
* in __wake_up_common.
*/
if (!data->cb(data->rqw, data->private_data))
return -1;
data->got_token = true;
list_del_init(&curr->entry);
wake_up_process(data->task);
return 1;
}
/**
* rq_qos_wait - throttle on a rqw if we need to
* @private_data - caller provided specific data
* @acquire_inflight_cb - inc the rqw->inflight counter if we can
* @cleanup_cb - the callback to cleanup in case we race with a waker
*
* This provides a uniform place for the rq_qos users to do their throttling.
* Since you can end up with a lot of things sleeping at once, this manages the
* waking up based on the resources available. The acquire_inflight_cb should
* inc the rqw->inflight if we have the ability to do so, or return false if not
* and then we will sleep until the room becomes available.
*
* cleanup_cb is in case that we race with a waker and need to cleanup the
* inflight count accordingly.
*/
void rq_qos_wait(struct rq_wait *rqw, void *private_data,
acquire_inflight_cb_t *acquire_inflight_cb,
cleanup_cb_t *cleanup_cb)
{
struct rq_qos_wait_data data = {
.wq = {
.func = rq_qos_wake_function,
.entry = LIST_HEAD_INIT(data.wq.entry),
},
.task = current,
.rqw = rqw,
.cb = acquire_inflight_cb,
.private_data = private_data,
};
bool has_sleeper;
has_sleeper = wq_has_sleeper(&rqw->wait);
if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
return;
prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
do {
if (data.got_token)
break;
if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
finish_wait(&rqw->wait, &data.wq);
/*
* We raced with wbt_wake_function() getting a token,
* which means we now have two. Put our local token
* and wake anyone else potentially waiting for one.
*/
if (data.got_token)
cleanup_cb(rqw, private_data);
break;
}
io_schedule();
has_sleeper = false;
} while (1);
finish_wait(&rqw->wait, &data.wq);
}
void rq_qos_exit(struct request_queue *q)
{
blk_mq_debugfs_unregister_queue_rqos(q);
while (q->rq_qos) {
struct rq_qos *rqos = q->rq_qos;
q->rq_qos = rqos->next;


@ -7,6 +7,10 @@
#include <linux/atomic.h>
#include <linux/wait.h>
#include "blk-mq-debugfs.h"
struct blk_mq_debugfs_attr;
enum rq_qos_id {
RQ_QOS_WBT,
RQ_QOS_CGROUP,
@ -22,10 +26,13 @@ struct rq_qos {
struct request_queue *q;
enum rq_qos_id id;
struct rq_qos *next;
#ifdef CONFIG_BLK_DEBUG_FS
struct dentry *debugfs_dir;
#endif
};
struct rq_qos_ops {
void (*throttle)(struct rq_qos *, struct bio *, spinlock_t *);
void (*throttle)(struct rq_qos *, struct bio *);
void (*track)(struct rq_qos *, struct request *, struct bio *);
void (*issue)(struct rq_qos *, struct request *);
void (*requeue)(struct rq_qos *, struct request *);
@ -33,6 +40,7 @@ struct rq_qos_ops {
void (*done_bio)(struct rq_qos *, struct bio *);
void (*cleanup)(struct rq_qos *, struct bio *);
void (*exit)(struct rq_qos *);
const struct blk_mq_debugfs_attr *debugfs_attrs;
};
struct rq_depth {
@ -66,6 +74,17 @@ static inline struct rq_qos *blkcg_rq_qos(struct request_queue *q)
return rq_qos_id(q, RQ_QOS_CGROUP);
}
static inline const char *rq_qos_id_to_name(enum rq_qos_id id)
{
switch (id) {
case RQ_QOS_WBT:
return "wbt";
case RQ_QOS_CGROUP:
return "cgroup";
}
return "unknown";
}
static inline void rq_wait_init(struct rq_wait *rq_wait)
{
atomic_set(&rq_wait->inflight, 0);
@ -76,6 +95,9 @@ static inline void rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
{
rqos->next = q->rq_qos;
q->rq_qos = rqos;
if (rqos->ops->debugfs_attrs)
blk_mq_debugfs_register_rqos(rqos);
}
static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
@ -91,19 +113,77 @@ static inline void rq_qos_del(struct request_queue *q, struct rq_qos *rqos)
}
prev = cur;
}
blk_mq_debugfs_unregister_rqos(rqos);
}
typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
typedef void (cleanup_cb_t)(struct rq_wait *rqw, void *private_data);
void rq_qos_wait(struct rq_wait *rqw, void *private_data,
acquire_inflight_cb_t *acquire_inflight_cb,
cleanup_cb_t *cleanup_cb);
bool rq_wait_inc_below(struct rq_wait *rq_wait, unsigned int limit);
void rq_depth_scale_up(struct rq_depth *rqd);
void rq_depth_scale_down(struct rq_depth *rqd, bool hard_throttle);
bool rq_depth_calc_max_depth(struct rq_depth *rqd);
void rq_qos_cleanup(struct request_queue *, struct bio *);
void rq_qos_done(struct request_queue *, struct request *);
void rq_qos_issue(struct request_queue *, struct request *);
void rq_qos_requeue(struct request_queue *, struct request *);
void rq_qos_done_bio(struct request_queue *q, struct bio *bio);
void rq_qos_throttle(struct request_queue *, struct bio *, spinlock_t *);
void rq_qos_track(struct request_queue *q, struct request *, struct bio *);
void __rq_qos_cleanup(struct rq_qos *rqos, struct bio *bio);
void __rq_qos_done(struct rq_qos *rqos, struct request *rq);
void __rq_qos_issue(struct rq_qos *rqos, struct request *rq);
void __rq_qos_requeue(struct rq_qos *rqos, struct request *rq);
void __rq_qos_throttle(struct rq_qos *rqos, struct bio *bio);
void __rq_qos_track(struct rq_qos *rqos, struct request *rq, struct bio *bio);
void __rq_qos_done_bio(struct rq_qos *rqos, struct bio *bio);
static inline void rq_qos_cleanup(struct request_queue *q, struct bio *bio)
{
if (q->rq_qos)
__rq_qos_cleanup(q->rq_qos, bio);
}
static inline void rq_qos_done(struct request_queue *q, struct request *rq)
{
if (q->rq_qos)
__rq_qos_done(q->rq_qos, rq);
}
static inline void rq_qos_issue(struct request_queue *q, struct request *rq)
{
if (q->rq_qos)
__rq_qos_issue(q->rq_qos, rq);
}
static inline void rq_qos_requeue(struct request_queue *q, struct request *rq)
{
if (q->rq_qos)
__rq_qos_requeue(q->rq_qos, rq);
}
static inline void rq_qos_done_bio(struct request_queue *q, struct bio *bio)
{
if (q->rq_qos)
__rq_qos_done_bio(q->rq_qos, bio);
}
static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
{
/*
* BIO_TRACKED lets controllers know that a bio went through the
* normal rq_qos path.
*/
bio_set_flag(bio, BIO_TRACKED);
if (q->rq_qos)
__rq_qos_throttle(q->rq_qos, bio);
}
static inline void rq_qos_track(struct request_queue *q, struct request *rq,
struct bio *bio)
{
if (q->rq_qos)
__rq_qos_track(q->rq_qos, rq, bio);
}
void rq_qos_exit(struct request_queue *);
#endif
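
The rq_qos_wait() helper introduced above is the generic replacement for the open-coded wbt wait loop: callers pass an acquire_inflight_cb that tries to take an inflight slot without sleeping, and a cleanup_cb for the case where a waker hands the sleeper a token it no longer needs. A minimal sketch of that calling convention follows; the names, the fixed limit, and the simplified cleanup are invented for illustration (the wbt conversion further down in this series is the real in-tree user):

/* Illustrative only -- assumes the blk-rq-qos.h interfaces shown above. */
#include "blk-rq-qos.h"

/* Hypothetical per-wait data handed through rq_qos_wait(). */
struct example_wait_data {
	unsigned int limit;
};

/* acquire_inflight_cb: take one inflight slot if below the limit; never sleeps. */
static bool example_inflight_cb(struct rq_wait *rqw, void *private_data)
{
	struct example_wait_data *data = private_data;

	return rq_wait_inc_below(rqw, data->limit);
}

/* cleanup_cb: we raced with a waker and hold an extra token; put it back. */
static void example_cleanup_cb(struct rq_wait *rqw, void *private_data)
{
	/* Simplified: a real policy (e.g. wbt) recomputes limits before waking. */
	atomic_dec(&rqw->inflight);
	wake_up(&rqw->wait);
}

/* Sleep until example_inflight_cb() succeeds or a waker passes us a token. */
static void example_throttle(struct rq_wait *rqw, unsigned int limit)
{
	struct example_wait_data data = { .limit = limit };

	rq_qos_wait(rqw, &data, example_inflight_cb, example_cleanup_cb);
}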


@ -20,65 +20,12 @@ EXPORT_SYMBOL(blk_max_low_pfn);
unsigned long blk_max_pfn;
/**
* blk_queue_prep_rq - set a prepare_request function for queue
* @q: queue
* @pfn: prepare_request function
*
* It's possible for a queue to register a prepare_request callback which
* is invoked before the request is handed to the request_fn. The goal of
* the function is to prepare a request for I/O, it can be used to build a
* cdb from the request data for instance.
*
*/
void blk_queue_prep_rq(struct request_queue *q, prep_rq_fn *pfn)
{
q->prep_rq_fn = pfn;
}
EXPORT_SYMBOL(blk_queue_prep_rq);
/**
* blk_queue_unprep_rq - set an unprepare_request function for queue
* @q: queue
* @ufn: unprepare_request function
*
* It's possible for a queue to register an unprepare_request callback
* which is invoked before the request is finally completed. The goal
* of the function is to deallocate any data that was allocated in the
* prepare_request callback.
*
*/
void blk_queue_unprep_rq(struct request_queue *q, unprep_rq_fn *ufn)
{
q->unprep_rq_fn = ufn;
}
EXPORT_SYMBOL(blk_queue_unprep_rq);
void blk_queue_softirq_done(struct request_queue *q, softirq_done_fn *fn)
{
q->softirq_done_fn = fn;
}
EXPORT_SYMBOL(blk_queue_softirq_done);
void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout)
{
q->rq_timeout = timeout;
}
EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
void blk_queue_rq_timed_out(struct request_queue *q, rq_timed_out_fn *fn)
{
WARN_ON_ONCE(q->mq_ops);
q->rq_timed_out_fn = fn;
}
EXPORT_SYMBOL_GPL(blk_queue_rq_timed_out);
void blk_queue_lld_busy(struct request_queue *q, lld_busy_fn *fn)
{
q->lld_busy_fn = fn;
}
EXPORT_SYMBOL_GPL(blk_queue_lld_busy);
/**
* blk_set_default_limits - reset limits to default values
* @lim: the queue_limits structure to reset
@ -169,8 +116,6 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
q->make_request_fn = mfn;
blk_queue_dma_alignment(q, 511);
blk_queue_congestion_threshold(q);
q->nr_batching = BLK_BATCH_REQ;
blk_set_default_limits(&q->limits);
}
@ -889,16 +834,14 @@ EXPORT_SYMBOL(blk_set_queue_depth);
*/
void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
{
spin_lock_irq(q->queue_lock);
if (wc)
queue_flag_set(QUEUE_FLAG_WC, q);
blk_queue_flag_set(QUEUE_FLAG_WC, q);
else
queue_flag_clear(QUEUE_FLAG_WC, q);
blk_queue_flag_clear(QUEUE_FLAG_WC, q);
if (fua)
queue_flag_set(QUEUE_FLAG_FUA, q);
blk_queue_flag_set(QUEUE_FLAG_FUA, q);
else
queue_flag_clear(QUEUE_FLAG_FUA, q);
spin_unlock_irq(q->queue_lock);
blk_queue_flag_clear(QUEUE_FLAG_FUA, q);
wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
}


@ -34,7 +34,7 @@ static __latent_entropy void blk_done_softirq(struct softirq_action *h)
rq = list_entry(local_list.next, struct request, ipi_list);
list_del_init(&rq->ipi_list);
rq->q->softirq_done_fn(rq);
rq->q->mq_ops->complete(rq);
}
}
@ -98,11 +98,11 @@ static int blk_softirq_cpu_dead(unsigned int cpu)
void __blk_complete_request(struct request *req)
{
struct request_queue *q = req->q;
int cpu, ccpu = q->mq_ops ? req->mq_ctx->cpu : req->cpu;
int cpu, ccpu = req->mq_ctx->cpu;
unsigned long flags;
bool shared = false;
BUG_ON(!q->softirq_done_fn);
BUG_ON(!q->mq_ops->complete);
local_irq_save(flags);
cpu = smp_processor_id();
@ -143,27 +143,6 @@ do_local:
local_irq_restore(flags);
}
EXPORT_SYMBOL(__blk_complete_request);
/**
* blk_complete_request - end I/O on a request
* @req: the request being processed
*
* Description:
* Ends all I/O on a request. It does not handle partial completions,
* unless the driver actually implements this in its completion callback
* through requeueing. The actual completion happens out-of-order,
* through a softirq handler. The user must have registered a completion
* callback through blk_queue_softirq_done().
**/
void blk_complete_request(struct request *req)
{
if (unlikely(blk_should_fake_timeout(req->q)))
return;
if (!blk_mark_rq_complete(req))
__blk_complete_request(req);
}
EXPORT_SYMBOL(blk_complete_request);
static __init int blk_softirq_init(void)
{


@ -130,7 +130,6 @@ blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
return cb;
}
EXPORT_SYMBOL_GPL(blk_stat_alloc_callback);
void blk_stat_add_callback(struct request_queue *q,
struct blk_stat_callback *cb)
@ -151,7 +150,6 @@ void blk_stat_add_callback(struct request_queue *q,
blk_queue_flag_set(QUEUE_FLAG_STATS, q);
spin_unlock(&q->stats->lock);
}
EXPORT_SYMBOL_GPL(blk_stat_add_callback);
void blk_stat_remove_callback(struct request_queue *q,
struct blk_stat_callback *cb)
@ -164,7 +162,6 @@ void blk_stat_remove_callback(struct request_queue *q,
del_timer_sync(&cb->timer);
}
EXPORT_SYMBOL_GPL(blk_stat_remove_callback);
static void blk_stat_free_callback_rcu(struct rcu_head *head)
{
@ -181,7 +178,6 @@ void blk_stat_free_callback(struct blk_stat_callback *cb)
if (cb)
call_rcu(&cb->rcu, blk_stat_free_callback_rcu);
}
EXPORT_SYMBOL_GPL(blk_stat_free_callback);
void blk_stat_enable_accounting(struct request_queue *q)
{


@ -145,6 +145,11 @@ static inline void blk_stat_activate_nsecs(struct blk_stat_callback *cb,
mod_timer(&cb->timer, jiffies + nsecs_to_jiffies(nsecs));
}
static inline void blk_stat_deactivate(struct blk_stat_callback *cb)
{
del_timer_sync(&cb->timer);
}
/**
* blk_stat_activate_msecs() - Gather block statistics during a time window in
* milliseconds.


@ -68,7 +68,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
unsigned long nr;
int ret, err;
if (!q->request_fn && !q->mq_ops)
if (!queue_is_mq(q))
return -EINVAL;
ret = queue_var_store(&nr, page, count);
@ -78,11 +78,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
if (nr < BLKDEV_MIN_RQ)
nr = BLKDEV_MIN_RQ;
if (q->request_fn)
err = blk_update_nr_requests(q, nr);
else
err = blk_mq_update_nr_requests(q, nr);
err = blk_mq_update_nr_requests(q, nr);
if (err)
return err;
@ -242,10 +238,10 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
return -EINVAL;
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
q->limits.max_sectors = max_sectors_kb << 1;
q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
return ret;
}
@ -320,14 +316,12 @@ static ssize_t queue_nomerges_store(struct request_queue *q, const char *page,
if (ret < 0)
return ret;
spin_lock_irq(q->queue_lock);
queue_flag_clear(QUEUE_FLAG_NOMERGES, q);
queue_flag_clear(QUEUE_FLAG_NOXMERGES, q);
blk_queue_flag_clear(QUEUE_FLAG_NOMERGES, q);
blk_queue_flag_clear(QUEUE_FLAG_NOXMERGES, q);
if (nm == 2)
queue_flag_set(QUEUE_FLAG_NOMERGES, q);
blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q);
else if (nm)
queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
spin_unlock_irq(q->queue_lock);
blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
return ret;
}
@ -351,18 +345,16 @@ queue_rq_affinity_store(struct request_queue *q, const char *page, size_t count)
if (ret < 0)
return ret;
spin_lock_irq(q->queue_lock);
if (val == 2) {
queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
} else if (val == 1) {
queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
} else if (val == 0) {
queue_flag_clear(QUEUE_FLAG_SAME_COMP, q);
queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
blk_queue_flag_clear(QUEUE_FLAG_SAME_COMP, q);
blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
}
spin_unlock_irq(q->queue_lock);
#endif
return ret;
}
@ -410,7 +402,8 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
unsigned long poll_on;
ssize_t ret;
if (!q->mq_ops || !q->mq_ops->poll)
if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL ||
!q->tag_set->map[HCTX_TYPE_POLL].nr_queues)
return -EINVAL;
ret = queue_var_store(&poll_on, page, count);
@ -425,6 +418,26 @@ static ssize_t queue_poll_store(struct request_queue *q, const char *page,
return ret;
}
static ssize_t queue_io_timeout_show(struct request_queue *q, char *page)
{
return sprintf(page, "%u\n", jiffies_to_msecs(q->rq_timeout));
}
static ssize_t queue_io_timeout_store(struct request_queue *q, const char *page,
size_t count)
{
unsigned int val;
int err;
err = kstrtou32(page, 10, &val);
if (err || val == 0)
return -EINVAL;
blk_queue_rq_timeout(q, msecs_to_jiffies(val));
return count;
}
static ssize_t queue_wb_lat_show(struct request_queue *q, char *page)
{
if (!wbt_rq_qos(q))
@ -463,20 +476,14 @@ static ssize_t queue_wb_lat_store(struct request_queue *q, const char *page,
* ends up either enabling or disabling wbt completely. We can't
* have IO inflight if that happens.
*/
if (q->mq_ops) {
blk_mq_freeze_queue(q);
blk_mq_quiesce_queue(q);
} else
blk_queue_bypass_start(q);
blk_mq_freeze_queue(q);
blk_mq_quiesce_queue(q);
wbt_set_min_lat(q, val);
wbt_update_limits(q);
if (q->mq_ops) {
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
} else
blk_queue_bypass_end(q);
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
return count;
}
@ -699,6 +706,12 @@ static struct queue_sysfs_entry queue_dax_entry = {
.show = queue_dax_show,
};
static struct queue_sysfs_entry queue_io_timeout_entry = {
.attr = {.name = "io_timeout", .mode = 0644 },
.show = queue_io_timeout_show,
.store = queue_io_timeout_store,
};
static struct queue_sysfs_entry queue_wb_lat_entry = {
.attr = {.name = "wbt_lat_usec", .mode = 0644 },
.show = queue_wb_lat_show,
@ -748,6 +761,7 @@ static struct attribute *default_attrs[] = {
&queue_dax_entry.attr,
&queue_wb_lat_entry.attr,
&queue_poll_delay_entry.attr,
&queue_io_timeout_entry.attr,
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
&throtl_sample_time_entry.attr,
#endif
@ -847,24 +861,14 @@ static void __blk_release_queue(struct work_struct *work)
blk_free_queue_stats(q->stats);
blk_exit_rl(q, &q->root_rl);
if (q->queue_tags)
__blk_queue_free_tags(q);
blk_queue_free_zone_bitmaps(q);
if (!q->mq_ops) {
if (q->exit_rq_fn)
q->exit_rq_fn(q, q->fq->flush_rq);
blk_free_flush_queue(q->fq);
} else {
if (queue_is_mq(q))
blk_mq_release(q);
}
blk_trace_shutdown(q);
if (q->mq_ops)
if (queue_is_mq(q))
blk_mq_debugfs_unregister(q);
bioset_exit(&q->bio_split);
@ -909,7 +913,7 @@ int blk_register_queue(struct gendisk *disk)
WARN_ONCE(test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags),
"%s is registering an already registered queue\n",
kobject_name(&dev->kobj));
queue_flag_set_unlocked(QUEUE_FLAG_REGISTERED, q);
blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
/*
* SCSI probing may synchronously create and destroy a lot of
@ -921,9 +925,8 @@ int blk_register_queue(struct gendisk *disk)
* request_queues for non-existent devices never get registered.
*/
if (!blk_queue_init_done(q)) {
queue_flag_set_unlocked(QUEUE_FLAG_INIT_DONE, q);
blk_queue_flag_set(QUEUE_FLAG_INIT_DONE, q);
percpu_ref_switch_to_percpu(&q->q_usage_counter);
blk_queue_bypass_end(q);
}
ret = blk_trace_init_sysfs(dev);
@ -939,7 +942,7 @@ int blk_register_queue(struct gendisk *disk)
goto unlock;
}
if (q->mq_ops) {
if (queue_is_mq(q)) {
__blk_mq_register_dev(dev, q);
blk_mq_debugfs_register(q);
}
@ -950,7 +953,7 @@ int blk_register_queue(struct gendisk *disk)
blk_throtl_register_queue(q);
if (q->request_fn || (q->mq_ops && q->elevator)) {
if (q->elevator) {
ret = elv_register_queue(q);
if (ret) {
mutex_unlock(&q->sysfs_lock);
@ -999,7 +1002,7 @@ void blk_unregister_queue(struct gendisk *disk)
* Remove the sysfs attributes before unregistering the queue data
* structures that can be modified through sysfs.
*/
if (q->mq_ops)
if (queue_is_mq(q))
blk_mq_unregister_dev(disk_to_dev(disk), q);
mutex_unlock(&q->sysfs_lock);
@ -1008,7 +1011,7 @@ void blk_unregister_queue(struct gendisk *disk)
blk_trace_remove_sysfs(disk_to_dev(disk));
mutex_lock(&q->sysfs_lock);
if (q->request_fn || (q->mq_ops && q->elevator))
if (q->elevator)
elv_unregister_queue(q);
mutex_unlock(&q->sysfs_lock);


@ -1,378 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Functions related to tagged command queuing
*/
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/slab.h>
#include "blk.h"
/**
* blk_queue_find_tag - find a request by its tag and queue
* @q: The request queue for the device
* @tag: The tag of the request
*
* Notes:
* Should be used when a device returns a tag and you want to match
* it with a request.
*
* no locks need be held.
**/
struct request *blk_queue_find_tag(struct request_queue *q, int tag)
{
return blk_map_queue_find_tag(q->queue_tags, tag);
}
EXPORT_SYMBOL(blk_queue_find_tag);
/**
* blk_free_tags - release a given set of tag maintenance info
* @bqt: the tag map to free
*
* Drop the reference count on @bqt and frees it when the last reference
* is dropped.
*/
void blk_free_tags(struct blk_queue_tag *bqt)
{
if (atomic_dec_and_test(&bqt->refcnt)) {
BUG_ON(find_first_bit(bqt->tag_map, bqt->max_depth) <
bqt->max_depth);
kfree(bqt->tag_index);
bqt->tag_index = NULL;
kfree(bqt->tag_map);
bqt->tag_map = NULL;
kfree(bqt);
}
}
EXPORT_SYMBOL(blk_free_tags);
/**
* __blk_queue_free_tags - release tag maintenance info
* @q: the request queue for the device
*
* Notes:
* blk_cleanup_queue() will take care of calling this function, if tagging
* has been used. So there's no need to call this directly.
**/
void __blk_queue_free_tags(struct request_queue *q)
{
struct blk_queue_tag *bqt = q->queue_tags;
if (!bqt)
return;
blk_free_tags(bqt);
q->queue_tags = NULL;
queue_flag_clear_unlocked(QUEUE_FLAG_QUEUED, q);
}
/**
* blk_queue_free_tags - release tag maintenance info
* @q: the request queue for the device
*
* Notes:
* This is used to disable tagged queuing to a device, yet leave
* queue in function.
**/
void blk_queue_free_tags(struct request_queue *q)
{
queue_flag_clear_unlocked(QUEUE_FLAG_QUEUED, q);
}
EXPORT_SYMBOL(blk_queue_free_tags);
static int
init_tag_map(struct request_queue *q, struct blk_queue_tag *tags, int depth)
{
struct request **tag_index;
unsigned long *tag_map;
int nr_ulongs;
if (q && depth > q->nr_requests * 2) {
depth = q->nr_requests * 2;
printk(KERN_ERR "%s: adjusted depth to %d\n",
__func__, depth);
}
tag_index = kcalloc(depth, sizeof(struct request *), GFP_ATOMIC);
if (!tag_index)
goto fail;
nr_ulongs = ALIGN(depth, BITS_PER_LONG) / BITS_PER_LONG;
tag_map = kcalloc(nr_ulongs, sizeof(unsigned long), GFP_ATOMIC);
if (!tag_map)
goto fail;
tags->real_max_depth = depth;
tags->max_depth = depth;
tags->tag_index = tag_index;
tags->tag_map = tag_map;
return 0;
fail:
kfree(tag_index);
return -ENOMEM;
}
static struct blk_queue_tag *__blk_queue_init_tags(struct request_queue *q,
int depth, int alloc_policy)
{
struct blk_queue_tag *tags;
tags = kmalloc(sizeof(struct blk_queue_tag), GFP_ATOMIC);
if (!tags)
goto fail;
if (init_tag_map(q, tags, depth))
goto fail;
atomic_set(&tags->refcnt, 1);
tags->alloc_policy = alloc_policy;
tags->next_tag = 0;
return tags;
fail:
kfree(tags);
return NULL;
}
/**
* blk_init_tags - initialize the tag info for an external tag map
* @depth: the maximum queue depth supported
* @alloc_policy: tag allocation policy
**/
struct blk_queue_tag *blk_init_tags(int depth, int alloc_policy)
{
return __blk_queue_init_tags(NULL, depth, alloc_policy);
}
EXPORT_SYMBOL(blk_init_tags);
/**
* blk_queue_init_tags - initialize the queue tag info
* @q: the request queue for the device
* @depth: the maximum queue depth supported
* @tags: the tag to use
* @alloc_policy: tag allocation policy
*
* Queue lock must be held here if the function is called to resize an
* existing map.
**/
int blk_queue_init_tags(struct request_queue *q, int depth,
struct blk_queue_tag *tags, int alloc_policy)
{
int rc;
BUG_ON(tags && q->queue_tags && tags != q->queue_tags);
if (!tags && !q->queue_tags) {
tags = __blk_queue_init_tags(q, depth, alloc_policy);
if (!tags)
return -ENOMEM;
} else if (q->queue_tags) {
rc = blk_queue_resize_tags(q, depth);
if (rc)
return rc;
queue_flag_set(QUEUE_FLAG_QUEUED, q);
return 0;
} else
atomic_inc(&tags->refcnt);
/*
* assign it, all done
*/
q->queue_tags = tags;
queue_flag_set_unlocked(QUEUE_FLAG_QUEUED, q);
return 0;
}
EXPORT_SYMBOL(blk_queue_init_tags);
/**
* blk_queue_resize_tags - change the queueing depth
* @q: the request queue for the device
* @new_depth: the new max command queueing depth
*
* Notes:
* Must be called with the queue lock held.
**/
int blk_queue_resize_tags(struct request_queue *q, int new_depth)
{
struct blk_queue_tag *bqt = q->queue_tags;
struct request **tag_index;
unsigned long *tag_map;
int max_depth, nr_ulongs;
if (!bqt)
return -ENXIO;
/*
* if we already have large enough real_max_depth. just
* adjust max_depth. *NOTE* as requests with tag value
* between new_depth and real_max_depth can be in-flight, tag
* map can not be shrunk blindly here.
*/
if (new_depth <= bqt->real_max_depth) {
bqt->max_depth = new_depth;
return 0;
}
/*
* Currently cannot replace a shared tag map with a new
* one, so error out if this is the case
*/
if (atomic_read(&bqt->refcnt) != 1)
return -EBUSY;
/*
* save the old state info, so we can copy it back
*/
tag_index = bqt->tag_index;
tag_map = bqt->tag_map;
max_depth = bqt->real_max_depth;
if (init_tag_map(q, bqt, new_depth))
return -ENOMEM;
memcpy(bqt->tag_index, tag_index, max_depth * sizeof(struct request *));
nr_ulongs = ALIGN(max_depth, BITS_PER_LONG) / BITS_PER_LONG;
memcpy(bqt->tag_map, tag_map, nr_ulongs * sizeof(unsigned long));
kfree(tag_index);
kfree(tag_map);
return 0;
}
EXPORT_SYMBOL(blk_queue_resize_tags);
/**
* blk_queue_end_tag - end tag operations for a request
* @q: the request queue for the device
* @rq: the request that has completed
*
* Description:
* Typically called when end_that_request_first() returns %0, meaning
* all transfers have been done for a request. It's important to call
* this function before end_that_request_last(), as that will put the
* request back on the free list thus corrupting the internal tag list.
**/
void blk_queue_end_tag(struct request_queue *q, struct request *rq)
{
struct blk_queue_tag *bqt = q->queue_tags;
unsigned tag = rq->tag; /* negative tags invalid */
lockdep_assert_held(q->queue_lock);
BUG_ON(tag >= bqt->real_max_depth);
list_del_init(&rq->queuelist);
rq->rq_flags &= ~RQF_QUEUED;
rq->tag = -1;
rq->internal_tag = -1;
if (unlikely(bqt->tag_index[tag] == NULL))
printk(KERN_ERR "%s: tag %d is missing\n",
__func__, tag);
bqt->tag_index[tag] = NULL;
if (unlikely(!test_bit(tag, bqt->tag_map))) {
printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
__func__, tag);
return;
}
/*
* The tag_map bit acts as a lock for tag_index[bit], so we need
* unlock memory barrier semantics.
*/
clear_bit_unlock(tag, bqt->tag_map);
}
/**
* blk_queue_start_tag - find a free tag and assign it
* @q: the request queue for the device
* @rq: the block request that needs tagging
*
* Description:
* This can either be used as a stand-alone helper, or possibly be
* assigned as the queue &prep_rq_fn (in which case &struct request
* automagically gets a tag assigned). Note that this function
* assumes that any type of request can be queued! if this is not
* true for your device, you must check the request type before
* calling this function. The request will also be removed from
* the request queue, so it's the drivers responsibility to readd
* it if it should need to be restarted for some reason.
**/
int blk_queue_start_tag(struct request_queue *q, struct request *rq)
{
struct blk_queue_tag *bqt = q->queue_tags;
unsigned max_depth;
int tag;
lockdep_assert_held(q->queue_lock);
if (unlikely((rq->rq_flags & RQF_QUEUED))) {
printk(KERN_ERR
"%s: request %p for device [%s] already tagged %d",
__func__, rq,
rq->rq_disk ? rq->rq_disk->disk_name : "?", rq->tag);
BUG();
}
/*
* Protect against shared tag maps, as we may not have exclusive
* access to the tag map.
*
* We reserve a few tags just for sync IO, since we don't want
* to starve sync IO on behalf of flooding async IO.
*/
max_depth = bqt->max_depth;
if (!rq_is_sync(rq) && max_depth > 1) {
switch (max_depth) {
case 2:
max_depth = 1;
break;
case 3:
max_depth = 2;
break;
default:
max_depth -= 2;
}
if (q->in_flight[BLK_RW_ASYNC] > max_depth)
return 1;
}
do {
if (bqt->alloc_policy == BLK_TAG_ALLOC_FIFO) {
tag = find_first_zero_bit(bqt->tag_map, max_depth);
if (tag >= max_depth)
return 1;
} else {
int start = bqt->next_tag;
int size = min_t(int, bqt->max_depth, max_depth + start);
tag = find_next_zero_bit(bqt->tag_map, size, start);
if (tag >= size && start + size > bqt->max_depth) {
size = start + size - bqt->max_depth;
tag = find_first_zero_bit(bqt->tag_map, size);
}
if (tag >= size)
return 1;
}
} while (test_and_set_bit_lock(tag, bqt->tag_map));
/*
* We need lock ordering semantics given by test_and_set_bit_lock.
* See blk_queue_end_tag for details.
*/
bqt->next_tag = (tag + 1) % bqt->max_depth;
rq->rq_flags |= RQF_QUEUED;
rq->tag = tag;
bqt->tag_index[tag] = rq;
blk_start_request(rq);
return 0;
}
EXPORT_SYMBOL(blk_queue_start_tag);


@ -1243,7 +1243,7 @@ static void throtl_pending_timer_fn(struct timer_list *t)
bool dispatched;
int ret;
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
if (throtl_can_upgrade(td, NULL))
throtl_upgrade_state(td);
@ -1266,9 +1266,9 @@ again:
break;
/* this dispatch windows is still open, relax and repeat */
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
cpu_relax();
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
}
if (!dispatched)
@ -1290,7 +1290,7 @@ again:
queue_work(kthrotld_workqueue, &td->dispatch_work);
}
out_unlock:
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
}
/**
@ -1314,11 +1314,11 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
bio_list_init(&bio_list_on_stack);
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
for (rw = READ; rw <= WRITE; rw++)
while ((bio = throtl_pop_queued(&td_sq->queued[rw], NULL)))
bio_list_add(&bio_list_on_stack, bio);
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
if (!bio_list_empty(&bio_list_on_stack)) {
blk_start_plug(&plug);
@ -2115,16 +2115,6 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
}
#endif
static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
{
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
/* fallback to root_blkg if we fail to get a blkg ref */
if (bio->bi_css && (bio_associate_blkg(bio, tg_to_blkg(tg)) == -ENODEV))
bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);
bio_issue_init(&bio->bi_issue, bio_sectors(bio));
#endif
}
bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
struct bio *bio)
{
@ -2141,14 +2131,10 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
if (bio_flagged(bio, BIO_THROTTLED) || !tg->has_rules[rw])
goto out;
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
throtl_update_latency_buckets(td);
if (unlikely(blk_queue_bypass(q)))
goto out_unlock;
blk_throtl_assoc_bio(tg, bio);
blk_throtl_update_idletime(tg);
sq = &tg->service_queue;
@ -2227,7 +2213,7 @@ again:
}
out_unlock:
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
out:
bio_set_flag(bio, BIO_THROTTLED);
@ -2348,7 +2334,7 @@ static void tg_drain_bios(struct throtl_service_queue *parent_sq)
* Dispatch all currently throttled bios on @q through ->make_request_fn().
*/
void blk_throtl_drain(struct request_queue *q)
__releases(q->queue_lock) __acquires(q->queue_lock)
__releases(&q->queue_lock) __acquires(&q->queue_lock)
{
struct throtl_data *td = q->td;
struct blkcg_gq *blkg;
@ -2356,7 +2342,6 @@ void blk_throtl_drain(struct request_queue *q)
struct bio *bio;
int rw;
queue_lockdep_assert_held(q);
rcu_read_lock();
/*
@ -2372,7 +2357,7 @@ void blk_throtl_drain(struct request_queue *q)
tg_drain_bios(&td->service_queue);
rcu_read_unlock();
spin_unlock_irq(q->queue_lock);
spin_unlock_irq(&q->queue_lock);
/* all bios now should be in td->service_queue, issue them */
for (rw = READ; rw <= WRITE; rw++)
@ -2380,7 +2365,7 @@ void blk_throtl_drain(struct request_queue *q)
NULL)))
generic_make_request(bio);
spin_lock_irq(q->queue_lock);
spin_lock_irq(&q->queue_lock);
}
int blk_throtl_init(struct request_queue *q)
@ -2460,7 +2445,7 @@ void blk_throtl_register_queue(struct request_queue *q)
td->throtl_slice = DFL_THROTL_SLICE_HD;
#endif
td->track_bio_latency = !queue_is_rq_based(q);
td->track_bio_latency = !queue_is_mq(q);
if (!td->track_bio_latency)
blk_stat_enable_accounting(q);
}


@ -68,80 +68,6 @@ ssize_t part_timeout_store(struct device *dev, struct device_attribute *attr,
#endif /* CONFIG_FAIL_IO_TIMEOUT */
/*
* blk_delete_timer - Delete/cancel timer for a given function.
* @req: request that we are canceling timer for
*
*/
void blk_delete_timer(struct request *req)
{
list_del_init(&req->timeout_list);
}
static void blk_rq_timed_out(struct request *req)
{
struct request_queue *q = req->q;
enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
if (q->rq_timed_out_fn)
ret = q->rq_timed_out_fn(req);
switch (ret) {
case BLK_EH_RESET_TIMER:
blk_add_timer(req);
blk_clear_rq_complete(req);
break;
case BLK_EH_DONE:
/*
* LLD handles this for now but in the future
* we can send a request msg to abort the command
* and we can move more of the generic scsi eh code to
* the blk layer.
*/
break;
default:
printk(KERN_ERR "block: bad eh return: %d\n", ret);
break;
}
}
static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
unsigned int *next_set)
{
const unsigned long deadline = blk_rq_deadline(rq);
if (time_after_eq(jiffies, deadline)) {
list_del_init(&rq->timeout_list);
/*
* Check if we raced with end io completion
*/
if (!blk_mark_rq_complete(rq))
blk_rq_timed_out(rq);
} else if (!*next_set || time_after(*next_timeout, deadline)) {
*next_timeout = deadline;
*next_set = 1;
}
}
void blk_timeout_work(struct work_struct *work)
{
struct request_queue *q =
container_of(work, struct request_queue, timeout_work);
unsigned long flags, next = 0;
struct request *rq, *tmp;
int next_set = 0;
spin_lock_irqsave(q->queue_lock, flags);
list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
blk_rq_check_expired(rq, &next, &next_set);
if (next_set)
mod_timer(&q->timeout, round_jiffies_up(next));
spin_unlock_irqrestore(q->queue_lock, flags);
}
/**
* blk_abort_request -- Request request recovery for the specified command
* @req: pointer to the request of interest
@ -149,24 +75,17 @@ void blk_timeout_work(struct work_struct *work)
* This function requests that the block layer start recovery for the
* request by deleting the timer and calling the q's timeout function.
* LLDDs who implement their own error recovery MAY ignore the timeout
* event if they generated blk_abort_req. Must hold queue lock.
* event if they generated blk_abort_request.
*/
void blk_abort_request(struct request *req)
{
if (req->q->mq_ops) {
/*
* All we need to ensure is that timeout scan takes place
* immediately and that scan sees the new timeout value.
* No need for fancy synchronizations.
*/
blk_rq_set_deadline(req, jiffies);
kblockd_schedule_work(&req->q->timeout_work);
} else {
if (blk_mark_rq_complete(req))
return;
blk_delete_timer(req);
blk_rq_timed_out(req);
}
/*
* All we need to ensure is that timeout scan takes place
* immediately and that scan sees the new timeout value.
* No need for fancy synchronizations.
*/
WRITE_ONCE(req->deadline, jiffies);
kblockd_schedule_work(&req->q->timeout_work);
}
EXPORT_SYMBOL_GPL(blk_abort_request);
@ -194,15 +113,6 @@ void blk_add_timer(struct request *req)
struct request_queue *q = req->q;
unsigned long expiry;
if (!q->mq_ops)
lockdep_assert_held(q->queue_lock);
/* blk-mq has its own handler, so we don't need ->rq_timed_out_fn */
if (!q->mq_ops && !q->rq_timed_out_fn)
return;
BUG_ON(!list_empty(&req->timeout_list));
/*
* Some LLDs, like scsi, peek at the timeout to prevent a
* command from being retried forever.
@ -211,21 +121,16 @@ void blk_add_timer(struct request *req)
req->timeout = q->rq_timeout;
req->rq_flags &= ~RQF_TIMED_OUT;
blk_rq_set_deadline(req, jiffies + req->timeout);
/*
* Only the non-mq case needs to add the request to a protected list.
* For the mq case we simply scan the tag map.
*/
if (!q->mq_ops)
list_add_tail(&req->timeout_list, &req->q->timeout_list);
expiry = jiffies + req->timeout;
WRITE_ONCE(req->deadline, expiry);
/*
* If the timer isn't already pending or this timeout is earlier
* than an existing one, modify the timer. Round up to next nearest
* second.
*/
expiry = blk_rq_timeout(round_jiffies_up(blk_rq_deadline(req)));
expiry = blk_rq_timeout(round_jiffies_up(expiry));
if (!timer_pending(&q->timeout) ||
time_before(expiry, q->timeout.expires)) {


@ -489,31 +489,21 @@ static inline unsigned int get_limit(struct rq_wb *rwb, unsigned long rw)
}
struct wbt_wait_data {
struct wait_queue_entry wq;
struct task_struct *task;
struct rq_wb *rwb;
struct rq_wait *rqw;
enum wbt_flags wb_acct;
unsigned long rw;
bool got_token;
};
static int wbt_wake_function(struct wait_queue_entry *curr, unsigned int mode,
int wake_flags, void *key)
static bool wbt_inflight_cb(struct rq_wait *rqw, void *private_data)
{
struct wbt_wait_data *data = container_of(curr, struct wbt_wait_data,
wq);
struct wbt_wait_data *data = private_data;
return rq_wait_inc_below(rqw, get_limit(data->rwb, data->rw));
}
/*
* If we fail to get a budget, return -1 to interrupt the wake up
* loop in __wake_up_common.
*/
if (!rq_wait_inc_below(data->rqw, get_limit(data->rwb, data->rw)))
return -1;
data->got_token = true;
list_del_init(&curr->entry);
wake_up_process(data->task);
return 1;
static void wbt_cleanup_cb(struct rq_wait *rqw, void *private_data)
{
struct wbt_wait_data *data = private_data;
wbt_rqw_done(data->rwb, rqw, data->wb_acct);
}
/*
@ -521,57 +511,16 @@ static int wbt_wake_function(struct wait_queue_entry *curr, unsigned int mode,
* the timer to kick off queuing again.
*/
static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
unsigned long rw, spinlock_t *lock)
__releases(lock)
__acquires(lock)
unsigned long rw)
{
struct rq_wait *rqw = get_rq_wait(rwb, wb_acct);
struct wbt_wait_data data = {
.wq = {
.func = wbt_wake_function,
.entry = LIST_HEAD_INIT(data.wq.entry),
},
.task = current,
.rwb = rwb,
.rqw = rqw,
.wb_acct = wb_acct,
.rw = rw,
};
bool has_sleeper;
has_sleeper = wq_has_sleeper(&rqw->wait);
if (!has_sleeper && rq_wait_inc_below(rqw, get_limit(rwb, rw)))
return;
prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
do {
if (data.got_token)
break;
if (!has_sleeper &&
rq_wait_inc_below(rqw, get_limit(rwb, rw))) {
finish_wait(&rqw->wait, &data.wq);
/*
* We raced with wbt_wake_function() getting a token,
* which means we now have two. Put our local token
* and wake anyone else potentially waiting for one.
*/
if (data.got_token)
wbt_rqw_done(rwb, rqw, wb_acct);
break;
}
if (lock) {
spin_unlock_irq(lock);
io_schedule();
spin_lock_irq(lock);
} else
io_schedule();
has_sleeper = false;
} while (1);
finish_wait(&rqw->wait, &data.wq);
rq_qos_wait(rqw, &data, wbt_inflight_cb, wbt_cleanup_cb);
}
static inline bool wbt_should_throttle(struct rq_wb *rwb, struct bio *bio)
@ -624,7 +573,7 @@ static void wbt_cleanup(struct rq_qos *rqos, struct bio *bio)
* in an irq held spinlock, if it holds one when calling this function.
* If we do sleep, we'll release and re-grab it.
*/
static void wbt_wait(struct rq_qos *rqos, struct bio *bio, spinlock_t *lock)
static void wbt_wait(struct rq_qos *rqos, struct bio *bio)
{
struct rq_wb *rwb = RQWB(rqos);
enum wbt_flags flags;
@ -636,7 +585,7 @@ static void wbt_wait(struct rq_qos *rqos, struct bio *bio, spinlock_t *lock)
return;
}
__wbt_wait(rwb, flags, bio->bi_opf, lock);
__wbt_wait(rwb, flags, bio->bi_opf);
if (!blk_stat_is_active(rwb->cb))
rwb_arm_timer(rwb);
@ -709,8 +658,7 @@ void wbt_enable_default(struct request_queue *q)
if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
return;
if ((q->mq_ops && IS_ENABLED(CONFIG_BLK_WBT_MQ)) ||
(q->request_fn && IS_ENABLED(CONFIG_BLK_WBT_SQ)))
if (queue_is_mq(q) && IS_ENABLED(CONFIG_BLK_WBT_MQ))
wbt_init(q);
}
EXPORT_SYMBOL_GPL(wbt_enable_default);
@ -760,11 +708,100 @@ void wbt_disable_default(struct request_queue *q)
if (!rqos)
return;
rwb = RQWB(rqos);
if (rwb->enable_state == WBT_STATE_ON_DEFAULT)
if (rwb->enable_state == WBT_STATE_ON_DEFAULT) {
blk_stat_deactivate(rwb->cb);
rwb->wb_normal = 0;
}
}
EXPORT_SYMBOL_GPL(wbt_disable_default);
#ifdef CONFIG_BLK_DEBUG_FS
static int wbt_curr_win_nsec_show(void *data, struct seq_file *m)
{
struct rq_qos *rqos = data;
struct rq_wb *rwb = RQWB(rqos);
seq_printf(m, "%llu\n", rwb->cur_win_nsec);
return 0;
}
static int wbt_enabled_show(void *data, struct seq_file *m)
{
struct rq_qos *rqos = data;
struct rq_wb *rwb = RQWB(rqos);
seq_printf(m, "%d\n", rwb->enable_state);
return 0;
}
static int wbt_id_show(void *data, struct seq_file *m)
{
struct rq_qos *rqos = data;
seq_printf(m, "%u\n", rqos->id);
return 0;
}
static int wbt_inflight_show(void *data, struct seq_file *m)
{
struct rq_qos *rqos = data;
struct rq_wb *rwb = RQWB(rqos);
int i;
for (i = 0; i < WBT_NUM_RWQ; i++)
seq_printf(m, "%d: inflight %d\n", i,
atomic_read(&rwb->rq_wait[i].inflight));
return 0;
}
static int wbt_min_lat_nsec_show(void *data, struct seq_file *m)
{
struct rq_qos *rqos = data;
struct rq_wb *rwb = RQWB(rqos);
seq_printf(m, "%lu\n", rwb->min_lat_nsec);
return 0;
}
static int wbt_unknown_cnt_show(void *data, struct seq_file *m)
{
struct rq_qos *rqos = data;
struct rq_wb *rwb = RQWB(rqos);
seq_printf(m, "%u\n", rwb->unknown_cnt);
return 0;
}
static int wbt_normal_show(void *data, struct seq_file *m)
{
struct rq_qos *rqos = data;
struct rq_wb *rwb = RQWB(rqos);
seq_printf(m, "%u\n", rwb->wb_normal);
return 0;
}
static int wbt_background_show(void *data, struct seq_file *m)
{
struct rq_qos *rqos = data;
struct rq_wb *rwb = RQWB(rqos);
seq_printf(m, "%u\n", rwb->wb_background);
return 0;
}
static const struct blk_mq_debugfs_attr wbt_debugfs_attrs[] = {
{"curr_win_nsec", 0400, wbt_curr_win_nsec_show},
{"enabled", 0400, wbt_enabled_show},
{"id", 0400, wbt_id_show},
{"inflight", 0400, wbt_inflight_show},
{"min_lat_nsec", 0400, wbt_min_lat_nsec_show},
{"unknown_cnt", 0400, wbt_unknown_cnt_show},
{"wb_normal", 0400, wbt_normal_show},
{"wb_background", 0400, wbt_background_show},
{},
};
#endif
static struct rq_qos_ops wbt_rqos_ops = {
.throttle = wbt_wait,
@ -774,6 +811,9 @@ static struct rq_qos_ops wbt_rqos_ops = {
.done = wbt_done,
.cleanup = wbt_cleanup,
.exit = wbt_exit,
#ifdef CONFIG_BLK_DEBUG_FS
.debugfs_attrs = wbt_debugfs_attrs,
#endif
};
int wbt_init(struct request_queue *q)


@ -421,7 +421,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
* BIO based queues do not use a scheduler so only q->nr_zones
* needs to be updated so that the sysfs exposed value is correct.
*/
if (!queue_is_rq_based(q)) {
if (!queue_is_mq(q)) {
q->nr_zones = nr_zones;
return 0;
}


@ -7,12 +7,6 @@
#include <xen/xen.h>
#include "blk-mq.h"
/* Amount of time in which a process may batch requests */
#define BLK_BATCH_TIME (HZ/50UL)
/* Number of requests a "batching" process may submit */
#define BLK_BATCH_REQ 32
/* Max future timer expiry for timeouts */
#define BLK_MAX_TIMEOUT (5 * HZ)
@ -38,85 +32,13 @@ struct blk_flush_queue {
};
extern struct kmem_cache *blk_requestq_cachep;
extern struct kmem_cache *request_cachep;
extern struct kobj_type blk_queue_ktype;
extern struct ida blk_queue_ida;
/*
* @q->queue_lock is set while a queue is being initialized. Since we know
* that no other threads access the queue object before @q->queue_lock has
* been set, it is safe to manipulate queue flags without holding the
* queue_lock if @q->queue_lock == NULL. See also blk_alloc_queue_node() and
* blk_init_allocated_queue().
*/
static inline void queue_lockdep_assert_held(struct request_queue *q)
static inline struct blk_flush_queue *
blk_get_flush_queue(struct request_queue *q, struct blk_mq_ctx *ctx)
{
if (q->queue_lock)
lockdep_assert_held(q->queue_lock);
}
static inline void queue_flag_set_unlocked(unsigned int flag,
struct request_queue *q)
{
if (test_bit(QUEUE_FLAG_INIT_DONE, &q->queue_flags) &&
kref_read(&q->kobj.kref))
lockdep_assert_held(q->queue_lock);
__set_bit(flag, &q->queue_flags);
}
static inline void queue_flag_clear_unlocked(unsigned int flag,
struct request_queue *q)
{
if (test_bit(QUEUE_FLAG_INIT_DONE, &q->queue_flags) &&
kref_read(&q->kobj.kref))
lockdep_assert_held(q->queue_lock);
__clear_bit(flag, &q->queue_flags);
}
static inline int queue_flag_test_and_clear(unsigned int flag,
struct request_queue *q)
{
queue_lockdep_assert_held(q);
if (test_bit(flag, &q->queue_flags)) {
__clear_bit(flag, &q->queue_flags);
return 1;
}
return 0;
}
static inline int queue_flag_test_and_set(unsigned int flag,
struct request_queue *q)
{
queue_lockdep_assert_held(q);
if (!test_bit(flag, &q->queue_flags)) {
__set_bit(flag, &q->queue_flags);
return 0;
}
return 1;
}
static inline void queue_flag_set(unsigned int flag, struct request_queue *q)
{
queue_lockdep_assert_held(q);
__set_bit(flag, &q->queue_flags);
}
static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
{
queue_lockdep_assert_held(q);
__clear_bit(flag, &q->queue_flags);
}
static inline struct blk_flush_queue *blk_get_flush_queue(
struct request_queue *q, struct blk_mq_ctx *ctx)
{
if (q->mq_ops)
return blk_mq_map_queue(q, ctx->cpu)->fq;
return q->fq;
return blk_mq_map_queue(q, REQ_OP_FLUSH, ctx->cpu)->fq;
}
static inline void __blk_get_queue(struct request_queue *q)
@ -128,15 +50,9 @@ struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
int node, int cmd_size, gfp_t flags);
void blk_free_flush_queue(struct blk_flush_queue *q);
int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask);
void blk_exit_rl(struct request_queue *q, struct request_list *rl);
void blk_exit_queue(struct request_queue *q);
void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
struct bio *bio);
void blk_queue_bypass_start(struct request_queue *q);
void blk_queue_bypass_end(struct request_queue *q);
void __blk_queue_free_tags(struct request_queue *q);
void blk_freeze_queue(struct request_queue *q);
static inline void blk_queue_enter_live(struct request_queue *q)
@ -235,11 +151,8 @@ static inline bool bio_integrity_endio(struct bio *bio)
}
#endif /* CONFIG_BLK_DEV_INTEGRITY */
void blk_timeout_work(struct work_struct *work);
unsigned long blk_rq_timeout(unsigned long timeout);
void blk_add_timer(struct request *req);
void blk_delete_timer(struct request *);
bool bio_attempt_front_merge(struct request_queue *q, struct request *req,
struct bio *bio);
@ -248,34 +161,12 @@ bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
struct bio *bio);
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
unsigned int *request_count,
struct request **same_queue_rq);
unsigned int blk_plug_queued_count(struct request_queue *q);
void blk_account_io_start(struct request *req, bool new_io);
void blk_account_io_completion(struct request *req, unsigned int bytes);
void blk_account_io_done(struct request *req, u64 now);
/*
* EH timer and IO completion will both attempt to 'grab' the request, make
* sure that only one of them succeeds. Steal the bottom bit of the
* __deadline field for this.
*/
static inline int blk_mark_rq_complete(struct request *rq)
{
return test_and_set_bit(0, &rq->__deadline);
}
static inline void blk_clear_rq_complete(struct request *rq)
{
clear_bit(0, &rq->__deadline);
}
static inline bool blk_rq_is_complete(struct request *rq)
{
return test_bit(0, &rq->__deadline);
}
/*
* Internal elevator interface
*/
@ -283,23 +174,6 @@ static inline bool blk_rq_is_complete(struct request *rq)
void blk_insert_flush(struct request *rq);
static inline void elv_activate_rq(struct request_queue *q, struct request *rq)
{
struct elevator_queue *e = q->elevator;
if (e->type->ops.sq.elevator_activate_req_fn)
e->type->ops.sq.elevator_activate_req_fn(q, rq);
}
static inline void elv_deactivate_rq(struct request_queue *q, struct request *rq)
{
struct elevator_queue *e = q->elevator;
if (e->type->ops.sq.elevator_deactivate_req_fn)
e->type->ops.sq.elevator_deactivate_req_fn(q, rq);
}
int elevator_init(struct request_queue *);
int elevator_init_mq(struct request_queue *q);
int elevator_switch_mq(struct request_queue *q,
struct elevator_type *new_e);
@ -334,31 +208,8 @@ void blk_rq_set_mixed_merge(struct request *rq);
bool blk_rq_merge_ok(struct request *rq, struct bio *bio);
enum elv_merge blk_try_merge(struct request *rq, struct bio *bio);
void blk_queue_congestion_threshold(struct request_queue *q);
int blk_dev_init(void);
/*
* Return the threshold (number of used requests) at which the queue is
* considered to be congested. It include a little hysteresis to keep the
* context switch rate down.
*/
static inline int queue_congestion_on_threshold(struct request_queue *q)
{
return q->nr_congestion_on;
}
/*
* The threshold at which a queue is considered to be uncongested
*/
static inline int queue_congestion_off_threshold(struct request_queue *q)
{
return q->nr_congestion_off;
}
extern int blk_update_nr_requests(struct request_queue *, unsigned int);
/*
* Contribute to IO statistics IFF:
*
@ -380,21 +231,6 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req)
q->last_merge = NULL;
}
/*
* Steal a bit from this field for legacy IO path atomic IO marking. Note that
* setting the deadline clears the bottom bit, potentially clearing the
* completed bit. The user has to be OK with this (current ones are fine).
*/
static inline void blk_rq_set_deadline(struct request *rq, unsigned long time)
{
rq->__deadline = time & ~0x1UL;
}
static inline unsigned long blk_rq_deadline(struct request *rq)
{
return rq->__deadline & ~0x1UL;
}
/*
* The max size one bio can handle is UINT_MAX becasue bvec_iter.bi_size
* is defined as 'unsigned int', meantime it has to aligned to with logical
@ -416,22 +252,6 @@ void ioc_clear_queue(struct request_queue *q);
int create_task_io_context(struct task_struct *task, gfp_t gfp_mask, int node);
/**
* rq_ioc - determine io_context for request allocation
* @bio: request being allocated is for this bio (can be %NULL)
*
* Determine io_context to use for request allocation for @bio. May return
* %NULL if %current->io_context doesn't exist.
*/
static inline struct io_context *rq_ioc(struct bio *bio)
{
#ifdef CONFIG_BLK_CGROUP
if (bio && bio->bi_ioc)
return bio->bi_ioc;
#endif
return current->io_context;
}
/**
* create_io_context - try to create task->io_context
* @gfp_mask: allocation mask
@ -490,8 +310,6 @@ static inline void blk_queue_bounce(struct request_queue *q, struct bio **bio)
}
#endif /* CONFIG_BOUNCE */
extern void blk_drain_queue(struct request_queue *q);
#ifdef CONFIG_BLK_CGROUP_IOLATENCY
extern int blk_iolatency_init(struct request_queue *q);
#else


@ -277,7 +277,8 @@ static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
}
}
bio_clone_blkcg_association(bio, bio_src);
bio_clone_blkg_association(bio, bio_src);
blkcg_bio_issue_init(bio);
return bio;
}


@ -21,7 +21,7 @@
*
*/
#include <linux/slab.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/delay.h>
#include <linux/scatterlist.h>
#include <linux/bsg-lib.h>
@ -31,6 +31,12 @@
#define uptr64(val) ((void __user *)(uintptr_t)(val))
struct bsg_set {
struct blk_mq_tag_set tag_set;
bsg_job_fn *job_fn;
bsg_timeout_fn *timeout_fn;
};
static int bsg_transport_check_proto(struct sg_io_v4 *hdr)
{
if (hdr->protocol != BSG_PROTOCOL_SCSI ||
@ -129,7 +135,7 @@ static void bsg_teardown_job(struct kref *kref)
kfree(job->request_payload.sg_list);
kfree(job->reply_payload.sg_list);
blk_end_request_all(rq, BLK_STS_OK);
blk_mq_end_request(rq, BLK_STS_OK);
}
void bsg_job_put(struct bsg_job *job)
@ -157,15 +163,15 @@ void bsg_job_done(struct bsg_job *job, int result,
{
job->result = result;
job->reply_payload_rcv_len = reply_payload_rcv_len;
blk_complete_request(blk_mq_rq_from_pdu(job));
blk_mq_complete_request(blk_mq_rq_from_pdu(job));
}
EXPORT_SYMBOL_GPL(bsg_job_done);
/**
* bsg_softirq_done - softirq done routine for destroying the bsg requests
* bsg_complete - softirq done routine for destroying the bsg requests
* @rq: BSG request that holds the job to be destroyed
*/
static void bsg_softirq_done(struct request *rq)
static void bsg_complete(struct request *rq)
{
struct bsg_job *job = blk_mq_rq_to_pdu(rq);
@ -224,54 +230,48 @@ failjob_rls_job:
}
/**
* bsg_request_fn - generic handler for bsg requests
* @q: request queue to manage
* bsg_queue_rq - generic handler for bsg requests
* @hctx: hardware queue
* @bd: queue data
*
* On error the create_bsg_job function should return a -Exyz error value
* that will be set to ->result.
*
* Drivers/subsys should pass this to the queue init function.
*/
static void bsg_request_fn(struct request_queue *q)
__releases(q->queue_lock)
__acquires(q->queue_lock)
static blk_status_t bsg_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
struct request_queue *q = hctx->queue;
struct device *dev = q->queuedata;
struct request *req;
struct request *req = bd->rq;
struct bsg_set *bset =
container_of(q->tag_set, struct bsg_set, tag_set);
int ret;
blk_mq_start_request(req);
if (!get_device(dev))
return;
return BLK_STS_IOERR;
while (1) {
req = blk_fetch_request(q);
if (!req)
break;
spin_unlock_irq(q->queue_lock);
if (!bsg_prepare_job(dev, req))
return BLK_STS_IOERR;
if (!bsg_prepare_job(dev, req)) {
blk_end_request_all(req, BLK_STS_OK);
spin_lock_irq(q->queue_lock);
continue;
}
ret = bset->job_fn(blk_mq_rq_to_pdu(req));
if (ret)
return BLK_STS_IOERR;
ret = q->bsg_job_fn(blk_mq_rq_to_pdu(req));
spin_lock_irq(q->queue_lock);
if (ret)
break;
}
spin_unlock_irq(q->queue_lock);
put_device(dev);
spin_lock_irq(q->queue_lock);
return BLK_STS_OK;
}
/* called right after the request is allocated for the request_queue */
static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
static int bsg_init_rq(struct blk_mq_tag_set *set, struct request *req,
unsigned int hctx_idx, unsigned int numa_node)
{
struct bsg_job *job = blk_mq_rq_to_pdu(req);
job->reply = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
job->reply = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
if (!job->reply)
return -ENOMEM;
return 0;
@ -289,13 +289,47 @@ static void bsg_initialize_rq(struct request *req)
job->dd_data = job + 1;
}
static void bsg_exit_rq(struct request_queue *q, struct request *req)
static void bsg_exit_rq(struct blk_mq_tag_set *set, struct request *req,
unsigned int hctx_idx)
{
struct bsg_job *job = blk_mq_rq_to_pdu(req);
kfree(job->reply);
}
void bsg_remove_queue(struct request_queue *q)
{
if (q) {
struct bsg_set *bset =
container_of(q->tag_set, struct bsg_set, tag_set);
bsg_unregister_queue(q);
blk_cleanup_queue(q);
blk_mq_free_tag_set(&bset->tag_set);
kfree(bset);
}
}
EXPORT_SYMBOL_GPL(bsg_remove_queue);
static enum blk_eh_timer_return bsg_timeout(struct request *rq, bool reserved)
{
struct bsg_set *bset =
container_of(rq->q->tag_set, struct bsg_set, tag_set);
if (!bset->timeout_fn)
return BLK_EH_DONE;
return bset->timeout_fn(rq);
}
static const struct blk_mq_ops bsg_mq_ops = {
.queue_rq = bsg_queue_rq,
.init_request = bsg_init_rq,
.exit_request = bsg_exit_rq,
.initialize_rq_fn = bsg_initialize_rq,
.complete = bsg_complete,
.timeout = bsg_timeout,
};
/**
* bsg_setup_queue - Create and add the bsg hooks so we can receive requests
* @dev: device to attach bsg device to
@ -304,28 +338,38 @@ static void bsg_exit_rq(struct request_queue *q, struct request *req)
* @dd_job_size: size of LLD data needed for each job
*/
struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
bsg_job_fn *job_fn, int dd_job_size)
bsg_job_fn *job_fn, bsg_timeout_fn *timeout, int dd_job_size)
{
struct bsg_set *bset;
struct blk_mq_tag_set *set;
struct request_queue *q;
int ret;
int ret = -ENOMEM;
q = blk_alloc_queue(GFP_KERNEL);
if (!q)
bset = kzalloc(sizeof(*bset), GFP_KERNEL);
if (!bset)
return ERR_PTR(-ENOMEM);
q->cmd_size = sizeof(struct bsg_job) + dd_job_size;
q->init_rq_fn = bsg_init_rq;
q->exit_rq_fn = bsg_exit_rq;
q->initialize_rq_fn = bsg_initialize_rq;
q->request_fn = bsg_request_fn;
ret = blk_init_allocated_queue(q);
if (ret)
goto out_cleanup_queue;
bset->job_fn = job_fn;
bset->timeout_fn = timeout;
set = &bset->tag_set;
set->ops = &bsg_mq_ops,
set->nr_hw_queues = 1;
set->queue_depth = 128;
set->numa_node = NUMA_NO_NODE;
set->cmd_size = sizeof(struct bsg_job) + dd_job_size;
set->flags = BLK_MQ_F_NO_SCHED | BLK_MQ_F_BLOCKING;
if (blk_mq_alloc_tag_set(set))
goto out_tag_set;
q = blk_mq_init_queue(set);
if (IS_ERR(q)) {
ret = PTR_ERR(q);
goto out_queue;
}
q->queuedata = dev;
q->bsg_job_fn = job_fn;
blk_queue_flag_set(QUEUE_FLAG_BIDI, q);
blk_queue_softirq_done(q, bsg_softirq_done);
blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);
ret = bsg_register_queue(q, dev, name, &bsg_transport_ops);
@ -338,6 +382,10 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
return q;
out_cleanup_queue:
blk_cleanup_queue(q);
out_queue:
blk_mq_free_tag_set(set);
out_tag_set:
kfree(bset);
return ERR_PTR(ret);
}
EXPORT_SYMBOL_GPL(bsg_setup_queue);
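
The hunks above convert bsg-lib from the legacy request_fn path to blk-mq: bsg_setup_queue() now allocates its own tag set and takes a bsg_timeout_fn in addition to the job function. The sketch below shows how a transport driver might call the reworked helper; the my_* names and struct my_job_data are hypothetical, only the bsg-lib entry points come from the diff above.

#include <linux/bsg-lib.h>
#include <linux/device.h>
#include <linux/err.h>

struct my_job_data {
	int status;				/* hypothetical per-job state */
};

static int my_bsg_job(struct bsg_job *job)
{
	struct my_job_data *data = job->dd_data;	/* points at the dd_job_size area */

	data->status = 0;
	/* hand the command to the transport, then complete it */
	bsg_job_done(job, 0, 0);
	return 0;
}

static enum blk_eh_timer_return my_bsg_timeout(struct request *rq)
{
	/* a real driver would try to abort the command here */
	return BLK_EH_RESET_TIMER;
}

static int my_attach_bsg(struct device *dev, const char *name)
{
	struct request_queue *q;

	/* the timeout callback is the argument added by this conversion */
	q = bsg_setup_queue(dev, name, my_bsg_job, my_bsg_timeout,
			    sizeof(struct my_job_data));
	if (IS_ERR(q))
		return PTR_ERR(q);
	/* torn down later with bsg_remove_queue(q), which also frees the tag set */
	return 0;
}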


@ -471,7 +471,7 @@ int bsg_register_queue(struct request_queue *q, struct device *parent,
/*
* we need a proper transport to send commands, not a stacked device
*/
if (!queue_is_rq_based(q))
if (!queue_is_mq(q))
return 0;
bcd = &q->bsg_dev;

File diff suppressed because it is too large


@ -1,560 +0,0 @@
/*
* Deadline i/o scheduler.
*
* Copyright (C) 2002 Jens Axboe <axboe@kernel.dk>
*/
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/bio.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/init.h>
#include <linux/compiler.h>
#include <linux/rbtree.h>
/*
* See Documentation/block/deadline-iosched.txt
*/
static const int read_expire = HZ / 2; /* max time before a read is submitted. */
static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
static const int writes_starved = 2; /* max times reads can starve a write */
static const int fifo_batch = 16; /* # of sequential requests treated as one
by the above parameters. For throughput. */
struct deadline_data {
/*
* run time data
*/
/*
* requests (deadline_rq s) are present on both sort_list and fifo_list
*/
struct rb_root sort_list[2];
struct list_head fifo_list[2];
/*
* next in sort order. read, write or both are NULL
*/
struct request *next_rq[2];
unsigned int batching; /* number of sequential requests made */
unsigned int starved; /* times reads have starved writes */
/*
* settings that change how the i/o scheduler behaves
*/
int fifo_expire[2];
int fifo_batch;
int writes_starved;
int front_merges;
};
static inline struct rb_root *
deadline_rb_root(struct deadline_data *dd, struct request *rq)
{
return &dd->sort_list[rq_data_dir(rq)];
}
/*
* get the request after `rq' in sector-sorted order
*/
static inline struct request *
deadline_latter_request(struct request *rq)
{
struct rb_node *node = rb_next(&rq->rb_node);
if (node)
return rb_entry_rq(node);
return NULL;
}
static void
deadline_add_rq_rb(struct deadline_data *dd, struct request *rq)
{
struct rb_root *root = deadline_rb_root(dd, rq);
elv_rb_add(root, rq);
}
static inline void
deadline_del_rq_rb(struct deadline_data *dd, struct request *rq)
{
const int data_dir = rq_data_dir(rq);
if (dd->next_rq[data_dir] == rq)
dd->next_rq[data_dir] = deadline_latter_request(rq);
elv_rb_del(deadline_rb_root(dd, rq), rq);
}
/*
* add rq to rbtree and fifo
*/
static void
deadline_add_request(struct request_queue *q, struct request *rq)
{
struct deadline_data *dd = q->elevator->elevator_data;
const int data_dir = rq_data_dir(rq);
/*
* This may be a requeue of a write request that has locked its
* target zone. If it is the case, this releases the zone lock.
*/
blk_req_zone_write_unlock(rq);
deadline_add_rq_rb(dd, rq);
/*
* set expire time and add to fifo list
*/
rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
list_add_tail(&rq->queuelist, &dd->fifo_list[data_dir]);
}
/*
* remove rq from rbtree and fifo.
*/
static void deadline_remove_request(struct request_queue *q, struct request *rq)
{
struct deadline_data *dd = q->elevator->elevator_data;
rq_fifo_clear(rq);
deadline_del_rq_rb(dd, rq);
}
static enum elv_merge
deadline_merge(struct request_queue *q, struct request **req, struct bio *bio)
{
struct deadline_data *dd = q->elevator->elevator_data;
struct request *__rq;
/*
* check for front merge
*/
if (dd->front_merges) {
sector_t sector = bio_end_sector(bio);
__rq = elv_rb_find(&dd->sort_list[bio_data_dir(bio)], sector);
if (__rq) {
BUG_ON(sector != blk_rq_pos(__rq));
if (elv_bio_merge_ok(__rq, bio)) {
*req = __rq;
return ELEVATOR_FRONT_MERGE;
}
}
}
return ELEVATOR_NO_MERGE;
}
static void deadline_merged_request(struct request_queue *q,
struct request *req, enum elv_merge type)
{
struct deadline_data *dd = q->elevator->elevator_data;
/*
* if the merge was a front merge, we need to reposition request
*/
if (type == ELEVATOR_FRONT_MERGE) {
elv_rb_del(deadline_rb_root(dd, req), req);
deadline_add_rq_rb(dd, req);
}
}
static void
deadline_merged_requests(struct request_queue *q, struct request *req,
struct request *next)
{
/*
* if next expires before rq, assign its expire time to rq
* and move into next position (next will be deleted) in fifo
*/
if (!list_empty(&req->queuelist) && !list_empty(&next->queuelist)) {
if (time_before((unsigned long)next->fifo_time,
(unsigned long)req->fifo_time)) {
list_move(&req->queuelist, &next->queuelist);
req->fifo_time = next->fifo_time;
}
}
/*
* kill knowledge of next, this one is a goner
*/
deadline_remove_request(q, next);
}
/*
* move request from sort list to dispatch queue.
*/
static inline void
deadline_move_to_dispatch(struct deadline_data *dd, struct request *rq)
{
struct request_queue *q = rq->q;
/*
* For a zoned block device, write requests must write lock their
* target zone.
*/
blk_req_zone_write_lock(rq);
deadline_remove_request(q, rq);
elv_dispatch_add_tail(q, rq);
}
/*
* move an entry to dispatch queue
*/
static void
deadline_move_request(struct deadline_data *dd, struct request *rq)
{
const int data_dir = rq_data_dir(rq);
dd->next_rq[READ] = NULL;
dd->next_rq[WRITE] = NULL;
dd->next_rq[data_dir] = deadline_latter_request(rq);
/*
* take it off the sort and fifo list, move
* to dispatch queue
*/
deadline_move_to_dispatch(dd, rq);
}
/*
* deadline_check_fifo returns 0 if there are no expired requests on the fifo,
* 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
*/
static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
{
struct request *rq = rq_entry_fifo(dd->fifo_list[ddir].next);
/*
* rq is expired!
*/
if (time_after_eq(jiffies, (unsigned long)rq->fifo_time))
return 1;
return 0;
}
/*
* For the specified data direction, return the next request to dispatch using
* arrival ordered lists.
*/
static struct request *
deadline_fifo_request(struct deadline_data *dd, int data_dir)
{
struct request *rq;
if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
return NULL;
if (list_empty(&dd->fifo_list[data_dir]))
return NULL;
rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
if (data_dir == READ || !blk_queue_is_zoned(rq->q))
return rq;
/*
* Look for a write request that can be dispatched, that is one with
* an unlocked target zone.
*/
list_for_each_entry(rq, &dd->fifo_list[WRITE], queuelist) {
if (blk_req_can_dispatch_to_zone(rq))
return rq;
}
return NULL;
}
/*
* For the specified data direction, return the next request to dispatch using
* sector position sorted lists.
*/
static struct request *
deadline_next_request(struct deadline_data *dd, int data_dir)
{
struct request *rq;
if (WARN_ON_ONCE(data_dir != READ && data_dir != WRITE))
return NULL;
rq = dd->next_rq[data_dir];
if (!rq)
return NULL;
if (data_dir == READ || !blk_queue_is_zoned(rq->q))
return rq;
/*
* Look for a write request that can be dispatched, that is one with
* an unlocked target zone.
*/
while (rq) {
if (blk_req_can_dispatch_to_zone(rq))
return rq;
rq = deadline_latter_request(rq);
}
return NULL;
}
/*
* deadline_dispatch_requests selects the best request according to
* read/write expire, fifo_batch, etc
*/
static int deadline_dispatch_requests(struct request_queue *q, int force)
{
struct deadline_data *dd = q->elevator->elevator_data;
const int reads = !list_empty(&dd->fifo_list[READ]);
const int writes = !list_empty(&dd->fifo_list[WRITE]);
struct request *rq, *next_rq;
int data_dir;
/*
* batches are currently reads XOR writes
*/
rq = deadline_next_request(dd, WRITE);
if (!rq)
rq = deadline_next_request(dd, READ);
if (rq && dd->batching < dd->fifo_batch)
/* we have a next request are still entitled to batch */
goto dispatch_request;
/*
* at this point we are not running a batch. select the appropriate
* data direction (read / write)
*/
if (reads) {
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
if (deadline_fifo_request(dd, WRITE) &&
(dd->starved++ >= dd->writes_starved))
goto dispatch_writes;
data_dir = READ;
goto dispatch_find_request;
}
/*
* there are either no reads or writes have been starved
*/
if (writes) {
dispatch_writes:
BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));
dd->starved = 0;
data_dir = WRITE;
goto dispatch_find_request;
}
return 0;
dispatch_find_request:
/*
* we are not running a batch, find best request for selected data_dir
*/
next_rq = deadline_next_request(dd, data_dir);
if (deadline_check_fifo(dd, data_dir) || !next_rq) {
/*
* A deadline has expired, the last request was in the other
* direction, or we have run out of higher-sectored requests.
* Start again from the request with the earliest expiry time.
*/
rq = deadline_fifo_request(dd, data_dir);
} else {
/*
* The last req was the same dir and we have a next request in
* sort order. No expired requests so continue on from here.
*/
rq = next_rq;
}
/*
* For a zoned block device, if we only have writes queued and none of
* them can be dispatched, rq will be NULL.
*/
if (!rq)
return 0;
dd->batching = 0;
dispatch_request:
/*
* rq is the selected appropriate request.
*/
dd->batching++;
deadline_move_request(dd, rq);
return 1;
}
/*
* For zoned block devices, write unlock the target zone of completed
* write requests.
*/
static void
deadline_completed_request(struct request_queue *q, struct request *rq)
{
blk_req_zone_write_unlock(rq);
}
static void deadline_exit_queue(struct elevator_queue *e)
{
struct deadline_data *dd = e->elevator_data;
BUG_ON(!list_empty(&dd->fifo_list[READ]));
BUG_ON(!list_empty(&dd->fifo_list[WRITE]));
kfree(dd);
}
/*
* initialize elevator private data (deadline_data).
*/
static int deadline_init_queue(struct request_queue *q, struct elevator_type *e)
{
struct deadline_data *dd;
struct elevator_queue *eq;
eq = elevator_alloc(q, e);
if (!eq)
return -ENOMEM;
dd = kzalloc_node(sizeof(*dd), GFP_KERNEL, q->node);
if (!dd) {
kobject_put(&eq->kobj);
return -ENOMEM;
}
eq->elevator_data = dd;
INIT_LIST_HEAD(&dd->fifo_list[READ]);
INIT_LIST_HEAD(&dd->fifo_list[WRITE]);
dd->sort_list[READ] = RB_ROOT;
dd->sort_list[WRITE] = RB_ROOT;
dd->fifo_expire[READ] = read_expire;
dd->fifo_expire[WRITE] = write_expire;
dd->writes_starved = writes_starved;
dd->front_merges = 1;
dd->fifo_batch = fifo_batch;
spin_lock_irq(q->queue_lock);
q->elevator = eq;
spin_unlock_irq(q->queue_lock);
return 0;
}
/*
* sysfs parts below
*/
static ssize_t
deadline_var_show(int var, char *page)
{
return sprintf(page, "%d\n", var);
}
static void
deadline_var_store(int *var, const char *page)
{
char *p = (char *) page;
*var = simple_strtol(p, &p, 10);
}
#define SHOW_FUNCTION(__FUNC, __VAR, __CONV) \
static ssize_t __FUNC(struct elevator_queue *e, char *page) \
{ \
struct deadline_data *dd = e->elevator_data; \
int __data = __VAR; \
if (__CONV) \
__data = jiffies_to_msecs(__data); \
return deadline_var_show(__data, (page)); \
}
SHOW_FUNCTION(deadline_read_expire_show, dd->fifo_expire[READ], 1);
SHOW_FUNCTION(deadline_write_expire_show, dd->fifo_expire[WRITE], 1);
SHOW_FUNCTION(deadline_writes_starved_show, dd->writes_starved, 0);
SHOW_FUNCTION(deadline_front_merges_show, dd->front_merges, 0);
SHOW_FUNCTION(deadline_fifo_batch_show, dd->fifo_batch, 0);
#undef SHOW_FUNCTION
#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \
static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count) \
{ \
struct deadline_data *dd = e->elevator_data; \
int __data; \
deadline_var_store(&__data, (page)); \
if (__data < (MIN)) \
__data = (MIN); \
else if (__data > (MAX)) \
__data = (MAX); \
if (__CONV) \
*(__PTR) = msecs_to_jiffies(__data); \
else \
*(__PTR) = __data; \
return count; \
}
STORE_FUNCTION(deadline_read_expire_store, &dd->fifo_expire[READ], 0, INT_MAX, 1);
STORE_FUNCTION(deadline_write_expire_store, &dd->fifo_expire[WRITE], 0, INT_MAX, 1);
STORE_FUNCTION(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX, 0);
STORE_FUNCTION(deadline_front_merges_store, &dd->front_merges, 0, 1, 0);
STORE_FUNCTION(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX, 0);
#undef STORE_FUNCTION
#define DD_ATTR(name) \
__ATTR(name, 0644, deadline_##name##_show, deadline_##name##_store)
static struct elv_fs_entry deadline_attrs[] = {
DD_ATTR(read_expire),
DD_ATTR(write_expire),
DD_ATTR(writes_starved),
DD_ATTR(front_merges),
DD_ATTR(fifo_batch),
__ATTR_NULL
};
static struct elevator_type iosched_deadline = {
.ops.sq = {
.elevator_merge_fn = deadline_merge,
.elevator_merged_fn = deadline_merged_request,
.elevator_merge_req_fn = deadline_merged_requests,
.elevator_dispatch_fn = deadline_dispatch_requests,
.elevator_completed_req_fn = deadline_completed_request,
.elevator_add_req_fn = deadline_add_request,
.elevator_former_req_fn = elv_rb_former_request,
.elevator_latter_req_fn = elv_rb_latter_request,
.elevator_init_fn = deadline_init_queue,
.elevator_exit_fn = deadline_exit_queue,
},
.elevator_attrs = deadline_attrs,
.elevator_name = "deadline",
.elevator_owner = THIS_MODULE,
};
static int __init deadline_init(void)
{
return elv_register(&iosched_deadline);
}
static void __exit deadline_exit(void)
{
elv_unregister(&iosched_deadline);
}
module_init(deadline_init);
module_exit(deadline_exit);
MODULE_AUTHOR("Jens Axboe");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("deadline IO scheduler");


@ -61,10 +61,8 @@ static int elv_iosched_allow_bio_merge(struct request *rq, struct bio *bio)
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
if (e->uses_mq && e->type->ops.mq.allow_merge)
return e->type->ops.mq.allow_merge(q, rq, bio);
else if (!e->uses_mq && e->type->ops.sq.elevator_allow_bio_merge_fn)
return e->type->ops.sq.elevator_allow_bio_merge_fn(q, rq, bio);
if (e->type->ops.allow_merge)
return e->type->ops.allow_merge(q, rq, bio);
return 1;
}
@ -95,14 +93,14 @@ static bool elevator_match(const struct elevator_type *e, const char *name)
}
/*
* Return scheduler with name 'name' and with matching 'mq capability
* Return scheduler with name 'name'
*/
static struct elevator_type *elevator_find(const char *name, bool mq)
static struct elevator_type *elevator_find(const char *name)
{
struct elevator_type *e;
list_for_each_entry(e, &elv_list, list) {
if (elevator_match(e, name) && (mq == e->uses_mq))
if (elevator_match(e, name))
return e;
}
@ -121,12 +119,12 @@ static struct elevator_type *elevator_get(struct request_queue *q,
spin_lock(&elv_list_lock);
e = elevator_find(name, q->mq_ops != NULL);
e = elevator_find(name);
if (!e && try_loading) {
spin_unlock(&elv_list_lock);
request_module("%s-iosched", name);
spin_lock(&elv_list_lock);
e = elevator_find(name, q->mq_ops != NULL);
e = elevator_find(name);
}
if (e && !try_module_get(e->elevator_owner))
@ -150,26 +148,6 @@ static int __init elevator_setup(char *str)
__setup("elevator=", elevator_setup);
/* called during boot to load the elevator chosen by the elevator param */
void __init load_default_elevator_module(void)
{
struct elevator_type *e;
if (!chosen_elevator[0])
return;
/*
* Boot parameter is deprecated, we haven't supported that for MQ.
* Only look for non-mq schedulers from here.
*/
spin_lock(&elv_list_lock);
e = elevator_find(chosen_elevator, false);
spin_unlock(&elv_list_lock);
if (!e)
request_module("%s-iosched", chosen_elevator);
}
static struct kobj_type elv_ktype;
struct elevator_queue *elevator_alloc(struct request_queue *q,
@ -185,7 +163,6 @@ struct elevator_queue *elevator_alloc(struct request_queue *q,
kobject_init(&eq->kobj, &elv_ktype);
mutex_init(&eq->sysfs_lock);
hash_init(eq->hash);
eq->uses_mq = e->uses_mq;
return eq;
}
@ -200,54 +177,11 @@ static void elevator_release(struct kobject *kobj)
kfree(e);
}
/*
* Use the default elevator specified by config boot param for non-mq devices,
* or by config option. Don't try to load modules as we could be running off
* async and request_module() isn't allowed from async.
*/
int elevator_init(struct request_queue *q)
{
struct elevator_type *e = NULL;
int err = 0;
/*
* q->sysfs_lock must be held to provide mutual exclusion between
* elevator_switch() and here.
*/
mutex_lock(&q->sysfs_lock);
if (unlikely(q->elevator))
goto out_unlock;
if (*chosen_elevator) {
e = elevator_get(q, chosen_elevator, false);
if (!e)
printk(KERN_ERR "I/O scheduler %s not found\n",
chosen_elevator);
}
if (!e)
e = elevator_get(q, CONFIG_DEFAULT_IOSCHED, false);
if (!e) {
printk(KERN_ERR
"Default I/O scheduler not found. Using noop.\n");
e = elevator_get(q, "noop", false);
}
err = e->ops.sq.elevator_init_fn(q, e);
if (err)
elevator_put(e);
out_unlock:
mutex_unlock(&q->sysfs_lock);
return err;
}
void elevator_exit(struct request_queue *q, struct elevator_queue *e)
{
mutex_lock(&e->sysfs_lock);
if (e->uses_mq && e->type->ops.mq.exit_sched)
if (e->type->ops.exit_sched)
blk_mq_exit_sched(q, e);
else if (!e->uses_mq && e->type->ops.sq.elevator_exit_fn)
e->type->ops.sq.elevator_exit_fn(e);
mutex_unlock(&e->sysfs_lock);
kobject_put(&e->kobj);
@ -356,68 +290,6 @@ struct request *elv_rb_find(struct rb_root *root, sector_t sector)
}
EXPORT_SYMBOL(elv_rb_find);
/*
* Insert rq into dispatch queue of q. Queue lock must be held on
* entry. rq is sort instead into the dispatch queue. To be used by
* specific elevators.
*/
void elv_dispatch_sort(struct request_queue *q, struct request *rq)
{
sector_t boundary;
struct list_head *entry;
if (q->last_merge == rq)
q->last_merge = NULL;
elv_rqhash_del(q, rq);
q->nr_sorted--;
boundary = q->end_sector;
list_for_each_prev(entry, &q->queue_head) {
struct request *pos = list_entry_rq(entry);
if (req_op(rq) != req_op(pos))
break;
if (rq_data_dir(rq) != rq_data_dir(pos))
break;
if (pos->rq_flags & (RQF_STARTED | RQF_SOFTBARRIER))
break;
if (blk_rq_pos(rq) >= boundary) {
if (blk_rq_pos(pos) < boundary)
continue;
} else {
if (blk_rq_pos(pos) >= boundary)
break;
}
if (blk_rq_pos(rq) >= blk_rq_pos(pos))
break;
}
list_add(&rq->queuelist, entry);
}
EXPORT_SYMBOL(elv_dispatch_sort);
/*
* Insert rq into dispatch queue of q. Queue lock must be held on
* entry. rq is added to the back of the dispatch queue. To be used by
* specific elevators.
*/
void elv_dispatch_add_tail(struct request_queue *q, struct request *rq)
{
if (q->last_merge == rq)
q->last_merge = NULL;
elv_rqhash_del(q, rq);
q->nr_sorted--;
q->end_sector = rq_end_sector(rq);
q->boundary_rq = rq;
list_add_tail(&rq->queuelist, &q->queue_head);
}
EXPORT_SYMBOL(elv_dispatch_add_tail);
enum elv_merge elv_merge(struct request_queue *q, struct request **req,
struct bio *bio)
{
@ -457,10 +329,8 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req,
return ELEVATOR_BACK_MERGE;
}
if (e->uses_mq && e->type->ops.mq.request_merge)
return e->type->ops.mq.request_merge(q, req, bio);
else if (!e->uses_mq && e->type->ops.sq.elevator_merge_fn)
return e->type->ops.sq.elevator_merge_fn(q, req, bio);
if (e->type->ops.request_merge)
return e->type->ops.request_merge(q, req, bio);
return ELEVATOR_NO_MERGE;
}
@ -511,10 +381,8 @@ void elv_merged_request(struct request_queue *q, struct request *rq,
{
struct elevator_queue *e = q->elevator;
if (e->uses_mq && e->type->ops.mq.request_merged)
e->type->ops.mq.request_merged(q, rq, type);
else if (!e->uses_mq && e->type->ops.sq.elevator_merged_fn)
e->type->ops.sq.elevator_merged_fn(q, rq, type);
if (e->type->ops.request_merged)
e->type->ops.request_merged(q, rq, type);
if (type == ELEVATOR_BACK_MERGE)
elv_rqhash_reposition(q, rq);
@ -526,176 +394,20 @@ void elv_merge_requests(struct request_queue *q, struct request *rq,
struct request *next)
{
struct elevator_queue *e = q->elevator;
bool next_sorted = false;
if (e->uses_mq && e->type->ops.mq.requests_merged)
e->type->ops.mq.requests_merged(q, rq, next);
else if (e->type->ops.sq.elevator_merge_req_fn) {
next_sorted = (__force bool)(next->rq_flags & RQF_SORTED);
if (next_sorted)
e->type->ops.sq.elevator_merge_req_fn(q, rq, next);
}
if (e->type->ops.requests_merged)
e->type->ops.requests_merged(q, rq, next);
elv_rqhash_reposition(q, rq);
if (next_sorted) {
elv_rqhash_del(q, next);
q->nr_sorted--;
}
q->last_merge = rq;
}
void elv_bio_merged(struct request_queue *q, struct request *rq,
struct bio *bio)
{
struct elevator_queue *e = q->elevator;
if (WARN_ON_ONCE(e->uses_mq))
return;
if (e->type->ops.sq.elevator_bio_merged_fn)
e->type->ops.sq.elevator_bio_merged_fn(q, rq, bio);
}
void elv_requeue_request(struct request_queue *q, struct request *rq)
{
/*
* it already went through dequeue, we need to decrement the
* in_flight count again
*/
if (blk_account_rq(rq)) {
q->in_flight[rq_is_sync(rq)]--;
if (rq->rq_flags & RQF_SORTED)
elv_deactivate_rq(q, rq);
}
rq->rq_flags &= ~RQF_STARTED;
blk_pm_requeue_request(rq);
__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
}
void elv_drain_elevator(struct request_queue *q)
{
struct elevator_queue *e = q->elevator;
static int printed;
if (WARN_ON_ONCE(e->uses_mq))
return;
lockdep_assert_held(q->queue_lock);
while (e->type->ops.sq.elevator_dispatch_fn(q, 1))
;
if (q->nr_sorted && !blk_queue_is_zoned(q) && printed++ < 10 ) {
printk(KERN_ERR "%s: forced dispatching is broken "
"(nr_sorted=%u), please report this\n",
q->elevator->type->elevator_name, q->nr_sorted);
}
}
void __elv_add_request(struct request_queue *q, struct request *rq, int where)
{
trace_block_rq_insert(q, rq);
blk_pm_add_request(q, rq);
rq->q = q;
if (rq->rq_flags & RQF_SOFTBARRIER) {
/* barriers are scheduling boundary, update end_sector */
if (!blk_rq_is_passthrough(rq)) {
q->end_sector = rq_end_sector(rq);
q->boundary_rq = rq;
}
} else if (!(rq->rq_flags & RQF_ELVPRIV) &&
(where == ELEVATOR_INSERT_SORT ||
where == ELEVATOR_INSERT_SORT_MERGE))
where = ELEVATOR_INSERT_BACK;
switch (where) {
case ELEVATOR_INSERT_REQUEUE:
case ELEVATOR_INSERT_FRONT:
rq->rq_flags |= RQF_SOFTBARRIER;
list_add(&rq->queuelist, &q->queue_head);
break;
case ELEVATOR_INSERT_BACK:
rq->rq_flags |= RQF_SOFTBARRIER;
elv_drain_elevator(q);
list_add_tail(&rq->queuelist, &q->queue_head);
/*
* We kick the queue here for the following reasons.
* - The elevator might have returned NULL previously
* to delay requests and returned them now. As the
* queue wasn't empty before this request, ll_rw_blk
* won't run the queue on return, resulting in hang.
* - Usually, back inserted requests won't be merged
* with anything. There's no point in delaying queue
* processing.
*/
__blk_run_queue(q);
break;
case ELEVATOR_INSERT_SORT_MERGE:
/*
* If we succeed in merging this request with one in the
* queue already, we are done - rq has now been freed,
* so no need to do anything further.
*/
if (elv_attempt_insert_merge(q, rq))
break;
/* fall through */
case ELEVATOR_INSERT_SORT:
BUG_ON(blk_rq_is_passthrough(rq));
rq->rq_flags |= RQF_SORTED;
q->nr_sorted++;
if (rq_mergeable(rq)) {
elv_rqhash_add(q, rq);
if (!q->last_merge)
q->last_merge = rq;
}
/*
* Some ioscheds (cfq) run q->request_fn directly, so
* rq cannot be accessed after calling
* elevator_add_req_fn.
*/
q->elevator->type->ops.sq.elevator_add_req_fn(q, rq);
break;
case ELEVATOR_INSERT_FLUSH:
rq->rq_flags |= RQF_SOFTBARRIER;
blk_insert_flush(rq);
break;
default:
printk(KERN_ERR "%s: bad insertion point %d\n",
__func__, where);
BUG();
}
}
EXPORT_SYMBOL(__elv_add_request);
void elv_add_request(struct request_queue *q, struct request *rq, int where)
{
unsigned long flags;
spin_lock_irqsave(q->queue_lock, flags);
__elv_add_request(q, rq, where);
spin_unlock_irqrestore(q->queue_lock, flags);
}
EXPORT_SYMBOL(elv_add_request);
struct request *elv_latter_request(struct request_queue *q, struct request *rq)
{
struct elevator_queue *e = q->elevator;
if (e->uses_mq && e->type->ops.mq.next_request)
return e->type->ops.mq.next_request(q, rq);
else if (!e->uses_mq && e->type->ops.sq.elevator_latter_req_fn)
return e->type->ops.sq.elevator_latter_req_fn(q, rq);
if (e->type->ops.next_request)
return e->type->ops.next_request(q, rq);
return NULL;
}
@ -704,68 +416,12 @@ struct request *elv_former_request(struct request_queue *q, struct request *rq)
{
struct elevator_queue *e = q->elevator;
if (e->uses_mq && e->type->ops.mq.former_request)
return e->type->ops.mq.former_request(q, rq);
if (!e->uses_mq && e->type->ops.sq.elevator_former_req_fn)
return e->type->ops.sq.elevator_former_req_fn(q, rq);
if (e->type->ops.former_request)
return e->type->ops.former_request(q, rq);
return NULL;
}
int elv_set_request(struct request_queue *q, struct request *rq,
struct bio *bio, gfp_t gfp_mask)
{
struct elevator_queue *e = q->elevator;
if (WARN_ON_ONCE(e->uses_mq))
return 0;
if (e->type->ops.sq.elevator_set_req_fn)
return e->type->ops.sq.elevator_set_req_fn(q, rq, bio, gfp_mask);
return 0;
}
void elv_put_request(struct request_queue *q, struct request *rq)
{
struct elevator_queue *e = q->elevator;
if (WARN_ON_ONCE(e->uses_mq))
return;
if (e->type->ops.sq.elevator_put_req_fn)
e->type->ops.sq.elevator_put_req_fn(rq);
}
int elv_may_queue(struct request_queue *q, unsigned int op)
{
struct elevator_queue *e = q->elevator;
if (WARN_ON_ONCE(e->uses_mq))
return 0;
if (e->type->ops.sq.elevator_may_queue_fn)
return e->type->ops.sq.elevator_may_queue_fn(q, op);
return ELV_MQUEUE_MAY;
}
void elv_completed_request(struct request_queue *q, struct request *rq)
{
struct elevator_queue *e = q->elevator;
if (WARN_ON_ONCE(e->uses_mq))
return;
/*
* request is released from the driver, io must be done
*/
if (blk_account_rq(rq)) {
q->in_flight[rq_is_sync(rq)]--;
if ((rq->rq_flags & RQF_SORTED) &&
e->type->ops.sq.elevator_completed_req_fn)
e->type->ops.sq.elevator_completed_req_fn(q, rq);
}
}
#define to_elv(atr) container_of((atr), struct elv_fs_entry, attr)
static ssize_t
@ -832,8 +488,6 @@ int elv_register_queue(struct request_queue *q)
}
kobject_uevent(&e->kobj, KOBJ_ADD);
e->registered = 1;
if (!e->uses_mq && e->type->ops.sq.elevator_registered_fn)
e->type->ops.sq.elevator_registered_fn(q);
}
return error;
}
@ -873,7 +527,7 @@ int elv_register(struct elevator_type *e)
/* register, don't allow duplicate names */
spin_lock(&elv_list_lock);
if (elevator_find(e->elevator_name, e->uses_mq)) {
if (elevator_find(e->elevator_name)) {
spin_unlock(&elv_list_lock);
kmem_cache_destroy(e->icq_cache);
return -EBUSY;
@ -881,12 +535,6 @@ int elv_register(struct elevator_type *e)
list_add_tail(&e->list, &elv_list);
spin_unlock(&elv_list_lock);
/* print pretty message */
if (elevator_match(e, chosen_elevator) ||
(!*chosen_elevator &&
elevator_match(e, CONFIG_DEFAULT_IOSCHED)))
def = " (default)";
printk(KERN_INFO "io scheduler %s registered%s\n", e->elevator_name,
def);
return 0;
@ -989,71 +637,17 @@ out_unlock:
*/
static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
{
struct elevator_queue *old = q->elevator;
bool old_registered = false;
int err;
lockdep_assert_held(&q->sysfs_lock);
if (q->mq_ops) {
blk_mq_freeze_queue(q);
blk_mq_quiesce_queue(q);
blk_mq_freeze_queue(q);
blk_mq_quiesce_queue(q);
err = elevator_switch_mq(q, new_e);
err = elevator_switch_mq(q, new_e);
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
return err;
}
/*
* Turn on BYPASS and drain all requests w/ elevator private data.
* Block layer doesn't call into a quiesced elevator - all requests
* are directly put on the dispatch list without elevator data
* using INSERT_BACK. All requests have SOFTBARRIER set and no
* merge happens either.
*/
if (old) {
old_registered = old->registered;
blk_queue_bypass_start(q);
/* unregister and clear all auxiliary data of the old elevator */
if (old_registered)
elv_unregister_queue(q);
ioc_clear_queue(q);
}
/* allocate, init and register new elevator */
err = new_e->ops.sq.elevator_init_fn(q, new_e);
if (err)
goto fail_init;
err = elv_register_queue(q);
if (err)
goto fail_register;
/* done, kill the old one and finish */
if (old) {
elevator_exit(q, old);
blk_queue_bypass_end(q);
}
blk_add_trace_msg(q, "elv switch: %s", new_e->elevator_name);
return 0;
fail_register:
elevator_exit(q, q->elevator);
fail_init:
/* switch failed, restore and re-register old elevator */
if (old) {
q->elevator = old;
elv_register_queue(q);
blk_queue_bypass_end(q);
}
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
return err;
}
@ -1073,7 +667,7 @@ static int __elevator_change(struct request_queue *q, const char *name)
/*
* Special case for mq, turn off scheduling
*/
if (q->mq_ops && !strncmp(name, "none", 4))
if (!strncmp(name, "none", 4))
return elevator_switch(q, NULL);
strlcpy(elevator_name, name, sizeof(elevator_name));
@ -1091,8 +685,7 @@ static int __elevator_change(struct request_queue *q, const char *name)
static inline bool elv_support_iosched(struct request_queue *q)
{
if (q->mq_ops && q->tag_set && (q->tag_set->flags &
BLK_MQ_F_NO_SCHED))
if (q->tag_set && (q->tag_set->flags & BLK_MQ_F_NO_SCHED))
return false;
return true;
}
@ -1102,7 +695,7 @@ ssize_t elv_iosched_store(struct request_queue *q, const char *name,
{
int ret;
if (!(q->mq_ops || q->request_fn) || !elv_support_iosched(q))
if (!queue_is_mq(q) || !elv_support_iosched(q))
return count;
ret = __elevator_change(q, name);
@ -1117,10 +710,9 @@ ssize_t elv_iosched_show(struct request_queue *q, char *name)
struct elevator_queue *e = q->elevator;
struct elevator_type *elv = NULL;
struct elevator_type *__e;
bool uses_mq = q->mq_ops != NULL;
int len = 0;
if (!queue_is_rq_based(q))
if (!queue_is_mq(q))
return sprintf(name, "none\n");
if (!q->elevator)
@ -1130,19 +722,16 @@ ssize_t elv_iosched_show(struct request_queue *q, char *name)
spin_lock(&elv_list_lock);
list_for_each_entry(__e, &elv_list, list) {
if (elv && elevator_match(elv, __e->elevator_name) &&
(__e->uses_mq == uses_mq)) {
if (elv && elevator_match(elv, __e->elevator_name)) {
len += sprintf(name+len, "[%s] ", elv->elevator_name);
continue;
}
if (__e->uses_mq && q->mq_ops && elv_support_iosched(q))
len += sprintf(name+len, "%s ", __e->elevator_name);
else if (!__e->uses_mq && !q->mq_ops)
if (elv_support_iosched(q))
len += sprintf(name+len, "%s ", __e->elevator_name);
}
spin_unlock(&elv_list_lock);
if (q->mq_ops && q->elevator)
if (q->elevator)
len += sprintf(name+len, "none");
len += sprintf(len+name, "\n");


@ -47,51 +47,64 @@ static void disk_release_events(struct gendisk *disk);
void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
{
if (q->mq_ops)
if (queue_is_mq(q))
return;
atomic_inc(&part->in_flight[rw]);
part_stat_local_inc(part, in_flight[rw]);
if (part->partno)
atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
part_stat_local_inc(&part_to_disk(part)->part0, in_flight[rw]);
}
void part_dec_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
{
if (q->mq_ops)
if (queue_is_mq(q))
return;
atomic_dec(&part->in_flight[rw]);
part_stat_local_dec(part, in_flight[rw]);
if (part->partno)
atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
part_stat_local_dec(&part_to_disk(part)->part0, in_flight[rw]);
}
void part_in_flight(struct request_queue *q, struct hd_struct *part,
unsigned int inflight[2])
unsigned int part_in_flight(struct request_queue *q, struct hd_struct *part)
{
if (q->mq_ops) {
blk_mq_in_flight(q, part, inflight);
return;
int cpu;
unsigned int inflight;
if (queue_is_mq(q)) {
return blk_mq_in_flight(q, part);
}
inflight[0] = atomic_read(&part->in_flight[0]) +
atomic_read(&part->in_flight[1]);
if (part->partno) {
part = &part_to_disk(part)->part0;
inflight[1] = atomic_read(&part->in_flight[0]) +
atomic_read(&part->in_flight[1]);
inflight = 0;
for_each_possible_cpu(cpu) {
inflight += part_stat_local_read_cpu(part, in_flight[0], cpu) +
part_stat_local_read_cpu(part, in_flight[1], cpu);
}
if ((int)inflight < 0)
inflight = 0;
return inflight;
}
void part_in_flight_rw(struct request_queue *q, struct hd_struct *part,
unsigned int inflight[2])
{
if (q->mq_ops) {
int cpu;
if (queue_is_mq(q)) {
blk_mq_in_flight_rw(q, part, inflight);
return;
}
inflight[0] = atomic_read(&part->in_flight[0]);
inflight[1] = atomic_read(&part->in_flight[1]);
inflight[0] = 0;
inflight[1] = 0;
for_each_possible_cpu(cpu) {
inflight[0] += part_stat_local_read_cpu(part, in_flight[0], cpu);
inflight[1] += part_stat_local_read_cpu(part, in_flight[1], cpu);
}
if ((int)inflight[0] < 0)
inflight[0] = 0;
if ((int)inflight[1] < 0)
inflight[1] = 0;
}
struct hd_struct *__disk_get_part(struct gendisk *disk, int partno)
@ -1325,8 +1338,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
struct disk_part_iter piter;
struct hd_struct *hd;
char buf[BDEVNAME_SIZE];
unsigned int inflight[2];
int cpu;
unsigned int inflight;
/*
if (&disk_to_dev(gp)->kobj.entry == block_class.devices.next)
@ -1338,10 +1350,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
disk_part_iter_init(&piter, gp, DISK_PITER_INCL_EMPTY_PART0);
while ((hd = disk_part_iter_next(&piter))) {
cpu = part_stat_lock();
part_round_stats(gp->queue, cpu, hd);
part_stat_unlock();
part_in_flight(gp->queue, hd, inflight);
inflight = part_in_flight(gp->queue, hd);
seq_printf(seqf, "%4d %7d %s "
"%lu %lu %lu %u "
"%lu %lu %lu %u "
@ -1357,7 +1366,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
part_stat_read(hd, merges[STAT_WRITE]),
part_stat_read(hd, sectors[STAT_WRITE]),
(unsigned int)part_stat_read_msecs(hd, STAT_WRITE),
inflight[0],
inflight,
jiffies_to_msecs(part_stat_read(hd, io_ticks)),
jiffies_to_msecs(part_stat_read(hd, time_in_queue)),
part_stat_read(hd, ios[STAT_DISCARD]),


@ -195,7 +195,7 @@ struct kyber_hctx_data {
unsigned int batching;
struct kyber_ctx_queue *kcqs;
struct sbitmap kcq_map[KYBER_NUM_DOMAINS];
wait_queue_entry_t domain_wait[KYBER_NUM_DOMAINS];
struct sbq_wait domain_wait[KYBER_NUM_DOMAINS];
struct sbq_wait_state *domain_ws[KYBER_NUM_DOMAINS];
atomic_t wait_index[KYBER_NUM_DOMAINS];
};
@ -501,10 +501,11 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
INIT_LIST_HEAD(&khd->rqs[i]);
init_waitqueue_func_entry(&khd->domain_wait[i],
khd->domain_wait[i].sbq = NULL;
init_waitqueue_func_entry(&khd->domain_wait[i].wait,
kyber_domain_wake);
khd->domain_wait[i].private = hctx;
INIT_LIST_HEAD(&khd->domain_wait[i].entry);
khd->domain_wait[i].wait.private = hctx;
INIT_LIST_HEAD(&khd->domain_wait[i].wait.entry);
atomic_set(&khd->wait_index[i], 0);
}
@ -576,7 +577,7 @@ static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
{
struct kyber_hctx_data *khd = hctx->sched_data;
struct blk_mq_ctx *ctx = blk_mq_get_ctx(hctx->queue);
struct kyber_ctx_queue *kcq = &khd->kcqs[ctx->index_hw];
struct kyber_ctx_queue *kcq = &khd->kcqs[ctx->index_hw[hctx->type]];
unsigned int sched_domain = kyber_sched_domain(bio->bi_opf);
struct list_head *rq_list = &kcq->rq_list[sched_domain];
bool merged;
@ -602,7 +603,7 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx *hctx,
list_for_each_entry_safe(rq, next, rq_list, queuelist) {
unsigned int sched_domain = kyber_sched_domain(rq->cmd_flags);
struct kyber_ctx_queue *kcq = &khd->kcqs[rq->mq_ctx->index_hw];
struct kyber_ctx_queue *kcq = &khd->kcqs[rq->mq_ctx->index_hw[hctx->type]];
struct list_head *head = &kcq->rq_list[sched_domain];
spin_lock(&kcq->lock);
@ -611,7 +612,7 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx *hctx,
else
list_move_tail(&rq->queuelist, head);
sbitmap_set_bit(&khd->kcq_map[sched_domain],
rq->mq_ctx->index_hw);
rq->mq_ctx->index_hw[hctx->type]);
blk_mq_sched_request_inserted(rq);
spin_unlock(&kcq->lock);
}
@ -698,12 +699,13 @@ static void kyber_flush_busy_kcqs(struct kyber_hctx_data *khd,
flush_busy_kcq, &data);
}
static int kyber_domain_wake(wait_queue_entry_t *wait, unsigned mode, int flags,
static int kyber_domain_wake(wait_queue_entry_t *wqe, unsigned mode, int flags,
void *key)
{
struct blk_mq_hw_ctx *hctx = READ_ONCE(wait->private);
struct blk_mq_hw_ctx *hctx = READ_ONCE(wqe->private);
struct sbq_wait *wait = container_of(wqe, struct sbq_wait, wait);
list_del_init(&wait->entry);
sbitmap_del_wait_queue(wait);
blk_mq_run_hw_queue(hctx, true);
return 1;
}
@ -714,7 +716,7 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
{
unsigned int sched_domain = khd->cur_domain;
struct sbitmap_queue *domain_tokens = &kqd->domain_tokens[sched_domain];
wait_queue_entry_t *wait = &khd->domain_wait[sched_domain];
struct sbq_wait *wait = &khd->domain_wait[sched_domain];
struct sbq_wait_state *ws;
int nr;
@ -725,11 +727,11 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
* run when one becomes available. Note that this is serialized on
* khd->lock, but we still need to be careful about the waker.
*/
if (nr < 0 && list_empty_careful(&wait->entry)) {
if (nr < 0 && list_empty_careful(&wait->wait.entry)) {
ws = sbq_wait_ptr(domain_tokens,
&khd->wait_index[sched_domain]);
khd->domain_ws[sched_domain] = ws;
add_wait_queue(&ws->wait, wait);
sbitmap_add_wait_queue(domain_tokens, ws, wait);
/*
* Try again in case a token was freed before we got on the wait
@ -745,10 +747,10 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
* between the !list_empty_careful() check and us grabbing the lock, but
* list_del_init() is okay with that.
*/
if (nr >= 0 && !list_empty_careful(&wait->entry)) {
if (nr >= 0 && !list_empty_careful(&wait->wait.entry)) {
ws = khd->domain_ws[sched_domain];
spin_lock_irq(&ws->wait.lock);
list_del_init(&wait->entry);
sbitmap_del_wait_queue(wait);
spin_unlock_irq(&ws->wait.lock);
}
@ -951,7 +953,7 @@ static int kyber_##name##_waiting_show(void *data, struct seq_file *m) \
{ \
struct blk_mq_hw_ctx *hctx = data; \
struct kyber_hctx_data *khd = hctx->sched_data; \
wait_queue_entry_t *wait = &khd->domain_wait[domain]; \
wait_queue_entry_t *wait = &khd->domain_wait[domain].wait; \
\
seq_printf(m, "%d\n", !list_empty_careful(&wait->entry)); \
return 0; \
@ -1017,7 +1019,7 @@ static const struct blk_mq_debugfs_attr kyber_hctx_debugfs_attrs[] = {
#endif
static struct elevator_type kyber_sched = {
.ops.mq = {
.ops = {
.init_sched = kyber_init_sched,
.exit_sched = kyber_exit_sched,
.init_hctx = kyber_init_hctx,
@ -1032,7 +1034,6 @@ static struct elevator_type kyber_sched = {
.dispatch_request = kyber_dispatch_request,
.has_work = kyber_has_work,
},
.uses_mq = true,
#ifdef CONFIG_BLK_DEBUG_FS
.queue_debugfs_attrs = kyber_queue_debugfs_attrs,
.hctx_debugfs_attrs = kyber_hctx_debugfs_attrs,


@ -373,9 +373,16 @@ done:
/*
* One confusing aspect here is that we get called for a specific
* hardware queue, but we return a request that may not be for a
* hardware queue, but we may return a request that is for a
* different hardware queue. This is because mq-deadline has shared
* state for all hardware queues, in terms of sorting, FIFOs, etc.
*
* For a zoned block device, __dd_dispatch_request() may return NULL
* if all the queued write requests are directed at zones that are already
* locked due to on-going write requests. In this case, make sure to mark
* the queue as needing a restart to ensure that the queue is run again
* and the pending writes dispatched once the target zones for the ongoing
* write requests are unlocked in dd_finish_request().
*/
static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
{
@ -384,6 +391,9 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
spin_lock(&dd->lock);
rq = __dd_dispatch_request(dd);
if (!rq && blk_queue_is_zoned(hctx->queue) &&
!list_empty(&dd->fifo_list[WRITE]))
blk_mq_sched_mark_restart_hctx(hctx);
spin_unlock(&dd->lock);
return rq;
@ -761,7 +771,7 @@ static const struct blk_mq_debugfs_attr deadline_queue_debugfs_attrs[] = {
#endif
static struct elevator_type mq_deadline = {
.ops.mq = {
.ops = {
.insert_requests = dd_insert_requests,
.dispatch_request = dd_dispatch_request,
.prepare_request = dd_prepare_request,
@ -777,7 +787,6 @@ static struct elevator_type mq_deadline = {
.exit_sched = dd_exit_queue,
},
.uses_mq = true,
#ifdef CONFIG_BLK_DEBUG_FS
.queue_debugfs_attrs = deadline_queue_debugfs_attrs,
#endif


@ -1,124 +0,0 @@
/*
* elevator noop
*/
#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/bio.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/init.h>
struct noop_data {
struct list_head queue;
};
static void noop_merged_requests(struct request_queue *q, struct request *rq,
struct request *next)
{
list_del_init(&next->queuelist);
}
static int noop_dispatch(struct request_queue *q, int force)
{
struct noop_data *nd = q->elevator->elevator_data;
struct request *rq;
rq = list_first_entry_or_null(&nd->queue, struct request, queuelist);
if (rq) {
list_del_init(&rq->queuelist);
elv_dispatch_sort(q, rq);
return 1;
}
return 0;
}
static void noop_add_request(struct request_queue *q, struct request *rq)
{
struct noop_data *nd = q->elevator->elevator_data;
list_add_tail(&rq->queuelist, &nd->queue);
}
static struct request *
noop_former_request(struct request_queue *q, struct request *rq)
{
struct noop_data *nd = q->elevator->elevator_data;
if (rq->queuelist.prev == &nd->queue)
return NULL;
return list_prev_entry(rq, queuelist);
}
static struct request *
noop_latter_request(struct request_queue *q, struct request *rq)
{
struct noop_data *nd = q->elevator->elevator_data;
if (rq->queuelist.next == &nd->queue)
return NULL;
return list_next_entry(rq, queuelist);
}
static int noop_init_queue(struct request_queue *q, struct elevator_type *e)
{
struct noop_data *nd;
struct elevator_queue *eq;
eq = elevator_alloc(q, e);
if (!eq)
return -ENOMEM;
nd = kmalloc_node(sizeof(*nd), GFP_KERNEL, q->node);
if (!nd) {
kobject_put(&eq->kobj);
return -ENOMEM;
}
eq->elevator_data = nd;
INIT_LIST_HEAD(&nd->queue);
spin_lock_irq(q->queue_lock);
q->elevator = eq;
spin_unlock_irq(q->queue_lock);
return 0;
}
static void noop_exit_queue(struct elevator_queue *e)
{
struct noop_data *nd = e->elevator_data;
BUG_ON(!list_empty(&nd->queue));
kfree(nd);
}
static struct elevator_type elevator_noop = {
.ops.sq = {
.elevator_merge_req_fn = noop_merged_requests,
.elevator_dispatch_fn = noop_dispatch,
.elevator_add_req_fn = noop_add_request,
.elevator_former_req_fn = noop_former_request,
.elevator_latter_req_fn = noop_latter_request,
.elevator_init_fn = noop_init_queue,
.elevator_exit_fn = noop_exit_queue,
},
.elevator_name = "noop",
.elevator_owner = THIS_MODULE,
};
static int __init noop_init(void)
{
return elv_register(&elevator_noop);
}
static void __exit noop_exit(void)
{
elv_unregister(&elevator_noop);
}
module_init(noop_init);
module_exit(noop_exit);
MODULE_AUTHOR("Jens Axboe");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("No-op IO scheduler");


@ -120,13 +120,9 @@ ssize_t part_stat_show(struct device *dev,
{
struct hd_struct *p = dev_to_part(dev);
struct request_queue *q = part_to_disk(p)->queue;
unsigned int inflight[2];
int cpu;
unsigned int inflight;
cpu = part_stat_lock();
part_round_stats(q, cpu, p);
part_stat_unlock();
part_in_flight(q, p, inflight);
inflight = part_in_flight(q, p);
return sprintf(buf,
"%8lu %8lu %8llu %8u "
"%8lu %8lu %8llu %8u "
@ -141,7 +137,7 @@ ssize_t part_stat_show(struct device *dev,
part_stat_read(p, merges[STAT_WRITE]),
(unsigned long long)part_stat_read(p, sectors[STAT_WRITE]),
(unsigned int)part_stat_read_msecs(p, STAT_WRITE),
inflight[0],
inflight,
jiffies_to_msecs(part_stat_read(p, io_ticks)),
jiffies_to_msecs(part_stat_read(p, time_in_queue)),
part_stat_read(p, ios[STAT_DISCARD]),
@ -249,9 +245,10 @@ struct device_type part_type = {
.uevent = part_uevent,
};
static void delete_partition_rcu_cb(struct rcu_head *head)
static void delete_partition_work_fn(struct work_struct *work)
{
struct hd_struct *part = container_of(head, struct hd_struct, rcu_head);
struct hd_struct *part = container_of(to_rcu_work(work), struct hd_struct,
rcu_work);
part->start_sect = 0;
part->nr_sects = 0;
@ -262,7 +259,8 @@ static void delete_partition_rcu_cb(struct rcu_head *head)
void __delete_partition(struct percpu_ref *ref)
{
struct hd_struct *part = container_of(ref, struct hd_struct, ref);
call_rcu(&part->rcu_head, delete_partition_rcu_cb);
INIT_RCU_WORK(&part->rcu_work, delete_partition_work_fn);
queue_rcu_work(system_wq, &part->rcu_work);
}
/*


@ -919,8 +919,6 @@ static void ata_eh_set_pending(struct ata_port *ap, int fastdrain)
void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
{
struct ata_port *ap = qc->ap;
struct request_queue *q = qc->scsicmd->device->request_queue;
unsigned long flags;
WARN_ON(!ap->ops->error_handler);
@ -932,9 +930,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
* Note that ATA_QCFLAG_FAILED is unconditionally set after
* this function completes.
*/
spin_lock_irqsave(q->queue_lock, flags);
blk_abort_request(qc->scsicmd->request);
spin_unlock_irqrestore(q->queue_lock, flags);
}
/**


@ -100,6 +100,10 @@ enum {
MAX_TAINT = 1000, /* cap on aoetgt taint */
};
struct aoe_req {
unsigned long nr_bios;
};
struct buf {
ulong nframesout;
struct bio *bio;


@ -387,6 +387,7 @@ aoeblk_gdalloc(void *vp)
set = &d->tag_set;
set->ops = &aoeblk_mq_ops;
set->cmd_size = sizeof(struct aoe_req);
set->nr_hw_queues = 1;
set->queue_depth = 128;
set->numa_node = NUMA_NO_NODE;


@ -822,17 +822,6 @@ out:
spin_unlock_irqrestore(&d->lock, flags);
}
static unsigned long
rqbiocnt(struct request *r)
{
struct bio *bio;
unsigned long n = 0;
__rq_for_each_bio(bio, r)
n++;
return n;
}
static void
bufinit(struct buf *buf, struct request *rq, struct bio *bio)
{
@ -847,6 +836,7 @@ nextbuf(struct aoedev *d)
{
struct request *rq;
struct request_queue *q;
struct aoe_req *req;
struct buf *buf;
struct bio *bio;
@ -865,7 +855,11 @@ nextbuf(struct aoedev *d)
blk_mq_start_request(rq);
d->ip.rq = rq;
d->ip.nxbio = rq->bio;
rq->special = (void *) rqbiocnt(rq);
req = blk_mq_rq_to_pdu(rq);
req->nr_bios = 0;
__rq_for_each_bio(bio, rq)
req->nr_bios++;
}
buf = mempool_alloc(d->bufpool, GFP_ATOMIC);
if (buf == NULL) {
@ -1069,16 +1063,13 @@ aoe_end_request(struct aoedev *d, struct request *rq, int fastfail)
static void
aoe_end_buf(struct aoedev *d, struct buf *buf)
{
struct request *rq;
unsigned long n;
struct request *rq = buf->rq;
struct aoe_req *req = blk_mq_rq_to_pdu(rq);
if (buf == d->ip.buf)
d->ip.buf = NULL;
rq = buf->rq;
mempool_free(buf, d->bufpool);
n = (unsigned long) rq->special;
rq->special = (void *) --n;
if (n == 0)
if (--req->nr_bios == 0)
aoe_end_request(d, rq, 0);
}


@ -160,21 +160,22 @@ static void
aoe_failip(struct aoedev *d)
{
struct request *rq;
struct aoe_req *req;
struct bio *bio;
unsigned long n;
aoe_failbuf(d, d->ip.buf);
rq = d->ip.rq;
if (rq == NULL)
return;
req = blk_mq_rq_to_pdu(rq);
while ((bio = d->ip.nxbio)) {
bio->bi_status = BLK_STS_IOERR;
d->ip.nxbio = bio->bi_next;
n = (unsigned long) rq->special;
rq->special = (void *) --n;
req->nr_bios--;
}
if ((unsigned long) rq->special == 0)
if (!req->nr_bios)
aoe_end_request(d, rq, 0);
}
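
The aoe changes above drop the old rqbiocnt()/rq->special bookkeeping and keep the outstanding-bio count in a per-request struct aoe_req, carved out by the tag set's cmd_size and reached through blk_mq_rq_to_pdu(). A minimal sketch of that pattern with hypothetical my_* names; only blk_mq_rq_to_pdu(), blk_mq_alloc_tag_set() and the tag-set fields are the real API used in the hunks above.

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

struct my_cmd {
	unsigned long nr_bios;		/* driver-private data behind the request */
};

static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
				const struct blk_mq_queue_data *bd)
{
	struct my_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);	/* replaces rq->special */
	struct bio *bio;

	blk_mq_start_request(bd->rq);
	cmd->nr_bios = 0;
	__rq_for_each_bio(bio, bd->rq)
		cmd->nr_bios++;
	/* issue the bios; the count is dropped as each one completes */
	return BLK_STS_OK;
}

static const struct blk_mq_ops my_mq_ops = {
	.queue_rq	= my_queue_rq,
};

static int my_init_tag_set(struct blk_mq_tag_set *set)
{
	set->ops = &my_mq_ops;
	set->nr_hw_queues = 1;
	set->queue_depth = 128;
	set->numa_node = NUMA_NO_NODE;
	set->cmd_size = sizeof(struct my_cmd);	/* reserves the payload per request */
	set->flags = BLK_MQ_F_SHOULD_MERGE;
	return blk_mq_alloc_tag_set(set);
}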


@ -24,7 +24,7 @@ static void discover_timer(struct timer_list *t)
aoecmd_cfg(0xffff, 0xff);
}
static void
static void __exit
aoe_exit(void)
{
del_timer_sync(&timer);


@ -1471,6 +1471,15 @@ static void setup_req_params( int drive )
ReqTrack, ReqSector, (unsigned long)ReqData ));
}
static void ataflop_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
spin_lock_irq(&ataflop_lock);
atari_disable_irq(IRQ_MFP_FDC);
finish_fdc();
atari_enable_irq(IRQ_MFP_FDC);
spin_unlock_irq(&ataflop_lock);
}
static blk_status_t ataflop_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
@ -1947,6 +1956,7 @@ static const struct block_device_operations floppy_fops = {
static const struct blk_mq_ops ataflop_mq_ops = {
.queue_rq = ataflop_queue_rq,
.commit_rqs = ataflop_commit_rqs,
};
static struct kobject *floppy_find(dev_t dev, int *part, void *data)
@ -1982,6 +1992,7 @@ static int __init atari_floppy_init (void)
&ataflop_mq_ops, 2,
BLK_MQ_F_SHOULD_MERGE);
if (IS_ERR(unit[i].disk->queue)) {
put_disk(unit[i].disk);
ret = PTR_ERR(unit[i].disk->queue);
unit[i].disk->queue = NULL;
goto err;
@ -2033,18 +2044,13 @@ static int __init atari_floppy_init (void)
return 0;
err:
do {
while (--i >= 0) {
struct gendisk *disk = unit[i].disk;
if (disk) {
if (disk->queue) {
blk_cleanup_queue(disk->queue);
disk->queue = NULL;
}
blk_mq_free_tag_set(&unit[i].tag_set);
put_disk(unit[i].disk);
}
} while (i--);
blk_cleanup_queue(disk->queue);
blk_mq_free_tag_set(&unit[i].tag_set);
put_disk(unit[i].disk);
}
unregister_blkdev(FLOPPY_MAJOR, "fd");
return ret;
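
The ataflop hunks above hook up ->commit_rqs, one of the new blk-mq ops from this series. blk-mq flags the final request of a dispatch batch with bd->last; a driver that defers kicking its hardware until it sees that flag gets ->commit_rqs called if a batch ends without the flagged request reaching ->queue_rq (for instance when dispatch stops early on a resource error), so queued work is not left stranded. A hedged sketch with hypothetical my_* helpers, not the ataflop code itself:

#include <linux/blk-mq.h>

struct my_dev;					/* hypothetical driver state */
static void my_ring_add(struct my_dev *md, struct request *rq);	/* enqueue in a sw ring */
static void my_ring_kick(struct my_dev *md);			/* ring the doorbell */

static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
				const struct blk_mq_queue_data *bd)
{
	struct my_dev *md = hctx->queue->queuedata;

	blk_mq_start_request(bd->rq);
	my_ring_add(md, bd->rq);	/* queue in software, don't touch hw yet */
	if (bd->last)
		my_ring_kick(md);	/* start the whole batch in one go */
	return BLK_STS_OK;
}

static void my_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
	/* the batch ended without bd->last being seen: kick the hardware now */
	my_ring_kick(hctx->queue->queuedata);
}

static const struct blk_mq_ops my_commit_mq_ops = {
	.queue_rq	= my_queue_rq,
	.commit_rqs	= my_commit_rqs,
};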


@ -2792,7 +2792,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
drbd_init_set_defaults(device);
q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE, &resource->req_lock);
q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
if (!q)
goto out_no_q;
device->rq_queue = q;


@ -2231,7 +2231,6 @@ static void request_done(int uptodate)
{
struct request *req = current_req;
struct request_queue *q;
unsigned long flags;
int block;
char msg[sizeof("request done ") + sizeof(int) * 3];
@ -2254,10 +2253,7 @@ static void request_done(int uptodate)
if (block > _floppy->sect)
DRS->maxtrack = 1;
/* unlock chained buffers */
spin_lock_irqsave(q->queue_lock, flags);
floppy_end_request(req, 0);
spin_unlock_irqrestore(q->queue_lock, flags);
} else {
if (rq_data_dir(req) == WRITE) {
/* record write error information */
@ -2269,9 +2265,7 @@ static void request_done(int uptodate)
DRWE->last_error_sector = blk_rq_pos(req);
DRWE->last_error_generation = DRS->generation;
}
spin_lock_irqsave(q->queue_lock, flags);
floppy_end_request(req, BLK_STS_IOERR);
spin_unlock_irqrestore(q->queue_lock, flags);
}
}


@ -77,13 +77,14 @@
#include <linux/falloc.h>
#include <linux/uio.h>
#include <linux/ioprio.h>
#include <linux/blk-cgroup.h>
#include "loop.h"
#include <linux/uaccess.h>
static DEFINE_IDR(loop_index_idr);
static DEFINE_MUTEX(loop_index_mutex);
static DEFINE_MUTEX(loop_ctl_mutex);
static int max_part;
static int part_shift;
@ -630,18 +631,7 @@ static void loop_reread_partitions(struct loop_device *lo,
{
int rc;
/*
* bd_mutex has been held already in release path, so don't
* acquire it if this function is called in such case.
*
* If the reread partition isn't from release path, lo_refcnt
* must be at least one and it can only become zero when the
* current holder is released.
*/
if (!atomic_read(&lo->lo_refcnt))
rc = __blkdev_reread_part(bdev);
else
rc = blkdev_reread_part(bdev);
rc = blkdev_reread_part(bdev);
if (rc)
pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n",
__func__, lo->lo_number, lo->lo_file_name, rc);
@ -688,26 +678,30 @@ static int loop_validate_file(struct file *file, struct block_device *bdev)
static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
unsigned int arg)
{
struct file *file, *old_file;
struct file *file = NULL, *old_file;
int error;
bool partscan;
error = mutex_lock_killable(&loop_ctl_mutex);
if (error)
return error;
error = -ENXIO;
if (lo->lo_state != Lo_bound)
goto out;
goto out_err;
/* the loop device has to be read-only */
error = -EINVAL;
if (!(lo->lo_flags & LO_FLAGS_READ_ONLY))
goto out;
goto out_err;
error = -EBADF;
file = fget(arg);
if (!file)
goto out;
goto out_err;
error = loop_validate_file(file, bdev);
if (error)
goto out_putf;
goto out_err;
old_file = lo->lo_backing_file;
@ -715,7 +709,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
/* size of the new backing store needs to be the same */
if (get_loop_size(lo, file) != get_loop_size(lo, old_file))
goto out_putf;
goto out_err;
/* and ... switch */
blk_mq_freeze_queue(lo->lo_queue);
@ -726,15 +720,22 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
loop_update_dio(lo);
blk_mq_unfreeze_queue(lo->lo_queue);
partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
mutex_unlock(&loop_ctl_mutex);
/*
* We must drop file reference outside of loop_ctl_mutex as dropping
* the file ref can take bd_mutex which creates circular locking
* dependency.
*/
fput(old_file);
if (lo->lo_flags & LO_FLAGS_PARTSCAN)
if (partscan)
loop_reread_partitions(lo, bdev);
return 0;
out_putf:
fput(file);
out:
out_err:
mutex_unlock(&loop_ctl_mutex);
if (file)
fput(file);
return error;
}
@ -909,6 +910,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
int lo_flags = 0;
int error;
loff_t size;
bool partscan;
/* This is safe, since we have a reference from open(). */
__module_get(THIS_MODULE);
@ -918,13 +920,17 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
if (!file)
goto out;
error = mutex_lock_killable(&loop_ctl_mutex);
if (error)
goto out_putf;
error = -EBUSY;
if (lo->lo_state != Lo_unbound)
goto out_putf;
goto out_unlock;
error = loop_validate_file(file, bdev);
if (error)
goto out_putf;
goto out_unlock;
mapping = file->f_mapping;
inode = mapping->host;
@ -936,10 +942,10 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
error = -EFBIG;
size = get_loop_size(lo, file);
if ((loff_t)(sector_t)size != size)
goto out_putf;
goto out_unlock;
error = loop_prepare_queue(lo);
if (error)
goto out_putf;
goto out_unlock;
error = 0;
@ -971,18 +977,22 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
lo->lo_state = Lo_bound;
if (part_shift)
lo->lo_flags |= LO_FLAGS_PARTSCAN;
if (lo->lo_flags & LO_FLAGS_PARTSCAN)
loop_reread_partitions(lo, bdev);
partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
/* Grab the block_device to prevent its destruction after we
* put /dev/loopXX inode. Later in loop_clr_fd() we bdput(bdev).
* put /dev/loopXX inode. Later in __loop_clr_fd() we bdput(bdev).
*/
bdgrab(bdev);
mutex_unlock(&loop_ctl_mutex);
if (partscan)
loop_reread_partitions(lo, bdev);
return 0;
out_putf:
out_unlock:
mutex_unlock(&loop_ctl_mutex);
out_putf:
fput(file);
out:
out:
/* This is safe: open() is still holding a reference. */
module_put(THIS_MODULE);
return error;
@ -1025,39 +1035,31 @@ loop_init_xfer(struct loop_device *lo, struct loop_func_table *xfer,
return err;
}
static int loop_clr_fd(struct loop_device *lo)
static int __loop_clr_fd(struct loop_device *lo, bool release)
{
struct file *filp = lo->lo_backing_file;
struct file *filp = NULL;
gfp_t gfp = lo->old_gfp_mask;
struct block_device *bdev = lo->lo_device;
int err = 0;
bool partscan = false;
int lo_number;
if (lo->lo_state != Lo_bound)
return -ENXIO;
/*
* If we've explicitly asked to tear down the loop device,
* and it has an elevated reference count, set it for auto-teardown when
* the last reference goes away. This stops $!~#$@ udev from
* preventing teardown because it decided that it needs to run blkid on
* the loopback device whenever they appear. xfstests is notorious for
* failing tests because blkid via udev races with a losetup
* <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
* command to fail with EBUSY.
*/
if (atomic_read(&lo->lo_refcnt) > 1) {
lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
mutex_unlock(&lo->lo_ctl_mutex);
return 0;
mutex_lock(&loop_ctl_mutex);
if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
err = -ENXIO;
goto out_unlock;
}
if (filp == NULL)
return -EINVAL;
filp = lo->lo_backing_file;
if (filp == NULL) {
err = -EINVAL;
goto out_unlock;
}
/* freeze request queue during the transition */
blk_mq_freeze_queue(lo->lo_queue);
spin_lock_irq(&lo->lo_lock);
lo->lo_state = Lo_rundown;
lo->lo_backing_file = NULL;
spin_unlock_irq(&lo->lo_lock);
@ -1093,21 +1095,73 @@ static int loop_clr_fd(struct loop_device *lo)
module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
loop_reread_partitions(lo, bdev);
partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev;
lo_number = lo->lo_number;
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
loop_unprepare_queue(lo);
mutex_unlock(&lo->lo_ctl_mutex);
out_unlock:
mutex_unlock(&loop_ctl_mutex);
if (partscan) {
/*
* bd_mutex has been held already in release path, so don't
* acquire it if this function is called in such case.
*
* If the reread partition isn't from release path, lo_refcnt
* must be at least one and it can only become zero when the
* current holder is released.
*/
if (release)
err = __blkdev_reread_part(bdev);
else
err = blkdev_reread_part(bdev);
pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
__func__, lo_number, err);
/* Device is gone, no point in returning error */
err = 0;
}
/*
* Need not hold lo_ctl_mutex to fput backing file.
* Calling fput holding lo_ctl_mutex triggers a circular
* Need not hold loop_ctl_mutex to fput backing file.
* Calling fput holding loop_ctl_mutex triggers a circular
* lock dependency possibility warning as fput can take
* bd_mutex which is usually taken before lo_ctl_mutex.
* bd_mutex which is usually taken before loop_ctl_mutex.
*/
fput(filp);
return 0;
if (filp)
fput(filp);
return err;
}
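
The release argument exists because lo_release() reaches this code with bd_mutex already held, so the rundown path must use __blkdev_reread_part(), which expects the caller to own bd_mutex, while the ioctl path uses blkdev_reread_part(), which acquires it itself. A hedged sketch of that locked/unlocked pairing (the wrapper name is made up; the two reread helpers are the ones used above):

        /* illustrative wrapper around the choice made in __loop_clr_fd() */
        static int loop_rescan_parts(struct block_device *bdev, bool release)
        {
                if (release)
                        return __blkdev_reread_part(bdev); /* bd_mutex held */
                return blkdev_reread_part(bdev);           /* takes bd_mutex */
        }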
static int loop_clr_fd(struct loop_device *lo)
{
int err;
err = mutex_lock_killable(&loop_ctl_mutex);
if (err)
return err;
if (lo->lo_state != Lo_bound) {
mutex_unlock(&loop_ctl_mutex);
return -ENXIO;
}
/*
* If we've explicitly asked to tear down the loop device,
* and it has an elevated reference count, set it for auto-teardown when
* the last reference goes away. This stops $!~#$@ udev from
* preventing teardown because it decided that it needs to run blkid on
* the loopback device whenever they appear. xfstests is notorious for
* failing tests because blkid via udev races with a losetup
* <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
* command to fail with EBUSY.
*/
if (atomic_read(&lo->lo_refcnt) > 1) {
lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
mutex_unlock(&loop_ctl_mutex);
return 0;
}
lo->lo_state = Lo_rundown;
mutex_unlock(&loop_ctl_mutex);
return __loop_clr_fd(lo, false);
}
static int
@ -1116,47 +1170,58 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
int err;
struct loop_func_table *xfer;
kuid_t uid = current_uid();
struct block_device *bdev;
bool partscan = false;
err = mutex_lock_killable(&loop_ctl_mutex);
if (err)
return err;
if (lo->lo_encrypt_key_size &&
!uid_eq(lo->lo_key_owner, uid) &&
!capable(CAP_SYS_ADMIN))
return -EPERM;
if (lo->lo_state != Lo_bound)
return -ENXIO;
if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE)
return -EINVAL;
!capable(CAP_SYS_ADMIN)) {
err = -EPERM;
goto out_unlock;
}
if (lo->lo_state != Lo_bound) {
err = -ENXIO;
goto out_unlock;
}
if ((unsigned int) info->lo_encrypt_key_size > LO_KEY_SIZE) {
err = -EINVAL;
goto out_unlock;
}
/* I/O need to be drained during transfer transition */
blk_mq_freeze_queue(lo->lo_queue);
err = loop_release_xfer(lo);
if (err)
goto exit;
goto out_unfreeze;
if (info->lo_encrypt_type) {
unsigned int type = info->lo_encrypt_type;
if (type >= MAX_LO_CRYPT) {
err = -EINVAL;
goto exit;
goto out_unfreeze;
}
xfer = xfer_funcs[type];
if (xfer == NULL) {
err = -EINVAL;
goto exit;
goto out_unfreeze;
}
} else
xfer = NULL;
err = loop_init_xfer(lo, xfer, info);
if (err)
goto exit;
goto out_unfreeze;
if (lo->lo_offset != info->lo_offset ||
lo->lo_sizelimit != info->lo_sizelimit) {
if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit)) {
err = -EFBIG;
goto exit;
goto out_unfreeze;
}
}
@ -1188,15 +1253,20 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
/* update dio if lo_offset or transfer is changed */
__loop_update_dio(lo, lo->use_dio);
exit:
out_unfreeze:
blk_mq_unfreeze_queue(lo->lo_queue);
if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) &&
!(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
lo->lo_flags |= LO_FLAGS_PARTSCAN;
lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
loop_reread_partitions(lo, lo->lo_device);
bdev = lo->lo_device;
partscan = true;
}
out_unlock:
mutex_unlock(&loop_ctl_mutex);
if (partscan)
loop_reread_partitions(lo, bdev);
return err;
}
@ -1204,12 +1274,15 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
static int
loop_get_status(struct loop_device *lo, struct loop_info64 *info)
{
struct file *file;
struct path path;
struct kstat stat;
int ret;
ret = mutex_lock_killable(&loop_ctl_mutex);
if (ret)
return ret;
if (lo->lo_state != Lo_bound) {
mutex_unlock(&lo->lo_ctl_mutex);
mutex_unlock(&loop_ctl_mutex);
return -ENXIO;
}
@ -1228,17 +1301,17 @@ loop_get_status(struct loop_device *lo, struct loop_info64 *info)
lo->lo_encrypt_key_size);
}
/* Drop lo_ctl_mutex while we call into the filesystem. */
file = get_file(lo->lo_backing_file);
mutex_unlock(&lo->lo_ctl_mutex);
ret = vfs_getattr(&file->f_path, &stat, STATX_INO,
AT_STATX_SYNC_AS_STAT);
/* Drop loop_ctl_mutex while we call into the filesystem. */
path = lo->lo_backing_file->f_path;
path_get(&path);
mutex_unlock(&loop_ctl_mutex);
ret = vfs_getattr(&path, &stat, STATX_INO, AT_STATX_SYNC_AS_STAT);
if (!ret) {
info->lo_device = huge_encode_dev(stat.dev);
info->lo_inode = stat.ino;
info->lo_rdevice = huge_encode_dev(stat.rdev);
}
fput(file);
path_put(&path);
return ret;
}
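
loop_get_status() now pins just the struct path of the backing file, drops loop_ctl_mutex, and only then calls into the filesystem, so vfs_getattr() runs without the loop mutex held and without bumping the file reference count. A condensed fragment of the reference dance, assuming a bound device:

        struct path path = lo->lo_backing_file->f_path;
        struct kstat stat;
        int ret;

        path_get(&path);               /* keep dentry/mnt pinned */
        mutex_unlock(&loop_ctl_mutex); /* no loop lock across the fs call */
        ret = vfs_getattr(&path, &stat, STATX_INO, AT_STATX_SYNC_AS_STAT);
        path_put(&path);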
@ -1322,10 +1395,8 @@ loop_get_status_old(struct loop_device *lo, struct loop_info __user *arg) {
struct loop_info64 info64;
int err;
if (!arg) {
mutex_unlock(&lo->lo_ctl_mutex);
if (!arg)
return -EINVAL;
}
err = loop_get_status(lo, &info64);
if (!err)
err = loop_info64_to_old(&info64, &info);
@ -1340,10 +1411,8 @@ loop_get_status64(struct loop_device *lo, struct loop_info64 __user *arg) {
struct loop_info64 info64;
int err;
if (!arg) {
mutex_unlock(&lo->lo_ctl_mutex);
if (!arg)
return -EINVAL;
}
err = loop_get_status(lo, &info64);
if (!err && copy_to_user(arg, &info64, sizeof(info64)))
err = -EFAULT;
@ -1393,70 +1462,73 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
return 0;
}
static int lo_simple_ioctl(struct loop_device *lo, unsigned int cmd,
unsigned long arg)
{
int err;
err = mutex_lock_killable(&loop_ctl_mutex);
if (err)
return err;
switch (cmd) {
case LOOP_SET_CAPACITY:
err = loop_set_capacity(lo);
break;
case LOOP_SET_DIRECT_IO:
err = loop_set_dio(lo, arg);
break;
case LOOP_SET_BLOCK_SIZE:
err = loop_set_block_size(lo, arg);
break;
default:
err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
}
mutex_unlock(&loop_ctl_mutex);
return err;
}
static int lo_ioctl(struct block_device *bdev, fmode_t mode,
unsigned int cmd, unsigned long arg)
{
struct loop_device *lo = bdev->bd_disk->private_data;
int err;
err = mutex_lock_killable_nested(&lo->lo_ctl_mutex, 1);
if (err)
goto out_unlocked;
switch (cmd) {
case LOOP_SET_FD:
err = loop_set_fd(lo, mode, bdev, arg);
break;
return loop_set_fd(lo, mode, bdev, arg);
case LOOP_CHANGE_FD:
err = loop_change_fd(lo, bdev, arg);
break;
return loop_change_fd(lo, bdev, arg);
case LOOP_CLR_FD:
/* loop_clr_fd would have unlocked lo_ctl_mutex on success */
err = loop_clr_fd(lo);
if (!err)
goto out_unlocked;
break;
return loop_clr_fd(lo);
case LOOP_SET_STATUS:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
err = loop_set_status_old(lo,
(struct loop_info __user *)arg);
}
break;
case LOOP_GET_STATUS:
err = loop_get_status_old(lo, (struct loop_info __user *) arg);
/* loop_get_status() unlocks lo_ctl_mutex */
goto out_unlocked;
return loop_get_status_old(lo, (struct loop_info __user *) arg);
case LOOP_SET_STATUS64:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN)) {
err = loop_set_status64(lo,
(struct loop_info64 __user *) arg);
}
break;
case LOOP_GET_STATUS64:
err = loop_get_status64(lo, (struct loop_info64 __user *) arg);
/* loop_get_status() unlocks lo_ctl_mutex */
goto out_unlocked;
return loop_get_status64(lo, (struct loop_info64 __user *) arg);
case LOOP_SET_CAPACITY:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
err = loop_set_capacity(lo);
break;
case LOOP_SET_DIRECT_IO:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
err = loop_set_dio(lo, arg);
break;
case LOOP_SET_BLOCK_SIZE:
err = -EPERM;
if ((mode & FMODE_WRITE) || capable(CAP_SYS_ADMIN))
err = loop_set_block_size(lo, arg);
break;
if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
return -EPERM;
/* Fall through */
default:
err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
err = lo_simple_ioctl(lo, cmd, arg);
break;
}
mutex_unlock(&lo->lo_ctl_mutex);
out_unlocked:
return err;
}
@ -1570,10 +1642,8 @@ loop_get_status_compat(struct loop_device *lo,
struct loop_info64 info64;
int err;
if (!arg) {
mutex_unlock(&lo->lo_ctl_mutex);
if (!arg)
return -EINVAL;
}
err = loop_get_status(lo, &info64);
if (!err)
err = loop_info64_to_compat(&info64, arg);
@ -1588,20 +1658,12 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
switch(cmd) {
case LOOP_SET_STATUS:
err = mutex_lock_killable(&lo->lo_ctl_mutex);
if (!err) {
err = loop_set_status_compat(lo,
(const struct compat_loop_info __user *)arg);
mutex_unlock(&lo->lo_ctl_mutex);
}
err = loop_set_status_compat(lo,
(const struct compat_loop_info __user *)arg);
break;
case LOOP_GET_STATUS:
err = mutex_lock_killable(&lo->lo_ctl_mutex);
if (!err) {
err = loop_get_status_compat(lo,
(struct compat_loop_info __user *)arg);
/* loop_get_status() unlocks lo_ctl_mutex */
}
err = loop_get_status_compat(lo,
(struct compat_loop_info __user *)arg);
break;
case LOOP_SET_CAPACITY:
case LOOP_CLR_FD:
@ -1625,9 +1687,11 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
static int lo_open(struct block_device *bdev, fmode_t mode)
{
struct loop_device *lo;
int err = 0;
int err;
mutex_lock(&loop_index_mutex);
err = mutex_lock_killable(&loop_ctl_mutex);
if (err)
return err;
lo = bdev->bd_disk->private_data;
if (!lo) {
err = -ENXIO;
@ -1636,26 +1700,30 @@ static int lo_open(struct block_device *bdev, fmode_t mode)
atomic_inc(&lo->lo_refcnt);
out:
mutex_unlock(&loop_index_mutex);
mutex_unlock(&loop_ctl_mutex);
return err;
}
static void __lo_release(struct loop_device *lo)
static void lo_release(struct gendisk *disk, fmode_t mode)
{
int err;
struct loop_device *lo;
mutex_lock(&loop_ctl_mutex);
lo = disk->private_data;
if (atomic_dec_return(&lo->lo_refcnt))
return;
goto out_unlock;
mutex_lock(&lo->lo_ctl_mutex);
if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) {
if (lo->lo_state != Lo_bound)
goto out_unlock;
lo->lo_state = Lo_rundown;
mutex_unlock(&loop_ctl_mutex);
/*
* In autoclear mode, stop the loop thread
* and remove configuration after last close.
*/
err = loop_clr_fd(lo);
if (!err)
return;
__loop_clr_fd(lo, true);
return;
} else if (lo->lo_state == Lo_bound) {
/*
* Otherwise keep thread (if running) and config,
@ -1665,14 +1733,8 @@ static void __lo_release(struct loop_device *lo)
blk_mq_unfreeze_queue(lo->lo_queue);
}
mutex_unlock(&lo->lo_ctl_mutex);
}
static void lo_release(struct gendisk *disk, fmode_t mode)
{
mutex_lock(&loop_index_mutex);
__lo_release(disk->private_data);
mutex_unlock(&loop_index_mutex);
out_unlock:
mutex_unlock(&loop_ctl_mutex);
}
static const struct block_device_operations lo_fops = {
@ -1711,10 +1773,10 @@ static int unregister_transfer_cb(int id, void *ptr, void *data)
struct loop_device *lo = ptr;
struct loop_func_table *xfer = data;
mutex_lock(&lo->lo_ctl_mutex);
mutex_lock(&loop_ctl_mutex);
if (lo->lo_encryption == xfer)
loop_release_xfer(lo);
mutex_unlock(&lo->lo_ctl_mutex);
mutex_unlock(&loop_ctl_mutex);
return 0;
}
@ -1759,8 +1821,8 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
/* always use the first bio's css */
#ifdef CONFIG_BLK_CGROUP
if (cmd->use_aio && rq->bio && rq->bio->bi_css) {
cmd->css = rq->bio->bi_css;
if (cmd->use_aio && rq->bio && rq->bio->bi_blkg) {
cmd->css = &bio_blkcg(rq->bio)->css;
css_get(cmd->css);
} else
#endif
@ -1853,7 +1915,7 @@ static int loop_add(struct loop_device **l, int i)
goto out_free_idr;
lo->lo_queue = blk_mq_init_queue(&lo->tag_set);
if (IS_ERR_OR_NULL(lo->lo_queue)) {
if (IS_ERR(lo->lo_queue)) {
err = PTR_ERR(lo->lo_queue);
goto out_cleanup_tags;
}
@ -1895,7 +1957,6 @@ static int loop_add(struct loop_device **l, int i)
if (!part_shift)
disk->flags |= GENHD_FL_NO_PART_SCAN;
disk->flags |= GENHD_FL_EXT_DEVT;
mutex_init(&lo->lo_ctl_mutex);
atomic_set(&lo->lo_refcnt, 0);
lo->lo_number = i;
spin_lock_init(&lo->lo_lock);
@ -1974,7 +2035,7 @@ static struct kobject *loop_probe(dev_t dev, int *part, void *data)
struct kobject *kobj;
int err;
mutex_lock(&loop_index_mutex);
mutex_lock(&loop_ctl_mutex);
err = loop_lookup(&lo, MINOR(dev) >> part_shift);
if (err < 0)
err = loop_add(&lo, MINOR(dev) >> part_shift);
@ -1982,7 +2043,7 @@ static struct kobject *loop_probe(dev_t dev, int *part, void *data)
kobj = NULL;
else
kobj = get_disk_and_module(lo->lo_disk);
mutex_unlock(&loop_index_mutex);
mutex_unlock(&loop_ctl_mutex);
*part = 0;
return kobj;
@ -1992,9 +2053,13 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd,
unsigned long parm)
{
struct loop_device *lo;
int ret = -ENOSYS;
int ret;
mutex_lock(&loop_index_mutex);
ret = mutex_lock_killable(&loop_ctl_mutex);
if (ret)
return ret;
ret = -ENOSYS;
switch (cmd) {
case LOOP_CTL_ADD:
ret = loop_lookup(&lo, parm);
@ -2008,21 +2073,15 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd,
ret = loop_lookup(&lo, parm);
if (ret < 0)
break;
ret = mutex_lock_killable(&lo->lo_ctl_mutex);
if (ret)
break;
if (lo->lo_state != Lo_unbound) {
ret = -EBUSY;
mutex_unlock(&lo->lo_ctl_mutex);
break;
}
if (atomic_read(&lo->lo_refcnt) > 0) {
ret = -EBUSY;
mutex_unlock(&lo->lo_ctl_mutex);
break;
}
lo->lo_disk->private_data = NULL;
mutex_unlock(&lo->lo_ctl_mutex);
idr_remove(&loop_index_idr, lo->lo_number);
loop_remove(lo);
break;
@ -2032,7 +2091,7 @@ static long loop_control_ioctl(struct file *file, unsigned int cmd,
break;
ret = loop_add(&lo, -1);
}
mutex_unlock(&loop_index_mutex);
mutex_unlock(&loop_ctl_mutex);
return ret;
}
@ -2116,10 +2175,10 @@ static int __init loop_init(void)
THIS_MODULE, loop_probe, NULL, NULL);
/* pre-create number of devices given by config or max_loop */
mutex_lock(&loop_index_mutex);
mutex_lock(&loop_ctl_mutex);
for (i = 0; i < nr; i++)
loop_add(&lo, i);
mutex_unlock(&loop_index_mutex);
mutex_unlock(&loop_ctl_mutex);
printk(KERN_INFO "loop: module loaded\n");
return 0;


@ -54,7 +54,6 @@ struct loop_device {
spinlock_t lo_lock;
int lo_state;
struct mutex lo_ctl_mutex;
struct kthread_worker worker;
struct task_struct *worker_task;
bool use_dio;


@ -168,41 +168,6 @@ static bool mtip_check_surprise_removal(struct pci_dev *pdev)
return false; /* device present */
}
/* we have to use runtime tag to setup command header */
static void mtip_init_cmd_header(struct request *rq)
{
struct driver_data *dd = rq->q->queuedata;
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
/* Point the command headers at the command tables. */
cmd->command_header = dd->port->command_list +
(sizeof(struct mtip_cmd_hdr) * rq->tag);
cmd->command_header_dma = dd->port->command_list_dma +
(sizeof(struct mtip_cmd_hdr) * rq->tag);
if (test_bit(MTIP_PF_HOST_CAP_64, &dd->port->flags))
cmd->command_header->ctbau = __force_bit2int cpu_to_le32((cmd->command_dma >> 16) >> 16);
cmd->command_header->ctba = __force_bit2int cpu_to_le32(cmd->command_dma & 0xFFFFFFFF);
}
static struct mtip_cmd *mtip_get_int_command(struct driver_data *dd)
{
struct request *rq;
if (mtip_check_surprise_removal(dd->pdev))
return NULL;
rq = blk_mq_alloc_request(dd->queue, REQ_OP_DRV_IN, BLK_MQ_REQ_RESERVED);
if (IS_ERR(rq))
return NULL;
/* Internal cmd isn't submitted via .queue_rq */
mtip_init_cmd_header(rq);
return blk_mq_rq_to_pdu(rq);
}
static struct mtip_cmd *mtip_cmd_from_tag(struct driver_data *dd,
unsigned int tag)
{
@ -1023,13 +988,14 @@ static int mtip_exec_internal_command(struct mtip_port *port,
return -EFAULT;
}
int_cmd = mtip_get_int_command(dd);
if (!int_cmd) {
if (mtip_check_surprise_removal(dd->pdev))
return -EFAULT;
rq = blk_mq_alloc_request(dd->queue, REQ_OP_DRV_IN, BLK_MQ_REQ_RESERVED);
if (IS_ERR(rq)) {
dbg_printk(MTIP_DRV_NAME "Unable to allocate tag for PIO cmd\n");
return -EFAULT;
}
rq = blk_mq_rq_from_pdu(int_cmd);
rq->special = &icmd;
set_bit(MTIP_PF_IC_ACTIVE_BIT, &port->flags);
@ -1050,6 +1016,8 @@ static int mtip_exec_internal_command(struct mtip_port *port,
}
/* Copy the command to the command table */
int_cmd = blk_mq_rq_to_pdu(rq);
int_cmd->icmd = &icmd;
memcpy(int_cmd->command, fis, fis_len*4);
rq->timeout = timeout;
@ -1423,23 +1391,19 @@ static int mtip_get_smart_attr(struct mtip_port *port, unsigned int id,
* @dd pointer to driver_data structure
* @lba starting lba
* @len # of 512b sectors to trim
*
* return value
* -ENOMEM Out of dma memory
* -EINVAL Invalid parameters passed in, trim not supported
* -EIO Error submitting trim request to hw
*/
static int mtip_send_trim(struct driver_data *dd, unsigned int lba,
unsigned int len)
static blk_status_t mtip_send_trim(struct driver_data *dd, unsigned int lba,
unsigned int len)
{
int i, rv = 0;
u64 tlba, tlen, sect_left;
struct mtip_trim_entry *buf;
dma_addr_t dma_addr;
struct host_to_dev_fis fis;
blk_status_t ret = BLK_STS_OK;
int i;
if (!len || dd->trim_supp == false)
return -EINVAL;
return BLK_STS_IOERR;
/* Trim request too big */
WARN_ON(len > (MTIP_MAX_TRIM_ENTRY_LEN * MTIP_MAX_TRIM_ENTRIES));
@ -1454,7 +1418,7 @@ static int mtip_send_trim(struct driver_data *dd, unsigned int lba,
buf = dmam_alloc_coherent(&dd->pdev->dev, ATA_SECT_SIZE, &dma_addr,
GFP_KERNEL);
if (!buf)
return -ENOMEM;
return BLK_STS_RESOURCE;
memset(buf, 0, ATA_SECT_SIZE);
for (i = 0, sect_left = len, tlba = lba;
@ -1463,8 +1427,8 @@ static int mtip_send_trim(struct driver_data *dd, unsigned int lba,
tlen = (sect_left >= MTIP_MAX_TRIM_ENTRY_LEN ?
MTIP_MAX_TRIM_ENTRY_LEN :
sect_left);
buf[i].lba = __force_bit2int cpu_to_le32(tlba);
buf[i].range = __force_bit2int cpu_to_le16(tlen);
buf[i].lba = cpu_to_le32(tlba);
buf[i].range = cpu_to_le16(tlen);
tlba += tlen;
sect_left -= tlen;
}
@ -1486,10 +1450,10 @@ static int mtip_send_trim(struct driver_data *dd, unsigned int lba,
ATA_SECT_SIZE,
0,
MTIP_TRIM_TIMEOUT_MS) < 0)
rv = -EIO;
ret = BLK_STS_IOERR;
dmam_free_coherent(&dd->pdev->dev, ATA_SECT_SIZE, buf, dma_addr);
return rv;
return ret;
}
/*
@ -1585,23 +1549,20 @@ static inline void fill_command_sg(struct driver_data *dd,
int n;
unsigned int dma_len;
struct mtip_cmd_sg *command_sg;
struct scatterlist *sg = command->sg;
struct scatterlist *sg;
command_sg = command->command + AHCI_CMD_TBL_HDR_SZ;
for (n = 0; n < nents; n++) {
for_each_sg(command->sg, sg, nents, n) {
dma_len = sg_dma_len(sg);
if (dma_len > 0x400000)
dev_err(&dd->pdev->dev,
"DMA segment length truncated\n");
command_sg->info = __force_bit2int
cpu_to_le32((dma_len-1) & 0x3FFFFF);
command_sg->dba = __force_bit2int
cpu_to_le32(sg_dma_address(sg));
command_sg->dba_upper = __force_bit2int
command_sg->info = cpu_to_le32((dma_len-1) & 0x3FFFFF);
command_sg->dba = cpu_to_le32(sg_dma_address(sg));
command_sg->dba_upper =
cpu_to_le32((sg_dma_address(sg) >> 16) >> 16);
command_sg++;
sg++;
}
}
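
fill_command_sg() now iterates with for_each_sg() rather than advancing a struct scatterlist pointer by hand, which also stays correct if the list is ever chained. A trimmed sketch of the loop shape:

        struct scatterlist *sg;
        unsigned int dma_len;
        int n;

        for_each_sg(command->sg, sg, nents, n) {
                dma_len = sg_dma_len(sg);
                if (dma_len > 0x400000)
                        dev_err(&dd->pdev->dev,
                                "DMA segment length truncated\n");
                /* ... fill one PRD entry per segment ... */
        }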
@ -2171,7 +2132,6 @@ static int mtip_hw_ioctl(struct driver_data *dd, unsigned int cmd,
* @dd Pointer to the driver data structure.
* @start First sector to read.
* @nsect Number of sectors to read.
* @nents Number of entries in scatter list for the read command.
* @tag The tag of this read command.
* @callback Pointer to the function that should be called
* when the read completes.
@ -2183,16 +2143,20 @@ static int mtip_hw_ioctl(struct driver_data *dd, unsigned int cmd,
* None
*/
static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq,
struct mtip_cmd *command, int nents,
struct mtip_cmd *command,
struct blk_mq_hw_ctx *hctx)
{
struct mtip_cmd_hdr *hdr =
dd->port->command_list + sizeof(struct mtip_cmd_hdr) * rq->tag;
struct host_to_dev_fis *fis;
struct mtip_port *port = dd->port;
int dma_dir = rq_data_dir(rq) == READ ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
u64 start = blk_rq_pos(rq);
unsigned int nsect = blk_rq_sectors(rq);
unsigned int nents;
/* Map the scatter list for DMA access */
nents = blk_rq_map_sg(hctx->queue, rq, command->sg);
nents = dma_map_sg(&dd->pdev->dev, command->sg, nents, dma_dir);
prefetch(&port->flags);
@ -2233,10 +2197,11 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq,
fis->device |= 1 << 7;
/* Populate the command header */
command->command_header->opts =
__force_bit2int cpu_to_le32(
(nents << 16) | 5 | AHCI_CMD_PREFETCH);
command->command_header->byte_count = 0;
hdr->ctba = cpu_to_le32(command->command_dma & 0xFFFFFFFF);
if (test_bit(MTIP_PF_HOST_CAP_64, &dd->port->flags))
hdr->ctbau = cpu_to_le32((command->command_dma >> 16) >> 16);
hdr->opts = cpu_to_le32((nents << 16) | 5 | AHCI_CMD_PREFETCH);
hdr->byte_count = 0;
command->direction = dma_dir;
@ -2715,12 +2680,12 @@ static void mtip_softirq_done_fn(struct request *rq)
cmd->direction);
if (unlikely(cmd->unaligned))
up(&dd->port->cmd_slot_unal);
atomic_inc(&dd->port->cmd_slot_unal);
blk_mq_end_request(rq, cmd->status);
}
static void mtip_abort_cmd(struct request *req, void *data, bool reserved)
static bool mtip_abort_cmd(struct request *req, void *data, bool reserved)
{
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(req);
struct driver_data *dd = data;
@ -2730,14 +2695,16 @@ static void mtip_abort_cmd(struct request *req, void *data, bool reserved)
clear_bit(req->tag, dd->port->cmds_to_issue);
cmd->status = BLK_STS_IOERR;
mtip_softirq_done_fn(req);
return true;
}
static void mtip_queue_cmd(struct request *req, void *data, bool reserved)
static bool mtip_queue_cmd(struct request *req, void *data, bool reserved)
{
struct driver_data *dd = data;
set_bit(req->tag, dd->port->cmds_to_issue);
blk_abort_request(req);
return true;
}
/*
@ -2803,10 +2770,7 @@ restart_eh:
blk_mq_quiesce_queue(dd->queue);
spin_lock(dd->queue->queue_lock);
blk_mq_tagset_busy_iter(&dd->tags,
mtip_queue_cmd, dd);
spin_unlock(dd->queue->queue_lock);
blk_mq_tagset_busy_iter(&dd->tags, mtip_queue_cmd, dd);
set_bit(MTIP_PF_ISSUE_CMDS_BIT, &dd->port->flags);
@ -3026,7 +2990,7 @@ static int mtip_hw_init(struct driver_data *dd)
else
dd->unal_qdepth = 0;
sema_init(&dd->port->cmd_slot_unal, dd->unal_qdepth);
atomic_set(&dd->port->cmd_slot_unal, dd->unal_qdepth);
/* Spinlock to prevent concurrent issue */
for (i = 0; i < MTIP_MAX_SLOT_GROUPS; i++)
@ -3531,58 +3495,24 @@ static inline bool is_se_active(struct driver_data *dd)
return false;
}
/*
* Block layer make request function.
*
* This function is called by the kernel to process a BIO for
* the P320 device.
*
* @queue Pointer to the request queue. Unused other than to obtain
* the driver data structure.
* @rq Pointer to the request.
*
*/
static int mtip_submit_request(struct blk_mq_hw_ctx *hctx, struct request *rq)
static inline bool is_stopped(struct driver_data *dd, struct request *rq)
{
struct driver_data *dd = hctx->queue->queuedata;
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
unsigned int nents;
if (likely(!(dd->dd_flag & MTIP_DDF_STOP_IO)))
return false;
if (is_se_active(dd))
return -ENODATA;
if (test_bit(MTIP_DDF_REMOVE_PENDING_BIT, &dd->dd_flag))
return true;
if (test_bit(MTIP_DDF_OVER_TEMP_BIT, &dd->dd_flag))
return true;
if (test_bit(MTIP_DDF_WRITE_PROTECT_BIT, &dd->dd_flag) &&
rq_data_dir(rq))
return true;
if (test_bit(MTIP_DDF_SEC_LOCK_BIT, &dd->dd_flag))
return true;
if (test_bit(MTIP_DDF_REBUILD_FAILED_BIT, &dd->dd_flag))
return true;
if (unlikely(dd->dd_flag & MTIP_DDF_STOP_IO)) {
if (unlikely(test_bit(MTIP_DDF_REMOVE_PENDING_BIT,
&dd->dd_flag))) {
return -ENXIO;
}
if (unlikely(test_bit(MTIP_DDF_OVER_TEMP_BIT, &dd->dd_flag))) {
return -ENODATA;
}
if (unlikely(test_bit(MTIP_DDF_WRITE_PROTECT_BIT,
&dd->dd_flag) &&
rq_data_dir(rq))) {
return -ENODATA;
}
if (unlikely(test_bit(MTIP_DDF_SEC_LOCK_BIT, &dd->dd_flag) ||
test_bit(MTIP_DDF_REBUILD_FAILED_BIT, &dd->dd_flag)))
return -ENODATA;
}
if (req_op(rq) == REQ_OP_DISCARD) {
int err;
err = mtip_send_trim(dd, blk_rq_pos(rq), blk_rq_sectors(rq));
blk_mq_end_request(rq, err ? BLK_STS_IOERR : BLK_STS_OK);
return 0;
}
/* Create the scatter list for this request. */
nents = blk_rq_map_sg(hctx->queue, rq, cmd->sg);
/* Issue the read/write. */
mtip_hw_submit_io(dd, rq, cmd, nents, hctx);
return 0;
return false;
}
static bool mtip_check_unal_depth(struct blk_mq_hw_ctx *hctx,
@ -3603,7 +3533,7 @@ static bool mtip_check_unal_depth(struct blk_mq_hw_ctx *hctx,
cmd->unaligned = 1;
}
if (cmd->unaligned && down_trylock(&dd->port->cmd_slot_unal))
if (cmd->unaligned && atomic_dec_if_positive(&dd->port->cmd_slot_unal) >= 0)
return true;
return false;
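
The unaligned-IO throttle in mtip32xx switches from a semaphore to an atomic counter: atomic_dec_if_positive() is the non-blocking "grab a slot" on the submission path, and atomic_inc() on the completion path gives the slot back. A sketch of the pair with hypothetical helper names (the counter itself is initialized in mtip_hw_init() via atomic_set()):

        static bool unal_slot_get(atomic_t *slots)
        {
                /* returns old value - 1; only decrements if that is >= 0 */
                return atomic_dec_if_positive(slots) >= 0;
        }

        static void unal_slot_put(atomic_t *slots)
        {
                atomic_inc(slots);      /* completion path returns the slot */
        }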
@ -3613,32 +3543,33 @@ static blk_status_t mtip_issue_reserved_cmd(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
struct driver_data *dd = hctx->queue->queuedata;
struct mtip_int_cmd *icmd = rq->special;
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
struct mtip_int_cmd *icmd = cmd->icmd;
struct mtip_cmd_hdr *hdr =
dd->port->command_list + sizeof(struct mtip_cmd_hdr) * rq->tag;
struct mtip_cmd_sg *command_sg;
if (mtip_commands_active(dd->port))
return BLK_STS_RESOURCE;
return BLK_STS_DEV_RESOURCE;
hdr->ctba = cpu_to_le32(cmd->command_dma & 0xFFFFFFFF);
if (test_bit(MTIP_PF_HOST_CAP_64, &dd->port->flags))
hdr->ctbau = cpu_to_le32((cmd->command_dma >> 16) >> 16);
/* Populate the SG list */
cmd->command_header->opts =
__force_bit2int cpu_to_le32(icmd->opts | icmd->fis_len);
hdr->opts = cpu_to_le32(icmd->opts | icmd->fis_len);
if (icmd->buf_len) {
command_sg = cmd->command + AHCI_CMD_TBL_HDR_SZ;
command_sg->info =
__force_bit2int cpu_to_le32((icmd->buf_len-1) & 0x3FFFFF);
command_sg->dba =
__force_bit2int cpu_to_le32(icmd->buffer & 0xFFFFFFFF);
command_sg->info = cpu_to_le32((icmd->buf_len-1) & 0x3FFFFF);
command_sg->dba = cpu_to_le32(icmd->buffer & 0xFFFFFFFF);
command_sg->dba_upper =
__force_bit2int cpu_to_le32((icmd->buffer >> 16) >> 16);
cpu_to_le32((icmd->buffer >> 16) >> 16);
cmd->command_header->opts |=
__force_bit2int cpu_to_le32((1 << 16));
hdr->opts |= cpu_to_le32((1 << 16));
}
/* Populate the command header */
cmd->command_header->byte_count = 0;
hdr->byte_count = 0;
blk_mq_start_request(rq);
mtip_issue_non_ncq_command(dd->port, rq->tag);
@ -3648,23 +3579,25 @@ static blk_status_t mtip_issue_reserved_cmd(struct blk_mq_hw_ctx *hctx,
static blk_status_t mtip_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
struct driver_data *dd = hctx->queue->queuedata;
struct request *rq = bd->rq;
int ret;
mtip_init_cmd_header(rq);
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
if (blk_rq_is_passthrough(rq))
return mtip_issue_reserved_cmd(hctx, rq);
if (unlikely(mtip_check_unal_depth(hctx, rq)))
return BLK_STS_RESOURCE;
return BLK_STS_DEV_RESOURCE;
if (is_se_active(dd) || is_stopped(dd, rq))
return BLK_STS_IOERR;
blk_mq_start_request(rq);
ret = mtip_submit_request(hctx, rq);
if (likely(!ret))
return BLK_STS_OK;
return BLK_STS_IOERR;
if (req_op(rq) == REQ_OP_DISCARD)
return mtip_send_trim(dd, blk_rq_pos(rq), blk_rq_sectors(rq));
mtip_hw_submit_io(dd, rq, cmd, hctx);
return BLK_STS_OK;
}
static void mtip_free_cmd(struct blk_mq_tag_set *set, struct request *rq,
@ -3920,12 +3853,13 @@ protocol_init_error:
return rv;
}
static void mtip_no_dev_cleanup(struct request *rq, void *data, bool reserv)
static bool mtip_no_dev_cleanup(struct request *rq, void *data, bool reserv)
{
struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq);
cmd->status = BLK_STS_IOERR;
blk_mq_complete_request(rq);
return true;
}
/*


@ -126,8 +126,6 @@
#define MTIP_DFS_MAX_BUF_SIZE 1024
#define __force_bit2int (unsigned int __force)
enum {
/* below are bit numbers in 'flags' defined in mtip_port */
MTIP_PF_IC_ACTIVE_BIT = 0, /* pio/ioctl */
@ -174,10 +172,10 @@ enum {
struct smart_attr {
u8 attr_id;
u16 flags;
__le16 flags;
u8 cur;
u8 worst;
u32 data;
__le32 data;
u8 res[3];
} __packed;
@ -200,9 +198,9 @@ struct mtip_work {
#define MTIP_MAX_TRIM_ENTRY_LEN 0xfff8
struct mtip_trim_entry {
u32 lba; /* starting lba of region */
u16 rsvd; /* unused */
u16 range; /* # of 512b blocks to trim */
__le32 lba; /* starting lba of region */
__le16 rsvd; /* unused */
__le16 range; /* # of 512b blocks to trim */
} __packed;
struct mtip_trim {
@ -278,24 +276,24 @@ struct mtip_cmd_hdr {
* - Bit 5 Unused in this implementation.
* - Bits 4:0 Length of the command FIS in DWords (DWord = 4 bytes).
*/
unsigned int opts;
__le32 opts;
/* This field is unused when using NCQ. */
union {
unsigned int byte_count;
unsigned int status;
__le32 byte_count;
__le32 status;
};
/*
* Lower 32 bits of the command table address associated with this
* header. The command table addresses must be 128 byte aligned.
*/
unsigned int ctba;
__le32 ctba;
/*
* If 64 bit addressing is used this field is the upper 32 bits
* of the command table address associated with this command.
*/
unsigned int ctbau;
__le32 ctbau;
/* Reserved and unused. */
unsigned int res[4];
u32 res[4];
};
/* Command scatter gather structure (PRD). */
@ -305,31 +303,28 @@ struct mtip_cmd_sg {
* address must be 8 byte aligned signified by bits 2:0 being
* set to 0.
*/
unsigned int dba;
__le32 dba;
/*
* When 64 bit addressing is used this field is the upper
* 32 bits of the data buffer address.
*/
unsigned int dba_upper;
__le32 dba_upper;
/* Unused. */
unsigned int reserved;
__le32 reserved;
/*
* Bit 31: interrupt when this data block has been transferred.
* Bits 30..22: reserved
* Bits 21..0: byte count (minus 1). For P320 the byte count must be
* 8 byte aligned signified by bits 2:0 being set to 1.
*/
unsigned int info;
__le32 info;
};
struct mtip_port;
struct mtip_int_cmd;
/* Structure used to describe a command. */
struct mtip_cmd {
struct mtip_cmd_hdr *command_header; /* ptr to command header entry */
dma_addr_t command_header_dma; /* corresponding physical address */
void *command; /* ptr to command table entry */
dma_addr_t command_dma; /* corresponding physical address */
@ -338,7 +333,10 @@ struct mtip_cmd {
int unaligned; /* command is unaligned on 4k boundary */
struct scatterlist sg[MTIP_MAX_SG]; /* Scatter list entries */
union {
struct scatterlist sg[MTIP_MAX_SG]; /* Scatter list entries */
struct mtip_int_cmd *icmd;
};
int retries; /* The number of retries left for this command. */
@ -435,8 +433,8 @@ struct mtip_port {
*/
unsigned long ic_pause_timer;
/* Semaphore to control queue depth of unaligned IOs */
struct semaphore cmd_slot_unal;
/* Counter to control queue depth of unaligned IOs */
atomic_t cmd_slot_unal;
/* Spinlock for working around command-issue bug. */
spinlock_t cmd_issue_lock[MTIP_MAX_SLOT_GROUPS];


@ -734,12 +734,13 @@ static void recv_work(struct work_struct *work)
kfree(args);
}
static void nbd_clear_req(struct request *req, void *data, bool reserved)
static bool nbd_clear_req(struct request *req, void *data, bool reserved)
{
struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
cmd->status = BLK_STS_IOERR;
blk_mq_complete_request(req);
return true;
}
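
All of the blk_mq_tagset_busy_iter() callbacks converted in this pull (mtip32xx, nbd, skd) change their return type from void to bool: the return value now tells the core whether to keep iterating, and each converted driver returns true so that every request is visited. A hypothetical callback showing the other half of the contract, stopping as soon as one in-flight request is found:

        static bool any_busy(struct request *rq, void *data, bool reserved)
        {
                bool *found = data;

                *found = true;
                return false;           /* false stops the iteration early */
        }

        /* bool busy = false; blk_mq_tagset_busy_iter(&set, any_busy, &busy); */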
static void nbd_clear_que(struct nbd_device *nbd)


@ -49,6 +49,7 @@ struct nullb_device {
unsigned long completion_nsec; /* time in ns to complete a request */
unsigned long cache_size; /* disk cache size in MB */
unsigned long zone_size; /* zone size in MB if device is zoned */
unsigned int zone_nr_conv; /* number of conventional zones */
unsigned int submit_queues; /* number of submission queues */
unsigned int home_node; /* home node for the device */
unsigned int queue_mode; /* block interface */


@ -188,6 +188,10 @@ static unsigned long g_zone_size = 256;
module_param_named(zone_size, g_zone_size, ulong, S_IRUGO);
MODULE_PARM_DESC(zone_size, "Zone size in MB when block device is zoned. Must be power-of-two: Default: 256");
static unsigned int g_zone_nr_conv;
module_param_named(zone_nr_conv, g_zone_nr_conv, uint, 0444);
MODULE_PARM_DESC(zone_nr_conv, "Number of conventional zones when block device is zoned. Default: 0");
static struct nullb_device *null_alloc_dev(void);
static void null_free_dev(struct nullb_device *dev);
static void null_del_dev(struct nullb *nullb);
@ -293,6 +297,7 @@ NULLB_DEVICE_ATTR(mbps, uint);
NULLB_DEVICE_ATTR(cache_size, ulong);
NULLB_DEVICE_ATTR(zoned, bool);
NULLB_DEVICE_ATTR(zone_size, ulong);
NULLB_DEVICE_ATTR(zone_nr_conv, uint);
static ssize_t nullb_device_power_show(struct config_item *item, char *page)
{
@ -407,6 +412,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
&nullb_device_attr_badblocks,
&nullb_device_attr_zoned,
&nullb_device_attr_zone_size,
&nullb_device_attr_zone_nr_conv,
NULL,
};
@ -520,6 +526,7 @@ static struct nullb_device *null_alloc_dev(void)
dev->use_per_node_hctx = g_use_per_node_hctx;
dev->zoned = g_zoned;
dev->zone_size = g_zone_size;
dev->zone_nr_conv = g_zone_nr_conv;
return dev;
}
@ -635,14 +642,9 @@ static void null_cmd_end_timer(struct nullb_cmd *cmd)
hrtimer_start(&cmd->timer, kt, HRTIMER_MODE_REL);
}
static void null_softirq_done_fn(struct request *rq)
static void null_complete_rq(struct request *rq)
{
struct nullb *nullb = rq->q->queuedata;
if (nullb->dev->queue_mode == NULL_Q_MQ)
end_cmd(blk_mq_rq_to_pdu(rq));
else
end_cmd(rq->special);
end_cmd(blk_mq_rq_to_pdu(rq));
}
static struct nullb_page *null_alloc_page(gfp_t gfp_flags)
@ -1350,7 +1352,7 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
static const struct blk_mq_ops null_mq_ops = {
.queue_rq = null_queue_rq,
.complete = null_softirq_done_fn,
.complete = null_complete_rq,
.timeout = null_timeout_rq,
};
@ -1657,8 +1659,7 @@ static int null_add_dev(struct nullb_device *dev)
}
null_init_queues(nullb);
} else if (dev->queue_mode == NULL_Q_BIO) {
nullb->q = blk_alloc_queue_node(GFP_KERNEL, dev->home_node,
NULL);
nullb->q = blk_alloc_queue_node(GFP_KERNEL, dev->home_node);
if (!nullb->q) {
rv = -ENOMEM;
goto out_cleanup_queues;


@ -29,7 +29,25 @@ int null_zone_init(struct nullb_device *dev)
if (!dev->zones)
return -ENOMEM;
for (i = 0; i < dev->nr_zones; i++) {
if (dev->zone_nr_conv >= dev->nr_zones) {
dev->zone_nr_conv = dev->nr_zones - 1;
pr_info("null_blk: changed the number of conventional zones to %u",
dev->zone_nr_conv);
}
for (i = 0; i < dev->zone_nr_conv; i++) {
struct blk_zone *zone = &dev->zones[i];
zone->start = sector;
zone->len = dev->zone_size_sects;
zone->wp = zone->start + zone->len;
zone->type = BLK_ZONE_TYPE_CONVENTIONAL;
zone->cond = BLK_ZONE_COND_NOT_WP;
sector += dev->zone_size_sects;
}
for (i = dev->zone_nr_conv; i < dev->nr_zones; i++) {
struct blk_zone *zone = &dev->zones[i];
zone->start = zone->wp = sector;
@ -98,6 +116,8 @@ void null_zone_write(struct nullb_cmd *cmd, sector_t sector,
if (zone->wp == zone->start + zone->len)
zone->cond = BLK_ZONE_COND_FULL;
break;
case BLK_ZONE_COND_NOT_WP:
break;
default:
/* Invalid zone condition */
cmd->error = BLK_STS_IOERR;
@ -111,6 +131,11 @@ void null_zone_reset(struct nullb_cmd *cmd, sector_t sector)
unsigned int zno = null_zone_no(dev, sector);
struct blk_zone *zone = &dev->zones[zno];
if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL) {
cmd->error = BLK_STS_IOERR;
return;
}
zone->cond = BLK_ZONE_COND_EMPTY;
zone->wp = zone->start;
}
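
With the new zone_nr_conv knob, null_blk lays out the first zone_nr_conv zones as conventional (BLK_ZONE_COND_NOT_WP, no write pointer, resets rejected with BLK_STS_IOERR) and the remainder as sequential zones, clamping the value to nr_zones - 1. A small illustrative helper, not driver code, for how a zone index maps to a type under that layout:

        static bool zone_is_conventional(unsigned int zno, unsigned int nr_conv,
                                         unsigned int nr_zones)
        {
                if (nr_conv >= nr_zones)
                        nr_conv = nr_zones - 1; /* same clamp as null_zone_init() */
                return zno < nr_conv;
        }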


@ -242,6 +242,11 @@ struct pd_unit {
static struct pd_unit pd[PD_UNITS];
struct pd_req {
/* for REQ_OP_DRV_IN: */
enum action (*func)(struct pd_unit *disk);
};
static char pd_scratch[512]; /* scratch block buffer */
static char *pd_errs[17] = { "ERR", "INDEX", "ECC", "DRQ", "SEEK", "WRERR",
@ -502,8 +507,9 @@ static enum action do_pd_io_start(void)
static enum action pd_special(void)
{
enum action (*func)(struct pd_unit *) = pd_req->special;
return func(pd_current);
struct pd_req *req = blk_mq_rq_to_pdu(pd_req);
return req->func(pd_current);
}
static int pd_next_buf(void)
@ -767,12 +773,14 @@ static int pd_special_command(struct pd_unit *disk,
enum action (*func)(struct pd_unit *disk))
{
struct request *rq;
struct pd_req *req;
rq = blk_get_request(disk->gd->queue, REQ_OP_DRV_IN, 0);
if (IS_ERR(rq))
return PTR_ERR(rq);
req = blk_mq_rq_to_pdu(rq);
rq->special = func;
req->func = func;
blk_execute_rq(disk->gd->queue, disk->gd, rq, 0);
blk_put_request(rq);
return 0;
@ -892,9 +900,21 @@ static void pd_probe_drive(struct pd_unit *disk)
disk->gd = p;
p->private_data = disk;
p->queue = blk_mq_init_sq_queue(&disk->tag_set, &pd_mq_ops, 2,
BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING);
memset(&disk->tag_set, 0, sizeof(disk->tag_set));
disk->tag_set.ops = &pd_mq_ops;
disk->tag_set.cmd_size = sizeof(struct pd_req);
disk->tag_set.nr_hw_queues = 1;
disk->tag_set.nr_maps = 1;
disk->tag_set.queue_depth = 2;
disk->tag_set.numa_node = NUMA_NO_NODE;
disk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
if (blk_mq_alloc_tag_set(&disk->tag_set))
return;
p->queue = blk_mq_init_queue(&disk->tag_set);
if (IS_ERR(p->queue)) {
blk_mq_free_tag_set(&disk->tag_set);
p->queue = NULL;
return;
}
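
The paride/pd conversion shows the standard replacement for rq->special used throughout this series: declare a per-request payload via tag_set.cmd_size and reach it with blk_mq_rq_to_pdu() on both the submit and dispatch sides. A compressed fragment using pd's own names:

        disk->tag_set.cmd_size = sizeof(struct pd_req); /* payload per request */

        /* submitter */
        rq = blk_get_request(disk->gd->queue, REQ_OP_DRV_IN, 0);
        req = blk_mq_rq_to_pdu(rq);
        req->func = func;

        /* dispatcher */
        struct pd_req *req = blk_mq_rq_to_pdu(pd_req);
        return req->func(pd_current);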


@ -2203,9 +2203,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
* Some CDRW drives can not handle writes larger than one packet,
* even if the size is a multiple of the packet size.
*/
spin_lock_irq(q->queue_lock);
blk_queue_max_hw_sectors(q, pd->settings.size);
spin_unlock_irq(q->queue_lock);
set_bit(PACKET_WRITABLE, &pd->flags);
} else {
pkt_set_speed(pd, MAX_SPEED, MAX_SPEED);


@ -181,6 +181,7 @@ struct skd_request_context {
struct fit_completion_entry_v1 completion;
struct fit_comp_error_info err_info;
int retries;
blk_status_t status;
};
@ -382,11 +383,12 @@ static void skd_log_skreq(struct skd_device *skdev,
* READ/WRITE REQUESTS
*****************************************************************************
*/
static void skd_inc_in_flight(struct request *rq, void *data, bool reserved)
static bool skd_inc_in_flight(struct request *rq, void *data, bool reserved)
{
int *count = data;
count++;
return true;
}
static int skd_in_flight(struct skd_device *skdev)
@ -494,6 +496,11 @@ static blk_status_t skd_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
if (unlikely(skdev->state != SKD_DRVR_STATE_ONLINE))
return skd_fail_all(q) ? BLK_STS_IOERR : BLK_STS_RESOURCE;
if (!(req->rq_flags & RQF_DONTPREP)) {
skreq->retries = 0;
req->rq_flags |= RQF_DONTPREP;
}
blk_mq_start_request(req);
WARN_ONCE(tag >= skd_max_queue_depth, "%#x > %#x (nr_requests = %lu)\n",
@ -1425,7 +1432,7 @@ static void skd_resolve_req_exception(struct skd_device *skdev,
break;
case SKD_CHECK_STATUS_REQUEUE_REQUEST:
if ((unsigned long) ++req->special < SKD_MAX_RETRIES) {
if (++skreq->retries < SKD_MAX_RETRIES) {
skd_log_skreq(skdev, skreq, "retry");
blk_mq_requeue_request(req, true);
break;
@ -1887,13 +1894,13 @@ static void skd_isr_fwstate(struct skd_device *skdev)
skd_skdev_state_to_str(skdev->state), skdev->state);
}
static void skd_recover_request(struct request *req, void *data, bool reserved)
static bool skd_recover_request(struct request *req, void *data, bool reserved)
{
struct skd_device *const skdev = data;
struct skd_request_context *skreq = blk_mq_rq_to_pdu(req);
if (skreq->state != SKD_REQ_STATE_BUSY)
return;
return true;
skd_log_skreq(skdev, skreq, "recover");
@ -1904,6 +1911,7 @@ static void skd_recover_request(struct request *req, void *data, bool reserved)
skreq->state = SKD_REQ_STATE_IDLE;
skreq->status = BLK_STS_IOERR;
blk_mq_complete_request(req);
return true;
}
static void skd_recover_requests(struct skd_device *skdev)


@ -6,7 +6,7 @@
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/hdreg.h>
#include <linux/genhd.h>
#include <linux/cdrom.h>
@ -45,6 +45,8 @@ MODULE_VERSION(DRV_MODULE_VERSION);
#define WAITING_FOR_GEN_CMD 0x04
#define WAITING_FOR_ANY -1
#define VDC_MAX_RETRIES 10
static struct workqueue_struct *sunvdc_wq;
struct vdc_req_entry {
@ -66,9 +68,10 @@ struct vdc_port {
u64 max_xfer_size;
u32 vdisk_block_size;
u32 drain;
u64 ldc_timeout;
struct timer_list ldc_reset_timer;
struct delayed_work ldc_reset_timer_work;
struct work_struct ldc_reset_work;
/* The server fills these in for us in the disk attribute
@ -80,12 +83,14 @@ struct vdc_port {
u8 vdisk_mtype;
u32 vdisk_phys_blksz;
struct blk_mq_tag_set tag_set;
char disk_name[32];
};
static void vdc_ldc_reset(struct vdc_port *port);
static void vdc_ldc_reset_work(struct work_struct *work);
static void vdc_ldc_reset_timer(struct timer_list *t);
static void vdc_ldc_reset_timer_work(struct work_struct *work);
static inline struct vdc_port *to_vdc_port(struct vio_driver_state *vio)
{
@ -175,11 +180,8 @@ static void vdc_blk_queue_start(struct vdc_port *port)
* handshake completes, so check for initial handshake before we've
* allocated a disk.
*/
if (port->disk && blk_queue_stopped(port->disk->queue) &&
vdc_tx_dring_avail(dr) * 100 / VDC_TX_RING_SIZE >= 50) {
blk_start_queue(port->disk->queue);
}
if (port->disk && vdc_tx_dring_avail(dr) * 100 / VDC_TX_RING_SIZE >= 50)
blk_mq_start_hw_queues(port->disk->queue);
}
static void vdc_finish(struct vio_driver_state *vio, int err, int waiting_for)
@ -197,7 +199,7 @@ static void vdc_handshake_complete(struct vio_driver_state *vio)
{
struct vdc_port *port = to_vdc_port(vio);
del_timer(&port->ldc_reset_timer);
cancel_delayed_work(&port->ldc_reset_timer_work);
vdc_finish(vio, 0, WAITING_FOR_LINK_UP);
vdc_blk_queue_start(port);
}
@ -320,7 +322,7 @@ static void vdc_end_one(struct vdc_port *port, struct vio_dring_state *dr,
rqe->req = NULL;
__blk_end_request(req, (desc->status ? BLK_STS_IOERR : 0), desc->size);
blk_mq_end_request(req, desc->status ? BLK_STS_IOERR : 0);
vdc_blk_queue_start(port);
}
@ -431,6 +433,7 @@ static int __vdc_tx_trigger(struct vdc_port *port)
.end_idx = dr->prod,
};
int err, delay;
int retries = 0;
hdr.seq = dr->snd_nxt;
delay = 1;
@ -443,6 +446,8 @@ static int __vdc_tx_trigger(struct vdc_port *port)
udelay(delay);
if ((delay <<= 1) > 128)
delay = 128;
if (retries++ > VDC_MAX_RETRIES)
break;
} while (err == -EAGAIN);
if (err == -ENOTCONN)
@ -525,29 +530,40 @@ static int __send_request(struct request *req)
return err;
}
static void do_vdc_request(struct request_queue *rq)
static blk_status_t vdc_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
struct request *req;
struct vdc_port *port = hctx->queue->queuedata;
struct vio_dring_state *dr;
unsigned long flags;
while ((req = blk_peek_request(rq)) != NULL) {
struct vdc_port *port;
struct vio_dring_state *dr;
dr = &port->vio.drings[VIO_DRIVER_TX_RING];
port = req->rq_disk->private_data;
dr = &port->vio.drings[VIO_DRIVER_TX_RING];
if (unlikely(vdc_tx_dring_avail(dr) < 1))
goto wait;
blk_mq_start_request(bd->rq);
blk_start_request(req);
spin_lock_irqsave(&port->vio.lock, flags);
if (__send_request(req) < 0) {
blk_requeue_request(rq, req);
wait:
/* Avoid pointless unplugs. */
blk_stop_queue(rq);
break;
}
/*
* Doing drain, just end the request in error
*/
if (unlikely(port->drain)) {
spin_unlock_irqrestore(&port->vio.lock, flags);
return BLK_STS_IOERR;
}
if (unlikely(vdc_tx_dring_avail(dr) < 1)) {
spin_unlock_irqrestore(&port->vio.lock, flags);
blk_mq_stop_hw_queue(hctx);
return BLK_STS_DEV_RESOURCE;
}
if (__send_request(bd->rq) < 0) {
spin_unlock_irqrestore(&port->vio.lock, flags);
return BLK_STS_IOERR;
}
spin_unlock_irqrestore(&port->vio.lock, flags);
return BLK_STS_OK;
}
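
vdc_queue_rq() handles a full TX descriptor ring with the usual blk-mq backpressure pair: stop the hardware queue and return BLK_STS_DEV_RESOURCE, then restart from the completion side once the ring drains (vdc_blk_queue_start() above calls blk_mq_start_hw_queues() when at least half the ring is free). The pattern in isolation, as a fragment built from the calls used in this driver:

        /* ->queue_rq(): no descriptor available, park the queue */
        if (unlikely(vdc_tx_dring_avail(dr) < 1)) {
                blk_mq_stop_hw_queue(hctx);
                return BLK_STS_DEV_RESOURCE;    /* core retries later */
        }

        /* completion path: ring half empty again, let IO flow */
        if (vdc_tx_dring_avail(dr) * 100 / VDC_TX_RING_SIZE >= 50)
                blk_mq_start_hw_queues(port->disk->queue);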
static int generic_request(struct vdc_port *port, u8 op, void *buf, int len)
@ -759,6 +775,31 @@ static void vdc_port_down(struct vdc_port *port)
vio_ldc_free(&port->vio);
}
static const struct blk_mq_ops vdc_mq_ops = {
.queue_rq = vdc_queue_rq,
};
static void cleanup_queue(struct request_queue *q)
{
struct vdc_port *port = q->queuedata;
blk_cleanup_queue(q);
blk_mq_free_tag_set(&port->tag_set);
}
static struct request_queue *init_queue(struct vdc_port *port)
{
struct request_queue *q;
q = blk_mq_init_sq_queue(&port->tag_set, &vdc_mq_ops, VDC_TX_RING_SIZE,
BLK_MQ_F_SHOULD_MERGE);
if (IS_ERR(q))
return q;
q->queuedata = port;
return q;
}
static int probe_disk(struct vdc_port *port)
{
struct request_queue *q;
@ -796,17 +837,17 @@ static int probe_disk(struct vdc_port *port)
(u64)geom.num_sec);
}
q = blk_init_queue(do_vdc_request, &port->vio.lock);
if (!q) {
q = init_queue(port);
if (IS_ERR(q)) {
printk(KERN_ERR PFX "%s: Could not allocate queue.\n",
port->vio.name);
return -ENOMEM;
return PTR_ERR(q);
}
g = alloc_disk(1 << PARTITION_SHIFT);
if (!g) {
printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
port->vio.name);
blk_cleanup_queue(q);
cleanup_queue(q);
return -ENOMEM;
}
@ -981,7 +1022,7 @@ static int vdc_port_probe(struct vio_dev *vdev, const struct vio_device_id *id)
*/
ldc_timeout = mdesc_get_property(hp, vdev->mp, "vdc-timeout", NULL);
port->ldc_timeout = ldc_timeout ? *ldc_timeout : 0;
timer_setup(&port->ldc_reset_timer, vdc_ldc_reset_timer, 0);
INIT_DELAYED_WORK(&port->ldc_reset_timer_work, vdc_ldc_reset_timer_work);
INIT_WORK(&port->ldc_reset_work, vdc_ldc_reset_work);
err = vio_driver_init(&port->vio, vdev, VDEV_DISK,
@ -1034,18 +1075,14 @@ static int vdc_port_remove(struct vio_dev *vdev)
struct vdc_port *port = dev_get_drvdata(&vdev->dev);
if (port) {
unsigned long flags;
spin_lock_irqsave(&port->vio.lock, flags);
blk_stop_queue(port->disk->queue);
spin_unlock_irqrestore(&port->vio.lock, flags);
blk_mq_stop_hw_queues(port->disk->queue);
flush_work(&port->ldc_reset_work);
del_timer_sync(&port->ldc_reset_timer);
cancel_delayed_work_sync(&port->ldc_reset_timer_work);
del_timer_sync(&port->vio.timer);
del_gendisk(port->disk);
blk_cleanup_queue(port->disk->queue);
cleanup_queue(port->disk->queue);
put_disk(port->disk);
port->disk = NULL;
@ -1080,32 +1117,46 @@ static void vdc_requeue_inflight(struct vdc_port *port)
}
rqe->req = NULL;
blk_requeue_request(port->disk->queue, req);
blk_mq_requeue_request(req, false);
}
}
static void vdc_queue_drain(struct vdc_port *port)
{
struct request *req;
struct request_queue *q = port->disk->queue;
while ((req = blk_fetch_request(port->disk->queue)) != NULL)
__blk_end_request_all(req, BLK_STS_IOERR);
/*
* Mark the queue as draining, then freeze/quiesce to ensure
* that all existing requests are seen in ->queue_rq() and killed
*/
port->drain = 1;
spin_unlock_irq(&port->vio.lock);
blk_mq_freeze_queue(q);
blk_mq_quiesce_queue(q);
spin_lock_irq(&port->vio.lock);
port->drain = 0;
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
}
static void vdc_ldc_reset_timer(struct timer_list *t)
static void vdc_ldc_reset_timer_work(struct work_struct *work)
{
struct vdc_port *port = from_timer(port, t, ldc_reset_timer);
struct vio_driver_state *vio = &port->vio;
unsigned long flags;
struct vdc_port *port;
struct vio_driver_state *vio;
spin_lock_irqsave(&vio->lock, flags);
port = container_of(work, struct vdc_port, ldc_reset_timer_work.work);
vio = &port->vio;
spin_lock_irq(&vio->lock);
if (!(port->vio.hs_state & VIO_HS_COMPLETE)) {
pr_warn(PFX "%s ldc down %llu seconds, draining queue\n",
port->disk_name, port->ldc_timeout);
vdc_queue_drain(port);
vdc_blk_queue_start(port);
}
spin_unlock_irqrestore(&vio->lock, flags);
spin_unlock_irq(&vio->lock);
}
static void vdc_ldc_reset_work(struct work_struct *work)
@ -1129,7 +1180,7 @@ static void vdc_ldc_reset(struct vdc_port *port)
assert_spin_locked(&port->vio.lock);
pr_warn(PFX "%s ldc link reset\n", port->disk_name);
blk_stop_queue(port->disk->queue);
blk_mq_stop_hw_queues(port->disk->queue);
vdc_requeue_inflight(port);
vdc_port_down(port);
@ -1146,7 +1197,7 @@ static void vdc_ldc_reset(struct vdc_port *port)
}
if (port->ldc_timeout)
mod_timer(&port->ldc_reset_timer,
mod_delayed_work(system_wq, &port->ldc_reset_timer_work,
round_jiffies(jiffies + HZ * port->ldc_timeout));
mod_timer(&port->vio.timer, round_jiffies(jiffies + HZ));
return;


@ -243,7 +243,6 @@ struct carm_port {
unsigned int port_no;
struct gendisk *disk;
struct carm_host *host;
struct blk_mq_tag_set tag_set;
/* attached device characteristics */
u64 capacity;
@ -254,13 +253,10 @@ struct carm_port {
};
struct carm_request {
unsigned int tag;
int n_elem;
unsigned int msg_type;
unsigned int msg_subtype;
unsigned int msg_bucket;
struct request *rq;
struct carm_port *port;
struct scatterlist sg[CARM_MAX_REQ_SG];
};
@ -291,9 +287,6 @@ struct carm_host {
unsigned int wait_q_cons;
struct request_queue *wait_q[CARM_MAX_WAIT_Q];
unsigned int n_msgs;
u64 msg_alloc;
struct carm_request req[CARM_MAX_REQ];
void *msg_base;
dma_addr_t msg_dma;
@ -478,10 +471,10 @@ static inline dma_addr_t carm_ref_msg_dma(struct carm_host *host,
}
static int carm_send_msg(struct carm_host *host,
struct carm_request *crq)
struct carm_request *crq, unsigned tag)
{
void __iomem *mmio = host->mmio;
u32 msg = (u32) carm_ref_msg_dma(host, crq->tag);
u32 msg = (u32) carm_ref_msg_dma(host, tag);
u32 cm_bucket = crq->msg_bucket;
u32 tmp;
int rc = 0;
@ -506,99 +499,24 @@ static int carm_send_msg(struct carm_host *host,
return rc;
}
static struct carm_request *carm_get_request(struct carm_host *host)
{
unsigned int i;
/* obey global hardware limit on S/G entries */
if (host->hw_sg_used >= (CARM_MAX_HOST_SG - CARM_MAX_REQ_SG))
return NULL;
for (i = 0; i < max_queue; i++)
if ((host->msg_alloc & (1ULL << i)) == 0) {
struct carm_request *crq = &host->req[i];
crq->port = NULL;
crq->n_elem = 0;
host->msg_alloc |= (1ULL << i);
host->n_msgs++;
assert(host->n_msgs <= CARM_MAX_REQ);
sg_init_table(crq->sg, CARM_MAX_REQ_SG);
return crq;
}
DPRINTK("no request available, returning NULL\n");
return NULL;
}
static int carm_put_request(struct carm_host *host, struct carm_request *crq)
{
assert(crq->tag < max_queue);
if (unlikely((host->msg_alloc & (1ULL << crq->tag)) == 0))
return -EINVAL; /* tried to clear a tag that was not active */
assert(host->hw_sg_used >= crq->n_elem);
host->msg_alloc &= ~(1ULL << crq->tag);
host->hw_sg_used -= crq->n_elem;
host->n_msgs--;
return 0;
}
static struct carm_request *carm_get_special(struct carm_host *host)
{
unsigned long flags;
struct carm_request *crq = NULL;
struct request *rq;
int tries = 5000;
while (tries-- > 0) {
spin_lock_irqsave(&host->lock, flags);
crq = carm_get_request(host);
spin_unlock_irqrestore(&host->lock, flags);
if (crq)
break;
msleep(10);
}
if (!crq)
return NULL;
rq = blk_get_request(host->oob_q, REQ_OP_DRV_OUT, 0);
if (IS_ERR(rq)) {
spin_lock_irqsave(&host->lock, flags);
carm_put_request(host, crq);
spin_unlock_irqrestore(&host->lock, flags);
return NULL;
}
crq->rq = rq;
return crq;
}
static int carm_array_info (struct carm_host *host, unsigned int array_idx)
{
struct carm_msg_ioctl *ioc;
unsigned int idx;
u32 msg_data;
dma_addr_t msg_dma;
struct carm_request *crq;
struct request *rq;
int rc;
crq = carm_get_special(host);
if (!crq) {
rq = blk_mq_alloc_request(host->oob_q, REQ_OP_DRV_OUT, 0);
if (IS_ERR(rq)) {
rc = -ENOMEM;
goto err_out;
}
crq = blk_mq_rq_to_pdu(rq);
idx = crq->tag;
ioc = carm_ref_msg(host, idx);
msg_dma = carm_ref_msg_dma(host, idx);
ioc = carm_ref_msg(host, rq->tag);
msg_dma = carm_ref_msg_dma(host, rq->tag);
msg_data = (u32) (msg_dma + sizeof(struct carm_array_info));
crq->msg_type = CARM_MSG_ARRAY;
@ -612,7 +530,7 @@ static int carm_array_info (struct carm_host *host, unsigned int array_idx)
ioc->type = CARM_MSG_ARRAY;
ioc->subtype = CARM_ARRAY_INFO;
ioc->array_id = (u8) array_idx;
ioc->handle = cpu_to_le32(TAG_ENCODE(idx));
ioc->handle = cpu_to_le32(TAG_ENCODE(rq->tag));
ioc->data_addr = cpu_to_le32(msg_data);
spin_lock_irq(&host->lock);
@ -620,9 +538,8 @@ static int carm_array_info (struct carm_host *host, unsigned int array_idx)
host->state == HST_DEV_SCAN);
spin_unlock_irq(&host->lock);
DPRINTK("blk_execute_rq_nowait, tag == %u\n", idx);
crq->rq->special = crq;
blk_execute_rq_nowait(host->oob_q, NULL, crq->rq, true, NULL);
DPRINTK("blk_execute_rq_nowait, tag == %u\n", rq->tag);
blk_execute_rq_nowait(host->oob_q, NULL, rq, true, NULL);
return 0;
@ -637,21 +554,21 @@ typedef unsigned int (*carm_sspc_t)(struct carm_host *, unsigned int, void *);
static int carm_send_special (struct carm_host *host, carm_sspc_t func)
{
struct request *rq;
struct carm_request *crq;
struct carm_msg_ioctl *ioc;
void *mem;
unsigned int idx, msg_size;
unsigned int msg_size;
int rc;
crq = carm_get_special(host);
if (!crq)
rq = blk_mq_alloc_request(host->oob_q, REQ_OP_DRV_OUT, 0);
if (IS_ERR(rq))
return -ENOMEM;
crq = blk_mq_rq_to_pdu(rq);
idx = crq->tag;
mem = carm_ref_msg(host, rq->tag);
mem = carm_ref_msg(host, idx);
msg_size = func(host, idx, mem);
msg_size = func(host, rq->tag, mem);
ioc = mem;
crq->msg_type = ioc->type;
@ -660,9 +577,8 @@ static int carm_send_special (struct carm_host *host, carm_sspc_t func)
BUG_ON(rc < 0);
crq->msg_bucket = (u32) rc;
DPRINTK("blk_execute_rq_nowait, tag == %u\n", idx);
crq->rq->special = crq;
blk_execute_rq_nowait(host->oob_q, NULL, crq->rq, true, NULL);
DPRINTK("blk_execute_rq_nowait, tag == %u\n", rq->tag);
blk_execute_rq_nowait(host->oob_q, NULL, rq, true, NULL);
return 0;
}
@ -744,19 +660,6 @@ static unsigned int carm_fill_get_fw_ver(struct carm_host *host,
sizeof(struct carm_fw_ver);
}
static inline void carm_end_request_queued(struct carm_host *host,
struct carm_request *crq,
blk_status_t error)
{
struct request *req = crq->rq;
int rc;
blk_mq_end_request(req, error);
rc = carm_put_request(host, crq);
assert(rc == 0);
}
static inline void carm_push_q (struct carm_host *host, struct request_queue *q)
{
unsigned int idx = host->wait_q_prod % CARM_MAX_WAIT_Q;
@ -791,101 +694,50 @@ static inline void carm_round_robin(struct carm_host *host)
}
}
static inline void carm_end_rq(struct carm_host *host, struct carm_request *crq,
blk_status_t error)
static inline enum dma_data_direction carm_rq_dir(struct request *rq)
{
carm_end_request_queued(host, crq, error);
if (max_queue == 1)
carm_round_robin(host);
else if ((host->n_msgs <= CARM_MSG_LOW_WATER) &&
(host->hw_sg_used <= CARM_SG_LOW_WATER)) {
carm_round_robin(host);
}
}
static blk_status_t carm_oob_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
struct request_queue *q = hctx->queue;
struct carm_host *host = q->queuedata;
struct carm_request *crq;
int rc;
blk_mq_start_request(bd->rq);
spin_lock_irq(&host->lock);
crq = bd->rq->special;
assert(crq != NULL);
assert(crq->rq == bd->rq);
crq->n_elem = 0;
DPRINTK("send req\n");
rc = carm_send_msg(host, crq);
if (rc) {
carm_push_q(host, q);
spin_unlock_irq(&host->lock);
return BLK_STS_DEV_RESOURCE;
}
spin_unlock_irq(&host->lock);
return BLK_STS_OK;
return op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
}
static blk_status_t carm_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
struct request_queue *q = hctx->queue;
struct request *rq = bd->rq;
struct carm_port *port = q->queuedata;
struct carm_host *host = port->host;
struct carm_request *crq = blk_mq_rq_to_pdu(rq);
struct carm_msg_rw *msg;
struct carm_request *crq;
struct request *rq = bd->rq;
struct scatterlist *sg;
int writing = 0, pci_dir, i, n_elem, rc;
u32 tmp;
int i, n_elem = 0, rc;
unsigned int msg_size;
u32 tmp;
crq->n_elem = 0;
sg_init_table(crq->sg, CARM_MAX_REQ_SG);
blk_mq_start_request(rq);
spin_lock_irq(&host->lock);
crq = carm_get_request(host);
if (!crq) {
carm_push_q(host, q);
spin_unlock_irq(&host->lock);
return BLK_STS_DEV_RESOURCE;
}
crq->rq = rq;
if (rq_data_dir(rq) == WRITE) {
writing = 1;
pci_dir = DMA_TO_DEVICE;
} else {
pci_dir = DMA_FROM_DEVICE;
}
if (req_op(rq) == REQ_OP_DRV_OUT)
goto send_msg;
/* get scatterlist from block layer */
sg = &crq->sg[0];
n_elem = blk_rq_map_sg(q, rq, sg);
if (n_elem <= 0) {
/* request with no s/g entries? */
carm_end_rq(host, crq, BLK_STS_IOERR);
spin_unlock_irq(&host->lock);
return BLK_STS_IOERR;
}
if (n_elem <= 0)
goto out_ioerr;
/* map scatterlist to PCI bus addresses */
n_elem = dma_map_sg(&host->pdev->dev, sg, n_elem, pci_dir);
if (n_elem <= 0) {
/* request with no s/g entries? */
carm_end_rq(host, crq, BLK_STS_IOERR);
spin_unlock_irq(&host->lock);
return BLK_STS_IOERR;
}
n_elem = dma_map_sg(&host->pdev->dev, sg, n_elem, carm_rq_dir(rq));
if (n_elem <= 0)
goto out_ioerr;
/* obey global hardware limit on S/G entries */
if (host->hw_sg_used >= CARM_MAX_HOST_SG - n_elem)
goto out_resource;
crq->n_elem = n_elem;
crq->port = port;
host->hw_sg_used += n_elem;
/*
@ -893,9 +745,9 @@ static blk_status_t carm_queue_rq(struct blk_mq_hw_ctx *hctx,
*/
VPRINTK("build msg\n");
msg = (struct carm_msg_rw *) carm_ref_msg(host, crq->tag);
msg = (struct carm_msg_rw *) carm_ref_msg(host, rq->tag);
if (writing) {
if (rq_data_dir(rq) == WRITE) {
msg->type = CARM_MSG_WRITE;
crq->msg_type = CARM_MSG_WRITE;
} else {
@ -906,7 +758,7 @@ static blk_status_t carm_queue_rq(struct blk_mq_hw_ctx *hctx,
msg->id = port->port_no;
msg->sg_count = n_elem;
msg->sg_type = SGT_32BIT;
msg->handle = cpu_to_le32(TAG_ENCODE(crq->tag));
msg->handle = cpu_to_le32(TAG_ENCODE(rq->tag));
msg->lba = cpu_to_le32(blk_rq_pos(rq) & 0xffffffff);
tmp = (blk_rq_pos(rq) >> 16) >> 16;
msg->lba_high = cpu_to_le16( (u16) tmp );
@ -923,22 +775,28 @@ static blk_status_t carm_queue_rq(struct blk_mq_hw_ctx *hctx,
rc = carm_lookup_bucket(msg_size);
BUG_ON(rc < 0);
crq->msg_bucket = (u32) rc;
send_msg:
/*
* queue read/write message to hardware
*/
VPRINTK("send msg, tag == %u\n", crq->tag);
rc = carm_send_msg(host, crq);
VPRINTK("send msg, tag == %u\n", rq->tag);
rc = carm_send_msg(host, crq, rq->tag);
if (rc) {
carm_put_request(host, crq);
carm_push_q(host, q);
spin_unlock_irq(&host->lock);
return BLK_STS_DEV_RESOURCE;
host->hw_sg_used -= n_elem;
goto out_resource;
}
spin_unlock_irq(&host->lock);
return BLK_STS_OK;
out_resource:
dma_unmap_sg(&host->pdev->dev, &crq->sg[0], n_elem, carm_rq_dir(rq));
carm_push_q(host, q);
spin_unlock_irq(&host->lock);
return BLK_STS_DEV_RESOURCE;
out_ioerr:
carm_round_robin(host);
spin_unlock_irq(&host->lock);
return BLK_STS_IOERR;
}
static void carm_handle_array_info(struct carm_host *host,
@ -954,8 +812,6 @@ static void carm_handle_array_info(struct carm_host *host,
DPRINTK("ENTER\n");
carm_end_rq(host, crq, error);
if (error)
goto out;
if (le32_to_cpu(desc->array_status) & ARRAY_NO_EXIST)
@ -1011,8 +867,6 @@ static void carm_handle_scan_chan(struct carm_host *host,
DPRINTK("ENTER\n");
carm_end_rq(host, crq, error);
if (error) {
new_state = HST_ERROR;
goto out;
@ -1040,8 +894,6 @@ static void carm_handle_generic(struct carm_host *host,
{
DPRINTK("ENTER\n");
carm_end_rq(host, crq, error);
assert(host->state == cur_state);
if (error)
host->state = HST_ERROR;
@ -1050,28 +902,12 @@ static void carm_handle_generic(struct carm_host *host,
schedule_work(&host->fsm_task);
}
static inline void carm_handle_rw(struct carm_host *host,
struct carm_request *crq, blk_status_t error)
{
int pci_dir;
VPRINTK("ENTER\n");
if (rq_data_dir(crq->rq) == WRITE)
pci_dir = DMA_TO_DEVICE;
else
pci_dir = DMA_FROM_DEVICE;
dma_unmap_sg(&host->pdev->dev, &crq->sg[0], crq->n_elem, pci_dir);
carm_end_rq(host, crq, error);
}
static inline void carm_handle_resp(struct carm_host *host,
__le32 ret_handle_le, u32 status)
{
u32 handle = le32_to_cpu(ret_handle_le);
unsigned int msg_idx;
struct request *rq;
struct carm_request *crq;
blk_status_t error = (status == RMSG_OK) ? 0 : BLK_STS_IOERR;
u8 *mem;
@ -1087,13 +923,15 @@ static inline void carm_handle_resp(struct carm_host *host,
msg_idx = TAG_DECODE(handle);
VPRINTK("tag == %u\n", msg_idx);
crq = &host->req[msg_idx];
rq = blk_mq_tag_to_rq(host->tag_set.tags[0], msg_idx);
crq = blk_mq_rq_to_pdu(rq);
/* fast path */
if (likely(crq->msg_type == CARM_MSG_READ ||
crq->msg_type == CARM_MSG_WRITE)) {
carm_handle_rw(host, crq, error);
return;
dma_unmap_sg(&host->pdev->dev, &crq->sg[0], crq->n_elem,
carm_rq_dir(rq));
goto done;
}
mem = carm_ref_msg(host, msg_idx);
@ -1103,7 +941,7 @@ static inline void carm_handle_resp(struct carm_host *host,
switch (crq->msg_subtype) {
case CARM_IOC_SCAN_CHAN:
carm_handle_scan_chan(host, crq, mem, error);
break;
goto done;
default:
/* unknown / invalid response */
goto err_out;
@ -1116,11 +954,11 @@ static inline void carm_handle_resp(struct carm_host *host,
case MISC_ALLOC_MEM:
carm_handle_generic(host, crq, error,
HST_ALLOC_BUF, HST_SYNC_TIME);
break;
goto done;
case MISC_SET_TIME:
carm_handle_generic(host, crq, error,
HST_SYNC_TIME, HST_GET_FW_VER);
break;
goto done;
case MISC_GET_FW_VER: {
struct carm_fw_ver *ver = (struct carm_fw_ver *)
(mem + sizeof(struct carm_msg_get_fw_ver));
@ -1130,7 +968,7 @@ static inline void carm_handle_resp(struct carm_host *host,
}
carm_handle_generic(host, crq, error,
HST_GET_FW_VER, HST_PORT_SCAN);
break;
goto done;
}
default:
/* unknown / invalid response */
@ -1161,7 +999,13 @@ static inline void carm_handle_resp(struct carm_host *host,
err_out:
printk(KERN_WARNING DRV_NAME "(%s): BUG: unhandled message type %d/%d\n",
pci_name(host->pdev), crq->msg_type, crq->msg_subtype);
carm_end_rq(host, crq, BLK_STS_IOERR);
error = BLK_STS_IOERR;
done:
host->hw_sg_used -= crq->n_elem;
blk_mq_end_request(blk_mq_rq_from_pdu(crq), error);
if (host->hw_sg_used <= CARM_SG_LOW_WATER)
carm_round_robin(host);
}
static inline void carm_handle_responses(struct carm_host *host)
@ -1491,78 +1335,56 @@ static int carm_init_host(struct carm_host *host)
return 0;
}
static const struct blk_mq_ops carm_oob_mq_ops = {
.queue_rq = carm_oob_queue_rq,
};
static const struct blk_mq_ops carm_mq_ops = {
.queue_rq = carm_queue_rq,
};
static int carm_init_disks(struct carm_host *host)
static int carm_init_disk(struct carm_host *host, unsigned int port_no)
{
unsigned int i;
int rc = 0;
struct carm_port *port = &host->port[port_no];
struct gendisk *disk;
struct request_queue *q;
for (i = 0; i < CARM_MAX_PORTS; i++) {
struct gendisk *disk;
struct request_queue *q;
struct carm_port *port;
port->host = host;
port->port_no = port_no;
port = &host->port[i];
port->host = host;
port->port_no = i;
disk = alloc_disk(CARM_MINORS_PER_MAJOR);
if (!disk)
return -ENOMEM;
disk = alloc_disk(CARM_MINORS_PER_MAJOR);
if (!disk) {
rc = -ENOMEM;
break;
}
port->disk = disk;
sprintf(disk->disk_name, DRV_NAME "/%u",
(unsigned int)host->id * CARM_MAX_PORTS + port_no);
disk->major = host->major;
disk->first_minor = port_no * CARM_MINORS_PER_MAJOR;
disk->fops = &carm_bd_ops;
disk->private_data = port;
port->disk = disk;
sprintf(disk->disk_name, DRV_NAME "/%u",
(unsigned int) (host->id * CARM_MAX_PORTS) + i);
disk->major = host->major;
disk->first_minor = i * CARM_MINORS_PER_MAJOR;
disk->fops = &carm_bd_ops;
disk->private_data = port;
q = blk_mq_init_queue(&host->tag_set);
if (IS_ERR(q))
return PTR_ERR(q);
q = blk_mq_init_sq_queue(&port->tag_set, &carm_mq_ops,
max_queue, BLK_MQ_F_SHOULD_MERGE);
if (IS_ERR(q)) {
rc = PTR_ERR(q);
break;
}
disk->queue = q;
blk_queue_max_segments(q, CARM_MAX_REQ_SG);
blk_queue_segment_boundary(q, CARM_SG_BOUNDARY);
blk_queue_max_segments(q, CARM_MAX_REQ_SG);
blk_queue_segment_boundary(q, CARM_SG_BOUNDARY);
q->queuedata = port;
}
return rc;
q->queuedata = port;
disk->queue = q;
return 0;
}
static void carm_free_disks(struct carm_host *host)
static void carm_free_disk(struct carm_host *host, unsigned int port_no)
{
unsigned int i;
struct carm_port *port = &host->port[port_no];
struct gendisk *disk = port->disk;
for (i = 0; i < CARM_MAX_PORTS; i++) {
struct carm_port *port = &host->port[i];
struct gendisk *disk = port->disk;
if (!disk)
return;
if (disk) {
struct request_queue *q = disk->queue;
if (disk->flags & GENHD_FL_UP)
del_gendisk(disk);
if (q) {
blk_mq_free_tag_set(&port->tag_set);
blk_cleanup_queue(q);
}
put_disk(disk);
}
}
if (disk->flags & GENHD_FL_UP)
del_gendisk(disk);
if (disk->queue)
blk_cleanup_queue(disk->queue);
put_disk(disk);
}
static int carm_init_shm(struct carm_host *host)
@ -1618,9 +1440,6 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
INIT_WORK(&host->fsm_task, carm_fsm_task);
init_completion(&host->probe_comp);
for (i = 0; i < ARRAY_SIZE(host->req); i++)
host->req[i].tag = i;
host->mmio = ioremap(pci_resource_start(pdev, 0),
pci_resource_len(pdev, 0));
if (!host->mmio) {
@ -1637,14 +1456,26 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_out_iounmap;
}
q = blk_mq_init_sq_queue(&host->tag_set, &carm_oob_mq_ops, 1,
BLK_MQ_F_NO_SCHED);
memset(&host->tag_set, 0, sizeof(host->tag_set));
host->tag_set.ops = &carm_mq_ops;
host->tag_set.cmd_size = sizeof(struct carm_request);
host->tag_set.nr_hw_queues = 1;
host->tag_set.nr_maps = 1;
host->tag_set.queue_depth = max_queue;
host->tag_set.numa_node = NUMA_NO_NODE;
host->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
rc = blk_mq_alloc_tag_set(&host->tag_set);
if (rc)
goto err_out_dma_free;
q = blk_mq_init_queue(&host->tag_set);
if (IS_ERR(q)) {
printk(KERN_ERR DRV_NAME "(%s): OOB queue alloc failure\n",
pci_name(pdev));
rc = PTR_ERR(q);
blk_mq_free_tag_set(&host->tag_set);
goto err_out_dma_free;
}
host->oob_q = q;
q->queuedata = host;
@ -1667,9 +1498,11 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
if (host->flags & FL_DYN_MAJOR)
host->major = rc;
rc = carm_init_disks(host);
if (rc)
goto err_out_blkdev_disks;
for (i = 0; i < CARM_MAX_PORTS; i++) {
rc = carm_init_disk(host, i);
if (rc)
goto err_out_blkdev_disks;
}
pci_set_master(pdev);
@ -1699,7 +1532,8 @@ static int carm_init_one (struct pci_dev *pdev, const struct pci_device_id *ent)
err_out_free_irq:
free_irq(pdev->irq, host);
err_out_blkdev_disks:
carm_free_disks(host);
for (i = 0; i < CARM_MAX_PORTS; i++)
carm_free_disk(host, i);
unregister_blkdev(host->major, host->name);
err_out_free_majors:
if (host->major == 160)
@ -1724,6 +1558,7 @@ err_out:
static void carm_remove_one (struct pci_dev *pdev)
{
struct carm_host *host = pci_get_drvdata(pdev);
unsigned int i;
if (!host) {
printk(KERN_ERR PFX "BUG: no host data for PCI(%s)\n",
@ -1732,7 +1567,8 @@ static void carm_remove_one (struct pci_dev *pdev)
}
free_irq(pdev->irq, host);
carm_free_disks(host);
for (i = 0; i < CARM_MAX_PORTS; i++)
carm_free_disk(host, i);
unregister_blkdev(host->major, host->name);
if (host->major == 160)
clear_bit(0, &carm_major_alloc);
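
The sx8/carm hunks above are one instance of the conversion pattern used throughout this pull: per-request driver state moves out of rq->special into a blk-mq PDU sized via cmd_size, the driver-private tag array gives way to rq->tag, and the queue is created from a tag set. Below is a minimal sketch of that pattern, assuming nothing beyond the core blk-mq API; it is not the sx8 code itself, and my_request/my_queue_rq are placeholder names.

#include <linux/blk-mq.h>
#include <linux/numa.h>
#include <linux/string.h>

struct my_request {			/* hypothetical per-request data, lives in the PDU */
	unsigned int n_elem;
};

static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
				const struct blk_mq_queue_data *bd)
{
	struct request *rq = bd->rq;
	struct my_request *mrq = blk_mq_rq_to_pdu(rq);	/* replaces rq->special */

	mrq->n_elem = 0;
	blk_mq_start_request(rq);
	/* ... issue to hardware, keyed by rq->tag instead of a driver tag ... */
	blk_mq_end_request(rq, BLK_STS_OK);
	return BLK_STS_OK;
}

static const struct blk_mq_ops my_mq_ops = {
	.queue_rq	= my_queue_rq,
};

static int my_init_tag_set(struct blk_mq_tag_set *set)
{
	memset(set, 0, sizeof(*set));
	set->ops = &my_mq_ops;
	set->nr_hw_queues = 1;
	set->queue_depth = 30;
	set->cmd_size = sizeof(struct my_request);	/* PDU allocated with every request */
	set->numa_node = NUMA_NO_NODE;
	set->flags = BLK_MQ_F_SHOULD_MERGE;
	return blk_mq_alloc_tag_set(set);
}

Once the tag set is allocated, blk_mq_init_queue(set) produces the request queue, as the carm_init_one() and carm_init_disk() hunks above do with the shared host tag set.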


@ -888,8 +888,7 @@ static int mm_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
card->biotail = &card->bio;
spin_lock_init(&card->lock);
card->queue = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE,
&card->lock);
card->queue = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
if (!card->queue)
goto failed_alloc;


@ -214,6 +214,20 @@ static void virtblk_done(struct virtqueue *vq)
spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
}
static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
struct virtio_blk *vblk = hctx->queue->queuedata;
struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
bool kick;
spin_lock_irq(&vq->lock);
kick = virtqueue_kick_prepare(vq->vq);
spin_unlock_irq(&vq->lock);
if (kick)
virtqueue_notify(vq->vq);
}
static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
@ -624,7 +638,7 @@ static int virtblk_map_queues(struct blk_mq_tag_set *set)
{
struct virtio_blk *vblk = set->driver_data;
return blk_mq_virtio_map_queues(set, vblk->vdev, 0);
return blk_mq_virtio_map_queues(&set->map[0], vblk->vdev, 0);
}
#ifdef CONFIG_VIRTIO_BLK_SCSI
@ -638,6 +652,7 @@ static void virtblk_initialize_rq(struct request *req)
static const struct blk_mq_ops virtio_mq_ops = {
.queue_rq = virtio_queue_rq,
.commit_rqs = virtio_commit_rqs,
.complete = virtblk_request_done,
.init_request = virtblk_init_request,
#ifdef CONFIG_VIRTIO_BLK_SCSI
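
The new ->commit_rqs hook registered above exists for batched submission: a driver may skip its doorbell in ->queue_rq() while bd->last is false, and blk-mq then calls ->commit_rqs() if the batch ends without the last request being dispatched (for example on a busy return), so anything already placed on the ring still gets kicked. A rough sketch of that contract follows; sketch_notify_hw() is a hypothetical stand-in for the driver's doorbell write, not a virtio-blk function.

#include <linux/blk-mq.h>

/* hypothetical doorbell write; a real driver notifies its hardware here */
static void sketch_notify_hw(struct blk_mq_hw_ctx *hctx)
{
}

static blk_status_t sketch_queue_rq(struct blk_mq_hw_ctx *hctx,
				    const struct blk_mq_queue_data *bd)
{
	blk_mq_start_request(bd->rq);
	/* ... place bd->rq on the ring, complete it later from the IRQ path ... */
	if (bd->last)
		sketch_notify_hw(hctx);	/* kick hardware only at the end of a batch */
	return BLK_STS_OK;
}

/* invoked by blk-mq when a batch was queued but its final request never issued */
static void sketch_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
	sketch_notify_hw(hctx);
}

static const struct blk_mq_ops sketch_mq_ops = {
	.queue_rq	= sketch_queue_rq,
	.commit_rqs	= sketch_commit_rqs,
};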


@ -94,7 +94,7 @@ int ide_queue_pc_tail(ide_drive_t *drive, struct gendisk *disk,
rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, 0);
ide_req(rq)->type = ATA_PRIV_MISC;
rq->special = (char *)pc;
ide_req(rq)->special = pc;
if (buf && bufflen) {
error = blk_rq_map_kern(drive->queue, rq, buf, bufflen,
@ -172,8 +172,8 @@ EXPORT_SYMBOL_GPL(ide_create_request_sense_cmd);
void ide_prep_sense(ide_drive_t *drive, struct request *rq)
{
struct request_sense *sense = &drive->sense_data;
struct request *sense_rq = drive->sense_rq;
struct scsi_request *req = scsi_req(sense_rq);
struct request *sense_rq;
struct scsi_request *req;
unsigned int cmd_len, sense_len;
int err;
@ -196,9 +196,16 @@ void ide_prep_sense(ide_drive_t *drive, struct request *rq)
if (ata_sense_request(rq) || drive->sense_rq_armed)
return;
sense_rq = drive->sense_rq;
if (!sense_rq) {
sense_rq = blk_mq_alloc_request(drive->queue, REQ_OP_DRV_IN,
BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
drive->sense_rq = sense_rq;
}
req = scsi_req(sense_rq);
memset(sense, 0, sizeof(*sense));
blk_rq_init(rq->q, sense_rq);
scsi_req_init(req);
err = blk_rq_map_kern(drive->queue, sense_rq, sense, sense_len,
@ -207,6 +214,8 @@ void ide_prep_sense(ide_drive_t *drive, struct request *rq)
if (printk_ratelimit())
printk(KERN_WARNING PFX "%s: failed to map sense "
"buffer\n", drive->name);
blk_mq_free_request(sense_rq);
drive->sense_rq = NULL;
return;
}
@ -226,6 +235,8 @@ EXPORT_SYMBOL_GPL(ide_prep_sense);
int ide_queue_sense_rq(ide_drive_t *drive, void *special)
{
struct request *sense_rq = drive->sense_rq;
/* deferred failure from ide_prep_sense() */
if (!drive->sense_rq_armed) {
printk(KERN_WARNING PFX "%s: error queuing a sense request\n",
@ -233,12 +244,12 @@ int ide_queue_sense_rq(ide_drive_t *drive, void *special)
return -ENOMEM;
}
drive->sense_rq->special = special;
ide_req(sense_rq)->special = special;
drive->sense_rq_armed = false;
drive->hwif->rq = NULL;
elv_add_request(drive->queue, drive->sense_rq, ELEVATOR_INSERT_FRONT);
ide_insert_request_head(drive, sense_rq);
return 0;
}
EXPORT_SYMBOL_GPL(ide_queue_sense_rq);
@ -270,10 +281,8 @@ void ide_retry_pc(ide_drive_t *drive)
*/
drive->hwif->rq = NULL;
ide_requeue_and_plug(drive, failed_rq);
if (ide_queue_sense_rq(drive, pc)) {
blk_start_request(failed_rq);
if (ide_queue_sense_rq(drive, pc))
ide_complete_rq(drive, BLK_STS_IOERR, blk_rq_bytes(failed_rq));
}
}
EXPORT_SYMBOL_GPL(ide_retry_pc);
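
Worth noting alongside this hunk: drive->sense_rq is no longer a hand-rolled request but a real blk-mq request allocated on demand with BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT, and the matching reserved_tags = 1 in the ide-probe.c tag-set setup further down keeps that allocation from competing with normal I/O tags.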


@ -211,12 +211,12 @@ static void cdrom_analyze_sense_data(ide_drive_t *drive,
static void ide_cd_complete_failed_rq(ide_drive_t *drive, struct request *rq)
{
/*
* For ATA_PRIV_SENSE, "rq->special" points to the original
* For ATA_PRIV_SENSE, "ide_req(rq)->special" points to the original
* failed request. Also, the sense data should be read
* directly from rq which might be different from the original
* sense buffer if it got copied during mapping.
*/
struct request *failed = (struct request *)rq->special;
struct request *failed = ide_req(rq)->special;
void *sense = bio_data(rq->bio);
if (failed) {
@ -258,11 +258,22 @@ static int ide_cd_breathe(ide_drive_t *drive, struct request *rq)
/*
* take a breather
*/
blk_delay_queue(drive->queue, 1);
blk_mq_requeue_request(rq, false);
blk_mq_delay_kick_requeue_list(drive->queue, 1);
return 1;
}
}
static void ide_cd_free_sense(ide_drive_t *drive)
{
if (!drive->sense_rq)
return;
blk_mq_free_request(drive->sense_rq);
drive->sense_rq = NULL;
drive->sense_rq_armed = false;
}
/**
* Returns:
* 0: if the request should be continued.
@ -516,6 +527,82 @@ static bool ide_cd_error_cmd(ide_drive_t *drive, struct ide_cmd *cmd)
return false;
}
/* standard prep_rq that builds 10 byte cmds */
static bool ide_cdrom_prep_fs(struct request_queue *q, struct request *rq)
{
int hard_sect = queue_logical_block_size(q);
long block = (long)blk_rq_pos(rq) / (hard_sect >> 9);
unsigned long blocks = blk_rq_sectors(rq) / (hard_sect >> 9);
struct scsi_request *req = scsi_req(rq);
if (rq_data_dir(rq) == READ)
req->cmd[0] = GPCMD_READ_10;
else
req->cmd[0] = GPCMD_WRITE_10;
/*
* fill in lba
*/
req->cmd[2] = (block >> 24) & 0xff;
req->cmd[3] = (block >> 16) & 0xff;
req->cmd[4] = (block >> 8) & 0xff;
req->cmd[5] = block & 0xff;
/*
* and transfer length
*/
req->cmd[7] = (blocks >> 8) & 0xff;
req->cmd[8] = blocks & 0xff;
req->cmd_len = 10;
return true;
}
/*
* Most of the SCSI commands are supported directly by ATAPI devices.
* This transform handles the few exceptions.
*/
static bool ide_cdrom_prep_pc(struct request *rq)
{
u8 *c = scsi_req(rq)->cmd;
/* transform 6-byte read/write commands to the 10-byte version */
if (c[0] == READ_6 || c[0] == WRITE_6) {
c[8] = c[4];
c[5] = c[3];
c[4] = c[2];
c[3] = c[1] & 0x1f;
c[2] = 0;
c[1] &= 0xe0;
c[0] += (READ_10 - READ_6);
scsi_req(rq)->cmd_len = 10;
return true;
}
/*
* it's silly to pretend we understand 6-byte sense commands, just
* reject with ILLEGAL_REQUEST and the caller should take the
* appropriate action
*/
if (c[0] == MODE_SENSE || c[0] == MODE_SELECT) {
scsi_req(rq)->result = ILLEGAL_REQUEST;
return false;
}
return true;
}
static bool ide_cdrom_prep_rq(ide_drive_t *drive, struct request *rq)
{
if (!blk_rq_is_passthrough(rq)) {
scsi_req_init(scsi_req(rq));
return ide_cdrom_prep_fs(drive->queue, rq);
} else if (blk_rq_is_scsi(rq))
return ide_cdrom_prep_pc(rq);
return true;
}
static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
{
ide_hwif_t *hwif = drive->hwif;
@ -675,7 +762,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
out_end:
if (blk_rq_is_scsi(rq) && rc == 0) {
scsi_req(rq)->resid_len = 0;
blk_end_request_all(rq, BLK_STS_OK);
blk_mq_end_request(rq, BLK_STS_OK);
hwif->rq = NULL;
} else {
if (sense && uptodate)
@ -705,6 +792,8 @@ out_end:
if (sense && rc == 2)
ide_error(drive, "request sense failure", stat);
}
ide_cd_free_sense(drive);
return ide_stopped;
}
@ -729,7 +818,7 @@ static ide_startstop_t cdrom_start_rw(ide_drive_t *drive, struct request *rq)
* We may be retrying this request after an error. Fix up any
* weirdness which might be present in the request packet.
*/
q->prep_rq_fn(q, rq);
ide_cdrom_prep_rq(drive, rq);
}
/* fs requests *must* be hardware frame aligned */
@ -1323,82 +1412,6 @@ static int ide_cdrom_probe_capabilities(ide_drive_t *drive)
return nslots;
}
/* standard prep_rq_fn that builds 10 byte cmds */
static int ide_cdrom_prep_fs(struct request_queue *q, struct request *rq)
{
int hard_sect = queue_logical_block_size(q);
long block = (long)blk_rq_pos(rq) / (hard_sect >> 9);
unsigned long blocks = blk_rq_sectors(rq) / (hard_sect >> 9);
struct scsi_request *req = scsi_req(rq);
q->initialize_rq_fn(rq);
if (rq_data_dir(rq) == READ)
req->cmd[0] = GPCMD_READ_10;
else
req->cmd[0] = GPCMD_WRITE_10;
/*
* fill in lba
*/
req->cmd[2] = (block >> 24) & 0xff;
req->cmd[3] = (block >> 16) & 0xff;
req->cmd[4] = (block >> 8) & 0xff;
req->cmd[5] = block & 0xff;
/*
* and transfer length
*/
req->cmd[7] = (blocks >> 8) & 0xff;
req->cmd[8] = blocks & 0xff;
req->cmd_len = 10;
return BLKPREP_OK;
}
/*
* Most of the SCSI commands are supported directly by ATAPI devices.
* This transform handles the few exceptions.
*/
static int ide_cdrom_prep_pc(struct request *rq)
{
u8 *c = scsi_req(rq)->cmd;
/* transform 6-byte read/write commands to the 10-byte version */
if (c[0] == READ_6 || c[0] == WRITE_6) {
c[8] = c[4];
c[5] = c[3];
c[4] = c[2];
c[3] = c[1] & 0x1f;
c[2] = 0;
c[1] &= 0xe0;
c[0] += (READ_10 - READ_6);
scsi_req(rq)->cmd_len = 10;
return BLKPREP_OK;
}
/*
* it's silly to pretend we understand 6-byte sense commands, just
* reject with ILLEGAL_REQUEST and the caller should take the
* appropriate action
*/
if (c[0] == MODE_SENSE || c[0] == MODE_SELECT) {
scsi_req(rq)->result = ILLEGAL_REQUEST;
return BLKPREP_KILL;
}
return BLKPREP_OK;
}
static int ide_cdrom_prep_fn(struct request_queue *q, struct request *rq)
{
if (!blk_rq_is_passthrough(rq))
return ide_cdrom_prep_fs(q, rq);
else if (blk_rq_is_scsi(rq))
return ide_cdrom_prep_pc(rq);
return 0;
}
struct cd_list_entry {
const char *id_model;
const char *id_firmware;
@ -1508,7 +1521,7 @@ static int ide_cdrom_setup(ide_drive_t *drive)
ide_debug_log(IDE_DBG_PROBE, "enter");
blk_queue_prep_rq(q, ide_cdrom_prep_fn);
drive->prep_rq = ide_cdrom_prep_rq;
blk_queue_dma_alignment(q, 31);
blk_queue_update_dma_pad(q, 15);
@ -1569,7 +1582,7 @@ static void ide_cd_release(struct device *dev)
if (devinfo->handle == drive)
unregister_cdrom(devinfo);
drive->driver_data = NULL;
blk_queue_prep_rq(drive->queue, NULL);
drive->prep_rq = NULL;
g->private_data = NULL;
put_disk(g);
kfree(info);
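
As a worked example of the ide_cdrom_prep_fs() command build-up above: with a 2048-byte logical block size, hard_sect >> 9 is 4, so a request starting at 512-byte sector 1000 and spanning 32 sectors becomes block 250 and blocks 8; 250 (0xFA) is stored big-endian in cmd[2..5] as 00 00 00 FA, and the transfer length 8 goes into cmd[7..8] as 00 08.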


@ -171,7 +171,7 @@ int ide_devset_execute(ide_drive_t *drive, const struct ide_devset *setting,
scsi_req(rq)->cmd_len = 5;
scsi_req(rq)->cmd[0] = REQ_DEVSET_EXEC;
*(int *)&scsi_req(rq)->cmd[1] = arg;
rq->special = setting->set;
ide_req(rq)->special = setting->set;
blk_execute_rq(q, NULL, rq, 0);
ret = scsi_req(rq)->result;
@ -182,7 +182,7 @@ int ide_devset_execute(ide_drive_t *drive, const struct ide_devset *setting,
ide_startstop_t ide_do_devset(ide_drive_t *drive, struct request *rq)
{
int err, (*setfunc)(ide_drive_t *, int) = rq->special;
int err, (*setfunc)(ide_drive_t *, int) = ide_req(rq)->special;
err = setfunc(drive, *(int *)&scsi_req(rq)->cmd[1]);
if (err)


@ -427,16 +427,15 @@ static void ide_disk_unlock_native_capacity(ide_drive_t *drive)
drive->dev_flags |= IDE_DFLAG_NOHPA; /* disable HPA on resume */
}
static int idedisk_prep_fn(struct request_queue *q, struct request *rq)
static bool idedisk_prep_rq(ide_drive_t *drive, struct request *rq)
{
ide_drive_t *drive = q->queuedata;
struct ide_cmd *cmd;
if (req_op(rq) != REQ_OP_FLUSH)
return BLKPREP_OK;
return true;
if (rq->special) {
cmd = rq->special;
if (ide_req(rq)->special) {
cmd = ide_req(rq)->special;
memset(cmd, 0, sizeof(*cmd));
} else {
cmd = kzalloc(sizeof(*cmd), GFP_ATOMIC);
@ -456,10 +455,10 @@ static int idedisk_prep_fn(struct request_queue *q, struct request *rq)
rq->cmd_flags &= ~REQ_OP_MASK;
rq->cmd_flags |= REQ_OP_DRV_OUT;
ide_req(rq)->type = ATA_PRIV_TASKFILE;
rq->special = cmd;
ide_req(rq)->special = cmd;
cmd->rq = rq;
return BLKPREP_OK;
return true;
}
ide_devset_get(multcount, mult_count);
@ -548,7 +547,7 @@ static void update_flush(ide_drive_t *drive)
if (barrier) {
wc = true;
blk_queue_prep_rq(drive->queue, idedisk_prep_fn);
drive->prep_rq = idedisk_prep_rq;
}
}


@ -125,7 +125,7 @@ ide_startstop_t ide_error(ide_drive_t *drive, const char *msg, u8 stat)
/* retry only "normal" I/O: */
if (blk_rq_is_passthrough(rq)) {
if (ata_taskfile_request(rq)) {
struct ide_cmd *cmd = rq->special;
struct ide_cmd *cmd = ide_req(rq)->special;
if (cmd)
ide_complete_cmd(drive, cmd, stat, err);


@ -276,7 +276,7 @@ static ide_startstop_t ide_floppy_do_request(ide_drive_t *drive,
switch (ide_req(rq)->type) {
case ATA_PRIV_MISC:
case ATA_PRIV_SENSE:
pc = (struct ide_atapi_pc *)rq->special;
pc = (struct ide_atapi_pc *)ide_req(rq)->special;
break;
default:
BUG();


@ -67,7 +67,15 @@ int ide_end_rq(ide_drive_t *drive, struct request *rq, blk_status_t error,
ide_dma_on(drive);
}
return blk_end_request(rq, error, nr_bytes);
if (!blk_update_request(rq, error, nr_bytes)) {
if (rq == drive->sense_rq)
drive->sense_rq = NULL;
__blk_mq_end_request(rq, error);
return 0;
}
return 1;
}
EXPORT_SYMBOL_GPL(ide_end_rq);
@ -103,7 +111,7 @@ void ide_complete_cmd(ide_drive_t *drive, struct ide_cmd *cmd, u8 stat, u8 err)
}
if (rq && ata_taskfile_request(rq)) {
struct ide_cmd *orig_cmd = rq->special;
struct ide_cmd *orig_cmd = ide_req(rq)->special;
if (cmd->tf_flags & IDE_TFLAG_DYN)
kfree(orig_cmd);
@ -253,7 +261,7 @@ EXPORT_SYMBOL_GPL(ide_init_sg_cmd);
static ide_startstop_t execute_drive_cmd (ide_drive_t *drive,
struct request *rq)
{
struct ide_cmd *cmd = rq->special;
struct ide_cmd *cmd = ide_req(rq)->special;
if (cmd) {
if (cmd->protocol == ATA_PROT_PIO) {
@ -307,8 +315,6 @@ static ide_startstop_t start_request (ide_drive_t *drive, struct request *rq)
{
ide_startstop_t startstop;
BUG_ON(!(rq->rq_flags & RQF_STARTED));
#ifdef DEBUG
printk("%s: start_request: current=0x%08lx\n",
drive->hwif->name, (unsigned long) rq);
@ -320,6 +326,9 @@ static ide_startstop_t start_request (ide_drive_t *drive, struct request *rq)
goto kill_rq;
}
if (drive->prep_rq && !drive->prep_rq(drive, rq))
return ide_stopped;
if (ata_pm_request(rq))
ide_check_pm_state(drive, rq);
@ -343,7 +352,7 @@ static ide_startstop_t start_request (ide_drive_t *drive, struct request *rq)
if (ata_taskfile_request(rq))
return execute_drive_cmd(drive, rq);
else if (ata_pm_request(rq)) {
struct ide_pm_state *pm = rq->special;
struct ide_pm_state *pm = ide_req(rq)->special;
#ifdef DEBUG_PM
printk("%s: start_power_step(step: %d)\n",
drive->name, pm->pm_step);
@ -430,44 +439,42 @@ static inline void ide_unlock_host(struct ide_host *host)
}
}
static void __ide_requeue_and_plug(struct request_queue *q, struct request *rq)
{
if (rq)
blk_requeue_request(q, rq);
if (rq || blk_peek_request(q)) {
/* Use 3ms as that was the old plug delay */
blk_delay_queue(q, 3);
}
}
void ide_requeue_and_plug(ide_drive_t *drive, struct request *rq)
{
struct request_queue *q = drive->queue;
unsigned long flags;
spin_lock_irqsave(q->queue_lock, flags);
__ide_requeue_and_plug(q, rq);
spin_unlock_irqrestore(q->queue_lock, flags);
/* Use 3ms as that was the old plug delay */
if (rq) {
blk_mq_requeue_request(rq, false);
blk_mq_delay_kick_requeue_list(q, 3);
} else
blk_mq_delay_run_hw_queue(q->queue_hw_ctx[0], 3);
}
/*
* Issue a new request to a device.
*/
void do_ide_request(struct request_queue *q)
blk_status_t ide_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
ide_drive_t *drive = q->queuedata;
ide_drive_t *drive = hctx->queue->queuedata;
ide_hwif_t *hwif = drive->hwif;
struct ide_host *host = hwif->host;
struct request *rq = NULL;
struct request *rq = bd->rq;
ide_startstop_t startstop;
spin_unlock_irq(q->queue_lock);
if (!blk_rq_is_passthrough(rq) && !(rq->rq_flags & RQF_DONTPREP)) {
rq->rq_flags |= RQF_DONTPREP;
ide_req(rq)->special = NULL;
}
/* HLD do_request() callback might sleep, make sure it's okay */
might_sleep();
if (ide_lock_host(host, hwif))
goto plug_device_2;
return BLK_STS_DEV_RESOURCE;
blk_mq_start_request(rq);
spin_lock_irq(&hwif->lock);
@ -503,21 +510,16 @@ repeat:
hwif->cur_dev = drive;
drive->dev_flags &= ~(IDE_DFLAG_SLEEPING | IDE_DFLAG_PARKED);
spin_unlock_irq(&hwif->lock);
spin_lock_irq(q->queue_lock);
/*
* we know that the queue isn't empty, but this can happen
* if the q->prep_rq_fn() decides to kill a request
* if ->prep_rq() decides to kill a request
*/
if (!rq)
rq = blk_fetch_request(drive->queue);
spin_unlock_irq(q->queue_lock);
spin_lock_irq(&hwif->lock);
if (!rq) {
ide_unlock_port(hwif);
goto out;
rq = bd->rq;
if (!rq) {
ide_unlock_port(hwif);
goto out;
}
}
/*
@ -551,23 +553,24 @@ repeat:
if (startstop == ide_stopped) {
rq = hwif->rq;
hwif->rq = NULL;
goto repeat;
if (rq)
goto repeat;
ide_unlock_port(hwif);
goto out;
}
} else
goto plug_device;
} else {
plug_device:
spin_unlock_irq(&hwif->lock);
ide_unlock_host(host);
ide_requeue_and_plug(drive, rq);
return BLK_STS_OK;
}
out:
spin_unlock_irq(&hwif->lock);
if (rq == NULL)
ide_unlock_host(host);
spin_lock_irq(q->queue_lock);
return;
plug_device:
spin_unlock_irq(&hwif->lock);
ide_unlock_host(host);
plug_device_2:
spin_lock_irq(q->queue_lock);
__ide_requeue_and_plug(q, rq);
return BLK_STS_OK;
}
static int drive_is_ready(ide_drive_t *drive)
@ -887,3 +890,16 @@ void ide_pad_transfer(ide_drive_t *drive, int write, int len)
}
}
EXPORT_SYMBOL_GPL(ide_pad_transfer);
void ide_insert_request_head(ide_drive_t *drive, struct request *rq)
{
ide_hwif_t *hwif = drive->hwif;
unsigned long flags;
spin_lock_irqsave(&hwif->lock, flags);
list_add_tail(&rq->queuelist, &drive->rq_list);
spin_unlock_irqrestore(&hwif->lock, flags);
kblockd_schedule_work(&drive->rq_work);
}
EXPORT_SYMBOL_GPL(ide_insert_request_head);
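
With the legacy elevator gone, the former ELEVATOR_INSERT_FRONT users (the sense and un-park requests) now go through ide_insert_request_head() instead: the request is parked on drive->rq_list under hwif->lock, and the kblockd work item drive_rq_insert_work(), added in the ide-probe.c hunk further down, re-issues it via blk_execute_rq_nowait().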


@ -27,7 +27,7 @@ static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
spin_unlock_irq(&hwif->lock);
if (start_queue)
blk_run_queue(q);
blk_mq_run_hw_queues(q, true);
return;
}
spin_unlock_irq(&hwif->lock);
@ -36,7 +36,7 @@ static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
scsi_req(rq)->cmd[0] = REQ_PARK_HEADS;
scsi_req(rq)->cmd_len = 1;
ide_req(rq)->type = ATA_PRIV_MISC;
rq->special = &timeout;
ide_req(rq)->special = &timeout;
blk_execute_rq(q, NULL, rq, 1);
rc = scsi_req(rq)->result ? -EIO : 0;
blk_put_request(rq);
@ -54,7 +54,7 @@ static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
scsi_req(rq)->cmd[0] = REQ_UNPARK_HEADS;
scsi_req(rq)->cmd_len = 1;
ide_req(rq)->type = ATA_PRIV_MISC;
elv_add_request(q, rq, ELEVATOR_INSERT_FRONT);
ide_insert_request_head(drive, rq);
out:
return;
@ -67,7 +67,7 @@ ide_startstop_t ide_do_park_unpark(ide_drive_t *drive, struct request *rq)
memset(&cmd, 0, sizeof(cmd));
if (scsi_req(rq)->cmd[0] == REQ_PARK_HEADS) {
drive->sleep = *(unsigned long *)rq->special;
drive->sleep = *(unsigned long *)ide_req(rq)->special;
drive->dev_flags |= IDE_DFLAG_SLEEPING;
tf->command = ATA_CMD_IDLEIMMEDIATE;
tf->feature = 0x44;


@ -21,7 +21,7 @@ int generic_ide_suspend(struct device *dev, pm_message_t mesg)
memset(&rqpm, 0, sizeof(rqpm));
rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, 0);
ide_req(rq)->type = ATA_PRIV_PM_SUSPEND;
rq->special = &rqpm;
ide_req(rq)->special = &rqpm;
rqpm.pm_step = IDE_PM_START_SUSPEND;
if (mesg.event == PM_EVENT_PRETHAW)
mesg.event = PM_EVENT_FREEZE;
@ -40,32 +40,17 @@ int generic_ide_suspend(struct device *dev, pm_message_t mesg)
return ret;
}
static void ide_end_sync_rq(struct request *rq, blk_status_t error)
{
complete(rq->end_io_data);
}
static int ide_pm_execute_rq(struct request *rq)
{
struct request_queue *q = rq->q;
DECLARE_COMPLETION_ONSTACK(wait);
rq->end_io_data = &wait;
rq->end_io = ide_end_sync_rq;
spin_lock_irq(q->queue_lock);
if (unlikely(blk_queue_dying(q))) {
rq->rq_flags |= RQF_QUIET;
scsi_req(rq)->result = -ENXIO;
__blk_end_request_all(rq, BLK_STS_OK);
spin_unlock_irq(q->queue_lock);
blk_mq_end_request(rq, BLK_STS_OK);
return -ENXIO;
}
__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT);
__blk_run_queue_uncond(q);
spin_unlock_irq(q->queue_lock);
wait_for_completion_io(&wait);
blk_execute_rq(q, NULL, rq, true);
return scsi_req(rq)->result ? -EIO : 0;
}
@ -79,6 +64,8 @@ int generic_ide_resume(struct device *dev)
struct ide_pm_state rqpm;
int err;
blk_mq_start_stopped_hw_queues(drive->queue, true);
if (ide_port_acpi(hwif)) {
/* call ACPI _PS0 / _STM only once */
if ((drive->dn & 1) == 0 || pair == NULL) {
@ -92,7 +79,7 @@ int generic_ide_resume(struct device *dev)
memset(&rqpm, 0, sizeof(rqpm));
rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, BLK_MQ_REQ_PREEMPT);
ide_req(rq)->type = ATA_PRIV_PM_RESUME;
rq->special = &rqpm;
ide_req(rq)->special = &rqpm;
rqpm.pm_step = IDE_PM_START_RESUME;
rqpm.pm_state = PM_EVENT_ON;
@ -111,7 +98,7 @@ int generic_ide_resume(struct device *dev)
void ide_complete_power_step(ide_drive_t *drive, struct request *rq)
{
struct ide_pm_state *pm = rq->special;
struct ide_pm_state *pm = ide_req(rq)->special;
#ifdef DEBUG_PM
printk(KERN_INFO "%s: complete_power_step(step: %d)\n",
@ -141,7 +128,7 @@ void ide_complete_power_step(ide_drive_t *drive, struct request *rq)
ide_startstop_t ide_start_power_step(ide_drive_t *drive, struct request *rq)
{
struct ide_pm_state *pm = rq->special;
struct ide_pm_state *pm = ide_req(rq)->special;
struct ide_cmd cmd = { };
switch (pm->pm_step) {
@ -213,8 +200,7 @@ out_do_tf:
void ide_complete_pm_rq(ide_drive_t *drive, struct request *rq)
{
struct request_queue *q = drive->queue;
struct ide_pm_state *pm = rq->special;
unsigned long flags;
struct ide_pm_state *pm = ide_req(rq)->special;
ide_complete_power_step(drive, rq);
if (pm->pm_step != IDE_PM_COMPLETED)
@ -224,22 +210,19 @@ void ide_complete_pm_rq(ide_drive_t *drive, struct request *rq)
printk("%s: completing PM request, %s\n", drive->name,
(ide_req(rq)->type == ATA_PRIV_PM_SUSPEND) ? "suspend" : "resume");
#endif
spin_lock_irqsave(q->queue_lock, flags);
if (ide_req(rq)->type == ATA_PRIV_PM_SUSPEND)
blk_stop_queue(q);
blk_mq_stop_hw_queues(q);
else
drive->dev_flags &= ~IDE_DFLAG_BLOCKED;
spin_unlock_irqrestore(q->queue_lock, flags);
drive->hwif->rq = NULL;
if (blk_end_request(rq, BLK_STS_OK, 0))
BUG();
blk_mq_end_request(rq, BLK_STS_OK);
}
void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
{
struct ide_pm_state *pm = rq->special;
struct ide_pm_state *pm = ide_req(rq)->special;
if (blk_rq_is_private(rq) &&
ide_req(rq)->type == ATA_PRIV_PM_SUSPEND &&
@ -260,7 +243,6 @@ void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
ide_hwif_t *hwif = drive->hwif;
const struct ide_tp_ops *tp_ops = hwif->tp_ops;
struct request_queue *q = drive->queue;
unsigned long flags;
int rc;
#ifdef DEBUG_PM
printk("%s: Wakeup request inited, waiting for !BSY...\n", drive->name);
@ -274,8 +256,6 @@ void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
if (rc)
printk(KERN_WARNING "%s: drive not ready on wakeup\n", drive->name);
spin_lock_irqsave(q->queue_lock, flags);
blk_start_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
blk_mq_start_hw_queues(q);
}
}


@ -746,10 +746,16 @@ static void ide_initialize_rq(struct request *rq)
{
struct ide_request *req = blk_mq_rq_to_pdu(rq);
req->special = NULL;
scsi_req_init(&req->sreq);
req->sreq.sense = req->sense;
}
static const struct blk_mq_ops ide_mq_ops = {
.queue_rq = ide_queue_rq,
.initialize_rq_fn = ide_initialize_rq,
};
/*
* init request queue
*/
@ -759,6 +765,7 @@ static int ide_init_queue(ide_drive_t *drive)
ide_hwif_t *hwif = drive->hwif;
int max_sectors = 256;
int max_sg_entries = PRD_ENTRIES;
struct blk_mq_tag_set *set;
/*
* Our default set up assumes the normal IDE case,
@ -767,19 +774,26 @@ static int ide_init_queue(ide_drive_t *drive)
* limits and LBA48 we could raise it but as yet
* do not.
*/
q = blk_alloc_queue_node(GFP_KERNEL, hwif_to_node(hwif), NULL);
if (!q)
set = &drive->tag_set;
set->ops = &ide_mq_ops;
set->nr_hw_queues = 1;
set->queue_depth = 32;
set->reserved_tags = 1;
set->cmd_size = sizeof(struct ide_request);
set->numa_node = hwif_to_node(hwif);
set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
if (blk_mq_alloc_tag_set(set))
return 1;
q->request_fn = do_ide_request;
q->initialize_rq_fn = ide_initialize_rq;
q->cmd_size = sizeof(struct ide_request);
blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
if (blk_init_allocated_queue(q) < 0) {
blk_cleanup_queue(q);
q = blk_mq_init_queue(set);
if (IS_ERR(q)) {
blk_mq_free_tag_set(set);
return 1;
}
blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
q->queuedata = drive;
blk_queue_segment_boundary(q, 0xffff);
@ -965,8 +979,12 @@ static void drive_release_dev (struct device *dev)
ide_proc_unregister_device(drive);
if (drive->sense_rq)
blk_mq_free_request(drive->sense_rq);
blk_cleanup_queue(drive->queue);
drive->queue = NULL;
blk_mq_free_tag_set(&drive->tag_set);
drive->dev_flags &= ~IDE_DFLAG_PRESENT;
@ -1133,6 +1151,28 @@ static void ide_port_cable_detect(ide_hwif_t *hwif)
}
}
/*
* Deferred request list insertion handler
*/
static void drive_rq_insert_work(struct work_struct *work)
{
ide_drive_t *drive = container_of(work, ide_drive_t, rq_work);
ide_hwif_t *hwif = drive->hwif;
struct request *rq;
LIST_HEAD(list);
spin_lock_irq(&hwif->lock);
if (!list_empty(&drive->rq_list))
list_splice_init(&drive->rq_list, &list);
spin_unlock_irq(&hwif->lock);
while (!list_empty(&list)) {
rq = list_first_entry(&list, struct request, queuelist);
list_del_init(&rq->queuelist);
blk_execute_rq_nowait(drive->queue, rq->rq_disk, rq, true, NULL);
}
}
static const u8 ide_hwif_to_major[] =
{ IDE0_MAJOR, IDE1_MAJOR, IDE2_MAJOR, IDE3_MAJOR, IDE4_MAJOR,
IDE5_MAJOR, IDE6_MAJOR, IDE7_MAJOR, IDE8_MAJOR, IDE9_MAJOR };
@ -1145,12 +1185,10 @@ static void ide_port_init_devices_data(ide_hwif_t *hwif)
ide_port_for_each_dev(i, drive, hwif) {
u8 j = (hwif->index * MAX_DRIVES) + i;
u16 *saved_id = drive->id;
struct request *saved_sense_rq = drive->sense_rq;
memset(drive, 0, sizeof(*drive));
memset(saved_id, 0, SECTOR_SIZE);
drive->id = saved_id;
drive->sense_rq = saved_sense_rq;
drive->media = ide_disk;
drive->select = (i << 4) | ATA_DEVICE_OBS;
@ -1166,6 +1204,9 @@ static void ide_port_init_devices_data(ide_hwif_t *hwif)
INIT_LIST_HEAD(&drive->list);
init_completion(&drive->gendev_rel_comp);
INIT_WORK(&drive->rq_work, drive_rq_insert_work);
INIT_LIST_HEAD(&drive->rq_list);
}
}
@ -1255,7 +1296,6 @@ static void ide_port_free_devices(ide_hwif_t *hwif)
int i;
ide_port_for_each_dev(i, drive, hwif) {
kfree(drive->sense_rq);
kfree(drive->id);
kfree(drive);
}
@ -1283,17 +1323,10 @@ static int ide_port_alloc_devices(ide_hwif_t *hwif, int node)
if (drive->id == NULL)
goto out_free_drive;
drive->sense_rq = kmalloc(sizeof(struct request) +
sizeof(struct ide_request), GFP_KERNEL);
if (!drive->sense_rq)
goto out_free_id;
hwif->devices[i] = drive;
}
return 0;
out_free_id:
kfree(drive->id);
out_free_drive:
kfree(drive);
out_nomem:


@ -639,7 +639,7 @@ static ide_startstop_t idetape_do_request(ide_drive_t *drive,
goto out;
}
if (req->cmd[13] & REQ_IDETAPE_PC1) {
pc = (struct ide_atapi_pc *)rq->special;
pc = (struct ide_atapi_pc *)ide_req(rq)->special;
req->cmd[13] &= ~(REQ_IDETAPE_PC1);
req->cmd[13] |= REQ_IDETAPE_PC2;
goto out;


@ -440,7 +440,7 @@ int ide_raw_taskfile(ide_drive_t *drive, struct ide_cmd *cmd, u8 *buf,
goto put_req;
}
rq->special = cmd;
ide_req(rq)->special = cmd;
cmd->rq = rq;
blk_execute_rq(drive->queue, NULL, rq, 0);


@ -389,7 +389,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
goto err_dev;
}
tqueue = blk_alloc_queue_node(GFP_KERNEL, dev->q->node, NULL);
tqueue = blk_alloc_queue_node(GFP_KERNEL, dev->q->node);
if (!tqueue) {
ret = -ENOMEM;
goto err_disk;
@ -974,7 +974,7 @@ static int nvm_get_bb_meta(struct nvm_dev *dev, sector_t slba,
struct ppa_addr ppa;
u8 *blks;
int ch, lun, nr_blks;
int ret;
int ret = 0;
ppa.ppa = slba;
ppa = dev_to_generic_addr(dev, ppa);
@ -1140,20 +1140,26 @@ EXPORT_SYMBOL(nvm_alloc_dev);
int nvm_register(struct nvm_dev *dev)
{
int ret;
int ret, exp_pool_size;
if (!dev->q || !dev->ops)
return -EINVAL;
dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
if (!dev->dma_pool) {
pr_err("nvm: could not create dma pool\n");
return -ENOMEM;
}
ret = nvm_init(dev);
if (ret)
goto err_init;
return ret;
exp_pool_size = max_t(int, PAGE_SIZE,
(NVM_MAX_VLBA * (sizeof(u64) + dev->geo.sos)));
exp_pool_size = round_up(exp_pool_size, PAGE_SIZE);
dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist",
exp_pool_size);
if (!dev->dma_pool) {
pr_err("nvm: could not create dma pool\n");
nvm_free(dev);
return -ENOMEM;
}
/* register device with a supported media manager */
down_write(&nvm_lock);
@ -1161,9 +1167,6 @@ int nvm_register(struct nvm_dev *dev)
up_write(&nvm_lock);
return 0;
err_init:
dev->ops->destroy_dma_pool(dev->dma_pool);
return ret;
}
EXPORT_SYMBOL(nvm_register);


@ -250,8 +250,8 @@ int pblk_alloc_rqd_meta(struct pblk *pblk, struct nvm_rq *rqd)
if (rqd->nr_ppas == 1)
return 0;
rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size(pblk);
rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size(pblk);
return 0;
}
@ -376,7 +376,7 @@ void pblk_write_should_kick(struct pblk *pblk)
{
unsigned int secs_avail = pblk_rb_read_count(&pblk->rwb);
if (secs_avail >= pblk->min_write_pgs)
if (secs_avail >= pblk->min_write_pgs_data)
pblk_write_kick(pblk);
}
@ -407,7 +407,9 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, struct pblk_line *line)
struct pblk_line_meta *lm = &pblk->lm;
struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct list_head *move_list = NULL;
int vsc = le32_to_cpu(*line->vsc);
int packed_meta = (le32_to_cpu(*line->vsc) / pblk->min_write_pgs_data)
* (pblk->min_write_pgs - pblk->min_write_pgs_data);
int vsc = le32_to_cpu(*line->vsc) + packed_meta;
lockdep_assert_held(&line->lock);
@ -531,7 +533,7 @@ void pblk_check_chunk_state_update(struct pblk *pblk, struct nvm_rq *rqd)
if (caddr == 0)
trace_pblk_chunk_state(pblk_disk_name(pblk),
ppa, NVM_CHK_ST_OPEN);
else if (caddr == chunk->cnlb)
else if (caddr == (chunk->cnlb - 1))
trace_pblk_chunk_state(pblk_disk_name(pblk),
ppa, NVM_CHK_ST_CLOSED);
}
@ -620,12 +622,15 @@ out:
}
int pblk_calc_secs(struct pblk *pblk, unsigned long secs_avail,
unsigned long secs_to_flush)
unsigned long secs_to_flush, bool skip_meta)
{
int max = pblk->sec_per_write;
int min = pblk->min_write_pgs;
int secs_to_sync = 0;
if (skip_meta && pblk->min_write_pgs_data != pblk->min_write_pgs)
min = max = pblk->min_write_pgs_data;
if (secs_avail >= max)
secs_to_sync = max;
else if (secs_avail >= min)
@ -796,10 +801,11 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
rqd.is_seq = 1;
for (i = 0; i < lm->smeta_sec; i++, paddr++) {
struct pblk_sec_meta *meta_list = rqd.meta_list;
struct pblk_sec_meta *meta = pblk_get_meta(pblk,
rqd.meta_list, i);
rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
meta_list[i].lba = lba_list[paddr] = addr_empty;
meta->lba = lba_list[paddr] = addr_empty;
}
ret = pblk_submit_io_sync_sem(pblk, &rqd);
@ -845,13 +851,13 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
if (!meta_list)
return -ENOMEM;
ppa_list = meta_list + pblk_dma_meta_size;
dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
ppa_list = meta_list + pblk_dma_meta_size(pblk);
dma_ppa_list = dma_meta_list + pblk_dma_meta_size(pblk);
next_rq:
memset(&rqd, 0, sizeof(struct nvm_rq));
rq_ppas = pblk_calc_secs(pblk, left_ppas, 0);
rq_ppas = pblk_calc_secs(pblk, left_ppas, 0, false);
rq_len = rq_ppas * geo->csecs;
bio = pblk_bio_map_addr(pblk, emeta_buf, rq_ppas, rq_len,
@ -1276,6 +1282,7 @@ static int pblk_line_prepare(struct pblk *pblk, struct pblk_line *line)
return 0;
}
/* Line allocations in the recovery path are always single threaded */
int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line)
{
struct pblk_line_mgmt *l_mg = &pblk->l_mg;
@ -1295,15 +1302,22 @@ int pblk_line_recov_alloc(struct pblk *pblk, struct pblk_line *line)
ret = pblk_line_alloc_bitmaps(pblk, line);
if (ret)
return ret;
goto fail;
if (!pblk_line_init_bb(pblk, line, 0)) {
list_add(&line->list, &l_mg->free_list);
return -EINTR;
ret = -EINTR;
goto fail;
}
pblk_rl_free_lines_dec(&pblk->rl, line, true);
return 0;
fail:
spin_lock(&l_mg->free_lock);
list_add(&line->list, &l_mg->free_list);
spin_unlock(&l_mg->free_lock);
return ret;
}
void pblk_line_recov_close(struct pblk *pblk, struct pblk_line *line)
@ -2160,3 +2174,38 @@ void pblk_lookup_l2p_rand(struct pblk *pblk, struct ppa_addr *ppas,
}
spin_unlock(&pblk->trans_lock);
}
void *pblk_get_meta_for_writes(struct pblk *pblk, struct nvm_rq *rqd)
{
void *buffer;
if (pblk_is_oob_meta_supported(pblk)) {
/* Just use OOB metadata buffer as always */
buffer = rqd->meta_list;
} else {
/* We need to reuse the last page of the request (packed metadata)
* in a similar way to traditional OOB metadata
*/
buffer = page_to_virt(
rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
}
return buffer;
}
void pblk_get_packed_meta(struct pblk *pblk, struct nvm_rq *rqd)
{
void *meta_list = rqd->meta_list;
void *page;
int i = 0;
if (pblk_is_oob_meta_supported(pblk))
return;
page = page_to_virt(rqd->bio->bi_io_vec[rqd->bio->bi_vcnt - 1].bv_page);
/* We need to fill oob meta buffer with data from packed metadata */
for (; i < rqd->nr_ppas; i++)
memcpy(pblk_get_meta(pblk, meta_list, i),
page + (i * sizeof(struct pblk_sec_meta)),
sizeof(struct pblk_sec_meta));
}
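
pblk_get_meta() and pblk_dma_meta_size(), used heavily above, live in pblk.h and are not part of this diff. Judging purely from their call sites, a plausible shape is an indexed accessor over a flat per-sector metadata buffer whose stride is the drive's OOB entry size (pblk->oob_meta_size, set in the pblk-init.c hunk below). The sketch here is an assumption for illustration, not the series' actual helper, and it presumes the driver-internal pblk.h types.

/* hypothetical stand-in for the pblk.h accessor, assuming a flat layout */
static inline struct pblk_sec_meta *sketch_get_meta(struct pblk *pblk,
						    void *meta_list, int index)
{
	return meta_list + index * pblk->oob_meta_size;	/* assumed stride */
}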


@ -207,9 +207,6 @@ static int pblk_rwb_init(struct pblk *pblk)
return pblk_rb_init(&pblk->rwb, buffer_size, threshold, geo->csecs);
}
/* Minimum pages needed within a lun */
#define ADDR_POOL_SIZE 64
static int pblk_set_addrf_12(struct pblk *pblk, struct nvm_geo *geo,
struct nvm_addrf_12 *dst)
{
@ -350,23 +347,19 @@ fail_destroy_ws:
static int pblk_get_global_caches(void)
{
int ret;
int ret = 0;
mutex_lock(&pblk_caches.mutex);
if (kref_read(&pblk_caches.kref) > 0) {
kref_get(&pblk_caches.kref);
mutex_unlock(&pblk_caches.mutex);
return 0;
}
if (kref_get_unless_zero(&pblk_caches.kref))
goto out;
ret = pblk_create_global_caches();
if (!ret)
kref_get(&pblk_caches.kref);
kref_init(&pblk_caches.kref);
out:
mutex_unlock(&pblk_caches.mutex);
return ret;
}
@ -406,12 +399,45 @@ static int pblk_core_init(struct pblk *pblk)
pblk->nr_flush_rst = 0;
pblk->min_write_pgs = geo->ws_opt;
pblk->min_write_pgs_data = pblk->min_write_pgs;
max_write_ppas = pblk->min_write_pgs * geo->all_luns;
pblk->max_write_pgs = min_t(int, max_write_ppas, NVM_MAX_VLBA);
pblk->max_write_pgs = min_t(int, pblk->max_write_pgs,
queue_max_hw_sectors(dev->q) / (geo->csecs >> SECTOR_SHIFT));
pblk_set_sec_per_write(pblk, pblk->min_write_pgs);
pblk->oob_meta_size = geo->sos;
if (!pblk_is_oob_meta_supported(pblk)) {
/* For drives which do not have the OOB metadata feature,
* in order to support the recovery feature we need to use
* so-called packed metadata. Packed metadata stores
* the same information as OOB metadata (the l2p table mapping),
* but in the form of a single page at the end of
* every write request.
*/
if (pblk->min_write_pgs
* sizeof(struct pblk_sec_meta) > PAGE_SIZE) {
/* We want to keep all the packed metadata on a single
* page per write request, so we need to ensure that
* it will fit.
*
* This is more of a sanity check, since there is
* no device with such a big minimal write size
* (above 1 megabyte).
*/
pblk_err(pblk, "Not supported min write size\n");
return -EINVAL;
}
/* For the packed metadata approach we make a simplification:
* on the read path we always issue requests whose size
* equals max_write_pgs, with all pages filled with
* user payload except for the last page, which is
* filled with packed metadata.
*/
pblk->max_write_pgs = pblk->min_write_pgs;
pblk->min_write_pgs_data = pblk->min_write_pgs - 1;
}
pblk->pad_dist = kcalloc(pblk->min_write_pgs - 1, sizeof(atomic64_t),
GFP_KERNEL);
if (!pblk->pad_dist)
@ -635,40 +661,61 @@ static unsigned int calc_emeta_len(struct pblk *pblk)
return (lm->emeta_len[1] + lm->emeta_len[2] + lm->emeta_len[3]);
}
static void pblk_set_provision(struct pblk *pblk, long nr_free_blks)
static int pblk_set_provision(struct pblk *pblk, int nr_free_chks)
{
struct nvm_tgt_dev *dev = pblk->dev;
struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_line_meta *lm = &pblk->lm;
struct nvm_geo *geo = &dev->geo;
sector_t provisioned;
int sec_meta, blk_meta;
int sec_meta, blk_meta, clba;
int minimum;
if (geo->op == NVM_TARGET_DEFAULT_OP)
pblk->op = PBLK_DEFAULT_OP;
else
pblk->op = geo->op;
provisioned = nr_free_blks;
minimum = pblk_get_min_chks(pblk);
provisioned = nr_free_chks;
provisioned *= (100 - pblk->op);
sector_div(provisioned, 100);
pblk->op_blks = nr_free_blks - provisioned;
if ((nr_free_chks - provisioned) < minimum) {
if (geo->op != NVM_TARGET_DEFAULT_OP) {
pblk_err(pblk, "OP too small to create a sane instance\n");
return -EINTR;
}
/* If the user did not specify an OP value, and PBLK_DEFAULT_OP
* is not enough, calculate and set sane value
*/
provisioned = nr_free_chks - minimum;
pblk->op = (100 * minimum) / nr_free_chks;
pblk_info(pblk, "Default OP insufficient, adjusting OP to %d\n",
pblk->op);
}
pblk->op_blks = nr_free_chks - provisioned;
/* Internally pblk manages all free blocks, but all calculations based
* on user capacity consider only provisioned blocks
*/
pblk->rl.total_blocks = nr_free_blks;
pblk->rl.nr_secs = nr_free_blks * geo->clba;
pblk->rl.total_blocks = nr_free_chks;
pblk->rl.nr_secs = nr_free_chks * geo->clba;
/* Consider sectors used for metadata */
sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
pblk->capacity = (provisioned - blk_meta) * geo->clba;
clba = (geo->clba / pblk->min_write_pgs) * pblk->min_write_pgs_data;
pblk->capacity = (provisioned - blk_meta) * clba;
atomic_set(&pblk->rl.free_blocks, nr_free_blks);
atomic_set(&pblk->rl.free_user_blocks, nr_free_blks);
atomic_set(&pblk->rl.free_blocks, nr_free_chks);
atomic_set(&pblk->rl.free_user_blocks, nr_free_chks);
return 0;
}
static int pblk_setup_line_meta_chk(struct pblk *pblk, struct pblk_line *line,
@ -984,7 +1031,7 @@ static int pblk_lines_init(struct pblk *pblk)
struct pblk_line_mgmt *l_mg = &pblk->l_mg;
struct pblk_line *line;
void *chunk_meta;
long nr_free_chks = 0;
int nr_free_chks = 0;
int i, ret;
ret = pblk_line_meta_init(pblk);
@ -1031,7 +1078,9 @@ static int pblk_lines_init(struct pblk *pblk)
goto fail_free_lines;
}
pblk_set_provision(pblk, nr_free_chks);
ret = pblk_set_provision(pblk, nr_free_chks);
if (ret)
goto fail_free_lines;
vfree(chunk_meta);
return 0;
@ -1041,7 +1090,7 @@ fail_free_lines:
pblk_line_meta_free(l_mg, &pblk->lines[i]);
kfree(pblk->lines);
fail_free_chunk_meta:
kfree(chunk_meta);
vfree(chunk_meta);
fail_free_luns:
kfree(pblk->luns);
fail_free_meta:
@ -1154,6 +1203,12 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk,
return ERR_PTR(-EINVAL);
}
if (geo->ext) {
pblk_err(pblk, "extended metadata not supported\n");
kfree(pblk);
return ERR_PTR(-EINVAL);
}
spin_lock_init(&pblk->resubmit_lock);
spin_lock_init(&pblk->trans_lock);
spin_lock_init(&pblk->lock);
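
A quick number check on the over-provisioning fallback in pblk_set_provision() above, with made-up values: for nr_free_chks = 1000 and a pblk_get_min_chks() result of 130, any user-specified OP that would leave fewer than 130 spare chunks is rejected, while the default OP is recomputed as (100 * 130) / 1000 = 13%, leaving op_blks at exactly the 130-chunk minimum.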


@ -22,7 +22,7 @@
static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
struct ppa_addr *ppa_list,
unsigned long *lun_bitmap,
struct pblk_sec_meta *meta_list,
void *meta_list,
unsigned int valid_secs)
{
struct pblk_line *line = pblk_line_get_data(pblk);
@ -33,6 +33,9 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
int nr_secs = pblk->min_write_pgs;
int i;
if (!line)
return -ENOSPC;
if (pblk_line_is_full(line)) {
struct pblk_line *prev_line = line;
@ -42,8 +45,11 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
line = pblk_line_replace_data(pblk);
pblk_line_close_meta(pblk, prev_line);
if (!line)
return -EINTR;
if (!line) {
pblk_pipeline_stop(pblk);
return -ENOSPC;
}
}
emeta = line->emeta;
@ -52,6 +58,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
paddr = pblk_alloc_page(pblk, line, nr_secs);
for (i = 0; i < nr_secs; i++, paddr++) {
struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
/* ppa to be sent to the device */
@ -68,14 +75,15 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
kref_get(&line->ref);
w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
w_ctx->ppa = ppa_list[i];
meta_list[i].lba = cpu_to_le64(w_ctx->lba);
meta->lba = cpu_to_le64(w_ctx->lba);
lba_list[paddr] = cpu_to_le64(w_ctx->lba);
if (lba_list[paddr] != addr_empty)
line->nr_valid_lbas++;
else
atomic64_inc(&pblk->pad_wa);
} else {
lba_list[paddr] = meta_list[i].lba = addr_empty;
lba_list[paddr] = addr_empty;
meta->lba = addr_empty;
__pblk_map_invalidate(pblk, line, paddr);
}
}
@ -84,50 +92,57 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
return 0;
}
void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
int pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
unsigned long *lun_bitmap, unsigned int valid_secs,
unsigned int off)
{
struct pblk_sec_meta *meta_list = rqd->meta_list;
void *meta_list = pblk_get_meta_for_writes(pblk, rqd);
void *meta_buffer;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
unsigned int map_secs;
int min = pblk->min_write_pgs;
int i;
int ret;
for (i = off; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
lun_bitmap, &meta_list[i], map_secs)) {
bio_put(rqd->bio);
pblk_free_rqd(pblk, rqd, PBLK_WRITE);
pblk_pipeline_stop(pblk);
}
meta_buffer = pblk_get_meta(pblk, meta_list, i);
ret = pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
lun_bitmap, meta_buffer, map_secs);
if (ret)
return ret;
}
return 0;
}
/* only if erase_ppa is set, acquire erase semaphore */
void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
int pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
unsigned int sentry, unsigned long *lun_bitmap,
unsigned int valid_secs, struct ppa_addr *erase_ppa)
{
struct nvm_tgt_dev *dev = pblk->dev;
struct nvm_geo *geo = &dev->geo;
struct pblk_line_meta *lm = &pblk->lm;
struct pblk_sec_meta *meta_list = rqd->meta_list;
void *meta_list = pblk_get_meta_for_writes(pblk, rqd);
void *meta_buffer;
struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
struct pblk_line *e_line, *d_line;
unsigned int map_secs;
int min = pblk->min_write_pgs;
int i, erase_lun;
int ret;
for (i = 0; i < rqd->nr_ppas; i += min) {
map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
lun_bitmap, &meta_list[i], map_secs)) {
bio_put(rqd->bio);
pblk_free_rqd(pblk, rqd, PBLK_WRITE);
pblk_pipeline_stop(pblk);
}
meta_buffer = pblk_get_meta(pblk, meta_list, i);
ret = pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
lun_bitmap, meta_buffer, map_secs);
if (ret)
return ret;
erase_lun = pblk_ppa_to_pos(geo, ppa_list[i]);
@ -163,7 +178,7 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
*/
e_line = pblk_line_get_erase(pblk);
if (!e_line)
return;
return -ENOSPC;
/* Erase blocks that are bad in this line but might not be in next */
if (unlikely(pblk_ppa_empty(*erase_ppa)) &&
@ -174,7 +189,7 @@ retry:
bit = find_next_bit(d_line->blk_bitmap,
lm->blk_per_line, bit + 1);
if (bit >= lm->blk_per_line)
return;
return 0;
spin_lock(&e_line->lock);
if (test_bit(bit, e_line->erase_bitmap)) {
@ -188,4 +203,6 @@ retry:
*erase_ppa = pblk->luns[bit].bppa; /* set ch and lun */
erase_ppa->a.blk = e_line->id;
}
return 0;
}


@ -147,7 +147,7 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
/*
* Initialize rate-limiter, which controls access to the write buffer
* but user and GC I/O
* by user and GC I/O
*/
pblk_rl_init(&pblk->rl, rb->nr_entries);
@ -552,6 +552,9 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb *rb, struct nvm_rq *rqd,
to_read = count;
}
/* Add space for packed metadata if in use */
pad += (pblk->min_write_pgs - pblk->min_write_pgs_data);
c_ctx->sentry = pos;
c_ctx->nr_valid = to_read;
c_ctx->nr_padded = pad;


@ -43,7 +43,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
struct bio *bio, sector_t blba,
unsigned long *read_bitmap)
{
struct pblk_sec_meta *meta_list = rqd->meta_list;
void *meta_list = rqd->meta_list;
struct ppa_addr ppas[NVM_MAX_VLBA];
int nr_secs = rqd->nr_ppas;
bool advanced_bio = false;
@ -53,12 +53,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
for (i = 0; i < nr_secs; i++) {
struct ppa_addr p = ppas[i];
struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
sector_t lba = blba + i;
retry:
if (pblk_ppa_empty(p)) {
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
WARN_ON(test_and_set_bit(i, read_bitmap));
meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
meta->lba = addr_empty;
if (unlikely(!advanced_bio)) {
bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
@ -78,7 +81,7 @@ retry:
goto retry;
}
WARN_ON(test_and_set_bit(i, read_bitmap));
meta_list[i].lba = cpu_to_le64(lba);
meta->lba = cpu_to_le64(lba);
advanced_bio = true;
#ifdef CONFIG_NVM_PBLK_DEBUG
atomic_long_inc(&pblk->cache_reads);
@ -105,12 +108,16 @@ next:
static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
sector_t blba)
{
struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
void *meta_list = rqd->meta_list;
int nr_lbas = rqd->nr_ppas;
int i;
if (!pblk_is_oob_meta_supported(pblk))
return;
for (i = 0; i < nr_lbas; i++) {
u64 lba = le64_to_cpu(meta_lba_list[i].lba);
struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
u64 lba = le64_to_cpu(meta->lba);
if (lba == ADDR_EMPTY)
continue;
@ -134,17 +141,22 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
u64 *lba_list, int nr_lbas)
{
struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
void *meta_lba_list = rqd->meta_list;
int i, j;
if (!pblk_is_oob_meta_supported(pblk))
return;
for (i = 0, j = 0; i < nr_lbas; i++) {
struct pblk_sec_meta *meta = pblk_get_meta(pblk,
meta_lba_list, j);
u64 lba = lba_list[i];
u64 meta_lba;
if (lba == ADDR_EMPTY)
continue;
meta_lba = le64_to_cpu(meta_lba_list[j].lba);
meta_lba = le64_to_cpu(meta->lba);
if (lba != meta_lba) {
#ifdef CONFIG_NVM_PBLK_DEBUG
@ -216,15 +228,15 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
struct pblk *pblk = rqd->private;
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
struct pblk_pr_ctx *pr_ctx = r_ctx->private;
struct pblk_sec_meta *meta;
struct bio *new_bio = rqd->bio;
struct bio *bio = pr_ctx->orig_bio;
struct bio_vec src_bv, dst_bv;
struct pblk_sec_meta *meta_list = rqd->meta_list;
void *meta_list = rqd->meta_list;
int bio_init_idx = pr_ctx->bio_init_idx;
unsigned long *read_bitmap = pr_ctx->bitmap;
int nr_secs = pr_ctx->orig_nr_secs;
int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
__le64 *lba_list_mem, *lba_list_media;
void *src_p, *dst_p;
int hole, i;
@ -237,13 +249,10 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
rqd->ppa_list[0] = ppa;
}
/* Re-use allocated memory for intermediate lbas */
lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
for (i = 0; i < nr_secs; i++) {
lba_list_media[i] = meta_list[i].lba;
meta_list[i].lba = lba_list_mem[i];
meta = pblk_get_meta(pblk, meta_list, i);
pr_ctx->lba_list_media[i] = le64_to_cpu(meta->lba);
meta->lba = cpu_to_le64(pr_ctx->lba_list_mem[i]);
}
/* Fill the holes in the original bio */
@ -255,7 +264,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
kref_put(&line->ref, pblk_line_put);
meta_list[hole].lba = lba_list_media[i];
meta = pblk_get_meta(pblk, meta_list, hole);
meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
src_bv = new_bio->bi_io_vec[i++];
dst_bv = bio->bi_io_vec[bio_init_idx + hole];
@ -291,17 +301,13 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
unsigned long *read_bitmap,
int nr_holes)
{
struct pblk_sec_meta *meta_list = rqd->meta_list;
void *meta_list = rqd->meta_list;
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
struct pblk_pr_ctx *pr_ctx;
struct bio *new_bio, *bio = r_ctx->private;
__le64 *lba_list_mem;
int nr_secs = rqd->nr_ppas;
int i;
/* Re-use allocated memory for intermediate lbas */
lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
new_bio = bio_alloc(GFP_KERNEL, nr_holes);
if (pblk_bio_add_pages(pblk, new_bio, GFP_KERNEL, nr_holes))
@ -312,12 +318,15 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
goto fail_free_pages;
}
pr_ctx = kmalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
pr_ctx = kzalloc(sizeof(struct pblk_pr_ctx), GFP_KERNEL);
if (!pr_ctx)
goto fail_free_pages;
for (i = 0; i < nr_secs; i++)
lba_list_mem[i] = meta_list[i].lba;
for (i = 0; i < nr_secs; i++) {
struct pblk_sec_meta *meta = pblk_get_meta(pblk, meta_list, i);
pr_ctx->lba_list_mem[i] = le64_to_cpu(meta->lba);
}
new_bio->bi_iter.bi_sector = 0; /* internal bio */
bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
@ -325,7 +334,6 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
rqd->bio = new_bio;
rqd->nr_ppas = nr_holes;
pr_ctx->ppa_ptr = NULL;
pr_ctx->orig_bio = bio;
bitmap_copy(pr_ctx->bitmap, read_bitmap, NVM_MAX_VLBA);
pr_ctx->bio_init_idx = bio_init_idx;
@ -383,7 +391,7 @@ err:
static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
sector_t lba, unsigned long *read_bitmap)
{
struct pblk_sec_meta *meta_list = rqd->meta_list;
struct pblk_sec_meta *meta = pblk_get_meta(pblk, rqd->meta_list, 0);
struct ppa_addr ppa;
pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
@ -394,8 +402,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
retry:
if (pblk_ppa_empty(ppa)) {
__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);
WARN_ON(test_and_set_bit(0, read_bitmap));
meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
meta->lba = addr_empty;
return;
}
@ -409,7 +419,7 @@ retry:
}
WARN_ON(test_and_set_bit(0, read_bitmap));
meta_list[0].lba = cpu_to_le64(lba);
meta->lba = cpu_to_le64(lba);
#ifdef CONFIG_NVM_PBLK_DEBUG
atomic_long_inc(&pblk->cache_reads);
