We emulate the barrier requests by draining the outstanding bio's
and then sending the WRITE_FLUSH command. To drain the I/Os
we use the refcnt that is used during disconnect to wait for all
the I/Os before disconnecting from the frontend. We latch on its
value and if it reaches either the threshold for disconnect or when
there are no more outstanding I/Os, then we have drained all I/Os.
Suggested-by: Christopher Hellwig <hch@infradead.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This fixes the problem of three of those four memset()-s having
improper size arguments passed: Sizeof a pointer-typed expression
returns the size of the pointer, not that of the pointed to data.
It also reverts using kmalloc() instead of kzalloc() for the allocation
of the pending grant handles array, as that array gets fully
initialized in a subsequent loop.
Reported-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
..aka ATA TRIM/SCSI UNMAP command to be passed through the frontend
and used as appropiately by the backend. We also advertise
certain granulity parameters to the frontend so it can plug them in.
If the backend is a realy device - we just end up using
'blkdev_issue_discard' while for loopback devices - we just punch
a hole in the image file.
Signed-off-by: Li Dongyang <lidongyang@novell.com>
[v1: Fixed up pr_debug and commit description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add xen-backend:vbd module alias to the xen-blkback module. This allows
automatic loading of the module.
Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
idea. It means that in-flight I/O is essentially blocking continued
batches. This essentially kills throughput on frontends which unplug
(or even just notify) early and rightfully assume addtional requests
will be picked up on time, not synchronously.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
[v1: Rebased and fixed compile problems]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
blkbk->pending_pages can be NULL here so I added a check for it.
Signed-off-by: Dan Carpenter <error27@gmail.com>
[v1: Redid the loop a bit]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The sector number on empty barrier requests may (will?) be -1, which,
given that it's being treated as unsigned 64-bit quantity, will almost
always exceed the actual (virtual) disk's size.
Inspired by Konrad's "When writting barriers set the sector number to
zero...".
While at it also add overflow checking to the math in vbd_translate().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via double
xenbus_transaction_end() calls. The next down_read() in
xenbus_transaction_start() (at eg. the next resize attempt) hangs.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317
Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We do a check for the operations right before calling dispatch_rw_block_io.
And then we do the same check in dispatch_rw_block_io. This patch
squashes those checks into the 'dispatch_rw_block_io' function.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We drop the support for 'feature-barrier' and add in the support
for the 'feature-flush-cache' if the real backend storage supports
flushing.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If one runs a simple fio request with random read/write with a
20%/80% ratio, the numbers are incredibly bad when using the CFQ scheduler.
IOmeter | | | |
64K, randrw | NOOP | CFQ | deadline |
randrwmix=80 | | | |
--------------+-------+------+----------+
blkback |103/27 |32/10 | 102/27 |
--------------+-------+------+----------+
QEMU qdisk |103/27 |102/27| 102/27 |
The problem as explained by Vivek Goyal was:
".. that difference is that sync vs async requests. In the case of
a kernel thread submitting IO, [..] all the WRITES might be being
considered as async and will go in a different queue. If you mix those
with some READS, they are always sync and will go in differnet queue.
In presence of sync queue, CFQ will idle and choke up WRITES in
an attempt to improve latencies of READs.
In case of AIO [note: this is what QEMU qdisk is doing] , [..]
it is direct IO and both READS and WRITES will be considered SYNC
and will go in a single queue and no choking of WRITES will take place."
The solution is quite simple, tack on REQ_SYNC (which is
what the WRITE_ODIRECT macro points to) and the numbers go
back up.
Suggested-by: Vivek Goyal <vgoyal@redhat.com
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We used to the plug/unplug on the submit_bio. But that means
if within a stream of WRITE, WRITE, WRITE,...,WRITE we have
one READ, it could stall the pipeline (as the 'submio_bio'
could trigger the unplug_fnc to be called and stall/sync
when doing the READ). Instead we want to move the unplugging
when the whole (or as a much as possible) ring buffer has been
processed. This also eliminates us doing plug/unplug for
each request.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
And also shorten the name if it has blkback to blkbk.
This results in the symbol table (if compiled in the kernel)
to be much shorter, prettier, and also easier to search for.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Daniel Stodden suggested to eliminate vbd.c and interface.c, inlining the
critical bits where they belong, respectively.
Leaving only blkback.c for the data- and xenbus.c for the control path.
Suggested-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>