the wake_up_process func is included by spin_lock/unlock in
vhost_work_queue,
but it could be done outside the spin_lock.
I have test it with kernel 3.0.27 and guest suse11-sp2 using iperf,
the num as below.
original modified
thread_num tp(Gbps) vhost(%) | tp(Gbps) vhost(%)
1 9.59 28.82 | 9.59 27.49
8 9.61 32.92 | 9.62 26.77
64 9.58 46.48 | 9.55 38.99
256 9.6 63.7 | 9.6 52.59
Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pull SCSI target updates from Nicholas Bellinger:
"Lots of activity again this round for I/O performance optimizations
(per-cpu IDA pre-allocation for vhost + iscsi/target), and the
addition of new fabric independent features to target-core
(COMPARE_AND_WRITE + EXTENDED_COPY).
The main highlights include:
- Support for iscsi-target login multiplexing across individual
network portals
- Generic Per-cpu IDA logic (kent + akpm + clameter)
- Conversion of vhost to use per-cpu IDA pre-allocation for
descriptors, SGLs and userspace page pointer list
- Conversion of iscsi-target + iser-target to use per-cpu IDA
pre-allocation for descriptors
- Add support for generic COMPARE_AND_WRITE (AtomicTestandSet)
emulation for virtual backend drivers
- Add support for generic EXTENDED_COPY (CopyOffload) emulation for
virtual backend drivers.
- Add support for fast memory registration mode to iser-target (Vu)
The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of
particular significance, which make us the first and only open source
target to support the full set of VAAI primitives.
Currently Linux clients are lacking upstream support to actually
utilize these primitives. However, with server side support now in
place for folks like MKP + ZAB working on the client, this logic once
reserved for the highest end of storage arrays, can now be run in VMs
on their laptops"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
target/iscsi: Bump versions to v4.1.0
target: Update copyright ownership/year information to 2013
iscsi-target: Bump default TCP listen backlog to 256
target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out
iscsi-target; Bump default CmdSN Depth to 64
iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set
iscsi-target: Add thread_set->ts_activate_sem + use common deallocate
iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE
target: remove unused including <linux/version.h>
iser-target: introduce fast memory registration mode (FRWR)
iser-target: generalize rdma memory registration and cleanup
iser-target: move rdma wr processing to a shared function
target: Enable global EXTENDED_COPY setup/release
target: Add Third Party Copy (3PC) bit in INQUIRY response
target: Enable EXTENDED_COPY setup in spc_parse_cdb
target: Add support for EXTENDED_COPY copy offload emulation
target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
target: Add global device list for EXTENDED_COPY
target: Make helpers non static for EXTENDED_COPY command setup
target: Make spc_parse_naa_6h_vendor_specific non static
...
Update copyright ownership/year information for target-core,
loopback, iscsi-target, tcm_qla2xx, vhost and iser-target.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch adds support for pre-allocation of per tv_cmd descriptor
scatterlist + user-space page pointer memory using se_sess->sess_cmd_map
within tcm_vhost_make_nexus() code.
This includes sanity checks within vhost_scsi_map_to_sgl()
to reject I/O that exceeds these initial hardcoded values, and
the necessary cleanup in tcm_vhost_make_nexus() failure path +
tcm_vhost_drop_nexus().
v3 changes:
- Rebase to v3.11-rc5 code
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Asias He <asias@redhat.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Reviewed-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch changes vhost/scsi to use transport_init_session_tags()
pre-allocation logic for per-cpu session tag pooling with internal
ida_alloc() + ida_free() calls based upon the saved se_cmd->map_tag id.
FIXME: Make transport_init_session_tags() number of tags setup
configurable per vring client setting via configfs
v5 changes:
- Convert to percpu_ida.h include
v3 changes:
- Update to percpu-ida usage
- Rebase to v3.11-rc5 code
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Asias He <asias@redhat.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
As Michael point out, We used to limit the max pending DMAs to get better cache
utilization. But it was not done correctly since it was one done when there's no
new buffers submitted from guest. Guest can easily exceeds the limitation by
keeping sending packets.
So this patch moves the check into main loop. Tests shows about 5%-10%
improvement on per cpu throughput for guest tx.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to poll vhost queue before making DMA is done, this is racy if vhost
thread were waked up before marking DMA is done which can result the signal to
be missed. Fix this by always polling the vhost thread before DMA is done.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
upend_idx != done_idx we still set zcopy_used to true and rollback this choice
later. This could be avoided by determining zerocopy once by checking all
conditions at one time before.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let vhost_add_used() to use vhost_add_used_n() to reduce the code
duplication. To avoid the overhead brought by __copy_to_user(). We will use
put_user() when one used need to be added.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We tend to batch the used adding and signaling in vhost_zerocopy_callback()
which may result more than 100 used buffers to be updated in
vhost_zerocopy_signal_used() in some cases. So switch to use
vhost_add_used_and_signal_n() to avoid multiple calls to
vhost_add_used_and_signal(). Which means much less times of used index
updating and memory barriers.
2% performance improvement were seen on netperf TCP_RR test.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
None of its caller use its return value, so let it return void.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
memcpy_fromiovec is moved from net/core/iovec.c to lib/iovec.c.
linux/uio.h provides the declaration for memcpy_fromiovec.
Include linux/uio.h instead of inux/socket.h for it.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes some fixes for vhost net and scsi drivers.
The test module has already been reworked to avoid rcu
usage, but the necessary core changes are missing,
we fixed this.
Unlikely to affect any real-world users, but it's
early in the cycle so, let's merge them.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJR3qu5AAoJECgfDbjSjVRp66UIAKZFjSnjWBhQbqI+LkrrUJ+H
Zv4xnEgTU+5k7zLpArqhDABbWMPVUSHm91ecjS/OX9WWsXVYG2jAT+PK6F3PrexK
BpUA3RDX8a5drXleidhwGf2dHmsBhTTwZyn31h/pvKAiM27kVjFOS0S9Rhnpu/0x
UI/z/s//CZ/kNKbCvYVqu0+KjbrwDiRD1N56hGDLqwsVoFIn1sP9CBIg/PHCqebK
GnPSidTFdRu4yvFvhwJqQ3RgfpCxbgBc619knWr8XX4PVoH25NdpLZ4yHrR58Ms0
Hb2IJ5q9bOGf0hbLFKOeHO29uVyOtTGGq0UG7NnfS1aToIT/Zw2kYICV5kBlSvs=
=XGuu
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost fixes from Michael Tsirkin:
"vhost: more fixes for 3.11
This includes some fixes for vhost net and scsi drivers.
The test module has already been reworked to avoid rcu usage, but the
necessary core changes are missing, we fixed this.
Unlikely to affect any real-world users, but it's early in the cycle
so, let's merge them"
(It was earlier when Michael originally sent the email, but it somehot
got missed in the flood, so here it is after -rc2)
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: Remove custom vhost rcu usage
vhost-scsi: Always access vq->private_data under vq mutex
vhost-net: Always access vq->private_data under vq mutex
Pull SCSI target updates from Nicholas Bellinger:
"Lots of activity this round on performance improvements in target-core
while benchmarking the prototype scsi-mq initiator code with
vhost-scsi fabric ports, along with a number of iscsi/iser-target
improvements and hardening fixes for exception path cases post v3.10
merge.
The highlights include:
- Make persistent reservations APTPL buffer allocated on-demand, and
drop per t10_reservation buffer. (grover)
- Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO
device pages (grover)
- Add transport_cmd_check_stop write_pending bit to avoid extra
access of ->t_state_lock is WRITE I/O submission fast-path. (nab)
- Drop unnecessary CMD_T_DEV_ACTIVE check from
transport_lun_remove_cmd to avoid extra access of ->t_state_lock in
release fast-path. (nab)
- Avoid extra t_state_lock access in __target_execute_cmd fast-path
(nab)
- Drop unnecessary vhost-scsi wait_for_tasks=true usage +
->t_state_lock access in release fast-path. (nab)
- Convert vhost-scsi to use modern se_cmd->cmd_kref
TARGET_SCF_ACK_KREF usage (nab)
- Add tracepoints for SCSI commands being processed (roland)
- Refactoring of iscsi-target handling of ISCSI_OP_NOOP +
ISCSI_OP_TEXT to be transport independent (nab)
- Add iscsi-target SendTargets=$IQN support for in-band discovery
(nab)
- Add iser-target support for in-band discovery (nab + Or)
- Add iscsi-target demo-mode TPG authentication context support (nab)
- Fix isert_put_reject payload buffer post (nab)
- Fix iscsit_add_reject* usage for iser (nab)
- Fix iscsit_sequence_cmd reject handling for iser (nab)
- Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab)
- Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab)
The last five iscsi/iser-target items are CC'ed to stable, as they do
address issues present in v3.10 code. They are certainly larger than
I'd like for stable patch set, but are important to ensure proper
REJECT exception handling in iser-target for 3.10.y"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
target: make queue_tm_rsp() return void
target: remove unused codes from enum tcm_tmrsp_table
iscsi-target: kstrtou* configfs attribute parameter cleanups
iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow
iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow
iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
iser-target: Add vendor_err debug output
target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10)
target: Return correct sense data for IO past the end of a device
target: Add tracepoints for SCSI commands being processed
iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
iscsi-target: Fix iscsit_add_reject* usage for iser
iser-target: Fix isert_put_reject payload buffer post
iscsi-target: missing kfree() on error path
iscsi-target: Drop left-over iscsi_conn->bad_hdr
target: Make core_scsi3_update_and_write_aptpl return sense_reason_t
...
Now, vq->private_data is always accessed under vq mutex. No need to play
the vhost rcu trick.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The return value wasn't checked by any of the callers. Assuming this is
correct behaviour, we can simplify some code by not bothering to
generate it.
nab: Add srpt_queue_data_in() + srpt_queue_tm_rsp() nops around
srpt_queue_response() void return
Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
$ make C=1 M=drivers/vhost
drivers/vhost/net.c:168:5: warning: symbol 'vhost_net_set_ubuf_info' was not declared. Should it be static?
drivers/vhost/net.c:194:6: warning: symbol 'vhost_net_vq_reset' was not declared. Should it be static?
drivers/vhost/scsi.c:219:6: warning: symbol 'tcm_vhost_done_inflight' was not declared. Should it be static?
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently, vhost-net and vhost-scsi are sharing the vhost core code.
However, vhost-scsi shares the code by including the vhost.c file
directly.
Making vhost a separate module makes it is easier to share code with
other vhost devices.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This way, we use cmd for struct tcm_vhost_cmd and evt for struct
tcm_vhost_cmd.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It was needed when struct tcm_vhost_tpg is in tcm_vhost.h
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_net_ubuf_put_and_wait has a confusing name:
it will actually also free it's argument.
Thus since commit 1280c27f8e
"vhost-net: flush outstanding DMAs on memory change"
vhost_net_flush tries to use the argument after passing it
to vhost_net_ubuf_put_and_wait, this results
in use after free.
To fix, don't free the argument in vhost_net_ubuf_put_and_wait,
add an new API for callers that want to free ubufs.
Acked-by: Asias He <asias@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch coverts vhost/scsi to se_cmd->cmd_kref TARGET_SCF_ACK_KREF
usage, instead of assuming that vhost_scsi_free_cmd() is always called
before TCM processing is completed in the response fast path.
This includes adding vhost_scsi_check_stop_free() -> target_put_sess_cmd()
to perform the second se_cmd->cmd_kref put, and moving vhost_scsi_free_cmd()
resource release into tcm_vhost_release_cmd() that is invoked once the last
se_cmd->cmd_kref put occurs.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@kernel.org>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Asias He <asias@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Moussa Ba <moussaba@micron.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch changes vhost_scsi_free_cmd() to call transport_generic_free_cmd()
with wait_for_tasks=false in order to avoid the extra se_cmd->t_state_lock
access for the wait_for_tasks=true case.
This is unnecessary because vhost_scsi_free_cmd() is only ever called by
vhost_scsi_complete_cmd_work() after TCM completion handoff, and by
vhost_scsi_handle_vq() exception code before TCM submission handoff, so
there is never a case where se_cmd is still active from TCM's perspective
when transport_generic_free_cmd() is called.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@kernel.org>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Asias He <asias@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Moussa Ba <moussaba@micron.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
vhost_net_clear_ubuf_info didn't clear ubuf_info
after kfree, this could trigger double free.
Fix this and simplify this code to make it more robust: make sure
ubuf info is always freed through vhost_net_clear_ubuf_info.
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If device has an owner, we shouldn't touch ubuf_info
since it might be in use.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we decide not use zero-copy, msg.control should be set to NULL otherwise
macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs
wrongly.
Bug were introduced by commit cedb9bdce0
(vhost-net: skip head management if no outstanding).
This solves the following warnings:
WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
ffffffffa0198323 ffff88007c9ebd08 ffffffff81796b73 ffff88007c9ebd48
ffffffff8103d66b 000000007b773e20 ffff8800779f0000 ffff8800779f43f0
ffff8800779f8418 000000000000015c 0000000000000062 ffff88007c9ebd58
Call Trace:
[<ffffffff81796b73>] dump_stack+0x19/0x1e
[<ffffffff8103d66b>] warn_slowpath_common+0x6b/0xa0
[<ffffffff8103d6b5>] warn_slowpath_null+0x15/0x20
[<ffffffffa0197627>] handle_tx+0x477/0x4b0 [vhost_net]
[<ffffffffa0197690>] handle_tx_kick+0x10/0x20 [vhost_net]
[<ffffffffa019541e>] vhost_worker+0xfe/0x1a0 [vhost_net]
[<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[<ffffffff81061f46>] kthread+0xc6/0xd0
[<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff817a1aec>] ret_from_fork+0x7c/0xb0
[<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes some minor issues in the patches that have been merged.
We also finally drop the workaround disabling event_idx
for scsi: it was always questionable, and now we
know it's not needed.
There's also a memory leak fix.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRiN83AAoJECgfDbjSjVRpf8AIAJvaQ8Fnti9abad0nzf96him
bPPy4IDj/oxXRldTIKdP9ux72U4XQpWNUsFy8//6Ogd4gC8n1hgSZH/AbH6bGbU1
39u/fpBAeIy/F9WFVwI3Cdrz3tWlBo4Via0pG2TUNGydI6Cs3UTwouwfvs0KhXrm
u1YSieAir817TWXEjwDf4e0bzsDHVZPkxH/OX8mvfn13xHGoGjYOxOo9DHi2Lhwd
aXwd3SnsjFjp/7T9U2Uqo0USzRmJMu/PqaIQAAtsOFrzZvlCw6N8y8ozQuLPjq2a
B3aUiOw+TkoTW1QbPeRk7+WE/ySqYdydOvk1qhWmz8Yy3qO6914PrZhLsfR6wiA=
=1Yfl
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull more vhost fixes from Michael Tsirkin:
"This fixes some minor issues in the patches that have been merged.
We also finally drop the workaround disabling event_idx for scsi: it
was always questionable, and now we know it's not needed.
There's also a memory leak fix"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost-scsi: Enable VIRTIO_RING_F_EVENT_IDX
vhost: drop virtio_net.h dependency
vhost-net: Cleanup vhost_ubuf and vhost_zcopy
vhost: Remove vhost_enable_zcopy in vhost.h
vhost: Remove comments for hdr in vhost.h
vhost: Move VHOST_NET_FEATURES to net.c
vhost-net: Free ubuf when vhost_dev_set_owner fails
vhost: Export vhost_dev_set_owner
It was disabled as a workaround. Now userspace bits work fine with it.
The broken version was not ever committed to QEMU, I guess the same is
true for nlkt.
So, let's enable it.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- Rename vhost_ubuf to vhost_net_ubuf
- Rename vhost_zcopy_mask to vhost_net_zcopy_mask
- Make funcs static
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It is supposed to be removed when hdr is moved into vhost_net_virtqueue.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost.h should not depend on device specific marcos like
VHOST_NET_F_VIRTIO_NET_HDR and VIRTIO_NET_F_MRG_RXBUF.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I dived into lguest again, reworking the pagetable code so we can move
the switcher page: our fixmaps sometimes take more than 2MB now...
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRga7lAAoJENkgDmzRrbjx/yIQAKpqIBtxOJeYH3SY+Uoe7Cfp
toNYcpJEldvb0UcWN8M2cSZpHoxl1SUoq9djwcM29tcKa7EZAjHaGtb/Q1qMTDgv
+B3WAfiGU2pmXFxLAkbrlLNGnysy24JspqJQ5hcYV84EiBxQdZp+nCYgOphd+GMK
ww16vo9ya8jFjzt3GeRp/Heb3vEzV4Cp6BC3i0m8A3WNpEpbRb66pqXNk5o8ggJO
SxQOKSXmUM+0m+jKSul5xn3e2Ls2LOrZZ8/DIHA+gW66N4Zab7n2/j1Q9VRxb4lh
FqnR7KwgBX8OCh9IsBDqQYS7MohvMYge6eUdLtFrq84jvMleMEhrC8q9v2tucFUb
5t18CLwvyK7Gdg6UCKiZ7YSPcuURAILO16al9bh5IseeBDsuX+43VsvQoBmFn9k6
cLOVTZ6BlOmahK5PyRYFSvLa9Rxzr/05Mr7oYq9UgshD9io78dnqczFYIORF53rW
zD7C4HuTZfYJFfNd0wAJ0RfVXnf8QvDlMdo7zPC26DSXNWqj8OexCY0qqSWUB+2F
vcfJP6NkV4fZB8aawWIFUVwc64yqtt2uPVLa7ATZWqk16PgKrchGewmw3tiEwOgu
1l7xgffTRRUIJsqaCZoXdgw3yezcKRjuUBcOxL09lDAAhc+NxWNvzZBsKp66DwDk
yZQKn0OdXnuf0CeEOfFf
=1tYL
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio & lguest updates from Rusty Russell:
"Lots of virtio work which wasn't quite ready for last merge window.
Plus I dived into lguest again, reworking the pagetable code so we can
move the switcher page: our fixmaps sometimes take more than 2MB now..."
Ugh. Annoying conflicts with the tcm_vhost -> vhost_scsi rename.
Hopefully correctly resolved.
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits)
caif_virtio: Remove bouncing email addresses
lguest: improve code readability in lg_cpu_start.
virtio-net: fill only rx queues which are being used
lguest: map Switcher below fixmap.
lguest: cache last cpu we ran on.
lguest: map Switcher text whenever we allocate a new pagetable.
lguest: don't share Switcher PTE pages between guests.
lguest: expost switcher_pages array (as lg_switcher_pages).
lguest: extract shadow PTE walking / allocating.
lguest: make check_gpte et. al return bool.
lguest: assume Switcher text is a single page.
lguest: rename switcher_page to switcher_pages.
lguest: remove RESERVE_MEM constant.
lguest: check vaddr not pgd for Switcher protection.
lguest: prepare to make SWITCHER_ADDR a variable.
virtio: console: replace EMFILE with EBUSY for already-open port
virtio-scsi: reset virtqueue affinity when doing cpu hotplug
virtio-scsi: introduce multiqueue support
virtio-scsi: push vq lock/unlock into virtscsi_vq_done
virtio-scsi: pass struct virtio_scsi to virtqueue completion function
...
Rename module and update Kconfig and Makefile.
Add alias for compatibility with old userspace
scripts if any.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
move uapi parts to vhost.h
move .c private parts to .c itself
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Move tcm_vhost.c -> scsi.c
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
RESET_OWNER ioctl would leave the fd in a bad state if
memory allocation failed: device is stopped
but owner is not reset. Make state changes
after allocating memory, such that a failed
ioctl has no effect.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On top of 'vhost: Allow device specific fields per vq', we can move device
specific fields to device virt queue from vhost virt queue.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Unlike tcm_vhost_evt requests, tcm_vhost_cmd requests are passed to the
target core system, we can not make sure all the pending requests will
be finished by flushing the virt queue.
In this patch, we do refcount for every tcm_vhost_cmd requests to make
vhost_scsi_flush() wait for all the pending requests issued before the
flush operation to be finished.
This is useful when we call vhost_scsi_clear_endpoint() to stop
tcm_vhost. No new requests will be passed to target core system because
we clear the endpoint by setting vs_tpg to NULL. And we wait for all the
old requests. These guarantee no requests will be leaked and existing
requests will be completed.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This is useful for any device who wants device specific fields per vq.
For example, tcm_vhost wants a per vq field to track requests which are
in flight on the vq. Also, on top of this we can add patches to move
things like ubufs from vhost.h out to net.c.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Everything for hotplug is ready. Let's enable the feature bit.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
In commit 365a715009 ([SCSI] virtio-scsi: hotplug support for
virtio-scsi), hotplug support is added to virtio-scsi.
This patch adds hotplug and hotunplug support to tcm_vhost.
You can create or delete a LUN in targetcli to hotplug or hotunplug a
LUN in guest.
Signed-off-by: Asias He <asias@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We want to use tcm_vhost_mutex to make sure hotplug/hotunplug will not
happen when set_endpoint/clear_endpoint is in process.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
drivers/net/ethernet/intel/igb/igb_main.c
drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
include/net/scm.h
net/batman-adv/routing.c
net/ipv4/tcp_input.c
The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.
The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.
An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.
Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.
Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.
Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull SCSI target fixes from Nicholas Bellinger:
"Here are remaining target-pending items for v3.9-rc7 code.
The tcm_vhost patches are more than I'd usually include in a -rc7
pull, but are changes required for v3.9 to work correctly with the
pending vhost-scsi-pci QEMU upstream series merge. (Paolo CC'ed)
Plus Asias's conversion to use vhost_virtqueue->private_data + RCU for
managing vhost-scsi endpoints has gotten alot of review + testing over
the past weeks, and MST has ACKed the full series.
Also, there is a target patch to fix a long-standing bug within
control CDB handling with Standby/Offline/Transition ALUA port access
states, that had been incorrectly rejecting the control CDBs required
for LUN scan to work during these port group states. CC'ing to
stable."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix incorrect fallthrough of ALUA Standby/Offline/Transition CDBs
tcm_vhost: Send bad target to guest when cmd fails
tcm_vhost: Add vhost_scsi_send_bad_target() helper
tcm_vhost: Fix tv_cmd leak in vhost_scsi_handle_vq
tcm_vhost: Remove double check of response
tcm_vhost: Initialize vq->last_used_idx when set endpoint
tcm_vhost: Use vq->private_data to indicate if the endpoint is setup
tcm_vhost: Use ACCESS_ONCE for vs->vs_tpg[target] access
After commit 2b8b328b61 (vhost_net: handle polling
errors when setting backend), we in fact track the polling state through
poll->wqh, so there's no need to duplicate the work with an extra
vhost_net_polling_state. So this patch removes this and make the code simpler.
This patch also removes the all tx starting/stopping code in tx path according
to Michael's suggestion.
Netperf test shows almost the same result in stream test, but gets improvements
on TCP_RR tests (both zerocopy or copy) especially on low load cases.
Tested between multiqueue kvm guest and external host with two direct
connected 82599s.
zerocopy disabled:
sessions|transaction rates|normalize|
before/after/+improvements
1 | 9510.24/11727.29/+23.3% | 693.54/887.68/+28.0% |
25| 192931.50/241729.87/+25.3% | 2376.80/2771.70/+16.6% |
50| 277634.64/291905.76/+5% | 3118.36/3230.11/+3.6% |
zerocopy enabled:
sessions|transaction rates|normalize|
before/after/+improvements
1 | 7318.33/11929.76/+63.0% | 521.86/843.30/+61.6% |
25| 167264.88/242422.15/+44.9% | 2181.60/2788.16/+27.8% |
50| 272181.02/294347.04/+8.1% | 3071.56/3257.85/+6.1% |
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send bad target to guest in case:
1) we can not allocate the cmd
2) fail to submit the cmd
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Share the send bad target code with other use cases.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
If we fail to submit the allocated tv_vmd to tcm_vhost_submission_work,
we will leak the tv_vmd. Free tv_vmd on fail path.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We did the length of response check twice.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch fixes guest hang when booting seabios and guest.
[ 0.576238] scsi0 : Virtio SCSI HBA
[ 0.616754] virtio_scsi virtio1: request:id 0 is not a head!
vq->last_used_idx is initialized only when /dev/vhost-scsi is
opened or closed.
vhost_scsi_open -> vhost_dev_init() -> vhost_vq_reset()
vhost_scsi_release() -> vhost_dev_cleanup -> vhost_vq_reset()
So, when guest talks to tcm_vhost after seabios does, vq->last_used_idx
still contains the old valule for seabios. This confuses guest.
Fix this by calling vhost_init_used() to init vq->last_used_idx when
we set endpoint.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Currently, vs->vs_endpoint is used indicate if the endpoint is setup or
not. It is set or cleared in vhost_scsi_set_endpoint() or
vhost_scsi_clear_endpoint() under the vs->dev.mutex lock. However, when
we check it in vhost_scsi_handle_vq(), we ignored the lock.
Instead of using the vs->vs_endpoint and the vs->dev.mutex lock to
indicate the status of the endpoint, we use per virtqueue
vq->private_data to indicate it. In this way, we can only take the
vq->mutex lock which is per queue and make the concurrent multiqueue
process having less lock contention. Further, in the read side of
vq->private_data, we can even do not take the lock if it is accessed in
the vhost worker thread, because it is protected by "vhost rcu".
(nab: Do s/VHOST_FEATURES/~VHOST_SCSI_FEATURES)
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
In vhost_scsi_handle_vq:
tv_tpg = vs->vs_tpg[target];
if (!tv_tpg) {
....
return
}
tv_cmd = vhost_scsi_allocate_cmd(tv_tpg, &v_req,
1) vs->vs_tpg[target] might change after the NULL check and 2) the above
line might access tv_tpg from vs->vs_tpg[target]. To prevent 2), use
ACCESS_ONCE. Thanks mst for catching this up!
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pull SCSI target fixes from Nicholas Bellinger:
"This includes the bug-fix for a >= v3.8-rc1 regression specific to
iscsi-target persistent reservation conflict handling (CC'ed to
stable), and a tcm_vhost patch to drop VIRTIO_RING_F_EVENT_IDX usage
so that in-flight qemu vhost-scsi-pci device code can detect the
proper vhost feature bits.
Also, there are two more tcm_vhost patches still being discussed by
MST and Asias for v3.9 that will be required for the in-flight qemu
vhost-scsi-pci device patch to function properly, and that should
(hopefully) be the last target fixes for this round."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix RESERVATION_CONFLICT status regression for iscsi-target special case
tcm_vhost: Avoid VIRTIO_RING_F_EVENT_IDX feature bit
This patch adds a VHOST_SCSI_FEATURES mask minus VIRTIO_RING_F_EVENT_IDX
so that vhost-scsi-pci userspace will strip this feature bit once
GET_FEATURES reports it as being unsupported on the host.
This is to avoid a bug where ->handle_kicks() are missed when EVENT_IDX
is enabled by default in userspace code.
(mst: Rename to VHOST_SCSI_FEATURES + add comment)
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pull SCSI target fixes from Nicholas Bellinger:
"These are mostly minor fixes this time around. The iscsi-target CHAP
big-endian bugfix and bump FD_MAX_SECTORS=2048 default patch to allow
1MB sized I/Os for FILEIO backends on >= v3.5 code are both CC'ed to
stable.
Also, there is a persistent reservations regression that has recently
been reported for >= v3.8.x code, that is currently being tracked down
for v3.9."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target/pscsi: Reject cross page boundary case in pscsi_map_sg
target/file: Bump FD_MAX_SECTORS to 2048 to handle 1M sized I/Os
tcm_vhost: Flush vhost_work in vhost_scsi_flush()
tcm_vhost: Add missed lock in vhost_scsi_clear_endpoint()
target: fix possible memory leak in core_tpg_register()
target/iscsi: Fix mutual CHAP auth on big-endian arches
target_core_sbc: use noop for SYNCHRONIZE_CACHE
Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).
This abstracts the business of dealing with the virtio ring layout
from the access (userspace or direct); to do this, we use function
pointers, which gcc inlines correctly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
We also need to flush the vhost_works. It is the completion vhost_work
currently.
Signed-off-by: Asias He <asias@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
tv_tpg->tv_tpg_vhost_count should be protected by tv_tpg->tv_tpg_mutex.
Signed-off-by: Asias He <asias@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
ubuf info allocator uses guest controlled head as an index,
so a malicious guest could put the same head entry in the ring twice,
and we will get two callbacks on the same value.
To fix use upend_idx which is guaranteed to be unique.
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull scsi target updates from Nicholas Bellinger:
"The highlights in this series include:
- Improve sg_table lookup scalability in RAMDISK_MCP (martin)
- Add device attribute to expose config name for INQUIRY model (tregaron)
- Convert tcm_vhost to use lock-less list for cmd completion (asias)
- Add tcm_vhost support for multiple target's per endpoint (asias)
- Add tcm_vhost support for multiple queues per vhost (asias)
- Add missing mapped_lun bounds checking during make_mappedlun setup
in generic fabric configfs code (jan engelhardt + nab)
- Enforce individual iscsi-target network portal export once per
TargetName endpoint (grover + nab)
- Add WRITE_SAME w/ UNMAP=0 emulation to FILEIO backend (nab)
Things have been mostly quiet this round, with majority of the work
being done on the iser-target WIP driver + associated iscsi-target
refactoring patches currently in flight for v3.10 code.
At this point there is one patch series left outstanding from Asias to
add support for UNMAP + WRITE_SAME w/ UNMAP=1 to FILEIO awaiting
feedback from hch & Co, that will likely be included in a post
v3.9-rc1 PULL request if there are no objections.
Also, there is a regression bug recently reported off-list that seems
to be effecting v3.5 and v3.6 kernels with MSFT iSCSI initiators that
is still being tracked down. No word if this effects >= v3.7 just
yet, but if so there will likely another PULL request coming your
way.."
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (26 commits)
target: Rename spc_get_write_same_sectors -> sbc_get_write_same_sectors
target/file: Add WRITE_SAME w/ UNMAP=0 emulation support
iscsi-target: Enforce individual network portal export once per TargetName
iscsi-target: Refactor iscsit_get_np sockaddr matching into iscsit_check_np_match
target: Add missing mapped_lun bounds checking during make_mappedlun setup
target: Fix lookup of dynamic NodeACLs during cached demo-mode operation
target: Fix parameter list length checking in MODE SELECT
target: Fix error checking for UNMAP commands
target: Fix sense data for out-of-bounds IO operations
target_core_rd: break out unterminated loop during copy
tcm_vhost: Multi-queue support
tcm_vhost: Multi-target support
target: Add device attribute to expose config_item_name for INQUIRY model
target: don't truncate the fail intr address
target: don't always say "ipv6" as address type
target/iblock: Use backend REQ_FLUSH hint for WriteCacheEnabled status
iscsi-target: make some temporary buffers larger
tcm_vhost: Optimize gup in vhost_scsi_map_to_sgl
tcm_vhost: Use iov_num_pages to calculate sgl_count
tcm_vhost: Introduce iov_num_pages
...
Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers all
over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
If you need me to provide a merged tree to handle these resolutions,
please let me know.
Other than those patches, there's not much here, some minor fixes and
updates.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9
weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW
=yWAQ
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
Other than those patches, there's not much here, some minor fixes and
updates"
Fix up trivial conflicts
* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...
This adds virtio-scsi multi-queue support to tcm_vhost. In order to use
multi-queue, guest side multi-queue support is need. It can
be found here:
https://lkml.org/lkml/2012/12/18/166
Currently, only one thread is created by vhost core code for each
vhost_scsi instance. Even if there are multi-queues, all the handling of
guest kick (vhost_scsi_handle_kick) are processed in one thread. This is
not optimal. Luckily, most of the work is offloaded to the tcm_vhost
workqueue.
Some initial perf numbers:
1 queue, 4 targets, 1 lun per target
4K request size, 50% randread + 50% randwrite: 127K/127k IOPS
4 queues, 4 targets, 1 lun per target
4K request size, 50% randread + 50% randwrite: 181K/181k IOPS
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
In order to take advantages of Paolo's multi-queue virito-scsi, we need
multi-target support in tcm_vhost first. Otherwise all the requests go
to one queue and other queues are idle.
This patch makes:
1. All the targets under the wwpn is seen and can be used by guest.
2. No need to pass the tpgt number in struct vhost_scsi_target to
tcm_vhost.ko. Only wwpn is needed.
3. We can always pass max_target = 255 to guest now, since we abort the
request who's target id does not exist.
Changes in v2:
- Handle non-contiguous tpgt
Changes in v3:
- Simplfy lock in vhost_scsi_set_endpoint
- Return -EEXIST when does not match
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We can get all the pages in one time instead of calling
gup N times.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Add a helper to calculate the number of pages needed for a iov entry.
(nab: Drop unnecessary inline)
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This drops the cmd completion list spin lock and makes the cmd
completion queue lock-less.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pull networking fixes from David Miller:
1) Revert iwlwifi reclaimed packet tracking, it causes problems for a
bunch of folks. From Emmanuel Grumbach.
2) Work limiting code in brcmsmac wifi driver can clear tx status
without processing the event. From Arend van Spriel.
3) rtlwifi USB driver processes wrong SKB, fix from Larry Finger.
4) l2tp tunnel delete can race with close, fix from Tom Parkin.
5) pktgen_add_device() failures are not checked at all, fix from Cong
Wang.
6) Fix unintentional removal of carrier off from tun_detach(),
otherwise we confuse userspace, from Michael S. Tsirkin.
7) Don't leak socket reference counts and ubufs in vhost-net driver,
from Jason Wang.
8) vmxnet3 driver gets it's initial carrier state wrong, fix from Neil
Horman.
9) Protect against USB networking devices which spam the host with 0
length frames, from Bjørn Mork.
10) Prevent neighbour overflows in ipv6 for locally destined routes,
from Marcelo Ricardo. This is the best short-term fix for this, a
longer term fix has been implemented in net-next.
11) L2TP uses ipv4 datagram routines in it's ipv6 code, whoops. This
mistake is largely because the ipv6 functions don't even have some
kind of prefix in their names to suggest they are ipv6 specific.
From Tom Parkin.
12) Check SYN packet drops properly in tcp_rcv_fastopen_synack(), from
Yuchung Cheng.
13) Fix races and TX skb freeing bugs in via-rhine's NAPI support, from
Francois Romieu and your's truly.
14) Fix infinite loops and divides by zero in TCP congestion window
handling, from Eric Dumazet, Neal Cardwell, and Ilpo Järvinen.
15) AF_PACKET tx ring handling can leak kernel memory to userspace, fix
from Phil Sutter.
16) Fix error handling in ipv6 GRE tunnel transmit, from Tommi Rantala.
17) Protect XEN netback driver against hostile frontend putting garbage
into the rings, don't leak pages in TX GOP checking, and add proper
resource releasing in error path of xen_netbk_get_requests(). From
Ian Campbell.
18) SCTP authentication keys should be cleared out and released with
kzfree(), from Daniel Borkmann.
19) L2TP is a bit too clever trying to maintain skb->truesize, and ends
up corrupting socket memory accounting to the point where packet
sending is halted indefinitely. Just remove the adjustments
entirely, they aren't really needed. From Eric Dumazet.
20) ATM Iphase driver uses a data type with the same name as the S390
headers, rename to fix the build. From Heiko Carstens.
21) Fix a typo in copying the inner network header offset from one SKB
to another, from Pravin B Shelar.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
net: sctp: sctp_endpoint_free: zero out secret key data
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
atm/iphase: rename fregt_t -> ffreg_t
net: usb: fix regression from FLAG_NOARP code
l2tp: dont play with skb->truesize
net: sctp: sctp_auth_key_put: use kzfree instead of kfree
netback: correct netbk_tx_err to handle wrap around.
xen/netback: free already allocated memory on failure in xen_netbk_get_requests
xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
xen/netback: shutdown the ring if it contains garbage.
net: qmi_wwan: add more Huawei devices, including E320
net: cdc_ncm: add another Huawei vendor specific device
ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
tcp: fix for zero packets_in_flight was too broad
brcmsmac: rework of mac80211 .flush() callback operation
ssb: unregister gpios before unloading ssb
bcma: unregister gpios before unloading bcma
rtlwifi: Fix scheduling while atomic bug
net: usbnet: fix tx_dropped statistics
tcp: ipv6: Update MIB counters for drops
...
It's OK to get kick before backend is set or after
it is cleared, we can just ignore it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Currently, the polling errors were ignored, which can lead following issues:
- vhost remove itself unconditionally from waitqueue when stopping the poll,
this may crash the kernel since the previous attempt of starting may fail to
add itself to the waitqueue
- userspace may think the backend were successfully set even when the polling
failed.
Solve this by:
- check poll->wqh before trying to remove from waitqueue
- report polling errors in vhost_poll_start(), tx_poll_start(), the return value
will be checked and returned when userspace want to set the backend
After this fix, there still could be a polling failure after backend is set, it
will addressed by the next patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when vhost_init_used() fails the sock refcnt and ubufs were
leaked. Correct this by calling vhost_init_used() before assign ubufs and
restore the oldsock when it fails.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull target updates from Nicholas Bellinger:
"It has been a very busy development cycle this time around in target
land, with the highlights including:
- Kill struct se_subsystem_dev, in favor of direct se_device usage
(hch)
- Simplify reservations code by combining SPC-3 + SCSI-2 support for
virtual backends only (hch)
- Simplify ALUA code for virtual only backends, and remove left over
abstractions (hch)
- Pass sense_reason_t as return value for I/O submission path (hch)
- Refactor MODE_SENSE emulation to allow for easier addition of new
mode pages. (roland)
- Add emulation of MODE_SELECT (roland)
- Fix bug in handling of ExpStatSN wrap-around (steve)
- Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve)
- Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab)
- Convert ib_srpt to use modern target_submit_cmd caller + drop
legacy ioctx->kref usage (nab)
- Convert ib_srpt to use modern target_submit_tmr caller (nab)
- Add link_magic for fabric allow_link destination target_items for
symlinks within target_core_fabric_configfs.c code (nab)
- Allocate pointers in instead of full structs for
config_group->default_groups (sebastian)
- Fix 32-bit highmem breakage for FILEIO (sebastian)
All told, hch was able to shave off another ~1K LOC by killing the
se_subsystem_dev abstraction, along with a number of PR + ALUA
simplifications. Also, a nice patch by Roland is the refactoring of
MODE_SENSE handling, along with the addition of initial MODE_SELECT
emulation support for virtual backends.
Sebastian found a long-standing issue wrt to allocation of full
config_group instead of pointers for config_group->default_group[]
setup in a number of areas, which ends up saving memory with big
configurations. He also managed to fix another long-standing BUG wrt
to broken 32-bit highmem support within the FILEIO backend driver.
Thank you again to everyone who contributed this round!"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
target/iscsi_target: Add NodeACL tags for initiator group support
target/tcm_fc: fix the lockdep warning due to inconsistent lock state
sbp-target: fix error path in sbp_make_tpg()
sbp-target: use simple assignment in tgt_agent_rw_agent_state()
iscsi-target: use kstrdup() for iscsi_param
target/file: merge fd_do_readv() and fd_do_writev()
target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping
target: Add link_magic for fabric allow_link destination target_items
ib_srpt: Convert TMR path to target_submit_tmr
ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
target: Make spc_get_write_same_sectors return sector_t
target/configfs: use kmalloc() instead of kzalloc() for default groups
target/configfs: allocate only 6 slots for dev_cg->default_groups
target/configfs: allocate pointers instead of full struct for default_groups
target: update error handling for sbc_setup_write_same()
iscsit: use GFP_ATOMIC under spin lock
iscsi_target: Remove redundant null check before kfree
target/iblock: Forward declare bio helpers
target: Clean up flow in transport_check_aborted_status()
target: Clean up logic in transport_put_cmd()
...
Pull trivial branch from Jiri Kosina:
"Usual stuff -- comment/printk typo fixes, documentation updates, dead
code elimination."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
HOWTO: fix double words typo
x86 mtrr: fix comment typo in mtrr_bp_init
propagate name change to comments in kernel source
doc: Update the name of profiling based on sysfs
treewide: Fix typos in various drivers
treewide: Fix typos in various Kconfig
wireless: mwifiex: Fix typo in wireless/mwifiex driver
messages: i2o: Fix typo in messages/i2o
scripts/kernel-doc: check that non-void fcts describe their return value
Kernel-doc: Convention: Use a "Return" section to describe return values
radeon: Fix typo and copy/paste error in comments
doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
various: Fix spelling of "asynchronous" in comments.
Fix misspellings of "whether" in comments.
eisa: Fix spelling of "asynchronous".
various: Fix spelling of "registered" in comments.
doc: fix quite a few typos within Documentation
target: iscsi: fix comment typos in target/iscsi drivers
treewide: fix typo of "suport" in various comments and Kconfig
treewide: fix typo of "suppport" in various comments
...
The variable se_sess is initialized but never used
otherwise, so remove the unused variable.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Zero copy TX has been around for a while now.
We seem to be down to eliminating theoretical bugs
and performance tuning at this point:
it's probably time to enable it by default so that
most users get the benefit.
Keep the flag around meanwhile so users can experiment
with disabling this if they experience regressions.
I expect that we will remove it in the future.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For short packets zerocopy mode adds overhead
of managing heads which isn't necessary: we
could simly update used ring directly
same as with zerocopy disabled.
Things seem to run a bit faster if we detect
and bypass head management when zcopy isn't used.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When memory map changes, we need to flush outstanding
DMAs as they might in theory reference old memory addresses.
To do this simply stop initiating new DMAs
and wait for ubufs ref count to drop to 0.
Afterwards reset the count back to 1.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vring changes already do a flush internally where appropriate, so we do
not need a second flush.
It's currently not very expensive but a follow-up patch makes flush more
heavy-weight, so remove the extra flush here to avoid regressing
performance if call or kick fds are changed on data path.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These packet counters are used to drive the zercopy
selection heuristic so nothing too bad happens if they are off a bit -
and they are also reset once in a while.
But it's cleaner to clear them when backend is set so that
we start in a known state.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a single descriptor crosses a region, the
second chunk length should be decremented
by size translated so far, instead it includes
the full descriptor length.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>