If IPoIB fails to look up a path record (eg if it tries during an SM
failover when one SM is dead but the new one hasn't taken over yet), the
driver ends up with a neighbour structure but no address handle (AH).
There's no mechanism to recover from this: any further packets sent to
this destination will be silently dumped in ipoib_start_xmit().
Fix this by freeing the neighbour structures when a path rec query
fails, so that the next packet queued to be sent will trigger a new path
record query.
Signed-off-by: Roland Dreier <roland@purestorage.com>
If an SRP target is no longer reachable and srp_reset_host() fails to
reconnect then ib_srp will invoke scsi_remove_host(). That function
will invoke __scsi_remove_device() for each LUN. And that last
function will change the device state from SDEV_TRANSPORT_OFFLINE into
SDEV_CANCEL. Certain user space software, e.g. older versions of
multipathd, continue queueing I/O to SCSI devices that are in the
SDEV_CANCEL state.
If these I/O requests are submitted as SG_IO that means that the
REQ_PREEMPT flag will be set and hence that these requests will be
passed to srp_queuecommand(). These requests will time out. If new
requests are queued fast enough from user space these active requests
will prevent __scsi_remove_device() to finish.
Avoid this by failing I/O requests in the SDEV_CANCEL state if the
transport is offline. Introduce a new variable to keep track of the
transport state instead of failing requests if (!target->connected ||
target->qp_in_error), so that the SCSI error handler has a chance to
retry commands after a transport layer failure occurred.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
If a SCSI command times out it is passed to the SCSI error
handler. The SCSI error handler will try to abort the commands that
timed out. If aborting fails, a device reset will be attempted. If
the device reset also fails a host reset will be attempted. If the
host reset also fails the whole procedure will be repeated.
srp_abort() and srp_reset_device() fail for a QP in the error state.
srp_reset_host() fails after host removal has started. Hence if the
SCSI error handler gets invoked after host removal has started and
with the QP in the error state an endless loop will be triggered.
Modify the SCSI error handling functions in ib_srp as follows:
- Abort SCSI commands properly even if the QP is in the error state.
- Make srp_reset_host() reset SCSI requests even after host removal
has already started or if reconnecting fails.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
Do not send a task management function if sending will fail anyway
because either there is no RDMA/RC connection or the QP is in the
error state.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
Remove an assignment that incorrectly overwrites the connection state
update by srp_connect_target().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reuse the "SG unaligned for FMR" driver flow to make the initiator
functional when running over driver instance which doesn't support
FMRs, such as a mlx4 virtual function.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Under IO/CPU stress its possible that the FMR pool might not have a
free FMR mapping element for iSER to use because of incomplete
background unmapping processing. In that case we get -EAGAIN and the
IO is pushed back to the SCSI layer which soon retries it. No need to
be so verbose about that.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
ISER_DEF_CMD_PER_LUN was meant to be ISCSI_DEF_XMIT_CMDS_MAX, not plain 128
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If the ipoib client info isn't found on the _remove_one callback, we
must not attempt to scan the returned null list. Found by Coverity.
Signed-off-by: Itai Garbi <igarbi@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Implement version info as well as report firmware version and bus info
of the underlying IB HW device.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The hash function introduced in commit b63b70d877 ("IPoIB: Use a
private hash table for path lookup in xmit path") was designd to use
the 3 octets of the IPoIB HW address that holds the remote QPN.
However, this currently isn't the case on little-endian machines,
because the the code there uses the flags part (octet[0]) and not the
last octet of the QPN (octet[3]). Fix this.
The fix caused a checkpatch warning on line over 80 characters, to
solve that changed the name of the temp variable that holds the daddr.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
After commit b13912bbb4 ("IPoIB: Call skb_dst_drop() once skb is
enqueued for sending"), using connected mode and running multithreaded
iperf for long time, ie
iperf -c <IP> -P 16 -t 3600
results in a crash.
After the above-mentioned patch, the driver is calling skb_orphan() and
skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send()
(also in ipoib_ib.c::ipoib_send())
The problem with this is, as is written in a comment in both routines,
"it's entirely possible that the completion handler will run before we
execute anything after the post_send()." This leads to running the
skb cleanup routines simultaneously in two different contexts.
The solution is to always perform the skb_orphan() and skb_dst_drop()
before queueing the send work request. If an error occurs, then it
will be no different than the regular case where dev_free_skb_any() in
the completion path, which is assumed to be after these two routines.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently, IPoIB delays collecting send completions for TX packets in
order to batch work more efficiently. It does skb_orphan() right after
queuing the packets so that destructors run early, to avoid problems
like holding socket send buffers for too long (since we might not
collect a send completion until a long time after the packet is
actually sent).
However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks
at skb_dst() to update the PMTU when it gets a too-long packet. This
means that the packets sitting in the TX ring with uncollected send
completions are holding a reference on the dst. We've seen this lead
to pathological behavior with respect to route and neighbour GC. The
easy fix for this is to call skb_dst_drop() when we call skb_orphan().
Also, give packets sent via connected mode (CM) the same skb_orphan()
/ skb_dst_drop() treatment that packets sent via datagram mode get.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Pull target updates from Nicholas Bellinger:
"It has been a very busy development cycle this time around in target
land, with the highlights including:
- Kill struct se_subsystem_dev, in favor of direct se_device usage
(hch)
- Simplify reservations code by combining SPC-3 + SCSI-2 support for
virtual backends only (hch)
- Simplify ALUA code for virtual only backends, and remove left over
abstractions (hch)
- Pass sense_reason_t as return value for I/O submission path (hch)
- Refactor MODE_SENSE emulation to allow for easier addition of new
mode pages. (roland)
- Add emulation of MODE_SELECT (roland)
- Fix bug in handling of ExpStatSN wrap-around (steve)
- Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve)
- Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab)
- Convert ib_srpt to use modern target_submit_cmd caller + drop
legacy ioctx->kref usage (nab)
- Convert ib_srpt to use modern target_submit_tmr caller (nab)
- Add link_magic for fabric allow_link destination target_items for
symlinks within target_core_fabric_configfs.c code (nab)
- Allocate pointers in instead of full structs for
config_group->default_groups (sebastian)
- Fix 32-bit highmem breakage for FILEIO (sebastian)
All told, hch was able to shave off another ~1K LOC by killing the
se_subsystem_dev abstraction, along with a number of PR + ALUA
simplifications. Also, a nice patch by Roland is the refactoring of
MODE_SENSE handling, along with the addition of initial MODE_SELECT
emulation support for virtual backends.
Sebastian found a long-standing issue wrt to allocation of full
config_group instead of pointers for config_group->default_group[]
setup in a number of areas, which ends up saving memory with big
configurations. He also managed to fix another long-standing BUG wrt
to broken 32-bit highmem support within the FILEIO backend driver.
Thank you again to everyone who contributed this round!"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
target/iscsi_target: Add NodeACL tags for initiator group support
target/tcm_fc: fix the lockdep warning due to inconsistent lock state
sbp-target: fix error path in sbp_make_tpg()
sbp-target: use simple assignment in tgt_agent_rw_agent_state()
iscsi-target: use kstrdup() for iscsi_param
target/file: merge fd_do_readv() and fd_do_writev()
target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping
target: Add link_magic for fabric allow_link destination target_items
ib_srpt: Convert TMR path to target_submit_tmr
ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
target: Make spc_get_write_same_sectors return sector_t
target/configfs: use kmalloc() instead of kzalloc() for default groups
target/configfs: allocate only 6 slots for dev_cg->default_groups
target/configfs: allocate pointers instead of full struct for default_groups
target: update error handling for sbc_setup_write_same()
iscsit: use GFP_ATOMIC under spin lock
iscsi_target: Remove redundant null check before kfree
target/iblock: Forward declare bio helpers
target: Clean up flow in transport_check_aborted_status()
target: Clean up logic in transport_put_cmd()
...
Make it possible to disconnect the IB RC connection used by the SRP
protocol to communicate with a target.
Have the SRP transport layer create a sysfs "delete" attribute for
initiator drivers that support this functionality.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Now that SRP recreates the CM ID, QP, and CQ for each connection,
there is no need to wait for the timewait state to complete.
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
HW QP FATAL errors persist over a reset operation, but we can recover
from that by recreating the QP and associated CQs for each connection.
Creating a new QP/CQ also completely forecloses any possibility of
getting stale completions or packets on the new connection.
Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
[ updated to current code from OFED, cleaned up commit message ]
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Only queue removal work after having changed the target state
into SRP_TARGET_REMOVED and not if that state was already equal
to SRP_TARGET_REMOVED. That allows us to remove the state
SRP_TARGET_DEAD. Add a call to srp_disconnect_target() in
srp_remove_target() -- due to previous changes it is now safe to
invoke that function even if the IB connection has already
been disconnected. This change allows us to replace the target
removal code in srp_remove_one() by an (indirect) call to
srp_remove_target(). Rename srp_target_port.work into
srp_target_port.remove_work to reflect its usage.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Keep track of the connection state. Only report QP errors while
connected. Only invoke ib_send_cm_dreq() when connected so that
invoking srp_disconnect_target() after having received a DREQ does not
cause an error message to be printed.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If the RDMA RC connection is closed, tell the SCSI mid-layer to
terminate all pending commands instead of only the first.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Introduce the function srp_handle_qp_err(), change the type of
qp_in_error from int into bool and move the initialization of that
variable from srp_reconnect_target() to srp_connect_target().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Since scsi_remove_host() has been modified so that SCSI error handling
functions will no longer be invoked after scsi_remove_host() returns,
the test at the start of srp_send_tsk_mgmt() is now superfluous.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from
inside scsi_remove_host() (see the sd_shutdown() call in sd_remove()).
Make sure that these commands have a chance to reach the SCSI device.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Block the SCSI host while reconnecting instead of representing the
reconnection activity as a distinct SRP target state. This allows us
to eliminate the target state SRP_TARGET_CONNECTING.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Increase the block layer timeout for disks so that it is above the
InfiniBand transport layer timeout.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch converts the TMR path in srpt_handle_tsk_mgmt() to use
target_submit_tmr() with TARGET_SCF_ACK_KREF flag usage.
v2: Drop ununused res in target_submit_tmr (Fengguang.Wu)
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch converts the main srpt_handle_cmd() I/O path to use modern
target_submit_cmd() with TARGET_SCF_ACK_KREF flag usage. This includes
dropping the original internal ioctx->kref + srpt_put_send_ioctx() usage
in favor of target_put_sess_cmd() w/ se_cmd_t->cmd_kref within ib_srpt
response callbacks.
It also updates srpt_abort_cmd() to call target_put_sess_cmd() for
completion of aborted commands, and adds target_wait_for_sess_cmds() into
srpt_release_channel_work() to allow outstanding I/O to complete during
session shutdown.
Also, go ahead and update srpt_handle_tsk_mgmt() to make the remaining
transport_init_se_cmd() to setup the ioctx->cmd with se_tmr_req.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pass the sense reason as an explicit return value from the I/O submission
path instead of storing it in struct se_cmd and using negative return
values. This cleans up a lot of the code pathes, and with the sparse
annotations for the new sense_reason_t type allows for much better
error checking.
(nab: Convert spc_emulate_modesense + spc_emulate_modeselect to use
sense_reason_t with Roland's MODE SELECT changes)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pull scsi target updates from Nicholas Bellinger:
"Things have been calm for the most part with no new fabric drivers in
flight for v3.7 (we're up to eight now !), so this update is primarily
focused on addressing a few long-standing items within target-core and
iscsi-target fabric code.
The highlights include:
- target: Simplify fabric sense data length handling (roland)
- qla2xxx: Fix endianness of task management response code (roland)
- target: fix truncation of mode data, support zero allocation length
(paolo)
- target: Properly support zero-length commands in normal processing
path (paolo)
- iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT
PDU (ronnie + nab)
- iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG
demo-mode (ronnie + nab)
- target/file: Re-enable optional fd_buffered_io=1 operation (nab +
hch)
- iscsi-target: Add MaxXmitDataSegmenthLength forr target ->
initiator MDRSL declaration (nab)
- target: Add target_submit_cmd_map_sgls for SGL fabric memory
passthrough (nab + hch)
- tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch +
nab)
- tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab
+ hch)
The last series for adding a new target_submit_cmd_map_sgls() fabric
caller (as requested by hch) that accepts pre-allocated SGL memory
(using existing logic), along with converting tcm_loop + tcm_vhost has
only been in -next for the last days, but has gotten enough review
+testing and is clear enough a mechanical change that I think it's
reasonable to merge for -rc1 code.
Thanks again to everyone who contributed this round! Extra special
thanks to Roland (PureStorage) for tracking down the qla2xxx target
TMR response code endian issue, and to Paolo (Redhat) for resolving
the long standing zero-length CDB issues within target-core between
virtual and pSCSI backends."
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits)
iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values
iscsit: proper endianess conversions
iscsit: use the itt_t abstract type
iscsit: add missing endianess conversion in iscsit_check_inaddr_any
iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp
iscsit: mark various functions static
target/iscsi: precedence bug in iscsit_set_dataout_sequence_values()
target/usb-gadget: strlen() doesn't count the terminator
target/usb-gadget: remove duplicate initialization
tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls
target: Add control CDB READ payload zero work-around
tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls
target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough
iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode
iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength
iscsi-target: Add MaxXmitDataSegmentLength connection recovery check
iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength
iscsi-target: Enable MaxXmitDataSegmentLength operation in login path
iscsi-target: Add base MaxXmitDataSegmentLength code
target/file: Re-enable optional fd_buffered_io=1 operation
...
RX/TX CQs will now be selected from a per HCA pool. For the RX flow
this has the effect of using different interrupt vectors when using
low level drivers (such as mlx4) that map the "vector" param provided
by the ULP on CQ creation to a dedicated IRQ/MSI-X vector. This
allows the RX flow processing of IO responses to be distributed across
multiple CPUs.
QPs (--> iSER sessions) are assigned to CQs in round robin order using
the CQ with the minimum number of sessions attached to it.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
With the new netlink support in commit 862096a8bb ("IB/ipoib: Add more
rtnl_link_ops callbacks") we need ipoib_set_mode() to be available even
if connected mode isn't built. Move the function from ipoib_cm.c to
ipoib_main.c (and make a few CM-related macros available unconditonally).
This fixes the build error
drivers/built-in.o: In function 'ipoib_changelink':
ipoib_netlink.c:(.text+0x6a5fc9): undefined reference to 'ipoib_set_mode'
ipoib_netlink.c:(.text+0x6a5fe3): undefined reference to 'ipoib_set_mode'
when CONFIG_INFINIBAND_IPOIB_CM isn't set.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
/Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
fgN5n987wru91ewdX4gW
=PAcy
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes"
This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.
That said, I suspect the sequence in question should probably use
"mod_delayed_work()". I just did the minimal "don't use deprecated
functions" fixup, though.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
IB/qib: Fix local access validation for user MRs
mlx4_core: Disable SENSE_PORT for multifunction devices
mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
IB/srp: Avoid having aborted requests hang
IB/srp: Fix use-after-free in srp_reset_req()
IB/qib: Add a qib driver version
RDMA/nes: Fix compilation error when nes_debug is enabled
RDMA/nes: Print hardware resource type
RDMA/nes: Fix for crash when TX checksum offload is off
RDMA/nes: Cosmetic changes
RDMA/nes: Fix for incorrect MSS when TSO is on
RDMA/nes: Fix incorrect resolving of the loopback MAC address
mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
mlx4_core: Trivial cleanups to driver log messages
mlx4_core: Trivial readability fix: "0X30" -> "0x30"
IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
mlx4: Paravirtualize Node Guids for slaves
mlx4: Activate SR-IOV mode for IB
...
Add the rtnl_link_ops changelink and fill_info callbacks, through
which the admin can now set/get the driver mode, etc policies.
Maintain the proprietary sysfs entries only for legacy childs.
For child devices, set dev->iflink to point to the parent
device ifindex, such that user space tools can now correctly
show the uplink relation as done for vlan, macvlan, etc
devices. Pointed out by Patrick McHardy <kaber@trash.net>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to call scsi_done() for commands after we abort them.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
srp_free_req() uses the scsi_cmnd structure contents to unmap
buffers, so we must invoke srp_free_req() before we release
ownership of that structure.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz)
Commit c8c2afe360 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().
In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries. This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.
Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called. To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rtnl_link_ops to IPoIB, with the first usage being child device
create/delete through them. Childs devices are now either legacy ones,
created/deleted through the ipoib sysfs entries, or RTNL ones.
Adding support for RTNL childs involved refactoring of ipoib_vlan_add
which is now used by both the sysfs and the link_ops code.
Also, added ndo_uninit entry to support calling unregister_netdevice_queue
from the rtnl dellink entry. This required removal of calls to
ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
since the networking core will invoke ipoib_uninit which does exactly that.
Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every fabric driver has to supply a se_tfo->set_fabric_sense_len()
method, just so iSCSI can return an offset of 2. However, every fabric
driver is already allocating a sense buffer and passing it into the
target core, either via transport_init_se_cmd() or target_submit_cmd().
So instead of having iSCSI pass the start of its sense buffer into the
core and then later tell the core to skip the first 2 bytes, it seems
easier for iSCSI just to do the offset of 2 when it passes the sense
buffer into the core. Then we can drop the se_tfo->set_fabric_sense_len()
everywhere, and just add a couple of lines of code to iSCSI to set the
sense data length to the beginning of the buffer right before it sends
it over the network.
(nab: Remove .set_fabric_sense_len usage from tcm_qla2xxx_npiv_ops +
change transport_get_sense_buffer to follow v3.6-rc6 code w/o
->set_fabric_sense_len usage)
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
There are no callers of se_tfo->get_fabric_sense_len(), so we should
stop having every fabric driver implement it.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Lockdep points out a circular locking dependency betwwen the ipoib
device priv spinlock (priv->lock) and the neighbour table rwlock
(ntbl->rwlock).
In the normal path, ie neigbour garbage collection task, the neigh
table rwlock is taken first and then if the neighbour needs to be
deleted, priv->lock is taken.
However in some error paths, such as in ipoib_cm_handle_tx_wc(),
priv->lock is taken first and then ipoib_neigh_free routine is called
which in turn takes the neighbour table ntbl->rwlock.
The solution is to get rid the neigh table rwlock completely and use
only priv->lock.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If the neighbours hash table is empty when unloading the module, then
ipoib_flush_neighs(), the cleanup routine, isn't called and the
memory used for the hash table itself leaked.
To fix this, ipoib_flush_neighs() is allways called, and another
completion object is added to signal when the table is freed.
Once invoked, ipoib_flush_neighs() flushes all the neighbours (if
there are any), calls the the hash table RCU free routine, which now
signals completion of the deletion process, and waits for the last
neighbour to be freed.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Avoid a crash caused by the scmnd->scsi_done(scmnd) call in
srp_process_rsp() being invoked with scsi_done == NULL. This can
happen if a reply is received during or after a command abort.
Reported-by: Joseph Glanville <joseph.glanville@orionvm.com.au>
Reference: http://marc.info/?l=linux-rdma&m=134314367801595
Cc: <stable@vger.kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Correct spelling typos in comments in drivers/infiniband.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit b63b70d877 ("IPoIB: Use a private hash table for path lookup
in xmit path") introduced a bug where in ipoib_neigh_free() (which is
called from a few errors flows in the driver), rcu_dereference() is
invoked with the wrong pointer object, which results in a crash.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit b63b70d877 ("IPoIB: Use a private hash table for path lookup
in xmit path") introduced a bug where in ipoib_cm_destroy_tx() a CM
object is moved between lists without any supported locking. Under a
stress test, this eventually leads to list corruption and a crash.
Previously when this routine was called, callers were taking the
device priv lock. Currently this function is called from the RCU
callback associated with neighbour deletion. Fix the race by taking
the same lock we used to before.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Dave Miller <davem@davemloft.net> provided a detailed description of
why the way IPoIB is using neighbours for its own ipoib_neigh struct
is buggy:
Any time an ipoib_neigh is changed, a sequence like the following is made:
spin_lock_irqsave(&priv->lock, flags);
/*
* It's safe to call ipoib_put_ah() inside
* priv->lock here, because we know that
* path->ah will always hold one more reference,
* so ipoib_put_ah() will never do more than
* decrement the ref count.
*/
if (neigh->ah)
ipoib_put_ah(neigh->ah);
list_del(&neigh->list);
ipoib_neigh_free(dev, neigh);
spin_unlock_irqrestore(&priv->lock, flags);
ipoib_path_lookup(skb, n, dev);
This doesn't work, because you're leaving a stale pointer to the freed up
ipoib_neigh in the special neigh->ha pointer cookie. Yes, it even fails
with all the locking done to protect _changes_ to *ipoib_neigh(n), and
with the code in ipoib_neigh_free() that NULLs out the pointer.
The core issue is that read side calls to *to_ipoib_neigh(n) are not
being synchronized at all, they are performed without any locking. So
whether we hold the lock or not when making changes to *ipoib_neigh(n)
you still can have threads see references to freed up ipoib_neigh
objects.
cpu 1 cpu 2
n = *ipoib_neigh()
*ipoib_neigh() = NULL
kfree(n)
n->foo == OOPS
[..]
Perhaps the ipoib code can have a private path database it manages
entirely itself, which holds all the necessary information and is
looked up by some generic key which is available easily at transmit
time and does not involve generic neighbour entries.
See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and
<http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b>
for the full discussion.
This patch aims to solve the race conditions found in the IPoIB driver.
The patch removes the connection between the core networking neighbour
structure and the ipoib_neigh structure. In addition to avoiding the
race described above, it allows us to handle SKBs carrying IP packets
that don't have any associated neighbour.
We add an ipoib_neigh hash table with N buckets where the key is the
destination hardware address. The ipoib_neigh is fetched from the
hash table and instead of the stashed location in the neighbour
structure. The hash table uses both RCU and reference counting to
guarantee that no ipoib_neigh instance is ever deleted while in use.
Fetching the ipoib_neigh structure instance from the hash also makes
the special code in ipoib_start_xmit that handles remote and local
bonding failover redundant.
Aged ipoib_neigh instances are deleted by a garbage collection task
that runs every M seconds and deletes every ipoib_neigh instance that
was idle for at least 2*M seconds. The deletion is safe since the
ipoib_neigh instances are protected using RCU and reference count
mechanisms.
The number of buckets (N) and frequency of running the GC thread (M),
are taken from the exported arb_tbl.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- Updates to the qib low-level driver
- First chunk of changes for SR-IOV support for mlx4 IB
- RDMA CM support for IPv6-only binding
- Other misc cleanups and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
+JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
I7h9iJ1OMKFZ8bzmHsWb
=f09j
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- Updates to the qib low-level driver
- First chunk of changes for SR-IOV support for mlx4 IB
- RDMA CM support for IPv6-only binding
- Other misc cleanups and fixes
Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c
* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
IB/qib: checkpatch fixes
IB/qib: Add congestion control agent implementation
IB/qib: Reduce sdma_lock contention
IB/qib: Fix an incorrect log message
IB/qib: Fix QP RCU sparse warnings
mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
mlx4_core: Allow guests to have IB ports
mlx4_core: Implement mechanism for reserved Q_Keys
net/mlx4_core: Free ICM table in case of error
IB/cm: Destroy idr as part of the module init error flow
mlx4_core: Remove double function declarations
IB/mlx4: Fill the masked_atomic_cap attribute in query device
IB/mthca: Fill in sq_sig_type in query QP
IB/mthca: Warning about event for non-existent QPs should show event type
IB/qib: Fix sparse RCU warnings in qib_keys.c
net/mlx4_core: Initialize IB port capabilities for all slaves
mlx4: Use port management change event instead of smp_snoop
IB/qib: RCU locking for MR validation
IB/qib: Avoid returning EBUSY from MR deregister
IB/qib: Fix UC MR refs for immediate operations
...
Pull networking changes from David S Miller:
1) Remove the ipv4 routing cache. Now lookups go directly into the FIB
trie and use prebuilt routes cached there.
No more garbage collection, no more rDOS attacks on the routing
cache. Instead we now get predictable and consistent performance,
no matter what the pattern of traffic we service.
This has been almost 2 years in the making. Special thanks to
Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who
have helped along the way.
I'm sure that with a change of this magnitude there will be some
kind of fallout, but such things ought the be simple to fix at this
point. Luckily I'm not European so I'll be around all of August to
fix things :-)
The major stages of this work here are each fronted by a forced
merge commit whose commit message contains a top-level description
of the motivations and implementation issues.
2) Pre-demux of established ipv4 TCP sockets, saves a route demux on
input.
3) TCP SYN/ACK performance tweaks from Eric Dumazet.
4) Add namespace support for netfilter L4 conntrack helpers, from Gao
Feng.
5) Add config mechanism for Energy Efficient Ethernet to ethtool, from
Yuval Mintz.
6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet.
7) Support for connection tracker helpers in userspace, from Pablo
Neira Ayuso.
8) Allow userspace driven TX load balancing functions in TEAM driver,
from Jiri Pirko.
9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with
embedded gotos.
10) TCP Small Queues, essentially minimize the amount of TCP data queued
up in the packet scheduler layer. Whereas the existing BQL (Byte
Queue Limits) limits the pkt_sched --> netdevice queuing levels,
this controls the TCP --> pkt_sched queueing levels.
From Eric Dumazet.
11) Reduce the number of get_page/put_page ops done on SKB fragments,
from Alexander Duyck.
12) Implement protection against blind resets in TCP (RFC 5961), from
Eric Dumazet.
13) Support the client side of TCP Fast Open, basically the ability to
send data in the SYN exchange, from Yuchung Cheng.
Basically, the sender queues up data with a sendmsg() call using
MSG_FASTOPEN, then they do the connect() which emits the queued up
fastopen data.
14) Avoid all the problems we get into in TCP when timers or PMTU events
hit a locked socket. The TCP Small Queues changes added a
tcp_release_cb() that allows us to queue work up to the
release_sock() caller, and that's what we use here too. From Eric
Dumazet.
15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits)
genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
r8169: revert "add byte queue limit support".
ipv4: Change rt->rt_iif encoding.
net: Make skb->skb_iif always track skb->dev
ipv4: Prepare for change of rt->rt_iif encoding.
ipv4: Remove all RTCF_DIRECTSRC handliing.
ipv4: Really ignore ICMP address requests/replies.
decnet: Don't set RTCF_DIRECTSRC.
net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.
ipv4: Remove redundant assignment
rds: set correct msg_namelen
openvswitch: potential NULL deref in sample()
tcp: dont drop MTU reduction indications
bnx2x: Add new 57840 device IDs
tcp: avoid oops in tcp_metrics and reset tcpm_stamp
niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
niu: Fix to check for dma mapping errors.
net: Fix references to out-of-scope variables in put_cmsg_compat()
net: ethernet: davinci_emac: add pm_runtime support
net: ethernet: davinci_emac: Remove unnecessary #include
...
Pull target updates from Nicholas Bellinger:
"There have been lots of work in a number of areas this past round.
The highlights include:
- Break out target_core_cdb.c emulation into SPC/SBC ops (hch)
- Add a parse_cdb method to target backend drivers (hch)
- Move sync_cache + write_same + unmap into spc_ops (hch)
- Use target_execute_cmd for WRITEs in iscsi_target + srpt (hch)
- Offload WRITE I/O backend submission in tcm_qla2xxx + tcm_fc (hch +
nab)
- Refactor core_update_device_list_for_node() into enable/disable
funcs (agrover)
- Replace the TCM processing thread with a TMR work queue (hch)
- Fix regression in transport_add_device_to_core_hba from TMR
conversion (DanC)
- Remove racy, now-redundant check of sess_tearing_down with qla2xxx
(roland)
- Add range checking, fix reading of data len + possible underflow in
UNMAP (roland)
- Allow for target_submit_cmd() returning errors + convert fabrics
(roland + nab)
- Drop bogus struct file usage for iSCSI/SCTP (viro)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (54 commits)
iscsi-target: Drop bogus struct file usage for iSCSI/SCTP
target: NULL dereference on error path
target: Allow for target_submit_cmd() returning errors
target: Check number of unmap descriptors against our limit
target: Fix possible integer underflow in UNMAP emulation
target: Fix reading of data length fields for UNMAP commands
target: Add range checking to UNMAP emulation
target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE
target: Make unnecessarily global se_dev_align_max_sectors() static
target: Remove se_session.sess_wait_list
qla2xxx: Remove racy, now-redundant check of sess_tearing_down
target: Check sess_tearing_down in target_get_sess_cmd()
sbp-target: Consolidate duplicated error path code in sbp_handle_command()
target: Un-export target_get_sess_cmd()
qla2xxx: Get rid of redundant qla_tgt_sess.tearing_down
target: Make core_disable_device_list_for_node use pre-refactoring lock ordering
target: refactor core_update_device_list_for_node()
target: Eliminate else using boolean logic
target: Misc retval cleanups
target: Remove hba param from core_dev_add_lun
...
This will be used so that we can compose a full flow key.
Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.
In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
srpt_handle_rdma_comp is called from kthread context and thus can execute
target_execute_cmd directly. srpt_abort_cmd sets the CMD_T_LUN_STOP
flag directly, and thus the abuse of transport_generic_handle_data can be
replaced with an opencoded variant of that code path. I'm still not happy
about a fabric driver poking into target core internals like this, but
let's defer the bigger architecture changes for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Conflicts:
net/batman-adv/bridge_loop_avoidance.c
net/batman-adv/bridge_loop_avoidance.h
net/batman-adv/soft-interface.c
net/mac80211/mlme.c
With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).
The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in
skb_try_coalesce()
This warning tracks drivers that incorrectly set skb->truesize
IPoIB indeed allocates a full page to store a fragment, but only
accounts in skb->truesize the used part of the page (frame length)
This patch fixes skb truesize underestimation, and
also fixes a performance issue, because RX skbs have not enough tailroom
to allow IP and TCP stacks to pull their header in skb linear part
without an expensive call to pskb_expand_head()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Shlomo Pongartz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
- Add generic and mlx4 support for "raw" QPs: allow suitably privileged
applications to send and receive arbitrary packets directly to/from
the hardware
- Add "doorbell drop" handling to the cxgb4 driver
- A fairly large batch of qib hardware driver changes
- A few fixes for lockdep-detected issues
- A few other miscellaneous fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
syKfN0VjiDRtJ+pxayi3
=yWfr
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
- Add generic and mlx4 support for "raw" QPs: allow suitably privileged
applications to send and receive arbitrary packets directly to/from
the hardware
- Add "doorbell drop" handling to the cxgb4 driver
- A fairly large batch of qib hardware driver changes
- A few fixes for lockdep-detected issues
- A few other miscellaneous fixes and cleanups
Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.
* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
IB/mlx4: Fix mlx4_ib_add() error flow
IB/core: Fix IB_SA_COMP_MASK macro
IB/iser: Fix error flow in iser ep connection establishment
IB/mlx4: Increase the number of vectors (EQs) available for ULPs
RDMA/cxgb4: Add query_qp support
RDMA/cxgb4: Remove kfifo usage
RDMA/cxgb4: Use vmalloc() for debugfs QP dump
RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
RDMA/cxgb4: Add DB Overflow Avoidance
RDMA/cxgb4: Add debugfs RDMA memory stats
cxgb4: DB Drop Recovery for RDMA and LLD queues
cxgb4: Common platform specific changes for DB Drop Recovery
cxgb4: Detect DB FULL events and notify RDMA ULD
RDMA/cxgb4: Drop peer_abort when no endpoint found
RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
mlx4_core: Change bitmap allocator to work in round-robin fashion
RDMA/nes: Don't call event handler if pointer is NULL
RDMA/nes: Fix for the ORD value of the connecting peer
...
Pull scsi-target changes from Nicholas Bellinger:
"There has been lots of work in existing code in a number of areas this
past cycle. The major highlights have been:
* Removal of transport_do_task_sg_chain() from core + fabrics
(Roland)
* target-core: Removal of se_task abstraction from target-core and
enforce hw_max_sectors for pSCSI backends (hch)
* Re-factoring of iscsi-target tx immediate/response queues (agrover)
* Conversion of iscsi-target back to using target core memory
allocation logic (agrover)
We've had one last minute iscsi-target patch go into for-next to
address a nasty regression bug related to the target core allocation
logic conversion from agrover that is not included in friday's
linux-next build, but has been included in this series.
On the new fabric module code front for-3.5, here is a brief status
update for the three currently in flight this round:
* usb-gadget target driver:
Sebastian Siewior's driver for supporting usb-gadget target mode
operation. This will be going out as a separate PULL request from
target-pending/usb-target-merge with subsystem maintainer ACKs. There
is one minor target-core patch in this series required to function.
* sbp ieee-1394/firewire target driver:
Chris Boot's driver for supportting the Serial Block Protocol (SBP)
across IEEE-1394 Firewire hardware. This will be going out as a
separate PULL request from target-pending/sbp-target-merge with two
additional drivers/firewire/ patches w/ subsystem maintainer ACKs.
* qla2xxx LLD target mode infrastructure changes + tcm_qla2xxx:
The Qlogic >= 24xx series HW target mode LLD infrastructure patch-set
and tcm_qla2xxx fabric driver. Support for FC target mode using
qla2xxx LLD code has been officially submitted by Qlogic to James
below, and is currently outstanding but not yet merged into
scsi.git/for-next..
[PATCH 00/22] qla2xxx: Updates for scsi "misc" branch
http://www.spinics.net/lists/linux-scsi/msg59350.html
Note there are *zero* direct dependencies upon this for-next series
for the qla2xxx LLD target + tcm_qla2xxx patches submitted above, and
over the last days the target mode team has been tracking down an
tcm_qla2xxx specific active I/O shutdown bug that appears to now be
almost squashed for 3.5-rc-fixes."
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (47 commits)
iscsi-target: Fix iov_count calculation bug in iscsit_allocate_iovecs
iscsi-target: remove dead code in iscsi_check_valuelist_for_support
target: Handle ATA_16 passthrough for pSCSI backend devices
target: Add MI_REPORT_TARGET_PGS ext. header + implict_trans_secs attribute
target: Fix MAINTENANCE_IN service action CDB checks to use lower 5 bits
target: add support for the WRITE_VERIFY command
target: make target_put_session void
target: cleanup transport_execute_tasks()
target: Remove max_sectors device attribute for modern se_task less code
target: lock => unlock typo in transport_lun_wait_for_tasks
target: Enforce hw_max_sectors for SCF_SCSI_DATA_SG_IO_CDB
target: remove the t_se_count field in struct se_cmd
target: remove the t_task_cdbs_ex_left field in struct se_cmd
target: remove the t_task_cdbs_left field in struct se_cmd
target: remove struct se_task
target: move the state and execute lists to the command
target: simplify command to task linkage
target: always allocate a single task
target: replace ->execute_task with ->execute_cmd
target: remove the task_sectors field in struct se_task
...
The current error flow code was releasing the IB connection object and
calling iscsi_destroy_endpoint() directly without going through the
reference counting mechanism introduced in commit 39ff05d ("IB/iser:
Enhance disconnection logic for multi-pathing"). This resulted in a
double free of the iscsi endpoint object, which causes a kernel NULL
pointer dereference. Fix that by plugging into the IB conn reference
counting correctly.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch renames a horribly misnamed function that no longer allocate
tasks to something more descriptive for it's modern use in target core.
(nab: Fix up ib_srpt to use this as well ahead of a target_submit_cmd
conversion)
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
With the modern target core, se_cmd->t_data_sg already points to a
sglist that covers the whole command. So task_sg chaining is needless
overhead and obfuscation -- instead of splicing the split up task
sglists back into one list, we can just use the original list directly.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Since commit 96104eda01 ("RDMA/core: Add SRQ type field"), kernel
users of SRQs need to specify srq_type = IB_SRQT_BASIC in struct
ib_srq_init_attr, or else most low-level drivers will fail in
when srpt_add_one() calls ib_create_srq() and gets -ENOSYS.
(mlx4_ib works OK nearly all of the time, because it just needs
srq_type != IB_SRQT_XRC. And apparently nearly everyone using
ib_srpt is using mlx4 hardware)
Reported-by: Alexey Shvetsov <alexxy@gentoo.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Pull SCSI target updates from Nicholas Bellinger:
"This contains the usual set of updates and bugfixes to target-core +
existing fabric module code, along with a handful of the patches
destined for v3.3 stable.
It also contains the necessary target-core infrastructure pieces
required to run using tcm_qla2xxx.ko WWPNs with the new Qlogic Fibre
Channel fabric module currently queued in target-pending/for-next-merge,
and coming for round 2.
The highlights for this series include:
- Add target_submit_tmr() helper function for fabric task management
(andy)
- Convert tcm_fc to use target_submit_tmr() (andy)
- Replace target core various cmd flags with a transport state (hch)
- Convert loopback to use workqueue submission (hch)
- Convert target core to use array_zalloc for tpg_lun_list (joern)
- Convert target core to use array_zalloc for device_list (joern)
- Add target core support for TMR_ABORT_TASK (nab)
- Add target core se_sess->sess_kref + get/put helpers (nab)
- Add target core se_node_acl->acl_kref for ->acl_free_comp usage
(nab)
- Convert iscsi-target to use target_put_session + sess_kref (nab)
- Fix tcm_fc fc_exch memory leak in ft_send_resp_status (nab)
- Fix ib_srpt srpt_handle_cmd send_ioctx->ioctx_kref leak on
exception (nab)
- Fix target core up handling of short INQUIRY buffers (roland)
- Untangle target-core front-end and back-end meanings of max_sectors
attribute (roland)
- Set loopback residual field for SCSI commands (roland)
- Fix target-core 16-bit target ports for SET TARGET PORT GROUPS
emulation (roland)
Thanks again to Andy, Christoph, Joern, Roland, and everyone who has
contributed this round!"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (64 commits)
ib_srpt: Fix srpt_handle_cmd send_ioctx->ioctx_kref leak on exception
loopback: Fix transport_generic_allocate_tasks error handling
iscsi-target: remove improper externs
iscsi-target: Remove unused variables in iscsi_target_parameters.c
target: remove obvious warnings
target: Use array_zalloc for device_list
target: Use array_zalloc for tpg_lun_list
target: Fix sense code for unsupported SERVICE ACTION IN
target: Remove hack to make READ CAPACITY(10) lie if thin provisioning is enabled
target: Bump core version to v4.1.0-rc2-ml + fabric versions
tcm_fc: Fix fc_exch memory leak in ft_send_resp_status
target: Drop unused legacy target_core_fabric_ops API callers
iscsi-target: Convert to use target_put_session + sess_kref
target: Convert se_node_acl->acl_group removal to use ->acl_kref
target: Add se_node_acl->acl_kref for ->acl_free_comp usage
target: Add se_node_acl->acl_free_comp for NodeACL release path
target: Add se_sess->sess_kref + get/put helpers
target: Convert session_lock to irqsave
target: Fix typo in drivers/target
iscsi-target: Fix dynamic -> explict NodeACL pointer reference
...
stands out; by patch count lots of fixes to the mlx4 driver plus some
cleanups and fixes to the core and other drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
3etEZZRwROAyYKcwN01p
=DbLx
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
"Nothing big really stands out; by patch count lots of fixes to the
mlx4 driver plus some cleanups and fixes to the core and other
drivers."
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
mlx4_core: Scale size of MTT table with system RAM
mlx4_core: Allow dynamic MTU configuration for IB ports
IB/mlx4: Fix info returned when querying IBoE ports
IB/mlx4: Fix possible missed completion event
mlx4_core: Report thermal error events
mlx4_core: Fix one more static exported function
IB: Change CQE "csum_ok" field to a bit flag
RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
RDMA/cxgb3: Don't pass irq flags to flush_qp()
mlx4_core: Get rid of redundant ext_port_cap flags
RDMA/ucma: Fix AB-BA deadlock
IB/ehca: Fix ilog2() compile failure
IB: Use central enum for speed instead of hard-coded values
IB/iser: Post initial receive buffers before sending the final login request
IB/iser: Free IB connection resources in the proper place
IB/srp: Consolidate repetitive sysfs code
IB/srp: Use pr_fmt() and pr_err()/pr_warn()
IB/core: Fix SDR rates in sysfs
mlx4: Enforce device max FMR maps in FMR alloc
IB/mlx4: Set bad_wr for invalid send opcode
...
Pull kmap_atomic cleanup from Cong Wang.
It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().
Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.
* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
drbd: remove the second argument of k[un]map_atomic()
zcache: remove the second argument of k[un]map_atomic()
gma500: remove the second argument of k[un]map_atomic()
dm: remove the second argument of k[un]map_atomic()
tomoyo: remove the second argument of k[un]map_atomic()
sunrpc: remove the second argument of k[un]map_atomic()
rds: remove the second argument of k[un]map_atomic()
net: remove the second argument of k[un]map_atomic()
mm: remove the second argument of k[un]map_atomic()
lib: remove the second argument of k[un]map_atomic()
power: remove the second argument of k[un]map_atomic()
kdb: remove the second argument of k[un]map_atomic()
udf: remove the second argument of k[un]map_atomic()
ubifs: remove the second argument of k[un]map_atomic()
squashfs: remove the second argument of k[un]map_atomic()
reiserfs: remove the second argument of k[un]map_atomic()
ocfs2: remove the second argument of k[un]map_atomic()
ntfs: remove the second argument of k[un]map_atomic()
...
Pull trivial tree from Jiri Kosina:
"It's indeed trivial -- mostly documentation updates and a bunch of
typo fixes from Masanari.
There are also several linux/version.h include removals from Jesper."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
kcore: fix spelling in read_kcore() comment
constify struct pci_dev * in obvious cases
Revert "char: Fix typo in viotape.c"
init: fix wording error in mm_init comment
usb: gadget: Kconfig: fix typo for 'different'
Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
writeback: fix typo in the writeback_control comment
Documentation: Fix multiple typo in Documentation
tpm_tis: fix tis_lock with respect to RCU
Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
Doc: Update numastat.txt
qla4xxx: Add missing spaces to error messages
compiler.h: Fix typo
security: struct security_operations kerneldoc fix
Documentation: broken URL in libata.tmpl
Documentation: broken URL in filesystems.tmpl
mtd: simplify return logic in do_map_probe()
mm: fix comment typo of truncate_inode_pages_range
power: bq27x00: Fix typos in comment
...
This patch addresses a bug in srpt_handle_cmd() failure handling where
send_ioctx->kref is being leaked with the local extra reference after init,
causing the expected kref_put() in srpt_handle_send_comp() to not be the final
call to invoke srpt_put_send_ioctx_kref() -> transport_generic_free_cmd() and
perform se_cmd descriptor memory release.
It also fixes a SCF_SCSI_RESERVATION_CONFLICT handling bug where this code
is incorrectly falling through to transport_handle_cdb_direct() after
invoking srpt_queue_status() to send SAM_STAT_RESERVATION_CONFLICT status.
Note this patch is for >= v3.3 mainline code, and current lio-core.git
code has already been converted to target_submit_cmd() + se_cmd->cmd_kref usage,
and internal ioctx->kref usage has been removed. I'm including this patch
now into target-pending/for-next with a CC' for v3.3 stable.
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch drops the following unused legacy API callers from target_core_fabric.h:
*) TFO->fall_back_to_erl0()
*) TFO->stop_session()
*) TFO->sess_logged_in()
*) TFO->is_state_remove()
This patch also removes the stub usage in loopback, tcm_fc, iscsi_target,
and ib_srpt fabric modules.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Use a bit in wc_flags rather then a whole integer to hold the
"checksum OK" flag. By itself, this change doesn't reduce the size of
struct ib_wc on 64bit machines -- it stays on 56 bytes because of
padding. However, it will allow to add more fields in the future
without enlarging the struct. Also, it will let us have a unified
approach with future libibverbs checksum offload reporting, because a
bit flag doesn't break the library ABI.
This patch was suggested during conversation with Liran Liss
<liranl@mellanox.com>.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
An iser target may send iscsi NO-OP PDUs as soon as it marks the iSER
iSCSI session as fully operative. This means that there is window
where there are no posted receive buffers on the initiator side, so
it's possible for the iSER RC connection to break because of RNR NAK /
retry errors. To fix this, rely on the flags bits in the login
request to have FFP (0x3) in the lower nibble as a marker for the
final login request, and post an initial chunk of receive buffers
before sending that login request instead of after getting the login
response.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We allocate the login dma buffers in iser_verbs.c as part of
alloc_ib_conn_resources(), however we are freeing them in
iser_initiator.c as part of iser_free_rx_descriptors(). This is
needlessly confusing. We have an alloc_rx_descriptors() and it
doesn't alloc something that the free_rx_descriptors() frees, and we
have an alloc_ib_conn_resources() that allocs something not freed by
free_ib_conn_resources(). Clean that up.
Signed-off-by: Doug Ledford <dledford@redhat.com>
[ Fix build error in iser_free_ib_conn_res(). - Or ]
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Remove sysfs attributes before removing a target instead of testing
the target state in every sysfs attribute callback method. Note: it is
safe to invoke a sysfs attribute removal method like
device_remove_file() twice on the same attribute.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use pr_fmt() and pr_xxx() instead of more verbose printk() equivalents.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Change the test for if a cmd is a tmr request to checking if
SCF_SCSI_TMR_CDB (a new flag) is set in cmd->se_cmd_flags.
Also remove se_tmr_req_cache usage in favor of kzalloc usage,
and make core_tmr_alloc_req() return int + setup se_cmd->se_tmr_req
directly and fix up various fabric module usages
Cc: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Replace various atomic_ts used as flags in struct se_cmd with a single
transport_state bitmap that requires t_state_lock to be held for modifications.
In the target core that assumption generally is true, but some recently added
code in the SRP target had to grow new lock calls. I can't say I like the way
how it messes with the command state directly, but let's leave that for later.
(Re-add missing ib_srpt.c changes that nab dropped..)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Quoth David:
1) GRO MAC header comparisons were ethernet specific, breaking other
link types. This required a multi-faceted fix to cure the originally
noted case (Infiniband), because IPoIB was lying about it's actual
hard header length. Thanks to Eric Dumazet, Roland Dreier, and
others.
2) Fix build failure when INET_UDP_DIAG is built in and ipv6 is modular.
From Anisse Astier.
3) Off by ones and other bug fixes in netprio_cgroup from Neil Horman.
4) ipv4 TCP reset generation needs to respect any network interface
binding from the socket, otherwise route lookups might give a
different result than all the other segments received. From Shawn
Lu.
5) Fix unintended regression in ipv4 proxy ARP responses, from Thomas
Graf.
6) Fix SKB under-allocation bug in sh_eth, from Yoshihiro Shimoda.
7) Revert skge PCI mapping changes that are causing crashes for some
folks, from Stephen Hemminger.
8) IPV4 route lookups fill in the wildcarded fields of the given flow
lookup key passed in, which is fine most of the time as this is
exactly what the caller's want. However there are a few cases that
want to retain the original flow key values afterwards, so handle
those cases properly. Fix from Julian Anastasov.
9) IGB/IXGBE VF lookup bug fixes from Greg Rose.
10) Properly null terminate filename passed to ethtool flash device
method, from Ben Hutchings.
11) S3 resume fix in via-velocity from David Lv.
12) Fix double SKB free during xmit failure in CAIF, from Dmitry
Tarnyagin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabled
ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.
netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m
netprio_cgroup: don't allocate prio table when a device is registered
netprio_cgroup: fix an off-by-one bug
bna: fix error handling of bnad_get_flash_partition_by_offset()
isdn: type bug in isdn_net_header()
net: Make qdisc_skb_cb upper size bound explicit.
ixgbe: ethtool: stats user buffer overrun
ixgbe: dcb: up2tc mapping lost on disable/enable CEE DCB state
ixgbe: do not update real num queues when netdev is going away
ixgbe: Fix broken dependency on MAX_SKB_FRAGS being related to page size
ixgbe: Fix case of Tx Hang in PF with 32 VFs
ixgbe: fix vf lookup
igb: fix vf lookup
e1000: add dropped DMA receive enable back in for WoL
gro: more generic L2 header check
IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses
zd1211rw: firmware needs duration_id set to zero for non-pspoll frames
net: enable TC35815 for MIPS again
...
Commit a0417fa3a1 ("net: Make qdisc_skb_cb upper size bound
explicit.") made it possible for a netdev driver to use skb->cb
between its header_ops.create method and its .ndo_start_xmit
method. Use this in ipoib_hard_header() to stash away the LL address
(GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows
IPoIB to stop lying about its hard_header_len, which will let us fix
the L2 check for GRO.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
transport_init_session() and target_fabric_configfs_init() don't
return NULL pointers, they only return ERR_PTRs or valid pointers.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
reiserfs: Properly display mount options in /proc/mounts
vfs: prevent remount read-only if pending removes
vfs: count unlinked inodes
vfs: protect remounting superblock read-only
vfs: keep list of mounts for each superblock
vfs: switch ->show_options() to struct dentry *
vfs: switch ->show_path() to struct dentry *
vfs: switch ->show_devname() to struct dentry *
vfs: switch ->show_stats to struct dentry *
switch security_path_chmod() to struct path *
vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
vfs: trim includes a bit
switch mnt_namespace ->root to struct mount
vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
vfs: opencode mntget() mnt_set_mountpoint()
vfs: spread struct mount - remaining argument of next_mnt()
vfs: move fsnotify junk to struct mount
vfs: move mnt_devname
vfs: move mnt_list to struct mount
vfs: switch pnode.h macros to struct mount *
...
This patch adds the kernel module ib_srpt SCSI RDMA Protocol (SRP) target
implementation conforming to the SRP r16a specification for the mainline
drivers/target infrastructure.
This driver was originally developed by Vu Pham and has been optimized by
Bart Van Assche and merged into upstream LIO based on his srpt-lio-4.1
branch here:
https://github.com/bvanassche/srpt-lio/commits/srpt-lio-4.1/
This updated patch also contains the following two changes from
lio-core-2.6.git/master. One is to fix a bug with 1 >= task->task_sg[]
chained mappings in ib_srpt, and the other to convert the configfs control
plane to reference IB Port GUID and struct srpt_port directly following
mainline v4.x target_core_fabric_configfs.c convertion for ib_srpt
to work with rtslib/rtsadmin v2 code.
These seperate patches can be found here:
ib_srpt: Fix bug with chainged SGLs in srpt_map_sg_to_ib_sge
http://www.risingtidesystems.com/git/?p=lio-core-2.6.git;a=commitdiff;h=ea485147563b6555a97dbf811825fbb586519252
ib_srpt: Convert se_wwn endpoint reference to struct srpt_port->port_wwn
http://www.risingtidesystems.com/git/?p=lio-core-2.6.git;a=commitdiff;h=4e544a210acb227df1bb4ca5086e65bdf4e648ea
This also includes the following recent v1 -> v2 review changes:
ib_srpt: Fix potential out-of-bounds array access
ib_srpt: Avoid failed multipart RDMA transfers
ib_srpt: Fix srpt_alloc_fabric_acl failure case return value
ib_srpt: Update comments to reference $driver/$port layout
ib_srpt: Fix sport->port_guid formatting code
ib_srpt: Remove legacy use_port_guid_in_session_name module parameter
ib_srpt: Convert srp_max_rdma_size into per port configfs attribute
ib_srpt: Convert srp_max_rsp_size into per port configfs attribute
ib_srpt: Convert srpt_sq_size into per port configfs attribute
and v2 -> v3 review changes:
ib_srpt: Fix possible race with srp_sq_size in srpt_create_ch_ib
ib_srpt: Fix possible race with srp_max_rsp_size in srpt_release_channel_work
ib_srpt: Fix up MAX_SRPT_RDMA_SIZE define
ib_srpt: Make srpt_map_sg_to_ib_sge() failure case return -EAGAIN
ib_srpt: Convert port_guid to use subnet_prefix + interface_id formatting
ib_srpt: Make srpt_check_stop_free return kref_put status
ib_srpt: Make compilation with BUG=n proceed`
ib_srpt: Use new target_core_fabric.h include
ib_srpt: Check hex2bin() return code to silence build warning
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Vu Pham <vu@mellanox.com>
Cc: David Dillow <dillowda@ornl.gov>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
Reduce the number of dst_get_neighbour_noref() calls within a single
call chain. Primarily by passing the neighbour pointer down to the
helper functions.
Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit()
by incrementing the dropped counter and freeing the packet. We don't
want it to fall through into the ARP/RARP/multicast handling, since
that should only happen when skb_dst() is NULL.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
netdev->neigh_priv_len records the private area length.
This will trigger for neigh_table objects which set tbl->entry_size
to zero, and the first instances of this will be forthcoming.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f2c31e32b3 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.
Many thanks to Marc Aurele who provided a nice bug report and feedback.
Reported-by: Marc Aurele La France <tsi@ualberta.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This following can occur with ipoib when processing a multicast reponse:
BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
Modules linked in: ...
CPU 0:
Modules linked in: ...
Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
RIP: 0010:[<ffffffff814ddb27>] [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20
RSP: 0018:ffff8802119ed860 EFLAGS: 00000246
0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
[<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
[<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
[<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
[<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
[<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
[<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
[<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
[<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
[<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
[<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
[<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa]
[<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
[<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa]
[<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
[<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
[<ffffffff81088830>] ? worker_thread+0x170/0x2a0
[<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
[<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
[<ffffffff8108ddf6>] ? kthread+0x96/0xa0
[<ffffffff8100c1ca>] ? child_rip+0xa/0x20
Coinciding with stack trace is the following message:
ib0: ib_address_create failed
The code below in ipoib_mcast_join_finish() will note the above
failure in the address handle but otherwise continue:
ah = ipoib_create_ah(dev, priv->pd, &av);
if (!ah) {
ipoib_warn(priv, "ib_address_create failed\n");
} else {
The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast->pkt_queue and eventually
end up in ipoib_mcast_send():
if (!mcast->ah) {
if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE)
skb_queue_tail(&mcast->pkt_queue, skb);
else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop and the stage is set for the
"hung" task diagnostic as the while loop never sees a non-NULL ah, and
will do nothing to resolve.
There are GFP_ATOMIC allocates in the provider routines, so this is
possible and should be dealt with.
The test that induced the failure is associated with a host SM on the
same server during a shutdown.
This patch causes ipoib_mcast_join_finish() to exit with an error
which will flush the queued mcast packets. Nothing is done to unwind
the QP attached state so that subsequent sends from above will retry
the join.
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Reviewed-by: Gary Leshner <gary.leshner@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
The current driver never does DMA unmapping on these buffers. Fix that
by adding DMA unmapping to the task cleanup callback, and DMA mapping to
the task init function (drop the headers_initialized micro-optimization).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The driver counted on the transactional nature of iSCSI login/text
flows and used the same buffer for both the request and the response.
We also went further and did DMA mapping only once, with
DMA_FROM_DEVICE, which violates the DMA mapping API. Fix that by
using different buffers, one for requests and one for responses, and
use the correct DMA mapping direction for each.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
mlx4_core: Deprecate log_num_vlan module param
IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
IB/mlx4: Enable 4K mtu for IBoE
RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
RDMA/cxgb4: Serialize calls to CQ's comp_handler
RDMA/cxgb3: Serialize calls to CQ's comp_handler
IB/qib: Fix issue with link states and QSFP cables
IB/mlx4: Configure extended active speeds
mlx4_core: Add extended port capabilities support
IB/qib: Hold links until tuning data is available
IB/qib: Clean up checkpatch issue
IB/qib: Remove s_lock around header validation
IB/qib: Precompute timeout jiffies to optimize latency
IB/qib: Use RCU for qpn lookup
IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
IB/qib: Decode path MTU optimization
IB/qib: Optimize RC/UC code by IB operation
IPoIB: Use the right function to do DMA unmap pages
RDMA/cxgb4: Use correct QID in insert_recv_cqe()
RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
...
These files were getting the moduleparam infrastructure from the
implicit presence of module.h being everywhere, but that is going
away soon.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
These were getting it implicitly via device.h --> module.h but
we are going to stop that when we clean up the headers.
Fix these in advance so the tree remains biscect-clean.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.
Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pages that were mapped using ib_dma_map_page() should be unmapped
using ib_dma_unmap_page().
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second. Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use new function ib_rate_to_mbps() to handle printing rate in debugfs,
so that we handle extended rates.
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's host attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's session attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and drivers to use the attribute container
sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC3270 mandates that iSCSI PDUs are padded to the closest integer
number of four byte words. Fix the iser code to support that on both
the TX/RX flows.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The code that prepares the SG associated with SCSI command for FMR was
buggy for systems with DMA addresses that don't fit in unsigned long,
e.g under the 32-bit based XenServer dom0 sizeof(dma_addr_t) is 8.
Fix that by casting to unsigned long long a masking constant used by
the code. This resolves a crash in iser_sg_to_page_vec on this system.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors
kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage
iscsi-target: Add iSCSI fabric support for target v4.1
iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h
iscsi: Use struct scsi_lun in iscsi structs instead of u8[8]
iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch renames the following iscsi_proto.h structures to avoid
namespace issues with drivers/target/iscsi/iscsi_target_core.h:
*) struct iscsi_cmd -> struct iscsi_scsi_req
*) struct iscsi_cmd_rsp -> struct iscsi_scsi_rsp
*) struct iscsi_login -> struct iscsi_login_req
This patch includes useful ISCSI_FLAG_LOGIN_[CURRENT,NEXT]_STAGE*,
and ISCSI_FLAG_SNACK_TYPE_* definitions used by iscsi_target_mod, and
fixes the incorrect definition of struct iscsi_snack to following
RFC-3720 Section 10.16. SNACK Request.
Also, this patch updates libiscsi, iSER, be2iscsi, and bn2xi to
use the updated structure definitions in a handful of locations.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
IB/qib: Defer HCA error events to tasklet
mlx4_core: Bump the driver version to 1.0
RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
IB/mlx4: Support PMA counters for IBoE
IB/mlx4: Use flow counters on IBoE ports
IB/pma: Add include file for IBA performance counters definitions
mlx4_core: Add network flow counters
mlx4_core: Fix location of counter index in QP context struct
mlx4_core: Read extended capabilities into the flags field
mlx4_core: Extend capability flags to 64 bits
IB/mlx4: Generate GID change events in IBoE code
IB/core: Add GID change event
RDMA/cma: Don't allow IPoIB port space for IBoE
RDMA: Allow for NULL .modify_device() and .modify_port() methods
IB/qib: Update active link width
IB/qib: Fix potential deadlock with link down interrupt
IB/qib: Add sysfs interface to read free contexts
IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
IB/qib: Remove double define
IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
...
SCSI scanning of a channel🆔lun triplet in Linux works as follows
(function scsi_scan_target() in drivers/scsi/scsi_scan.c):
- If lun == SCAN_WILD_CARD, send a REPORT LUNS command to the target
and process the result.
- If lun != SCAN_WILD_CARD, send an INQUIRY command to the LUN
corresponding to the specified channel🆔lun triplet to verify
whether the LUN exists.
So a SCSI driver must either take the channel and target id values in
account in its quecommand() function or it should declare that it only
supports one channel and one target id.
Currently the ib_srp driver does neither. As a result scanning the
SCSI bus via e.g. rescan-scsi-bus.sh causes many duplicate SCSI
devices to be created. For each 0:0:L device, several duplicates are
created with the same LUN number and with (C:I) != (0:0). Fix this by
declaring that the ib_srp driver only supports one channel and one
target id.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/cma: Save PID of ID's owner
RDMA/cma: Add support for netlink statistics export
RDMA/cma: Pass QP type into rdma_create_id()
RDMA: Update exported headers list
RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
RDMA/nes: Add a check for strict_strtoul()
RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
RDMA/cxgb4: Use completion objects for event blocking
IB/srp: Fix integer -> pointer cast warnings
IB: Add devnode methods to cm_class and umad_class
IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
IB/uverbs: Add devnode method to set path/mode
RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
RDMA: Add netlink infrastructure
RDMA: Add error handling to ib_core_init()
The RDMA CM currently infers the QP type from the port space selected
by the user. In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type. For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.
Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix
drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_handle_recv':
drivers/infiniband/ulp/srp/ib_srp.c:1150: warning: cast to pointer from integer of different size
drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_send_completion':
drivers/infiniband/ulp/srp/ib_srp.c🔢 warning: cast to pointer from integer of different size
by adding an intermediate cast to uintptr_t.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: David Dillow <dillowda@ornl.gov>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB: Increase DMA max_segment_size on Mellanox hardware
IB/mad: Improve an error message so error code is included
RDMA/nes: Don't print success message at level KERN_ERR
RDMA/addr: Fix return of uninitialized ret value
IB/srp: try to use larger FMR sizes to cover our mappings
IB/srp: add support for indirect tables that don't fit in SRP_CMD
IB/srp: rework mapping engine to use multiple FMR entries
IB/srp: allow sg_tablesize to be set for each target
IB/srp: move IB CM setup completion into its own function
IB/srp: always avoid non-zero offsets into an FMR
Now that we can get larger SG lists, we can take advantage of HCAs that
allow us to use larger FMR sizes. In many cases, we can use up to 512
entries, so start there and work our way down.
Signed-off-by: David Dillow <dillowda@ornl.gov>
This allows us to guarantee the ability to submit up to 8 MB requests
based on the current value of SCSI_MAX_SG_CHAIN_SEGMENTS. While FMR will
usually condense the requests into 8 SG entries, it is imperative that
the target support external tables in case the FMR mapping fails or is
not supported.
We add a safety valve to allow targets without the needed support to
reap the benefits of the large tables, but fail in a manner that lets
the user know that the data didn't make it to the device. The user must
add "allow_ext_sg=1" to the target parameters to indicate that the
target has the needed support.
If indirect_sg_entries is not specified in the modules options, then
the sg_tablesize for the target will default to cmd_sg_entries unless
overridden by the target options.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Instead of forcing all of the S/G entries to fit in one FMR, and falling
back to indirect descriptors if that fails, allow the use of as many
FMRs as needed to map the request. This lays the groundwork for allowing
indirect descriptor tables that are larger than can fit in the command
IU, but should marginally improve performance now by reducing the number
of indirect descriptors needed.
We increase the minimum page size for the FMR pool to 4K, as larger
pages help increase the coverage of each FMR, and it is rare that the
kernel would send down a request with scattered 512 byte fragments.
This patch also move some of the target initialization code afte the
parsing of options, to keep it together with the new code that needs to
allocate memory based on the options given.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Different configurations of target software allow differing max sizes of
the command IU. Allowing this to be changed per-target allows all
targets on an initiator to get an optimal setting.
We deprecate srp_sg_tablesize and replace it with cmd_sg_entries in
preparation for allowing more indirect descriptors than can fit in the
IU.
Signed-off-by: David Dillow <dillowda@ornl.gov>
It is unclear exactly how this code works around Mellanox SRP targets,
or if the problem is on the target side or in the HCA itself. In an
abundance of caution, we should always enable the workaround.
Signed-off-by: David Dillow <dillowda@ornl.gov>
This pactch has iser export the address and port
of the endpoint.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.
This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel. A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).
Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>