OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	dae8f283bf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "These are mostly minor fixes, with the exception of the following that address fall-out from recent v4.1-rc1 changes: - regression fix related to the big fabric API registration changes and configfs_depend_item() usage, that required cherry-picking one of HCH's patches from for-next to address the issue for v4.1 code. - remaining TCM-USER -v2 related changes to enforce full CDB passthrough from Andy + Ilias. Also included is a target_core_pscsi driver fix from Andy that addresses a long standing issue with a Scsi_Host reference being leaked on PSCSI device shutdown" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: iser-target: Fix error path in isert_create_pi_ctx() target: Use a PASSTHROUGH flag instead of transport_types target: Move passthrough CDB parsing into a common function target/user: Only support full command pass-through target/user: Update example code for new ABI requirements target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem target: Drop signal_pending checks after interruptible lock acquire target: Add missing parentheses target: Fix bidi command handling target/user: Disallow full passthrough (pass_level=0) ISCSI: fix minor memory leak	2015-05-31 11:31:42 -07:00
Roland Dreier	b2feda4feb	iser-target: Fix error path in isert_create_pi_ctx() We don't assign pi_ctx to desc->pi_ctx until we're certain to succeed in the function. That means the cleanup path should use the local pi_ctx variable, not desc->pi_ctx. This was detected by Coverity (CID 1260062). Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-05-30 20:01:04 -07:00
Bart Van Assche	8f71c1a27b	IPoIB/CM: Fix indentation level See also patch "IPoIB/cm: Add connected mode support for devices without SRQs" (commit ID `68e995a295`). Detected by smatch. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-05-05 13:21:27 -04:00
Linus Torvalds	c6668726d2	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "Lots of activity in target land the last months. The highlights include: - Convert fabric drivers tree-wide to target_register_template() (hch + bart) - iser-target hardening fixes + v1.0 improvements (sagi) - Convert iscsi_thread_set usage to kthread.h + kill iscsi_target_tq.c (sagi + nab) - Add support for T10-PI WRITE_STRIP + READ_INSERT operation (mkp + sagi + nab) - DIF fixes for CONFIG_DEBUG_SG=y + UNMAP file emulation (akinobu + sagi + mkp) - Extended TCMU ABI v2 for future BIDI + DIF support (andy + ilias) - Fix COMPARE_AND_WRITE handling for NO_ALLLOC drivers (hch + nab) Thanks to everyone who contributed this round with new features, bug-reports, fixes, cleanups and improvements. Looking forward, it's currently shaping up to be a busy v4.2 as well" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (69 commits) target: Put TCMU under a new config option target: Version 2 of TCMU ABI target: fix tcm_mod_builder.py target/file: Fix UNMAP with DIF protection support target/file: Fix SG table for prot_buf initialization target/file: Fix BUG() when CONFIG_DEBUG_SG=y and DIF protection enabled target: Make core_tmr_abort_task() skip TMFs target/sbc: Update sbc_dif_generate pr_debug output target/sbc: Make internal DIF emulation honor ->prot_checks target/sbc: Return INVALID_CDB_FIELD if DIF + sess_prot_type disabled target: Ensure sess_prot_type is saved across session restart target/rd: Don't pass incomplete scatterlist entries to sbc_dif_verify_* target: Remove the unused flag SCF_ACK_KREF target: Fix two sparse warnings target: Fix COMPARE_AND_WRITE with SG_TO_MEM_NOALLOC handling target: simplify the target template registration API target: simplify target_xcopy_init_pt_lun target: remove the unused SCF_CMD_XCOPY_PASSTHROUGH flag target/rd: reduce code duplication in rd_execute_rw() tcm_loop: fixup tpgt string to integer conversion ...	2015-04-24 10:22:09 -07:00
Linus Torvalds	7c034dfd58	InfiniBand/RDMA updates for 4.1: - IPoIB fixes from Doug Ledford and Erez Shitrit - iSER updates from Sagi Grimberg - mlx4 GUID handling changes from Yishai Hadas - other misc fixes Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA updates from Roland Dreier: - IPoIB fixes from Doug Ledford and Erez Shitrit - iSER updates from Sagi Grimberg - mlx4 GUID handling changes from Yishai Hadas - other misc fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (51 commits) mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures IB/iser: Rewrite bounce buffer code path IB/iser: Bump version to 1.6 IB/iser: Remove code duplication for a single DMA entry IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr IB/iser: Modify struct iser_mem_reg members IB/iser: Make fastreg pool cache friendly IB/iser: Move PI context alloc/free to routines IB/iser: Move fastreg descriptor pool get/put to helper functions IB/iser: Merge build page-vec into register page-vec IB/iser: Get rid of struct iser_rdma_regd IB/iser: Remove redundant assignments in iser_reg_page_vec IB/iser: Move memory reg/dereg routines to iser_memory.c IB/iser: Don't pass ib_device to fall_to_bounce_buff routine IB/iser: Remove a redundant struct iser_data_buf IB/iser: Remove redundant cmd_data_len calculation IB/iser: Fix wrong calculation of protection buffer length IB/iser: Handle fastreg/local_inv completion errors IB/iser: Fix unload during ep_poll wrong dereference ib_srpt: convert printk's to pr_* functions ...	2015-04-22 11:50:05 -07:00
Erez Shitrit	2c15395974	IB/ipoib: Fix ndo_get_iflink Currently, iflink of the parent interface was always accessed, even when the interface didn't have a parent, and hence we crashed there. Handle the interface types properly: for a child interface, return the ifindex of the parent; for a parent interface, return its own ifindex. For child devices, make sure to set the parent pointer prior to invoking register_netdevice(); this allows the new ndo to be called by the stack immediately after the child device is registered. Fixes: `5aa7add8f1` ('infiniband/ipoib: implement ndo_get_iflink') Reported-by: Honggang Li <honli@redhat.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Honggang Li <honli@redhat.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-17 15:21:04 -04:00
Doug Ledford	c1c2fef6cf	Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' into for-4.1	2015-04-15 16:24:49 -04:00
Sagi Grimberg	ba943fb237	IB/iser: Rewrite bounce buffer code path In some rare cases, IO operations may be not aligned to page boundaries. This prevents iser from performing fast memory registration. In order to overcome that iser uses a bounce buffer to carry the transaction. We basically allocate a buffer in the size of the transaction and perform a copy. The buffer allocation using kmalloc is too restrictive since it requires higher order (atomic) allocations for large transactions (which may result in memory exhaustion fairly fast for some workloads). We rewrite the bounce buffer code path to allocate scattered pages and perform a copy between the transaction sg and the bounce sg. Reported-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:14 -04:00
Sagi Grimberg	4fcd1470a0	IB/iser: Bump version to 1.6 Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	ad1e567242	IB/iser: Remove code duplication for a single DMA entry In singleton scatterlists, DMA memory registration code is taken both for Fastreg and FMR code paths. Move it to a function. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	6ef8bb837d	IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr Instead of passing ib_sge as output variable, we pass the mem_reg pointer to have the routines fill the rkey as well. This reduces code duplication and extra assignments. This is a preparation step to unify some registration logics together. Also, pass iser_fast_reg_mr the fastreg descriptor directly. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	90a6684c30	IB/iser: Modify struct iser_mem_reg members No need to keep lkey, va, len variables, we can keep them as struct ib_sge. This will help when we change the memory registration logic. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	8b95aa2c1b	IB/iser: Make fastreg pool cache friendly Memory regions are resources that are saved in the device caches. Increase the probability for a cache hit by adding the MRU descriptor to pool head. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	4dec2a27e3	IB/iser: Move PI context alloc/free to routines Make iser_[create\|destroy]_fastreg_desc shorter, more readable and easily extendable. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	bd8b944eee	IB/iser: Move fastreg descriptor pool get/put to helper functions Instead of open-coding connection fastreg pool get/put, we introduce iser_reg_desc[get\|put] helpers. We aren't setting these static as this will be a per-device routine later on. Also, cleanup iser_unreg_rdma_mem_fastreg a bit. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	f0e35c27a5	IB/iser: Merge build page-vec into register page-vec No need for these two separate. Keep it in a single routine like in the fastreg case. This will also make iser_reg_page_vec closer to iser_fast_reg_mr arguments. This is a preparation step for registration flow refactor. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	b130ededff	IB/iser: Get rid of struct iser_rdma_regd This struct members other than struct iser_mem_reg are unused, so remove it altogether. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	6847fdeb0b	IB/iser: Remove redundant assignments in iser_reg_page_vec Buffer length was assigned twice, and no reason to set va to io_addr and then add the offset, just set va to io_addr + offset. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:13 -04:00
Sagi Grimberg	d03e61d036	IB/iser: Move memory reg/dereg routines to iser_memory.c As these are memory registration/de-registration methods, let's move them to their natural location. While we're at it, make the iser_reg_page_vec routine static. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	5640832590	IB/iser: Don't pass ib_device to fall_to_bounce_buff routine No need to pass that, we can take it from the task. In a later stage, this function will be invoked according to a device capability. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	e3784bd1d9	IB/iser: Remove a redundant struct iser_data_buf No need to keep two iser_data_buf structures just in case we use mem copy. We can avoid that just by adding a pointer to the original sg. So keep only two iser_data_buf per command (data and protection) and pass the relevant data_buf to bounce buffer routine. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	ecc3993a2a	IB/iser: Remove redundant cmd_data_len calculation This code was added before we had protection data length calculation (in iser_send_command), so we needed to calc the sg data length from the sg itself. This is not needed anymore. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	a065fe6aa2	IB/iser: Fix wrong calculation of protection buffer length This length miscalculation may cause silent data corruption in the DIX case and cause the device to reference an unmapped area. Fixes: `d77e65350f` ('libiscsi, iser: Adjust data_length to include protection information') Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	30bf1d58ae	IB/iser: Handle fastreg/local_inv completion errors Fast registration and local invalidate work requests can also fail. We should call error completion handler for them. Reported-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Sagi Grimberg	c4de4663e0	IB/iser: Fix unload during ep_poll wrong dereference In case the user unloaded ib_iser while ep_connect is in progress, we need to destroy the endpoint although ep_disconnect wasn't invoked (we detect this by the iser conn state != DOWN). However, if we got an REJECTED/UNREACHABLE CM event we move the connection state to DOWN which will prevent us from destroying the endpoint in the module unload stage. Fix this by setting the connection state to TERMINATING in iser_conn_error so we can still destroy the endpoint at unload stage. Reported-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:07:12 -04:00
Doug Ledford	9f5d32af09	ib_srpt: convert printk's to pr_* functions The driver already defined the pr_format, it just hadn't been converted to use pr_info, pr_warn, and pr_err instead of the equivalent printks. Convert so that messages from the driver are now properly tagged with their driver name and can be more easily debugged. In addition, a number of these printk's were not newline terminated, so fix that at the same time. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:54 -04:00
Bart Van Assche	56b5390caf	IB/srp: Use P_Key cache for P_Key lookups This change slightly reduces the time needed to log in. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: David Dillow <dave@thedillows.org> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:54 -04:00
Erez Shitrit	0e5544d9bf	IB/ipoib: Remove IPOIB_MCAST_RUN bit After Doug Ledford's changes there is no need for that bit; its semantics become a subset of the IPOIB_FLAG_OPER_UP bit. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	1e85b806f9	IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's Whenever there is no path->ah to the destination, keep only a defined number of skb's. Otherwise there are cases in which the driver can keep an unbounded list of skb's. For example, when one device wants to send unicast arp to the destination, and for some reason the SM doesn't respond, the driver currently keeps all the skb's. If that unicast arp traffic stops, all these skb's are kept by the path object until the interface goes down. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	2c01073095	IB/ipoib: Handle QP in SQE state As the result of a completion error, the QP can be moved to the SQE state by the hardware. Since that is not the Error state, there are no flushes and hence the driver doesn't know about it. The fix creates a task that, after a completion with an error that is not a flush, checks the QP state and, if it is in the SQE state, moves it back to RTS. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:19 -04:00
Erez Shitrit	3fd0605caa	IB/ipoib: Update broadcast record values after each successful join request Update the cached broadcast record in the priv object after every new join of this broadcast domain group. These values are needed for the port configuration (MTU size) and to all the new multicast (non-broadcast) join requests initial parameters. For example, SM starts with 2K MTU for all the fabric, and after that it restarts (or handover to new SM) with new port configuration of 4K MTU. Without using the new values, the driver will keep its old configuration of 2K and will not apply the new configuration of 4K. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Erez Shitrit	a44878d100	IB/ipoib: Use one linear skb in RX flow The current code in the RX flow uses two sg entries for each incoming packet: the first one for the IB headers and the second for the rest of the data. That causes two dma map/unmap operations and two allocations, plus a few more actions in the data path. Use only one linear skb for each incoming packet, holding all the data (IB headers and payload). That reduces the packet processing in the data path (only one skb, no frags, the first frag was not used anyway, fewer memory allocations) and the dma handling (only one dma map/unmap per incoming packet instead of two). After commit `73d3fe6d1c` ("gro: fix aggregation for skb using frag_list") from Eric Dumazet, we will get full aggregation for large packets. When running bandwidth tests before and after the change (over the card's numa node), using "netperf -H 1.1.1.3 -T -t TCP_STREAM", the results are ~12Gb/s before and ~16Gb/s after on my setup (Mellanox's ConnectX3). Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	1c0453d64a	IB/ipoib: drop mcast_mutex usage We needed the mcast_mutex when we had to prevent the join completion callback from having the value it stored in mcast->mc overwritten by a delayed return from ib_sa_join_multicast. By storing the return of ib_sa_join_multicast in an intermediate variable, we prevent a delayed return from ib_sa_join_multicast overwriting the valid contents of mcast->mc, and we no longer need a mutex to force the join callback to run after the return of ib_sa_join_multicast. This allows us to do away with the mutex entirely and protect our critical sections with just a spinlock instead. This is highly desirable as there were some places where we couldn't use a mutex because the code was not allowed to sleep, and so we were using a mix of mutex and spinlock to protect what we needed to protect. Now we only have a spinlock and the locking complexity is greatly reduced. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	d2fe937ce6	IB/ipoib: deserialize multicast joins Allow the ipoib layer to attempt to join all outstanding multicast groups at once. The ib_sa layer will serialize multiple attempts to join the same group, but will process attempts to join different groups in parallel. Take advantage of that. In order to make this happen, change the mcast_join_thread to loop through all needed joins, sending a join request for each one that we still need to join. There are a few special cases we handle though: 1) Don't attempt to join anything but the broadcast group until the join of the broadcast group has succeeded. 2) No longer restart the join task at the end of completion handling. If we completed successfully, we are done. The join task now needs kicked either by mcast_send or mcast_restart_task or mcast_start_thread, but should not need started anytime else except when scheduling a backoff attempt to rejoin. 3) No longer use separate join/completion routines for regular and sendonly joins, pass them all through the same routine and just do the right thing based on the SENDONLY join flag. 4) Only try to join a SENDONLY join twice, then drop the packets and quit trying. We leave the mcast group in the list so that if we get a new packet, all that we have to do is queue up the packet and restart the join task and it will automatically try to join twice and then either send or flush the queue again. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	69911416d8	IB/ipoib: fix MCAST_FLAG_BUSY usage Commit `a9c8ba5884` ("IPoIB: Fix usage of uninitialized multicast objects") added a new flag MCAST_JOIN_STARTED, but was not very strict in how it was used. We didn't always initialize the completion struct before we set the flag, and we didn't always call complete on the completion struct from all paths that complete it. And when we did complete it, sometimes we continued to touch the mcast entry after the completion, opening us up to possible use after free issues. This made it less than totally effective, and certainly made its use confusing. And in the flush function we would use the presence of this flag to signal that we should wait on the completion struct, but we never cleared this flag, ever. In order to make things clearer and aid in resolving the rtnl deadlock bug I've been chasing, I cleaned this up a bit. 1) Remove the MCAST_JOIN_STARTED flag entirely 2) Change MCAST_FLAG_BUSY so it now only means a join is in-flight 3) Test mcast->mc directly to see if we have completed ib_sa_join_multicast (using IS_ERR_OR_NULL) 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize the mcast->done completion struct 5) Make sure that before calling complete(&mcast->done), we always clear the MCAST_FLAG_BUSY bit 6) Take the mcast_mutex before we call ib_sa_multicast_join and also take the mutex in our join callback. This forces ib_sa_multicast_join to return and set mcast->mc before we process the callback. This way, our callback can safely clear mcast->mc if there is an error on the join and we will do the right thing as a result in mcast_dev_flush. 7) Because we need the mutex to synchronize mcast->mc, we can no longer call mcast_sendonly_join directly from mcast_send and instead must add sendonly join processing to the mcast_join_task 8) Make MCAST_RUN mean that we have a working mcast subsystem, not that we have a running task. We know when we need to reschedule our join task thread and don't need a flag to tell us. 9) Add a helper for rescheduling the join task thread A number of different races are resolved with these changes. These races existed with the old MCAST_FLAG_BUSY usage, the MCAST_JOIN_STARTED flag was an attempt to address them, and while it helped, a determined effort could still trip things up. One race looks something like this: Thread 1 Thread 2 ib_sa_join_multicast (as part of running restart mcast task) alloc member call callback ifconfig ib0 down wait_for_completion callback call completes wait_for_completion in mcast_dev_flush completes mcast->mc is PTR_ERR_OR_NULL so we skip ib_sa_leave_multicast return from callback return from ib_sa_join_multicast set mcast->mc = return from ib_sa_multicast We now have a permanently unbalanced join/leave issue that trips up the refcounting in core/multicast.c Another like this: Thread 1 Thread 2 Thread 3 ib_sa_multicast_join ifconfig ib0 down priv->broadcast = NULL join_complete wait_for_completion mcast->mc is not yet set, so don't clear return from ib_sa_join_multicast and set mcast->mc complete return -EAGAIN (making mcast->mc invalid) call ib_sa_multicast_leave on invalid mcast->mc, hang forever By holding the mutex around ib_sa_multicast_join and taking the mutex early in the callback, we force mcast->mc to be valid at the time we run the callback. This allows us to clear mcast->mc if there is an error and the join is going to fail. We do this before we complete the mcast. In this way, mcast_dev_flush always sees consistent state in regards to mcast->mc membership at the time that the wait_for_completion() returns. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	efc82eeeae	IB/ipoib: No longer use flush as a parameter Various places in the IPoIB code had a deadlock related to flushing the ipoib workqueue. Now that we have per device workqueues and a specific flush workqueue, there is no longer a deadlock issue with flushing the device specific workqueues and we can do so unilaterally. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	0b39578bcd	IB/ipoib: Use dedicated workqueues per interface During my recent work on the rtnl lock deadlock in the IPoIB driver, I saw that even once I fixed the apparent races for a single device, as soon as that device had any children, new races popped up. It turns out that this is because no matter how well we protect against races on a single device, the fact that all devices use the same workqueue, and flush_workqueue() flushes everything from that workqueue means that we would also have to prevent all races between different devices (for instance, ipoib_mcast_restart_task on interface ib0 can race with ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on the rtnl_lock). There are several possible solutions to this problem: Make carrier_on_task and mcast_restart_task try to take the rtnl for some set period of time and if they fail, then bail. This runs the real risk of dropping work on the floor, which can end up being its own separate kind of deadlock. Set some global flag in the driver that says some device is in the middle of going down, letting all tasks know to bail. Again, this can drop work on the floor. Or the method this patch attempts to use, which is when we bring an interface up, create a workqueue specifically for that interface, so that when we take it back down, we are flushing only those tasks associated with our interface. In addition, keep the global workqueue, but now limit it to only flush tasks. In this way, the flush tasks can always flush the device specific work queues without having deadlock issues. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:18 -04:00
Doug Ledford	894021a752	IB/ipoib: Make the carrier_on_task race aware We blindly assume that we can just take the rtnl lock and that will prevent races with downing this interface. Unfortunately, that's not the case. In ipoib_mcast_stop_thread() we will call flush_workqueue() in an attempt to clear out all remaining instances of ipoib_join_task. But, since this task is put on the same workqueue as the join task, the flush_workqueue waits on this thread too. But this thread is deadlocked on the rtnl lock. The better thing here is to use trylock and loop on that until we either get the lock or we see that FLAG_OPER_UP has been cleared, in which case we don't need to do anything anyway and we just return. While investigating which flag should be used, FLAG_ADMIN_UP or FLAG_OPER_UP, it was determined that FLAG_OPER_UP was the more appropriate flag to use. However, there was a mix of these two flags in use in the existing code. So while we check for that flag here as part of this race fix, also cleanup the two places that had used the less appropriate flag for their tests. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Doug Ledford	c84ca6d2b1	IB/ipoib: Consolidate rtnl_lock tasks in workqueue The ipoib_mcast_flush_dev routine is called with the rtnl_lock held and needs to keep it held. It also needs to call flush_workqueue() to flush out any outstanding work. In the past, we've had to try and make sure that we didn't flush out any outstanding join completions because they also wanted to grab rtnl_lock() and that would deadlock. It turns out that the only thing in the join completion handler that needs this lock can be safely moved to our carrier_on_task, thereby reducing the potential for the join completion code and the flush code to deadlock against each other. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Doug Ledford	be7aa663fc	IB/ipoib: change init sequence ordering In preparation for using per device work queues, we need to move the start of the neighbor thread task to after ipoib_ib_dev_init and move the destruction of the neighbor task to before ipoib_ib_dev_cleanup. Otherwise we will end up freeing our workqueue with work possibly still on it. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Doug Ledford	e135106fac	IB/ipoib: factor out ah flushing Create ipoib_flush_ah and ipoib_stop_ah routines to use at appropriate times to flush out all remaining ah entries before we shut the device down. Because neighbors and mcast entries can each have a reference on any given ah, we must make sure to free all of those first before our ah will actually have a 0 refcount and be able to be reaped. This factoring is needed in preparation for having per-device work queues. The original per-device workqueue code resulted in the following error message: <ibdev>: ib_dealloc_pd failed That error was tracked down to this issue. With the changes to which workqueues were flushed when, there were no flushes of the per device workqueue after the last ah's were freed, resulting in an attempt to dealloc the pd with outstanding resources still allocated. This code puts the explicit flushes in the needed places to avoid that problem. Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-04-15 16:06:17 -04:00
Christoph Hellwig	9ac8928e6a	target: simplify the target template registration API Instead of calling target_fabric_configfs_init() + target_fabric_configfs_register() / target_fabric_configfs_deregister() + target_fabric_configfs_free() from every target driver, rewrite the API so that we have simple register/unregister functions that operate on a const operations vector. This patch also fixes a memory leak in several target drivers. Several target drivers namely called target_fabric_configfs_deregister() without calling target_fabric_configfs_free(). A large part of this patch is based on earlier changes from Bart Van Assche <bart.vanassche@sandisk.com>. (v2: Add a new TF_CIT_SETUP_DRV macro so that the core configfs code can declare attributes as either core only or for drivers) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-04-14 12:28:41 -07:00
Sagi Grimberg	9e35eff449	iser-target: Bump version to 1.0 Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-04-07 23:27:57 -07:00
Sagi Grimberg	dac6ab305d	iser-target: Remove conn_ prefix from struct isert_conn members These variables are always accessed via struct isert_conn so no need to have a "conn_" prefix for them. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-04-07 23:27:57 -07:00
Sagi Grimberg	992607e813	iser-target: Remove un-needed rdma_listen backlog iser target can handle as many connect requests as the fabric sends to it. This backlog should not be set as a back-pressure mechanism (which is not very useful). isert does need a back-pressure mechanism, but it should be added in isert by monitoring the number of pending established connections (will be added in a later stage). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-04-07 23:27:56 -07:00
Sagi Grimberg	57df81e3b1	iser-target: Remove redundant check on the device In iser_connect_release there is no chance that the iser device is set to NULL, if this happens we have a BUG. So use BUG_ON. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-04-07 23:27:56 -07:00
Sagi Grimberg	c6b8e9180d	iser-target: Get rid of redundant max_accept Not sure what it was used for, but there is no real need for it now as I see it. Go ahead and get rid of it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-04-07 23:27:55 -07:00
Sagi Grimberg	ae9ea9ed38	iser-target: Split some logic in isert_connect_request to routines Move login buffer alloc/free code to dedicated routines and introduce isert_conn_init which initializes the connection lists and locks. Simplifies and cleans up the code a little bit. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-04-07 23:27:55 -07:00
Sagi Grimberg	cf8ae95823	iser-target: Rename device find/release routines isert_device_find_by_ib_dev and isert_device_try_release can have a better, more common name like isert_device_[get|put]. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-04-07 23:27:54 -07:00
Sagi Grimberg	7748681bb8	iser-target: Rename send/recv completion routines Make receive/send completion handling routines symmetrical. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2015-04-07 23:27:54 -07:00


1079 Commits