OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Bhaktipriya Shridhar	f54816261c	IB/addr: Remove deprecated create_singlethread_workqueue The workqueue "addr_wq" queues a single work item &work and hence doesn't require ordering. Also, it is being used on a memory reclaim path. Hence, it has been converted to use alloc_workqueue with WQ_MEM_RECLAIM set. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:25 -04:00
Bhaktipriya Shridhar	dee9acbb32	IB/cma: Remove deprecated create_singlethread_workqueue alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "cma_wq" queues work item cma_work_handler. It has been identity converted. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:24 -04:00
Bhaktipriya Shridhar	a190d3b07c	IB/ucma: Remove deprecated create_singlethread_workqueue alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "close_wq" queues work items &ctx->close_work (maps to ucma_close_id) and &con_req_eve->close_work (maps to ucma_close_event_id). It has been identity converted. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:24 -04:00
Bhaktipriya Shridhar	01013cdf06	IB/multicast: Remove deprecated create_singlethread_workqueue alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "mcast_wq" queues work item &group->work. It has been identity converted. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:23 -04:00
Bhaktipriya Shridhar	1c99e299ba	IB/mad: Remove deprecated create_singlethread_workqueue The workqueue "ib_nl" queues work items &ib_nl_timed_work and &mad_agent_priv->local_work. It has been identity converted. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:23 -04:00
Bhaktipriya Shridhar	4534d85902	IB/sa : Remove deprecated create_singlethread_workqueue alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "ib_nl" queues work item &ib_nl_timed_work. It has been identity converted. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:22 -04:00
Aviv Heller	13eab21f92	IB/mlx5: LAG QP load balancing When LAG is active, QP tx affinity (the physical port to which a QP is affined, or the TIS in case of raw-eth) is set in a round robin fashion during state transition from RESET to INIT. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:22 -04:00
Aviv Heller	4babcf97c5	IB/mlx5: Set unique device name on LAG IB bond device name is now 'mlx5_bond_X', instead of 'mlx5_X'. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:21 -04:00
Aviv Heller	88621dfe90	IB/mlx5: Port status track LAG master, when LAG is active When LAG is active, port up/down events should be triggered by tracking the LAG master, and not one of the two slave netdevs. In the same manner, ib_query_port() should return the details of the LAG master. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:21 -04:00
Aviv Heller	9ef9c640f4	IB/mlx5: Merge vports flow steering during LAG This is done in two steps: 1) Issuing CREATE_VPORT_LAG in order to have Ethernet traffic from both ports arriving on PF0 root flowtable, so we will be able to catch all raw-eth traffic on PF0. 2) Creation of LAG demux flowtable in order to direct all non-raw-eth traffic back to its source port, assuring that normal Ethernet traffic "jumps" to the root flowtable of its RX port (non-LAG behavior). Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:20 -04:00
Aviv Heller	5ec8c83e3a	IB/mlx5: Port events in RoCE now rely on netdev events Since ib_query_port() in RoCE returns the state of its netdev as the port state, it makes sense to propagate the port up/down events to ib_core when the netdev port state changes, instead of relying on traditional core events. This also keeps both the event and ib_query_port() synchronized. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:20 -04:00
Yishai Hadas	350d0e4c7e	IB/mlx5: Track asynchronous events on a receive work queue Track asynchronous events on a receive work queue by using the mlx5_core_create_rq_tracked API. In case a fatal error has occurred letting the IB layer know about by using the ib_wq event handler. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:20 -04:00
Maor Gottlieb	466fa6d2e3	IB/mlx5: Add support of more IPv6 fields to flow steering Add support to receive Traffic Class, specific IPv6 protocol or IPv6 flow label. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:19 -04:00
Maor Gottlieb	ca0d475385	IB/mlx5: Add support in TOS and protocol to flow steering Add support to receive TOS or specific IPv4 protocol. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:19 -04:00
Maor Gottlieb	a72c6a2b0e	IB/core: Add more fields to IPv6 flow specification Add the following fields to IPv6 flow filter specification: 1. Traffic Class 2. Flow Label 3. Next Header 4. Hop Limit Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:18 -04:00
Maor Gottlieb	15dfbd6b4f	IB/uverbs: Add support to extend flow steering specifications Flow steering specifications structures were implemented as in an extensible way that allows one to add new filters and new fields to existing filters. These specifications have never been extended, therefore the kernel flow specifications size and the user flow specifications size were must to be equal. In downstream patch, the IPv4 flow specifications type is extended to support TOS and TTL fields. To support an extension we change the flow specifications size condition test to be as following: * If the user flow specifications is bigger than the kernel specifications, we verify that all the bits which not in the kernel specifications are zeros and the flow is added only with the kernel specifications fields. * Otherwise, we add flow rule only with the user specifications fields. User space filters must be aligned with 32bits. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:17 -04:00
Maor Gottlieb	c47ac6aee6	IB/mlx5: Add validation to flow specifications parsing Add validation check that all set fields in flow specification are supported by vendor. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:17 -04:00
Maor Gottlieb	1f02a09c38	IB/mlx4: Add validation to flow specifications parsing Add validation check that all set fields in flow specification are supported by vendor. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:16 -04:00
Maor Gottlieb	cc0e5d4235	IB/mlx5: Add sniffer support to steering Add support to create sniffer rule. This rule receive all incoming and outgoing packets from the port. A user could create such rule by using IB_FLOW_ATTR_SNIFFER type. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:16 -04:00
Maor Gottlieb	d9d4980af2	IB/mlx5: Increase flow table reference count in create rule Move the reference count increasing of flow table to be in create_flow_rule, it will increase the reference count for each rule creation and not for each flow. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:15 -04:00
Maor Gottlieb	dd063d0e6c	IB/mlx5: Fix coverity warning Fix covertiy warning of passing "&flow_attr" to function "create_flow_rule" which uses it as an array. In addition pass flow attributes argument as const. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:15 -04:00
Maor Gottlieb	5497adc632	IB/mlx5: Save flow table priority handler instead of index Saving the flow table priority object's pointer in the flow handle is necessary for downstream patches since the sniffer flow table isn't placed at the standard flow_db structure but in a different database. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:15 -04:00
Maor Gottlieb	7055a29471	IB/mlx5: Fix steering resource leak Fix multicast flow rule leak on adding unicast rule failure. Fixes: `038d2ef875` ('IB/mlx5: Add flow steering support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:14 -04:00
Alex Vesker	eb49ab0c5f	IB/mlx5: Add port counter support for raw packet QP Counters weren't updated due to raw packet QPs' traffic since the counter-id was not associated with the QP. Added support for associating the q-counter-id with the raw packet QP. The attachment is done only when changing RQ raw packet QP state from RST to INIT in modify-RQ command. FW support is required for the above, without this support raw packet QP counters will not count. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:14 -04:00
Alex Vesker	0680efa214	IB/mlx5: Refactor raw packet QP modify function Added a struct for modifying raw QP, this will allow modifying multiple parameters in raw packet QP RQ and can also be used for SQ in the future. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:13 -04:00
Yishai Hadas	31f69a82b4	IB/mlx5: Expose RSS related capabilities Expose RSS related capabilities on both IB and vendor channels. In addition to the IB capabilities the driver reports some extra capabilities on its vendor channel: - Bit mask of the supported types of hash functions. - Bit mask of the supported RX fields that can participate in the RX hashing. Those capabilities are applicable only when the link layer is Ethernet. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:13 -04:00
Yishai Hadas	47adf2f4f5	IB/uverbs: Expose RSS related capabilities Query RSS related attributes and return them to user-space via the extended query device uverbs command. It includes both direct ones (i.e. struct ib_uverbs_rss_caps) and max_wq_type_rq which may be used in both RSS and non RSS flows. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-07 16:54:12 -04:00
Parav Pandit	e404f945a6	IB/rxe: improved debug prints & code cleanup 1. Debugging qp state transitions and qp errors in loopback and multiple QP tests is difficult without qp numbers in debug logs. This patch adds qp number to important debug logs. 2. Instead of having rxe: prefix in few logs and not having in few logs, using uniform module name prefix using pr_fmt macro. 3. Code cleanup for various warnings reported by checkpatch for incomplete unsigned data type, line over 80 characters, return statements. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-06 13:50:04 -04:00
Stephen Bates	b9fe856e54	rdma_rxe: Ensure rdma_rxe init occurs at correct time There is a problem when CONFIG_RDMA_RXE=y and CONFIG_IPV6=y. This results in the rdma_rxe initialization occurring before the IPv6 services are ready. This patch delays the initialization of rdma_rxe until after the IPv6 services are ready. This fix is based on one proposed by Logan Gunthorpe on a much older code base. Signed-off-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-06 13:50:04 -04:00
Parav Pandit	b6bbee0d24	IB/rxe: Properly honor max IRD value for rd/atomic. This patch honoris the max incoming read request count instead of outgoing read req count (a) during modify qp by allocating response queue metadata (b) during incoming read request processing Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-06 13:50:04 -04:00
Parav Pandit	d9703650f4	IB/{rxe,core,rdmavt}: Fix kernel crash for reg MR This patch fixes below kernel crash on memory registration for rxe and other transport drivers which has dma_ops extension. IB/core invokes ib_map_sg_attrs() in generic manner with dma attributes which is used by mlx5 and mthca adapters. However in doing so it ignored honoring dma_ops extension of software based transports for sg map/unmap operation. This results in calling dma_map_sg_attrs of hardware virtual device resulting in crash for null reference. We extend the core to support sg_map/unmap_attrs and transport drivers to implement those dma_ops callback functions. Verified usign perftest applications. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81032a75>] check_addr+0x35/0x60 ... Call Trace: [<ffffffff81032b39>] ? nommu_map_sg+0x99/0xd0 [<ffffffffa02b31c6>] ib_umem_get+0x3d6/0x470 [ib_core] [<ffffffffa01cc329>] rxe_mem_init_user+0x49/0x270 [rdma_rxe] [<ffffffffa01c793a>] ? rxe_add_index+0xca/0x100 [rdma_rxe] [<ffffffffa01c995f>] rxe_reg_user_mr+0x9f/0x130 [rdma_rxe] [<ffffffffa00419fe>] ib_uverbs_reg_mr+0x14e/0x2c0 [ib_uverbs] [<ffffffffa003d3ab>] ib_uverbs_write+0x15b/0x3b0 [ib_uverbs] [<ffffffff811e92a6>] ? mem_cgroup_commit_charge+0x76/0xe0 [<ffffffff811af0a9>] ? page_add_new_anon_rmap+0x89/0xc0 [<ffffffff8117e6c9>] ? lru_cache_add_active_or_unevictable+0x39/0xc0 [<ffffffff811f0da8>] __vfs_write+0x28/0x120 [<ffffffff811f1239>] ? rw_verify_area+0x49/0xb0 [<ffffffff811f1492>] vfs_write+0xb2/0x1b0 [<ffffffff811f27d6>] SyS_write+0x46/0xa0 [<ffffffff814f7d32>] entry_SYSCALL_64_fastpath+0x1a/0xa4 Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-06 13:50:04 -04:00
Parav Pandit	ffae955d49	IB/rxe: Fix sending out loopback packet on netdev interface. Both prepare4 and prepare6 sets loopback mask in pkt_info structure instance of skb. The xmit_packet and other requester side functions use a pkt_info struct from the stack without the proper mask. This results in sending out the packet to the actual netdev device and loopback functionality is broken. Modify prepare() to pass its correctly marked pkt_info struct to prepare4() and prepare6() instead of them using SKB_TO_PKT(skb) and getting an incorrectly set mask. Verified with perftest applications. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-06 13:50:04 -04:00
Parav Pandit	063af59597	IB/rxe: Avoid scheduling tasklet for userspace QP This patch avoids scheduing tasklet for WQE and protocol processing for user space QP. It performs the task in calling process context. To improve code readability kernel specific post_send handling moved to post_send_kernel() function. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-06 13:50:04 -04:00
Linus Torvalds	687ee0ad4e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and co. at Google. https://lwn.net/Articles/701165/ 2) Do TCP Small Queues for retransmits, from Eric Dumazet. 3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei Starovoitov. 4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai. 5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn. 6) Support GMAC protocol in iwlwifi mwm, from Ayala Beker. 7) Support ndo_poll_controller in mlx5, from Calvin Owens. 8) Move VRF processing to an output hook and allow l3mdev to be loopback, from David Ahern. 9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern. 10) Congestion control in RXRPC, from David Howells. 11) Support geneve RX offload in ixgbe, from Emil Tantilov. 12) When hitting pressure for new incoming TCP data SKBs, perform a partial rathern than a full purge of the OFO queue (which could be huge). From Eric Dumazet. 13) Convert XFRM state and policy lookups to RCU, from Florian Westphal. 14) Support RX network flow classification to igb, from Gangfeng Huang. 15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski. 16) New skbmod packet action, from Jamal Hadi Salim. 17) Remove some inefficiencies in snmp proc output, from Jia He. 18) Add FIB notifications to properly propagate route changes to hardware which is doing forwarding offloading. From Jiri Pirko. 19) New dsa driver for qca8xxx chips, from John Crispin. 20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej Żenczykowski. 21) Add L3 mode to ipvlan, from Mahesh Bandewar. 22) Support 802.1ad in mlx4, from Moshe Shemesh. 23) Support hardware LRO in mediatek driver, from Nelson Chang. 24) Add TC offloading to mlx5, from Or Gerlitz. 25) Convert various drivers to ethtool ksettings interfaces, from Philippe Reynes. 26) TX max rate limiting for cxgb4, from Rahul Lakkireddy. 27) NAPI support for ath10k, from Rajkumar Manoharan. 28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed. 29) UDP replicast support in TIPC, from Richard Alpe. 30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru. 31) Support BQL in thunderx driver, from Sunil Goutham. 32) TSO support in alx driver, from Tobias Regnery. 33) Add stream parser engine and use it in kcm. 34) Support async DHCP replies in ipconfig module, from Uwe Kleine-König. 35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits) mlxsw: switchx2: Fix misuse of hard_header_len mlxsw: spectrum: Fix misuse of hard_header_len net/faraday: Stop NCSI device on shutdown net/ncsi: Introduce ncsi_stop_dev() net/ncsi: Rework the channel monitoring net/ncsi: Allow to extend NCSI request properties net/ncsi: Rework request index allocation net/ncsi: Don't probe on the reserved channel ID (0x1f) net/ncsi: Introduce NCSI_RESERVED_CHANNEL net/ncsi: Avoid unused-value build warning from ia64-linux-gcc net: Add netdev all_adj_list refcnt propagation to fix panic net: phy: Add Edge-rate driver for Microsemi PHYs. vmxnet3: Wake queue from reset work i40e: avoid NULL pointer dereference and recursive errors on early PCI error qed: Add RoCE ll2 & GSI support qed: Add support for memory registeration verbs qed: Add support for QP verbs qed: PD,PKEY and CQ verb support qed: Add support for RoCE hw init qede: Add qedr framework ...	2016-10-05 10:11:24 -07:00
Linus Torvalds	ce866e2d18	First pull request of the 4.9 merge window - Updates to hfi1 driver -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJX87HPAAoJELgmozMOVy/dBToP/jb9mSa7SzrCWaBvAovw7oK2 mEnETqHkV8fYa97SiuOFnPOQsK+fWSOgC6oL0I7JiK5BC5hpovTF8gDupN4x1q2v 4akTaAMvHwwjuXitA+EFNyCJWnt3jQDRVHE0WDRWeNMICXs1JD+xS5KzbRbZgWqQ 7fZjzUcT5uChL7i62GwjqvMPkp/s6w3PthtbxQerbikYVRvRkbU4LOAARXVfgjFM EfslY8hiQFKRDZ20eWgkzPGKXEdCgacjv0Ev1NMzpdeHZFtHn+zw4xJ70VGm9ukc IMKVNjbYN2Xa1hSihpxDD5ZauPxChCG6t/IKs4Bxtiodb/vnmKX5vswwdUsgpGP/ oOVixvQO8TPdoKXIB7wotfGDKLWvwd0dIhRLgmLtPj7jdLTejDAPran7/5GOSF6o ecsj0rTsQ343yWPjIgVg8ShtSW+rVgXQcOFnoOwJUqiptsNFUZJpk6OA3tx0crOM 7lLUOezb6BI99XiBBF3jN27Zd/QEGGaCKIkkfo+laSM5LSzn9VReVFvwTlaXLXx+ AwLhyaEVgYnCsfy1DiIQIgKIkXnYiLfKEd65tVo7bGDOnMaD4e2zDux2tcd+/NK+ lz0NaJ5Xuk+zOvrSG7Jw5bFnVhzghviDUJ9EI38YXhtRTYnYLPA5lpoCj+/BMgCo hPxlualfI+vd69dY/C5H =o9FO -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull hdi1 rdma driver updates from Doug Ledford: "This is the first pull request of the 4.9 merge window for the RDMA subsystem. It is only the hfi1 driver. It had dependencies on code that only landed late in the 4.7-rc cycle (around 4.7-rc7), so putting this with my other for-next code would have create an ugly merge of lot of 4.7-rc stuff. For that reason, it's being submitted individually. It's been through 0day and linux-next" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits) IB/rdmavt: Trivial function comment corrected. IB/hfi1: Fix trace of atomic ack IB/hfi1: Update SMA ingress checks for response packets IB/hfi1: Use EPROM platform configuration read IB/hfi1: Add ability to read platform config from the EPROM IB/hfi1: Restore EPROM read ability IB/hfi1: Document new sysfs entries for hfi1 driver IB/hfi1: Add new debugfs sdma_cpu_list file IB/hfi1: Add irq affinity notification handler IB/hfi1: Add a new VL sysfs attribute for sdma engines IB/hfi1: Add sysfs interface for affinity setup IB/hfi1: Fix resource release in context allocation IB/hfi1: Remove unused variable from devdata IB/hfi1: Cleanup tasklet refs in comments IB/hfi1: Adjust hardware buffering parameter IB/hfi1: Act on external device timeout IB/hfi1: Fix defered ack race with qp destroy IB/hfi1: Combine shift copy and byte copy for SGE reads IB/hfi1: Do not read more than a SGE length IB/hfi1: Extend i2c timeout ...	2016-10-04 12:08:55 -07:00
Salil	1bdab400af	IB/hns: Fix for removal of redundant code This patch removes the redundant code lines present in the functions get_send_wqe() and get_recv_wqe(). This also fixes the error in calculating the SQ WQE. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	deb17f6f82	IB/hns: Delete the redundant lines in hns_roce_v1_m_qp() It doesn't need to assign for the filed of qp state in qpc separately when qp happen to migrate state which supported in RoCE engine v1. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	3e413872b1	IB/hns: Fix the bug when platform_get_resource() exec fail This patch mainly fixes the bug with platform_get_resource(). It should return NULL when platform_get_resource() exec fail. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	1fad5fab78	IB/hns: Update the rq head when modify qp state The rq head in qpc was zero will miss the rq wqes which have be sent, so here we should take the real value. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	24f0c9c0ff	IB/hns: Cq has not been freed Cq has not been freed when fail to ib_copy_to_udata, so need to free it. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Peter Chen <luck.chen@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	cb814642e7	IB/hns: Validate mtu when modified qp The mtu should be validated when modify qp,so we check it. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Peter Chen <luck.chen@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	7c7a4ea145	IB/hns: Some items of qpc need to take user param Some items of qpc need to take user param when modified qp state. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	c6c3bfea82	IB/hns: The Ack timeout need a lower limit value The Ack timeout of qpc need a lower limit value,otherwise the read performance will be very lower. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	07182fa77b	IB/hns: Return bad wr while post send failed While post failed, hns roce should return the wr failed to user. We omitted this while qp type is wrong and fixed it. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	1cd11064da	IB/hns: Fix bug of memory leakage for registering user mr While the page size attribute of umem is illegal, we should release umem that get by ib_umem_get interface. Also, we should return a non-zero value while pbl number is wrong. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	49fdf6bb0a	IB/hns: Modify the init of iboe lock This lock will be used in query port interface, and will be called while IB device was registered to OFED framework/IB Core. So, the lock of iboe must be initiated before IB device was registered. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Wei Hu (Xavier)	c4a193d3a8	IB/hns: Optimize code of aeq and ceq interrupt handle and fix the bug of qpn This patch optimized the codes of aeq and ceq interrupt handle and fixed the bug in the calculation of qpn. For the special qp(GSI or SMI), calculated the qp number according to physical port and the qpn reported in the event of async event queue. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Wei Hu (Xavier)	1ca5b253ad	IB/hns: Delete the sqp_start from the structure hns_roce_caps This patch deleted the sqp_start from the structure hns_roce_caps, and modified the calculation of the qp number. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Wei Hu (Xavier)	97f0e39fa5	IB/hns: Fix bug of clear hem In hip06, there's no interface to release hem memory. So, hardware can't identify whether hem memory released or not. If all context in a hem memory released, the related hem memory will be released by driver and reused by others. But, hardware don't know that this memory can't be used already. In order to fix this bug, hns roce driver reserved 128K memory for each type of hem(QPC/CQC/MTPT). While unmap hem memory, hns roce driver will write base address of reserved memory according to hem type. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	76445703e5	IB/hns: Remove unused parameter named qp_type This patch removes the qp_type parameter in hns_roce_set_kernel_sq_size(). Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Ping Zhang <zhangping5@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	a598c6f4c5	IB/hns: Simplify function of pd alloc and qp alloc Hns_roce_pd_alloc and hns_roce_reserve_range_qp use unnecessary transformation of parameters. This patch simplify these two functions. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	ed3e6d0113	IB/hns: Fix bug of using uninit refcount and free In current version, it uses uninitialized parameters named refcount and free in hns_roce_cq_event. This patch initializes these parameter in cq alloc and add correspond process in cq free. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	622b85f9f7	IB/hns: Remove parameters of resize cq In old version of RoCE, it doesn't support to resize cq. So, we remove parameters related to resize cq. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Dongdong Huang(Donald) <hdd.huang@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	a4be892e83	IB/hns: Remove unused parameters in some functions The parameter named collapsed unused in hns_roce_cq_alloc. Also, parameter named doorbell_lock unsed in hns_roce_v1_cq_set_ci. This patch optimize these parameters. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:15 -04:00
Lijun Ou	ac11125bfd	IB/hns: Fix two bugs for rdma cm connecting This patch mainly modify the value of HNS_ROCE_SL_SHIFT and delete the lines for assigning for the field of local_enable_e2e_credit in QP1C. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:10 -04:00
Lijun Ou	509bf0c2da	IB/hns: Fix the bug of rdma cm connecting on user mode Fix bug of modify qp from init to init on user mode. Otherwise, it will oops when rmda cm established. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:10 -04:00
Lijun Ou	b280db52cc	IB/hns: Change the logic for allocating uar registers This patch mainly modifies the logic for allocating uar registers. In HiP06 SoC, HW has 8 group of uar registers for kernel and user space application. The uar index is assigned as follows: 0 ------ for kernel 1~7 ------ for user space application Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:10 -04:00
Lijun Ou	7716809efe	IB/hns: Add phy_port for computing GSI/QPN This patch mainly adds phy_port to HNS RoCE QP. This shall be used in calculating the GSI QPN for the port. Initally when RDMA is being established, all IB ports share a QPN which later needs to be re-assigned to a particular GSI/QPN and which is per-port. This also fixes a bug in base driver where iboe port was being used instead of phy_port at some places. This values might not be same always. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:10 -04:00
Lijun Ou	c24bf895c5	IB/hns: Fix two possible bugs for rdma cm Fix the length of wqe that maybe lead to an error and write the end bytes of QP1C into the register. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:10 -04:00
Lijun Ou	a74aab6c2f	IB/hns: Fix the value of device_cap_flags In the latest IB core version, it has some known issues with memory registration using the local_dma_lkey. Thus RoCE don't expose support for it, and remove device->local_dma_lkey which is introduced to working systems. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:10 -04:00
Lijun Ou	31644665d4	IB/hns: Add & initialize "node_guid" parameter for RDMA CM According to the Infiniband spec, NodeGUID uniquely identifies a node. This must be initialized to some unique value. This patch adds the support to the HNS RoCE driver to fetch the NodeGUID value from DT or ACPI and then use this value to initialize the node_guid parameter of IB device. This value shall be used by RDMA CM. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:10 -04:00
Lijun Ou	2eefca2722	IB/hns: Register HNS RoCE Driver get_netdev() with IB Core This patch adds get_netdev() function to the IB device. This shall be used to fetch netdev corresponding to the port number. This function would be called by IB core(Generic CM Agent) for example, when the RDMA connection is being established. Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 11:43:10 -04:00
Parav Pandit	61347fa608	IB/rdmavt: Trivial function comment corrected. Corrected function name in comment from qib_ to rvt_. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-03 10:55:27 -04:00
Mike Marciniszyn	37aab620bc	IB/hfi1: Fix trace of atomic ack The length is incorrect, causing the trace data to be truncated. Add the additional 8 bytes that should have been there. Also trace out the atomic ack in hex to aid debugging. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:21 -04:00
Jianxin Xiong	f380920957	IB/hfi1: Update SMA ingress checks for response packets Fix "unsupported method" error by skipping ingress pkey checks on response SMA packets. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:21 -04:00
Dean Luick	e83eba214d	IB/hfi1: Use EPROM platform configuration read The driver will now try to read directly from the EPROM as its first choice for the platform configuration file. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:20 -04:00
Dean Luick	107ffbc521	IB/hfi1: Add ability to read platform config from the EPROM Add a function to read the platform configuration file from the EPROM. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:20 -04:00
Dean Luick	e2113752b7	IB/hfi1: Restore EPROM read ability Partially revert commit `d079031742` ("IB/hfi1: Remove EPROM functionality from data device"), bringing back the ability to read from the EPROM. This code will be used for driver-only acccess to the EPROM, hence change EPROM read to save to a buffer instead of copy touser. Also allow any offset and remove missed includes and leftover declarations. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:19 -04:00
Tadeusz Struk	af3674d62d	IB/hfi1: Add new debugfs sdma_cpu_list file Add a debugfs sdma_cpu_list file that can be used to examine the CPU to sdma engine assignments for the whole device. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:19 -04:00
Tadeusz Struk	2d01c37d75	IB/hfi1: Add irq affinity notification handler This patch adds an irq affinity notification handler. When a user changes interrupt affinity settings for an sdma engine, the driver needs to make changes to its internal sde structures and also update the affinity_hint. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:18 -04:00
Tadeusz Struk	f191225719	IB/hfi1: Add a new VL sysfs attribute for sdma engines This patch adds a read-only "VL" attribute for the sysfs entry of each sdma engine. It will allow the user to check VL to sdma engine mappings. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:18 -04:00
Tadeusz Struk	0cb2aa690c	IB/hfi1: Add sysfs interface for affinity setup Some users want more control over which cpu cores are being used by the driver. For example, users might want to restrict the driver to some specified subset of the cores so that they can appropriately partition processes, irq handlers, and work threads. To allow the user to fine tune system affinity settings new sysfs attributes are introduced per sdma engine. This patch adds a new attribute type for sdma engine and a new cpu_list attribute. When the user writes a cpu range to the cpu_list attribute the driver will create an internal cpu->sdma map, which will be used later as a look-up table to choose an optimal engine for a user requests. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:17 -04:00
Jakub Pawlak	3a6982dfd3	IB/hfi1: Fix resource release in context allocation Correct resource free in allocate_ctxt() function. When context creation fails allocated resources are properly released and pointer in receive context data table is set back to NULL. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:17 -04:00
Dennis Dalessandro	242833fbe4	IB/hfi1: Remove unused variable from devdata We no longer use an error tasklet. Remove it from the hfi1_devdata structure. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:16 -04:00
Dennis Dalessandro	ca00c62b9e	IB/hfi1: Cleanup tasklet refs in comments The code no longer uses tasklets for the send engine. However it does use a tasklet for sdma but the send routines use a workqueue now days. Update the comments to reflect that. Make things more generic with saying "send engine" because that is what is being referred to. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:16 -04:00
Harish Chegondi	e8a70af286	IB/hfi1: Adjust hardware buffering parameter It was determined that 0x880 is a better value for hardware buffering, use it. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:15 -04:00
Dean Luick	50921be0c7	IB/hfi1: Act on external device timeout Add missing external device timeout notification. Recognize it as a failed LNI signal from the 8051 firmware. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:15 -04:00
Mike Marciniszyn	72f53af265	IB/hfi1: Fix defered ack race with qp destroy There is a a bug in defered ack stuff that causes a race with the destroy of a QP. A packet causes a defered ack to be pended by putting the QP into an rcd queue. A return from the driver interrupt processing will process that rcd queue of QPs and attempt to do a direct send of the ack. At this point no locks are held and the above QP could now be put in the reset state in the qp destroy logic. A refcount protects the QP while it is in the rcd queue so it isn't going anywhere yet. If the direct send fails to allocate a pio buffer, hfi1_schedule_send() is called to trigger sending an ack from the send engine. There is no state test in that code path. The refcount is then dropped from the driver.c caller potentially allowing the qp destroy to continue from its refcount wait in parallel with the workqueue scheduling of the qp. Cc: stable@vger.kernel.org Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:14 -04:00
Sebastian Sanchez	61868fb5c2	IB/hfi1: Combine shift copy and byte copy for SGE reads Prevent over-reading the SGE length by using byte reads for non quad-word reads. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:14 -04:00
Sebastian Sanchez	a4309d94f7	IB/hfi1: Do not read more than a SGE length In certain cases, if the tail of an SGE is not 8-byte aligned, bytes beyond the end to an 8-byte alignment can be read. Change the copy routine to avoid the over-read. Instead, stop on the final whole quad-word, then read the remaining bytes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:13 -04:00
Dean Luick	d5cf683e62	IB/hfi1: Extend i2c timeout Allow a longer timeout for i2c due to clock stretching and inaccurate jiffy timing when under a spin lock. This timeout is consistent with other i2c-algo-bit users. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:13 -04:00
Jianxin Xiong	f6aa783546	IB/hfi1: Increase default settings of max_cqes and max_qps The ib_write_bw test allows using up to 16384 QPs. When a relatively large number of QPs (within that range) is used, the test can fail because the number of CQ entries needed exceeds the limit set by the driver. This patch increases the default setting of max_cqes from 0x2FFFF (196607) to 0x2FFFFF(3145727), which is sufficient to cover the maximum number needed by the ib_write_bw test (2097152). The default setting of max_qps is also increased from 16384 to 32768 to allow the test to run successfully with 16383 or 16384 QPs. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:12 -04:00
Sebastian Sanchez	c08d57a30a	IB/hfi1: Remove filtering of Set(PkeyTable) in HFI SMA The FM should have full control to set the pkeys in the driver pkey table. Remove filtering done by the driver. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:12 -04:00
Dennis Dalessandro	84b3adc243	IB/qib: Remove qpt_mask global There is no need to have a global qpt_mask as that does not support the multiple chip model which qib has. Instead rely on the value which exists already in the device data (dd). Fixes: `898fa52b4a` "IB/qib: Remove qpn, qp tables and related variables from qib" Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:11 -04:00
Mike Marciniszyn	b374e060cc	IB/hfi1: Consolidate pio control masks into single definition This allows for adding additional pages of adaptive pio opcode control including manufacturer specific ones. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:11 -04:00
Mike Marciniszyn	68e78b3d78	IB/rdmavt, IB/hfi1: Add lockdep asserts for lock debug This patch adds lockdep asserts in key code paths for insuring lock correctness. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:10 -04:00
Mike Marciniszyn	222f7a9aac	IB/rdmavt: Add qp init function Add an rvt_qp_init() to initialize specific common fields as the qp is created or reset. The routine is shared by the rvt_reset_qp() and the rvt_create_qp(). The intent is that lock dep assertions will only appear in the rvt_reset_qp(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:10 -04:00
Mike Marciniszyn	30a345cc01	IB/rdmavt: Move reset calldown to reset path The reset calldown is misplaced. It should only be called in the code that actually transitions the QP to reset. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:09 -04:00
Mike Marciniszyn	5a648dfad0	IB/hfi1: Move iowait_init() to priv allocate The call is misplaced in the reset calldown function and causes issues with lockdep assertions that are to be added. Fixes: Commit `a2c2d60895` ("staging/rdma/hfi1: Remove create_qp functionality") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:09 -04:00
Mike Marciniszyn	eefa1d8961	IB/rdmavt: Correct sparse annotation The __must_hold() is sufficent to correct the sparse context imbalance inside a function. Per Documentation/sparse.txt: __must_hold - The specified lock is held on function entry and exit. Fixes: Commit `c0a67f6ba3` ("IB/rdmavt: Annotate rvt_reset_qp()") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:08 -04:00
Tadeusz Struk	584d9577ff	IB/hfi1: Fix locking scheme for affinity settings Existing locking scheme in affinity.c file using the &node_affinity.lock spinlock is not very elegant. We acquire the lock to get hfi1_affinity_node entry, unlock, and then use the entry without the lock held. With more functions being added, which access and modify the entries, this can lead to race conditions. This patch makes this locking scheme more consistent. It changes the spinlock to mutex. Since all the code is executed in a user process context there is no need for a spinlock. This also allows to keep the lock not only while we look up for the node affinity entry, but over the whole section where the entry is being used. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:08 -04:00
Tymoteusz Kielan	60368186fd	IB/hfi1: Fix user-space buffers mapping with IOMMU enabled The dma_XXX API functions return bus addresses which are physical addresses when IOMMU is disabled. Buffer mapping to user-space is done via remap_pfn_range() with PFN based on bus address instead of physical. This results in wrong pages being mapped to user-space when IOMMU is enabled. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:07 -04:00
Harish Chegondi	0b115ef100	IB/hfi1: Fix the count of user packets submitted to an SDMA engine Each user SDMA request coming into the driver may contain multiple packets. Each user packet may use multiple SDMA descriptors to fill the send buffer. The field seqsubmitted in struct user_sdma_request counts the number of user packets submitted to an SDMA engine. Sometimes, the intermediate count may not be updated properly. However, once all the packets' descriptors are successfully submitted to the SDMA engine, the final count is updated correctly. But, if only some of the packets are submitted to the engine due to an error, the intermediate count doesn't reflect the partial number of packets submitted to the SDMA engine. This can cause a hang later in the code as the count of packets submitted to the SDMA engine doesn't match the the count of packets processed by the SDMA engine. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:07 -04:00
Dean Luick	0db9dec276	IB/hfi1: Move serdes tune inside link start function All calls to tune_serdes and start_link are paired. Move tune_serdes inside start_link. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:06 -04:00
Mike Marciniszyn	261a435184	IB/qib,IB/hfi: Use core common header file Use common header file structs, defines, and accessors in the drivers. The old declarations are removed. The repositioning of the includes allows for the removal of hfi1_message_header and replaces its use with ib_header. Also corrected are two issues with set_armed_to_active(): - The "packet" parameter is now a pointer as it should have been - The etype is validated to insure that the header is correct Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-10-02 08:42:06 -04:00
Deepa Dinamani	078cd8279e	fs: Replace CURRENT_TIME with current_time() for inode timestamps CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_time() instead. CURRENT_TIME is also not y2038 safe. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. As part of the effort current_time() will be extended to do range checks. Hence, it is necessary for all file system timestamps to use current_time(). Also, current_time() will be transitioned along with vfs to be y2038 safe. Note that whenever a single call to current_time() is used to change timestamps in different inodes, it is because they share the same time granularity. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Felipe Balbi <balbi@kernel.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-09-27 21:06:21 -04:00
Christoph Hellwig	5ef990f06b	IB/core: remove ib_get_dma_mr We now only use it from ib_alloc_pd to create a local DMA lkey if the device doesn't provide one, or a global rkey if the ULP requests it. This patch removes ib_get_dma_mr and open codes the functionality in ib_alloc_pd so that we can simplify the code and prevent abuse of the functionality. As a side effect we can also simplify things by removing the valid access bit check, and the PD refcounting. In the future I hope to also remove the per-PD global MR entirely by shifting this work into the HW drivers, as one step towards avoiding the struct ib_mr overload for various different use cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-23 13:47:44 -04:00
Christoph Hellwig	5f071777f9	IB/srp: use IB_PD_UNSAFE_GLOBAL_RKEY Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-23 13:47:44 -04:00
Christoph Hellwig	8e61212d05	IB/iser: use IB_PD_UNSAFE_GLOBAL_RKEY Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-23 13:47:44 -04:00
Christoph Hellwig	ed082d36a7	IB/core: add support to create a unsafe global rkey to ib_create_pd Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-23 13:47:44 -04:00
Christoph Hellwig	50d46335b0	IB/core: rename pd->local_mr to pd->__internal_mr This has two reasons: a) to clearly mark that drivers don't have any business using it, and b) because we're going to use it for the (dangerous) global rkey soon, so that drivers don't create on themselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-23 13:47:44 -04:00
David S. Miller	d6989d4bbe	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-09-23 06:46:57 -04:00
Hariprasad Shenai	0fbc81b3ad	chcr/cxgb4i/cxgbit/RDMA/cxgb4: Allocate resources dynamically for all cxgb4 ULD's Allocate resources dynamically to cxgb4's Upper layer driver's(ULD) like cxgbit, iw_cxgb4 and cxgb4i. Allocate resources when they register with cxgb4 driver and free them while unregistering. All the queues and the interrupts for them will be allocated during ULD probe only and freed during remove. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-19 01:37:32 -04:00
Linus Torvalds	dd5a477c7f	Round three of 4.8 rc fixes - Various fixes to rdmavt, ipoib, mlx5, mlx4, rxe -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJX3DdRAAoJELgmozMOVy/d2k4QAJz0HbJvS8uN/ny6zaIsIa74 08pvzHWpPkbJ6JGyxySToxHx7gs+MMvsvovUM+QPQS4jt6ZdHY1vOUDztG7GVXZC SsC8kYX8o0P2zhiDeMi/9LoBjH5bLgS3L5lfwke0jgWXCU6Cdgm5InnZ8XuoBZr6 zNQ/Zcg8epe92IhqJ9abqMveni4FstXzlj9PlhaeCkUadFarpypG2yTdcvmq7m6i aXvGDVWgaVTB0CyaJtXIK3g/lmW4Ay3z5RpIjPbdZTd2j46c8Z4yKrhpHuK2fChb 4xPSEoMdBTO/FI/M0Mf6FKEtGv4bxFcfwpjw5fuWL3sk+hWVWA0yqil3fCJ4vAy5 klUdRLE187hd0MBj2Eq8xLeblfuqAmiuWjPJ59npcspPFUaXgvw8jolxjzxE2HVM whAVnb5fEVu1nQ8ePfkDPNJ1osFmFwYObYiYqql258U5jBU/QXwohMQihSeIhR04 yRyzY1ob+WCGqp/MKkkAAZvjSUGdPzuky5YHCymmoinJKXvf9eJ7LdQ1l2jFMoa6 ZpZNppSya9v2pLhGf8MhO2DXyejhCnPgn1JS7OhoTCHbSPR5zDD8oucgW0+gmUpd 1QAzEL2HSojvLrDJJpJOTHhL8TQPLEVnnILMq5jRGy3y+lUJ1iGgw3qBLxAhZZRZ r7omK4iOuut0P+Rzvgqg =HC3X -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "Round three of 4.8 rc fixes. This is likely the last rdma pull request this cycle. The new rxe driver had a few issues (you probably saw the boot bot bug report) and they should be addressed now. There are a couple other fixes here, mainly mlx4. There are still two outstanding issues that need resolved but I don't think their fix will make this kernel cycle. Summary: - Various fixes to rdmavt, ipoib, mlx5, mlx4, rxe" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/rdmavt: Don't vfree a kzalloc'ed memory region IB/rxe: Fix kmem_cache leak IB/rxe: Fix race condition between requester and completer IB/rxe: Fix duplicate atomic request handling IB/rxe: Fix kernel panic in udp_setup_tunnel IB/mlx5: Set source mac address in FTE IB/mlx5: Enable MAD_IFC commands for IB ports only IB/mlx4: Diagnostic HW counters are not supported in slave mode IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV IB/mlx4: Fix code indentation in QP1 MAD flow IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV IB/ipoib: Don't allow MC joins during light MC flush IB/rxe: fix GFP_KERNEL in spinlock context	2016-09-16 13:51:42 -07:00
Mike Marciniszyn	4d6f85c3fa	IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines This improves readability and hides the reference count mechanism from the client drivers. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:35:27 -04:00
Colin Ian King	e4618d40eb	IB/rdmavt: Don't vfree a kzalloc'ed memory region The userspace memory region 'mr' is allocated with kzalloc in __rvt_alloc_mr however it is incorrectly being freed with vfree in __rvt_free_mr. Fix this by using kfree to free it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:23 -04:00
Yonatan Cohen	c1cc72cb6f	IB/rxe: Fix kmem_cache leak Decrement qp reference when handling error path in completer to prevent kmem_cache leak. Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Yonatan Cohen	3050b99850	IB/rxe: Fix race condition between requester and completer rxe_requester() is sending a pkt with rxe_xmit_packet() and then calls rxe_update() to update the wqe and qp's psn values. But sometimes the response is received before the requester had time to update the wqe in which case the completer acts on errornous wqe values. This fix updates the wqe and qp before actually sending the request and rolls back when xmit fails. Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Yonatan Cohen	908948877b	IB/rxe: Fix duplicate atomic request handling When handling ack for atomic opcodes like "fetch&add" or "cmp&swp", the method send_atomic_ack() saves the ack before sending it, in case it gets lost and never reach the requester. In which case the method duplicate_request() will need to find it using the duplicated request.psn. But send_atomic_ack() used a wrong psn value and thus the above ack was never found. This fix uses the ack.psn to locate the ack in case its needed. This fix also copies the ack packet to the skb's control buffer since duplicate_request() will need it when calling rxe_xmit_packet() Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Yonatan Cohen	dfdd6158ca	IB/rxe: Fix kernel panic in udp_setup_tunnel Disable creation of a UDP socket for ipv6 when CONFIG_IPV6 is not enabeld. Since udp_sock_create6() returns 0 when CONFIG_IPV6 is not set [ 46.888632] IP: [<c220705a>] setup_udp_tunnel_sock+0x6/0x4f [ 46.891355] pdpt = 0000000000000000 pde = f000ff53f000ff53 [ 46.893918] Oops: 0002 [#1] PREEMPT [ 46.896014] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-rc4-00001-g8700e3e #1 [ 46.900280] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 46.904905] task: cf06c040 ti: cf05e000 task.ti: cf05e000 [ 46.907854] EIP: 0060:[<c220705a>] EFLAGS: 00210246 CPU: 0 [ 46.911137] EIP is at setup_udp_tunnel_sock+0x6/0x4f [ 46.914070] EAX: 00000044 EBX: 00000001 ECX: cf05fef0 EDX: ca8142e0 [ 46.917236] ESI: c2c4505b EDI: cf05fef0 EBP: cf05fed0 ESP: cf05fed0 [ 46.919836] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 [ 46.922046] CR0: 80050033 CR2: 000001fc CR3: 02cec000 CR4: 000006b0 [ 46.924550] Stack: [ 46.926014] cf05ff10 c1fd4657 ca8142e0 0000000a 00000000 00000000 0000b712 00000008 [ 46.931274] 00000000 6bb5bd01 c1fd48de 00000000 00000000 cf05ff1c 00000000 00000000 [ 46.936122] cf05ff1c c1fd4bdf 00000000 cf05ff28 c2c4507b ffffffff cf05ff88 c2bf1c74 [ 46.942350] Call Trace: [ 46.944403] [<c1fd4657>] rxe_setup_udp_tunnel+0x8f/0x99 [ 46.947689] [<c1fd48de>] ? net_to_rxe+0x4e/0x4e [ 46.950567] [<c1fd4bdf>] rxe_net_init+0xe/0xa4 [ 46.953147] [<c2c4507b>] rxe_module_init+0x20/0x4c [ 46.955448] [<c2bf1c74>] do_one_initcall+0x89/0x113 [ 46.957797] [<c2bf15eb>] ? set_debug_rodata+0xf/0xf [ 46.959966] [<c2bf1dbc>] ? kernel_init_freeable+0xbe/0x15b [ 46.962262] [<c2bf1ddc>] kernel_init_freeable+0xde/0x15b [ 46.964418] [<c232eb54>] kernel_init+0x8/0xd0 [ 46.966618] [<c2333122>] ret_from_kernel_thread+0xe/0x24 [ 46.969592] [<c232eb4c>] ? rest_init+0x6f/0x6f Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Maor Gottlieb	ee3da804ad	IB/mlx5: Set source mac address in FTE Set the source mac address in the FTE when L2 specification is provided. Fixes: `038d2ef875` ('IB/mlx5: Add flow steering support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Noa Osherovich	7fae6655a0	IB/mlx5: Enable MAD_IFC commands for IB ports only MAD_IFC command is supported only for physical functions (PF) and when physical port is IB. The proposed fix enforces it. Fixes: `d603c809ef` ("IB/mlx5: Fix decision on using MAD_IFC") Reported-by: David Chang <dchang@suse.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Kamal Heib	69d269d389	IB/mlx4: Diagnostic HW counters are not supported in slave mode Modify the mlx4_ib_diag_counters() to avoid the following error in the hypervisor when the slave tries to query the hardware counters in SR-IOV mode. mlx4_core 0000:81:00.0: Unknown command:0x30 accepted from slave:1 Fixes: `3f85f2aaab` ("IB/mlx4: Add diagnostic hardware counters") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Jack Morgenstein	8ec07bf8a8	IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV When sending QP1 MAD packets which use a GRH, the source GID (which consists of the 64-bit subnet prefix, and the 64 bit port GUID) must be included in the packet GRH. For SR-IOV, a GID cache is used, since the source GID needs to be the slave's source GID, and not the Hypervisor's GID. This cache also included a subnet_prefix. Unfortunately, the subnet_prefix field in the cache was never initialized (to the default subnet prefix 0xfe80::0). As a result, this field remained all zeroes. Therefore, when SR-IOV was active, all QP1 packets which included a GRH had a source GID subnet prefix of all-zeroes. However, the subnet-prefix should initially be 0xfe80::0 (the default subnet prefix). In addition, if OpenSM modifies a port's subnet prefix, the new subnet prefix must be used in the GRH when sending QP1 packets. To fix this we now initialize the subnet prefix in the SR-IOV GID cache to the default subnet prefix. We update the cached value if/when OpenSM modifies the port's subnet prefix. We take this cached value when sending QP1 packets when SR-IOV is active. Note that the value is stored as an atomic64. This eliminates any need for locking when the subnet prefix is being updated. Note also that we depend on the FW generating the "port management change" event for tracking subnet-prefix changes performed by OpenSM. If running early FW (before 2.9.4630), subnet prefix changes will not be tracked (but the default subnet prefix still will be stored in the cache; therefore users who do not modify the subnet prefix will not have a problem). IF there is a need for such tracking also for early FW, we will add that capability in a subsequent patch. Fixes: `1ffeb2eb8b` ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Jack Morgenstein	baa0be7026	IB/mlx4: Fix code indentation in QP1 MAD flow The indentation in the QP1 GRH flow in procedure build_mlx_header is really confusing. Fix it, in preparation for a commit which touches this code. Fixes: `1ffeb2eb8b` ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Alex Vesker	e5ac40cd66	IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV Because of an incorrect bit-masking done on the join state bits, when handling a join request we failed to detect a difference between the group join state and the request join state when joining as send only full member (0x8). This caused the MC join request not to be sent. This issue is relevant only when SRIOV is enabled and SM supports send only full member. This fix separates scope bits and join states bits a nibble each. Fixes: `b9c5d6a643` ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Alex Vesker	344bacca8c	IB/ipoib: Don't allow MC joins during light MC flush This fix solves a race between light flush and on the fly joins. Light flush doesn't set the device to down and unset IPOIB_OPER_UP flag, this means that if while flushing we have a MC join in progress and the QP was attached to BC MGID we can have a mismatches when re-attaching a QP to the BC MGID. The light flush would set the broadcast group to NULL causing an on the fly join to rejoin and reattach to the BC MCG as well as adding the BC MGID to the multicast list. The flush process would later on remove the BC MGID and detach it from the QP. On the next flush the BC MGID is present in the multicast list but not found when trying to detach it because of the previous double attach and single detach. [18332.714265] ------------[ cut here ]------------ [18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core] ... [18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 [18332.779411] 0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000 [18332.784960] 0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300 [18332.790547] ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280 [18332.796199] Call Trace: [18332.798015] [<ffffffff813fed47>] dump_stack+0x63/0x8c [18332.801831] [<ffffffff8109add1>] __warn+0xd1/0xf0 [18332.805403] [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20 [18332.809706] [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core] [18332.814384] [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib] [18332.820031] [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib] [18332.825220] [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib] [18332.830290] [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib] [18332.834911] [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0 [18332.839741] [<ffffffff81772bd1>] rollback_registered+0x31/0x40 [18332.844091] [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80 [18332.848880] [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib] [18332.853848] [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib] [18332.858474] [<ffffffff81520c08>] dev_attr_store+0x18/0x30 [18332.862510] [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50 [18332.866349] [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170 [18332.870471] [<ffffffff81207198>] __vfs_write+0x28/0xe0 [18332.874152] [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50 [18332.878274] [<ffffffff81208062>] vfs_write+0xa2/0x1a0 [18332.881896] [<ffffffff812093a6>] SyS_write+0x46/0xa0 [18332.885632] [<ffffffff810039b7>] do_syscall_64+0x57/0xb0 [18332.889709] [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25 [18332.894727] ---[ end trace 09ebbe31f831ef17 ]--- Fixes: `ee1e2c82c2` ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Alexey Khoroshilov	5e102b3b4f	IB/rxe: fix GFP_KERNEL in spinlock context There is skb_clone(skb, GFP_KERNEL) in spinlock context in rxe_rcv_mcast_pkt(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-16 14:14:08 -04:00
Varun Prakash	6e3b6fc201	libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_rx_data_ack() Add cxgb_mk_rx_data_ack() to remove duplicate code to form CPL_RX_DATA_ACK hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:20 -04:00
Varun Prakash	052f4731ed	libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_abort_rpl() Add cxgb_mk_abort_rpl() to remove duplicate code to form CPL_ABORT_RPL hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:20 -04:00
Varun Prakash	a7e1a97f88	libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_abort_req() Add cxgb_mk_abort_req() to remove duplicate code to form CPL_ABORT_REQ hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:20 -04:00
Varun Prakash	29fb6f42e7	libcxgb, iw_cxgb4, cxgbit: add cxgb_mk_close_con_req() Add cxgb_mk_close_con_req() to remove duplicate code to form CPL_CLOSE_CON_REQ hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:20 -04:00
Varun Prakash	a1a234542b	libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_tid_release() Add cxgb_mk_tid_release() to remove duplicate code to form CPL_TID_RELEASE hardware command. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:20 -04:00
Varun Prakash	cc516700c7	libcxgb,iw_cxgb4,cxgbit: add cxgb_compute_wscale() Add cxgb_compute_wscale() in libcxgb_cm.h to remove it's duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:20 -04:00
Varun Prakash	44c6d06992	libcxgb,iw_cxgb4,cxgbit: add cxgb_best_mtu() Add cxgb_best_mtu() in libcxgb_cm.h to remove it's duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:20 -04:00
Varun Prakash	b65eef0a5b	libcxgb,iw_cxgb4,cxgbit: add cxgb_is_neg_adv() Add cxgb_is_neg_adv() in libcxgb_cm.h to remove it's duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:19 -04:00
Varun Prakash	95554761d1	libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route6() Add cxgb_find_route6() in libcxgb_cm.c to remove it's duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:19 -04:00
Varun Prakash	804c2f3e36	libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route() Add cxgb_find_route() in libcxgb_cm.c to remove it's duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:19 -04:00
Varun Prakash	85e42b044e	libcxgb,iw_cxgb4,cxgbit: add cxgb_get_4tuple() Add cxgb_get_4tuple() in libcxgb_cm.c to remove it's duplicate definitions from cxgb4/cm.c and cxgbit/cxgbit_cm.c. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-15 20:49:19 -04:00
Linus Torvalds	46626600d1	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A set of fixes for the current series in the realm of block. Like the previous pull request, the meat of it are fixes for the nvme fabrics/target code. Outside of that, just one fix from Gabriel for not doing a queue suspend if we didn't get the admin queue setup in the first place" * 'for-linus' of git://git.kernel.dk/linux-block: nvme-rdma: add back dependency on CONFIG_BLOCK nvme-rdma: fix null pointer dereference on req->mr nvme-rdma: use ib_client API to detect device removal nvme-rdma: add DELETING queue flag nvme/quirk: Add a delay before checking device ready for memblaze device nvme: Don't suspend admin queue that wasn't created nvme-rdma: destroy nvme queue rdma resources on connect failure nvme_rdma: keep a ref on the ctrl during delete/flush iw_cxgb4: block module unload until all ep resources are released iw_cxgb4: call dev_put() on l2t allocation failure	2016-09-15 13:22:59 -07:00
Jens Axboe	3bc42f3f0e	Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into for-linus Sagi writes: Here we have: - Kconfig dependencies fix from Arnd - nvme-rdma device removal fixes from Steve - possible bad deref fix from Colin	2016-09-13 07:58:34 -06:00
David S. Miller	b20b378d49	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/qlogic/qed/qed_dcbx.c drivers/net/phy/Kconfig All conflicts were cases of overlapping commits. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-12 15:52:44 -07:00
Steve Wise	37eb816c08	iw_cxgb4: block module unload until all ep resources are released Otherwise an endpoint can be still closing down causing a touch after free crash. Also WARN_ON if ulps have failed to destroy various resources during device removal. Fixes: `ad61a4c7a9` ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") Reviewed-by: Sagi Grimberg <sagi@grimbrg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-09-04 10:00:53 +03:00
Steve Wise	609e941a6b	iw_cxgb4: call dev_put() on l2t allocation failure Reviewed-by: Sagi Grimberg <sagi@grimbrg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-09-04 10:00:53 +03:00
Mike Marciniszyn	16170d9c10	IB/hfi1: Rework debugfs to use SRCU The debugfs RCU trips many debug kernel warnings because of potential sleeps with an RCU read lock held. This includes both user copy calls and slab allocations throughout the file. This patch switches the RCU to use SRCU for file remove/access race protection. In one case, the SRCU is implicit in the use of the raw debugfs file object and just works. In the seq_file case, a wrapper around seq_read() and seq_lseek() is used to enforce the SRCU using the debugfs supplied functions debugfs_use_file_start() and debugfs_use_file_stop(). The sychronize_rcu() is deleted since the SRCU prevents the remove access race. The RCU locking is kept for qp_stats since the QP hash list is protected using the non-sleepable RCU. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:26:55 -04:00
Harish Chegondi	429b6a7217	IB/hfi1: Make n_krcvqs be an unsigned long integer The global variable n_krcvqs stores the sum of the number of kernel receive queues of VLs 0-7 which the user can pass to the driver through the module parameter array krcvqs which is of type unsigned integer. If the user passes large value(s) into krcvqs parameter array, it can cause an arithmetic overflow while calculating n_krcvqs which is also of type unsigned int. The overflow results in an incorrect value of n_krcvqs which can lead to kernel crash while loading the driver. Fix by changing the data type of n_krcvqs to unsigned long. This patch also changes the data type of other variables that get their values from n_krcvqs. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:26:55 -04:00
Dean Luick	673b975f1f	IB/hfi1: Add QSFP sanity pre-check Sometimes a QSFP device does not respond in the expected time after a power-on. Add a read pre-check/retry when starting the link on driver load. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:26:55 -04:00
Jubin John	af53493916	IB/hfi1: Fix AHG KDETH Intr shift In the set_txreq_header_ahg(), The KDETH Intr bit is obtained from the header in the user sdma request using a KDETH_GET shift and mask macro. This value is then futher right shifted by 16 causing us to lose the value i.e it is shifted to zero, leading to the following smatch warning: drivers/infiniband/hw/hfi1/user_sdma.c:1482 set_txreq_header_ahg() warn: mask and shift to zero The Intr bit should be left shifted into its correct position in the KDETH header before the AHG update. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:26:55 -04:00
Sebastian Sanchez	3e6c3b0fd5	IB/hfi1: Fix SGE length for misaligned PIO copy When trying to align the source pointer and there's a byte carry in an SGE copy, bytes are borrowed from the next quad-word X to complete the required quad-word copy. Then, the SGE length is reduced by the number of borrowed bytes. After this, if the remaining number of bytes from quad-word X (extra bytes) is greater than the new SGE length, the number of extra bytes needs to be updated to the new SGE length. Otherwise, when the SGE length gets updated again after the extra bytes are read to create the new byte carry, it goes negative, which then becomes a very large number as the SGE length is an unsigned integer. This causes SGE buffer to be over-read. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:26:55 -04:00
Leon Romanovsky	dbdf7d4e7f	IB/mlx5: Don't return errors from poll_cq Remove returning errors from mlx5 poll_cq function. Polling CQ operation in kernel never fails by Mellanox HCA architecture and respective driver design. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:11:40 -04:00
Yishai Hadas	d9f88e5ab9	IB/mlx5: Use TIR number based on selector Use TIR number based on selector, it should be done to differentiate between RSS QP to RAW one. Reported-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:11:40 -04:00
Leon Romanovsky	b2a232d21f	IB/mlx5: Simplify code by removing return variable Return variable was set in a line before the actual return was called in begin_wqe function. This patch removes such variable and simplifies the code. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:11:39 -04:00
Chuck Lever	24be409bee	IB/mlx5: Return EINVAL when caller specifies too many SGEs The returned value should be EINVAL, because it is caused by wrong caller and not by internal overflow event. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:11:39 -04:00
Leon Romanovsky	20697434b6	IB/mlx4: Don't return errors from poll_cq Remove returning errors from mlx4 poll_cq function. Polling CQ operation in kernel never fails by Mellanox HCA architecture and respective driver design. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:11:38 -04:00
Leon Romanovsky	25b64fc5f2	Revert "IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one" By Mellanox HW design and SW implementation, poll_cq never fails and returns errors, so all these printks are to catch ULP bugs. In case of such bug, the reverted patch will cause reentry of the function, resulting in a printk storm. This reverts commit `5412352fcd` ("IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:09:14 -04:00
Erez Shitrit	546481c281	IB/ipoib: Fix memory corruption in ipoib cm mode connect flow When a new CM connection is being requested, ipoib driver copies data from the path pointer in the CM/tx object, the path object might be invalid at the point and memory corruption will happened later when now the CM driver will try using that data. The next scenario demonstrates it: neigh_add_path --> ipoib_cm_create_tx --> queue_work (pointer to path is in the cm/tx struct) #while the work is still in the queue, #the port goes down and causes the ipoib_flush_paths: ipoib_flush_paths --> path_free --> kfree(path) #at this point the work scheduled starts. ipoib_cm_tx_start --> copy from the (invalid)path pointer: (memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);) -> memory corruption. To fix that the driver now starts the CM/tx connection only if that specific path exists in the general paths database. This check is protected with the relevant locks, and uses the gid from the neigh member in the CM/tx object which is valid according to the ref count that was taken by the CM/tx. Fixes: `839fcaba35` ('IPoIB: Connected mode experimental support') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:07:38 -04:00
Erez Shitrit	68c6bcdd8b	IB/core: Fix use after free in send_leave function The function send_leave sets the member: group->query_id (group->query_id = ret) after calling the sa_query, but leave_handler can be executed before the setting and it might delete the group object, and will get a memory corruption. Additionally, this patch gets rid of group->query_id variable which is not used. Fixes: `faec2f7b96` ('IB/sa: Track multicast join/leave requests') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 14:06:27 -04:00
Baoyou Xie	656aacea6c	IB/cxgb4: Make _free_qp static to silence build warning We get 1 warning when build kernel with W=1: drivers/infiniband/hw/cxgb4/qp.c:686:6: warning: no previous prototype for '_free_qp' [-Wmissing-prototypes] In fact, this function is only used in the file in which it is declared and don't need a declaration, but can be made static. so this patch marks it 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 13:46:33 -04:00
Raju Rangoju	63b268d232	IB/isert: Properly release resources on DEVICE_REMOVAL When the low level driver exercises the hot unplug they would call rdma_cm cma_remove_one which would fire DEVICE_REMOVAL event to all cma consumers. Now, if consumer doesn't make sure they destroy all IB objects created on that IB device instance prior to finalizing all processing of DEVICE_REMOVAL callback, rdma_cm will let the lld to de-register with IB core and destroy the IB device instance. And if the consumer calls (say) ib_dereg_mr(), it will crash since that dev object is NULL. In the current implementation, iser-target just initiates the cleanup and returns from DEVICE_REMOVAL callback. This deferred work creates a race between iser-target cleaning IB objects(say MR) and lld destroying IB device instance. This patch includes the following fixes -> make sure that consumer frees all IB objects associated with device instance -> return non-zero from the callback to destroy the rdma_cm id Signed-off-by: Raju Rangoju <rajur@chelsio.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 13:46:32 -04:00
Christophe Jaillet	6aaa382f12	IB/hfi1: Fix the size parameter to find_first_bit The 2nd parameter of 'find_first_bit' is the number of bits to search. In this case, we are passing 'sizeof(u64)' which is 8. It is likely that the number of bits of 'port_mask' was expected here. Use sizeof() * 8 to get the correct number. It has been spotted by the following coccinelle script: @@ expression ret, x; @@ * ret = \(find_first_bit \\| find_first_zero_bit\) (x, sizeof(...)); Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 13:46:32 -04:00
Christophe Jaillet	fffd68734d	IB/mlx5: Fix the size parameter to find_first_bit The 2nd parameter of 'find_first_bit' is the number of bits to search. In this case, we are passing 'sizeof(tmp)' which is likely to be 4 or 8 because 'tmp' is an 'unsigned long'. It is likely that the number of bits of 'tmp' was expected here. So use BITS_PER_LONG instead. It has been spotted by the following coccinelle script: @@ expression ret, x; @@ * ret = \(find_first_bit \\| find_first_zero_bit\) (x, sizeof(...)); Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Majd Dibbiny <majd@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-09-02 13:46:12 -04:00
David S. Miller	6abdd5f593	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-08-30 00:54:02 -04:00
Christophe Jaillet	61a28d2b69	IB/hfi1: Clean up type used and casting In all other places in this file where 'find_first_bit' is called, port_num is defined as a 'u8' and no casting is done. Do the same here in order to be more consistent. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-26 10:01:49 -04:00
Shiraz Saleem	b71121b4b7	i40iw: Receive notification events correctly Device notifications are not received after the first interface is closed; since there is an unregister for notifications on every interface close. Correct this by unregistering for device notifications only when the last interface is closed. Also, make all operations on the i40iw_notifiers_registered atomic as it can be read/modified concurrently. Fixes: `8e06af711b` ("i40iw: add main, hdr, status") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-26 09:59:13 -04:00
Mustafa Ismail	866e0f4d73	i40iw: Update hw_iwarp_state Update iwqp->hw_iwarp_state to reflect the new state of the CQP modify QP operation. This avoids reissuing a CQP operation to modify a QP to a state that it is already in. Fixes: `4e9042e647` ("i40iw: add hw and utils files") Reported-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-26 09:59:13 -04:00
Doug Ledford	049b1e7c7e	Merge branch 'misc-fixes' into k.o/for-4.8-rc	2016-08-25 11:17:10 -04:00
Tatyana Nikolova	07c72d7d54	i40iw: Send last streaming mode message for loopback connections Send a zero length last streaming mode message for loopback connections to synchronize between accepting QP and connecting QP. This avoids data transfer to start on the accepting QP before the connecting QP is in RTS. Also remove function i40iw_loopback_nop() as it is no longer used. Fixes: `f27b4746f3` ("i40iw: add connection management code") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-25 11:15:47 -04:00
Doug Ledford	64278fe89b	Merge branch 'hns-roce' into k.o/for-4.9	2016-08-25 10:05:23 -04:00
Salil	528f1deb16	IB/hns: Add support of ACPI to the Hisilicon RoCE driver This patch is meant to add support of ACPI to the Hisilicon RoCE driver. Changes done are primarily meant to detect the type and then either use DT specific or ACPI spcific functions. Where ever possible, this patch tries to make use of Unified Device Property Interface APIs to support both DT and ACPI through single interface. This patch depends upon HNS ethernet driver to Reset RoCE. This function within HNS ethernet driver has also been enhanced to support ACPI and is part of other accompanying patch with this patch-set. NOTE: The changes in this patch are done over below branch, https://github.com/dledford/linux/tree/hns-roce Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-25 10:05:10 -04:00
Doug Ledford	d68478dae3	Merge branch 'mlx5-shared' into k.o/for-4.9	2016-08-25 10:02:43 -04:00
Doug Ledford	716b076ba4	IB/srpt: Update sport->port_guid with each port refresh If port_guid is set with the default subnet_prefix, then we get a change event and run a port refresh, we don't update the port_guid. As a result, attempts to create a target device that uses the new subnet_prefix in the wwn will fail to find a match and be rejected by the ib_srpt driver. This makes it impossible to configure a port if it was initialized with a default subnet_prefix and later changed to any non-default subnet-prefix. Updating the port refresh task to always update the wwn based upon the current subnext_prefix solves this problem. Cc: Bart Van Assche <bart.vanassche@sandisk.com> Cc: nab@linux-iscsi.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-24 16:51:16 -04:00
Selvin Xavier	3c199b4523	RDMA/ocrdma: Fix the max_sge reported from FW Current driver is reporting wrong values for max_sge and max_sge_rd in query_device. This breaks the nfs rdma and iser in some device profiles. Fixing the driver to report correct values from FW. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-24 11:31:40 -04:00
Mustafa Ismail	433c58139f	i40iw: Avoid writing to freed memory iwpbl->iwmr points to the structure that contains iwpbl, which is iwmr. Setting this to NULL would result in writing to freed memory. So just free iwmr, and return. Fixes: `d374984179` ("i40iw: add files for iwarp interface") Reported-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-24 11:31:40 -04:00
Mustafa Ismail	d41d0910d9	i40iw: Fix double free of allocated_buffer Memory allocated for iwqp; iwqp->allocated_buffer is freed twice in the create_qp error path. Correct this by having it freed only once in i40iw_free_qp_resources(). Fixes: `d374984179` ("i40iw: add files for iwarp interface") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-24 11:31:39 -04:00
Chris Wilson	82d200cc6f	IB/mlx5: Remove superfluous include of io-mapping.h This file does not use any structs or functions defined by io-mapping.h (nor does it directly use iomap, ioremap, iounamp or friends). Remove it to simplify verification of changes to io-mapping.h The include existed since its inception in commit `e126ba97db` Author: Eli Cohen <eli@mellanox.com> Date: Sun Jul 7 17:25:49 2013 +0300 mlx5: Add driver for Mellanox Connect-IB adapters which looks like a copy across from the Mellanox ethernet driver. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Eli Cohen <eli@mellanox.com> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Matan Barak <matanb@mellanox.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-24 11:30:39 -04:00
Mustafa Ismail	7eaf8313b1	i40iw: Do not set self-referencing pointer to NULL after kfree In i40iw_free_virt_mem(), do not set mem->va to NULL after freeing it as mem->va is a self-referencing pointer to mem. Fixes: `4e9042e647` ("i40iw: add hw and utils files") Reported-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-24 11:25:34 -04:00
Shiraz Saleem	5dfd5e5e3b	i40iw: Add missing NULL check for MPA private data Add NULL check for pdata and pdata->addr before the memcpy in i40iw_form_cm_frame(). This fixes a NULL pointer de-reference which occurs when the MPA private data pointer is NULL. Also only copy pdata->size bytes in the memcpy to prevent reading past the length of the private data buffer provided by upper layer. Fixes: `f27b4746f3` ("i40iw: add connection management code") Reported-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-24 11:21:51 -04:00
Bharat Potnuri	cff069b78c	iw_cxgb4: Fix cxgb4 arm CQ logic w/IB_CQ_REPORT_MISSED_EVENTS Current cxgb4 arm CQ logic ignores IB_CQ_REPORT_MISSED_EVENTS for request completion notification on a CQ. Due to this ib_poll_handler() assumes all events polled and avoids further iopoll scheduling. This patch adds logic to cxgb4 ib_req_notify_cq() handler to check if CQ is not empty and return accordingly. Based on the return value of ib_req_notify_cq() handler, ib_poll_handler() will schedule a run of iopoll handler. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-23 12:52:52 -04:00
Mustafa Ismail	faa739fb5d	i40iw: Add missing check for interface already open In i40iw_open(), check if interface is already open and return success if it is. Fixes: `8e06af711b` ("i40iw: add main, hdr, status") Reported-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-23 12:52:52 -04:00
Mustafa Ismail	44856be3e9	i40iw: Protect req_resource_num update In i40iw_alloc_resource(), ensure that the update to req_resource_num is protected by the lock. Fixes: `8e06af711b` ("i40iw: add main, hdr, status") Reported-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-23 12:40:20 -04:00
Shiraz Saleem	6c7d46fdb8	i40iw: Change mem_resources pointer to a u8 iwdev->mem_resources is incorrectly defined as an unsigned long instead of u8. As a result, the offset into the dynamic allocated structures in i40iw_initialize_hw_resources() is incorrectly calculated and would lead to writing of memory regions outside of the allocated buffer. Fixes: `8e06af711b` ("i40iw: add main, hdr, status") Reported-by: Stefan Assmann <sassmann@redhat.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-23 12:40:19 -04:00
Markus Elfring	48ef5865d0	IB/qib: Use memdup_user() rather than duplicating its implementation Reuse existing functionality from memdup_user() instead of keeping duplicate source code. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-23 12:38:24 -04:00
Steve Wise	30b03b1528	iw_cxgb4: use the MPA initiator's IRD if < our ORD The i40iw initiator sends an MPA-request with ird=16 and ord=16. The cxgb4 responder sends an MPA-reply with ord = 32 causing i40iw to terminate due to insufficient resources. The logic to reduce the ORD to <= peer's IRD was wrong. Reported-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 15:00:42 -04:00
Steve Wise	7f446abf12	iw_cxgb4: limit IRD/ORD advertised to ULP by device max. The i40iw initiator sends an MPA-request with ird = 63, ord = 63. The cxgb4 responder sends a RST. Since the inbound ord=63 and it exceeds the max_ird/c4iw_max_read_depth (=32 default), chelsio decides to abort. Instead, cxgb4 should adjust the ord/ird down before presenting it to the ULP. Reported-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 15:00:42 -04:00
Ira Weiny	e0cf75deab	IB/hfi1: Fix mm_struct use after free Testing with CONFIG_SLUB_DEBUG_ON=y resulted in the kernel panic below. This is the result of the mm_struct sometimes being free'd prior to hfi1_file_close being called. This was due to the combination of 2 reasons: 1) hfi1_file_close is deferred in process exit and it therefore may not be called synchronously with process exit. 2) exit_mm is called prior to exit_files in do_exit. Normally this is ok however, our kernel bypass code requires us to have access to the mm_struct for house keeping both at "normal" close time as well as at process exit. Therefore, the fix is to simply keep a reference to the mm_struct until we are done with it. [ 3006.340150] general protection fault: 0000 [#1] SMP [ 3006.346469] Modules linked in: hfi1 rdmavt rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod snd_hda_code c_realtek iTCO_wdt snd_hda_codec_generic iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass c rct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw snd_hda_intel gf128mul snd_hda_codec glue_helper snd_hda_core ablk_helper sn d_hwdep cryptd snd_seq snd_seq_device snd_pcm snd_timer snd soundcore pcspkr shpchp mei_me sg lpc_ich mei i2c_i801 mfd_core ioatdma ipmi_devi ntf wmi ipmi_si ipmi_msghandler acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 jbd2 mbcache mlx4_en ib_core sr_mod s d_mod cdrom crc32c_intel mgag200 drm_kms_helper syscopyarea sysfillrect igb sysimgblt fb_sys_fops ptp mlx4_core ttm isci pps_core ahci drm li bsas libahci dca firewire_ohci i2c_algo_bit scsi_transport_sas firewire_core crc_itu_t i2c_core libata [last unloaded: mlx4_ib] [ 3006.461759] CPU: 16 PID: 11624 Comm: mpi_stress Not tainted 4.7.0-rc5+ #1 [ 3006.469915] Hardware name: Intel Corporation W2600CR ........../W2600CR, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 [ 3006.483027] task: ffff8804102f0040 ti: ffff8804102f8000 task.ti: ffff8804102f8000 [ 3006.491971] RIP: 0010:[<ffffffff810f0383>] [<ffffffff810f0383>] __lock_acquire+0xb3/0x19e0 [ 3006.501905] RSP: 0018:ffff8804102fb908 EFLAGS: 00010002 [ 3006.508447] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000001 RCX: 0000000000000000 [ 3006.517012] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880410b56a40 [ 3006.525569] RBP: ffff8804102fb9b0 R08: 0000000000000001 R09: 0000000000000000 [ 3006.534119] R10: ffff8804102f0040 R11: 0000000000000000 R12: 0000000000000000 [ 3006.542664] R13: ffff880410b56a40 R14: 0000000000000000 R15: 0000000000000000 [ 3006.551203] FS: 00007ff478c08700(0000) GS:ffff88042e200000(0000) knlGS:0000000000000000 [ 3006.560814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3006.567806] CR2: 00007f667f5109e0 CR3: 0000000001c06000 CR4: 00000000000406e0 [ 3006.576352] Stack: [ 3006.579157] ffffffff8124b819 ffffffffffffffff 0000000000000000 ffff8804102fb940 [ 3006.588072] 0000000000000002 0000000000000000 ffff8804102f0040 0000000000000007 [ 3006.596971] 0000000000000006 ffff8803cad6f000 0000000000000000 ffff8804102f0040 [ 3006.605878] Call Trace: [ 3006.609220] [<ffffffff8124b819>] ? uncharge_batch+0x109/0x250 [ 3006.616382] [<ffffffff810f2313>] lock_acquire+0xd3/0x220 [ 3006.623056] [<ffffffffa0a30bfc>] ? hfi1_release_user_pages+0x7c/0xa0 [hfi1] [ 3006.631593] [<ffffffff81775579>] down_write+0x49/0x80 [ 3006.638022] [<ffffffffa0a30bfc>] ? hfi1_release_user_pages+0x7c/0xa0 [hfi1] [ 3006.646569] [<ffffffffa0a30bfc>] hfi1_release_user_pages+0x7c/0xa0 [hfi1] [ 3006.654898] [<ffffffffa0a2efb6>] cacheless_tid_rb_remove+0x106/0x330 [hfi1] [ 3006.663417] [<ffffffff810efd36>] ? mark_held_locks+0x66/0x90 [ 3006.670498] [<ffffffff817771f6>] ? _raw_spin_unlock_irqrestore+0x36/0x60 [ 3006.678741] [<ffffffffa0a2f1ee>] tid_rb_remove+0xe/0x10 [hfi1] [ 3006.686010] [<ffffffffa0a0c5d5>] hfi1_mmu_rb_unregister+0xc5/0x100 [hfi1] [ 3006.694387] [<ffffffffa0a2fcb9>] hfi1_user_exp_rcv_free+0x39/0x120 [hfi1] [ 3006.702732] [<ffffffffa09fc6ea>] hfi1_file_close+0x17a/0x330 [hfi1] [ 3006.710489] [<ffffffff81263e9a>] __fput+0xfa/0x230 [ 3006.716595] [<ffffffff8126400e>] ____fput+0xe/0x10 [ 3006.722696] [<ffffffff810b95c6>] task_work_run+0x86/0xc0 [ 3006.729379] [<ffffffff81099933>] do_exit+0x323/0xc40 [ 3006.735672] [<ffffffff8109a2dc>] do_group_exit+0x4c/0xc0 [ 3006.742371] [<ffffffff810a7f55>] get_signal+0x345/0x940 [ 3006.748958] [<ffffffff810340c7>] do_signal+0x37/0x700 [ 3006.755328] [<ffffffff8127872a>] ? poll_select_set_timeout+0x5a/0x90 [ 3006.763146] [<ffffffff811609cb>] ? __audit_syscall_exit+0x1db/0x260 [ 3006.770853] [<ffffffff8110f3e3>] ? rcu_read_lock_sched_held+0x93/0xa0 [ 3006.778765] [<ffffffff812347a4>] ? kfree+0x1e4/0x2a0 [ 3006.784986] [<ffffffff8108e75a>] ? exit_to_usermode_loop+0x33/0xac [ 3006.792551] [<ffffffff8108e785>] exit_to_usermode_loop+0x5e/0xac [ 3006.799907] [<ffffffff81003dca>] do_syscall_64+0x12a/0x190 [ 3006.806664] [<ffffffff81777a7f>] entry_SYSCALL64_slow_path+0x25/0x25 [ 3006.814396] Code: 24 08 44 89 44 24 10 89 4c 24 18 e8 a8 d8 ff ff 48 85 c0 8b 4c 24 18 44 8b 44 24 10 44 8b 4c 24 08 4c 8b 14 24 0f 84 30 08 00 00 <f0> ff 80 98 01 00 00 8b 3d 48 ad be 01 45 8b a2 90 0b 00 00 85 [ 3006.837158] RIP [<ffffffff810f0383>] __lock_acquire+0xb3/0x19e0 [ 3006.844401] RSP <ffff8804102fb908> [ 3006.851170] ---[ end trace b7b9f21cf06c27df ]--- [ 3006.927420] Kernel panic - not syncing: Fatal exception [ 3006.933954] Kernel Offset: disabled [ 3006.940961] ---[ end Kernel panic - not syncing: Fatal exception [ 3006.948249] ------------[ cut here ]------------ Fixes: `3faa3d9a30` ("IB/hfi1: Make use of mm consistent") Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 15:00:42 -04:00
Mike Marciniszyn	56c8ca510d	IB/rdmvat: Fix double vfree() in rvt_create_qp() error path The unwind logic for creating a user QP has a double vfree of the non-shared receive queue when handling a "too many qps" failure. The code unwinds the mmmap info by decrementing a reference count which will call rvt_release_mmap_info() which in turn does the vfree() of the r_rq.wq. The unwind code then does the same free. Fix by guarding the vfree() with the same test that is done in close and only do the vfree() if qp->ip is NULL. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 15:00:42 -04:00
Mitko Haralanov	08fe16f619	IB/hfi1: Improve J_KEY generation Previously, J_KEY generation was based on the lower 16 bits of the user's UID. While this works, it was not good enough as a non-root user could collide with a root user given a sufficiently large UID. This patch attempt to improve the J_KEY generation by using the following algorithm: The 16 bit J_KEY space is partitioned into 3 separate spaces reserved for different user classes: * all users with administtor privileges (including 'root') will use J_KEYs in the range of 0 to 31, * all kernel protocols, which use KDETH packets will use J_KEYs in the range of 32 to 63, and * all other users will use J_KEYs in the range of 64 to 65535. The above separation is aimed at preventing different user levels from sending packets to each other and, additionally, separate kernel protocols from all other types of users. The later is meant to prevent the potential corruption of kernel memory by any other type of user. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 15:00:42 -04:00
Easwar Hariharan	f29a08dc14	IB/hfi1: Return invalid field for non-QSFP CableInfo queries The driver does not check if the CableInfo query is supported for the port type. Return early if CableInfo is not supported for the port type, making compliance with the specification explicit and preventing lower level code from potentially doing the wrong thing if the query is not supported for the hardware implementation. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 15:00:42 -04:00
Christophe Jaillet	86cd747c6d	IB/usnic: Fix error return code If 'pci_register_driver' fails, we return 'err' which is known to be 0. Return the error instead. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:31:43 -04:00
Christophe Jaillet	57bb562ad4	IB/hfi1: Add missing error code assignment before test It is likely that checking the result of 'setup_ctxt' is expected here. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:31:43 -04:00
Wei Yongjun	476d95bd02	IB/hfi1: Using kfree_rcu() to simplify the code The callback function of call_rcu() just calls a kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:31:42 -04:00
Mike Marciniszyn	69b9f4a423	IB/hfi1: Validate header in set_armed_active Validate the etype to insure that the header is correct. Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:31:42 -04:00
Mike Marciniszyn	c867caaf8e	IB/hfi1: Pass packet ptr to set_armed_active The "packet" parameter was being passed on the stack, change it to a pointer. Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:31:41 -04:00
Easwar Hariharan	140690eae7	IB/hfi1: Fetch monitor values on-demand for CableInfo query The monitor values from bytes 22 through 81 of the QSFP memory space (SFF 8636) are dynamic and serving them out of the QSFP memory cache maintained by the driver provides stale data to the CableInfo SMA query. This patch refreshes the dynamic values from the QSFP memory on request and overwrites the stale data from the cache for the overlap between the requested range and the monitor range. Reviewed-by: Jubin John <jubin.john@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:31:41 -04:00
Mike Marciniszyn	c62fb260a8	IB/hfi1,IB/qib: Fix qp_stats sleep with rcu read lock held The qp init function does a kzalloc() while holding the RCU lock that encounters the following warning with a debug kernel when a cat of the qp_stats is done: [ 231.723948] rcu_scheduler_active = 1, debug_locks = 0 [ 231.731939] 3 locks held by cat/11355: [ 231.736492] #0: (debugfs_srcu){......}, at: [<ffffffff813001a5>] debugfs_use_file_start+0x5/0x90 [ 231.746955] #1: (&p->lock){+.+.+.}, at: [<ffffffff81289a6c>] seq_read+0x4c/0x3c0 [ 231.755873] #2: (rcu_read_lock){......}, at: [<ffffffffa0a0c535>] _qp_stats_seq_start+0x5/0xd0 [hfi1] [ 231.766862] The init functions do an implicit next which requires the rcu read lock before the kzalloc(). Fix for both drivers is to change the scope of the init function to only do the allocation and the initialization of the just allocated iter. The implict next is moved back into the respective start functions to fix the issue. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> CC: <stable@vger.kernel.org> # 4.6.x- Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:31:34 -04:00
Wei Yongjun	abb658ef05	IB/hfi1: Remove duplicated include from affinity.c Remove duplicated include. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:27:14 -04:00
Wei Yongjun	1d5840c971	IB/isert: fix error return code in isert_alloc_login_buf() Fix to return error code -ENOMEM from the alloc error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:26:55 -04:00
Wei Yongjun	23d70503ee	IB/core: Fix possible memory leak in cma_resolve_iboe_route() 'work' and 'route->path_rec' are malloced in cma_resolve_iboe_route() and should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Fixes: `200298326b` ('IB/core: Validate route when we init ah') Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:26:54 -04:00
Tadeusz Struk	8303f683b1	IB/hfi1: Allocate cpu mask on the heap to silence warning If CONFIG_FRAME_WARN is small (1K) and CONFIG_NR_CPUS big then a frame size warning is triggered during build. Allocate the cpu mask dynamically to silence the warning. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:26:54 -04:00
Yuval Shaia	5412352fcd	IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one Error code EAGAIN should be used when errors are temporary and next call might succeeds. When error code other than EAGAIN is returned, the caller (mlx4_ib_poll) will assume all CQE in the same bunch are error too and will drop them all. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:26:53 -04:00
Yuval Shaia	e6a00f6684	IB/mlx4: Make function use_tunnel_data return void No need to return int if function always returns 0 Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:26:45 -04:00
Wei Yongjun	204f69ba64	IB/hns: Fix return value check in hns_roce_get_cfg() In case of error, the function devm_ioremap_resource() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:13:18 -04:00
oulijun	8793f779cf	IB/hns: Kconfig and Makefile for RoCE module This patch added Kconfig and Makefile for building RoCE module. Signed-off-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Nenglong Zhao <zhaonenglong@hisilicon.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:02:33 -04:00
oulijun	9a4435375c	IB/hns: Add driver files for hns RoCE driver These are the various new source code files for the Hisilicon RoCE driver for ARM architecture. Signed-off-by: Wei Hu <xavier.huwei@huawei.com> Signed-off-by: Nenglong Zhao <zhaonenglong@hisilicon.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-22 14:02:32 -04:00
Noa Osherovich	d5beb7f2af	net/mlx5: Separate query_port_proto_oper for IB and EN Replaced mlx5_query_port_proto_oper with separate functions per link type. The functions should take different arguments so no point in trying to unite them. Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2016-08-18 18:49:52 +03:00
Saeed Mahameed	c4f287c4a6	net/mlx5: Unify and improve command interface Now as all commands use mlx5 ifc interface, instead of doing two calls for executing a command we embed command status checking into mlx5_cmd_exec to simplify the interface. Also we do here some cleanup for redundant software structures (inbox/outbox) and functions and improved command failure output. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2016-08-17 17:45:58 +03:00
Saeed Mahameed	1a412fb1ca	{net,IB}/mlx5: Modify QP commands via mlx5 ifc Prior to this patch we assumed that modify QP commands have the same layout. In ConnectX-4 for each QP transition there is a specific command and their layout can vary. e.g: 2err/2rst commands don't have QP context in their layout and before this patch we posted the QP context in those commands. Fortunately the FW only checks the suffix of the commands and executes them, while ignoring all invalid data sent after the valid command layout. This patch removes mlx5_modify_qp_mbox_in and changes mlx5_core_qp_modify to receive the required transition and QP context with opt_param_mask if needed. This way the caller is not required to provide the command inbox layout and it will be generated automatically. mlx5_core_qp_modify will generate the command inbox/outbox layouts according to the requested transition and will fill the requested parameters. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2016-08-17 17:45:58 +03:00
Saeed Mahameed	09a7d9eca1	{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc Remove old representation of manually created QP/XRCD commands layout amd use mlx5_ifc canonical structures and defines. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2016-08-17 17:45:57 +03:00
Saeed Mahameed	ec22eb5310	{net,IB}/mlx5: MKey/PSV commands via mlx5 ifc Remove old representation of manually created MKey/PSV commands layout, and use mlx5_ifc canonical structures and defines. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2016-08-14 14:39:18 +03:00
Saeed Mahameed	2782778663	{net,IB}/mlx5: CQ commands via mlx5 ifc Remove old representation of manually created CQ commands layout, and use mlx5_ifc canonical structures and defines. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2016-08-14 14:39:15 +03:00
Linus Torvalds	84e39eeb08	Second round of merge items for 4.8 - hfi1 driver updates - Fix for max SGEs allowed via RDMA R/W API -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXoqUzAAoJELgmozMOVy/dNKAP/1/Rzn/k97eda1qFqzWpqsPl lMaxDiZZnRIAFJEqEF9Iwo1JLiFIzjpDJnqHB++CKuXZQT0NY6sHW0yrcyUwzsx7 5gui92ldkVg4vY7PTco171vyzG+79KKRZ1dFS14z7oC8XAg48zQ7yJmfb1op3dEw mgxyoLaaMwMF5aLwPoWG4+aPkBMtKUGB/ARb4ehq6M2p71c43lb18GaarJuWLdAz 1HxakXL/uzttyvGDyJGKDrT6ktXXSyvdCTRO60OrrPFJ67P2xRYXce85TLRr8srp Q5RNjyR5fP8uN0qtrQz+hl09mtBeBQHKomyFIOVwkB2r53OKqsR5g5roz3BlpA1X 7PF/MO0pKy4t8XQnLfohEwtNWgszupvxkyAAISI8MwzLOPra/V8smQ9CpTltx1UB hTu3tpAMy1auAjh8TWzzzII1ZoRZz6YCTziWnTaC3bqAljufjt1mnvjrtNmQ1sNi MCLeA3yr8HjlKWdwYr+gVfhSR1wEoOxwHZdLsvBsxmC32hFLlh6rbg2x8wceqTlR 4T8l0AERV1YPjsoSe3/pWVImKUA97qppIfeFcCZiBCBHBPlhpw3ebVt6B1mLVUCV hTMuZeFVcV75D+qr0kR5ZuVn4jgEn9zB1VH3tCV9LJnhBfySZFcP4yhATqiELaHG RVoVAiTBxq5RgNVOH4Zo =cQcp -----END PGP SIGNATURE----- Merge tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...	2016-08-04 20:26:31 -04:00
Linus Torvalds	0cda611386	Round one of 4.8 code - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXo1vCAAoJELgmozMOVy/d0HcQAJqMi7siD9cSaMViYbu812pq 3kNkHZbLNB/947uShDPhhFAWFXU0nRxEnTNSvYxRo+nxnDE/9hEEXpx8OzzKLNU+ GXyDeHsEEriSFcaSne5Tak/QuiFm3PJv73ttXQROCtHG7KxLG9ieVbfusz42Xwiu 5R21qfp6PZEOC+j7L/fTZh/kEN3cfaDYrGnCgmU3z0ka9xG5Qe2/+uWGNkuioRA5 phFUR4MS+1n/VrnxPHrLXTrqv3sw8YfCfRImaXSBrxFVMqhno+cDDtEJQCRnmNrq 7KcJO2KqDMl/QqsjxdwqojNpUTh2t7SeOeQuzUsfXl15yyyetq2Zu7ZurkCGjNtQ NtTt6hv5eXq3mNuBmOPKYDDgakSYyYjS0zueoi8wFFqIeSYxRJv4wx4xoeJ/Bsz8 2LplpaPMQaTM65FhzYXGhYNBKaRkqjL9ihbIl1OcLNvfXAqLElfONM17/Yc/hgVw xfDtvNFrZcl7/exIpBBNOnxwbs4h78vvXsXoBiVoN7V/hBnMzDhkiBHNxNCfZXA0 REGs/cnyy6cpiJOnVCWs77NqL75oK/qb1mEwe1M+A2kaxe/tLixUdYXo/zclDPm8 3DLTL9lCgJIBIEiZT4q/alxLK+yUKD+SHtQT3lmF2Bfsmv/I38Uy55SXAiFO4yOq kwy96TvYtT43SkyNmmBf =oZOO -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...	2016-08-04 20:10:31 -04:00
Doug Ledford	7f1d25b47d	Merge branches 'misc' and 'rxe' into k.o/for-4.8-1	2016-08-04 11:13:47 -04:00
Moni Shoua	8700e3e7c4	Soft RoCE driver Soft RoCE (RXE) - The software RoCE driver ib_rxe implements the RDMA transport and registers to the RDMA core device as a kernel verbs provider. It also implements the packet IO layer. On the other hand ib_rxe registers to the Linux netdev stack as a udp encapsulating protocol, in that case RDMA, for sending and receiving packets over any Ethernet device. This yields a RDMA transport over the UDP/Ethernet network layer forming a RoCEv2 compatible device. The configuration procedure of the Soft RoCE drivers requires binding to any existing Ethernet network device. This is done with /sys interface. A userspace Soft RoCE library (librxe) provides user applications the ability to run with Soft RoCE devices. The use of rxe verbs ins user space requires the inclusion of librxe as a device specifics plug-in to libibverbs. librxe is packaged separately. Architecture: +-----------------------------------------------------------+ \| Application \| +-----------------------------------------------------------+ +-----------------------------------+ \| libibverbs \| User +-----------------------------------+ +----------------+ +----------------+ \| librxe \| \| HW RoCE lib \| +----------------+ +----------------+ +---------------------------------------------------------------+ +--------------+ +------------+ \| Sockets \| \| RDMA ULP \| +--------------+ +------------+ +--------------+ +---------------------+ \| TCP/IP \| \| ib_core \| +--------------+ +---------------------+ +------------+ +----------------+ Kernel \| ib_rxe \| \| HW RoCE driver \| +------------+ +----------------+ +------------------------------------+ \| NIC driver \| +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-----------------------------------------------------------+ \| Application \| +-----------------------------------------------------------+ +-----------------------------------+ \| libibverbs \| User +-----------------------------------+ +----------------+ +----------------+ \| librxe \| \| HW RoCE lib \| +----------------+ +----------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--------------+ +------------+ \| Sockets \| \| RDMA ULP \| +--------------+ +------------+ +--------------+ +---------------------+ \| TCP/IP \| \| ib_core \| +--------------+ +---------------------+ +------------+ +----------------+ Kernel \| ib_rxe \| \| HW RoCE driver \| +------------+ +----------------+ +------------------------------------+ \| NIC driver \| +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Soft RoCE resources: [1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in Github [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE Wiki page [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-04 11:13:12 -04:00
Krzysztof Kozlowski	00085f1efa	dma-mapping: use unsigned long for dma_attrs The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine: 1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits. 2. It brings safeness and checking for const correctness because the attributes are passed by value. Semantic patches for this change (at least most of them): virtual patch virtual context @r@ identifier f, attrs; @@ f(..., - struct dma_attrs attrs + unsigned long attrs , ...) { ... } @@ identifier r.f; @@ f(..., - NULL + 0 ) and // Options: --all-includes virtual patch virtual context @r@ identifier f, attrs; type t; @@ t f(..., struct dma_attrs attrs); @@ identifier r.f; @@ f(..., - NULL + 0 ) Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm] Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp] Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core] Acked-by: David Vrabel <david.vrabel@citrix.com> [xen] Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb] Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-04 08:50:07 -04:00
Doug Ledford	7c41765d8c	Merge branches 'hfi1' and 'sge-limit' into k.o/for-4.8-2	2016-08-03 21:51:20 -04:00
Alex Vesker	ab15c95a17	IB/core: Support for CMA multicast join flags Added UCMA and CMA support for multicast join flags. Flags are passed using UCMA CM join command previously reserved fields. Currently supporting two join flags indicating two different multicast JoinStates: 1. Full Member: The initiator creates the Multicast group(MCG) if it wasn't previously created, can send Multicast messages to the group and receive messages from the MCG. 2. Send Only Full Member: The initiator creates the Multicast group(MCG) if it wasn't previously created, can send Multicast messages to the group but doesn't receive any messages from the MCG. IB: Send Only Full Member requires a query of ClassPortInfo to determine if SM/SA supports this option. If SM/SA doesn't support Send-Only there will be no join request sent and an error will be returned. ETH: When Send Only Full Member is requested no IGMP join will be sent. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:06:46 -04:00
Alex Vesker	3d3fd74239	IB/sa: Add cached attribute containing SM information to SA port Added a new SA port attribute containing SM ClassPortInfo fields, (ClassPortInfo fields: Table 126 IB Spec 1.3.). This is useful for checking SM support for specific features. The attribute is cached to avoid resending queries, caching is done when a successful ClassPortInfo reply is received on the port. Invalidation of the attribute is done on SM change events, SM re-registration events, and SM LID change events. The fields in ClassPortInfo should not change during SM runtime without an event. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:36 -04:00
Jason Gunthorpe	d1e09f304a	IB/uverbs: Fix race between uverbs_close and remove_one Fixes an oops that might happen if uverbs_close races with remove_one. Both contexts may run ib_uverbs_cleanup_ucontext, it depends on the flow. Currently, there is no protection for a case that remove_one didn't make the cleanup it runs to its end, the underlying ib_device was freed then uverbs_close will call ib_uverbs_cleanup_ucontext and OOPs. Above might happen if uverbs_close deleted the file from the list then remove_one didn't find it and runs to its end. Fixes to protect against that case by a new cleanup lock so that ib_uverbs_cleanup_ucontext will be called always before that remove_one is ended. Fixes: `35d4a0b63d` ("IB/uverbs: Fix race between ib_uverbs_open and remove_one") Reported-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:36 -04:00
Markus Elfring	380bae5bf2	IB/mthca: Clean up error unwind flow in mthca_reset() The kfree() function was called in a few cases by the mthca_reset() function during error handling even if the passed variables "bridge_header" and "hca_header" contained a null pointer. Adjust jump targets according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:35 -04:00
Markus Elfring	3491ab63b4	IB/mthca: NULL arg to pci_dev_put is OK The pci_dev_put() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:35 -04:00
Markus Elfring	f7ca535ba0	IB/hfi1: NULL arg to sc_return_credits is OK The sc_return_credits() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:35 -04:00
Mark Bloch	3f85f2aaab	IB/mlx4: Add diagnostic hardware counters Expose IB diagnostic hardware counters. The counters count IB events and are applicable for IB and RoCE. The counters can be divided into two groups, per device and per port. Device counters are always exposed. Port counters are exposed only if the firmware supports per port counters. rq_num_dup and sq_num_to are only exposed if we have firmware support for them, if we do, we expose them per device and per port. rq_num_udsdprd and num_cqovf are device only counters. rq - denotes responder. sq - denotes requester. \|-----------------------\|---------------------------------------\| \| Name \| Description \| \|-----------------------\|---------------------------------------\| \|rq_num_lle \| Number of local length errors \| \|-----------------------\|---------------------------------------\| \|sq_num_lle \| number of local length errors \| \|-----------------------\|---------------------------------------\| \|rq_num_lqpoe \| Number of local QP operation errors \| \|-----------------------\|---------------------------------------\| \|sq_num_lqpoe \| Number of local QP operation errors \| \|-----------------------\|---------------------------------------\| \|rq_num_lpe \| Number of local protection errors \| \|-----------------------\|---------------------------------------\| \|sq_num_lpe \| Number of local protection errors \| \|-----------------------\|---------------------------------------\| \|rq_num_wrfe \| Number of CQEs with error \| \|-----------------------\|---------------------------------------\| \|sq_num_wrfe \| Number of CQEs with error \| \|-----------------------\|---------------------------------------\| \|sq_num_mwbe \| Number of Memory Window bind errors \| \|-----------------------\|---------------------------------------\| \|sq_num_bre \| Number of bad response errors \| \|-----------------------\|---------------------------------------\| \|sq_num_rire \| Number of Remote Invalid request \| \| \| errors \| \|-----------------------\|---------------------------------------\| \|rq_num_rire \| Number of Remote Invalid request \| \| \| errors \| \|-----------------------\|---------------------------------------\| \|sq_num_rae \| Number of remote access errors \| \|-----------------------\|---------------------------------------\| \|rq_num_rae \| Number of remote access errors \| \|-----------------------\|---------------------------------------\| \|sq_num_roe \| Number of remote operation errors \| \|-----------------------\|---------------------------------------\| \|sq_num_tree \| Number of transport retries exceeded \| \| \| errors \| \|-----------------------\|---------------------------------------\| \|sq_num_rree \| Number of RNR NAK retries exceeded \| \| \| errors \| \|-----------------------\|---------------------------------------\| \|rq_num_rnr \| Number of RNR NAKs sent \| \|-----------------------\|---------------------------------------\| \|sq_num_rnr \| Number of RNR NAKs received \| \|-----------------------\|---------------------------------------\| \|rq_num_oos \| Number of Out of Sequence requests \| \| \| received \| \|-----------------------\|---------------------------------------\| \|sq_num_oos \| Number of Out of Sequence NAKs \| \| \| received \| \|-----------------------\|---------------------------------------\| \|rq_num_udsdprd \| Number of UD packets silently \| \| \| discarded on the Receive Queue due to \| \| \| lack of receive descriptor \| \|-----------------------\|---------------------------------------\| \|rq_num_dup \| Number of duplicate requests received \| \|-----------------------\|---------------------------------------\| \|sq_num_to \| Number of time out received \| \|-----------------------\|---------------------------------------\| \|num_cqovf \| Number of CQ overflows \| \|-----------------------\|---------------------------------------\| Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:34 -04:00
Mustafa Ismail	e972dabf9c	Use smaller 512 byte messages for portmapper messages Portmapper messages are short and do not occupy more than 512 bytes. Lower portmapper message size to 512 bytes. This change significantly reduces the amount of memory needed when trying to establish a large number of connections simultaneously. The old value is based on page size. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:33 -04:00
Yuval Shaia	5faba54695	IB/ipoib: Report SG feature regardless of HW UD CSUM capability Decouple SG support from HW ability to do UD checksum. This coupling is for historical reasons and removed with 'commit `ec5f061564` ("net: Kill link between CSUM and SG features.")' During driver load it is assumed that device does not supports SG. The final decision is taken after creating UD QP based on device capability. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:32 -04:00
Roland Dreier	0c87b67209	IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct We allocate a small tracking structure as part of mlx4_ib_resize_cq(). However, we don't need to use GFP_ATOMIC -- immediately after the allocation, we call mlx4_cq_resize(), which allocates a command mailbox with GFP_KERNEL and then sleeps on a firmware command, so we better not be in an atomic context. This actually has a real impact, because when this GFP_ATOMIC allocation fails (and GFP_ATOMIC does fail in practice) then a userspace consumer resizing a CQ will get a spurious failure that we can easily avoid. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:32 -04:00
Bart Van Assche	a154a8cd08	IB/hfi1: Disable by default There is a strict policy in the Linux kernel that new drivers must be disabled by default. Hence leave out the "default m" line from Kconfig. Fixes: `f48ad614c1` ("IB/hfi1: Move driver out of staging") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Jubin John <jubin.john@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: <stable@vger.kernel.org> # v4.7+ Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:32 -04:00
Bart Van Assche	378fc3201e	IB/rdmavt: Disable by default There is a strict policy in the Linux kernel that new drivers must be disabled by default. Hence leave out the "default m" line from Kconfig. Fixes: `0194621b22` ("IB/rdmavt: Create module framework and handle driver registration") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Jubin John <jubin.john@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: <stable@vger.kernel.org> # v4.6+ Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-03 21:03:31 -04:00
Doug Ledford	6a89d89d85	Merge branch 'i40iw' into k.o/for-4.8	2016-08-03 21:00:16 -04:00
Doug Ledford	3e5e8e8a9a	Merge branches 'cxgb4' and 'mlx5' into k.o/for-4.8	2016-08-03 20:58:45 -04:00
Dean Luick	0636e9ab83	IB/hfi1: Add cache evict LRU list The original code used a LRU list to evict nodes which were least recently used. For correctness the evict code was moved under the handler->lock, now add back the LRU list. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	2677a7680e	IB/hfi1: Fix memory leak during unexpected shutdown During an unexpected shutdown, references to tid_rb_node were NULL'ed out without properly being released. Fix this by calling clear_tid_node in the mmu notifier remove callback rather than after these callbacks are called. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	082b353291	IB/hfi1: Remove unneeded mm argument in remove function The reworked mmu_rb interface allows the unused mm argument to be removed. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	b85ced9151	IB/hfi1: Consistently call ops->remove outside spinlock The ops->remove() callback was called by hfi1_mmu_unregister() with a NULL mm argument while holding a spinlock. In the case of sdma_rb_remove() this caused it to pass current->mm to hfi1_release_user_pages() This had 2 problems. First this would attempt to acquire the mmap_sem under a spin lock. Second the use of current->mm is not always guaranteed to be the proper mm when the fd is being closed. Rather than depend on this implicit behavior we move all calls to ops->remove outside of the spinlock. This also allows the correct mm to be used in the remove callback without fear of deadlock. Because the MMU notifier is not guaranteed to hold mm->mmap_sem, but usually does, we must delay all remove callbacks until out of the notifier, when the callbacks can take the mmap_sem if they need to. Code comments were added to clarify what the expectations are for the users of the mmu rb tree. Suggested-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	b7df192f74	IB/hfi1: Use evict mmu rb operation Use the new cache evict operation in the SDMA code. This allows the cache to properly coordinate evicts and removes, preventing any race. With this change, the separate list, lock, and race flag are not needed. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	1034599805	IB/hfi1: Add evict operation to the mmu rb handler Allow users to clear nodes from the rb tree based on their evict callback. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	622c202c4a	IB/hfi1: Fix TID caching actions Per file descriptor TID caching actions depend on a global that can change midway through the lifetime of that file descriptor. Make the use of caching consistent for the life of the file descriptor by using the presence of the cache handler to decide when to use the cache functions. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	e0b09ac55d	IB/hfi1: Make the cache handler own its rb tree root The objects which use cache handling should reference their own handler object not the internal data structure it uses to track the nodes. Have the "users" of the mmu notifier code pass opaque objects which can then be properly used in the mmu callbacks depending on the owners needs. This patch has the additional benefit that operations no longer require a look up in a list to find the handlers. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	3faa3d9a30	IB/hfi1: Make use of mm consistent The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is opened, and unregisters it when the device is closed. The driver incorrectly assumes that the close will always happen from the same context as the open. In particular, closes due to SIGKILL or OOM killer activity may happen from a different context. In these cases, the wrong mm is passed to mmu_notifier_unregister(), which causes improper reference counting for the victim mm, and eventual memory corruption. Preserve the mm for all open file descriptors and use this mm rather than current->mm for memory operations for the lifetime of that fd. Note: this patch leaves 1 use of current->mm in place. This use is removed in a follow on patch because other functional changes were required prior to that use being removed. If registration fails, there is no reason to keep the handler object around. Free the handler object rather than add it to the list to prevent any mmu_notifier operations, including unregister, when registration fails. Suggested-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	7b3256e331	IB/hfi1: Fix user SDMA racy user request claim The user SDMA in-use claim bit is in the structure that gets zeroed out once the claim is made. Move the request in-use flag into its own bit array and use that for atomic claims. This cleans up the claim code and removes any race possibility. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	9da7e9a711	IB/hfi1: Fix error condition that needs to clean up If input validation fails, properly free the request before returning. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	a383f8ec55	IB/hfi1: Release node on insert failure If unable to insert node into the RB tree cache, node will be freed before returning from the function. Null out iovec's pointer to node so iovec does not try to free it later. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	9ff73c8715	IB/hfi1: Validate SDMA user iovector count Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	4fa0d22c9a	IB/hfi1: Validate SDMA user request index Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	bdf7752e07	IB/hfi1: Use the same capability state for all shared contexts Save the current capability state at user context creation time. Report this saved value for all shared contexts. Also get rid of unnecessary hfi1_get_base_kinfo function. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	53445bb32d	IB/hfi1: Prevent null pointer dereference If a context has not been assigned or assignment failed, pq may be NULL. Move the unregister within the protection of the null check. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	a7cd2dc5d4	IB/hfi1: Rename TID mmu_rb_* functions Clarify the names of the TID mmu functions. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	20a42d0833	IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() Checking if the rb tree is empty is redundant with the while loop which is emptying the rb tree. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	ea3a0ee52d	IB/hfi1: Restructure hfi1_file_open Rearrange the file open call in prep for new changes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	ff4ce9bde9	IB/hfi1: Make iovec loop index easy to understand Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	639297b4f0	IB/hfi1: Use "false" not 0 For bool parameters "false" should be used Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	5ed3b15b05	IB/hfi1: Remove unused sub-context parameter subctxt is not used, just remove it. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	3c1091aa94	IB/hfi1: Consolidate __mmu_rb_remove and hfi1_mmu_rb_remove __mmu_rb_remove was called in only 1 place which was a very simple call site. Combine this function into its caller. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	c0946642e5	IB/hfi1: Always expect ops functions Remove, insert, and invalidate are always provided. No need to test. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	862548dace	IB/hfi1: Add parameter names to callback declarations This makes it more clear what these functions are operating on. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	ac335e7e80	IB/hfi1: Add parameter names to function declarations Parameter names to function declarations make it more clear what those parameters do. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	fc87879ae2	IB/hfi1: Remove unused function hfi1_mmu_rb_search Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dean Luick	8e1f52df97	IB/hfi1: Remove unused uctxt->subpid and uctxt->pid These are no longer needed. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	72720ddfc6	IB/hfi1: Fix minor format error Brackets should be on the next line of a function Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	fc0b76c016	IB/hfi1: Expand reported serial number Expand the serial number space by using more bits from the GUID. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	c492980269	IB/hfi1: Allow for non-double word multiple message sizes for user SDMA The driver pads non-double word multiple message sizes but it doesn't account for this padding when the packet length is calculated. Also, the data length is miscalculated for message sizes less than 4 bytes due to the bit representation in LRH. And there's a check for non-double word multiple message sizes that prevents these messages from being sent. This patch fixes length miscalculations and enables the functionality to send non-double word multiple message sizes. Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	fe508272c9	IB/rdmavt: Eliminate redundant opcode test in mr ref clear The use of the specific opcode test is redundant since all ack entry users correctly manipulate the mr pointer to selectively trigger the reference clearing. The overly specific test hinders the use of implementation specific operations. The change needs to get rid of the union to insure that an atomic value is not seen as an MR pointer. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Ira Weiny	042b0159aa	IB/hfi1: Handle kzalloc failure in init_pervl_scs Checking the return value of the memory allocation call in init_pervl_scs() was missed. Recently the kmalloc() was changed to kzalloc() which identified the problem. While fixing this issue 2 other bugs were noticed. First, the array being allocated is accessed in the nomem path which can be reached before it is allocated. Second, kernel_send_context was not released on error. Fix both of these by creating a more common memory unwind label structure. Fixes: `35f6befc84` ("staging/rdma/hfi1: Add qp to send context mapping for PIO") Reported-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 22:46:21 -04:00
Dasaratharaman Chandramouli	527dbf12e0	IB/qib, IB/hfi1: Fix grh creation in ud loopback Instead of copying the actual GRH of type struct ib_grh, existing code copies the struct ib_global_route into the sge. This patch fixes that and constructs the actual GRH from ib_global_route and copies the GRH into the sge. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli	b736a469f9	IB/hfi1: Use hdr2sc function to calculate 5-bit SC The interface is used to compute the 5-bit SC field from the LRH and the RHF bits. Modify code to use the interface instead. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli	89c057cae4	IB/hfi1: Cleanup UD packet handler. Cleanup hfi1_ud_rcv to not have to look at the packet header fields multiple times. The fields are looked up once and used throughout the function. Also fix sc computation when validating MAD packets. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Don Hiatt	d4d602e9a3	IB/hfi1: Rename hfi1_pio_header to hfi1_sdma_header. hfi1_pio_header should really be called hfi1_sdma_header as it is only used for sdma transmits. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli	a9b6b3bc29	IB/hfi1: Rename struct ahg_ib_header to struct hfi1_ahg_info struct ahg_ib_header has no header specific information. Rename it to struct hfi1_ahg_info Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dasaratharaman Chandramouli	bd24ef5eca	IB/hfi1: Remove unused elements from struct ahg_ib_header sde and hfi1_ib_header are not used anymore. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Easwar Hariharan	b5e7101954	IB/hfi1: Reset QSFP on every run through channel tuning Active QSFP cables were reset only every alternate iteration of the channel tuning algorithm instead of every iteration due to incorrect reset of the flag that controlled QSFP reset, resulting in using stale QSFP status in the channel tuning algorithm. Fixes: `8ebd4cf185` ("Add active and optical cable support") Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Easwar Hariharan	5fbd98dd20	IB/hfi1: Ignore QSFP interrupts until power stabilizes Some QSFP cables assert the interrupt line as a side effect of module plug-in and power up. This causes the SerDes and QSFP tuning algorithm to begin cable initialization by reading the QSFP memory map over I2C, which fails. This patch ignores any interrupt line assertion until the module has completed power up and voltage rails have stabilized, which can take a maximum of 500 ms per the SFF-8679 specification. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Easwar Hariharan	3ca5f4c068	IB/hfi1: Disable external device configuration requests QSFP CDR enablement is now controlled by determining power class and the configuration file. We disable the DC 8051 from requesting enablement or disabling of TX and RX CDRs by removing the code that allowed the DC 8051 to request changes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	d9b13c2030	IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabled Hanging has been observed while writing a file over NFSoRDMA. Dmesg on the server contains messages like these: [ 931.992501] svcrdma: Error -22 posting RDMA_READ [ 952.076879] svcrdma: Error -22 posting RDMA_READ [ 982.154127] svcrdma: Error -22 posting RDMA_READ [ 1012.235884] svcrdma: Error -22 posting RDMA_READ [ 1042.319194] svcrdma: Error -22 posting RDMA_READ Here is why: With the base memory management extension enabled, FRMR is used instead of FMR. The xprtrdma server issues each RDMA read request as the following bundle: (1)IB_WR_REG_MR, signaled; (2)IB_WR_RDMA_READ, signaled; (3)IB_WR_LOCAL_INV, signaled & fencing. These requests are signaled. In order to generate completion, the fast register work request is processed by the hfi1 send engine after being posted to the work queue, and the corresponding lkey is not valid until the request is processed. However, the rdmavt driver validates lkey when the RDMA read request is posted and thus it fails immediately with error -EINVAL (-22). This patch changes the work flow of local operations (fast register and local invalidate) so that fast register work requests are always processed immediately to ensure that the corresponding lkey is valid when subsequent work requests are posted. Local invalidate requests are processed immediately if fencing is not required and no previous local invalidate request is pending. To allow completion generation for signaled local operations that have been processed before posting to the work queue, an internal send flag RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag and only generates completion for such requests. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Mike Marciniszyn	856cc4c237	IB/hfi1: Add the capability for reserved operations This fix allows for support of in-kernel reserved operations without impacting the ULP user. The low level driver can register a non-zero value which will be transparently added to the send queue size and hidden from the ULP in every respect. ULP post sends will never see a full queue due to a reserved post send and reserved operations will never exceed that registered value. The s_avail will continue to track the ULP swqe availability and the difference between the reserved value and the reserved in use will track reserved availabity. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Grzegorz Heldt	23002d5b08	IB/hfi1: Fix trace message units Trace shows incorrect amount of allocated memory. Fix trace to display memory in KB. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Grzegorz Heldt <grzegorz.heldt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Tadeusz Struk	b14db1f0aa	IB/hfi1: Add sysfs entry to override SDMA interrupt affinity Add sysfs entry to allow user to override affinity for SDMA engine interrupts. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dean Luick	c3f8de0b33	IB/hfi1: Add static PCIe Gen3 CTLE tuning Enhance the PCIe Gen3 recipe to support static CTLE tuning, and add a switch to choose between static and dynamic approaches. Make discrete chips default to static CTLE tuning. Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	8adf71fa14	IB/hfi1: Fix "suspicious rcu_dereference_check() usage" warnings This fixes the following warnings with PROVE_LOCKING and PROVE_RCU enabled in the kernel: case (1): [ INFO: suspicious RCU usage. ] drivers/infiniband/hw/hfi1/init.c:532 suspicious rcu_dereference_check() usage! case (2): [ INFO: suspicious RCU usage. ] drivers/infiniband/hw/hfi1/hfi.h:1624 suspicious rcu_dereference_check() usage! Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	a6580f4310	IB/rdmavt: Add missing spin_lock_init call for rdi->n_cqs_lock This fixes the following warning with PROV_LOCKING enabled kernel: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 15 PID: 12286 Comm: modprobe Not tainted 4.7.0-rc5.prove_rcu+ #1 Hardware name: Intel Corporation S2600WT2R/S2600WT2R, ...... Call Trace: [<ffffffff8139ec0d>] dump_stack+0x85/0xc8 [<ffffffff810eb765>] register_lock_class+0x415/0x4b0 [<ffffffff810ede1c>] ? __lock_acquire+0x40c/0x1960 [<ffffffff810edaa9>] __lock_acquire+0x99/0x1960 [<ffffffff8120ab62>] ? find_vmap_area+0x42/0x60 [<ffffffff8120ab39>] ? find_vmap_area+0x19/0x60 [<ffffffff810ef9d3>] lock_acquire+0xd3/0x200 [<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt] [<ffffffff81763391>] _raw_spin_lock+0x31/0x40 [<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt] [<ffffffffa049d598>] rvt_create_cq+0xc8/0x250 [rdmavt] [<ffffffff810ead46>] ? static_obj+0x36/0x50 [<ffffffffa0469e39>] ib_alloc_cq+0x49/0x180 [ib_core] [<ffffffffa047bed4>] ib_mad_init_device+0x204/0x6d0 [ib_core] [<ffffffff810e968f>] ? up_write+0x1f/0x40 [<ffffffffa046e2c0>] ib_register_device+0x3d0/0x510 [ib_core] [<ffffffffa0752410>] ? read_cc_setting_bin+0x200/0x200 [hfi1] [<ffffffff810ead46>] ? static_obj+0x36/0x50 [<ffffffff810eb888>] ? lockdep_init_map+0x88/0x200 [<ffffffffa049cbff>] rvt_register_device+0x17f/0x320 [rdmavt] [<ffffffffa0766caa>] hfi1_register_ib_device+0x6ca/0x7c0 [hfi1] [<ffffffffa0733de4>] init_one+0x2b4/0x430 [hfi1] [<ffffffff813e40a5>] local_pci_probe+0x45/0xa0 [<ffffffff813e5110>] ? pci_match_device+0xe0/0x110 [<ffffffff813e550c>] pci_device_probe+0xfc/0x140 [<ffffffff814daee9>] driver_probe_device+0x239/0x460 [<ffffffff814db1dd>] __driver_attach+0xcd/0xf0 [<ffffffff814db110>] ? driver_probe_device+0x460/0x460 [<ffffffff814d89b3>] bus_for_each_dev+0x73/0xc0 [<ffffffff814da74e>] driver_attach+0x1e/0x20 [<ffffffff814da1b3>] bus_add_driver+0x1d3/0x290 [<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1] [<ffffffff814dbf60>] driver_register+0x60/0xe0 [<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1] [<ffffffff813e39d0>] __pci_register_driver+0x60/0x70 [<ffffffffa04cc2aa>] hfi1_mod_init+0x196/0x1fe [hfi1] [<ffffffff81002190>] do_one_initcall+0x50/0x190 [<ffffffff8110be72>] ? rcu_read_lock_sched_held+0x62/0x70 [<ffffffff8122d4aa>] ? kmem_cache_alloc_trace+0x23a/0x2a0 [<ffffffff811c1881>] ? do_init_module+0x27/0x1dc [<ffffffff811c18ba>] do_init_module+0x60/0x1dc [<ffffffff811360cc>] load_module+0x132c/0x1ac0 [<ffffffff81132c40>] ? __symbol_put+0x60/0x60 [<ffffffff8133e50d>] ? ima_post_read_file+0x3d/0x80 Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dean Luick	b3bf270bed	IB/hfi1: Read all firmware versions Read the version of the SBus, PCIe SerDes, and Fabric Serdes firmwares at driver load time. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Dean Luick	6854c6925d	IB/hfi1: Explain state complete frame details When link up fails in LNI, the local and peer state complete frames are reported as numbers. Explain what the values mean so the operator can better diagnose the problem. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Harish Chegondi	8784ac0243	IB/hfi1: Modify the default number of kernel receive conexts Currently, the default number of kernel receive contexts is set to the number of NUMA nodes on the system plus one for control context. However, the systems that have a single socket and/or have NUMA disabled in the BIOS will have only one receive context by default. This patch would ensure that by default there will be at least two kernel receive contexts plus one for control context regardless of the number of NUMA nodes on the system. The user can override the default number of kernel receive contexts with the krcvqs module parameter. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	c72cfe3e38	IB/hfi1: Add support for extended memory management Advertise and add the capability of handing all aspects of IBTA extended memory management support in post send. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	0db3dfa03c	IB/hfi1: Work request processing for fast register mr and invalidate In order to support extended memory management support, add send side processing of work requests of type IB_WR_REG_MR, IB_WR_LOCAL_INV, and IB_WR_SEND_WITH_INV. The first two are local operations and are supported for both RC and UC. Send with invalidate is only supported for RC because the corresponding IB opcodes are not defined for UC. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	a2df0c8332	IB/hfi1: Handle send with invalidate opcode in the RC recv path As part of enabling extended memory management support, add the processing of the RC send with invalidate. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	d9f8723924	IB/rdmavt: Handle local operations in post send Some work requests are local operations, such as IB_WR_REG_MR and IB_WR_LOCAL_INV. They differ from non-local operations in that: (1) Local operations can be processed immediately without being posted to the send queue if neither fencing nor completion generation is needed. However, to ensure correct ordering, once a local operation is posted to the work queue due to fencing or completion requiement, all subsequent local operations must also be posted to the work queue until all the local operations on the work queue have completed. (2) Local operations don't send packets over the wire and thus don't need (and shouldn't update) the packet sequence numbers. Define a new a flag bit for the post send table to identify local operations. Add a new field to the QP structure to track the number of local operations on the send queue to determine if direct processing of new local operations should be enabled/disabled. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	e8f8b098a4	IB/rdmavt: Add mechanism to invalidate MR keys In order to support extended memory management, add the mechanism to invalidate MR keys. This includes a flag "lkey_invalid" in the MR data structure that is to be checked when validating access to the MR via the associated key, and two utility functions to perform fast memory registration and memory key invalidate operations. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jianxin Xiong	a41081aa59	IB/rdmavt: Add support for ib_map_mr_sg This implements the device specific function needed by the verbs API function ib_map_mr_sg(). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Mitko Haralanov	5fd2b562ed	IB/hfi1: Pull FECN/BECN processing to a common place There were multiple places where FECN/BECN processing was being done for the different types of QPs. All of that code was very similar, which meant that it could be pulled into a single function used by the different QP types. To retain the performance in the fastpath, the common code starts with an inline function, which only calls the slow path if the packet has any of the [FB]ECN bits set. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Tymoteusz Kielan	1b23f02cf4	IB/hfi1: Fix to fully initialize send context area While handling buffer control MAD, partially initialized dd->kernel_send_context area may cause potential dereference of uninitialized pointers. Fix by using kzalloc_node() instead of kmalloc_node(). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Jakub Pawlak	3210314ad3	IB/hfi1: Fix integrity errors counter value calculation PMA should not sum TX and RX replay counts when reporting local link integrity errors. Fixed by removing C_DC_TX_REPLAY counter from calculation of the link integrity errors counter value. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 16:00:58 -04:00
Mike Marciniszyn	2821c509fc	IB/rdmavt: Use new driver specific post send table Change rvt_post_one_wr to use the new table mechanism for post send. Validate that each low level driver specifies the table. Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:45 -04:00
Mike Marciniszyn	9ec4faa391	IB/qib: Add qib post send table Add initial table for table driven post_send support. Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:44 -04:00
Mike Marciniszyn	1ac57c50e9	IB/hfi1: Add hfi1 post send tables Add initial table for table driven post_send support. Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:44 -04:00
Mike Marciniszyn	afcf8f7647	IB/rdmavt: Add data structures and routines for table driven post send Add flexibility for driver dependent operations in post send because different drivers will have differing post send operation support. This includes data structure definitions to support a table driven scheme along with the necessary validation routine using the new table. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:44 -04:00
Jakub Pawlak	71e68e3db8	IB/hfi1: Correct receive packet handler assignment Prevent processing receive packet in case when opcode is accepted by QP but handler for this type of packet is not defined. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:43 -04:00
Jianxin Xiong	14833b8c52	IB/hfi1: Improve SDMA engine assignment for user SDMA Currently each user context is assigned a single SDMA engine based on the VL, context id, and subcontext id. That means for MPI applications, each rank can only use one SDMA engine for all messages. This may create unwanted backup for independent messages going to different destinations upon congestion at one destination. This patch adds the packet "dlid" to the formula of SDMA engine selection for user SDMA requests. A simple hash table is used to maintain even distribution among the available SDMA engines regardless how the "dlid" values are distributed. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:43 -04:00
Dean Luick	e014991d07	IB/hfi1: Remove TWSI references Remove the TWSI code. The driver now uses the kernel's built-in i2c bit bus module. Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:42 -04:00
Dean Luick	dba715f0c8	IB/hfi1: Use built-in i2c bit-shift bus adapter Use built-in i2c bit-shift bus adapter to control the i2c busses on the chip. Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:42 -04:00
Sebastian Sanchez	b094a36f90	IB/hfi1: Refine user process affinity algorithm When performing process affinity recommendations for MPI ranks, the current algorithm doesn't take into account multiple HFI units. Also, real cores and HT cores are not distinguished from one another. Therefore, all HT cores are recommended to be assigned first within the local NUMA node before recommending the assignments of cores in other NUMA nodes. It's ideal to assign all real cores across all NUMA nodes first, then all HT 1 cores, then all HT 2 cores, and so on to balance CPU workload. CPU cores in other NUMA nodes could be running interrupt handlers, and this is not taken into account. To balance the CPU workload for user processes, the following recommendation algorithm is used: For each user process that is opening a context on HFI Y: a) If all cores are assigned to user processes, start assignments all over from the first core b) Assign real cores first, then HT cores (First set of HT cores on all physical cores, then second set of HT cores, and, so on) in the following order: 1. Same NUMA node as HFI Y and not running an IRQ handler 2. Same NUMA node as HFI Y and running an IRQ handler 3. Different NUMA node to HFI Y and not running an IRQ handler 4. Different NUMA node to HFI Y and running an IRQ handler c) Mark core as assigned in the global affinity structure. As user processes are done, remove core assignments from global affinity structure. This implementation allows an arbitrary number of HT cores and provides support for multiple HFIs. This is being included in the kernel rather than user space due to the fact that user space has no way of knowing the CPU recommendations for contexts running as part of other jobs. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:33 -04:00
Sebastian Sanchez	d63730192f	IB/hfi1: Reserve and collapse CPU cores for contexts Kernel receive queues oversubscribe CPU cores on multi-HFI systems. To prevent this, the kernel receive queues are separated onto different cores, and the SDMA engine interrupts are constrained to a lesser number of cores. hfi1s_on_numa_node*krcvqs is the number of CPU cores that are reserved for kernel receive queues for all HFIs. Each HFI initializes its kernel receive queues to one of the reserved CPU cores. If there ends up being 0 CPU cores leftover for SDMA engines, use the same CPU cores as receive contexts. In addition, general and control contexts are assigned to their own CPU core, however, both types of contexts tend to have low traffic. To save CPU cores, collapse general and control contexts to one CPU core for all HFI units. This change prevents SDMA engine interrupts from wrapping around general contexts. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:47:07 -04:00
Dennis Dalessandro	4197344ba5	IB/hfi1: Add global structure for affinity assignments When HFI units get initialized, they each use their own mask copy for affinity assignments. On a multi-HFI system, affinity assignments overbook CPU cores as each HFI doesn't have knowledge of affinity assignments for other HFI units. Therefore, some CPU cores are never used for interrupt handlers in systems with high number of CPU cores per NUMA node. For multi-HFI systems, SDMA engine interrupt assignments start all over from the first CPU in the local NUMA node after the first HFI initialization. This change allows assignments to continue where the last HFI unit left off. Add global structure for affinity assignments for multiple HFIs to share affinity mask. Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Jubin John <jubin.john@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 15:45:14 -04:00
Alex Vesker	321a9e3ebc	IB/mlx5: Fix port counter ID association to QP offset The q-counter-id is given in modify-QP command associates the QP with the counter. The offset to which the counter ID was set is incorrect, causing IB port counters not to count on QP. Fixes: `0837e86a7a` ('IB/mlx5: Add per port counters') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Tested-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:40:49 -04:00
Slava Shwartsman	b0ffeb537f	IB/mlx5: Fix iteration overrun in GSI qps Number of outstanding_pi may overflow and as a result may indicate that there are no elements in the queue. The effect of doing this is that the MAD layer will get stuck waiting for completions. The MAD layer will think that the QP is full - because it didn't receive these completions. This fix changes it so the outstanding_pi number is increased with 32-bit wraparound and is not limited to max_send_wr so that the difference between outstanding_pi and outstanding_ci will really indicate the number of outstanding completions. Cc: Stable <stable@vger.kernel.org> Fixes: `ea6dc20362` ('IB/mlx5: Reorder GSI completions') Signed-off-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:32:51 -04:00
Mustafa Ismail	da5c138185	i40iw: Add NULL check for puda buffer i40iw_puda_get_listbuf may return NULL if the list is empty. Add NULL check prior to accessing the pointer. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	8c1ea86d44	i40iw: Change dup_ack_thresh to u8 Change dup_ack_thressh to u8 since it is a 3 bit field. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	c5d057d32b	i40iw: Remove unnecessary check for moving CQ head In i40iw_cq_poll_completion, we always move the tail. So there is no reason to check for overflow everytime we move the head. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	b494c3e677	i40iw: Simplify code to set fragments in SQ WQE Replace a subtract and multiply with an add; while populating fragments in SQ wqe. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	b54143bea9	i40iw: Remove unnecessary parameter to i40iw_cq_poll_completion Post_cq parameter passed to i40iw_cq_poll_completion is always true; so remove. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	5ec11ed23e	i40iw: Do not access pointer after free Child_listen_node pointer is used in a debug print after kfree. Move the print before kfree. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Mustafa Ismail	342c387b0d	i40iw: Correct and use size parameter to i40iw_reg_phys_mr Fix size parameter passed to i40iw_reg_phys_mr and use it to register memory. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Shiraz Saleem	fe5d6e625d	i40iw: Fix return codes Fix incorrect usage of ENOSYS and other return codes. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 14:17:38 -04:00
Steve Wise	59c68ac31e	iw_cm: free cm_id resources on the last deref Remove the complicated logic to free the iw_cm_id inside iw_cm event handlers vs when an application thread destroys the cm_id. Also remove the block in iw_destroy_cm_id() to block the application until all references are removed. This block can cause a deadlock when disconnecting or destroying cm_ids inside an rdma_cm event handler. Simply allowing the last deref of the iw_cm_id to free the memory is cleaner and avoids this potential deadlock. Also a flag is added, IW_CM_DROP_EVENTS, that is set when the cm_id is marked for destruction. If any events are pending on this iw_cm_id, then as they are processed they will be dropped vs posted upstream if IW_CM_DROP_EVENTS is set. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:15:18 -04:00
Steve Wise	ad61a4c7a9	iw_cxgb4: don't block in destroy_qp awaiting the last deref Blocking in c4iw_destroy_qp() causes a deadlock when apps destroy a qp or disconnect a cm_id from their cm event handler function. There is no need to block here anyway, so just replace the refcnt atomic with a kref object and free the memory on the last put. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:15:17 -04:00
Steve Wise	1b1a889dbb	iw_cxgb4: explicitly move the qp to ERROR state during flush This forces the connection to abort if the application failed to disconnect before flushing. This is aligned with how the common flush services work. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:15:17 -04:00
Steve Wise	12eb5137ed	iw_cxgb4: stop MPA_REPLY timer when disconnecting There exists a race where the application can setup a connection and then disconnect it before iw_cxgb4 processes the fw4_ack message. For passive side connections, the fw4_ack message is used to know when to stop the ep timer for MPA_REPLY messages. If the application disconnects before the fw4_ack is handled then c4iw_ep_disconnect() needs to clean up the timer state and stop the timer before restarting it for the disconnect timer. Failure to do this results in a "timer already started" message and a premature stopping of the disconnect timer. Fixes: `e4b76a2` ("RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiation") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:15:17 -04:00
Mustafa Ismail	cea05eadde	IB/core: Add flow control to the portmapper netlink calls During connection establishment with a large number of connections, it is possible that the connection requests might fail. Adding flow control prevents this failure. Change ibnl_unicast to use blocking to enable flow control. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:14:27 -04:00
Amitoj Kaur Chawla	3b8fb4b86c	RDMA/cxgb3: Use AF_INET for sin_family field Elsewhere the sin_family field holds a value with a name of the form AF_..., so it seems reasonable to do so here as well. Also the values of PF_INET and AF_INET are the same. The Coccinelle semantic patch that makes this change is as follows: // <smpl> @@ struct sockaddr_in sip; @@ ( sip.sin_family == - PF_INET + AF_INET \| sip.sin_family != - PF_INET + AF_INET \| sip.sin_family = - PF_INET + AF_INET ) // </smpl> Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:04:42 -04:00
Hariprasad S	56b2eca3fe	RDMA/iw_cxgb4: Use kfree_skb instead of kfree The commit `0f8ab0b6e9` ("RDMA/iw_cxgb4: Low resource fixes for Memory registration") from Jun 10, 2016, leads to the following static checker warning: drivers/infiniband/hw/cxgb4/mem.c:612 c4iw_alloc_mw() error: use kfree_skb() here instead of kfree(mhp->dereg_skb) Also fixes skb leak in c4iw_dealloc_mw Fixes: `0f8ab0b6e9` ("RDMA/iw_cxgb4: Low resource fixes for Memory registration") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:04:42 -04:00
Wei Yongjun	619615005e	IB/mlx5: Fix duplicate const warning Fixes the following sparse warning: drivers/infiniband/hw/mlx5/main.c:2574:25: warning: duplicate const Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 13:00:33 -04:00
Bart Van Assche	e6d66e3eb6	IB/isert: Remove an unused member variable Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:02:41 -04:00
Bart Van Assche	10fce586b2	IB/srpt: Simplify srpt_queue_response() Initialize first_wr to &send_wr. This allows to remove a ternary operator and an else branch. This patch does not change the behavior of srpt_queue_response(). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:02:41 -04:00
Bart Van Assche	30c6d8773d	IB/srpt: Limit the number of SG elements per work request Limit the number of SG elements per work request to what the HCA and the queue pair support. Fixes: 34693573fde0 ("IB/srpt: Reduce QP buffer size") Reported-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: <stable@vger.kernel.org> #v4.7+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:02:41 -04:00
Bart Van Assche	632bc3f650	IB/core, RDMA RW API: Do not exceed QP SGE send limit Compute the SGE limit for RDMA READ and WRITE requests in ib_create_qp(). Use that limit in the RDMA RW API implementation. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: <stable@vger.kernel.org> #v4.7+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:02:41 -04:00
Bart Van Assche	eaa74ec732	IB/core: Make rdma_rw_ctx_init() initialize all used fields Some but not all callers of rdma_rw_ctx_init() zero-initialize struct rdma_rw_ctx. Hence make rdma_rw_ctx_init() initialize all work request fields that will be read by ib_post_send(). Fixes: `a060b5629a` ("IB/core: generic RDMA READ/WRITE API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: <stable@vger.kernel.org> #v4.7+ Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:02:41 -04:00
Jakub Pawlak	2b71904674	IB/hfi1: Add counter to track unsupported packets drop Add sw counter to track dropped unsupported packets. Report unsupported packets drop as the RcvError. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Jakub Pawlak	583eb8b8a1	IB/hfi1: Add VL XmitDiscards counters to the opapmaquery Add per VL XmitDiscards counters to the opapmaquery status and error response. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Mike Marciniszyn	ad4210823b	IB/hfi1: Fix trace sparse errors Fix sparse errors by making sure the fast assign destinations are host cpu typed. For the void __iomem *, just make the field match source data. Fix a bug where the hw_free trace printed the pointer vs. the dereferenced value. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Sebastian Sanchez	462b6b2170	IB/hfi1: Separate tracepoints into specific headers The ftrace infrastructure used to evaluate the TRACE_SYSTEM macro on every DEFINE_EVENT() macro. Now the TRACE_SYSTEM macro only gets evaluated when trace/define_trace.h is included, so the group event information is lost. This was introduced in commit `acd388fd3a` ("tracing: Give system name a pointer") Therefore, each system tracepoint must be on its own file. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Tadeusz Struk	21a4c95d3f	IB/hfi1: Fix typo Fix a copy and paste typo in comment. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Ira Weiny	0904f32796	IB/hfi1: Remove unnecessary done label in hfi1_write_iter Simple code clean up of hfi1_write_iter. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
Ira Weiny	f8181697fd	IB/hfi1: Clean up port state structure definition The definition of port state changed mid development and the old structure was kept accidentally. Remove this dead code. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-08-02 12:00:54 -04:00
David S. Miller	de0ba9a0d8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Just several instances of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-24 00:53:32 -04:00
Brenden Blanco	224e92e02a	net/mlx4_en: break out tx_desc write into separate function In preparation for writing the tx descriptor from multiple functions, create a helper for both normal and blueflame access. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-19 21:46:33 -07:00
Shiraz Saleem	8e0e7aedad	i40iw: Enable remote access rights for stag allocation Fix to enable remote access rights when allocating stag. Fixes: `b7aee855d3` ("RDMA/i40iw: Add base memory management extensions") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-07-12 10:46:34 -04:00
Nicolas Iooss	b0548cff99	i40iw: do not print unitialized variables in error message i40iw_create_cqp() printed the contents of variables maj_err and min_err in an error message before they could be initialized (by calling dev->cqp_ops->cqp_create). Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-07-12 10:46:34 -04:00
Christoph Lameter	c5a81d11d7	IB core: Add port_xmit_wait counter Add the missing port_xmit_wait counter. This counter is displayed through some tools like perfquery but is not available via sysfs. For the PORT_PMA_ATTR macro the _counter field is set to zero allowing us to specify the offset directly like with PORT_PMA_ATTR_EXT See also the earlier work in 2008 by Vladimir Skolovsky https://www.mail-archive.com/general@lists.openfabrics.org/msg20313.html Signed-off-by: Vladimir Sokolvsky <vlad@mellanox.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-07-12 10:46:24 -04:00
Tadeusz Struk	98f179a5ea	IB/hfi1: Fix sleep inside atomic issue in init_asic_data The critical section should protect only the list traversal and dd->asic_data modification, not the memory allocation. The fix pulls the allocation out of the critical section. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-07-12 10:46:24 -04:00
Mike Marciniszyn	896ce45da2	IB/hfi1: Correct issues with sc5 computation There are several computatations of the sc in the ud receive routine. Besides the code duplication, all are wrong when the sc is greater than 15. In that case the code incorrectly or's a 1 into the computed sc instead of 1 shifted left by 4. Fix precomputed sc5 by using an already implemented routine hdr2sc() and deleting flawed duplicated code. Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-07-12 10:46:24 -04:00
Maor Gottlieb	c5bb17302e	net/mlx5: Refactor mlx5_add_flow_rule Reduce the set of arguments passed to mlx5_add_flow_rule by introducing flow_spec structure. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-05 00:06:02 -07:00
Doug Ledford	fb92d8fb1b	Merge branches 'cxgb4-4.8', 'mlx5-4.8' and 'fw-version' into k.o/for-4.8	2016-06-23 12:29:26 -04:00
Doug Ledford	9903fd1374	Merge branches '4.7-rc-misc', 'hfi1-fixes', 'i40iw-rc-fixes' and 'mellanox-rc-fixes' into k.o/for-4.7-rc	2016-06-23 12:22:33 -04:00
Ira Weiny	939b6ca873	IB/hfi1: Add device FW version string Export the firmware version through the core. Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	41a6ae1ebd	IB/core: Export a common fw_ver sysfs entry Now that all the devices have stopped exporting their own sysfs entry points we can have the core export this on their behalf. Eventually this may be removed but this provides for backwards compatibility. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	1a8632121a	IB/ipoib: Use new device FW version string Using this allows for devices to specify the format of their firmware version rather than forcing a format. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	15453e857a	IB/usnic: Support device FW version string And remove sysfs file in favor of the common core. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	bd395005d2	IB/ocrdma: Support device FW version string And remove sysfs in favor of the core support. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	96357454eb	IB/nes: Support device FW version string And remove the sysfs in favor of the core version. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	51ed03978e	IB/mthca: Supprot device FW version string And remove the sysfs entry in favor of the core support. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:34 -04:00
Ira Weiny	c73428230d	IB/mlx5: Support device FW version string And remove sysfs entry in favor of the common code. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Ira Weiny	e9db59fcd2	IB/mlx4: Support device FW version string And remove the sysfs in favor of common core version. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Ira Weiny	f65c52ca23	IB/i40iw: Support device FW version string And remove sysfs support in favor of the core version. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Ira Weiny	ce1922435d	IB/cxgb4: Support device FW version string And remove sysfs fw_ver in favor of the core. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Ira Weiny	e180369424	IB/cxgb3: Support device FW version string Also remove fw_ver sysfs to be replaced by the common core one. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Ira Weiny	5fa76c2045	IB/core: Add get FW version string to the core Allow for a common core function to get firmware version strings from the individual devices. In later patches this format can then then be used to pass a properly formated version string through the IPoIB layer. The problem with the current code in the IPoIB layer is that it is specific to certain hardware types. Furthermore, this gives us a common function through which the core can provide a common sysfs entry. Eventually we may want to remove the sysfs export but this provides for user space backwards compatibility. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:08:33 -04:00
Bart Van Assche	c0cf4512a3	IB/srpt: Reduce QP buffer size The memory needed for the send and receive queues associated with a QP is proportional to the max_sge parameter. The current value of that parameter is such that with an mlx4 HCA the QP buffer size is 8 MB. Since DMA is used for communication between HCA and CPU that buffer either has to be allocated coherently or map_single() must succeed for that buffer. Since large contiguous allocations are fragile and since the maximum segment size for e.g. swiotlb is 256 KB, reduce the max_sge parameter. This patch avoids that the following text appears on the console after SRP logout and relogin on a system equipped with multiple IB HCAs: mlx4_core 0000:05:00.0: swiotlb buffer is full (sz: 8388608 bytes) swiotlb: coherent allocation failed for device 0000:05:00.0 size=8388608 CPU: 11 PID: 148 Comm: kworker/11:1 Not tainted 4.7.0-rc4-dbg+ #1 Call Trace: [<ffffffff812c6d35>] dump_stack+0x67/0x92 [<ffffffff812efe71>] swiotlb_alloc_coherent+0x141/0x150 [<ffffffff810458be>] x86_swiotlb_alloc_coherent+0x3e/0x50 [<ffffffffa03861fa>] mlx4_buf_direct_alloc.isra.5+0x9a/0x120 [mlx4_core] [<ffffffffa0386545>] mlx4_buf_alloc+0x165/0x1a0 [mlx4_core] [<ffffffffa035053d>] create_qp_common.isra.29+0x57d/0xff0 [mlx4_ib] [<ffffffffa03510da>] mlx4_ib_create_qp+0x12a/0x3f0 [mlx4_ib] [<ffffffffa031154a>] ib_create_qp+0x3a/0x250 [ib_core] [<ffffffffa055dd4b>] srpt_cm_handler+0x4bb/0xcad [ib_srpt] [<ffffffffa02c1ab0>] cm_process_work+0x20/0xf0 [ib_cm] [<ffffffffa02c3640>] cm_work_handler+0x1ac0/0x2059 [ib_cm] [<ffffffff810737ed>] process_one_work+0x19d/0x490 [<ffffffff81073b29>] worker_thread+0x49/0x490 [<ffffffff8107a0ea>] kthread+0xea/0x100 [<ffffffff815b25af>] ret_from_fork+0x1f/0x40 Fixes: `b99f8e4d7b` ("IB/srpt: convert to the generic RDMA READ/WRITE API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 12:04:09 -04:00
Mark Bloch	0ad17a8f7f	IB/mlx5: Add port protocol stats Expose new counters using the get protocol stats callback. We expose the following counters: \|------------------------------------------------------------------------\| \| Name \| IB \| EN \| Description \| \|------------------------------------------------------------------------\| \|rx_write_requests \| + \| - \| Number of received WRITE requests for \| \| \| \| \| the associated QP. \| \|------------------------------------------------------------------------\| \|rx_read_requests \| + \| - \| Number of received READ requests for \| \| \| \| \| the associated QP. \| \|------------------------------------------------------------------------\| \|rx_atomic_requests \| + \| - \| Number of received ATOMIC requests for \| \| \| \| \| the associated QP. \| \|------------------------------------------------------------------------\| \|out_of_buffer \| + \| + \| Number of drops occurred due to lack \| \| \| \| \| of WQE for the associated QPs/RQs. \| \|------------------------------------------------------------------------\| \|out_of_sequence \| + \| - \| Number of errors in the packet \| \| \| \| \| transport sequence number \| \|------------------------------------------------------------------------\| \|duplicate_request \| + \| + \| Number of received duplicated packets. \| \| \| \| \| A request that previously executed is \| \| \| \| \| named duplicated. \| \|------------------------------------------------------------------------\| \|rnr_nak_retry_err \| + \| + \| Number of received RNR NAC packets. \| \| \| \| \| The QP retry limit did not exceed. \| \|------------------------------------------------------------------------\| \|packet_seq_err \| + \| + \| Number of received NAK - sequence error\| \| \| \| \| packets. The QP retry limit did not \| \| \| \| \| exceed. \| \|------------------------------------------------------------------------\| \|implied_nak_err \| + \| + \| Number of times the requester detected \| \| \| \| \| an ACK with a PSN larger than expected \| \| \| \| \| PSN for RDMA READ or ATOMIC response \| \| \| \| \| The QP retry limit did not exceed. \| \|------------------------------------------------------------------------\| \|local_ack_timeout_err\| + \| - \| Number of NO ACK responses from \| \| \| \| \| responder within timer interval. \| \| \| \| \| The QP retry limit did not exceed. \| \|------------------------------------------------------------------------\| Counters are available if all of them are supported. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:40:00 -04:00
Mark Bloch	0837e86a7a	IB/mlx5: Add per port counters In order to support statistics for ports, we attach each QP to a counter set which is dedicate to this port. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:39:14 -04:00
Artemy Kovalyov	af1ba291c5	{net, IB}/mlx5: Refactor internal SRQ API Currently, the SRQ API uses the obsolete mlx5_*_srq_mbox_{in,out} structs which limit the ability to pass the SRQ attributes between net and IB parts of the driver. This patch changes the SRQ API so as to use auto-generated structs and provides a better way to pass attributes which will be in use by coming features. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:20:07 -04:00
Bodong Wang	402ca53644	IB/mlx5: Report mlx5 TSO capabilities when querying device Enable mlx5 based hardware to report TCP segmentation offload (TSO) capabilities from kernel to user space. A TSO enabled NIC will accept big chunks of data with sizes greater than MTU for TCP traffic. The TSO engine will break the data into separate packets and will insert headers automatically. The capabilities are exposed to user space through query_device by uhw directly. The following capabilities are reported: 1. The maximum payload size in bytes supported for segmentation by TSO engine. 2. Bitmap showing which QP types are supported by TSO operation. The bitmap is built by members from 'enmu ib_qp_type'. For example, similar code should be performed if UD QP is supported: supported_qpts \|= 1 << IB_QPT_UD; To make user-space library aware of whether kernel supports uhw or not, a new flag: cmds_supp_uhw will be returned back to user-space through alloc_ucontext. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:09:18 -04:00
Maor Gottlieb	026bae0cb4	IB/mlx5: Enable flow steering for IPv6 traffic Enable flow steering for IPv6 traffic by using an IPv6 spec. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:45 -04:00
Maor Gottlieb	4c2aae712c	IB/core: Add IPv6 support to flow steering Add IPv6 flow specification support. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:45 -04:00
Maor Gottlieb	89ea94a7b6	IB/mlx5: Reset flow support for IB kernel ULPs The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:45 -04:00
Maor Gottlieb	7c2344c3bb	IB/mlx5: Implements disassociate_ucontext API Implements the IB core disassociate_ucontext API. The driver detaches the HW resources for a given user context to prevent a dependency between application termination and device disconnect. This is done by managing the VMAs that were mapped to the HW bars such as doorbell and blueflame. When need to detach, remap them to an arbitrary kernel page returned by the zap API. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:44 -04:00
Yishai Hadas	28d6137008	IB/mlx5: Add RSS QP support Add support for Raw Ethernet RX HASH QP. Currently, creation and destruction of such a QP are supported. This QP is implemented as a simple TIR object which points to the receive RQ indirection table. The given hashing configuration is used to configure the TIR and by that it chooses the right RQ from the RQ indirection table. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:44 -04:00
Yishai Hadas	c70285f880	IB/uverbs: Extend create QP to get RWQ indirection table User applications that want to spread incoming traffic between several WQs should create a QP which contains an indirection table. When such a QP is created other receive side parameters are not valid and should not be given. Its send side is optional and assumed active based on max_send_wr capability value. Extend create QP to work accordingly. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:44 -04:00
Yishai Hadas	a9017e232f	IB/core: Extend create QP to get indirection table Extend create QP to get Receive Work Queue (WQ) indirection table. QP can be created with external Receive Work Queue indirection table, in that case it is ready to receive immediately. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:44 -04:00
Yishai Hadas	c5f9092936	IB/mlx5: Add Receive Work Queue Indirection table operations Some mlx5 based hardwares support a RQ table object. This RQ table points to a few RQ objects. We implement the receive work queue indirection table API (create and destroy) by using this hardware object. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:43 -04:00
Yishai Hadas	de019a9404	IB/uverbs: Introduce RWQ Indirection table User applications that want to spread traffic on several WQs, need to create an indirection table, by using already created WQs. Adding uverbs API in order to create and destroy this table. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:43 -04:00
Yishai Hadas	6d39786bf1	IB/core: Introduce Receive Work Queue indirection table Introduce Receive Work Queue (WQ) indirection table. This object can be used to spread incoming traffic to different receive Work Queues. A Receive WQ indirection table points to variable size of WQs. This table is given to a QP in downstream patches. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimerg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:43 -04:00
Yishai Hadas	79b20a6c30	IB/mlx5: Add receive Work Queue verbs A QP can be created without internal WQs "packaged" inside it, this QP can be configured to use "external" WQ object as its receive/send queue. WQ is a necessary component for RSS technology since RSS mechanism is supposed to distribute the traffic between multiple Receive Work Queues Receive WQs are implemented by RQs. Implement the WQ creation, modification and destruction verbs. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:43 -04:00
Yishai Hadas	f213c05272	IB/uverbs: Add WQ support User space applications which use RSS functionality need to create a work queue object (WQ). The lifetime of such an object is: * Create a WQ * Modify the WQ from reset to init state. * Use the WQ (by downstream patches). * Destroy the WQ. These commands are added to the uverbs API. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:43 -04:00
Yishai Hadas	5fd251c8b4	IB/core: Introduce Work Queue object and its verbs Introduce Work Queue object and its create/destroy/modify verbs. QP can be created without internal WQs "packaged" inside it, this QP can be configured to use "external" WQ object as its receive/send queue. WQ is a necessary component for RSS technology since RSS mechanism is supposed to distribute the traffic between multiple Receive Work Queues. WQ associated (many to one) with Completion Queue and it owns WQ properties (PD, WQ size, etc.). WQ has a type, this patch introduces the IB_WQT_RQ (i.e.receive queue), it may be extend to others such as IB_WQT_SQ. (send queue). WQ from type IB_WQT_RQ contains receive work requests. PD is an attribute of a work queue (i.e. send/receive queue), it's used by the hardware for security validation before scattering to a memory region which is pointed by the WQ. For that, an external WQ object needs a PD, letting the hardware makes that validation. When accessing a memory region that is pointed by the WQ its PD is used and not the QP's PD, this behavior is similar to a SRQ and a QP. WQ context is subject to a well-defined state transitions done by the modify_wq verb. When WQ is created its initial state becomes IB_WQS_RESET. >From IB_WQS_RESET it can be modified to itself or to IB_WQS_RDY. >From IB_WQS_RDY it can be modified to itself, to IB_WQS_RESET or to IB_WQS_ERR. >From IB_WQS_ERR it can be modified to IB_WQS_RESET. Note: transition to IB_WQS_ERR might occur implicitly in case there was some HW error. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 11:02:43 -04:00
Hariprasad S	dd6b024126	RDMA/iw_cxgb4: Low resource fixes for Completion queue Pre-allocate buffers to deallocate completion queue, so that completion queue is deallocated during RDMA termination when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:18 -04:00
Hariprasad S	0f8ab0b6e9	RDMA/iw_cxgb4: Low resource fixes for Memory registration Pre-allocate buffers for deregistering memory region and memory window during RDMA connection close, when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:17 -04:00
Hariprasad S	4a740838bf	RDMA/iw_cxgb4: Low resource fixes for connection manager Pre-allocate buffers for sending various control messages to close connection, abort connection, etc so that we gracefully handle connections when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:17 -04:00
Hariprasad S	4c72efefd9	RDMA/iw_cxgb4: Add missing error codes for act open cmd Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:17 -04:00
Hariprasad S	bce2841f5a	RDMA/iw_cxgb4: clean up c4iw_reject_cr() Get rid of unneeded code, and refactor things a bit. For MPA version 0 we abort the connection. For > 0, we attempt to send an MPA_START/REJECT Reply, and then disconnect gracefully. If the send of the MPA message fails, then we abort the connection. We can ignore c4iw_ep_disconnect() errors here because it will clean up the endpoint if there are failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:16 -04:00
Hariprasad S	68cebcab59	RDMA/iw_cxgb4: allocate enough space for debugfs "qps" dump With IPv6 addresses, the "qps" debugfs is running out of space and truncating the output. Bump the required size accordingly. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:44:16 -04:00
Hariprasad S	3d4e79949c	RDMA/iw_cxgb4: only read markers_enabled mod param once markers_enabled should be read only once during MPA negotiation. The present code does read markers_enabled twice during negotiation which results in setting wrong recv/xmit markers if the markers_enabled is changed in the middle of negotiation. With this change the markers_enabled is read only once during MPA negotiation. recv markers are set based on markers enabled module parameter and xmit markers are set based on markers flag from the MPA_START_REQ/MPA_START_REP. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:43:19 -04:00
Shiraz Saleem	7748e4990d	i40iw: Enable level-1 PBL for fast memory registration Set the chunk_size to enable level-1 PBL support when the fast memory page count is more than one. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:35:34 -04:00
Faisal Latif	0477e18145	i40iw: Return correct max_fast_reg_page_list_len Return correct value for max_fast_reg_page_list_len from i40iw_query_device(). Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:35:34 -04:00
Faisal Latif	ee23abd75c	i40iw: Correct status check on i40iw_get_pble i40iw_get_pble returns 0 on success. Correct the check on return code. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:35:34 -04:00
Shiraz Saleem	747f1c6d9b	i40iw: Correct CQ arming CQ is armed for solicited events only, ignoring other notification flags. Correct this by arming for next and arming for solicited event if IB_CQ_SOLICITED is set. Also protect CQ shadow area update with spinlock. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:35:34 -04:00
Mike Marciniszyn	c755f4afa6	IB/rdmavt: Correct qp_priv_alloc() return value test The current drivers return errors from this calldown wrapped in an ERR_PTR(). The rdmavt code incorrectly tests for NULL. The code is fixed to use IS_ERR() and change ret according to the driver return value. Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:16:15 -04:00
Ashutosh Dixit	8ae84f7c56	IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qp Since rvt_reset_qp already zero's out qp->s_ack_queue head and tail pointers, there is no need to zero out qp->s_ack_queue itself. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:16:15 -04:00
Mike Marciniszyn	2aee309d3e	IB/hfi1: Fix deadlock with txreq allocation slow path A failure in the get_txreq() inline will result in a slow path retry using __get_txreq(). __get_txreq() attempts to procure the qp s_lock, which is already held in all callers. Fix by deleting the s_lock maintenance in __get_txreq() and add sparse syntax hooks to future proof the code. Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:16:15 -04:00
Chuck Lever	cbc9355a93	IB/mlx4: Prevent cross page boundary allocation Prevent cross page boundary allocation by allocating new page, this is required to be aligned with ConnectX-3 HW requirements. Not doing that might cause to "RDMA read local protection" error. Fixes: `1b2cd0fc67` ('IB/mlx4: Support the new memory registration API') Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:08:25 -04:00
Dotan Barak	5b420d9cf7	IB/mlx4: Fix memory leak if QP creation failed When RC, UC, or RAW QPs are created, a qp object is allocated (kzalloc). If at a later point (in procedure create_qp_common) the qp creation fails, this qp object must be freed. Fixes: `1ffeb2eb8b` ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:08:25 -04:00
Yishai Hadas	5533c18ab0	IB/mlx4: Verify port number in flow steering create flow In procedure mlx4_ib_create_flow, passing an invalid port number will cause an out-of-bounds array access. Data passed to this procedure can come from user-space. Therefore, need to validate port number before proceeding onwards. Note that we check against the number of physical ports declared at the verbs (ib core) level; When bonding is active, the verbs level sees one physical port, even though the low-level driver sees two ports. Fixes: `f77c0162a3` ("IB/mlx4: Add receive flow steering support") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:07:04 -04:00
Yishai Hadas	a6100603a4	IB/mlx4: Fix error flow when sending mads under SRIOV Fix mad send error flow to prevent double freeing address handles, and leaking tx_ring entries when SRIOV is active. If ib_mad_post_send fails, the address handle pointer in the tx_ring entry must be set to NULL (or there will be a double-free) and tx_tail must be incremented (or there will be a leak of tx_ring entries). The tx_ring is handled the same way in the send-completion handler. Fixes: `37bfc7c1e8` ("IB/mlx4: SR-IOV multiplex and demultiplex MADs") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:07:03 -04:00
Yishai Hadas	f2940e2c76	IB/mlx4: Fix the SQ size of an RC QP When calculating the required size of an RC QP send queue, leave enough space for masked atomic operations, which require more space than "regular" atomic operation. Fixes: `6fa8f71984` ("IB/mlx4: Add support for masked atomic operations") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:06:54 -04:00
Talat Batheesh	00bf534fce	IB/mlx5: Fix wrong naming of port_rcv_data counter port_xmit_data is written instead of port_rcv_data. Fixes: `3efd9a1121` ('IB/mlx5: Modify MAD reading counters method to use counter registers') Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:03:57 -04:00
Eli Cohen	c9b254955b	IB/mlx5: Fix post send fence logic If the caller specified IB_SEND_FENCE in the send flags of the work request and no previous work request stated that the successive one should be fenced, the work request would be executed without a fence. This could result in RDMA read or atomic operations failure due to a MR being invalidated. Fix this by adding the mlx5 enumeration for fencing RDMA/atomic operations and fix the logic to apply this. Fixes: `e126ba97db` ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:03:57 -04:00
Maor Gottlieb	b57141c1ab	IB/uverbs: Initialize ib_qp_init_attr with zeros Initialize ib_qp_init_attr with zeros in order to avoid from garbage in fields that won't be set with user values. Fixes: `a060b5629a` ('IB/core: generic RDMA READ/WRITE API') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:03:57 -04:00
Eli Cohen	b3556005c5	IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUID When virtualziation is supported, VFs may send SA MADs to a GID formed by the concatenation of the subnet prefix with the IB_SA_WELL_KNOWN_GUID. When a response is required, the current code will search the local HCA's port for the received GID to figure out the GID index of the entry containing this GID. However, since this is not a real GID it will not be found and error will be printed. We change the logic to check if the destination GID is this special GID and avoid lookup in this case and use GID index 0. Fixes: `a0c1b2a350` ('IB/core: Support accessing SA in virtualized environment') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:03:57 -04:00
Alex Vesker	c65f6c5a36	IB/core: Fix RoCE v1 multicast join logic issue During multicast join of RoCEv1, IGMP join state and max hop limit were updated incorrectly. IGMP join should be sent and marked as joined only on RoCEv2 after a successful join. Max hops should be updated to the hop limit on RoCEv2 regardless of the join state. Fixes: `bee3c3c918` ('IB/cma: Join and leave multicast groups...') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:03:57 -04:00
Talat Batheesh	f336ae0314	IB/core: Fix no default GIDs when netdevice reregisters Currently, when the netdevice returned by get_netdev is unregistered, we delete all GIDs (including the default GIDs) and reset their attributes. Therefore, when we re-register it, no default GIDs will be assigned (as their "default GID") attribute will be reset. Fixing this by keeping "default GID" attribute. Fixes: `03db3a2d81` ('IB/core: Add RoCE GID table management') Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-23 10:03:57 -04:00
Sebastian Sanchez	34d351f8dd	IB/hfi1: Send a pkey change event on driver pkey update Swapping a cable from a "Mgmt Allowed=No" switch port to a "Mgmt Allowed=Yes" switch port doesn't send a pkey change notification. Therefore, the link doesn't become active as the oib_utils layer uses an old pkey table cache. Fix by ensuring the pkey change notification is sent when the table is changed both explicitly by the FM and implicitly by the driver via a cable swap. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:27 -04:00
Sebastian Sanchez	3ec5fa28c9	IB/hfi1: Remove FULL_MGMT_P_KEY from pkey table at link up FULL_MGMT_P_KEY doesn't get cleared from the pkey table at link bounce because the link down and link bounce code paths are different when moving a QSFP cable on a switch. This causes an HFI unit connected to a switch to try to be initialized to an FM node when the QSFP cable is moved from a MgmtAllowed=NO port to a MgmtAllowed=YES port and back to a MgmtAllowed=NO port. Remove FULL_MGMT_P_KEY from pkey table at link up. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:27 -04:00
Tadeusz Struk	c078f0dd01	IB/hfi1: Fix potential buffer overflow This fixes potential buffer overflow because the sprintf function doesn't check buffer boundaries. Use snprintf instead. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:27 -04:00
Tadeusz Struk	93dd0a0978	IB/hfi1: Fix potential NULL ptr dereference This fixes potential NULL ptr dereference because IS_ERR(dd) doesn't handle NULL. Fix the issue by initializing the pointer with a not NULL error code. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:27 -04:00
Ira Weiny	5e9ef24619	IB/qib: Prevent context loss If a context has already been assigned to an FD, prevent another assignment. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Ira Weiny	ca2f30a0a6	IB/hfi1: Prevent context loss If a context has already been assigned to an FD, prevent another assignment. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Jubin John	c3c64a951c	IB/hfi1: Increase packet egress timeout The current value of 500us for the packet egress timeout is too small which causes the host to declare failure on draining packets too early and unnecessarily bounces the link. Increase this to 50ms taking into account the switch packet discard timer default and the worst case per-VL package drainage rate. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Brian Welty	501edc4244	IB/rdmavt: Correct warning during QPN allocation Correct calculation of the low order bits which should be unset based on use of qos_shift parameter when assigning QPN. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Brian Welty	96605672a4	IB/rdmavt: Correct required callback functions for MODIFY_QP Functions required for MODIFY_PORT were incorrectly being required for MODIFY_QP. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Jubin John	b4ba6633ea	IB/hfi1: Fix credit return threshold adjustment The credit return threshold adjustment on mtu change algorithm does not take into account all the kernel send contexts that are assigned per VL. Use the pio send context map to adjust the credit return thresholds for all the allocated and assigned kernel send contexts based on the MTU adjustment per VL. The pio send context map can be changed dynamically based on the actual number of operational vls which is set by the fabric manager. When this happens update the credit return threshold values for all the remapped kernel send contexts. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:11:26 -04:00
Bart Van Assche	37e07cdafc	IB/cma: Make the code easier to verify Static source code analysis tools like smatch cannot handle functions that lock or not lock a mutex depending on the value of the arguments. Hence inline the function cma_disable_callback(). Additionally, this patch realizes a small performance optimization by reducing the number of mutex_lock() and mutex_unlock() calls in the modified functions. With this patch applied smatch no longer complains about source file cma.c. Without this patch smatch reports the following for this source file: drivers/infiniband/core/cma.c:1959: cma_req_handler() warn: inconsistent returns 'mutex:&listen_id->handler_mutex'. Locked on: line 1880 line 1959 Unlocked on: line 1941 drivers/infiniband/core/cma.c:2112: iw_conn_req_handler() warn: inconsistent returns 'mutex:&listen_id->handler_mutex'. Locked on: line 2048 Unlocked on: line 2112 Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Leon Romanovsky <leonro@mellanox.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 20:03:42 -04:00
Jason Gunthorpe	8c5122e45a	IB/mlx4: Properly initialize GRH TClass and FlowLabel in AHs When this code was reworked for IBoE support the order of assignments for the sl_tclass_flowlabel got flipped around resulting in TClass & FlowLabel being permanently set to 0 in the packet headers. This breaks IB routers that rely on these headers, but only affects kernel users - libmlx4 does this properly for user space. Cc: stable@vger.kernel.org Fixes: `fa417f7b52` ("IB/mlx4: Add support for IBoE") Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2016-06-17 19:36:54 -04:00

... 6 7 8 9 10 ...

6278 Commits