/*
 * Copyright (c) 2004 Mellanox Technologies Ltd.  All rights reserved.
 * Copyright (c) 2004 Infinicon Corporation.  All rights reserved.
 * Copyright (c) 2004 Intel Corporation.  All rights reserved.
 * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
 * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
 * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
 * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
 *
 * This software is available to you under a choice of one of two
 * licenses.  You may choose to be licensed under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree, or the
 * OpenIB.org BSD license below:
 *
 *     Redistribution and use in source and binary forms, with or
 *     without modification, are permitted provided that the following
 *     conditions are met:
 *
 *      - Redistributions of source code must retain the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer.
 *
 *      - Redistributions in binary form must reproduce the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer in the documentation and/or other materials
 *        provided with the distribution.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

#include <linux/errno.h>
#include <linux/err.h>
#include <linux/export.h>
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/in.h>
#include <linux/in6.h>
#include <net/addrconf.h>
#include <linux/security.h>

#include <rdma/ib_verbs.h>
#include <rdma/ib_cache.h>
#include <rdma/ib_addr.h>
#include <rdma/rw.h>

#include "core_priv.h"

static int ib_resolve_eth_dmac(struct ib_device *device,
			       struct rdma_ah_attr *ah_attr);

static const char * const ib_events[] = {
	[IB_EVENT_CQ_ERR]		= "CQ error",
	[IB_EVENT_QP_FATAL]		= "QP fatal error",
	[IB_EVENT_QP_REQ_ERR]		= "QP request error",
	[IB_EVENT_QP_ACCESS_ERR]	= "QP access error",
	[IB_EVENT_COMM_EST]		= "communication established",
	[IB_EVENT_SQ_DRAINED]		= "send queue drained",
	[IB_EVENT_PATH_MIG]		= "path migration successful",
	[IB_EVENT_PATH_MIG_ERR]		= "path migration error",
	[IB_EVENT_DEVICE_FATAL]		= "device fatal error",
	[IB_EVENT_PORT_ACTIVE]		= "port active",
	[IB_EVENT_PORT_ERR]		= "port error",
	[IB_EVENT_LID_CHANGE]		= "LID change",
	[IB_EVENT_PKEY_CHANGE]		= "P_key change",
	[IB_EVENT_SM_CHANGE]		= "SM change",
	[IB_EVENT_SRQ_ERR]		= "SRQ error",
	[IB_EVENT_SRQ_LIMIT_REACHED]	= "SRQ limit reached",
	[IB_EVENT_QP_LAST_WQE_REACHED]	= "last WQE reached",
	[IB_EVENT_CLIENT_REREGISTER]	= "client reregister",
	[IB_EVENT_GID_CHANGE]		= "GID changed",
};

const char *__attribute_const__ ib_event_msg(enum ib_event_type event)
{
	size_t index = event;

	return (index < ARRAY_SIZE(ib_events) && ib_events[index]) ?
			ib_events[index] : "unrecognized event";
}
EXPORT_SYMBOL(ib_event_msg);

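/*
 * Illustration only, not part of the kernel source: a minimal user-space
 * sketch of the lookup pattern used by ib_event_msg(), assuming a
 * hypothetical enum demo_event and table demo_events[]. Designated
 * initializers leave unlisted slots NULL, so the lookup has to check both
 * the array bound and the slot before returning the string.
 */

```c
#include <stddef.h>

/* Hypothetical event codes; value 1 is deliberately left without a name. */
enum demo_event {
	DEMO_EVENT_A = 0,
	DEMO_EVENT_C = 2,
};

/* Sparse table: index 1 is implicitly NULL. */
static const char * const demo_events[] = {
	[DEMO_EVENT_A] = "event A",
	[DEMO_EVENT_C] = "event C",
};

/* Return a printable name, falling back for holes and out-of-range codes. */
const char *demo_event_msg(int event)
{
	/* A negative code becomes a huge size_t and fails the bound check. */
	size_t index = (size_t)event;
	size_t count = sizeof(demo_events) / sizeof(demo_events[0]);

	return (index < count && demo_events[index]) ?
			demo_events[index] : "unrecognized event";
}
```

/*
 * With this sketch, demo_event_msg(DEMO_EVENT_C) yields "event C", while
 * codes 1, 3 and -1 all yield the fallback string.
 */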
static const char * const wc_statuses[] = {
	[IB_WC_SUCCESS]			= "success",
	[IB_WC_LOC_LEN_ERR]		= "local length error",
	[IB_WC_LOC_QP_OP_ERR]		= "local QP operation error",
	[IB_WC_LOC_EEC_OP_ERR]		= "local EE context operation error",
	[IB_WC_LOC_PROT_ERR]		= "local protection error",
	[IB_WC_WR_FLUSH_ERR]		= "WR flushed",
	[IB_WC_MW_BIND_ERR]		= "memory management operation error",
	[IB_WC_BAD_RESP_ERR]		= "bad response error",
	[IB_WC_LOC_ACCESS_ERR]		= "local access error",
	[IB_WC_REM_INV_REQ_ERR]		= "invalid request error",
	[IB_WC_REM_ACCESS_ERR]		= "remote access error",
	[IB_WC_REM_OP_ERR]		= "remote operation error",
	[IB_WC_RETRY_EXC_ERR]		= "transport retry counter exceeded",
	[IB_WC_RNR_RETRY_EXC_ERR]	= "RNR retry counter exceeded",
	[IB_WC_LOC_RDD_VIOL_ERR]	= "local RDD violation error",
	[IB_WC_REM_INV_RD_REQ_ERR]	= "remote invalid RD request",
	[IB_WC_REM_ABORT_ERR]		= "operation aborted",
	[IB_WC_INV_EECN_ERR]		= "invalid EE context number",
	[IB_WC_INV_EEC_STATE_ERR]	= "invalid EE context state",
	[IB_WC_FATAL_ERR]		= "fatal error",
	[IB_WC_RESP_TIMEOUT_ERR]	= "response timeout error",
	[IB_WC_GENERAL_ERR]		= "general error",
};

const char *__attribute_const__ ib_wc_status_msg(enum ib_wc_status status)
{
	size_t index = status;

	return (index < ARRAY_SIZE(wc_statuses) && wc_statuses[index]) ?
			wc_statuses[index] : "unrecognized status";
}
EXPORT_SYMBOL(ib_wc_status_msg);

__attribute_const__ int ib_rate_to_mult(enum ib_rate rate)
{
	switch (rate) {
	case IB_RATE_2_5_GBPS: return   1;
	case IB_RATE_5_GBPS:   return   2;
	case IB_RATE_10_GBPS:  return   4;
	case IB_RATE_20_GBPS:  return   8;
	case IB_RATE_30_GBPS:  return  12;
	case IB_RATE_40_GBPS:  return  16;
	case IB_RATE_60_GBPS:  return  24;
	case IB_RATE_80_GBPS:  return  32;
	case IB_RATE_120_GBPS: return  48;
	case IB_RATE_14_GBPS:  return   6;
	case IB_RATE_56_GBPS:  return  22;
	case IB_RATE_112_GBPS: return  45;
	case IB_RATE_168_GBPS: return  67;
	case IB_RATE_25_GBPS:  return  10;
	case IB_RATE_100_GBPS: return  40;
	case IB_RATE_200_GBPS: return  80;
	case IB_RATE_300_GBPS: return 120;
	default:	       return  -1;
	}
}
EXPORT_SYMBOL(ib_rate_to_mult);

__attribute_const__ enum ib_rate mult_to_ib_rate(int mult)
{
	switch (mult) {
	case 1:   return IB_RATE_2_5_GBPS;
	case 2:   return IB_RATE_5_GBPS;
	case 4:   return IB_RATE_10_GBPS;
	case 8:   return IB_RATE_20_GBPS;
	case 12:  return IB_RATE_30_GBPS;
	case 16:  return IB_RATE_40_GBPS;
	case 24:  return IB_RATE_60_GBPS;
	case 32:  return IB_RATE_80_GBPS;
	case 48:  return IB_RATE_120_GBPS;
	case 6:   return IB_RATE_14_GBPS;
	case 22:  return IB_RATE_56_GBPS;
	case 45:  return IB_RATE_112_GBPS;
	case 67:  return IB_RATE_168_GBPS;
	case 10:  return IB_RATE_25_GBPS;
	case 40:  return IB_RATE_100_GBPS;
	case 80:  return IB_RATE_200_GBPS;
	case 120: return IB_RATE_300_GBPS;
	default:  return IB_RATE_PORT_CURRENT;
	}
}
EXPORT_SYMBOL(mult_to_ib_rate);

__attribute_const__ int ib_rate_to_mbps(enum ib_rate rate)
{
	switch (rate) {
	case IB_RATE_2_5_GBPS: return 2500;
	case IB_RATE_5_GBPS:   return 5000;
	case IB_RATE_10_GBPS:  return 10000;
	case IB_RATE_20_GBPS:  return 20000;
	case IB_RATE_30_GBPS:  return 30000;
	case IB_RATE_40_GBPS:  return 40000;
	case IB_RATE_60_GBPS:  return 60000;
	case IB_RATE_80_GBPS:  return 80000;
	case IB_RATE_120_GBPS: return 120000;
	case IB_RATE_14_GBPS:  return 14062;
	case IB_RATE_56_GBPS:  return 56250;
	case IB_RATE_112_GBPS: return 112500;
	case IB_RATE_168_GBPS: return 168750;
	case IB_RATE_25_GBPS:  return 25781;
	case IB_RATE_100_GBPS: return 103125;
	case IB_RATE_200_GBPS: return 206250;
	case IB_RATE_300_GBPS: return 309375;
	default:	       return -1;
	}
}
EXPORT_SYMBOL(ib_rate_to_mbps);

__attribute_const__ enum rdma_transport_type
rdma_node_get_transport(enum rdma_node_type node_type)
{
	if (node_type == RDMA_NODE_USNIC)
		return RDMA_TRANSPORT_USNIC;
	if (node_type == RDMA_NODE_USNIC_UDP)
		return RDMA_TRANSPORT_USNIC_UDP;
	if (node_type == RDMA_NODE_RNIC)
		return RDMA_TRANSPORT_IWARP;

	return RDMA_TRANSPORT_IB;
}
EXPORT_SYMBOL(rdma_node_get_transport);

enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_num)
{
	enum rdma_transport_type lt;

	if (device->get_link_layer)
		return device->get_link_layer(device, port_num);

	lt = rdma_node_get_transport(device->node_type);
	if (lt == RDMA_TRANSPORT_IB)
		return IB_LINK_LAYER_INFINIBAND;

	return IB_LINK_LAYER_ETHERNET;
}
EXPORT_SYMBOL(rdma_port_get_link_layer);

/* Protection domains */

/**
 * __ib_alloc_pd - Allocates an unused protection domain.
 * @device: The device on which to allocate the protection domain.
 * @flags: Protection domain flags, e.g. IB_PD_UNSAFE_GLOBAL_RKEY.
 * @caller: Name of the caller, printed when the unsafe global rkey is enabled.
 *
 * A protection domain object provides an association between QPs, shared
 * receive queues, address handles, memory regions, and memory windows.
 *
 * Every PD has a local_dma_lkey which can be used as the lkey value for local
 * memory operations.
 */
struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
		const char *caller)
{
	struct ib_pd *pd;
	int mr_access_flags = 0;

	pd = device->alloc_pd(device, NULL, NULL);
	if (IS_ERR(pd))
		return pd;

	pd->device = device;
	pd->uobject = NULL;
	pd->__internal_mr = NULL;
	atomic_set(&pd->usecnt, 0);
	pd->flags = flags;

	if (device->attrs.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
		pd->local_dma_lkey = device->local_dma_lkey;
	else
		mr_access_flags |= IB_ACCESS_LOCAL_WRITE;

	if (flags & IB_PD_UNSAFE_GLOBAL_RKEY) {
		pr_warn("%s: enabling unsafe global rkey\n", caller);
		mr_access_flags |= IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_WRITE;
	}

	if (mr_access_flags) {
		struct ib_mr *mr;

		mr = pd->device->get_dma_mr(pd, mr_access_flags);
		if (IS_ERR(mr)) {
			ib_dealloc_pd(pd);
			return ERR_CAST(mr);
		}

		mr->device	= pd->device;
		mr->pd		= pd;
		mr->uobject	= NULL;
		mr->need_inval	= false;

		pd->__internal_mr = mr;

		if (!(device->attrs.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY))
			pd->local_dma_lkey = pd->__internal_mr->lkey;

		if (flags & IB_PD_UNSAFE_GLOBAL_RKEY)
			pd->unsafe_global_rkey = pd->__internal_mr->rkey;
	}

	return pd;
}
EXPORT_SYMBOL(__ib_alloc_pd);

/**
 * ib_dealloc_pd - Deallocates a protection domain.
 * @pd: The protection domain to deallocate.
 *
 * It is an error to call this function while any resources in the pd still
 * exist.  The caller is responsible for synchronously destroying them and
 * guaranteeing that no new allocations will happen.
 */
void ib_dealloc_pd(struct ib_pd *pd)
{
	int ret;

	if (pd->__internal_mr) {
		ret = pd->device->dereg_mr(pd->__internal_mr);
		WARN_ON(ret);
		pd->__internal_mr = NULL;
	}

	/* uverbs manipulates usecnt with proper locking, while the kabi
	   requires the caller to guarantee we can't race here. */
	WARN_ON(atomic_read(&pd->usecnt));

	/* Making dealloc_pd a void return is a WIP, no driver should return
	   an error here. */
	ret = pd->device->dealloc_pd(pd);
	WARN_ONCE(ret, "Infiniband HW driver failed dealloc_pd");
}
EXPORT_SYMBOL(ib_dealloc_pd);

/* Address handles */

static struct ib_ah *_rdma_create_ah(struct ib_pd *pd,
				     struct rdma_ah_attr *ah_attr,
				     struct ib_udata *udata)
{
	struct ib_ah *ah;

	ah = pd->device->create_ah(pd, ah_attr, udata);

	if (!IS_ERR(ah)) {
		ah->device  = pd->device;
		ah->pd      = pd;
		ah->uobject = NULL;
		ah->type    = ah_attr->type;
		atomic_inc(&pd->usecnt);
	}

	return ah;
}

struct ib_ah *rdma_create_ah(struct ib_pd *pd, struct rdma_ah_attr *ah_attr)
{
	return _rdma_create_ah(pd, ah_attr, NULL);
}
EXPORT_SYMBOL(rdma_create_ah);

/**
 * rdma_create_user_ah - Creates an address handle for the
 * given address vector.
 * It resolves the destination mac address for an ah attribute of RoCE type.
 * @pd: The protection domain associated with the address handle.
 * @ah_attr: The attributes of the address vector.
 * @udata: pointer to the user's input/output buffer information needed by
 *         the provider driver.
 *
 * It returns the new address handle on success and an ERR_PTR()-encoded
 * error code on failure.
 * The address handle is used to reference a local or global destination
 * in all UD QP post sends.
 */
struct ib_ah *rdma_create_user_ah(struct ib_pd *pd,
				  struct rdma_ah_attr *ah_attr,
				  struct ib_udata *udata)
{
	int err;

	if (ah_attr->type == RDMA_AH_ATTR_TYPE_ROCE) {
		err = ib_resolve_eth_dmac(pd->device, ah_attr);
		if (err)
			return ERR_PTR(err);
	}

	return _rdma_create_ah(pd, ah_attr, udata);
}
EXPORT_SYMBOL(rdma_create_user_ah);

int ib_get_rdma_header_version(const union rdma_network_hdr *hdr)
{
	const struct iphdr *ip4h = (struct iphdr *)&hdr->roce4grh;
	struct iphdr ip4h_checked;
	const struct ipv6hdr *ip6h = (struct ipv6hdr *)&hdr->ibgrh;

	/* If it's IPv6, the version must be 6, otherwise, the first
	 * 20 bytes (before the IPv4 header) are garbled.
	 */
	if (ip6h->version != 6)
		return (ip4h->version == 4) ? 4 : 0;
	/* version may be 6 or 4 because the first 20 bytes could be garbled */

	/* RoCE v2 requires no options, thus header length
	 * must be 5 words
	 */
	if (ip4h->ihl != 5)
		return 6;

	/* Verify checksum.
	 * We can't write on scattered buffers so we need to copy to
	 * temp buffer.
	 */
	memcpy(&ip4h_checked, ip4h, sizeof(ip4h_checked));
	ip4h_checked.check = 0;
	ip4h_checked.check = ip_fast_csum((u8 *)&ip4h_checked, 5);
	/* if IPv4 header checksum is OK, believe it */
	if (ip4h->check == ip4h_checked.check)
		return 4;
	return 6;
}
EXPORT_SYMBOL(ib_get_rdma_header_version);

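/*
 * Illustration only, not part of the kernel source: a user-space sketch of
 * the "copy, zero the checksum field, recompute, compare" step used above.
 * It assumes a raw 20-byte IPv4 header without options and substitutes a
 * plain RFC 1071 one's-complement sum for the kernel's ip_fast_csum().
 * The names demo_ip_checksum() and demo_ipv4_header_ok() are hypothetical.
 */

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One's-complement checksum over `words` 32-bit words, read big-endian. */
static uint16_t demo_ip_checksum(const uint8_t *hdr, size_t words)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < words * 4; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Return 1 if the stored checksum (bytes 10 and 11) matches, else 0. */
int demo_ipv4_header_ok(const uint8_t hdr[20])
{
	uint8_t copy[20];
	uint16_t stored = (uint16_t)(((uint16_t)hdr[10] << 8) | hdr[11]);

	/* Work on a copy so the caller's buffer is never modified. */
	memcpy(copy, hdr, sizeof(copy));
	copy[10] = 0;
	copy[11] = 0;

	return demo_ip_checksum(copy, 5) == stored;
}
```

/*
 * As in the function above, a matching checksum is only evidence that the
 * bytes form a real IPv4 header; a mismatch means the header is treated as
 * something else.
 */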
static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
						     u8 port_num,
						     const struct ib_grh *grh)
{
	int grh_version;

	if (rdma_protocol_ib(device, port_num))
		return RDMA_NETWORK_IB;

	grh_version = ib_get_rdma_header_version((union rdma_network_hdr *)grh);

	if (grh_version == 4)
		return RDMA_NETWORK_IPV4;

	if (grh->next_hdr == IPPROTO_UDP)
		return RDMA_NETWORK_IPV6;

	return RDMA_NETWORK_ROCE_V1;
}

struct find_gid_index_context {
	u16 vlan_id;
	enum ib_gid_type gid_type;
};

static bool find_gid_index(const union ib_gid *gid,
			   const struct ib_gid_attr *gid_attr,
			   void *context)
{
	struct find_gid_index_context *ctx = context;

	if (ctx->gid_type != gid_attr->gid_type)
		return false;

	if ((!!(ctx->vlan_id != 0xffff) == !is_vlan_dev(gid_attr->ndev)) ||
	    (is_vlan_dev(gid_attr->ndev) &&
	     vlan_dev_vlan_id(gid_attr->ndev) != ctx->vlan_id))
		return false;

	return true;
}

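/*
 * Illustration only, not part of the kernel source: the VLAN condition in
 * find_gid_index() is dense, so here is a user-space sketch of the same
 * predicate, spelled out. It assumes 0xffff means "no VLAN requested", as in
 * the code above. demo_vlan_matches() and DEMO_NO_VLAN are hypothetical
 * names, and the two device arguments stand in for is_vlan_dev() and
 * vlan_dev_vlan_id().
 */

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NO_VLAN 0xffff

/*
 * A GID entry matches when either no VLAN was requested and the device is
 * not a VLAN device, or a VLAN was requested and the device carries exactly
 * that VLAN ID.
 */
bool demo_vlan_matches(uint16_t wanted, bool dev_is_vlan, uint16_t dev_vlan_id)
{
	bool want_vlan = wanted != DEMO_NO_VLAN;

	if (want_vlan != dev_is_vlan)
		return false;
	if (dev_is_vlan && dev_vlan_id != wanted)
		return false;

	return true;
}
```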
static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
				   u16 vlan_id, const union ib_gid *sgid,
				   enum ib_gid_type gid_type,
				   u16 *gid_index)
{
	struct find_gid_index_context context = {.vlan_id = vlan_id,
						 .gid_type = gid_type};

	return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index,
				     &context, gid_index);
}

int ib_get_gids_from_rdma_hdr(const union rdma_network_hdr *hdr,
			      enum rdma_network_type net_type,
			      union ib_gid *sgid, union ib_gid *dgid)
{
	struct sockaddr_in src_in;
	struct sockaddr_in dst_in;
	__be32 src_saddr, dst_saddr;

	if (!sgid || !dgid)
		return -EINVAL;

	if (net_type == RDMA_NETWORK_IPV4) {
		memcpy(&src_in.sin_addr.s_addr,
		       &hdr->roce4grh.saddr, 4);
		memcpy(&dst_in.sin_addr.s_addr,
		       &hdr->roce4grh.daddr, 4);
		src_saddr = src_in.sin_addr.s_addr;
		dst_saddr = dst_in.sin_addr.s_addr;
		ipv6_addr_set_v4mapped(src_saddr,
				       (struct in6_addr *)sgid);
		ipv6_addr_set_v4mapped(dst_saddr,
				       (struct in6_addr *)dgid);
		return 0;
	} else if (net_type == RDMA_NETWORK_IPV6 ||
		   net_type == RDMA_NETWORK_IB) {
		*dgid = hdr->ibgrh.dgid;
		*sgid = hdr->ibgrh.sgid;
		return 0;
	} else {
		return -EINVAL;
	}
}
EXPORT_SYMBOL(ib_get_gids_from_rdma_hdr);

/* Resolve destination mac address and hop limit for unicast destination
 * GID entry, considering the source GID entry as well.
 * ah_attr must have valid port_num, sgid_index.
 */
static int ib_resolve_unicast_gid_dmac(struct ib_device *device,
				       struct rdma_ah_attr *ah_attr)
{
	struct ib_gid_attr sgid_attr;
	struct ib_global_route *grh;
	int hop_limit = 0xff;
	union ib_gid sgid;
	int ret;

	grh = rdma_ah_retrieve_grh(ah_attr);

	ret = ib_query_gid(device,
			   rdma_ah_get_port_num(ah_attr),
			   grh->sgid_index,
			   &sgid, &sgid_attr);
	if (ret || !sgid_attr.ndev) {
		if (!ret)
			ret = -ENXIO;
		return ret;
	}

	/* If destination is link local and source GID is RoCEv1,
	 * IP stack is not used.
	 */
	if (rdma_link_local_addr((struct in6_addr *)grh->dgid.raw) &&
	    sgid_attr.gid_type == IB_GID_TYPE_ROCE) {
		rdma_get_ll_mac((struct in6_addr *)grh->dgid.raw,
				ah_attr->roce.dmac);
		goto done;
	}

	ret = rdma_addr_find_l2_eth_by_grh(&sgid, &grh->dgid,
					   ah_attr->roce.dmac,
					   sgid_attr.ndev, &hop_limit);
done:
	dev_put(sgid_attr.ndev);

	grh->hop_limit = hop_limit;
	return ret;
}

/*
 * This function initializes address handle attributes from the incoming packet.
 * The incoming packet has the dgid of the receiver node on which this code is
 * executing, and the sgid contains the GID of the sender.
 *
 * When resolving the mac address of the destination, the arrived dgid is used
 * as sgid and the arrived sgid is used as dgid, because the sgid contains the
 * GID of the destination to respond to.
 */
int ib_init_ah_attr_from_wc(struct ib_device *device, u8 port_num,
			    const struct ib_wc *wc, const struct ib_grh *grh,
			    struct rdma_ah_attr *ah_attr)
{
	u32 flow_class;
	u16 gid_index;
	int ret;
	enum rdma_network_type net_type = RDMA_NETWORK_IB;
	enum ib_gid_type gid_type = IB_GID_TYPE_IB;
	int hoplimit = 0xff;
	union ib_gid dgid;
	union ib_gid sgid;

	might_sleep();

	memset(ah_attr, 0, sizeof *ah_attr);
	ah_attr->type = rdma_ah_find_type(device, port_num);
	if (rdma_cap_eth_ah(device, port_num)) {
		if (wc->wc_flags & IB_WC_WITH_NETWORK_HDR_TYPE)
			net_type = wc->network_hdr_type;
		else
			net_type = ib_get_net_type_by_grh(device, port_num, grh);
		gid_type = ib_network_to_gid_type(net_type);
	}
	ret = ib_get_gids_from_rdma_hdr((union rdma_network_hdr *)grh, net_type,
					&sgid, &dgid);
	if (ret)
		return ret;

	rdma_ah_set_sl(ah_attr, wc->sl);
	rdma_ah_set_port_num(ah_attr, port_num);

	if (rdma_protocol_roce(device, port_num)) {
		u16 vlan_id = wc->wc_flags & IB_WC_WITH_VLAN ?
				wc->vlan_id : 0xffff;

IB/core: Ethernet L2 attributes in verbs/cm structures
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.
When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.
Thus, those attributes were added to the following structures:
* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id
For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.
On the active side, the CM fills. its internal structures from the
path provided by the ULP. We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).
On the passive side, the CM fills its internal structures from the WC
associated with the REQ message. We add there taking the ETH L2
attributes from the WC.
When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.
ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB. Vendor drivers are modified to support the new
function signature.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-13 00:03:11 +08:00
|
|
|
if (!(wc->wc_flags & IB_WC_GRH))
|
|
|
|
return -EPROTOTYPE;
|
|
|
|
|
2017-11-14 20:51:49 +08:00
|
|
|
ret = get_sgid_index_from_eth(device, port_num,
|
|
|
|
vlan_id, &dgid,
|
|
|
|
gid_type, &gid_index);
|
2015-10-15 23:38:51 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
IB/core: Ethernet L2 attributes in verbs/cm structures
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.
When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.
Thus, those attributes were added to the following structures:
* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id
For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.
On the active side, the CM fills. its internal structures from the
path provided by the ULP. We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).
On the passive side, the CM fills its internal structures from the WC
associated with the REQ message. We add there taking the ETH L2
attributes from the WC.
When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.
ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB. Vendor drivers are modified to support the new
function signature.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-13 00:03:11 +08:00
|
|
|
|
2017-11-14 20:51:49 +08:00
|
|
|
flow_class = be32_to_cpu(grh->version_tclass_flow);
|
|
|
|
rdma_ah_set_grh(ah_attr, &sgid,
|
|
|
|
flow_class & 0xFFFFF,
|
|
|
|
(u8)gid_index, hoplimit,
|
|
|
|
(flow_class >> 20) & 0xFF);
|
|
|
|
return ib_resolve_unicast_gid_dmac(device, ah_attr);
|
|
|
|
} else {
|
|
|
|
rdma_ah_set_dlid(ah_attr, wc->slid);
|
|
|
|
rdma_ah_set_path_bits(ah_attr, wc->dlid_path_bits);
|
2005-07-28 02:45:34 +08:00
|
|
|
|
2017-11-14 20:51:49 +08:00
|
|
|
if (wc->wc_flags & IB_WC_GRH) {
|
2016-06-22 22:27:24 +08:00
|
|
|
if (dgid.global.interface_id != cpu_to_be64(IB_SA_WELL_KNOWN_GUID)) {
|
|
|
|
ret = ib_find_cached_gid_by_port(device, &dgid,
|
|
|
|
IB_GID_TYPE_IB,
|
|
|
|
port_num, NULL,
|
|
|
|
&gid_index);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
} else {
|
|
|
|
gid_index = 0;
|
|
|
|
}
|
2017-04-30 02:41:28 +08:00
|
|
|
|
2017-11-14 20:51:49 +08:00
|
|
|
flow_class = be32_to_cpu(grh->version_tclass_flow);
|
|
|
|
rdma_ah_set_grh(ah_attr, &sgid,
|
|
|
|
flow_class & 0xFFFFF,
|
|
|
|
(u8)gid_index, hoplimit,
|
|
|
|
(flow_class >> 20) & 0xFF);
|
|
|
|
}
|
|
|
|
return 0;
|
2005-07-28 02:45:34 +08:00
|
|
|
}
|
2006-06-18 11:37:39 +08:00
|
|
|
}
|
2017-11-14 20:52:17 +08:00
|
|
|
EXPORT_SYMBOL(ib_init_ah_attr_from_wc);
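/*
 * Illustrative sketch only, not part of this file: ib_init_ah_attr_from_wc()
 * converts grh->version_tclass_flow to host order, then takes the low 20 bits
 * as the flow label and bits 20..27 as the traffic class. The userspace
 * fragment below repeats that bit arithmetic. The names grh_word, load_be32()
 * and decode_grh_word() are hypothetical, and reading the top 4 bits as a
 * version field is an assumption based on the field name.
 */

```c
#include <stdint.h>

/* Hypothetical stand-in for the decoded first 32-bit word of a GRH. */
struct grh_word {
	uint8_t  version;       /* bits 31..28 (assumed from the field name) */
	uint8_t  traffic_class; /* bits 27..20 */
	uint32_t flow_label;    /* bits 19..0  */
};

/* Assemble four big-endian wire bytes into a host-order 32-bit value. */
static uint32_t load_be32(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Split the word with the same masks and shifts used in the function above. */
static struct grh_word decode_grh_word(const uint8_t wire[4])
{
	uint32_t flow_class = load_be32(wire);
	struct grh_word w;

	w.version = (flow_class >> 28) & 0xF;
	w.traffic_class = (flow_class >> 20) & 0xFF;
	w.flow_label = flow_class & 0xFFFFF;
	return w;
}
```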

struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, const struct ib_wc *wc,
				   const struct ib_grh *grh, u8 port_num)
{
	struct rdma_ah_attr ah_attr;
	int ret;

	ret = ib_init_ah_attr_from_wc(pd->device, port_num, wc, grh, &ah_attr);
	if (ret)
		return ERR_PTR(ret);

	return rdma_create_ah(pd, &ah_attr);
}
EXPORT_SYMBOL(ib_create_ah_from_wc);

int rdma_modify_ah(struct ib_ah *ah, struct rdma_ah_attr *ah_attr)
{
	if (ah->type != ah_attr->type)
		return -EINVAL;

	return ah->device->modify_ah ?
		ah->device->modify_ah(ah, ah_attr) :
		-ENOSYS;
}
EXPORT_SYMBOL(rdma_modify_ah);

int rdma_query_ah(struct ib_ah *ah, struct rdma_ah_attr *ah_attr)
{
	return ah->device->query_ah ?
		ah->device->query_ah(ah, ah_attr) :
		-ENOSYS;
}
EXPORT_SYMBOL(rdma_query_ah);
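/*
 * Illustrative sketch only, not part of this file: rdma_modify_ah() and
 * rdma_query_ah() treat the driver callback as optional and return -ENOSYS
 * when it is absent. The fragment below shows the same dispatch pattern in
 * plain C. sketch_ops, sketch_query() and sketch_double() are hypothetical
 * names used only for this example.
 */

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical operations table; the callback may be left NULL. */
struct sketch_ops {
	int (*query)(int handle, int *out);
};

/* Call the callback if the provider supplied one, else report -ENOSYS. */
static int sketch_query(const struct sketch_ops *ops, int handle, int *out)
{
	return ops->query ? ops->query(handle, out) : -ENOSYS;
}

/* Example provider callback: writes twice the handle value. */
static int sketch_double(int handle, int *out)
{
	*out = handle * 2;
	return 0;
}
```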

int rdma_destroy_ah(struct ib_ah *ah)
{
	struct ib_pd *pd;
	int ret;

	pd = ah->pd;
	ret = ah->device->destroy_ah(ah);
	if (!ret)
		atomic_dec(&pd->usecnt);

	return ret;
}
EXPORT_SYMBOL(rdma_destroy_ah);

/* Shared receive queues */

struct ib_srq *ib_create_srq(struct ib_pd *pd,
			     struct ib_srq_init_attr *srq_init_attr)
{
	struct ib_srq *srq;

	if (!pd->device->create_srq)
		return ERR_PTR(-ENOSYS);

	srq = pd->device->create_srq(pd, srq_init_attr, NULL);

	if (!IS_ERR(srq)) {
		srq->device = pd->device;
		srq->pd = pd;
		srq->uobject = NULL;
		srq->event_handler = srq_init_attr->event_handler;
		srq->srq_context = srq_init_attr->srq_context;
		srq->srq_type = srq_init_attr->srq_type;
		if (ib_srq_has_cq(srq->srq_type)) {
			srq->ext.cq = srq_init_attr->ext.cq;
			atomic_inc(&srq->ext.cq->usecnt);
		}
		if (srq->srq_type == IB_SRQT_XRC) {
			srq->ext.xrc.xrcd = srq_init_attr->ext.xrc.xrcd;
			atomic_inc(&srq->ext.xrc.xrcd->usecnt);
		}
		atomic_inc(&pd->usecnt);
		atomic_set(&srq->usecnt, 0);
	}

	return srq;
}
EXPORT_SYMBOL(ib_create_srq);

int ib_modify_srq(struct ib_srq *srq,
		  struct ib_srq_attr *srq_attr,
		  enum ib_srq_attr_mask srq_attr_mask)
{
	return srq->device->modify_srq ?
		srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL) :
		-ENOSYS;
}
EXPORT_SYMBOL(ib_modify_srq);

int ib_query_srq(struct ib_srq *srq,
		 struct ib_srq_attr *srq_attr)
{
	return srq->device->query_srq ?
		srq->device->query_srq(srq, srq_attr) : -ENOSYS;
}
EXPORT_SYMBOL(ib_query_srq);

int ib_destroy_srq(struct ib_srq *srq)
{
	struct ib_pd *pd;
	enum ib_srq_type srq_type;
	struct ib_xrcd *uninitialized_var(xrcd);
	struct ib_cq *uninitialized_var(cq);
	int ret;

	if (atomic_read(&srq->usecnt))
		return -EBUSY;

	pd = srq->pd;
	srq_type = srq->srq_type;
	if (ib_srq_has_cq(srq_type))
		cq = srq->ext.cq;
	if (srq_type == IB_SRQT_XRC)
		xrcd = srq->ext.xrc.xrcd;

	ret = srq->device->destroy_srq(srq);
	if (!ret) {
		atomic_dec(&pd->usecnt);
		if (srq_type == IB_SRQT_XRC)
			atomic_dec(&xrcd->usecnt);
		if (ib_srq_has_cq(srq_type))
			atomic_dec(&cq->usecnt);
	}

	return ret;
}
EXPORT_SYMBOL(ib_destroy_srq);
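/*
 * Illustrative sketch only, not part of this file: ib_destroy_srq() refuses
 * to destroy an object that still has users (-EBUSY), saves the pointers it
 * needs before the driver tears the object down, and drops the parent's use
 * count only if the teardown succeeded. The userspace fragment below shows
 * that ordering with C11 atomics. sketch_parent, sketch_child and both
 * functions are hypothetical names, and clearing the parent pointer merely
 * stands in for the driver freeing the object.
 */

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_parent {
	atomic_int usecnt;	/* number of children that reference us */
};

struct sketch_child {
	struct sketch_parent *parent;
	atomic_int usecnt;	/* number of users of this child */
};

/* Creating a child takes one reference on its parent. */
static void sketch_child_init(struct sketch_child *c, struct sketch_parent *p)
{
	c->parent = p;
	atomic_init(&c->usecnt, 0);
	atomic_fetch_add(&p->usecnt, 1);
}

/* Destroying a child fails with -EBUSY while it still has users. */
static int sketch_child_destroy(struct sketch_child *c)
{
	struct sketch_parent *p;

	if (atomic_load(&c->usecnt))
		return -EBUSY;

	p = c->parent;		/* save before the object is torn down */
	c->parent = NULL;	/* stand-in for the driver freeing it */
	atomic_fetch_sub(&p->usecnt, 1);
	return 0;
}
```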

/* Queue pairs */

static void __ib_shared_qp_event_handler(struct ib_event *event, void *context)
{
	struct ib_qp *qp = context;
	unsigned long flags;

	spin_lock_irqsave(&qp->device->event_handler_lock, flags);
	list_for_each_entry(event->element.qp, &qp->open_list, open_list)
		if (event->element.qp->event_handler)
			event->element.qp->event_handler(event, event->element.qp->qp_context);
	spin_unlock_irqrestore(&qp->device->event_handler_lock, flags);
}

static void __ib_insert_xrcd_qp(struct ib_xrcd *xrcd, struct ib_qp *qp)
{
	mutex_lock(&xrcd->tgt_qp_mutex);
	list_add(&qp->xrcd_list, &xrcd->tgt_qp_list);
	mutex_unlock(&xrcd->tgt_qp_mutex);
}

static struct ib_qp *__ib_open_qp(struct ib_qp *real_qp,
				  void (*event_handler)(struct ib_event *, void *),
				  void *qp_context)
{
	struct ib_qp *qp;
	unsigned long flags;
	int err;

	qp = kzalloc(sizeof *qp, GFP_KERNEL);
	if (!qp)
		return ERR_PTR(-ENOMEM);

	qp->real_qp = real_qp;
	err = ib_open_shared_qp_security(qp, real_qp->device);
	if (err) {
		kfree(qp);
		return ERR_PTR(err);
	}

	qp->real_qp = real_qp;
	atomic_inc(&real_qp->usecnt);
	qp->device = real_qp->device;
	qp->event_handler = event_handler;
	qp->qp_context = qp_context;
	qp->qp_num = real_qp->qp_num;
	qp->qp_type = real_qp->qp_type;

	spin_lock_irqsave(&real_qp->device->event_handler_lock, flags);
	list_add(&qp->open_list, &real_qp->open_list);
	spin_unlock_irqrestore(&real_qp->device->event_handler_lock, flags);

	return qp;
}

struct ib_qp *ib_open_qp(struct ib_xrcd *xrcd,
			 struct ib_qp_open_attr *qp_open_attr)
{
	struct ib_qp *qp, *real_qp;

	if (qp_open_attr->qp_type != IB_QPT_XRC_TGT)
		return ERR_PTR(-EINVAL);

	qp = ERR_PTR(-EINVAL);
	mutex_lock(&xrcd->tgt_qp_mutex);
	list_for_each_entry(real_qp, &xrcd->tgt_qp_list, xrcd_list) {
		if (real_qp->qp_num == qp_open_attr->qp_num) {
			qp = __ib_open_qp(real_qp, qp_open_attr->event_handler,
					  qp_open_attr->qp_context);
			break;
		}
	}
	mutex_unlock(&xrcd->tgt_qp_mutex);
	return qp;
}
EXPORT_SYMBOL(ib_open_qp);

static struct ib_qp *ib_create_xrc_qp(struct ib_qp *qp,
				      struct ib_qp_init_attr *qp_init_attr)
{
	struct ib_qp *real_qp = qp;

	qp->event_handler = __ib_shared_qp_event_handler;
	qp->qp_context = qp;
	qp->pd = NULL;
	qp->send_cq = qp->recv_cq = NULL;
	qp->srq = NULL;
	qp->xrcd = qp_init_attr->xrcd;
	atomic_inc(&qp_init_attr->xrcd->usecnt);
	INIT_LIST_HEAD(&qp->open_list);

	qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
			  qp_init_attr->qp_context);
	if (!IS_ERR(qp))
		__ib_insert_xrcd_qp(qp_init_attr->xrcd, real_qp);
	else
		real_qp->device->destroy_qp(real_qp);
	return qp;
}

struct ib_qp *ib_create_qp(struct ib_pd *pd,
			   struct ib_qp_init_attr *qp_init_attr)
{
	struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
	struct ib_qp *qp;
	int ret;

	if (qp_init_attr->rwq_ind_tbl &&
	    (qp_init_attr->recv_cq ||
	    qp_init_attr->srq || qp_init_attr->cap.max_recv_wr ||
	    qp_init_attr->cap.max_recv_sge))
		return ERR_PTR(-EINVAL);

	/*
	 * If the caller is using the RDMA API, calculate the resources
	 * needed for the RDMA READ/WRITE operations.
	 *
	 * Note that these callers need to pass in a port number.
	 */
	if (qp_init_attr->cap.max_rdma_ctxs)
		rdma_rw_init_qp(device, qp_init_attr);

	qp = device->create_qp(pd, qp_init_attr, NULL);
	if (IS_ERR(qp))
		return qp;

	ret = ib_create_qp_security(qp, device);
	if (ret) {
		ib_destroy_qp(qp);
		return ERR_PTR(ret);
	}

	qp->device = device;
	qp->real_qp = qp;
	qp->uobject = NULL;
	qp->qp_type = qp_init_attr->qp_type;
	qp->rwq_ind_tbl = qp_init_attr->rwq_ind_tbl;

	atomic_set(&qp->usecnt, 0);
	qp->mrs_used = 0;
	spin_lock_init(&qp->mr_lock);
	INIT_LIST_HEAD(&qp->rdma_mrs);
	INIT_LIST_HEAD(&qp->sig_mrs);
	qp->port = 0;

	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
		return ib_create_xrc_qp(qp, qp_init_attr);

	qp->event_handler = qp_init_attr->event_handler;
	qp->qp_context = qp_init_attr->qp_context;
	if (qp_init_attr->qp_type == IB_QPT_XRC_INI) {
		qp->recv_cq = NULL;
		qp->srq = NULL;
	} else {
		qp->recv_cq = qp_init_attr->recv_cq;
		if (qp_init_attr->recv_cq)
			atomic_inc(&qp_init_attr->recv_cq->usecnt);
		qp->srq = qp_init_attr->srq;
		if (qp->srq)
			atomic_inc(&qp_init_attr->srq->usecnt);
	}

	qp->pd = pd;
	qp->send_cq = qp_init_attr->send_cq;
	qp->xrcd = NULL;

	atomic_inc(&pd->usecnt);
	if (qp_init_attr->send_cq)
		atomic_inc(&qp_init_attr->send_cq->usecnt);
	if (qp_init_attr->rwq_ind_tbl)
		atomic_inc(&qp->rwq_ind_tbl->usecnt);

	if (qp_init_attr->cap.max_rdma_ctxs) {
		ret = rdma_rw_init_mrs(qp, qp_init_attr);
		if (ret) {
			pr_err("failed to init MR pool ret= %d\n", ret);
			ib_destroy_qp(qp);
			return ERR_PTR(ret);
		}
	}

	/*
	 * Note: all hw drivers guarantee that max_send_sge is lower than
	 * the device RDMA WRITE SGE limit but not all hw drivers ensure that
	 * max_send_sge <= max_sge_rd.
	 */
	qp->max_write_sge = qp_init_attr->cap.max_send_sge;
	qp->max_read_sge = min_t(u32, qp_init_attr->cap.max_send_sge,
				 device->attrs.max_sge_rd);

	return qp;
}
EXPORT_SYMBOL(ib_create_qp);
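/*
 * Illustrative sketch only, not part of this file: qp_state_table, defined
 * next, is indexed by [current state][next state] and lists, per QP type,
 * the attribute-mask bits that a transition requires and those it merely
 * allows. The reduced userspace fragment below shows one plausible way such
 * a table can be consulted. Every sk_ / SK_ name is hypothetical, and the
 * check in sk_modify_is_ok() is an assumption about how a consumer would
 * use the table, since that consumer is not shown in this part of the file.
 */

```c
#include <stdint.h>

/* Hypothetical, reduced state and type sets; these are not the kernel enums. */
enum sk_state { SK_RESET, SK_INIT, SK_NSTATES };
enum sk_type { SK_UD, SK_RC, SK_NTYPES };

/* Hypothetical attribute-mask bits. */
enum {
	SK_STATE  = 1 << 0,
	SK_PKEY   = 1 << 1,
	SK_PORT   = 1 << 2,
	SK_QKEY   = 1 << 3,
	SK_ACCESS = 1 << 4,
};

struct sk_transition {
	int      valid;
	uint32_t req_param[SK_NTYPES];
	uint32_t opt_param[SK_NTYPES];
};

/* Entries that are not listed stay zeroed, which means "invalid". */
static const struct sk_transition sk_table[SK_NSTATES][SK_NSTATES] = {
	[SK_RESET] = {
		[SK_RESET] = { .valid = 1 },
		[SK_INIT]  = {
			.valid = 1,
			.req_param = {
				[SK_UD] = SK_PKEY | SK_PORT | SK_QKEY,
				[SK_RC] = SK_PKEY | SK_PORT | SK_ACCESS,
			},
		},
	},
	[SK_INIT] = {
		[SK_RESET] = { .valid = 1 },
		[SK_INIT]  = {
			.valid = 1,
			.opt_param = {
				[SK_UD] = SK_PKEY | SK_PORT | SK_QKEY,
				[SK_RC] = SK_PKEY | SK_PORT | SK_ACCESS,
			},
		},
	},
};

/*
 * Return 1 if the transition exists, every required bit is present in mask,
 * and mask carries no bit beyond required, optional and the state bit.
 */
static int sk_modify_is_ok(enum sk_state cur, enum sk_state next,
			   enum sk_type type, uint32_t mask)
{
	const struct sk_transition *t = &sk_table[cur][next];
	uint32_t req = t->req_param[type];
	uint32_t opt = t->opt_param[type];

	if (!t->valid)
		return 0;
	if ((mask & req) != req)
		return 0;
	if (mask & ~(req | opt | SK_STATE))
		return 0;
	return 1;
}
```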
|
|
|
|
|
2006-02-14 04:48:12 +08:00
|
|
|
static const struct {
|
|
|
|
int valid;
|
2011-05-24 10:59:25 +08:00
|
|
|
enum ib_qp_attr_mask req_param[IB_QPT_MAX];
|
|
|
|
enum ib_qp_attr_mask opt_param[IB_QPT_MAX];
|
2006-02-14 04:48:12 +08:00
|
|
|
} qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = {
|
|
|
|
[IB_QPS_RESET] = {
|
|
|
|
[IB_QPS_RESET] = { .valid = 1 },
|
|
|
|
[IB_QPS_INIT] = {
|
|
|
|
.valid = 1,
|
|
|
|
.req_param = {
|
|
|
|
[IB_QPT_UD] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_QKEY),
|
2012-03-01 18:17:51 +08:00
|
|
|
[IB_QPT_RAW_PACKET] = IB_QP_PORT,
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_UC] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_ACCESS_FLAGS),
|
|
|
|
[IB_QPT_RC] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_ACCESS_FLAGS),
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_ACCESS_FLAGS),
|
|
|
|
[IB_QPT_XRC_TGT] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_ACCESS_FLAGS),
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_SMI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_GSI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
}
|
|
|
|
},
|
|
|
|
},
|
|
|
|
[IB_QPS_INIT] = {
|
|
|
|
[IB_QPS_RESET] = { .valid = 1 },
|
|
|
|
[IB_QPS_ERR] = { .valid = 1 },
|
|
|
|
[IB_QPS_INIT] = {
|
|
|
|
.valid = 1,
|
|
|
|
.opt_param = {
|
|
|
|
[IB_QPT_UD] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_UC] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_ACCESS_FLAGS),
|
|
|
|
[IB_QPT_RC] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_ACCESS_FLAGS),
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_ACCESS_FLAGS),
|
|
|
|
[IB_QPT_XRC_TGT] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PORT |
|
|
|
|
IB_QP_ACCESS_FLAGS),
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_SMI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_GSI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
}
|
|
|
|
},
|
|
|
|
[IB_QPS_RTR] = {
|
|
|
|
.valid = 1,
|
|
|
|
.req_param = {
|
|
|
|
[IB_QPT_UC] = (IB_QP_AV |
|
|
|
|
IB_QP_PATH_MTU |
|
|
|
|
IB_QP_DEST_QPN |
|
|
|
|
IB_QP_RQ_PSN),
|
|
|
|
[IB_QPT_RC] = (IB_QP_AV |
|
|
|
|
IB_QP_PATH_MTU |
|
|
|
|
IB_QP_DEST_QPN |
|
|
|
|
IB_QP_RQ_PSN |
|
|
|
|
IB_QP_MAX_DEST_RD_ATOMIC |
|
|
|
|
IB_QP_MIN_RNR_TIMER),
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = (IB_QP_AV |
|
|
|
|
IB_QP_PATH_MTU |
|
|
|
|
IB_QP_DEST_QPN |
|
|
|
|
IB_QP_RQ_PSN),
|
|
|
|
[IB_QPT_XRC_TGT] = (IB_QP_AV |
|
|
|
|
IB_QP_PATH_MTU |
|
|
|
|
IB_QP_DEST_QPN |
|
|
|
|
IB_QP_RQ_PSN |
|
|
|
|
IB_QP_MAX_DEST_RD_ATOMIC |
|
|
|
|
IB_QP_MIN_RNR_TIMER),
|
2006-02-14 04:48:12 +08:00
|
|
|
},
|
|
|
|
.opt_param = {
|
|
|
|
[IB_QPT_UD] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_UC] = (IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PKEY_INDEX),
|
|
|
|
[IB_QPT_RC] = (IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PKEY_INDEX),
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = (IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PKEY_INDEX),
|
|
|
|
[IB_QPT_XRC_TGT] = (IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PKEY_INDEX),
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_SMI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_GSI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
IB/core: Ethernet L2 attributes in verbs/cm structures
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.
When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.
Thus, those attributes were added to the following structures:
* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id
For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.
On the active side, the CM fills its internal structures from the
path provided by the ULP. There we add code that takes the ETH L2 attributes
and places them into the CM Address Handle (struct cm_av).
On the passive side, the CM fills its internal structures from the WC
associated with the REQ message. There we add code that takes the ETH L2
attributes from the WC.
When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.
ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB. Vendor drivers are modified to support the new
function signature.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-13 00:03:11 +08:00
|
|
|
},
|
2015-10-15 23:38:51 +08:00
|
|
|
},
|
2006-02-14 04:48:12 +08:00
|
|
|
},
|
|
|
|
[IB_QPS_RTR] = {
|
|
|
|
[IB_QPS_RESET] = { .valid = 1 },
|
|
|
|
[IB_QPS_ERR] = { .valid = 1 },
|
|
|
|
[IB_QPS_RTS] = {
|
|
|
|
.valid = 1,
|
|
|
|
.req_param = {
|
|
|
|
[IB_QPT_UD] = IB_QP_SQ_PSN,
|
|
|
|
[IB_QPT_UC] = IB_QP_SQ_PSN,
|
|
|
|
[IB_QPT_RC] = (IB_QP_TIMEOUT |
|
|
|
|
IB_QP_RETRY_CNT |
|
|
|
|
IB_QP_RNR_RETRY |
|
|
|
|
IB_QP_SQ_PSN |
|
|
|
|
IB_QP_MAX_QP_RD_ATOMIC),
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = (IB_QP_TIMEOUT |
|
|
|
|
IB_QP_RETRY_CNT |
|
|
|
|
IB_QP_RNR_RETRY |
|
|
|
|
IB_QP_SQ_PSN |
|
|
|
|
IB_QP_MAX_QP_RD_ATOMIC),
|
|
|
|
[IB_QPT_XRC_TGT] = (IB_QP_TIMEOUT |
|
|
|
|
IB_QP_SQ_PSN),
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_SMI] = IB_QP_SQ_PSN,
|
|
|
|
[IB_QPT_GSI] = IB_QP_SQ_PSN,
|
|
|
|
},
|
|
|
|
.opt_param = {
|
|
|
|
[IB_QPT_UD] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_UC] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
|
|
|
[IB_QPT_RC] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_MIN_RNR_TIMER |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
|
|
|
[IB_QPT_XRC_TGT] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_MIN_RNR_TIMER |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_SMI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_GSI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
2016-12-01 19:43:14 +08:00
|
|
|
[IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
|
2006-02-14 04:48:12 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
},
|
|
|
|
[IB_QPS_RTS] = {
|
|
|
|
[IB_QPS_RESET] = { .valid = 1 },
|
|
|
|
[IB_QPS_ERR] = { .valid = 1 },
|
|
|
|
[IB_QPS_RTS] = {
|
|
|
|
.valid = 1,
|
|
|
|
.opt_param = {
|
|
|
|
[IB_QPT_UD] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
2006-03-03 03:22:28 +08:00
|
|
|
[IB_QPT_UC] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
2006-02-14 04:48:12 +08:00
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
2006-03-03 03:22:28 +08:00
|
|
|
[IB_QPT_RC] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
2006-02-14 04:48:12 +08:00
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_PATH_MIG_STATE |
|
|
|
|
IB_QP_MIN_RNR_TIMER),
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
|
|
|
[IB_QPT_XRC_TGT] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_PATH_MIG_STATE |
|
|
|
|
IB_QP_MIN_RNR_TIMER),
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_SMI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_GSI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
2016-12-01 19:43:14 +08:00
|
|
|
[IB_QPT_RAW_PACKET] = IB_QP_RATE_LIMIT,
|
2006-02-14 04:48:12 +08:00
|
|
|
}
|
|
|
|
},
|
|
|
|
[IB_QPS_SQD] = {
|
|
|
|
.valid = 1,
|
|
|
|
.opt_param = {
|
|
|
|
[IB_QPT_UD] = IB_QP_EN_SQD_ASYNC_NOTIFY,
|
|
|
|
[IB_QPT_UC] = IB_QP_EN_SQD_ASYNC_NOTIFY,
|
|
|
|
[IB_QPT_RC] = IB_QP_EN_SQD_ASYNC_NOTIFY,
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = IB_QP_EN_SQD_ASYNC_NOTIFY,
|
|
|
|
[IB_QPT_XRC_TGT] = IB_QP_EN_SQD_ASYNC_NOTIFY, /* ??? */
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_SMI] = IB_QP_EN_SQD_ASYNC_NOTIFY,
|
|
|
|
[IB_QPT_GSI] = IB_QP_EN_SQD_ASYNC_NOTIFY
|
|
|
|
}
|
|
|
|
},
|
|
|
|
},
|
|
|
|
[IB_QPS_SQD] = {
|
|
|
|
[IB_QPS_RESET] = { .valid = 1 },
|
|
|
|
[IB_QPS_ERR] = { .valid = 1 },
|
|
|
|
[IB_QPS_RTS] = {
|
|
|
|
.valid = 1,
|
|
|
|
.opt_param = {
|
|
|
|
[IB_QPT_UD] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_UC] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
|
|
|
[IB_QPT_RC] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_MIN_RNR_TIMER |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
|
|
|
[IB_QPT_XRC_TGT] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_MIN_RNR_TIMER |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_SMI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_GSI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
}
|
|
|
|
},
|
|
|
|
[IB_QPS_SQD] = {
|
|
|
|
.valid = 1,
|
|
|
|
.opt_param = {
|
|
|
|
[IB_QPT_UD] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_UC] = (IB_QP_AV |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
|
|
|
[IB_QPT_RC] = (IB_QP_PORT |
|
|
|
|
IB_QP_AV |
|
|
|
|
IB_QP_TIMEOUT |
|
|
|
|
IB_QP_RETRY_CNT |
|
|
|
|
IB_QP_RNR_RETRY |
|
|
|
|
IB_QP_MAX_QP_RD_ATOMIC |
|
|
|
|
IB_QP_MAX_DEST_RD_ATOMIC |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_MIN_RNR_TIMER |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
2011-05-24 10:59:25 +08:00
|
|
|
[IB_QPT_XRC_INI] = (IB_QP_PORT |
|
|
|
|
IB_QP_AV |
|
|
|
|
IB_QP_TIMEOUT |
|
|
|
|
IB_QP_RETRY_CNT |
|
|
|
|
IB_QP_RNR_RETRY |
|
|
|
|
IB_QP_MAX_QP_RD_ATOMIC |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
|
|
|
[IB_QPT_XRC_TGT] = (IB_QP_PORT |
|
|
|
|
IB_QP_AV |
|
|
|
|
IB_QP_TIMEOUT |
|
|
|
|
IB_QP_MAX_DEST_RD_ATOMIC |
|
|
|
|
IB_QP_ALT_PATH |
|
|
|
|
IB_QP_ACCESS_FLAGS |
|
|
|
|
IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_MIN_RNR_TIMER |
|
|
|
|
IB_QP_PATH_MIG_STATE),
|
2006-02-14 04:48:12 +08:00
|
|
|
[IB_QPT_SMI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_GSI] = (IB_QP_PKEY_INDEX |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
},
|
|
|
|
[IB_QPS_SQE] = {
|
|
|
|
[IB_QPS_RESET] = { .valid = 1 },
|
|
|
|
[IB_QPS_ERR] = { .valid = 1 },
|
|
|
|
[IB_QPS_RTS] = {
|
|
|
|
.valid = 1,
|
|
|
|
.opt_param = {
|
|
|
|
[IB_QPT_UD] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_UC] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_ACCESS_FLAGS),
|
|
|
|
[IB_QPT_SMI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
[IB_QPT_GSI] = (IB_QP_CUR_STATE |
|
|
|
|
IB_QP_QKEY),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
},
|
|
|
|
[IB_QPS_ERR] = {
|
|
|
|
[IB_QPS_RESET] = { .valid = 1 },
|
|
|
|
[IB_QPS_ERR] = { .valid = 1 }
|
|
|
|
}
|
|
|
|
};
|
|
|
|
|
|
|
|
int ib_modify_qp_is_ok(enum ib_qp_state cur_state, enum ib_qp_state next_state,
|
2013-12-13 00:03:11 +08:00
|
|
|
enum ib_qp_type type, enum ib_qp_attr_mask mask,
|
|
|
|
enum rdma_link_layer ll)
|
2006-02-14 04:48:12 +08:00
|
|
|
{
|
|
|
|
enum ib_qp_attr_mask req_param, opt_param;
|
|
|
|
|
|
|
|
if (cur_state < 0 || cur_state > IB_QPS_ERR ||
|
|
|
|
next_state < 0 || next_state > IB_QPS_ERR)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (mask & IB_QP_CUR_STATE &&
|
|
|
|
cur_state != IB_QPS_RTR && cur_state != IB_QPS_RTS &&
|
|
|
|
cur_state != IB_QPS_SQD && cur_state != IB_QPS_SQE)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!qp_state_table[cur_state][next_state].valid)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
req_param = qp_state_table[cur_state][next_state].req_param[type];
|
|
|
|
opt_param = qp_state_table[cur_state][next_state].opt_param[type];
|
|
|
|
|
|
|
|
if ((mask & req_param) != req_param)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (mask & ~(req_param | opt_param | IB_QP_STATE))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_modify_qp_is_ok);
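The check in ib_modify_qp_is_ok() comes down to two bit-mask tests on the table cell for the (cur_state, next_state) pair: every required bit must be present, and no bit outside required | optional | state may be set. The following is a minimal userspace sketch of that idea only. The `ATTR_*` values, `struct transition` and `transition_mask_ok()` are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative attribute bits; the real IB_QP_* values live in ib_verbs.h. */
enum {
	ATTR_STATE    = 1u << 0,
	ATTR_AV       = 1u << 1,
	ATTR_PATH_MTU = 1u << 2,
	ATTR_ALT_PATH = 1u << 3,
	ATTR_QKEY     = 1u << 4,
};

/* One cell of a hypothetical transition table. */
struct transition {
	int      valid;      /* is this cur -> next transition allowed at all? */
	uint32_t req_param;  /* bits that must be set in the caller's mask */
	uint32_t opt_param;  /* bits that may additionally be set */
};

/* Returns 1 if mask is acceptable for the transition, 0 otherwise. */
int transition_mask_ok(const struct transition *t, uint32_t mask)
{
	assert(t);

	if (!t->valid)
		return 0;

	/* Every required bit must be present. */
	if ((mask & t->req_param) != t->req_param)
		return 0;

	/* Nothing beyond required, optional and the state bit is allowed. */
	if (mask & ~(t->req_param | t->opt_param | ATTR_STATE))
		return 0;

	return 1;
}
```

With a cell that requires `ATTR_AV | ATTR_PATH_MTU` and permits `ATTR_ALT_PATH`, a mask that omits `ATTR_PATH_MTU` or adds `ATTR_QKEY` is rejected, while the required set with or without `ATTR_ALT_PATH` passes.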
|
|
|
|
|
2017-10-16 13:45:13 +08:00
|
|
|
static int ib_resolve_eth_dmac(struct ib_device *device,
|
|
|
|
struct rdma_ah_attr *ah_attr)
|
2013-12-13 00:03:17 +08:00
|
|
|
{
|
|
|
|
int ret = 0;
|
2017-04-30 02:41:28 +08:00
|
|
|
struct ib_global_route *grh;
|
2013-12-13 00:03:17 +08:00
|
|
|
|
2017-04-30 02:41:28 +08:00
|
|
|
if (!rdma_is_port_valid(device, rdma_ah_get_port_num(ah_attr)))
|
2016-11-23 14:23:22 +08:00
|
|
|
return -EINVAL;
|
2015-10-15 23:38:51 +08:00
|
|
|
|
2017-04-30 02:41:29 +08:00
|
|
|
if (ah_attr->type != RDMA_AH_ATTR_TYPE_ROCE)
|
2016-11-23 14:23:22 +08:00
|
|
|
return 0;
|
2015-10-15 23:38:51 +08:00
|
|
|
|
2017-04-30 02:41:28 +08:00
|
|
|
grh = rdma_ah_retrieve_grh(ah_attr);
|
|
|
|
|
2017-06-12 16:14:04 +08:00
|
|
|
if (rdma_is_multicast_addr((struct in6_addr *)ah_attr->grh.dgid.raw)) {
|
|
|
|
if (ipv6_addr_v4mapped((struct in6_addr *)ah_attr->grh.dgid.raw)) {
|
|
|
|
__be32 addr = 0;
|
|
|
|
|
|
|
|
memcpy(&addr, ah_attr->grh.dgid.raw + 12, 4);
|
|
|
|
ip_eth_mc_map(addr, (char *)ah_attr->roce.dmac);
|
|
|
|
} else {
|
|
|
|
ipv6_eth_mc_map((struct in6_addr *)ah_attr->grh.dgid.raw,
|
|
|
|
(char *)ah_attr->roce.dmac);
|
|
|
|
}
|
2016-11-23 14:23:22 +08:00
|
|
|
} else {
|
2017-11-14 20:51:49 +08:00
|
|
|
ret = ib_resolve_unicast_gid_dmac(device, ah_attr);
|
2013-12-13 00:03:17 +08:00
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
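For a multicast destination GID, ib_resolve_eth_dmac() derives the Ethernet MAC directly from the address instead of resolving it. The sketch below shows the standard IP-multicast-to-Ethernet mappings that this relies on (01:00:5e plus the low 23 bits for IPv4, 33:33 plus the last four bytes for IPv6). It is a userspace illustration, not the kernel helpers themselves; `gid_is_v4mapped()` and `mcast_gid_to_mac()` are hypothetical names, and the caller is assumed to have already established that the GID is multicast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* True if the 16-byte GID has the IPv4-mapped form ::ffff:a.b.c.d. */
int gid_is_v4mapped(const uint8_t gid[16])
{
	static const uint8_t prefix[12] = {
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
	};

	return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/*
 * Derive the Ethernet multicast MAC for a GID that is already known to be
 * a multicast address.
 */
void mcast_gid_to_mac(const uint8_t gid[16], uint8_t mac[6])
{
	assert(gid && mac);

	if (gid_is_v4mapped(gid)) {
		/* IPv4: 01:00:5e followed by the low 23 bits of the address. */
		mac[0] = 0x01;
		mac[1] = 0x00;
		mac[2] = 0x5e;
		mac[3] = gid[13] & 0x7f;
		mac[4] = gid[14];
		mac[5] = gid[15];
	} else {
		/* IPv6: 33:33 followed by the last four bytes of the address. */
		mac[0] = 0x33;
		mac[1] = 0x33;
		memcpy(&mac[2], &gid[12], 4);
	}
}
```

Because only 23 bits of an IPv4 group address survive the mapping, distinct groups can share one MAC; receivers still filter on the full address.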
|
|
|
|
|
2017-05-23 16:26:08 +08:00
|
|
|
/**
|
|
|
|
* ib_modify_qp_with_udata - Modifies the attributes for the specified QP.
|
|
|
|
* @qp: The QP to modify.
|
|
|
|
* @attr: On input, specifies the QP attributes to modify. On output,
|
|
|
|
* the current values of selected QP attributes are returned.
|
|
|
|
* @attr_mask: A bit-mask used to specify which attributes of the QP
|
|
|
|
* are being modified.
|
|
|
|
* @udata: pointer to user's input/output buffer information
|
|
|
|
* Returns 0 on success or an appropriate error code on failure.
|
|
|
|
*/
|
|
|
|
int ib_modify_qp_with_udata(struct ib_qp *qp, struct ib_qp_attr *attr,
|
|
|
|
int attr_mask, struct ib_udata *udata)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2017-11-14 20:51:56 +08:00
|
|
|
u8 port = attr_mask & IB_QP_PORT ? attr->port_num : qp->port;
|
2017-05-23 16:26:08 +08:00
|
|
|
int ret;
|
2013-12-13 00:03:17 +08:00
|
|
|
|
2017-05-23 16:26:08 +08:00
|
|
|
if (attr_mask & IB_QP_AV) {
|
|
|
|
ret = ib_resolve_eth_dmac(qp->device, &attr->ah_attr);
|
2016-11-23 14:23:22 +08:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2017-11-14 20:51:56 +08:00
|
|
|
|
|
|
|
if (rdma_ib_or_roce(qp->device, port)) {
|
|
|
|
if (attr_mask & IB_QP_RQ_PSN && attr->rq_psn & ~0xffffff) {
|
|
|
|
pr_warn("%s: %s rq_psn overflow, masking to 24 bits\n",
|
|
|
|
__func__, qp->device->name);
|
|
|
|
attr->rq_psn &= 0xffffff;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (attr_mask & IB_QP_SQ_PSN && attr->sq_psn & ~0xffffff) {
|
|
|
|
pr_warn("%s: %s sq_psn overflow, masking to 24 bits\n",
|
|
|
|
__func__, qp->device->name);
|
|
|
|
attr->sq_psn &= 0xffffff;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-08-23 13:35:40 +08:00
|
|
|
ret = ib_security_modify_qp(qp, attr, attr_mask, udata);
|
|
|
|
if (!ret && (attr_mask & IB_QP_PORT))
|
|
|
|
qp->port = attr->port_num;
|
|
|
|
|
|
|
|
return ret;
|
2017-05-23 16:26:08 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_modify_qp_with_udata);
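ib_modify_qp_with_udata() truncates rq_psn and sq_psn to 24 bits and warns when any higher bit was set. A small userspace sketch of that clamp follows. `PSN_MASK` and `psn_clamp()` are illustrative names, and the message goes to stderr rather than through pr_warn().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PSN_MASK 0xffffffu  /* packet sequence numbers are 24 bits wide */

/* Clamp *psn to 24 bits; returns 1 if the value had to be truncated. */
int psn_clamp(uint32_t *psn)
{
	assert(psn);

	if (*psn & ~PSN_MASK) {
		fprintf(stderr, "psn overflow, masking to 24 bits\n");
		*psn &= PSN_MASK;
		return 1;
	}

	return 0;
}
```

A caller that wants to avoid the warning can apply the same mask before filling in the attribute, for example when the initial PSN comes from a 32-bit random number.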
|
2013-12-13 00:03:17 +08:00
|
|
|
|
2017-06-15 04:13:34 +08:00
|
|
|
int ib_get_eth_speed(struct ib_device *dev, u8 port_num, u8 *speed, u8 *width)
|
|
|
|
{
|
|
|
|
int rc;
|
|
|
|
u32 netdev_speed;
|
|
|
|
struct net_device *netdev;
|
|
|
|
struct ethtool_link_ksettings lksettings;
|
|
|
|
|
|
|
|
if (rdma_port_get_link_layer(dev, port_num) != IB_LINK_LAYER_ETHERNET)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (!dev->get_netdev)
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
|
|
|
netdev = dev->get_netdev(dev, port_num);
|
|
|
|
if (!netdev)
|
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
rtnl_lock();
|
|
|
|
rc = __ethtool_get_link_ksettings(netdev, &lksettings);
|
|
|
|
rtnl_unlock();
|
|
|
|
|
|
|
|
dev_put(netdev);
|
|
|
|
|
|
|
|
if (!rc) {
|
|
|
|
netdev_speed = lksettings.base.speed;
|
|
|
|
} else {
|
|
|
|
netdev_speed = SPEED_1000;
|
|
|
|
pr_warn("%s speed is unknown, defaulting to %u\n", netdev->name,
|
|
|
|
netdev_speed);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (netdev_speed <= SPEED_1000) {
|
|
|
|
*width = IB_WIDTH_1X;
|
|
|
|
*speed = IB_SPEED_SDR;
|
|
|
|
} else if (netdev_speed <= SPEED_10000) {
|
|
|
|
*width = IB_WIDTH_1X;
|
|
|
|
*speed = IB_SPEED_FDR10;
|
|
|
|
} else if (netdev_speed <= SPEED_20000) {
|
|
|
|
*width = IB_WIDTH_4X;
|
|
|
|
*speed = IB_SPEED_DDR;
|
|
|
|
} else if (netdev_speed <= SPEED_25000) {
|
|
|
|
*width = IB_WIDTH_1X;
|
|
|
|
*speed = IB_SPEED_EDR;
|
|
|
|
} else if (netdev_speed <= SPEED_40000) {
|
|
|
|
*width = IB_WIDTH_4X;
|
|
|
|
*speed = IB_SPEED_FDR10;
|
|
|
|
} else {
|
|
|
|
*width = IB_WIDTH_4X;
|
|
|
|
*speed = IB_SPEED_EDR;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_get_eth_speed);
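The if/else ladder in ib_get_eth_speed() is a bucket lookup: the first bucket whose upper bound is at least the netdev speed decides the reported width and speed, and anything faster falls through to 4X EDR. The same mapping can be sketched as a table, shown here in userspace C. The enum and struct names are stand-ins chosen for this example and do not carry the kernel's enum values.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the IB width/speed enums. */
enum width { WIDTH_1X, WIDTH_4X };
enum speed { SPEED_SDR, SPEED_DDR, SPEED_FDR10, SPEED_EDR };

struct eth_to_ib {
	uint32_t   max_mbps;  /* inclusive upper bound of the bucket, in Mb/s */
	enum width width;
	enum speed speed;
};

/* Buckets in ascending order, mirroring the ladder in ib_get_eth_speed(). */
static const struct eth_to_ib eth_to_ib_map[] = {
	{  1000, WIDTH_1X, SPEED_SDR   },
	{ 10000, WIDTH_1X, SPEED_FDR10 },
	{ 20000, WIDTH_4X, SPEED_DDR   },
	{ 25000, WIDTH_1X, SPEED_EDR   },
	{ 40000, WIDTH_4X, SPEED_FDR10 },
};

void eth_speed_to_ib(uint32_t mbps, enum width *width, enum speed *speed)
{
	unsigned int i;

	assert(width && speed);

	for (i = 0; i < sizeof(eth_to_ib_map) / sizeof(eth_to_ib_map[0]); i++) {
		if (mbps <= eth_to_ib_map[i].max_mbps) {
			*width = eth_to_ib_map[i].width;
			*speed = eth_to_ib_map[i].speed;
			return;
		}
	}

	/* Anything above the last bucket is reported as 4X EDR. */
	*width = WIDTH_4X;
	*speed = SPEED_EDR;
}
```

A table keeps the bucket boundaries in one place, which makes it easier to see that the mapping is monotonic in the bound but not in the reported width.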
|
|
|
|
|
2017-05-23 16:26:08 +08:00
|
|
|
int ib_modify_qp(struct ib_qp *qp,
|
|
|
|
struct ib_qp_attr *qp_attr,
|
|
|
|
int qp_attr_mask)
|
|
|
|
{
|
|
|
|
return ib_modify_qp_with_udata(qp, qp_attr, qp_attr_mask, NULL);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_modify_qp);
|
|
|
|
|
|
|
|
int ib_query_qp(struct ib_qp *qp,
|
|
|
|
struct ib_qp_attr *qp_attr,
|
|
|
|
int qp_attr_mask,
|
|
|
|
struct ib_qp_init_attr *qp_init_attr)
|
|
|
|
{
|
|
|
|
return qp->device->query_qp ?
|
2011-08-09 06:31:51 +08:00
|
|
|
qp->device->query_qp(qp->real_qp, qp_attr, qp_attr_mask, qp_init_attr) :
|
2005-04-17 06:20:36 +08:00
|
|
|
-ENOSYS;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_query_qp);
|
|
|
|
|
2011-08-09 06:31:51 +08:00
|
|
|
int ib_close_qp(struct ib_qp *qp)
|
|
|
|
{
|
|
|
|
struct ib_qp *real_qp;
|
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
real_qp = qp->real_qp;
|
|
|
|
if (real_qp == qp)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
spin_lock_irqsave(&real_qp->device->event_handler_lock, flags);
|
|
|
|
list_del(&qp->open_list);
|
|
|
|
spin_unlock_irqrestore(&real_qp->device->event_handler_lock, flags);
|
|
|
|
|
|
|
|
atomic_dec(&real_qp->usecnt);
|
2017-12-24 19:54:58 +08:00
|
|
|
if (qp->qp_sec)
|
|
|
|
ib_close_shared_qp_security(qp->qp_sec);
|
2011-08-09 06:31:51 +08:00
|
|
|
kfree(qp);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_close_qp);
|
|
|
|
|
|
|
|
static int __ib_destroy_shared_qp(struct ib_qp *qp)
|
|
|
|
{
|
|
|
|
struct ib_xrcd *xrcd;
|
|
|
|
struct ib_qp *real_qp;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
real_qp = qp->real_qp;
|
|
|
|
xrcd = real_qp->xrcd;
|
|
|
|
|
|
|
|
mutex_lock(&xrcd->tgt_qp_mutex);
|
|
|
|
ib_close_qp(qp);
|
|
|
|
if (atomic_read(&real_qp->usecnt) == 0)
|
|
|
|
list_del(&real_qp->xrcd_list);
|
|
|
|
else
|
|
|
|
real_qp = NULL;
|
|
|
|
mutex_unlock(&xrcd->tgt_qp_mutex);
|
|
|
|
|
|
|
|
if (real_qp) {
|
|
|
|
ret = ib_destroy_qp(real_qp);
|
|
|
|
if (!ret)
|
|
|
|
atomic_dec(&xrcd->usecnt);
|
|
|
|
else
|
|
|
|
__ib_insert_xrcd_qp(xrcd, real_qp);
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
int ib_destroy_qp(struct ib_qp *qp)
|
|
|
|
{
|
|
|
|
struct ib_pd *pd;
|
|
|
|
struct ib_cq *scq, *rcq;
|
|
|
|
struct ib_srq *srq;
|
2016-05-23 20:20:54 +08:00
|
|
|
struct ib_rwq_ind_table *ind_tbl;
|
IB/core: Enforce PKey security on QPs
Add new LSM hooks to allocate and free security contexts and check for
permission to access a PKey.
Allocate and free a security context when creating and destroying a QP.
This context is used for controlling access to PKeys.
When a request is made to modify a QP that changes the port, PKey index,
or alternate path, check that the QP has permission for the PKey in the
PKey table index on the subnet prefix of the port. If the QP is shared
make sure all handles to the QP also have access.
Store which port and PKey index a QP is using. After the reset to init
transition the user can modify the port, PKey index and alternate path
independently. So port and PKey settings changes can be a merge of the
previous settings and the new ones.
In order to maintain access control if there is a PKey table or subnet
prefix change, keep a list of all QPs that are using each PKey index on
each port. If a change occurs, all QPs using that device and port must
have access enforced for the new cache settings.
These changes add a transaction to the QP modify process. Association
with the old port and PKey index must be maintained if the modify fails,
and must be removed if it succeeds. Association with the new port and
PKey index must be established prior to the modify and removed if the
modify fails.
1. When a QP is modified to a particular Port, PKey index or alternate
path insert that QP into the appropriate lists.
2. Check permission to access the new settings.
3. If step 2 grants access attempt to modify the QP.
4a. If steps 2 and 3 succeed remove any prior associations.
4b. If either fails remove the new setting associations.
If a PKey table or subnet prefix changes walk the list of QPs and
check that they have permission. If not send the QP to the error state
and raise a fatal error event. If it's a shared QP make sure all the
QPs that share the real_qp have permission as well. If the QP that
owns a security structure is denied access the security structure is
marked as such and the QP is added to an error_list. Once moving
the QP to error is complete, the security structure mark is cleared.
Maintaining the lists correctly turns QP destroy into a transaction.
The hardware driver for the device frees the ib_qp structure, so while
the destroy is in progress the ib_qp pointer in the ib_qp_security
struct is undefined. When the destroy process begins the ib_qp_security
structure is marked as destroying. This prevents any action from being
taken on the QP pointer. After the QP is destroyed successfully it
could still be listed on an error_list; wait for it to be processed by that
flow before cleaning up the structure.
If the destroy fails the QP's port and PKey settings are reinserted into
the appropriate lists, the destroying flag is cleared, and access control
is enforced, in case there were any cache changes during the destroy
flow.
To keep the security changes isolated a new file is used to hold security
related functionality.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
[PM: merge fixup in ib_verbs.h and uverbs_cmd.c]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-19 20:48:52 +08:00
|
|
|
struct ib_qp_security *sec;
|
2005-04-17 06:20:36 +08:00
|
|
|
int ret;
|
|
|
|
|
2016-05-04 00:01:07 +08:00
|
|
|
WARN_ON_ONCE(qp->mrs_used > 0);
|
|
|
|
|
2011-08-09 06:31:51 +08:00
|
|
|
if (atomic_read(&qp->usecnt))
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
if (qp->real_qp != qp)
|
|
|
|
return __ib_destroy_shared_qp(qp);
|
|
|
|
|
2011-05-24 10:59:25 +08:00
|
|
|
pd = qp->pd;
|
|
|
|
scq = qp->send_cq;
|
|
|
|
rcq = qp->recv_cq;
|
|
|
|
srq = qp->srq;
|
2016-05-23 20:20:54 +08:00
|
|
|
ind_tbl = qp->rwq_ind_tbl;
|
2017-05-19 20:48:52 +08:00
|
|
|
sec = qp->qp_sec;
|
|
|
|
if (sec)
|
|
|
|
ib_destroy_qp_security_begin(sec);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
2016-05-04 00:01:09 +08:00
|
|
|
if (!qp->uobject)
|
|
|
|
rdma_rw_cleanup_mrs(qp);
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
ret = qp->device->destroy_qp(qp);
|
|
|
|
if (!ret) {
|
2011-05-24 10:59:25 +08:00
|
|
|
if (pd)
|
|
|
|
atomic_dec(&pd->usecnt);
|
|
|
|
if (scq)
|
|
|
|
atomic_dec(&scq->usecnt);
|
|
|
|
if (rcq)
|
|
|
|
atomic_dec(&rcq->usecnt);
|
2005-04-17 06:20:36 +08:00
|
|
|
if (srq)
|
|
|
|
atomic_dec(&srq->usecnt);
|
2016-05-23 20:20:54 +08:00
|
|
|
if (ind_tbl)
|
|
|
|
atomic_dec(&ind_tbl->usecnt);
|
2017-05-19 20:48:52 +08:00
|
|
|
if (sec)
|
|
|
|
ib_destroy_qp_security_end(sec);
|
|
|
|
} else {
|
|
|
|
if (sec)
|
|
|
|
ib_destroy_qp_security_abort(sec);
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_destroy_qp);
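ib_destroy_qp() treats the destroy as a small transaction: it caches the related objects up front (the driver frees the QP on success), marks the security context as destroying, and then either commits (drop references, end) or rolls back (abort) depending on the driver's return value. The userspace sketch below shows only that shape. Every type and function in it is hypothetical, and the security state is reduced to a single enum.

```c
#include <assert.h>

enum sec_state { SEC_ACTIVE, SEC_DESTROYING, SEC_GONE };

struct sec_ctx { enum sec_state state; };
struct parent  { int usecnt; };

struct object {
	struct parent  *parent;
	struct sec_ctx *sec;
	int (*driver_destroy)(struct object *obj);
};

/* Example driver callbacks: one succeeds, one reports failure. */
int driver_ok(struct object *obj)   { (void)obj; return 0; }
int driver_busy(struct object *obj) { (void)obj; return -1; }

int destroy_object(struct object *obj)
{
	struct parent *parent;
	struct sec_ctx *sec;
	int ret;

	assert(obj && obj->driver_destroy);

	/* Cache everything first: a real driver may free obj on success. */
	parent = obj->parent;
	sec = obj->sec;

	if (sec)
		sec->state = SEC_DESTROYING;    /* begin */

	ret = obj->driver_destroy(obj);
	if (!ret) {
		if (parent)
			parent->usecnt--;       /* commit: drop the reference */
		if (sec)
			sec->state = SEC_GONE;  /* end */
	} else if (sec) {
		sec->state = SEC_ACTIVE;        /* abort: restore prior state */
	}

	return ret;
}
```

The point of the pattern is that a failed destroy leaves reference counts and security state exactly as they were, so the caller can retry later.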
|
|
|
|
|
|
|
|
/* Completion queues */
|
|
|
|
|
|
|
|
struct ib_cq *ib_create_cq(struct ib_device *device,
|
|
|
|
ib_comp_handler comp_handler,
|
|
|
|
void (*event_handler)(struct ib_event *, void *),
|
2015-06-11 21:35:21 +08:00
|
|
|
void *cq_context,
|
|
|
|
const struct ib_cq_init_attr *cq_attr)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
|
|
|
struct ib_cq *cq;
|
|
|
|
|
2015-06-11 21:35:21 +08:00
|
|
|
cq = device->create_cq(device, cq_attr, NULL, NULL);
|
2005-04-17 06:20:36 +08:00
|
|
|
|
|
|
|
if (!IS_ERR(cq)) {
|
|
|
|
cq->device = device;
|
2005-07-08 08:57:11 +08:00
|
|
|
cq->uobject = NULL;
|
2005-04-17 06:20:36 +08:00
|
|
|
cq->comp_handler = comp_handler;
|
|
|
|
cq->event_handler = event_handler;
|
|
|
|
cq->cq_context = cq_context;
|
|
|
|
atomic_set(&cq->usecnt, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
return cq;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_create_cq);
|
|
|
|
|
2017-11-13 16:51:19 +08:00
|
|
|
int rdma_set_cq_moderation(struct ib_cq *cq, u16 cq_count, u16 cq_period)
|
2008-04-17 12:09:33 +08:00
|
|
|
{
|
|
|
|
return cq->device->modify_cq ?
|
|
|
|
cq->device->modify_cq(cq, cq_count, cq_period) : -ENOSYS;
|
|
|
|
}
|
2017-11-13 16:51:19 +08:00
|
|
|
EXPORT_SYMBOL(rdma_set_cq_moderation);
|
2008-04-17 12:09:33 +08:00
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
int ib_destroy_cq(struct ib_cq *cq)
|
|
|
|
{
|
|
|
|
if (atomic_read(&cq->usecnt))
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
return cq->device->destroy_cq(cq);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_destroy_cq);
|
|
|
|
|
2006-02-14 08:30:49 +08:00
|
|
|
int ib_resize_cq(struct ib_cq *cq, int cqe)
|
2005-04-17 06:20:36 +08:00
|
|
|
{
|
2005-11-09 03:10:25 +08:00
|
|
|
return cq->device->resize_cq ?
|
2006-01-31 06:29:21 +08:00
|
|
|
cq->device->resize_cq(cq, cqe, NULL) : -ENOSYS;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_resize_cq);
|
|
|
|
|
|
|
|
/* Memory regions */
|
|
|
|
|
|
|
|
int ib_dereg_mr(struct ib_mr *mr)
|
|
|
|
{
|
2015-12-24 02:12:54 +08:00
|
|
|
struct ib_pd *pd = mr->pd;
|
2005-04-17 06:20:36 +08:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = mr->device->dereg_mr(mr);
|
|
|
|
if (!ret)
|
|
|
|
atomic_dec(&pd->usecnt);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_dereg_mr);
|
|
|
|
|
2015-07-30 15:32:35 +08:00
|
|
|
/**
|
|
|
|
* ib_alloc_mr() - Allocates a memory region
|
|
|
|
* @pd: protection domain associated with the region
|
|
|
|
* @mr_type: memory region type
|
|
|
|
* @max_num_sg: maximum sg entries available for registration.
|
|
|
|
*
|
|
|
|
* Notes:
|
|
|
|
* Memory registration page/sg lists must not exceed max_num_sg.
|
|
|
|
* For mr_type IB_MR_TYPE_MEM_REG, the total length cannot exceed
|
|
|
|
* max_num_sg * used_page_size.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
|
|
|
|
enum ib_mr_type mr_type,
|
|
|
|
u32 max_num_sg)
|
2008-07-15 14:48:45 +08:00
|
|
|
{
|
|
|
|
struct ib_mr *mr;
|
|
|
|
|
2015-07-30 15:32:48 +08:00
|
|
|
if (!pd->device->alloc_mr)
|
2008-07-15 14:48:45 +08:00
|
|
|
return ERR_PTR(-ENOSYS);
|
|
|
|
|
2015-07-30 15:32:48 +08:00
|
|
|
mr = pd->device->alloc_mr(pd, mr_type, max_num_sg);
|
2008-07-15 14:48:45 +08:00
|
|
|
if (!IS_ERR(mr)) {
|
|
|
|
mr->device = pd->device;
|
|
|
|
mr->pd = pd;
|
|
|
|
mr->uobject = NULL;
|
|
|
|
atomic_inc(&pd->usecnt);
|
2016-05-04 00:01:08 +08:00
|
|
|
mr->need_inval = false;
|
2008-07-15 14:48:45 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return mr;
|
|
|
|
}
|
2015-07-30 15:32:48 +08:00
|
|
|
EXPORT_SYMBOL(ib_alloc_mr);
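The allocation and teardown helpers above share one bookkeeping pattern: a child object bumps its parent's `usecnt` on creation, drops it on successful teardown, and a parent with a non-zero count refuses destruction with `-EBUSY`. The following is a minimal userspace sketch of that pattern using C11 atomics. The `parent`/`child` types and function names are hypothetical stand-ins, not kernel APIs.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for a PD-like parent and an MR-like child. */
struct parent {
	atomic_int usecnt;	/* number of children currently attached */
};

struct child {
	struct parent *parent;
};

/* Mirror of the "on success, atomic_inc(&pd->usecnt)" step. */
void child_attach(struct child *c, struct parent *p)
{
	c->parent = p;
	atomic_fetch_add(&p->usecnt, 1);
}

/* Mirror of the "if (!ret) atomic_dec(&pd->usecnt)" step. */
void child_detach(struct child *c)
{
	atomic_fetch_sub(&c->parent->usecnt, 1);
	c->parent = NULL;
}

/* A parent that is still referenced cannot be destroyed. */
int parent_destroy(struct parent *p)
{
	if (atomic_load(&p->usecnt))
		return -EBUSY;
	return 0;
}
```

In this sketch the counter only guards destruction; it does not free anything, which matches how the verbs above leave the actual release to the driver callback.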
|
2008-07-15 14:48:45 +08:00
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
/* "Fast" memory regions */
|
|
|
|
|
|
|
|
struct ib_fmr *ib_alloc_fmr(struct ib_pd *pd,
|
|
|
|
int mr_access_flags,
|
|
|
|
struct ib_fmr_attr *fmr_attr)
|
|
|
|
{
|
|
|
|
struct ib_fmr *fmr;
|
|
|
|
|
|
|
|
if (!pd->device->alloc_fmr)
|
|
|
|
return ERR_PTR(-ENOSYS);
|
|
|
|
|
|
|
|
fmr = pd->device->alloc_fmr(pd, mr_access_flags, fmr_attr);
|
|
|
|
if (!IS_ERR(fmr)) {
|
|
|
|
fmr->device = pd->device;
|
|
|
|
fmr->pd = pd;
|
|
|
|
atomic_inc(&pd->usecnt);
|
|
|
|
}
|
|
|
|
|
|
|
|
return fmr;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_alloc_fmr);
|
|
|
|
|
|
|
|
int ib_unmap_fmr(struct list_head *fmr_list)
|
|
|
|
{
|
|
|
|
struct ib_fmr *fmr;
|
|
|
|
|
|
|
|
if (list_empty(fmr_list))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
fmr = list_entry(fmr_list->next, struct ib_fmr, list);
|
|
|
|
return fmr->device->unmap_fmr(fmr_list);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_unmap_fmr);
|
|
|
|
|
|
|
|
int ib_dealloc_fmr(struct ib_fmr *fmr)
|
|
|
|
{
|
|
|
|
struct ib_pd *pd;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
pd = fmr->pd;
|
|
|
|
ret = fmr->device->dealloc_fmr(fmr);
|
|
|
|
if (!ret)
|
|
|
|
atomic_dec(&pd->usecnt);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_dealloc_fmr);
|
|
|
|
|
|
|
|
/* Multicast groups */
|
|
|
|
|
2017-06-12 16:14:02 +08:00
|
|
|
static bool is_valid_mcast_lid(struct ib_qp *qp, u16 lid)
|
|
|
|
{
|
|
|
|
struct ib_qp_init_attr init_attr = {};
|
|
|
|
struct ib_qp_attr attr = {};
|
|
|
|
int num_eth_ports = 0;
|
|
|
|
int port;
|
|
|
|
|
|
|
|
/* If QP state >= init, it is assigned to a port and we can check this
|
|
|
|
* port only.
|
|
|
|
*/
|
|
|
|
if (!ib_query_qp(qp, &attr, IB_QP_STATE | IB_QP_PORT, &init_attr)) {
|
|
|
|
if (attr.qp_state >= IB_QPS_INIT) {
|
2017-09-01 00:30:34 +08:00
|
|
|
if (rdma_port_get_link_layer(qp->device, attr.port_num) !=
|
2017-06-12 16:14:02 +08:00
|
|
|
IB_LINK_LAYER_INFINIBAND)
|
|
|
|
return true;
|
|
|
|
goto lid_check;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Can't get a quick answer, iterate over all ports */
|
|
|
|
for (port = rdma_start_port(qp->device); port <= rdma_end_port(qp->device); port++)
|
2017-09-01 00:30:34 +08:00
|
|
|
if (rdma_port_get_link_layer(qp->device, port) !=
|
2017-06-12 16:14:02 +08:00
|
|
|
IB_LINK_LAYER_INFINIBAND)
|
|
|
|
num_eth_ports++;
|
|
|
|
|
|
|
|
/* If we have at least one Ethernet port, RoCE annex declares that
|
|
|
|
* multicast LID should be ignored. We can't tell at this step if the
|
|
|
|
* QP belongs to an IB or Ethernet port.
|
|
|
|
*/
|
|
|
|
if (num_eth_ports)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
/* If all the ports are IB, we can check according to IB spec. */
|
|
|
|
lid_check:
|
|
|
|
return !(lid < be16_to_cpu(IB_MULTICAST_LID_BASE) ||
|
|
|
|
lid == be16_to_cpu(IB_LID_PERMISSIVE));
|
|
|
|
}
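The final `lid_check` test in is_valid_mcast_lid() can be modelled in isolation. This is a minimal sketch in host byte order, assuming the values 0xC000 for IB_MULTICAST_LID_BASE and 0xFFFF for IB_LID_PERMISSIVE; the macro and function names below are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed host-order values of the two constants used by lid_check. */
#define MCAST_LID_BASE	0xC000u
#define LID_PERMISSIVE	0xFFFFu

/*
 * A LID is accepted as a multicast LID when it is at or above the
 * multicast base and is not the permissive LID.
 */
bool mcast_lid_in_range(uint16_t lid)
{
	return lid >= MCAST_LID_BASE && lid != LID_PERMISSIVE;
}
```

This is the same predicate as the negated `lid < base || lid == permissive` expression, rewritten by De Morgan's law so that the accepted range reads directly.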
|
|
|
|
|
2005-04-17 06:20:36 +08:00
|
|
|
int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
|
|
|
|
{
|
2012-04-29 22:04:22 +08:00
|
|
|
int ret;
|
|
|
|
|
2005-09-27 02:47:53 +08:00
|
|
|
if (!qp->device->attach_mcast)
|
|
|
|
return -ENOSYS;
|
2017-06-12 16:14:03 +08:00
|
|
|
|
|
|
|
if (!rdma_is_multicast_addr((struct in6_addr *)gid->raw) ||
|
|
|
|
qp->qp_type != IB_QPT_UD || !is_valid_mcast_lid(qp, lid))
|
2005-09-27 02:47:53 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2012-04-29 22:04:22 +08:00
|
|
|
ret = qp->device->attach_mcast(qp, gid, lid);
|
|
|
|
if (!ret)
|
|
|
|
atomic_inc(&qp->usecnt);
|
|
|
|
return ret;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_attach_mcast);
|
|
|
|
|
|
|
|
int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
|
|
|
|
{
|
2012-04-29 22:04:22 +08:00
|
|
|
int ret;
|
|
|
|
|
2005-09-27 02:47:53 +08:00
|
|
|
if (!qp->device->detach_mcast)
|
|
|
|
return -ENOSYS;
|
2017-06-12 16:14:03 +08:00
|
|
|
|
|
|
|
if (!rdma_is_multicast_addr((struct in6_addr *)gid->raw) ||
|
|
|
|
qp->qp_type != IB_QPT_UD || !is_valid_mcast_lid(qp, lid))
|
2005-09-27 02:47:53 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2012-04-29 22:04:22 +08:00
|
|
|
ret = qp->device->detach_mcast(qp, gid, lid);
|
|
|
|
if (!ret)
|
|
|
|
atomic_dec(&qp->usecnt);
|
|
|
|
return ret;
|
2005-04-17 06:20:36 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_detach_mcast);
|
2011-05-24 08:52:46 +08:00
|
|
|
|
|
|
|
struct ib_xrcd *ib_alloc_xrcd(struct ib_device *device)
|
|
|
|
{
|
|
|
|
struct ib_xrcd *xrcd;
|
|
|
|
|
|
|
|
if (!device->alloc_xrcd)
|
|
|
|
return ERR_PTR(-ENOSYS);
|
|
|
|
|
|
|
|
xrcd = device->alloc_xrcd(device, NULL, NULL);
|
|
|
|
if (!IS_ERR(xrcd)) {
|
|
|
|
xrcd->device = device;
|
2011-05-24 23:33:46 +08:00
|
|
|
xrcd->inode = NULL;
|
2011-05-24 08:52:46 +08:00
|
|
|
atomic_set(&xrcd->usecnt, 0);
|
2011-05-27 14:06:44 +08:00
|
|
|
mutex_init(&xrcd->tgt_qp_mutex);
|
|
|
|
INIT_LIST_HEAD(&xrcd->tgt_qp_list);
|
2011-05-24 08:52:46 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
return xrcd;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_alloc_xrcd);
|
|
|
|
|
|
|
|
int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
|
|
|
|
{
|
2011-05-27 14:06:44 +08:00
|
|
|
struct ib_qp *qp;
|
|
|
|
int ret;
|
|
|
|
|
2011-05-24 08:52:46 +08:00
|
|
|
if (atomic_read(&xrcd->usecnt))
|
|
|
|
return -EBUSY;
|
|
|
|
|
2011-05-27 14:06:44 +08:00
|
|
|
while (!list_empty(&xrcd->tgt_qp_list)) {
|
|
|
|
qp = list_entry(xrcd->tgt_qp_list.next, struct ib_qp, xrcd_list);
|
|
|
|
ret = ib_destroy_qp(qp);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2011-05-24 08:52:46 +08:00
|
|
|
return xrcd->device->dealloc_xrcd(xrcd);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_dealloc_xrcd);
|
IB/core: Add receive flow steering support
The RDMA stack allows for applications to create IB_QPT_RAW_PACKET
QPs, which receive plain Ethernet packets, specifically packets that
don't carry any QPN to be matched by the receiving side. Applications
using these QPs must be provided with a method to program some
steering rule with the HW so packets arriving at the local port can be
routed to them.
This patch adds ib_create_flow(), which allows providing a flow
specification for a QP. When there's a match between the
specification and a received packet, the packet is forwarded to that
QP, in the same way one uses ib_attach_mcast() for IB UD
multicast handling.
Flow specifications are provided as instances of struct ib_flow_spec_yyy,
which describe L2, L3 and L4 headers. Currently specs for Ethernet, IPv4,
TCP and UDP are defined. Flow specs are made of values and masks.
The input to ib_create_flow() is a struct ib_flow_attr, which contains
a few mandatory control elements and optional flow specs.
struct ib_flow_attr {
enum ib_flow_attr_type type;
u16 size;
u16 priority;
u32 flags;
u8 num_of_specs;
u8 port;
/* Following are the optional layers according to user request
* struct ib_flow_spec_yyy
* struct ib_flow_spec_zzz
*/
};
As these specs are eventually coming from user space, they are defined and
used in a way which allows adding new spec types without kernel/user ABI
change, just with a little API enhancement which defines the newly added spec.
The flow spec structures are defined with TLV (Type-Length-Value)
entries, which allows calling ib_create_flow() with a list of variable
length of optional specs.
For the actual processing of ib_flow_attr the driver uses the number
of specs and the size mandatory fields along with the TLV nature of
the specs.
Steering rules are processed in an order determined by the domain over which
the rule is set and by the rule priority. All rules set by user space
applications fall into the IB_FLOW_DOMAIN_USER domain; other domains
could be used by future IPoIB RFS and ethtool flow-steering interface
implementations. A lower numerical value for the priority field means
higher priority.
The returned value from ib_create_flow() is a struct ib_flow, which
contains a database pointer (handle) provided by the HW driver to be
used when calling ib_destroy_flow().
Applications that offload TCP/IP traffic can also be written over IB
UD QPs. The ib_create_flow() / ib_destroy_flow() API is designed to
support UD QPs too. A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING
to denote support for flow steering.
The ib_flow_attr enum type supports usage of flow steering for promiscuous
and sniffer purposes:
IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification
IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
all Ethernet traffic which isn't steered to any QP
IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast
IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic
ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-07 19:01:59 +08:00
|
|
|
|
IB/core: Introduce Work Queue object and its verbs
Introduce Work Queue object and its create/destroy/modify verbs.
A QP can be created without internal WQs "packaged" inside it;
such a QP can be configured to use an "external" WQ object as its
receive/send queue.
The WQ is a necessary component for RSS technology, since the RSS mechanism
is supposed to distribute the traffic between multiple
Receive Work Queues.
A WQ is associated (many to one) with a Completion Queue and owns the WQ
properties (PD, WQ size, etc.).
A WQ has a type. This patch introduces IB_WQT_RQ (i.e. receive queue);
it may be extended to others such as IB_WQT_SQ (send queue).
A WQ of type IB_WQT_RQ contains receive work requests.
The PD is an attribute of a work queue (i.e. send/receive queue). It is used
by the hardware for security validation before scattering to a memory
region pointed to by the WQ. For that, an external WQ object
needs a PD, letting the hardware make that validation.
When accessing a memory region pointed to by the WQ, the WQ's PD
is used and not the QP's PD; this behavior is similar
to that of an SRQ and a QP.
The WQ context is subject to well-defined state transitions performed by
the modify_wq verb.
When a WQ is created, its initial state is IB_WQS_RESET.
From IB_WQS_RESET it can be modified to itself or to IB_WQS_RDY.
From IB_WQS_RDY it can be modified to itself, to IB_WQS_RESET
or to IB_WQS_ERR.
From IB_WQS_ERR it can be modified to IB_WQS_RESET.
Note: transition to IB_WQS_ERR might occur implicitly in case there
was some HW error.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-23 20:20:48 +08:00
|
|
|
/**
|
|
|
|
* ib_create_wq - Creates a WQ associated with the specified protection
|
|
|
|
* domain.
|
|
|
|
* @pd: The protection domain associated with the WQ.
|
|
|
|
* @wq_attr: A list of initial attributes required to create the
|
|
|
|
* WQ. If WQ creation succeeds, then the attributes are updated to
|
|
|
|
* the actual capabilities of the created WQ.
|
|
|
|
*
|
|
|
|
* wq_attr->max_wr and wq_attr->max_sge determine
|
|
|
|
* the requested size of the WQ, and are set to the actual values allocated
|
|
|
|
* on return.
|
|
|
|
* If ib_create_wq() succeeds, then max_wr and max_sge will always be
|
|
|
|
* at least as large as the requested values.
|
|
|
|
*/
|
|
|
|
struct ib_wq *ib_create_wq(struct ib_pd *pd,
|
|
|
|
struct ib_wq_init_attr *wq_attr)
|
|
|
|
{
|
|
|
|
struct ib_wq *wq;
|
|
|
|
|
|
|
|
if (!pd->device->create_wq)
|
|
|
|
return ERR_PTR(-ENOSYS);
|
|
|
|
|
|
|
|
wq = pd->device->create_wq(pd, wq_attr, NULL);
|
|
|
|
if (!IS_ERR(wq)) {
|
|
|
|
wq->event_handler = wq_attr->event_handler;
|
|
|
|
wq->wq_context = wq_attr->wq_context;
|
|
|
|
wq->wq_type = wq_attr->wq_type;
|
|
|
|
wq->cq = wq_attr->cq;
|
|
|
|
wq->device = pd->device;
|
|
|
|
wq->pd = pd;
|
|
|
|
wq->uobject = NULL;
|
|
|
|
atomic_inc(&pd->usecnt);
|
|
|
|
atomic_inc(&wq_attr->cq->usecnt);
|
|
|
|
atomic_set(&wq->usecnt, 0);
|
|
|
|
}
|
|
|
|
return wq;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_create_wq);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ib_destroy_wq - Destroys the specified WQ.
|
|
|
|
* @wq: The WQ to destroy.
|
|
|
|
*/
|
|
|
|
int ib_destroy_wq(struct ib_wq *wq)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
struct ib_cq *cq = wq->cq;
|
|
|
|
struct ib_pd *pd = wq->pd;
|
|
|
|
|
|
|
|
if (atomic_read(&wq->usecnt))
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
err = wq->device->destroy_wq(wq);
|
|
|
|
if (!err) {
|
|
|
|
atomic_dec(&pd->usecnt);
|
|
|
|
atomic_dec(&cq->usecnt);
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_destroy_wq);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ib_modify_wq - Modifies the specified WQ.
|
|
|
|
* @wq: The WQ to modify.
|
|
|
|
* @wq_attr: On input, specifies the WQ attributes to modify.
|
|
|
|
* @wq_attr_mask: A bit-mask used to specify which attributes of the WQ
|
|
|
|
* are being modified.
|
|
|
|
* On output, the current values of selected WQ attributes are returned.
|
|
|
|
*/
|
|
|
|
int ib_modify_wq(struct ib_wq *wq, struct ib_wq_attr *wq_attr,
|
|
|
|
u32 wq_attr_mask)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!wq->device->modify_wq)
|
|
|
|
return -ENOSYS;
|
|
|
|
|
|
|
|
err = wq->device->modify_wq(wq, wq_attr, wq_attr_mask, NULL);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_modify_wq);
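The Work Queue commit description earlier in this file lists which explicit state changes the modify_wq verb may perform. The rules can be written as a small transition check. This is a userspace sketch only: the enum and function names are hypothetical mirrors of IB_WQS_RESET, IB_WQS_RDY and IB_WQS_ERR, and ib_modify_wq() itself forwards the request to the driver without running such a check.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three WQ states named in the commit text. */
enum wq_state {
	WQS_RESET,
	WQS_RDY,
	WQS_ERR,
};

/* Returns true when the explicit transition cur -> next is listed as legal. */
bool wq_transition_allowed(enum wq_state cur, enum wq_state next)
{
	switch (cur) {
	case WQS_RESET:
		/* RESET may stay in RESET or move to RDY. */
		return next == WQS_RESET || next == WQS_RDY;
	case WQS_RDY:
		/* RDY may stay in RDY, fall back to RESET, or move to ERR. */
		return next == WQS_RDY || next == WQS_RESET ||
		       next == WQS_ERR;
	case WQS_ERR:
		/* ERR can only be left through RESET. */
		return next == WQS_RESET;
	}
	return false;
}
```

The implicit move to the error state on a hardware fault, which the commit text mentions, is outside this check because it is not requested by a caller.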
|
|
|
|
|
2016-05-23 20:20:51 +08:00
|
|
|
/*
|
|
|
|
* ib_create_rwq_ind_table - Creates a RQ Indirection Table.
|
|
|
|
* @device: The device on which to create the rwq indirection table.
|
|
|
|
* @init_attr: A list of initial attributes required to
|
|
|
|
* create the Indirection Table.
|
|
|
|
*
|
|
|
|
* Note: init_attr->ind_tbl must live at least as long as
|
|
|
|
* the created ib_rwq_ind_table object, and the caller is responsible
|
|
|
|
* for allocating and freeing its memory.
|
|
|
|
*/
|
|
|
|
struct ib_rwq_ind_table *ib_create_rwq_ind_table(struct ib_device *device,
|
|
|
|
struct ib_rwq_ind_table_init_attr *init_attr)
|
|
|
|
{
|
|
|
|
struct ib_rwq_ind_table *rwq_ind_table;
|
|
|
|
int i;
|
|
|
|
u32 table_size;
|
|
|
|
|
|
|
|
if (!device->create_rwq_ind_table)
|
|
|
|
return ERR_PTR(-ENOSYS);
|
|
|
|
|
|
|
|
table_size = (1 << init_attr->log_ind_tbl_size);
|
|
|
|
rwq_ind_table = device->create_rwq_ind_table(device,
|
|
|
|
init_attr, NULL);
|
|
|
|
if (IS_ERR(rwq_ind_table))
|
|
|
|
return rwq_ind_table;
|
|
|
|
|
|
|
|
rwq_ind_table->ind_tbl = init_attr->ind_tbl;
|
|
|
|
rwq_ind_table->log_ind_tbl_size = init_attr->log_ind_tbl_size;
|
|
|
|
rwq_ind_table->device = device;
|
|
|
|
rwq_ind_table->uobject = NULL;
|
|
|
|
atomic_set(&rwq_ind_table->usecnt, 0);
|
|
|
|
|
|
|
|
for (i = 0; i < table_size; i++)
|
|
|
|
atomic_inc(&rwq_ind_table->ind_tbl[i]->usecnt);
|
|
|
|
|
|
|
|
return rwq_ind_table;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_create_rwq_ind_table);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* ib_destroy_rwq_ind_table - Destroys the specified Indirection Table.
|
|
|
|
* @rwq_ind_table: The Indirection Table to destroy.
|
|
|
|
*/
|
|
|
|
int ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *rwq_ind_table)
|
|
|
|
{
|
|
|
|
int err, i;
|
|
|
|
u32 table_size = (1 << rwq_ind_table->log_ind_tbl_size);
|
|
|
|
struct ib_wq **ind_tbl = rwq_ind_table->ind_tbl;
|
|
|
|
|
|
|
|
if (atomic_read(&rwq_ind_table->usecnt))
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
err = rwq_ind_table->device->destroy_rwq_ind_table(rwq_ind_table);
|
|
|
|
if (!err) {
|
|
|
|
for (i = 0; i < table_size; i++)
|
|
|
|
atomic_dec(&ind_tbl[i]->usecnt);
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_destroy_rwq_ind_table);
|
|
|
|
|
2013-08-07 19:01:59 +08:00
|
|
|
struct ib_flow *ib_create_flow(struct ib_qp *qp,
|
|
|
|
struct ib_flow_attr *flow_attr,
|
|
|
|
int domain)
|
|
|
|
{
|
|
|
|
struct ib_flow *flow_id;
|
|
|
|
if (!qp->device->create_flow)
|
|
|
|
return ERR_PTR(-ENOSYS);
|
|
|
|
|
|
|
|
flow_id = qp->device->create_flow(qp, flow_attr, domain);
|
2016-10-27 21:36:30 +08:00
|
|
|
if (!IS_ERR(flow_id)) {
|
2013-08-07 19:01:59 +08:00
|
|
|
atomic_inc(&qp->usecnt);
|
2016-10-27 21:36:30 +08:00
|
|
|
flow_id->qp = qp;
|
|
|
|
}
|
2013-08-07 19:01:59 +08:00
|
|
|
return flow_id;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_create_flow);
|
|
|
|
|
|
|
|
int ib_destroy_flow(struct ib_flow *flow_id)
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
struct ib_qp *qp = flow_id->qp;
|
|
|
|
|
|
|
|
err = qp->device->destroy_flow(flow_id);
|
|
|
|
if (!err)
|
|
|
|
atomic_dec(&qp->usecnt);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_destroy_flow);
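The flow steering commit description says the optional specs that follow struct ib_flow_attr are TLV (Type-Length-Value) entries, and that a driver walks them using the spec count and the total size. A minimal sketch of such a walk follows. The `spec_hdr` layout and the `walk_specs` helper are assumptions made for illustration; the real ib_flow_spec_* structures are laid out differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header; not the kernel's ib_flow_spec_* layout. */
struct spec_hdr {
	uint32_t type;
	uint16_t size;		/* bytes in this spec, header included */
	uint16_t reserved;
};

/*
 * Walk num_of_specs TLV entries packed back to back in buf[0..total).
 * Returns the number of bytes consumed, or -1 if an entry is truncated
 * or declares a size smaller than its own header.
 */
long walk_specs(const unsigned char *buf, size_t total,
		unsigned int num_of_specs)
{
	size_t off = 0;
	unsigned int i;

	for (i = 0; i < num_of_specs; i++) {
		struct spec_hdr hdr;

		if (total - off < sizeof(hdr))
			return -1;
		/* memcpy avoids unaligned access into the byte buffer. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > total - off)
			return -1;
		/* A real driver would dispatch on hdr.type here. */
		off += hdr.size;
	}
	return (long)off;
}
```

Because every entry carries its own length, a walker of this shape can skip spec types it does not understand, which is the property the commit text relies on for adding new specs without an ABI change.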
|
2014-02-23 20:19:05 +08:00
|
|
|
|
|
|
|
int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
|
|
|
|
struct ib_mr_status *mr_status)
|
|
|
|
{
|
|
|
|
return mr->device->check_mr_status ?
|
|
|
|
mr->device->check_mr_status(mr, check_mask, mr_status) : -ENOSYS;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_check_mr_status);
|
2015-10-14 00:11:24 +08:00
|
|
|
|
2016-03-12 04:58:38 +08:00
|
|
|
int ib_set_vf_link_state(struct ib_device *device, int vf, u8 port,
|
|
|
|
int state)
|
|
|
|
{
|
|
|
|
if (!device->set_vf_link_state)
|
|
|
|
return -ENOSYS;
|
|
|
|
|
|
|
|
return device->set_vf_link_state(device, vf, port, state);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_set_vf_link_state);
|
|
|
|
|
|
|
|
int ib_get_vf_config(struct ib_device *device, int vf, u8 port,
|
|
|
|
struct ifla_vf_info *info)
|
|
|
|
{
|
|
|
|
if (!device->get_vf_config)
|
|
|
|
return -ENOSYS;
|
|
|
|
|
|
|
|
return device->get_vf_config(device, vf, port, info);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_get_vf_config);
|
|
|
|
|
|
|
|
int ib_get_vf_stats(struct ib_device *device, int vf, u8 port,
|
|
|
|
struct ifla_vf_stats *stats)
|
|
|
|
{
|
|
|
|
if (!device->get_vf_stats)
|
|
|
|
return -ENOSYS;
|
|
|
|
|
|
|
|
return device->get_vf_stats(device, vf, port, stats);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_get_vf_stats);
|
|
|
|
|
|
|
|
int ib_set_vf_guid(struct ib_device *device, int vf, u8 port, u64 guid,
|
|
|
|
int type)
|
|
|
|
{
|
|
|
|
if (!device->set_vf_guid)
|
|
|
|
return -ENOSYS;
|
|
|
|
|
|
|
|
return device->set_vf_guid(device, vf, port, guid, type);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_set_vf_guid);
|
|
|
|
|
2015-10-14 00:11:24 +08:00
|
|
|
/**
|
|
|
|
* ib_map_mr_sg() - Map the largest prefix of a dma mapped SG list
|
|
|
|
* and set it to the memory region.
|
|
|
|
* @mr: memory region
|
|
|
|
* @sg: dma mapped scatterlist
|
|
|
|
* @sg_nents: number of entries in sg
|
2016-05-04 00:01:04 +08:00
|
|
|
* @sg_offset: offset in bytes into sg
|
2015-10-14 00:11:24 +08:00
|
|
|
* @page_size: page vector desired page size
|
|
|
|
*
|
|
|
|
* Constraints:
|
|
|
|
* - The first sg element is allowed to have an offset.
|
2016-09-27 00:09:42 +08:00
|
|
|
* - Each sg element must either be aligned to page_size or virtually
|
|
|
|
* contiguous to the previous element. In case an sg element has a
|
|
|
|
* non-contiguous offset, the mapping prefix will not include it.
|
2015-10-14 00:11:24 +08:00
|
|
|
* - The last sg element is allowed to have length less than page_size.
|
|
|
|
* - If sg_nents total byte length exceeds the mr max_num_sg * page_size
|
|
|
|
* then only max_num_sg entries will be mapped.
|
2016-09-27 00:09:42 +08:00
|
|
|
* - If the MR was allocated with type IB_MR_TYPE_SG_GAPS, none of these
|
2016-03-01 01:07:32 +08:00
|
|
|
* constraints holds and the page_size argument is ignored.
|
2015-10-14 00:11:24 +08:00
|
|
|
*
|
|
|
|
* Returns the number of sg elements that were mapped to the memory region.
|
|
|
|
*
|
|
|
|
* After this completes successfully, the memory region
|
|
|
|
* is ready for registration.
|
|
|
|
*/
|
2016-05-04 00:01:04 +08:00
|
|
|
int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
|
2016-05-13 01:49:15 +08:00
|
|
|
unsigned int *sg_offset, unsigned int page_size)
|
2015-10-14 00:11:24 +08:00
|
|
|
{
|
|
|
|
if (unlikely(!mr->device->map_mr_sg))
|
|
|
|
return -ENOSYS;
|
|
|
|
|
|
|
|
mr->page_size = page_size;
|
|
|
|
|
2016-05-04 00:01:04 +08:00
|
|
|
return mr->device->map_mr_sg(mr, sg, sg_nents, sg_offset);
|
2015-10-14 00:11:24 +08:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_map_mr_sg);
|
|
|
|
|
|
|
|
/**
|
|
|
|
* ib_sg_to_pages() - Convert the largest prefix of a sg list
|
|
|
|
* to a page vector
|
|
|
|
* @mr: memory region
|
|
|
|
* @sgl: dma mapped scatterlist
|
|
|
|
* @sg_nents: number of entries in sg
|
2016-05-13 01:49:15 +08:00
|
|
|
* @sg_offset_p: IN: start offset in bytes into sg
|
|
|
|
* OUT: offset in bytes for element n of the sg of the first
|
|
|
|
* byte that has not been processed where n is the return
|
|
|
|
* value of this function.
|
2015-10-14 00:11:24 +08:00
|
|
|
* @set_page: driver page assignment function pointer
|
|
|
|
*
|
2015-12-04 08:04:17 +08:00
|
|
|
* Core service helper for drivers to convert the largest
|
2015-10-14 00:11:24 +08:00
|
|
|
* prefix of given sg list to a page vector. The sg list
|
|
|
|
* prefix converted is the prefix that meets the requirements
|
|
|
|
* of ib_map_mr_sg.
|
|
|
|
*
|
|
|
|
* Returns the number of sg elements that were assigned to
|
|
|
|
* a page vector.
|
|
|
|
*/
|
2016-05-04 00:01:04 +08:00
|
|
|
int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
|
2016-05-13 01:49:15 +08:00
|
|
|
unsigned int *sg_offset_p, int (*set_page)(struct ib_mr *, u64))
|
2015-10-14 00:11:24 +08:00
|
|
|
{
|
|
|
|
struct scatterlist *sg;
|
2015-12-29 17:45:03 +08:00
|
|
|
u64 last_end_dma_addr = 0;
|
2016-05-13 01:49:15 +08:00
|
|
|
unsigned int sg_offset = sg_offset_p ? *sg_offset_p : 0;
|
2015-10-14 00:11:24 +08:00
|
|
|
unsigned int last_page_off = 0;
|
|
|
|
u64 page_mask = ~((u64)mr->page_size - 1);
|
2015-12-04 08:04:17 +08:00
|
|
|
int i, ret;
|
2015-10-14 00:11:24 +08:00
|
|
|
|
2016-05-13 01:49:15 +08:00
|
|
|
if (unlikely(sg_nents <= 0 || sg_offset > sg_dma_len(&sgl[0])))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2016-05-04 00:01:04 +08:00
|
|
|
mr->iova = sg_dma_address(&sgl[0]) + sg_offset;
|
2015-10-14 00:11:24 +08:00
|
|
|
mr->length = 0;
|
|
|
|
|
|
|
|
for_each_sg(sgl, sg, sg_nents, i) {
|
2016-05-04 00:01:04 +08:00
|
|
|
u64 dma_addr = sg_dma_address(sg) + sg_offset;
|
2016-05-13 01:49:15 +08:00
|
|
|
u64 prev_addr = dma_addr;
|
2016-05-04 00:01:04 +08:00
|
|
|
unsigned int dma_len = sg_dma_len(sg) - sg_offset;
|
2015-10-14 00:11:24 +08:00
|
|
|
u64 end_dma_addr = dma_addr + dma_len;
|
|
|
|
u64 page_addr = dma_addr & page_mask;
|
|
|
|
|
2015-12-04 08:04:17 +08:00
|
|
|
/*
|
|
|
|
* For the second and later elements, check whether either the
|
|
|
|
* end of element i-1 or the start of element i is not aligned
|
|
|
|
* on a page boundary.
|
|
|
|
*/
|
|
|
|
if (i && (last_page_off != 0 || page_addr != dma_addr)) {
|
|
|
|
/* Stop mapping if there is a gap. */
|
|
|
|
if (last_end_dma_addr != dma_addr)
|
|
|
|
break;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Coalesce this element with the last. If it is small
|
|
|
|
* enough just update mr->length. Otherwise start
|
|
|
|
* mapping from the next page.
|
|
|
|
*/
|
|
|
|
goto next_page;
|
2015-10-14 00:11:24 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
do {
|
2015-12-04 08:04:17 +08:00
|
|
|
ret = set_page(mr, page_addr);
|
2016-05-13 01:49:15 +08:00
|
|
|
if (unlikely(ret < 0)) {
|
|
|
|
sg_offset = prev_addr - sg_dma_address(sg);
|
|
|
|
mr->length += prev_addr - dma_addr;
|
|
|
|
if (sg_offset_p)
|
|
|
|
*sg_offset_p = sg_offset;
|
|
|
|
return i || sg_offset ? i : ret;
|
|
|
|
}
|
|
|
|
prev_addr = page_addr;
|
2015-12-04 08:04:17 +08:00
|
|
|
next_page:
|
2015-10-14 00:11:24 +08:00
|
|
|
page_addr += mr->page_size;
|
|
|
|
} while (page_addr < end_dma_addr);
|
|
|
|
|
|
|
|
mr->length += dma_len;
|
|
|
|
last_end_dma_addr = end_dma_addr;
|
|
|
|
last_page_off = end_dma_addr & ~page_mask;
|
2016-05-04 00:01:04 +08:00
|
|
|
|
|
|
|
sg_offset = 0;
|
2015-10-14 00:11:24 +08:00
|
|
|
}
|
|
|
|
|
2016-05-13 01:49:15 +08:00
|
|
|
if (sg_offset_p)
|
|
|
|
*sg_offset_p = 0;
|
2015-10-14 00:11:24 +08:00
|
|
|
return i;
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(ib_sg_to_pages);
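
The page-splitting walk performed by ib_sg_to_pages() can be sketched outside the kernel. What follows is a minimal user-space model under stated assumptions, not the kernel implementation: `struct seg`, `struct page_vec`, `record_page()` and `sg_to_pages_model()` are hypothetical names introduced only for illustration, the start offset (`sg_offset_p`) is left out, every segment is assumed to have a non-zero length, and a full page vector simply ends the walk without the partial-element accounting the real function performs.

```c
#include <stdint.h>

#define MAX_PAGES 16

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct seg {
	uint64_t addr;
	uint32_t len;
};

/* Hypothetical stand-in for the driver's page list and mr->length. */
struct page_vec {
	uint64_t pages[MAX_PAGES];
	int npages;
	uint64_t length;
};

/* Plays the role of the set_page() callback: record one page address. */
static int record_page(struct page_vec *pv, uint64_t addr)
{
	if (pv->npages == MAX_PAGES)
		return -1;
	pv->pages[pv->npages++] = addr;
	return 0;
}

/*
 * Map the largest usable prefix of segs[] to page addresses.
 * page_size must be a power of two. Returns the number of segments mapped.
 */
static int sg_to_pages_model(struct page_vec *pv, const struct seg *segs,
			     int nsegs, uint64_t page_size)
{
	uint64_t page_mask = ~(page_size - 1);
	uint64_t last_end = 0, last_page_off = 0;
	int i;

	pv->npages = 0;
	pv->length = 0;

	for (i = 0; i < nsegs; i++) {
		uint64_t start = segs[i].addr;
		uint64_t end = start + segs[i].len;
		uint64_t page_addr = start & page_mask;

		/* Previous end or current start is not page aligned. */
		if (i && (last_page_off != 0 || page_addr != start)) {
			/* A gap here cannot be expressed as whole pages. */
			if (last_end != start)
				break;
			/*
			 * Contiguous with the previous segment: the page that
			 * holds 'start' was already recorded, so skip it.
			 */
			page_addr += page_size;
		}

		for (; page_addr < end; page_addr += page_size)
			if (record_page(pv, page_addr) < 0)
				return i;

		pv->length += segs[i].len;
		last_end = end;
		last_page_off = end & ~page_mask;
	}
	return i;
}
```

With a page size of 0x1000 and the segments (0x1000, 0x1000), (0x2000, 0x800), (0x2800, 0x800), (0x5000, 0x1000), (0x7100, 0x100), the model maps the first four segments onto the pages 0x1000, 0x2000 and 0x5000 and stops at the fifth, which starts mid-page after a gap. Note that the fourth segment is accepted despite the hole before it, because both its start and the preceding end are page aligned.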

struct ib_drain_cqe {
	struct ib_cqe cqe;
	struct completion done;
};

static void ib_drain_qp_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct ib_drain_cqe *cqe = container_of(wc->wr_cqe, struct ib_drain_cqe,
						cqe);

	complete(&cqe->done);
}

/*
 * Move the QP to the error state, post a WR to the SQ and block until
 * the completion of that WR is reaped.
 */
static void __ib_drain_sq(struct ib_qp *qp)
{
	struct ib_cq *cq = qp->send_cq;
	struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
	struct ib_drain_cqe sdrain;
	struct ib_send_wr swr = {}, *bad_swr;
	int ret;

	swr.wr_cqe = &sdrain.cqe;
	sdrain.cqe.done = ib_drain_qp_done;
	init_completion(&sdrain.done);

	ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
	if (ret) {
		WARN_ONCE(ret, "failed to drain send queue: %d\n", ret);
		return;
	}

	ret = ib_post_send(qp, &swr, &bad_swr);
	if (ret) {
		WARN_ONCE(ret, "failed to drain send queue: %d\n", ret);
		return;
	}

	if (cq->poll_ctx == IB_POLL_DIRECT)
		while (wait_for_completion_timeout(&sdrain.done, HZ / 10) <= 0)
			ib_process_cq_direct(cq, -1);
	else
		wait_for_completion(&sdrain.done);
}

/*
 * Move the QP to the error state, post a WR to the RQ and block until
 * the completion of that WR is reaped.
 */
static void __ib_drain_rq(struct ib_qp *qp)
{
	struct ib_cq *cq = qp->recv_cq;
	struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
	struct ib_drain_cqe rdrain;
	struct ib_recv_wr rwr = {}, *bad_rwr;
	int ret;

	rwr.wr_cqe = &rdrain.cqe;
	rdrain.cqe.done = ib_drain_qp_done;
	init_completion(&rdrain.done);

	ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
	if (ret) {
		WARN_ONCE(ret, "failed to drain recv queue: %d\n", ret);
		return;
	}

	ret = ib_post_recv(qp, &rwr, &bad_rwr);
	if (ret) {
		WARN_ONCE(ret, "failed to drain recv queue: %d\n", ret);
		return;
	}

	if (cq->poll_ctx == IB_POLL_DIRECT)
		while (wait_for_completion_timeout(&rdrain.done, HZ / 10) <= 0)
			ib_process_cq_direct(cq, -1);
	else
		wait_for_completion(&rdrain.done);
}

/**
 * ib_drain_sq() - Block until all SQ CQEs have been consumed by the
 *                 application.
 * @qp: queue pair to drain
 *
 * If the device has a provider-specific drain function, then
 * call that. Otherwise call the generic drain function
 * __ib_drain_sq().
 *
 * The caller must:
 *
 * ensure there is room in the CQ and SQ for the drain work request and
 * completion.
 *
 * allocate the CQ using ib_alloc_cq().
 *
 * ensure that there are no other contexts that are posting WRs concurrently.
 * Otherwise the drain is not guaranteed.
 */
void ib_drain_sq(struct ib_qp *qp)
{
	if (qp->device->drain_sq)
		qp->device->drain_sq(qp);
	else
		__ib_drain_sq(qp);
}
EXPORT_SYMBOL(ib_drain_sq);

/**
 * ib_drain_rq() - Block until all RQ CQEs have been consumed by the
 *                 application.
 * @qp: queue pair to drain
 *
 * If the device has a provider-specific drain function, then
 * call that. Otherwise call the generic drain function
 * __ib_drain_rq().
 *
 * The caller must:
 *
 * ensure there is room in the CQ and RQ for the drain work request and
 * completion.
 *
 * allocate the CQ using ib_alloc_cq().
 *
 * ensure that there are no other contexts that are posting WRs concurrently.
 * Otherwise the drain is not guaranteed.
 */
void ib_drain_rq(struct ib_qp *qp)
{
	if (qp->device->drain_rq)
		qp->device->drain_rq(qp);
	else
		__ib_drain_rq(qp);
}
EXPORT_SYMBOL(ib_drain_rq);

/**
 * ib_drain_qp() - Block until all CQEs have been consumed by the
 *                 application on both the RQ and SQ.
 * @qp: queue pair to drain
 *
 * The caller must:
 *
 * ensure there is room in the CQ(s), SQ, and RQ for drain work requests
 * and completions.
 *
 * allocate the CQs using ib_alloc_cq().
 *
 * ensure that there are no other contexts that are posting WRs concurrently.
 * Otherwise the drain is not guaranteed.
 */
void ib_drain_qp(struct ib_qp *qp)
{
	ib_drain_sq(qp);
	if (!qp->srq)
		ib_drain_rq(qp);
}
EXPORT_SYMBOL(ib_drain_qp);
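
The generic drain helpers rely on a marker technique: post one extra work request whose completion handler signals the drainer, then wait for that signal. Because completions on one queue are reaped in order, seeing the marker complete implies that everything posted before it has completed as well. The sketch below is a single-threaded user-space model of that idea only, loosely resembling the IB_POLL_DIRECT branch where the drainer polls the CQ itself. All names in it (`struct cqe`, `struct queue`, `post()`, `poll_one()`, `drain()`, `struct counted_cqe`) are hypothetical, and it does not model the QP error-state transition, hardware, or concurrency.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 8

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical completion entry: a callback run when the entry is reaped. */
struct cqe {
	void (*done)(struct cqe *cqe);
};

/* Hypothetical in-order queue standing in for a work queue plus its CQ. */
struct queue {
	struct cqe *slots[QUEUE_DEPTH];
	unsigned int head, tail;
};

/* Marker entry; the embedded cqe lets the callback find the done flag. */
struct drain_cqe {
	struct cqe cqe;
	bool done;
};

/* Ordinary entry that counts completions, standing in for application WRs. */
struct counted_cqe {
	struct cqe cqe;
	int *counter;
};

static void drain_done(struct cqe *cqe)
{
	struct drain_cqe *d = container_of(cqe, struct drain_cqe, cqe);

	d->done = true;
}

static void count_done(struct cqe *cqe)
{
	struct counted_cqe *c = container_of(cqe, struct counted_cqe, cqe);

	(*c->counter)++;
}

/* Post an entry; fails when the queue has no room left. */
static int post(struct queue *q, struct cqe *cqe)
{
	if (q->tail - q->head == QUEUE_DEPTH)
		return -1;
	q->slots[q->tail++ % QUEUE_DEPTH] = cqe;
	return 0;
}

/* Reap the oldest entry, if any, and run its callback. */
static int poll_one(struct queue *q)
{
	struct cqe *cqe;

	if (q->head == q->tail)
		return 0;
	cqe = q->slots[q->head++ % QUEUE_DEPTH];
	cqe->done(cqe);
	return 1;
}

/* Post a marker and poll until the marker itself has completed. */
static int drain(struct queue *q)
{
	struct drain_cqe marker = { .cqe = { .done = drain_done }, .done = false };

	if (post(q, &marker.cqe) < 0)
		return -1;	/* no room for the marker: drain cannot start */
	while (!marker.done)
		poll_one(q);
	return 0;
}
```

In this model `drain()` fails when the queue is full, which mirrors the documented requirement that the caller leave room for the drain work request and its completion. Posting further entries while `drain()` runs would place them behind the marker, so they would not be covered, which is why the kernel comments ask that no other context post WRs concurrently.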