/*
 * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved.
 * Copyright (c) 2007, 2008 Mellanox Technologies. All rights reserved.
 *
 * This software is available to you under a choice of one of two
 * licenses.  You may choose to be licensed under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree, or the
 * OpenIB.org BSD license below:
 *
 *     Redistribution and use in source and binary forms, with or
 *     without modification, are permitted provided that the following
 *     conditions are met:
 *
 *      - Redistributions of source code must retain the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer.
 *
 *      - Redistributions in binary form must reproduce the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer in the documentation and/or other materials
 *        provided with the distribution.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

#include <linux/log2.h>
#include <linux/etherdevice.h>
#include <net/ip.h>
#include <linux/slab.h>
#include <linux/netdevice.h>
#include <linux/vmalloc.h>

#include <rdma/ib_cache.h>
#include <rdma/ib_pack.h>
#include <rdma/ib_addr.h>
#include <rdma/ib_mad.h>

#include <linux/mlx4/driver.h>
#include <linux/mlx4/qp.h>

#include "mlx4_ib.h"
#include <rdma/mlx4-abi.h>

static void mlx4_ib_lock_cqs(struct mlx4_ib_cq *send_cq,
                             struct mlx4_ib_cq *recv_cq);
static void mlx4_ib_unlock_cqs(struct mlx4_ib_cq *send_cq,
                               struct mlx4_ib_cq *recv_cq);

enum {
        MLX4_IB_ACK_REQ_FREQ    = 8,
};

enum {
        MLX4_IB_DEFAULT_SCHED_QUEUE     = 0x83,
        MLX4_IB_DEFAULT_QP0_SCHED_QUEUE = 0x3f,
        MLX4_IB_LINK_TYPE_IB            = 0,
        MLX4_IB_LINK_TYPE_ETH           = 1
};

enum {
        /*
         * Largest possible UD header: send with GRH and immediate
         * data plus 18 bytes for an Ethernet header with VLAN/802.1Q
         * tag.  (LRH would only use 8 bytes, so Ethernet is the
         * biggest case)
         */
        MLX4_IB_UD_HEADER_SIZE          = 82,
        MLX4_IB_LSO_HEADER_SPARE        = 128,
};

enum {
        MLX4_IB_IBOE_ETHERTYPE          = 0x8915
};

struct mlx4_ib_sqp {
        struct mlx4_ib_qp       qp;
        int                     pkey_index;
        u32                     qkey;
        u32                     send_psn;
        struct ib_ud_header     ud_header;
        u8                      header_buf[MLX4_IB_UD_HEADER_SIZE];
        struct ib_qp            *roce_v2_gsi;
};

enum {
        MLX4_IB_MIN_SQ_STRIDE   = 6,
        MLX4_IB_CACHE_LINE_SIZE = 64,
};

enum {
        MLX4_RAW_QP_MTU         = 7,
        MLX4_RAW_QP_MSGMAX      = 31,
};

#ifndef ETH_ALEN
#define ETH_ALEN        6
#endif

static const __be32 mlx4_ib_opcode[] = {
        [IB_WR_SEND]                            = cpu_to_be32(MLX4_OPCODE_SEND),
        [IB_WR_LSO]                             = cpu_to_be32(MLX4_OPCODE_LSO),
        [IB_WR_SEND_WITH_IMM]                   = cpu_to_be32(MLX4_OPCODE_SEND_IMM),
        [IB_WR_RDMA_WRITE]                      = cpu_to_be32(MLX4_OPCODE_RDMA_WRITE),
        [IB_WR_RDMA_WRITE_WITH_IMM]             = cpu_to_be32(MLX4_OPCODE_RDMA_WRITE_IMM),
        [IB_WR_RDMA_READ]                       = cpu_to_be32(MLX4_OPCODE_RDMA_READ),
        [IB_WR_ATOMIC_CMP_AND_SWP]              = cpu_to_be32(MLX4_OPCODE_ATOMIC_CS),
        [IB_WR_ATOMIC_FETCH_AND_ADD]            = cpu_to_be32(MLX4_OPCODE_ATOMIC_FA),
        [IB_WR_SEND_WITH_INV]                   = cpu_to_be32(MLX4_OPCODE_SEND_INVAL),
        [IB_WR_LOCAL_INV]                       = cpu_to_be32(MLX4_OPCODE_LOCAL_INVAL),
        [IB_WR_REG_MR]                          = cpu_to_be32(MLX4_OPCODE_FMR),
        [IB_WR_MASKED_ATOMIC_CMP_AND_SWP]       = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_CS),
        [IB_WR_MASKED_ATOMIC_FETCH_AND_ADD]     = cpu_to_be32(MLX4_OPCODE_MASKED_ATOMIC_FA),
};

static struct mlx4_ib_sqp *to_msqp(struct mlx4_ib_qp *mqp)
{
        return container_of(mqp, struct mlx4_ib_sqp, qp);
}

static int is_tunnel_qp(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp)
{
        if (!mlx4_is_master(dev->dev))
                return 0;

        return qp->mqp.qpn >= dev->dev->phys_caps.base_tunnel_sqpn &&
               qp->mqp.qpn < dev->dev->phys_caps.base_tunnel_sqpn +
                8 * MLX4_MFUNC_MAX;
}

static int is_sqp(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp)
{
        int proxy_sqp = 0;
        int real_sqp = 0;
        int i;
        /* PPF or Native -- real SQP */
        real_sqp = ((mlx4_is_master(dev->dev) || !mlx4_is_mfunc(dev->dev)) &&
                    qp->mqp.qpn >= dev->dev->phys_caps.base_sqpn &&
                    qp->mqp.qpn <= dev->dev->phys_caps.base_sqpn + 3);
        if (real_sqp)
                return 1;
        /* VF or PF -- proxy SQP */
        if (mlx4_is_mfunc(dev->dev)) {
                for (i = 0; i < dev->dev->caps.num_ports; i++) {
                        if (qp->mqp.qpn == dev->dev->caps.qp0_proxy[i] ||
                            qp->mqp.qpn == dev->dev->caps.qp1_proxy[i]) {
                                proxy_sqp = 1;
                                break;
                        }
                }
        }
        if (proxy_sqp)
                return 1;

        return !!(qp->flags & MLX4_IB_ROCE_V2_GSI_QP);
}

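/*
 * Illustrative note on the range check above (layout per the usual PPF
 * special-QP ordering: qp0 port 1, qp0 port 2, qp1 port 1, qp1 port 2):
 * base_sqpn + 0/1 are QP0 of ports 1/2 and base_sqpn + 2/3 are QP1 of
 * ports 1/2, which is why a "real" special QP falls in
 * base_sqpn .. base_sqpn + 3.
 */
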
/* used for INIT/CLOSE port logic */
static int is_qp0(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp)
{
        int proxy_qp0 = 0;
        int real_qp0 = 0;
        int i;
        /* PPF or Native -- real QP0 */
        real_qp0 = ((mlx4_is_master(dev->dev) || !mlx4_is_mfunc(dev->dev)) &&
                    qp->mqp.qpn >= dev->dev->phys_caps.base_sqpn &&
                    qp->mqp.qpn <= dev->dev->phys_caps.base_sqpn + 1);
        if (real_qp0)
                return 1;
        /* VF or PF -- proxy QP0 */
        if (mlx4_is_mfunc(dev->dev)) {
                for (i = 0; i < dev->dev->caps.num_ports; i++) {
                        if (qp->mqp.qpn == dev->dev->caps.qp0_proxy[i]) {
                                proxy_qp0 = 1;
                                break;
                        }
                }
        }
        return proxy_qp0;
}

static void *get_wqe(struct mlx4_ib_qp *qp, int offset)
{
        return mlx4_buf_offset(&qp->buf, offset);
}

static void *get_recv_wqe(struct mlx4_ib_qp *qp, int n)
{
        return get_wqe(qp, qp->rq.offset + (n << qp->rq.wqe_shift));
}

static void *get_send_wqe(struct mlx4_ib_qp *qp, int n)
{
        return get_wqe(qp, qp->sq.offset + (n << qp->sq.wqe_shift));
}

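/*
 * For example (illustrative numbers only): with sq.offset = 0 and
 * sq.wqe_shift = 6, send WQE n starts n * 64 bytes into the QP buffer,
 * so get_send_wqe(qp, 5) returns the address 320 bytes past the start
 * of the buffer.
 */
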
/*
 * Stamp a SQ WQE so that it is invalid if prefetched by marking the
 * first four bytes of every 64 byte chunk with
 *     0x7FFFFFFF | (invalid_ownership_value << 31).
 *
 * When the max work request size is less than or equal to the WQE
 * basic block size, as an optimization, we can stamp all WQEs with
 * 0xffffffff, and skip the very first chunk of each WQE.
 */
static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n, int size)
{
        __be32 *wqe;
        int i;
        int s;
        int ind;
        void *buf;
        __be32 stamp;
        struct mlx4_wqe_ctrl_seg *ctrl;

        if (qp->sq_max_wqes_per_wr > 1) {
                s = roundup(size, 1U << qp->sq.wqe_shift);
                for (i = 0; i < s; i += 64) {
                        ind = (i >> qp->sq.wqe_shift) + n;
                        stamp = ind & qp->sq.wqe_cnt ? cpu_to_be32(0x7fffffff) :
                                                       cpu_to_be32(0xffffffff);
                        buf = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1));
                        wqe = buf + (i & ((1 << qp->sq.wqe_shift) - 1));
                        *wqe = stamp;
                }
        } else {
                ctrl = buf = get_send_wqe(qp, n & (qp->sq.wqe_cnt - 1));
                s = (ctrl->qpn_vlan.fence_size & 0x3f) << 4;
                for (i = 64; i < s; i += 64) {
                        wqe = buf + i;
                        *wqe = cpu_to_be32(0xffffffff);
                }
        }
}

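/*
 * Worked example (illustrative, assuming wqe_shift = 6 and a 192-byte
 * descriptor): stamp_send_wqe() walks the WR in 64-byte basic blocks and
 * overwrites the first dword of each block, so offsets 0, 64 and 128 are
 * all stamped.  The stamp is 0x7fffffff or 0xffffffff depending on which
 * pass over the ring (ind & wqe_cnt) the block belongs to, so a stale
 * descriptor never carries a valid hardware ownership bit if prefetched.
 */
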
static void post_nop_wqe(struct mlx4_ib_qp *qp, int n, int size)
{
        struct mlx4_wqe_ctrl_seg *ctrl;
        struct mlx4_wqe_inline_seg *inl;
        void *wqe;
        int s;

        ctrl = wqe = get_send_wqe(qp, n & (qp->sq.wqe_cnt - 1));
        s = sizeof(struct mlx4_wqe_ctrl_seg);

        if (qp->ibqp.qp_type == IB_QPT_UD) {
                struct mlx4_wqe_datagram_seg *dgram = wqe + sizeof *ctrl;
                struct mlx4_av *av = (struct mlx4_av *)dgram->av;
                memset(dgram, 0, sizeof *dgram);
                av->port_pd = cpu_to_be32((qp->port << 24) | to_mpd(qp->ibqp.pd)->pdn);
                s += sizeof(struct mlx4_wqe_datagram_seg);
        }

        /* Pad the remainder of the WQE with an inline data segment. */
        if (size > s) {
                inl = wqe + s;
                inl->byte_count = cpu_to_be32(1 << 31 | (size - s - sizeof *inl));
        }
        ctrl->srcrb_flags = 0;
        ctrl->qpn_vlan.fence_size = size / 16;
        /*
         * Make sure descriptor is fully written before setting ownership bit
         * (because HW can start executing as soon as we do).
         */
        wmb();

        ctrl->owner_opcode = cpu_to_be32(MLX4_OPCODE_NOP | MLX4_WQE_CTRL_NEC) |
                (n & qp->sq.wqe_cnt ? cpu_to_be32(1 << 31) : 0);

        stamp_send_wqe(qp, n + qp->sq_spare_wqes, size);
}

/* Post NOP WQE to prevent wrap-around in the middle of WR */
static inline unsigned pad_wraparound(struct mlx4_ib_qp *qp, int ind)
{
        unsigned s = qp->sq.wqe_cnt - (ind & (qp->sq.wqe_cnt - 1));
        if (unlikely(s < qp->sq_max_wqes_per_wr)) {
                post_nop_wqe(qp, ind, s << qp->sq.wqe_shift);
                ind += s;
        }
        return ind;
}

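/*
 * Example (illustrative numbers): with wqe_cnt = 256 and
 * sq_max_wqes_per_wr = 4, a producer index of 254 leaves only s = 2 basic
 * blocks before the end of the ring, so pad_wraparound() posts a 2-block
 * NOP and returns 256; the next real WR then starts at the beginning of
 * the ring instead of straddling its end.
 */
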
static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type)
{
        struct ib_event event;
        struct ib_qp *ibqp = &to_mibqp(qp)->ibqp;

        if (type == MLX4_EVENT_TYPE_PATH_MIG)
                to_mibqp(qp)->port = to_mibqp(qp)->alt_port;

        if (ibqp->event_handler) {
                event.device     = ibqp->device;
                event.element.qp = ibqp;
                switch (type) {
                case MLX4_EVENT_TYPE_PATH_MIG:
                        event.event = IB_EVENT_PATH_MIG;
                        break;
                case MLX4_EVENT_TYPE_COMM_EST:
                        event.event = IB_EVENT_COMM_EST;
                        break;
                case MLX4_EVENT_TYPE_SQ_DRAINED:
                        event.event = IB_EVENT_SQ_DRAINED;
                        break;
                case MLX4_EVENT_TYPE_SRQ_QP_LAST_WQE:
                        event.event = IB_EVENT_QP_LAST_WQE_REACHED;
                        break;
                case MLX4_EVENT_TYPE_WQ_CATAS_ERROR:
                        event.event = IB_EVENT_QP_FATAL;
                        break;
                case MLX4_EVENT_TYPE_PATH_MIG_FAILED:
                        event.event = IB_EVENT_PATH_MIG_ERR;
                        break;
                case MLX4_EVENT_TYPE_WQ_INVAL_REQ_ERROR:
                        event.event = IB_EVENT_QP_REQ_ERR;
                        break;
                case MLX4_EVENT_TYPE_WQ_ACCESS_ERROR:
                        event.event = IB_EVENT_QP_ACCESS_ERR;
                        break;
                default:
                        pr_warn("Unexpected event type %d on QP %06x\n",
                                type, qp->qpn);
                        return;
                }

                ibqp->event_handler(&event, ibqp->qp_context);
        }
}

static int send_wqe_overhead(enum mlx4_ib_qp_type type, u32 flags)
{
        /*
         * UD WQEs must have a datagram segment.
         * RC and UC WQEs might have a remote address segment.
         * MLX WQEs need two extra inline data segments (for the UD
         * header and space for the ICRC).
         */
        switch (type) {
        case MLX4_IB_QPT_UD:
                return sizeof (struct mlx4_wqe_ctrl_seg) +
                        sizeof (struct mlx4_wqe_datagram_seg) +
                        ((flags & MLX4_IB_QP_LSO) ? MLX4_IB_LSO_HEADER_SPARE : 0);
        case MLX4_IB_QPT_PROXY_SMI_OWNER:
        case MLX4_IB_QPT_PROXY_SMI:
        case MLX4_IB_QPT_PROXY_GSI:
                return sizeof (struct mlx4_wqe_ctrl_seg) +
                        sizeof (struct mlx4_wqe_datagram_seg) + 64;
        case MLX4_IB_QPT_TUN_SMI_OWNER:
        case MLX4_IB_QPT_TUN_GSI:
                return sizeof (struct mlx4_wqe_ctrl_seg) +
                        sizeof (struct mlx4_wqe_datagram_seg);

        case MLX4_IB_QPT_UC:
                return sizeof (struct mlx4_wqe_ctrl_seg) +
                        sizeof (struct mlx4_wqe_raddr_seg);
        case MLX4_IB_QPT_RC:
                return sizeof (struct mlx4_wqe_ctrl_seg) +
                        sizeof (struct mlx4_wqe_masked_atomic_seg) +
                        sizeof (struct mlx4_wqe_raddr_seg);
        case MLX4_IB_QPT_SMI:
        case MLX4_IB_QPT_GSI:
                return sizeof (struct mlx4_wqe_ctrl_seg) +
                        ALIGN(MLX4_IB_UD_HEADER_SIZE +
                              DIV_ROUND_UP(MLX4_IB_UD_HEADER_SIZE,
                                           MLX4_INLINE_ALIGN) *
                              sizeof (struct mlx4_wqe_inline_seg),
                              sizeof (struct mlx4_wqe_data_seg)) +
                        ALIGN(4 +
                              sizeof (struct mlx4_wqe_inline_seg),
                              sizeof (struct mlx4_wqe_data_seg));
        default:
                return sizeof (struct mlx4_wqe_ctrl_seg);
        }
}

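/*
 * Rough worked example (segment sizes are assumptions, not taken from this
 * file): with a 16-byte control segment and a 48-byte datagram segment, a
 * UD QP created with the LSO flag reserves
 * 16 + 48 + MLX4_IB_LSO_HEADER_SPARE (128) = 192 bytes of per-WQE overhead
 * before any data segments are counted.
 */
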
static int set_rq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
                       int is_user, int has_rq, struct mlx4_ib_qp *qp)
{
        /* Sanity check RQ size before proceeding */
        if (cap->max_recv_wr > dev->dev->caps.max_wqes - MLX4_IB_SQ_MAX_SPARE ||
            cap->max_recv_sge > min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg))
                return -EINVAL;

        if (!has_rq) {
                if (cap->max_recv_wr)
                        return -EINVAL;

                qp->rq.wqe_cnt = qp->rq.max_gs = 0;
        } else {
                /* HW requires >= 1 RQ entry with >= 1 gather entry */
                if (is_user && (!cap->max_recv_wr || !cap->max_recv_sge))
                        return -EINVAL;

                qp->rq.wqe_cnt   = roundup_pow_of_two(max(1U, cap->max_recv_wr));
                qp->rq.max_gs    = roundup_pow_of_two(max(1U, cap->max_recv_sge));
                qp->rq.wqe_shift = ilog2(qp->rq.max_gs * sizeof (struct mlx4_wqe_data_seg));
        }

        /* leave userspace return values as they were, so as not to break ABI */
        if (is_user) {
                cap->max_recv_wr  = qp->rq.max_post = qp->rq.wqe_cnt;
                cap->max_recv_sge = qp->rq.max_gs;
        } else {
                cap->max_recv_wr  = qp->rq.max_post =
                        min(dev->dev->caps.max_wqes - MLX4_IB_SQ_MAX_SPARE, qp->rq.wqe_cnt);
                cap->max_recv_sge = min(qp->rq.max_gs,
                                        min(dev->dev->caps.max_sq_sg,
                                            dev->dev->caps.max_rq_sg));
        }

        return 0;
}

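/*
 * Example of the RQ sizing above (illustrative request, 16-byte data
 * segments assumed): asking for max_recv_wr = 100 and max_recv_sge = 3
 * yields wqe_cnt = 128 and max_gs = 4, so wqe_shift = ilog2(4 * 16) = 6,
 * i.e. 64-byte receive WQEs.
 */
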
static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap,
                              enum mlx4_ib_qp_type type, struct mlx4_ib_qp *qp,
                              bool shrink_wqe)
{
        int s;

        /* Sanity check SQ size before proceeding */
        if (cap->max_send_wr  > (dev->dev->caps.max_wqes - MLX4_IB_SQ_MAX_SPARE) ||
            cap->max_send_sge > min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg) ||
            cap->max_inline_data + send_wqe_overhead(type, qp->flags) +
            sizeof (struct mlx4_wqe_inline_seg) > dev->dev->caps.max_sq_desc_sz)
                return -EINVAL;

        /*
         * For MLX transport we need 2 extra S/G entries:
         * one for the header and one for the checksum at the end
         */
        if ((type == MLX4_IB_QPT_SMI || type == MLX4_IB_QPT_GSI ||
             type & (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER)) &&
            cap->max_send_sge + 2 > dev->dev->caps.max_sq_sg)
                return -EINVAL;

        s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg),
                cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) +
                send_wqe_overhead(type, qp->flags);

        if (s > dev->dev->caps.max_sq_desc_sz)
                return -EINVAL;

        /*
         * Hermon supports shrinking WQEs, such that a single work
         * request can include multiple units of 1 << wqe_shift.  This
         * way, work requests can differ in size, and do not have to
         * be a power of 2 in size, saving memory and speeding up send
         * WR posting.  Unfortunately, if we do this then the
         * wqe_index field in CQEs can't be used to look up the WR ID
         * anymore, so we do this only if selective signaling is off.
         *
         * Further, on 32-bit platforms, we can't use vmap() to make
         * the QP buffer virtually contiguous.  Thus we have to use
         * constant-sized WRs to make sure a WR is always fully within
         * a single page-sized chunk.
         *
         * Finally, we use NOP work requests to pad the end of the
         * work queue, to avoid wrap-around in the middle of WR.  We
         * set NEC bit to avoid getting completions with error for
         * these NOP WRs, but since NEC is only supported starting
         * with firmware 2.2.232, we use constant-sized WRs for older
         * firmware.
         *
         * And, since MLX QPs only support SEND, we use constant-sized
         * WRs in this case.
         *
         * We look for the smallest value of wqe_shift such that the
         * resulting number of wqes does not exceed device
         * capabilities.
         *
         * We set WQE size to at least 64 bytes, this way stamping
         * invalidates each WQE.
         */
        if (shrink_wqe && dev->dev->caps.fw_ver >= MLX4_FW_VER_WQE_CTRL_NEC &&
            qp->sq_signal_bits && BITS_PER_LONG == 64 &&
            type != MLX4_IB_QPT_SMI && type != MLX4_IB_QPT_GSI &&
            !(type & (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_PROXY_SMI |
                      MLX4_IB_QPT_PROXY_GSI | MLX4_IB_QPT_TUN_SMI_OWNER)))
                qp->sq.wqe_shift = ilog2(64);
        else
                qp->sq.wqe_shift = ilog2(roundup_pow_of_two(s));

        for (;;) {
                qp->sq_max_wqes_per_wr = DIV_ROUND_UP(s, 1U << qp->sq.wqe_shift);

                /*
                 * We need to leave 2 KB + 1 WR of headroom in the SQ to
                 * allow HW to prefetch.
                 */
                qp->sq_spare_wqes = (2048 >> qp->sq.wqe_shift) + qp->sq_max_wqes_per_wr;
                qp->sq.wqe_cnt = roundup_pow_of_two(cap->max_send_wr *
                                                    qp->sq_max_wqes_per_wr +
                                                    qp->sq_spare_wqes);

                if (qp->sq.wqe_cnt <= dev->dev->caps.max_wqes)
                        break;

                if (qp->sq_max_wqes_per_wr <= 1)
                        return -EINVAL;

                ++qp->sq.wqe_shift;
        }

        qp->sq.max_gs = (min(dev->dev->caps.max_sq_desc_sz,
                             (qp->sq_max_wqes_per_wr << qp->sq.wqe_shift)) -
                         send_wqe_overhead(type, qp->flags)) /
                sizeof (struct mlx4_wqe_data_seg);

        qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) +
                (qp->sq.wqe_cnt << qp->sq.wqe_shift);
        if (qp->rq.wqe_shift > qp->sq.wqe_shift) {
                qp->rq.offset = 0;
                qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift;
        } else {
                qp->rq.offset = qp->sq.wqe_cnt << qp->sq.wqe_shift;
                qp->sq.offset = 0;
        }

        cap->max_send_wr  = qp->sq.max_post =
                (qp->sq.wqe_cnt - qp->sq_spare_wqes) / qp->sq_max_wqes_per_wr;
        cap->max_send_sge = min(qp->sq.max_gs,
                                min(dev->dev->caps.max_sq_sg,
                                    dev->dev->caps.max_rq_sg));
        /* We don't support inline sends for kernel QPs (yet) */
        cap->max_inline_data = 0;

        return 0;
}

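/*
 * Worked example of the sizing loop above (all numbers illustrative):
 * suppose the largest WR needs s = 200 bytes and shrinking is enabled, so
 * wqe_shift starts at 6 (64-byte basic blocks).  Then sq_max_wqes_per_wr =
 * DIV_ROUND_UP(200, 64) = 4 and sq_spare_wqes = (2048 >> 6) + 4 = 36; for
 * cap->max_send_wr = 100 this gives wqe_cnt =
 * roundup_pow_of_two(100 * 4 + 36) = 512 basic blocks.  If that exceeded
 * caps.max_wqes, the loop would retry with wqe_shift = 7, roughly halving
 * the number of blocks needed per WR.
 */
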
static int set_user_sq_size(struct mlx4_ib_dev *dev,
                            struct mlx4_ib_qp *qp,
                            struct mlx4_ib_create_qp *ucmd)
{
        /* Sanity check SQ size before proceeding */
        if ((1 << ucmd->log_sq_bb_count) > dev->dev->caps.max_wqes         ||
            ucmd->log_sq_stride >
                ilog2(roundup_pow_of_two(dev->dev->caps.max_sq_desc_sz)) ||
            ucmd->log_sq_stride < MLX4_IB_MIN_SQ_STRIDE)
                return -EINVAL;

        qp->sq.wqe_cnt   = 1 << ucmd->log_sq_bb_count;
        qp->sq.wqe_shift = ucmd->log_sq_stride;

        qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) +
                (qp->sq.wqe_cnt << qp->sq.wqe_shift);

        return 0;
}

static int alloc_proxy_bufs(struct ib_device *dev, struct mlx4_ib_qp *qp)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
qp->sqp_proxy_rcv =
|
|
|
|
kmalloc(sizeof (struct mlx4_ib_buf) * qp->rq.wqe_cnt,
|
|
|
|
GFP_KERNEL);
|
|
|
|
if (!qp->sqp_proxy_rcv)
|
|
|
|
return -ENOMEM;
|
|
|
|
for (i = 0; i < qp->rq.wqe_cnt; i++) {
|
|
|
|
qp->sqp_proxy_rcv[i].addr =
|
|
|
|
kmalloc(sizeof (struct mlx4_ib_proxy_sqp_hdr),
|
|
|
|
GFP_KERNEL);
|
|
|
|
if (!qp->sqp_proxy_rcv[i].addr)
|
|
|
|
goto err;
|
|
|
|
qp->sqp_proxy_rcv[i].map =
|
|
|
|
ib_dma_map_single(dev, qp->sqp_proxy_rcv[i].addr,
|
|
|
|
sizeof (struct mlx4_ib_proxy_sqp_hdr),
|
|
|
|
DMA_FROM_DEVICE);
|
2015-03-17 01:49:59 +08:00
|
|
|
if (ib_dma_mapping_error(dev, qp->sqp_proxy_rcv[i].map)) {
|
|
|
|
kfree(qp->sqp_proxy_rcv[i].addr);
|
|
|
|
goto err;
|
|
|
|
}
|
2012-08-03 16:40:40 +08:00
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
err:
|
|
|
|
while (i > 0) {
|
|
|
|
--i;
|
|
|
|
ib_dma_unmap_single(dev, qp->sqp_proxy_rcv[i].map,
|
|
|
|
sizeof (struct mlx4_ib_proxy_sqp_hdr),
|
|
|
|
DMA_FROM_DEVICE);
|
|
|
|
kfree(qp->sqp_proxy_rcv[i].addr);
|
|
|
|
}
|
|
|
|
kfree(qp->sqp_proxy_rcv);
|
|
|
|
qp->sqp_proxy_rcv = NULL;
|
|
|
|
return -ENOMEM;
|
|
|
|
}

static void free_proxy_bufs(struct ib_device *dev, struct mlx4_ib_qp *qp)
{
	int i;

	for (i = 0; i < qp->rq.wqe_cnt; i++) {
		ib_dma_unmap_single(dev, qp->sqp_proxy_rcv[i].map,
				    sizeof (struct mlx4_ib_proxy_sqp_hdr),
				    DMA_FROM_DEVICE);
		kfree(qp->sqp_proxy_rcv[i].addr);
	}
	kfree(qp->sqp_proxy_rcv);
}
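/*
 * alloc_proxy_bufs() and free_proxy_bufs() are strict pairs: one DMA-mapped
 * mlx4_ib_proxy_sqp_hdr per receive WQE, mapped DMA_FROM_DEVICE, presumably
 * so the header of tunneled special-QP traffic can be scattered into it on
 * receive.  A partial failure in the allocator unwinds its own mappings
 * before returning -ENOMEM.
 */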

static int qp_has_rq(struct ib_qp_init_attr *attr)
{
	if (attr->qp_type == IB_QPT_XRC_INI || attr->qp_type == IB_QPT_XRC_TGT)
		return 0;

	return !attr->srq;
}

static int qp0_enabled_vf(struct mlx4_dev *dev, int qpn)
{
	int i;
	for (i = 0; i < dev->caps.num_ports; i++) {
		if (qpn == dev->caps.qp0_proxy[i])
			return !!dev->caps.qp0_qkey[i];
	}
	return 0;
}

static void mlx4_ib_free_qp_counter(struct mlx4_ib_dev *dev,
				    struct mlx4_ib_qp *qp)
{
	mutex_lock(&dev->counters_table[qp->port - 1].mutex);
	mlx4_counter_free(dev->dev, qp->counter_index->index);
	list_del(&qp->counter_index->list);
	mutex_unlock(&dev->counters_table[qp->port - 1].mutex);

	kfree(qp->counter_index);
	qp->counter_index = NULL;
}
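/*
 * mlx4_ib_free_qp_counter() assumes qp->counter_index is non-NULL; callers
 * (see _mlx4_ib_destroy_qp() below) check it before calling, and the counter
 * list manipulation is serialized by the per-port counters_table mutex.
 */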

static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
			    struct ib_qp_init_attr *init_attr,
			    struct ib_udata *udata, int sqpn, struct mlx4_ib_qp **caller_qp,
			    gfp_t gfp)
{
	int qpn;
	int err;
	struct ib_qp_cap backup_cap;
	struct mlx4_ib_sqp *sqp;
	struct mlx4_ib_qp *qp;
	enum mlx4_ib_qp_type qp_type = (enum mlx4_ib_qp_type) init_attr->qp_type;
	struct mlx4_ib_cq *mcq;
	unsigned long flags;

	/* When tunneling special qps, we use a plain UD qp */
	if (sqpn) {
		if (mlx4_is_mfunc(dev->dev) &&
		    (!mlx4_is_master(dev->dev) ||
		     !(init_attr->create_flags & MLX4_IB_SRIOV_SQP))) {
			if (init_attr->qp_type == IB_QPT_GSI)
				qp_type = MLX4_IB_QPT_PROXY_GSI;
			else {
				if (mlx4_is_master(dev->dev) ||
				    qp0_enabled_vf(dev->dev, sqpn))
					qp_type = MLX4_IB_QPT_PROXY_SMI_OWNER;
				else
					qp_type = MLX4_IB_QPT_PROXY_SMI;
			}
		}
		qpn = sqpn;
		/* add extra sg entry for tunneling */
		init_attr->cap.max_recv_sge++;
	} else if (init_attr->create_flags & MLX4_IB_SRIOV_TUNNEL_QP) {
		struct mlx4_ib_qp_tunnel_init_attr *tnl_init =
			container_of(init_attr,
				     struct mlx4_ib_qp_tunnel_init_attr, init_attr);
		if ((tnl_init->proxy_qp_type != IB_QPT_SMI &&
		     tnl_init->proxy_qp_type != IB_QPT_GSI) ||
		    !mlx4_is_master(dev->dev))
			return -EINVAL;
		if (tnl_init->proxy_qp_type == IB_QPT_GSI)
			qp_type = MLX4_IB_QPT_TUN_GSI;
		else if (tnl_init->slave == mlx4_master_func_num(dev->dev) ||
			 mlx4_vf_smi_enabled(dev->dev, tnl_init->slave,
					     tnl_init->port))
			qp_type = MLX4_IB_QPT_TUN_SMI_OWNER;
		else
			qp_type = MLX4_IB_QPT_TUN_SMI;
		/* we are definitely in the PPF here, since we are creating
		 * tunnel QPs. base_tunnel_sqpn is therefore valid. */
		qpn = dev->dev->phys_caps.base_tunnel_sqpn + 8 * tnl_init->slave
			+ tnl_init->proxy_qp_type * 2 + tnl_init->port - 1;
		sqpn = qpn;
	}

	if (!*caller_qp) {
		if (qp_type == MLX4_IB_QPT_SMI || qp_type == MLX4_IB_QPT_GSI ||
		    (qp_type & (MLX4_IB_QPT_PROXY_SMI | MLX4_IB_QPT_PROXY_SMI_OWNER |
				MLX4_IB_QPT_PROXY_GSI | MLX4_IB_QPT_TUN_SMI_OWNER))) {
			sqp = kzalloc(sizeof (struct mlx4_ib_sqp), gfp);
			if (!sqp)
				return -ENOMEM;
			qp = &sqp->qp;
			qp->pri.vid = 0xFFFF;
			qp->alt.vid = 0xFFFF;
		} else {
			qp = kzalloc(sizeof (struct mlx4_ib_qp), gfp);
			if (!qp)
				return -ENOMEM;
			qp->pri.vid = 0xFFFF;
			qp->alt.vid = 0xFFFF;
		}
	} else
		qp = *caller_qp;

	qp->mlx4_ib_qp_type = qp_type;

	mutex_init(&qp->mutex);
	spin_lock_init(&qp->sq.lock);
	spin_lock_init(&qp->rq.lock);
	INIT_LIST_HEAD(&qp->gid_list);
	INIT_LIST_HEAD(&qp->steering_rules);

	qp->state = IB_QPS_RESET;
	if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR)
		qp->sq_signal_bits = cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE);

	err = set_rq_size(dev, &init_attr->cap, !!pd->uobject, qp_has_rq(init_attr), qp);
	if (err)
		goto err;

	if (pd->uobject) {
		struct mlx4_ib_create_qp ucmd;

		if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) {
			err = -EFAULT;
			goto err;
		}

		qp->sq_no_prefetch = ucmd.sq_no_prefetch;

		err = set_user_sq_size(dev, qp, &ucmd);
		if (err)
			goto err;

		qp->umem = ib_umem_get(pd->uobject->context, ucmd.buf_addr,
				       qp->buf_size, 0, 0);
		if (IS_ERR(qp->umem)) {
			err = PTR_ERR(qp->umem);
			goto err;
		}

		err = mlx4_mtt_init(dev->dev, ib_umem_page_count(qp->umem),
				    ilog2(qp->umem->page_size), &qp->mtt);
		if (err)
			goto err_buf;

		err = mlx4_ib_umem_write_mtt(dev, &qp->mtt, qp->umem);
		if (err)
			goto err_mtt;

		if (qp_has_rq(init_attr)) {
			err = mlx4_ib_db_map_user(to_mucontext(pd->uobject->context),
						  ucmd.db_addr, &qp->db);
			if (err)
				goto err_mtt;
		}
	} else {
		qp->sq_no_prefetch = 0;

		if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO)
			qp->flags |= MLX4_IB_QP_LSO;

		if (init_attr->create_flags & IB_QP_CREATE_NETIF_QP) {
			if (dev->steering_support ==
			    MLX4_STEERING_MODE_DEVICE_MANAGED)
				qp->flags |= MLX4_IB_QP_NETIF;
			else
				goto err;
		}

		memcpy(&backup_cap, &init_attr->cap, sizeof(backup_cap));
		err = set_kernel_sq_size(dev, &init_attr->cap,
					 qp_type, qp, true);
		if (err)
			goto err;

		if (qp_has_rq(init_attr)) {
			err = mlx4_db_alloc(dev->dev, &qp->db, 0, gfp);
			if (err)
				goto err;

			*qp->db.db = 0;
		}

		if (mlx4_buf_alloc(dev->dev, qp->buf_size, qp->buf_size,
				   &qp->buf, gfp)) {
			memcpy(&init_attr->cap, &backup_cap,
			       sizeof(backup_cap));
			err = set_kernel_sq_size(dev, &init_attr->cap, qp_type,
						 qp, false);
			if (err)
				goto err_db;

			if (mlx4_buf_alloc(dev->dev, qp->buf_size,
					   PAGE_SIZE * 2, &qp->buf, gfp)) {
				err = -ENOMEM;
				goto err_db;
			}
		}

		err = mlx4_mtt_init(dev->dev, qp->buf.npages, qp->buf.page_shift,
				    &qp->mtt);
		if (err)
			goto err_buf;

		err = mlx4_buf_write_mtt(dev->dev, &qp->mtt, &qp->buf, gfp);
		if (err)
			goto err_mtt;

		qp->sq.wrid = kmalloc_array(qp->sq.wqe_cnt, sizeof(u64),
					    gfp | __GFP_NOWARN);
		if (!qp->sq.wrid)
			qp->sq.wrid = __vmalloc(qp->sq.wqe_cnt * sizeof(u64),
						gfp, PAGE_KERNEL);
		qp->rq.wrid = kmalloc_array(qp->rq.wqe_cnt, sizeof(u64),
					    gfp | __GFP_NOWARN);
		if (!qp->rq.wrid)
			qp->rq.wrid = __vmalloc(qp->rq.wqe_cnt * sizeof(u64),
						gfp, PAGE_KERNEL);
		if (!qp->sq.wrid || !qp->rq.wrid) {
			err = -ENOMEM;
			goto err_wrid;
		}
	}

	if (sqpn) {
		if (qp->mlx4_ib_qp_type & (MLX4_IB_QPT_PROXY_SMI_OWNER |
		    MLX4_IB_QPT_PROXY_SMI | MLX4_IB_QPT_PROXY_GSI)) {
			if (alloc_proxy_bufs(pd->device, qp)) {
				err = -ENOMEM;
				goto err_wrid;
			}
		}
	} else {
		/* Raw packet QPNs may not have bits 6,7 set in their qp_num;
		 * otherwise, the WQE BlueFlame setup flow wrongly causes
		 * VLAN insertion. */
		if (init_attr->qp_type == IB_QPT_RAW_PACKET)
			err = mlx4_qp_reserve_range(dev->dev, 1, 1, &qpn,
						    (init_attr->cap.max_send_wr ?
						     MLX4_RESERVE_ETH_BF_QP : 0) |
						    (init_attr->cap.max_recv_wr ?
						     MLX4_RESERVE_A0_QP : 0));
		else
			if (qp->flags & MLX4_IB_QP_NETIF)
				err = mlx4_ib_steer_qp_alloc(dev, 1, &qpn);
			else
				err = mlx4_qp_reserve_range(dev->dev, 1, 1,
							    &qpn, 0);
		if (err)
			goto err_proxy;
	}

	if (init_attr->create_flags & IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK)
		qp->flags |= MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK;

	err = mlx4_qp_alloc(dev->dev, qpn, &qp->mqp, gfp);
	if (err)
		goto err_qpn;

	if (init_attr->qp_type == IB_QPT_XRC_TGT)
		qp->mqp.qpn |= (1 << 23);

	/*
	 * Hardware wants QPN written in big-endian order (after
	 * shifting) for send doorbell.  Precompute this value to save
	 * a little bit when posting sends.
	 */
	qp->doorbell_qpn = swab32(qp->mqp.qpn << 8);

	qp->mqp.event = mlx4_ib_qp_event;
	if (!*caller_qp)
		*caller_qp = qp;

	spin_lock_irqsave(&dev->reset_flow_resource_lock, flags);
	mlx4_ib_lock_cqs(to_mcq(init_attr->send_cq),
			 to_mcq(init_attr->recv_cq));
	/* Maintain device to QPs access, needed for further handling
	 * via reset flow
	 */
	list_add_tail(&qp->qps_list, &dev->qp_list);
	/* Maintain CQ to QPs access, needed for further handling
	 * via reset flow
	 */
	mcq = to_mcq(init_attr->send_cq);
	list_add_tail(&qp->cq_send_list, &mcq->send_qp_list);
	mcq = to_mcq(init_attr->recv_cq);
	list_add_tail(&qp->cq_recv_list, &mcq->recv_qp_list);
	mlx4_ib_unlock_cqs(to_mcq(init_attr->send_cq),
			   to_mcq(init_attr->recv_cq));
	spin_unlock_irqrestore(&dev->reset_flow_resource_lock, flags);

	return 0;

err_qpn:
	if (!sqpn) {
		if (qp->flags & MLX4_IB_QP_NETIF)
			mlx4_ib_steer_qp_free(dev, qpn, 1);
		else
			mlx4_qp_release_range(dev->dev, qpn, 1);
	}
err_proxy:
	if (qp->mlx4_ib_qp_type == MLX4_IB_QPT_PROXY_GSI)
		free_proxy_bufs(pd->device, qp);
err_wrid:
	if (pd->uobject) {
		if (qp_has_rq(init_attr))
			mlx4_ib_db_unmap_user(to_mucontext(pd->uobject->context), &qp->db);
	} else {
		kvfree(qp->sq.wrid);
		kvfree(qp->rq.wrid);
	}

err_mtt:
	mlx4_mtt_cleanup(dev->dev, &qp->mtt);

err_buf:
	if (pd->uobject)
		ib_umem_release(qp->umem);
	else
		mlx4_buf_free(dev->dev, qp->buf_size, &qp->buf);

err_db:
	if (!pd->uobject && qp_has_rq(init_attr))
		mlx4_db_free(dev->dev, &qp->db);

err:
	if (!*caller_qp)
		kfree(qp);
	return err;
}
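/*
 * Error unwinding in create_qp_common() above runs in reverse order of
 * setup: err_qpn releases the reserved QP number (or steering QP), err_proxy
 * frees proxy buffers for proxy GSI QPs, err_wrid releases the wrid arrays
 * or the user doorbell mapping, err_mtt and err_buf tear down the MTT and
 * the queue buffer (or umem), err_db frees the kernel doorbell record, and
 * err finally frees the QP struct if it was allocated here.
 */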

static enum mlx4_qp_state to_mlx4_state(enum ib_qp_state state)
{
	switch (state) {
	case IB_QPS_RESET:	return MLX4_QP_STATE_RST;
	case IB_QPS_INIT:	return MLX4_QP_STATE_INIT;
	case IB_QPS_RTR:	return MLX4_QP_STATE_RTR;
	case IB_QPS_RTS:	return MLX4_QP_STATE_RTS;
	case IB_QPS_SQD:	return MLX4_QP_STATE_SQD;
	case IB_QPS_SQE:	return MLX4_QP_STATE_SQER;
	case IB_QPS_ERR:	return MLX4_QP_STATE_ERR;
	default:		return -1;
	}
}

static void mlx4_ib_lock_cqs(struct mlx4_ib_cq *send_cq, struct mlx4_ib_cq *recv_cq)
	__acquires(&send_cq->lock) __acquires(&recv_cq->lock)
{
	if (send_cq == recv_cq) {
		spin_lock(&send_cq->lock);
		__acquire(&recv_cq->lock);
	} else if (send_cq->mcq.cqn < recv_cq->mcq.cqn) {
		spin_lock(&send_cq->lock);
		spin_lock_nested(&recv_cq->lock, SINGLE_DEPTH_NESTING);
	} else {
		spin_lock(&recv_cq->lock);
		spin_lock_nested(&send_cq->lock, SINGLE_DEPTH_NESTING);
	}
}

static void mlx4_ib_unlock_cqs(struct mlx4_ib_cq *send_cq, struct mlx4_ib_cq *recv_cq)
	__releases(&send_cq->lock) __releases(&recv_cq->lock)
{
	if (send_cq == recv_cq) {
		__release(&recv_cq->lock);
		spin_unlock(&send_cq->lock);
	} else if (send_cq->mcq.cqn < recv_cq->mcq.cqn) {
		spin_unlock(&recv_cq->lock);
		spin_unlock(&send_cq->lock);
	} else {
		spin_unlock(&send_cq->lock);
		spin_unlock(&recv_cq->lock);
	}
}
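/*
 * mlx4_ib_lock_cqs()/mlx4_ib_unlock_cqs() always take the two CQ locks in a
 * fixed order (lower CQN first) so that concurrent callers locking the same
 * pair of CQs cannot deadlock; the __acquire/__release annotations keep
 * sparse happy when send and receive share a single CQ.
 */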

static void del_gid_entries(struct mlx4_ib_qp *qp)
{
	struct mlx4_ib_gid_entry *ge, *tmp;

	list_for_each_entry_safe(ge, tmp, &qp->gid_list, list) {
		list_del(&ge->list);
		kfree(ge);
	}
}

static struct mlx4_ib_pd *get_pd(struct mlx4_ib_qp *qp)
{
	if (qp->ibqp.qp_type == IB_QPT_XRC_TGT)
		return to_mpd(to_mxrcd(qp->ibqp.xrcd)->pd);
	else
		return to_mpd(qp->ibqp.pd);
}

static void get_cqs(struct mlx4_ib_qp *qp,
		    struct mlx4_ib_cq **send_cq, struct mlx4_ib_cq **recv_cq)
{
	switch (qp->ibqp.qp_type) {
	case IB_QPT_XRC_TGT:
		*send_cq = to_mcq(to_mxrcd(qp->ibqp.xrcd)->cq);
		*recv_cq = *send_cq;
		break;
	case IB_QPT_XRC_INI:
		*send_cq = to_mcq(qp->ibqp.send_cq);
		*recv_cq = *send_cq;
		break;
	default:
		*send_cq = to_mcq(qp->ibqp.send_cq);
		*recv_cq = to_mcq(qp->ibqp.recv_cq);
		break;
	}
}

static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp,
			      int is_user)
{
	struct mlx4_ib_cq *send_cq, *recv_cq;
	unsigned long flags;

	if (qp->state != IB_QPS_RESET) {
		if (mlx4_qp_modify(dev->dev, NULL, to_mlx4_state(qp->state),
				   MLX4_QP_STATE_RST, NULL, 0, 0, &qp->mqp))
			pr_warn("modify QP %06x to RESET failed.\n",
				qp->mqp.qpn);
		if (qp->pri.smac || (!qp->pri.smac && qp->pri.smac_port)) {
			mlx4_unregister_mac(dev->dev, qp->pri.smac_port, qp->pri.smac);
			qp->pri.smac = 0;
			qp->pri.smac_port = 0;
		}
		if (qp->alt.smac) {
			mlx4_unregister_mac(dev->dev, qp->alt.smac_port, qp->alt.smac);
			qp->alt.smac = 0;
		}
		if (qp->pri.vid < 0x1000) {
			mlx4_unregister_vlan(dev->dev, qp->pri.vlan_port, qp->pri.vid);
			qp->pri.vid = 0xFFFF;
			qp->pri.candidate_vid = 0xFFFF;
			qp->pri.update_vid = 0;
		}
		if (qp->alt.vid < 0x1000) {
			mlx4_unregister_vlan(dev->dev, qp->alt.vlan_port, qp->alt.vid);
			qp->alt.vid = 0xFFFF;
			qp->alt.candidate_vid = 0xFFFF;
			qp->alt.update_vid = 0;
		}
	}

	get_cqs(qp, &send_cq, &recv_cq);

	spin_lock_irqsave(&dev->reset_flow_resource_lock, flags);
	mlx4_ib_lock_cqs(send_cq, recv_cq);

	/* del from lists under both locks above to protect reset flow paths */
	list_del(&qp->qps_list);
	list_del(&qp->cq_send_list);
	list_del(&qp->cq_recv_list);
	if (!is_user) {
		__mlx4_ib_cq_clean(recv_cq, qp->mqp.qpn,
				   qp->ibqp.srq ? to_msrq(qp->ibqp.srq): NULL);
		if (send_cq != recv_cq)
			__mlx4_ib_cq_clean(send_cq, qp->mqp.qpn, NULL);
	}

	mlx4_qp_remove(dev->dev, &qp->mqp);

	mlx4_ib_unlock_cqs(send_cq, recv_cq);
	spin_unlock_irqrestore(&dev->reset_flow_resource_lock, flags);

	mlx4_qp_free(dev->dev, &qp->mqp);

	if (!is_sqp(dev, qp) && !is_tunnel_qp(dev, qp)) {
		if (qp->flags & MLX4_IB_QP_NETIF)
			mlx4_ib_steer_qp_free(dev, qp->mqp.qpn, 1);
		else
			mlx4_qp_release_range(dev->dev, qp->mqp.qpn, 1);
	}

	mlx4_mtt_cleanup(dev->dev, &qp->mtt);

	if (is_user) {
		if (qp->rq.wqe_cnt)
			mlx4_ib_db_unmap_user(to_mucontext(qp->ibqp.uobject->context),
					      &qp->db);
		ib_umem_release(qp->umem);
	} else {
		kvfree(qp->sq.wrid);
		kvfree(qp->rq.wrid);
		if (qp->mlx4_ib_qp_type & (MLX4_IB_QPT_PROXY_SMI_OWNER |
		    MLX4_IB_QPT_PROXY_SMI | MLX4_IB_QPT_PROXY_GSI))
			free_proxy_bufs(&dev->ib_dev, qp);
		mlx4_buf_free(dev->dev, qp->buf_size, &qp->buf);
		if (qp->rq.wqe_cnt)
			mlx4_db_free(dev->dev, &qp->db);
	}

	del_gid_entries(qp);
}
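/*
 * Teardown in destroy_qp_common() mirrors creation: the QP is first moved to
 * RESET and any registered MAC/VLAN entries are dropped, then it is unlinked
 * from the device and CQ lists under the reset-flow lock and removed from
 * the CQs, and only afterwards are the HCA resources (QPN range, MTT,
 * buffers, doorbell) released.
 */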

static u32 get_sqp_num(struct mlx4_ib_dev *dev, struct ib_qp_init_attr *attr)
{
	/* Native or PPF */
	if (!mlx4_is_mfunc(dev->dev) ||
	    (mlx4_is_master(dev->dev) &&
	     attr->create_flags & MLX4_IB_SRIOV_SQP)) {
		return dev->dev->phys_caps.base_sqpn +
			(attr->qp_type == IB_QPT_SMI ? 0 : 2) +
			attr->port_num - 1;
	}
	/* PF or VF -- creating proxies */
	if (attr->qp_type == IB_QPT_SMI)
		return dev->dev->caps.qp0_proxy[attr->port_num - 1];
	else
		return dev->dev->caps.qp1_proxy[attr->port_num - 1];
}
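/*
 * In native/PPF mode the special QP numbers are laid out consecutively from
 * phys_caps.base_sqpn.  For a two-port device that means, illustratively:
 * QP0 port 1 = base_sqpn, QP0 port 2 = base_sqpn + 1, QP1 port 1 =
 * base_sqpn + 2, QP1 port 2 = base_sqpn + 3, which is exactly what the
 * arithmetic in get_sqp_num() returns.
 */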

static struct ib_qp *_mlx4_ib_create_qp(struct ib_pd *pd,
					struct ib_qp_init_attr *init_attr,
					struct ib_udata *udata)
{
	struct mlx4_ib_qp *qp = NULL;
	int err;
	int sup_u_create_flags = MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK;
	u16 xrcdn = 0;
	gfp_t gfp;

	gfp = (init_attr->create_flags & MLX4_IB_QP_CREATE_USE_GFP_NOIO) ?
		GFP_NOIO : GFP_KERNEL;
	/*
	 * We only support LSO, vendor flag1, and multicast loopback blocking,
	 * and only for kernel UD QPs.
	 */
	if (init_attr->create_flags & ~(MLX4_IB_QP_LSO |
					MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK |
					MLX4_IB_SRIOV_TUNNEL_QP |
					MLX4_IB_SRIOV_SQP |
					MLX4_IB_QP_NETIF |
					MLX4_IB_QP_CREATE_ROCE_V2_GSI |
					MLX4_IB_QP_CREATE_USE_GFP_NOIO))
		return ERR_PTR(-EINVAL);

	if (init_attr->create_flags & IB_QP_CREATE_NETIF_QP) {
		if (init_attr->qp_type != IB_QPT_UD)
			return ERR_PTR(-EINVAL);
	}

	if (init_attr->create_flags) {
		if (udata && init_attr->create_flags & ~(sup_u_create_flags))
			return ERR_PTR(-EINVAL);

		if ((init_attr->create_flags & ~(MLX4_IB_SRIOV_SQP |
						 MLX4_IB_QP_CREATE_USE_GFP_NOIO |
						 MLX4_IB_QP_CREATE_ROCE_V2_GSI |
						 MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK) &&
		     init_attr->qp_type != IB_QPT_UD) ||
		    (init_attr->create_flags & MLX4_IB_SRIOV_SQP &&
		     init_attr->qp_type > IB_QPT_GSI) ||
		    (init_attr->create_flags & MLX4_IB_QP_CREATE_ROCE_V2_GSI &&
		     init_attr->qp_type != IB_QPT_GSI))
			return ERR_PTR(-EINVAL);
	}

	switch (init_attr->qp_type) {
	case IB_QPT_XRC_TGT:
		pd = to_mxrcd(init_attr->xrcd)->pd;
		xrcdn = to_mxrcd(init_attr->xrcd)->xrcdn;
		init_attr->send_cq = to_mxrcd(init_attr->xrcd)->cq;
		/* fall through */
	case IB_QPT_XRC_INI:
		if (!(to_mdev(pd->device)->dev->caps.flags & MLX4_DEV_CAP_FLAG_XRC))
			return ERR_PTR(-ENOSYS);
		init_attr->recv_cq = init_attr->send_cq;
		/* fall through */
	case IB_QPT_RC:
	case IB_QPT_UC:
	case IB_QPT_RAW_PACKET:
		qp = kzalloc(sizeof *qp, gfp);
		if (!qp)
			return ERR_PTR(-ENOMEM);
		qp->pri.vid = 0xFFFF;
		qp->alt.vid = 0xFFFF;
		/* fall through */
	case IB_QPT_UD:
	{
		err = create_qp_common(to_mdev(pd->device), pd, init_attr,
				       udata, 0, &qp, gfp);
		if (err) {
			kfree(qp);
			return ERR_PTR(err);
		}

		qp->ibqp.qp_num = qp->mqp.qpn;
		qp->xrcdn = xrcdn;

		break;
	}
	case IB_QPT_SMI:
	case IB_QPT_GSI:
	{
		int sqpn;

		/* Userspace is not allowed to create special QPs: */
		if (udata)
			return ERR_PTR(-EINVAL);
		if (init_attr->create_flags & MLX4_IB_QP_CREATE_ROCE_V2_GSI) {
			int res = mlx4_qp_reserve_range(to_mdev(pd->device)->dev, 1, 1, &sqpn, 0);

			if (res)
				return ERR_PTR(res);
		} else {
			sqpn = get_sqp_num(to_mdev(pd->device), init_attr);
		}

		err = create_qp_common(to_mdev(pd->device), pd, init_attr, udata,
				       sqpn,
				       &qp, gfp);
		if (err)
			return ERR_PTR(err);

		qp->port	= init_attr->port_num;
		qp->ibqp.qp_num = init_attr->qp_type == IB_QPT_SMI ? 0 :
			init_attr->create_flags & MLX4_IB_QP_CREATE_ROCE_V2_GSI ? sqpn : 1;
		break;
	}
	default:
		/* Don't support raw QPs */
		return ERR_PTR(-EINVAL);
	}

	return &qp->ibqp;
}

struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
				struct ib_qp_init_attr *init_attr,
				struct ib_udata *udata) {
	struct ib_device *device = pd ? pd->device : init_attr->xrcd->device;
	struct ib_qp *ibqp;
	struct mlx4_ib_dev *dev = to_mdev(device);

	ibqp = _mlx4_ib_create_qp(pd, init_attr, udata);

	if (!IS_ERR(ibqp) &&
	    (init_attr->qp_type == IB_QPT_GSI) &&
	    !(init_attr->create_flags & MLX4_IB_QP_CREATE_ROCE_V2_GSI)) {
		struct mlx4_ib_sqp *sqp = to_msqp((to_mqp(ibqp)));
		int is_eth = rdma_cap_eth_ah(&dev->ib_dev, init_attr->port_num);

		if (is_eth &&
		    dev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2) {
			init_attr->create_flags |= MLX4_IB_QP_CREATE_ROCE_V2_GSI;
			sqp->roce_v2_gsi = ib_create_qp(pd, init_attr);

			if (IS_ERR(sqp->roce_v2_gsi)) {
				pr_err("Failed to create GSI QP for RoCEv2 (%ld)\n", PTR_ERR(sqp->roce_v2_gsi));
				sqp->roce_v2_gsi = NULL;
			} else {
				sqp = to_msqp(to_mqp(sqp->roce_v2_gsi));
				sqp->qp.flags |= MLX4_IB_ROCE_V2_GSI_QP;
			}

			init_attr->create_flags &= ~MLX4_IB_QP_CREATE_ROCE_V2_GSI;
		}
	}
	return ibqp;
}
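/*
 * Consumers do not call mlx4_ib_create_qp() directly; they go through the
 * verbs core.  A minimal sketch of such a caller (illustrative values only,
 * not requirements of this driver):
 *
 *	struct ib_qp_init_attr attr = {
 *		.send_cq     = cq,
 *		.recv_cq     = cq,
 *		.cap         = { .max_send_wr  = 64, .max_recv_wr  = 64,
 *				 .max_send_sge = 1,  .max_recv_sge = 1 },
 *		.sq_sig_type = IB_SIGNAL_ALL_WR,
 *		.qp_type     = IB_QPT_UD,
 *	};
 *	struct ib_qp *qp = ib_create_qp(pd, &attr);
 *
 *	if (IS_ERR(qp))
 *		return PTR_ERR(qp);
 */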

static int _mlx4_ib_destroy_qp(struct ib_qp *qp)
{
	struct mlx4_ib_dev *dev = to_mdev(qp->device);
	struct mlx4_ib_qp *mqp = to_mqp(qp);
	struct mlx4_ib_pd *pd;

	if (is_qp0(dev, mqp))
		mlx4_CLOSE_PORT(dev->dev, mqp->port);

	if (dev->qp1_proxy[mqp->port - 1] == mqp) {
		mutex_lock(&dev->qp1_proxy_lock[mqp->port - 1]);
		dev->qp1_proxy[mqp->port - 1] = NULL;
		mutex_unlock(&dev->qp1_proxy_lock[mqp->port - 1]);
	}

	if (mqp->counter_index)
		mlx4_ib_free_qp_counter(dev, mqp);

	pd = get_pd(mqp);
	destroy_qp_common(dev, mqp, !!pd->ibpd.uobject);

	if (is_sqp(dev, mqp))
		kfree(to_msqp(mqp));
	else
		kfree(mqp);

	return 0;
}

int mlx4_ib_destroy_qp(struct ib_qp *qp)
{
	struct mlx4_ib_qp *mqp = to_mqp(qp);

	if (mqp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI) {
		struct mlx4_ib_sqp *sqp = to_msqp(mqp);

		if (sqp->roce_v2_gsi)
			ib_destroy_qp(sqp->roce_v2_gsi);
	}

	return _mlx4_ib_destroy_qp(qp);
}
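/*
 * Destroying a GSI QP also destroys the companion RoCEv2 GSI QP that
 * mlx4_ib_create_qp() may have created above; the companion is a full ib_qp
 * of its own, so ib_destroy_qp() is used for it before the primary QP is
 * torn down through _mlx4_ib_destroy_qp().
 */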

static int to_mlx4_st(struct mlx4_ib_dev *dev, enum mlx4_ib_qp_type type)
{
	switch (type) {
	case MLX4_IB_QPT_RC:	return MLX4_QP_ST_RC;
	case MLX4_IB_QPT_UC:	return MLX4_QP_ST_UC;
	case MLX4_IB_QPT_UD:	return MLX4_QP_ST_UD;
	case MLX4_IB_QPT_XRC_INI:
	case MLX4_IB_QPT_XRC_TGT:	return MLX4_QP_ST_XRC;
	case MLX4_IB_QPT_SMI:
	case MLX4_IB_QPT_GSI:
	case MLX4_IB_QPT_RAW_PACKET:	return MLX4_QP_ST_MLX;

	case MLX4_IB_QPT_PROXY_SMI_OWNER:
	case MLX4_IB_QPT_TUN_SMI_OWNER:	return (mlx4_is_mfunc(dev->dev) ?
						MLX4_QP_ST_MLX : -1);
	case MLX4_IB_QPT_PROXY_SMI:
	case MLX4_IB_QPT_TUN_SMI:
	case MLX4_IB_QPT_PROXY_GSI:
	case MLX4_IB_QPT_TUN_GSI:	return (mlx4_is_mfunc(dev->dev) ?
						MLX4_QP_ST_UD : -1);
	default:			return -1;
	}
}

static __be32 to_mlx4_access_flags(struct mlx4_ib_qp *qp, const struct ib_qp_attr *attr,
				   int attr_mask)
{
	u8 dest_rd_atomic;
	u32 access_flags;
	u32 hw_access_flags = 0;

	if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC)
		dest_rd_atomic = attr->max_dest_rd_atomic;
	else
		dest_rd_atomic = qp->resp_depth;

	if (attr_mask & IB_QP_ACCESS_FLAGS)
		access_flags = attr->qp_access_flags;
	else
		access_flags = qp->atomic_rd_en;

	if (!dest_rd_atomic)
		access_flags &= IB_ACCESS_REMOTE_WRITE;

	if (access_flags & IB_ACCESS_REMOTE_READ)
		hw_access_flags |= MLX4_QP_BIT_RRE;
	if (access_flags & IB_ACCESS_REMOTE_ATOMIC)
		hw_access_flags |= MLX4_QP_BIT_RAE;
	if (access_flags & IB_ACCESS_REMOTE_WRITE)
		hw_access_flags |= MLX4_QP_BIT_RWE;

	return cpu_to_be32(hw_access_flags);
}
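/*
 * Example of the masking above (illustrative): if the responder depth ends
 * up zero, remote reads and atomics are stripped and at most MLX4_QP_BIT_RWE
 * survives; otherwise each IB_ACCESS_REMOTE_* bit maps 1:1 onto the
 * corresponding MLX4_QP_BIT_R?E bit in the big-endian QP context word.
 */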

static void store_sqp_attrs(struct mlx4_ib_sqp *sqp, const struct ib_qp_attr *attr,
			    int attr_mask)
{
	if (attr_mask & IB_QP_PKEY_INDEX)
		sqp->pkey_index = attr->pkey_index;
	if (attr_mask & IB_QP_QKEY)
		sqp->qkey = attr->qkey;
	if (attr_mask & IB_QP_SQ_PSN)
		sqp->send_psn = attr->sq_psn;
}

static void mlx4_set_sched(struct mlx4_qp_path *path, u8 port)
{
	path->sched_queue = (path->sched_queue & 0xbf) | ((port - 1) << 6);
}
|
|
|
|
|

/*
 * From commit "mlx4: Add ref counting to port MAC table for RoCE":
 *
 * The IB side of RoCE requires the MAC table index of the MAC address
 * used by its QPs.  To obtain the real MAC index, the IB side registers
 * the MAC (increasing its ref count, and also returning the real MAC
 * index) during the modify-qp sequence.  This protects against the ETH
 * side deleting or modifying that MAC table entry while the QP is active.
 *
 * Note that until the modify-qp command returns success, the MAC and
 * VLAN information only has "candidate" status.  If the modify-qp
 * succeeds, the "candidate" info is promoted to the operational MAC/VLAN
 * info for the qp.  If the modify fails, the candidate MAC/VLAN is
 * unregistered, and the old qp info is preserved.
 *
 * This is a bit complex, because there are multiple qp transitions where
 * the primary-path information may be modified: INIT-to-RTR, and
 * SQD-to-SQD.  Similarly for the alternate path information.  Therefore
 * the code must handle cases where path information has already been
 * entered into the QP context by previous qp transitions.
 *
 * For the MAC address, the success logic is as follows:
 * 1. If there was no previous MAC, simply move the candidate MAC
 *    information to the operational information, and reset the candidate
 *    MAC info.
 * 2. If there was a previous MAC, unregister it.  Then move the MAC
 *    information from candidate to operational, and reset the candidate
 *    info (as in 1. above).
 * The MAC address failure logic is the same for all cases:
 * - Unregister the candidate MAC, and reset the candidate MAC info.
 * For Vlan registration, the logic is similar.
 */
static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
			  u64 smac, u16 vlan_tag, struct mlx4_qp_path *path,
			  struct mlx4_roce_smac_vlan_info *smac_info, u8 port)
{
	int is_eth = rdma_port_get_link_layer(&dev->ib_dev, port) ==
		IB_LINK_LAYER_ETHERNET;
	int vidx;
	int smac_index;
	int err;

	path->grh_mylmc = ah->src_path_bits & 0x7f;
	path->rlid = cpu_to_be16(ah->dlid);
	if (ah->static_rate) {
		path->static_rate = ah->static_rate + MLX4_STAT_RATE_OFFSET;
		while (path->static_rate > IB_RATE_2_5_GBPS + MLX4_STAT_RATE_OFFSET &&
		       !(1 << path->static_rate & dev->dev->caps.stat_rate_support))
			--path->static_rate;
	} else
		path->static_rate = 0;

	if (ah->ah_flags & IB_AH_GRH) {
		int real_sgid_index = mlx4_ib_gid_index_to_real_index(dev,
								      port,
								      ah->grh.sgid_index);

		if (real_sgid_index >= dev->dev->caps.gid_table_len[port]) {
			pr_err("sgid_index (%u) too large. max is %d\n",
			       real_sgid_index, dev->dev->caps.gid_table_len[port] - 1);
			return -1;
		}

		path->grh_mylmc |= 1 << 7;
		path->mgid_index = real_sgid_index;
		path->hop_limit = ah->grh.hop_limit;
		path->tclass_flowlabel =
			cpu_to_be32((ah->grh.traffic_class << 20) |
				    (ah->grh.flow_label));
		memcpy(path->rgid, ah->grh.dgid.raw, 16);
	}

	if (is_eth) {
		if (!(ah->ah_flags & IB_AH_GRH))
			return -1;

		path->sched_queue = MLX4_IB_DEFAULT_SCHED_QUEUE |
			((port - 1) << 6) | ((ah->sl & 7) << 3);

		path->feup |= MLX4_FEUP_FORCE_ETH_UP;
		if (vlan_tag < 0x1000) {
			if (smac_info->vid < 0x1000) {
				/* both valid vlan ids */
				if (smac_info->vid != vlan_tag) {
					/* different VIDs. unreg old and reg new */
					err = mlx4_register_vlan(dev->dev, port, vlan_tag, &vidx);
					if (err)
						return err;
					smac_info->candidate_vid = vlan_tag;
					smac_info->candidate_vlan_index = vidx;
					smac_info->candidate_vlan_port = port;
					smac_info->update_vid = 1;
					path->vlan_index = vidx;
				} else {
					path->vlan_index = smac_info->vlan_index;
				}
			} else {
				/* no current vlan tag in qp */
				err = mlx4_register_vlan(dev->dev, port, vlan_tag, &vidx);
				if (err)
					return err;
				smac_info->candidate_vid = vlan_tag;
				smac_info->candidate_vlan_index = vidx;
				smac_info->candidate_vlan_port = port;
				smac_info->update_vid = 1;
				path->vlan_index = vidx;
			}
			path->feup |= MLX4_FVL_FORCE_ETH_VLAN;
			path->fl = 1 << 6;
		} else {
			/* have current vlan tag. unregister it at modify-qp success */
			if (smac_info->vid < 0x1000) {
				smac_info->candidate_vid = 0xFFFF;
				smac_info->update_vid = 1;
			}
		}

		/* get smac_index for RoCE use.
		 * If no smac was yet assigned, register one.
		 * If one was already assigned, but the new mac differs,
		 * unregister the old one and register the new one.
		 */
		if ((!smac_info->smac && !smac_info->smac_port) ||
		    smac_info->smac != smac) {
			/* register candidate now, unreg if needed, after success */
			smac_index = mlx4_register_mac(dev->dev, port, smac);
			if (smac_index >= 0) {
				smac_info->candidate_smac_index = smac_index;
				smac_info->candidate_smac = smac;
				smac_info->candidate_smac_port = port;
			} else {
				return -EINVAL;
			}
		} else {
			smac_index = smac_info->smac_index;
		}

		memcpy(path->dmac, ah->dmac, 6);
		path->ackto = MLX4_IB_LINK_TYPE_ETH;
		/* put MAC table smac index for IBoE */
		path->grh_mylmc = (u8) (smac_index) | 0x80;
	} else {
		path->sched_queue = MLX4_IB_DEFAULT_SCHED_QUEUE |
			((port - 1) << 6) | ((ah->sl & 0xf) << 2);
	}

	return 0;
}
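
/*
 * Wrapper for the primary path: pass the MAC and VLAN resolved from the
 * address vector's GID entry and record the registration candidates in
 * mqp->pri.
 */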
static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_qp_attr *qp,
			 enum ib_qp_attr_mask qp_attr_mask,
			 struct mlx4_ib_qp *mqp,
			 struct mlx4_qp_path *path, u8 port,
			 u16 vlan_id, u8 *smac)
{
	return _mlx4_set_path(dev, &qp->ah_attr,
			      mlx4_mac_to_u64(smac),
			      vlan_id,
			      path, &mqp->pri, port);
}
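
/*
 * Wrapper for the alternate path: no source MAC and an invalid VLAN
 * (0xffff) are passed, since APM is rejected on Ethernet ports in
 * __mlx4_ib_modify_qp() below; candidates are recorded in mqp->alt.
 */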
static int mlx4_set_alt_path(struct mlx4_ib_dev *dev,
			     const struct ib_qp_attr *qp,
			     enum ib_qp_attr_mask qp_attr_mask,
			     struct mlx4_ib_qp *mqp,
			     struct mlx4_qp_path *path, u8 port)
{
	return _mlx4_set_path(dev, &qp->alt_ah_attr,
			      0,
			      0xffff,
			      path, &mqp->alt, port);
}
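
/*
 * Walk the QP's pending multicast GID list and attach any entries that
 * have not yet been added, recording the port they were attached on.
 */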
static void update_mcg_macs(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp)
{
	struct mlx4_ib_gid_entry *ge, *tmp;

	list_for_each_entry_safe(ge, tmp, &qp->gid_list, list) {
		if (!ge->added && mlx4_ib_add_mc(dev, qp, &ge->gid)) {
			ge->added = 1;
			ge->port = qp->port;
		}
	}
}
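
/*
 * For Ethernet UD and proxy/tunnel GSI QPs: read the port's current
 * source MAC, register it as a candidate MAC table entry if the QP has
 * none yet, and place the resulting index (ORed with 0x80, as in
 * _mlx4_set_path() above) into pri_path.grh_mylmc.
 */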
static int handle_eth_ud_smac_index(struct mlx4_ib_dev *dev,
				    struct mlx4_ib_qp *qp,
				    struct mlx4_qp_context *context)
{
	u64 u64_mac;
	int smac_index;

	u64_mac = atomic64_read(&dev->iboe.mac[qp->port - 1]);

	context->pri_path.sched_queue = MLX4_IB_DEFAULT_SCHED_QUEUE | ((qp->port - 1) << 6);
	if (!qp->pri.smac && !qp->pri.smac_port) {
		smac_index = mlx4_register_mac(dev->dev, qp->port, u64_mac);
		if (smac_index >= 0) {
			qp->pri.candidate_smac_index = smac_index;
			qp->pri.candidate_smac = u64_mac;
			qp->pri.candidate_smac_port = qp->port;
			context->pri_path.grh_mylmc = 0x80 | (u8) smac_index;
		} else {
			return -ENOENT;
		}
	}
	return 0;
}
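
/*
 * Allocate a per-QP flow counter used for loopback source checking.
 * Only needed on Ethernet ports when the QP blocks multicast loopback
 * and the device advertises MLX4_DEV_CAP_FLAG2_LB_SRC_CHK; otherwise
 * nothing is allocated and 0 is returned.
 */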
static int create_qp_lb_counter(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp)
{
	struct counter_index *new_counter_index;
	int err;
	u32 tmp_idx;

	if (rdma_port_get_link_layer(&dev->ib_dev, qp->port) !=
	    IB_LINK_LAYER_ETHERNET ||
	    !(qp->flags & MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK) ||
	    !(dev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_LB_SRC_CHK))
		return 0;

	err = mlx4_counter_alloc(dev->dev, &tmp_idx);
	if (err)
		return err;

	new_counter_index = kmalloc(sizeof(*new_counter_index), GFP_KERNEL);
	if (!new_counter_index) {
		mlx4_counter_free(dev->dev, tmp_idx);
		return -ENOMEM;
	}

	new_counter_index->index = tmp_idx;
	new_counter_index->allocated = 1;
	qp->counter_index = new_counter_index;

	mutex_lock(&dev->counters_table[qp->port - 1].mutex);
	list_add_tail(&new_counter_index->list,
		      &dev->counters_table[qp->port - 1].counters_list);
	mutex_unlock(&dev->counters_table[qp->port - 1].mutex);

	return 0;
}
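
/* QP-context encoding of the RoCE version, derived from the GID type of the QP's address. */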
enum {
	MLX4_QPC_ROCE_MODE_1 = 0,
	MLX4_QPC_ROCE_MODE_2 = 2,
	MLX4_QPC_ROCE_MODE_UNDEFINED = 0xff
};

static u8 gid_type_to_qpc(enum ib_gid_type gid_type)
{
	switch (gid_type) {
	case IB_GID_TYPE_ROCE:
		return MLX4_QPC_ROCE_MODE_1;
	case IB_GID_TYPE_ROCE_UDP_ENCAP:
		return MLX4_QPC_ROCE_MODE_2;
	default:
		return MLX4_QPC_ROCE_MODE_UNDEFINED;
	}
}
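
/*
 * Core modify-QP worker: build an mlx4_qp_context and the matching
 * optpar mask from the verbs attributes for the cur_state -> new_state
 * transition, then hand the context to the firmware.
 */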
static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
			       const struct ib_qp_attr *attr, int attr_mask,
			       enum ib_qp_state cur_state, enum ib_qp_state new_state)
{
	struct mlx4_ib_dev *dev = to_mdev(ibqp->device);
	struct mlx4_ib_qp *qp = to_mqp(ibqp);
	struct mlx4_ib_pd *pd;
	struct mlx4_ib_cq *send_cq, *recv_cq;
	struct mlx4_qp_context *context;
	enum mlx4_qp_optpar optpar = 0;
	int sqd_event;
	int steer_qp = 0;
	int err = -EINVAL;
	int counter_index;

	/* APM is not supported under RoCE */
	if (attr_mask & IB_QP_ALT_PATH &&
	    rdma_port_get_link_layer(&dev->ib_dev, qp->port) ==
	    IB_LINK_LAYER_ETHERNET)
		return -ENOTSUPP;

	context = kzalloc(sizeof *context, GFP_KERNEL);
	if (!context)
		return -ENOMEM;

	context->flags = cpu_to_be32((to_mlx4_state(new_state) << 28) |
				     (to_mlx4_st(dev, qp->mlx4_ib_qp_type) << 16));

	if (!(attr_mask & IB_QP_PATH_MIG_STATE))
		context->flags |= cpu_to_be32(MLX4_QP_PM_MIGRATED << 11);
	else {
		optpar |= MLX4_QP_OPTPAR_PM_STATE;
		switch (attr->path_mig_state) {
		case IB_MIG_MIGRATED:
			context->flags |= cpu_to_be32(MLX4_QP_PM_MIGRATED << 11);
			break;
		case IB_MIG_REARM:
			context->flags |= cpu_to_be32(MLX4_QP_PM_REARM << 11);
			break;
		case IB_MIG_ARMED:
			context->flags |= cpu_to_be32(MLX4_QP_PM_ARMED << 11);
			break;
		}
	}

	if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI)
		context->mtu_msgmax = (IB_MTU_4096 << 5) | 11;
	else if (ibqp->qp_type == IB_QPT_RAW_PACKET)
		context->mtu_msgmax = (MLX4_RAW_QP_MTU << 5) | MLX4_RAW_QP_MSGMAX;
	else if (ibqp->qp_type == IB_QPT_UD) {
		if (qp->flags & MLX4_IB_QP_LSO)
			context->mtu_msgmax = (IB_MTU_4096 << 5) |
					      ilog2(dev->dev->caps.max_gso_sz);
		else
			context->mtu_msgmax = (IB_MTU_4096 << 5) | 12;
	} else if (attr_mask & IB_QP_PATH_MTU) {
		if (attr->path_mtu < IB_MTU_256 || attr->path_mtu > IB_MTU_4096) {
			pr_err("path MTU (%u) is invalid\n",
			       attr->path_mtu);
			goto out;
		}
		context->mtu_msgmax = (attr->path_mtu << 5) |
			ilog2(dev->dev->caps.max_msg_sz);
	}

	if (qp->rq.wqe_cnt)
		context->rq_size_stride = ilog2(qp->rq.wqe_cnt) << 3;
	context->rq_size_stride |= qp->rq.wqe_shift - 4;

	if (qp->sq.wqe_cnt)
		context->sq_size_stride = ilog2(qp->sq.wqe_cnt) << 3;
	context->sq_size_stride |= qp->sq.wqe_shift - 4;

	if (new_state == IB_QPS_RESET && qp->counter_index)
		mlx4_ib_free_qp_counter(dev, qp);

	if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) {
		context->sq_size_stride |= !!qp->sq_no_prefetch << 7;
		context->xrcd = cpu_to_be32((u32) qp->xrcdn);
		if (ibqp->qp_type == IB_QPT_RAW_PACKET)
			context->param3 |= cpu_to_be32(1 << 30);
	}

	/*
	 * From commit "net/mlx4_core: Set UAR page size to 4KB regardless of
	 * system page size": the ConnectX-3 HW requires at least 128 UAR
	 * pages, so mlx4_core always uses a 4KB UAR page when talking to
	 * firmware, while code dealing with the usr page in CQ and QP
	 * contexts still assumes the system page size.  The index written
	 * here must therefore be converted with mlx4_to_hw_uar_index().
	 */
	if (qp->ibqp.uobject)
		context->usr_page = cpu_to_be32(
			mlx4_to_hw_uar_index(dev->dev,
					     to_mucontext(ibqp->uobject->context)->uar.index));
	else
		context->usr_page = cpu_to_be32(
			mlx4_to_hw_uar_index(dev->dev, dev->priv_uar.index));
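
	/*
	 * Remote QPN, port scheduling, and INIT->RTR specifics: pick a flow
	 * counter (the per-QP loopback counter if one was created, otherwise
	 * the port default or the sink counter), register steering rules for
	 * NETIF QPs, and stamp the RoCE version for the GSI QP.
	 */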
	if (attr_mask & IB_QP_DEST_QPN)
		context->remote_qpn = cpu_to_be32(attr->dest_qp_num);

	if (attr_mask & IB_QP_PORT) {
		if (cur_state == IB_QPS_SQD && new_state == IB_QPS_SQD &&
		    !(attr_mask & IB_QP_AV)) {
			mlx4_set_sched(&context->pri_path, attr->port_num);
			optpar |= MLX4_QP_OPTPAR_SCHED_QUEUE;
		}
	}

	if (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR) {
		err = create_qp_lb_counter(dev, qp);
		if (err)
			goto out;

		counter_index =
			dev->counters_table[qp->port - 1].default_counter;
		if (qp->counter_index)
			counter_index = qp->counter_index->index;

		if (counter_index != -1) {
			context->pri_path.counter_index = counter_index;
			optpar |= MLX4_QP_OPTPAR_COUNTER_INDEX;
			if (qp->counter_index) {
				context->pri_path.fl |=
					MLX4_FL_ETH_SRC_CHECK_MC_LB;
				context->pri_path.vlan_control |=
					MLX4_CTRL_ETH_SRC_CHECK_IF_COUNTER;
			}
		} else
			context->pri_path.counter_index =
				MLX4_SINK_COUNTER_INDEX(dev->dev);

		if (qp->flags & MLX4_IB_QP_NETIF) {
			mlx4_ib_steer_qp_reg(dev, qp, 1);
			steer_qp = 1;
		}

		if (ibqp->qp_type == IB_QPT_GSI) {
			enum ib_gid_type gid_type = qp->flags & MLX4_IB_ROCE_V2_GSI_QP ?
				IB_GID_TYPE_ROCE_UDP_ENCAP : IB_GID_TYPE_ROCE;
			u8 qpc_roce_mode = gid_type_to_qpc(gid_type);

			context->rlkey_roce_mode |= (qpc_roce_mode << 6);
		}
	}
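
	/*
	 * P_Key index and address-vector handling.  On RoCE the SGID entry
	 * is looked up to obtain the egress netdev's MAC and VLAN, which
	 * feed the primary path setup; a zero GID or missing entry aborts
	 * the transition.
	 */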
	if (attr_mask & IB_QP_PKEY_INDEX) {
		if (qp->mlx4_ib_qp_type & MLX4_IB_QPT_ANY_SRIOV)
			context->pri_path.disable_pkey_check = 0x40;
		context->pri_path.pkey_index = attr->pkey_index;
		optpar |= MLX4_QP_OPTPAR_PKEY_INDEX;
	}

	if (attr_mask & IB_QP_AV) {
		u8 port_num = mlx4_is_bonded(to_mdev(ibqp->device)->dev) ? 1 :
			attr_mask & IB_QP_PORT ? attr->port_num : qp->port;
		union ib_gid gid;
		struct ib_gid_attr gid_attr;
		u16 vlan = 0xffff;
		u8 smac[ETH_ALEN];
		int status = 0;
		int is_eth = rdma_cap_eth_ah(&dev->ib_dev, port_num) &&
			attr->ah_attr.ah_flags & IB_AH_GRH;

		if (is_eth) {
			int index = attr->ah_attr.grh.sgid_index;

			status = ib_get_cached_gid(ibqp->device, port_num,
						   index, &gid, &gid_attr);
			if (!status && !memcmp(&gid, &zgid, sizeof(gid)))
				status = -ENOENT;
			if (!status && gid_attr.ndev) {
				vlan = rdma_vlan_dev_vlan_id(gid_attr.ndev);
				memcpy(smac, gid_attr.ndev->dev_addr, ETH_ALEN);
				dev_put(gid_attr.ndev);
			}
		}
		if (status)
			goto out;

		if (mlx4_set_path(dev, attr, attr_mask, qp, &context->pri_path,
				  port_num, vlan, smac))
			goto out;

		optpar |= (MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH |
			   MLX4_QP_OPTPAR_SCHED_QUEUE);

		if (is_eth &&
		    (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR)) {
			u8 qpc_roce_mode = gid_type_to_qpc(gid_attr.gid_type);

			if (qpc_roce_mode == MLX4_QPC_ROCE_MODE_UNDEFINED) {
				err = -EINVAL;
				goto out;
			}
			context->rlkey_roce_mode |= (qpc_roce_mode << 6);
		}
	}

	if (attr_mask & IB_QP_TIMEOUT) {
		context->pri_path.ackto |= attr->timeout << 3;
		optpar |= MLX4_QP_OPTPAR_ACK_TIMEOUT;
	}

	if (attr_mask & IB_QP_ALT_PATH) {
		if (attr->alt_port_num == 0 ||
		    attr->alt_port_num > dev->dev->caps.num_ports)
			goto out;

		if (attr->alt_pkey_index >=
		    dev->dev->caps.pkey_table_len[attr->alt_port_num])
			goto out;

		if (mlx4_set_alt_path(dev, attr, attr_mask, qp,
				      &context->alt_path,
				      attr->alt_port_num))
			goto out;

		context->alt_path.pkey_index = attr->alt_pkey_index;
		context->alt_path.ackto = attr->alt_timeout << 3;
		optpar |= MLX4_QP_OPTPAR_ALT_ADDR_PATH;
	}
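
	/*
	 * Bind the QP context to its PD and CQs, and fill the params1 word:
	 * ACK request frequency, fast-registration enable for kernel QPs,
	 * and the RNR retry count if requested.
	 */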
	pd = get_pd(qp);
	get_cqs(qp, &send_cq, &recv_cq);
	context->pd = cpu_to_be32(pd->pdn);
	context->cqn_send = cpu_to_be32(send_cq->mcq.cqn);
	context->cqn_recv = cpu_to_be32(recv_cq->mcq.cqn);
	context->params1 = cpu_to_be32(MLX4_IB_ACK_REQ_FREQ << 28);

	/* Set "fast registration enabled" for all kernel QPs */
	if (!qp->ibqp.uobject)
		context->params1 |= cpu_to_be32(1 << 11);

	if (attr_mask & IB_QP_RNR_RETRY) {
		context->params1 |= cpu_to_be32(attr->rnr_retry << 13);
		optpar |= MLX4_QP_OPTPAR_RNR_RETRY;
	}

	if (attr_mask & IB_QP_RETRY_CNT) {
		context->params1 |= cpu_to_be32(attr->retry_cnt << 16);
		optpar |= MLX4_QP_OPTPAR_RETRY_COUNT;
	}

	if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) {
		if (attr->max_rd_atomic)
			context->params1 |=
				cpu_to_be32(fls(attr->max_rd_atomic - 1) << 21);
		optpar |= MLX4_QP_OPTPAR_SRA_MAX;
	}

	if (attr_mask & IB_QP_SQ_PSN)
		context->next_send_psn = cpu_to_be32(attr->sq_psn);

	if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) {
		if (attr->max_dest_rd_atomic)
			context->params2 |=
				cpu_to_be32(fls(attr->max_dest_rd_atomic - 1) << 21);
		optpar |= MLX4_QP_OPTPAR_RRA_MAX;
	}

	if (attr_mask & (IB_QP_ACCESS_FLAGS | IB_QP_MAX_DEST_RD_ATOMIC)) {
		context->params2 |= to_mlx4_access_flags(qp, attr, attr_mask);
		optpar |= MLX4_QP_OPTPAR_RWE | MLX4_QP_OPTPAR_RRE | MLX4_QP_OPTPAR_RAE;
	}

	if (ibqp->srq)
		context->params2 |= cpu_to_be32(MLX4_QP_BIT_RIC);

	if (attr_mask & IB_QP_MIN_RNR_TIMER) {
		context->rnr_nextrecvpsn |= cpu_to_be32(attr->min_rnr_timer << 24);
		optpar |= MLX4_QP_OPTPAR_RNR_TIMEOUT;
	}
	if (attr_mask & IB_QP_RQ_PSN)
		context->rnr_nextrecvpsn |= cpu_to_be32(attr->rq_psn);

	/* proxy and tunnel qp qkeys will be changed in modify-qp wrappers */
	if (attr_mask & IB_QP_QKEY) {
		if (qp->mlx4_ib_qp_type &
		    (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))
			context->qkey = cpu_to_be32(IB_QP_SET_QKEY);
		else {
			if (mlx4_is_mfunc(dev->dev) &&
			    !(qp->mlx4_ib_qp_type & MLX4_IB_QPT_ANY_SRIOV) &&
			    (attr->qkey & MLX4_RESERVED_QKEY_MASK) ==
			    MLX4_RESERVED_QKEY_BASE) {
				pr_err("Cannot use reserved QKEY"
				       " 0x%x (range 0xffff0000..0xffffffff"
				       " is reserved)\n", attr->qkey);
				err = -EINVAL;
				goto out;
			}
			context->qkey = cpu_to_be32(attr->qkey);
		}
		optpar |= MLX4_QP_OPTPAR_Q_KEY;
	}
|
|
|
if (ibqp->srq)
|
|
|
|
context->srqn = cpu_to_be32(1 << 24 | to_msrq(ibqp->srq)->msrq.srqn);
|
|
|
|
|
2011-06-03 02:32:15 +08:00
|
|
|
if (qp->rq.wqe_cnt && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT)
|
2007-05-09 09:00:38 +08:00
|
|
|
context->db_rec_addr = cpu_to_be64(qp->db.dma);
|
|
|
|
|
|
|
|
if (cur_state == IB_QPS_INIT &&
|
|
|
|
new_state == IB_QPS_RTR &&
|
|
|
|
(ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI ||
|
2012-01-17 19:39:07 +08:00
|
|
|
ibqp->qp_type == IB_QPT_UD ||
|
|
|
|
ibqp->qp_type == IB_QPT_RAW_PACKET)) {
|
2007-05-09 09:00:38 +08:00
|
|
|
context->pri_path.sched_queue = (qp->port - 1) << 6;
|
2012-08-03 16:40:40 +08:00
|
|
|
if (qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
|
|
|
|
qp->mlx4_ib_qp_type &
|
|
|
|
(MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER)) {
|
2007-05-09 09:00:38 +08:00
|
|
|
context->pri_path.sched_queue |= MLX4_IB_DEFAULT_QP0_SCHED_QUEUE;
|
2012-08-03 16:40:40 +08:00
|
|
|
if (qp->mlx4_ib_qp_type != MLX4_IB_QPT_SMI)
|
|
|
|
context->pri_path.fl = 0x80;
|
|
|
|
} else {
|
|
|
|
if (qp->mlx4_ib_qp_type & MLX4_IB_QPT_ANY_SRIOV)
|
|
|
|
context->pri_path.fl = 0x80;
|
2007-05-09 09:00:38 +08:00
|
|
|
context->pri_path.sched_queue |= MLX4_IB_DEFAULT_SCHED_QUEUE;
|
2012-08-03 16:40:40 +08:00
|
|
|
}
|
mlx4: Add ref counting to port MAC table for RoCE
The IB side of RoCE requires the MAC table index of the
MAC address used by its QPs.
To obtain the real MAC index, the IB side registers the
MAC (increasing its ref count, and also returning the
real MAC index) during the modify-qp sequence.
This protects against the ETH side deleting or modifying
that MAC table entry while the QP is active.
Note that until the modify-qp command returns success,
the MAC and VLAN information only has "candidate" status.
If the modify-qp succeeds, the "candidate" info is promoted
to the operational MAC/VLAN info for the qp. If the modify fails,
the candidate MAC/VLAN is unregistered, and the old qp info
is preserved.
The patch is a bit complex, because there are multiple qp
transitions where the primary-path information may be
modified: INIT-to-RTR, and SQD-to-SQD.
Similarly for the alternate path information.
Therefore the code must handle cases where path information
has already been entered into the QP context by previous
qp transitions.
For the MAC address, the success logic is as follows:
1. If there was no previous MAC, simply move the candidate
MAC information to the operational information, and reset
the candidate MAC info.
2. If there was a previous MAC, unregister it. Then move
the MAC information from candidate to operational, and
reset the candidate info (as in 1. above).
The MAC address failure logic is the same for all cases:
- Unregister the candidate MAC, and reset the candidate MAC info.
For Vlan registration, the logic is similar.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 18:00:40 +08:00
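The promotion/rollback flow described above is implemented after the out: label at the end of this function. As a minimal, illustrative sketch only (commit_candidate_smac() is a hypothetical helper name, not part of the driver), the primary-path MAC handling reduces to:

static void commit_candidate_smac(struct mlx4_ib_dev *dev,
				  struct mlx4_ib_qp *qp, int err)
{
	if (err) {
		/* modify-qp failed: drop only the candidate reference */
		mlx4_unregister_mac(dev->dev, qp->pri.candidate_smac_port,
				    qp->pri.candidate_smac);
	} else {
		/* success: release the old MAC, then promote the candidate */
		if (qp->pri.smac)
			mlx4_unregister_mac(dev->dev, qp->pri.smac_port,
					    qp->pri.smac);
		qp->pri.smac = qp->pri.candidate_smac;
		qp->pri.smac_index = qp->pri.candidate_smac_index;
		qp->pri.smac_port = qp->pri.candidate_smac_port;
	}
	/* in either case, the candidate info is reset */
	qp->pri.candidate_smac = 0;
	qp->pri.candidate_smac_index = 0;
	qp->pri.candidate_smac_port = 0;
}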
|
|
|
if (rdma_port_get_link_layer(&dev->ib_dev, qp->port) ==
|
|
|
|
IB_LINK_LAYER_ETHERNET) {
|
|
|
|
if (qp->mlx4_ib_qp_type == MLX4_IB_QPT_TUN_GSI ||
|
|
|
|
qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI)
|
|
|
|
context->pri_path.feup = 1 << 7; /* don't fsm */
|
|
|
|
/* handle smac_index */
|
|
|
|
if (qp->mlx4_ib_qp_type == MLX4_IB_QPT_UD ||
|
|
|
|
qp->mlx4_ib_qp_type == MLX4_IB_QPT_PROXY_GSI ||
|
|
|
|
qp->mlx4_ib_qp_type == MLX4_IB_QPT_TUN_GSI) {
|
2015-10-15 23:38:51 +08:00
|
|
|
err = handle_eth_ud_smac_index(dev, qp, context);
|
2015-01-29 16:41:41 +08:00
|
|
|
if (err) {
|
|
|
|
err = -EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
2014-05-15 20:29:28 +08:00
|
|
|
if (qp->mlx4_ib_qp_type == MLX4_IB_QPT_PROXY_GSI)
|
|
|
|
dev->qp1_proxy[qp->port - 1] = qp;
|
modified: INIT-to-RTR, and SQD-to-SQD.
Similarly for the alternate path information.
Therefore the code must handle cases where path information
has already been entered into the QP context by previous
qp transitions.
For the MAC address, the success logic is as follows:
1. If there was no previous MAC, simply move the candidate
MAC information to the operational information, and reset
the candidate MAC info.
2. If there was a previous MAC, unregister it. Then move
the MAC information from candidate to operational, and
reset the candidate info (as in 1. above).
The MAC address failure logic is the same for all cases:
- Unregister the candidate MAC, and reset the candidate MAC info.
For Vlan registration, the logic is similar.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 18:00:40 +08:00
|
|
|
}
|
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
|
|
|
|
2014-08-27 21:47:49 +08:00
|
|
|
if (qp->ibqp.qp_type == IB_QPT_RAW_PACKET) {
|
2013-04-21 23:10:01 +08:00
|
|
|
context->pri_path.ackto = (context->pri_path.ackto & 0xf8) |
|
|
|
|
MLX4_IB_LINK_TYPE_ETH;
|
2014-08-27 21:47:49 +08:00
|
|
|
if (dev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
|
|
|
|
/* set QP to receive both tunneled & non-tunneled packets */
|
2014-09-10 22:15:11 +08:00
|
|
|
if (!(context->flags & cpu_to_be32(1 << MLX4_RSS_QPC_FLAG_OFFSET)))
|
2014-08-27 21:47:49 +08:00
|
|
|
context->srqn = cpu_to_be32(7 << 28);
|
|
|
|
}
|
|
|
|
}
|
2013-04-21 23:10:01 +08:00
|
|
|
|
2013-12-13 00:03:14 +08:00
|
|
|
if (ibqp->qp_type == IB_QPT_UD && (new_state == IB_QPS_RTR)) {
|
|
|
|
int is_eth = rdma_port_get_link_layer(
|
|
|
|
&dev->ib_dev, qp->port) ==
|
|
|
|
IB_LINK_LAYER_ETHERNET;
|
|
|
|
if (is_eth) {
|
|
|
|
context->pri_path.ackto = MLX4_IB_LINK_TYPE_ETH;
|
|
|
|
optpar |= MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2007-05-09 09:00:38 +08:00
|
|
|
if (cur_state == IB_QPS_RTS && new_state == IB_QPS_SQD &&
|
|
|
|
attr_mask & IB_QP_EN_SQD_ASYNC_NOTIFY && attr->en_sqd_async_notify)
|
|
|
|
sqd_event = 1;
|
|
|
|
else
|
|
|
|
sqd_event = 0;
|
|
|
|
|
2008-10-09 11:09:01 +08:00
|
|
|
if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT)
|
2016-01-14 23:50:39 +08:00
|
|
|
context->rlkey_roce_mode |= (1 << 4);
|
2008-10-09 11:09:01 +08:00
|
|
|
|
2007-05-24 21:05:01 +08:00
|
|
|
/*
|
|
|
|
* Before passing a kernel QP to the HW, make sure that the
|
2007-06-18 23:13:48 +08:00
|
|
|
* ownership bits of the send queue are set and the SQ
|
|
|
|
* headroom is stamped so that the hardware doesn't start
|
|
|
|
* processing stale work requests.
|
2007-05-24 21:05:01 +08:00
|
|
|
*/
|
|
|
|
if (!ibqp->uobject && cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) {
|
|
|
|
struct mlx4_wqe_ctrl_seg *ctrl;
|
|
|
|
int i;
|
|
|
|
|
2007-06-18 23:13:48 +08:00
|
|
|
for (i = 0; i < qp->sq.wqe_cnt; ++i) {
|
2007-05-24 21:05:01 +08:00
|
|
|
ctrl = get_send_wqe(qp, i);
|
|
|
|
ctrl->owner_opcode = cpu_to_be32(1 << 31);
|
2008-07-15 14:48:44 +08:00
|
|
|
if (qp->sq_max_wqes_per_wr == 1)
|
2016-07-20 03:16:54 +08:00
|
|
|
ctrl->qpn_vlan.fence_size =
|
|
|
|
1 << (qp->sq.wqe_shift - 4);
|
2007-06-18 23:13:48 +08:00
|
|
|
|
2008-01-28 16:40:59 +08:00
|
|
|
stamp_send_wqe(qp, i, 1 << qp->sq.wqe_shift);
|
2007-05-24 21:05:01 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-05-09 09:00:38 +08:00
|
|
|
err = mlx4_qp_modify(dev->dev, &qp->mtt, to_mlx4_state(cur_state),
|
|
|
|
to_mlx4_state(new_state), context, optpar,
|
|
|
|
sqd_event, &qp->mqp);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
qp->state = new_state;
|
|
|
|
|
|
|
|
if (attr_mask & IB_QP_ACCESS_FLAGS)
|
|
|
|
qp->atomic_rd_en = attr->qp_access_flags;
|
|
|
|
if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC)
|
|
|
|
qp->resp_depth = attr->max_dest_rd_atomic;
|
2010-10-25 12:08:52 +08:00
|
|
|
if (attr_mask & IB_QP_PORT) {
|
2007-05-09 09:00:38 +08:00
|
|
|
qp->port = attr->port_num;
|
2010-10-25 12:08:52 +08:00
|
|
|
update_mcg_macs(dev, qp);
|
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
if (attr_mask & IB_QP_ALT_PATH)
|
|
|
|
qp->alt_port = attr->alt_port_num;
|
|
|
|
|
|
|
|
if (is_sqp(dev, qp))
|
|
|
|
store_sqp_attrs(to_msqp(qp), attr, attr_mask);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we moved QP0 to RTR, bring the IB link up; if we moved
|
|
|
|
* QP0 to RESET or ERROR, bring the link back down.
|
|
|
|
*/
|
|
|
|
if (is_qp0(dev, qp)) {
|
|
|
|
if (cur_state != IB_QPS_RTR && new_state == IB_QPS_RTR)
|
2007-06-18 23:15:02 +08:00
|
|
|
if (mlx4_INIT_PORT(dev->dev, qp->port))
|
2012-04-29 22:04:26 +08:00
|
|
|
pr_warn("INIT_PORT failed for port %d\n",
|
2007-06-18 23:15:02 +08:00
|
|
|
qp->port);
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
if (cur_state != IB_QPS_RESET && cur_state != IB_QPS_ERR &&
|
|
|
|
(new_state == IB_QPS_RESET || new_state == IB_QPS_ERR))
|
|
|
|
mlx4_CLOSE_PORT(dev->dev, qp->port);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we moved a kernel QP to RESET, clean up all old CQ
|
|
|
|
* entries and reinitialize the QP.
|
|
|
|
*/
|
2014-03-12 18:00:40 +08:00
|
|
|
if (new_state == IB_QPS_RESET) {
|
|
|
|
if (!ibqp->uobject) {
|
|
|
|
mlx4_ib_cq_clean(recv_cq, qp->mqp.qpn,
|
|
|
|
ibqp->srq ? to_msrq(ibqp->srq) : NULL);
|
|
|
|
if (send_cq != recv_cq)
|
|
|
|
mlx4_ib_cq_clean(send_cq, qp->mqp.qpn, NULL);
|
|
|
|
|
|
|
|
qp->rq.head = 0;
|
|
|
|
qp->rq.tail = 0;
|
|
|
|
qp->sq.head = 0;
|
|
|
|
qp->sq.tail = 0;
|
|
|
|
qp->sq_next_wqe = 0;
|
|
|
|
if (qp->rq.wqe_cnt)
|
|
|
|
*qp->db.db = 0;
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2014-03-12 18:00:40 +08:00
|
|
|
if (qp->flags & MLX4_IB_QP_NETIF)
|
|
|
|
mlx4_ib_steer_qp_reg(dev, qp, 0);
|
|
|
|
}
|
2014-09-11 19:11:20 +08:00
|
|
|
if (qp->pri.smac || (!qp->pri.smac && qp->pri.smac_port)) {
|
2014-03-12 18:00:40 +08:00
|
|
|
mlx4_unregister_mac(dev->dev, qp->pri.smac_port, qp->pri.smac);
|
|
|
|
qp->pri.smac = 0;
|
2014-09-11 19:11:20 +08:00
|
|
|
qp->pri.smac_port = 0;
|
2014-03-12 18:00:40 +08:00
|
|
|
}
|
|
|
|
if (qp->alt.smac) {
|
|
|
|
mlx4_unregister_mac(dev->dev, qp->alt.smac_port, qp->alt.smac);
|
|
|
|
qp->alt.smac = 0;
|
|
|
|
}
|
|
|
|
if (qp->pri.vid < 0x1000) {
|
|
|
|
mlx4_unregister_vlan(dev->dev, qp->pri.vlan_port, qp->pri.vid);
|
|
|
|
qp->pri.vid = 0xFFFF;
|
|
|
|
qp->pri.candidate_vid = 0xFFFF;
|
|
|
|
qp->pri.update_vid = 0;
|
|
|
|
}
|
2013-11-07 21:25:17 +08:00
|
|
|
|
2014-03-12 18:00:40 +08:00
|
|
|
if (qp->alt.vid < 0x1000) {
|
|
|
|
mlx4_unregister_vlan(dev->dev, qp->alt.vlan_port, qp->alt.vid);
|
|
|
|
qp->alt.vid = 0xFFFF;
|
|
|
|
qp->alt.candidate_vid = 0xFFFF;
|
|
|
|
qp->alt.update_vid = 0;
|
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
|
|
|
out:
|
2015-10-15 19:44:41 +08:00
|
|
|
if (err && qp->counter_index)
|
|
|
|
mlx4_ib_free_qp_counter(dev, qp);
|
2013-11-07 21:25:17 +08:00
|
|
|
if (err && steer_qp)
|
|
|
|
mlx4_ib_steer_qp_reg(dev, qp, 0);
|
2007-05-09 09:00:38 +08:00
|
|
|
kfree(context);
|
2014-09-11 19:11:20 +08:00
|
|
|
if (qp->pri.candidate_smac ||
|
|
|
|
(!qp->pri.candidate_smac && qp->pri.candidate_smac_port)) {
|
2014-03-12 18:00:40 +08:00
|
|
|
if (err) {
|
|
|
|
mlx4_unregister_mac(dev->dev, qp->pri.candidate_smac_port, qp->pri.candidate_smac);
|
|
|
|
} else {
|
2014-09-11 19:11:20 +08:00
|
|
|
if (qp->pri.smac || (!qp->pri.smac && qp->pri.smac_port))
|
2014-03-12 18:00:40 +08:00
|
|
|
mlx4_unregister_mac(dev->dev, qp->pri.smac_port, qp->pri.smac);
|
|
|
|
qp->pri.smac = qp->pri.candidate_smac;
|
|
|
|
qp->pri.smac_index = qp->pri.candidate_smac_index;
|
|
|
|
qp->pri.smac_port = qp->pri.candidate_smac_port;
|
|
|
|
}
|
|
|
|
qp->pri.candidate_smac = 0;
|
|
|
|
qp->pri.candidate_smac_index = 0;
|
|
|
|
qp->pri.candidate_smac_port = 0;
|
|
|
|
}
|
|
|
|
if (qp->alt.candidate_smac) {
|
|
|
|
if (err) {
|
|
|
|
mlx4_unregister_mac(dev->dev, qp->alt.candidate_smac_port, qp->alt.candidate_smac);
|
|
|
|
} else {
|
|
|
|
if (qp->alt.smac)
|
|
|
|
mlx4_unregister_mac(dev->dev, qp->alt.smac_port, qp->alt.smac);
|
|
|
|
qp->alt.smac = qp->alt.candidate_smac;
|
|
|
|
qp->alt.smac_index = qp->alt.candidate_smac_index;
|
|
|
|
qp->alt.smac_port = qp->alt.candidate_smac_port;
|
|
|
|
}
|
|
|
|
qp->alt.candidate_smac = 0;
|
|
|
|
qp->alt.candidate_smac_index = 0;
|
|
|
|
qp->alt.candidate_smac_port = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (qp->pri.update_vid) {
|
|
|
|
if (err) {
|
|
|
|
if (qp->pri.candidate_vid < 0x1000)
|
|
|
|
mlx4_unregister_vlan(dev->dev, qp->pri.candidate_vlan_port,
|
|
|
|
qp->pri.candidate_vid);
|
|
|
|
} else {
|
|
|
|
if (qp->pri.vid < 0x1000)
|
|
|
|
mlx4_unregister_vlan(dev->dev, qp->pri.vlan_port,
|
|
|
|
qp->pri.vid);
|
|
|
|
qp->pri.vid = qp->pri.candidate_vid;
|
|
|
|
qp->pri.vlan_port = qp->pri.candidate_vlan_port;
|
|
|
|
qp->pri.vlan_index = qp->pri.candidate_vlan_index;
|
|
|
|
}
|
|
|
|
qp->pri.candidate_vid = 0xFFFF;
|
|
|
|
qp->pri.update_vid = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (qp->alt.update_vid) {
|
|
|
|
if (err) {
|
|
|
|
if (qp->alt.candidate_vid < 0x1000)
|
|
|
|
mlx4_unregister_vlan(dev->dev, qp->alt.candidate_vlan_port,
|
|
|
|
qp->alt.candidate_vid);
|
|
|
|
} else {
|
|
|
|
if (qp->alt.vid < 0x1000)
|
|
|
|
mlx4_unregister_vlan(dev->dev, qp->alt.vlan_port,
|
|
|
|
qp->alt.vid);
|
|
|
|
qp->alt.vid = qp->alt.candidate_vid;
|
|
|
|
qp->alt.vlan_port = qp->alt.candidate_vlan_port;
|
|
|
|
qp->alt.vlan_index = qp->alt.candidate_vlan_index;
|
|
|
|
}
|
|
|
|
qp->alt.candidate_vid = 0xFFFF;
|
|
|
|
qp->alt.update_vid = 0;
|
|
|
|
}
|
|
|
|
|
2007-05-09 09:00:38 +08:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2016-01-14 23:50:42 +08:00
|
|
|
static int _mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
|
|
|
|
int attr_mask, struct ib_udata *udata)
|
2007-05-14 12:26:51 +08:00
|
|
|
{
|
|
|
|
struct mlx4_ib_dev *dev = to_mdev(ibqp->device);
|
|
|
|
struct mlx4_ib_qp *qp = to_mqp(ibqp);
|
|
|
|
enum ib_qp_state cur_state, new_state;
|
|
|
|
int err = -EINVAL;
|
2013-12-13 00:03:14 +08:00
|
|
|
int ll;
|
2007-05-14 12:26:51 +08:00
|
|
|
mutex_lock(&qp->mutex);
|
|
|
|
|
|
|
|
cur_state = attr_mask & IB_QP_CUR_STATE ? attr->cur_qp_state : qp->state;
|
|
|
|
new_state = attr_mask & IB_QP_STATE ? attr->qp_state : cur_state;
|
|
|
|
|
2013-12-13 00:03:14 +08:00
|
|
|
if (cur_state == new_state && cur_state == IB_QPS_RESET) {
|
|
|
|
ll = IB_LINK_LAYER_UNSPECIFIED;
|
|
|
|
} else {
|
|
|
|
int port = attr_mask & IB_QP_PORT ? attr->port_num : qp->port;
|
|
|
|
ll = rdma_port_get_link_layer(&dev->ib_dev, port);
|
|
|
|
}
|
IB/core: Ethernet L2 attributes in verbs/cm structures
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.
When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.
Thus, those attributes were added to the following structures:
* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id
For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.
On the active side, the CM fills. its internal structures from the
path provided by the ULP. We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).
On the passive side, the CM fills its internal structures from the WC
associated with the REQ message. We add there taking the ETH L2
attributes from the WC.
When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.
ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB. Vendor drivers are modified to support the new
function signature.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-12-13 00:03:11 +08:00
|
|
|
|
|
|
|
if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type,
|
2013-12-13 00:03:14 +08:00
|
|
|
attr_mask, ll)) {
|
2012-06-19 16:21:35 +08:00
|
|
|
pr_debug("qpn 0x%x: invalid attribute mask specified "
|
|
|
|
"for transition %d to %d. qp_type %d,"
|
|
|
|
" attr_mask 0x%x\n",
|
|
|
|
ibqp->qp_num, cur_state, new_state,
|
|
|
|
ibqp->qp_type, attr_mask);
|
2007-05-14 12:26:51 +08:00
|
|
|
goto out;
|
2012-06-19 16:21:35 +08:00
|
|
|
}
|
2007-05-14 12:26:51 +08:00
|
|
|
|
2015-02-03 22:48:39 +08:00
|
|
|
if (mlx4_is_bonded(dev->dev) && (attr_mask & IB_QP_PORT)) {
|
|
|
|
if ((cur_state == IB_QPS_RESET) && (new_state == IB_QPS_INIT)) {
|
|
|
|
if ((ibqp->qp_type == IB_QPT_RC) ||
|
|
|
|
(ibqp->qp_type == IB_QPT_UD) ||
|
|
|
|
(ibqp->qp_type == IB_QPT_UC) ||
|
|
|
|
(ibqp->qp_type == IB_QPT_RAW_PACKET) ||
|
|
|
|
(ibqp->qp_type == IB_QPT_XRC_INI)) {
|
|
|
|
attr->port_num = mlx4_ib_bond_next_port(dev);
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
/* no sense in changing port_num
|
|
|
|
* when ports are bonded */
|
|
|
|
attr_mask &= ~IB_QP_PORT;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-05-14 12:26:51 +08:00
|
|
|
if ((attr_mask & IB_QP_PORT) &&
|
2012-08-03 16:40:40 +08:00
|
|
|
(attr->port_num == 0 || attr->port_num > dev->num_ports)) {
|
2012-06-19 16:21:35 +08:00
|
|
|
pr_debug("qpn 0x%x: invalid port number (%d) specified "
|
|
|
|
"for transition %d to %d. qp_type %d\n",
|
|
|
|
ibqp->qp_num, attr->port_num, cur_state,
|
|
|
|
new_state, ibqp->qp_type);
|
2007-05-14 12:26:51 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2012-01-17 19:39:07 +08:00
|
|
|
if ((attr_mask & IB_QP_PORT) && (ibqp->qp_type == IB_QPT_RAW_PACKET) &&
|
|
|
|
(rdma_port_get_link_layer(&dev->ib_dev, attr->port_num) !=
|
|
|
|
IB_LINK_LAYER_ETHERNET))
|
|
|
|
goto out;
|
|
|
|
|
2007-06-18 23:15:02 +08:00
|
|
|
if (attr_mask & IB_QP_PKEY_INDEX) {
|
|
|
|
int p = attr_mask & IB_QP_PORT ? attr->port_num : qp->port;
|
2012-06-19 16:21:35 +08:00
|
|
|
if (attr->pkey_index >= dev->dev->caps.pkey_table_len[p]) {
|
|
|
|
pr_debug("qpn 0x%x: invalid pkey index (%d) specified "
|
|
|
|
"for transition %d to %d. qp_type %d\n",
|
|
|
|
ibqp->qp_num, attr->pkey_index, cur_state,
|
|
|
|
new_state, ibqp->qp_type);
|
2007-06-18 23:15:02 +08:00
|
|
|
goto out;
|
2012-06-19 16:21:35 +08:00
|
|
|
}
|
2007-06-18 23:15:02 +08:00
|
|
|
}
|
|
|
|
|
2007-05-14 12:26:51 +08:00
|
|
|
if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC &&
|
|
|
|
attr->max_rd_atomic > dev->dev->caps.max_qp_init_rdma) {
|
2012-06-19 16:21:35 +08:00
|
|
|
pr_debug("qpn 0x%x: max_rd_atomic (%d) too large. "
|
|
|
|
"Transition %d to %d. qp_type %d\n",
|
|
|
|
ibqp->qp_num, attr->max_rd_atomic, cur_state,
|
|
|
|
new_state, ibqp->qp_type);
|
2007-05-14 12:26:51 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC &&
|
|
|
|
attr->max_dest_rd_atomic > dev->dev->caps.max_qp_dest_rdma) {
|
2012-06-19 16:21:35 +08:00
|
|
|
pr_debug("qpn 0x%x: max_dest_rd_atomic (%d) too large. "
|
|
|
|
"Transition %d to %d. qp_type %d\n",
|
|
|
|
ibqp->qp_num, attr->max_dest_rd_atomic, cur_state,
|
|
|
|
new_state, ibqp->qp_type);
|
2007-05-14 12:26:51 +08:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (cur_state == new_state && cur_state == IB_QPS_RESET) {
|
|
|
|
err = 0;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = __mlx4_ib_modify_qp(ibqp, attr, attr_mask, cur_state, new_state);
|
|
|
|
|
2015-02-03 22:48:39 +08:00
|
|
|
if (mlx4_is_bonded(dev->dev) && (attr_mask & IB_QP_PORT))
|
|
|
|
attr->port_num = 1;
|
|
|
|
|
2007-05-14 12:26:51 +08:00
|
|
|
out:
|
|
|
|
mutex_unlock(&qp->mutex);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2016-01-14 23:50:42 +08:00
|
|
|
int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
|
|
|
|
int attr_mask, struct ib_udata *udata)
|
|
|
|
{
|
|
|
|
struct mlx4_ib_qp *mqp = to_mqp(ibqp);
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = _mlx4_ib_modify_qp(ibqp, attr, attr_mask, udata);
|
|
|
|
|
|
|
|
if (mqp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI) {
|
|
|
|
struct mlx4_ib_sqp *sqp = to_msqp(mqp);
|
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
if (sqp->roce_v2_gsi)
|
|
|
|
err = ib_modify_qp(sqp->roce_v2_gsi, attr, attr_mask);
|
|
|
|
if (err)
|
|
|
|
pr_err("Failed to modify GSI QP for RoCEv2 (%d)\n",
|
|
|
|
err);
|
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-05-29 21:31:03 +08:00
|
|
|
static int vf_get_qp0_qkey(struct mlx4_dev *dev, int qpn, u32 *qkey)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
for (i = 0; i < dev->caps.num_ports; i++) {
|
|
|
|
if (qpn == dev->caps.qp0_proxy[i] ||
|
|
|
|
qpn == dev->caps.qp0_tunnel[i]) {
|
|
|
|
*qkey = dev->caps.qp0_qkey[i];
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
static int build_sriov_qp0_header(struct mlx4_ib_sqp *sqp,
|
2015-10-08 16:16:33 +08:00
|
|
|
struct ib_ud_wr *wr,
|
2012-08-03 16:40:40 +08:00
|
|
|
void *wqe, unsigned *mlx_seg_len)
|
|
|
|
{
|
|
|
|
struct mlx4_ib_dev *mdev = to_mdev(sqp->qp.ibqp.device);
|
|
|
|
struct ib_device *ib_dev = &mdev->ib_dev;
|
|
|
|
struct mlx4_wqe_mlx_seg *mlx = wqe;
|
|
|
|
struct mlx4_wqe_inline_seg *inl = wqe + sizeof *mlx;
|
2015-10-08 16:16:33 +08:00
|
|
|
struct mlx4_ib_ah *ah = to_mah(wr->ah);
|
2012-08-03 16:40:40 +08:00
|
|
|
u16 pkey;
|
|
|
|
u32 qkey;
|
|
|
|
int send_size;
|
|
|
|
int header_size;
|
|
|
|
int spc;
|
|
|
|
int i;
|
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
if (wr->wr.opcode != IB_WR_SEND)
|
2012-08-03 16:40:40 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
send_size = 0;
|
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
for (i = 0; i < wr->wr.num_sge; ++i)
|
|
|
|
send_size += wr->wr.sg_list[i].length;
|
2012-08-03 16:40:40 +08:00
|
|
|
|
|
|
|
/* for proxy-qp0 sends, need to add in size of tunnel header */
|
|
|
|
/* for tunnel-qp0 sends, tunnel header is already in s/g list */
|
|
|
|
if (sqp->qp.mlx4_ib_qp_type == MLX4_IB_QPT_PROXY_SMI_OWNER)
|
|
|
|
send_size += sizeof (struct mlx4_ib_tunnel_header);
|
|
|
|
|
2015-12-23 20:56:56 +08:00
|
|
|
ib_ud_header_init(send_size, 1, 0, 0, 0, 0, 0, 0, &sqp->ud_header);
|
2012-08-03 16:40:40 +08:00
|
|
|
|
|
|
|
if (sqp->qp.mlx4_ib_qp_type == MLX4_IB_QPT_PROXY_SMI_OWNER) {
|
|
|
|
sqp->ud_header.lrh.service_level =
|
|
|
|
be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
|
|
|
|
sqp->ud_header.lrh.destination_lid =
|
|
|
|
cpu_to_be16(ah->av.ib.g_slid & 0x7f);
|
|
|
|
sqp->ud_header.lrh.source_lid =
|
|
|
|
cpu_to_be16(ah->av.ib.g_slid & 0x7f);
|
|
|
|
}
|
|
|
|
|
|
|
|
mlx->flags &= cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE);
|
|
|
|
|
|
|
|
/* force loopback */
|
|
|
|
mlx->flags |= cpu_to_be32(MLX4_WQE_MLX_VL15 | 0x1 | MLX4_WQE_MLX_SLR);
|
|
|
|
mlx->rlid = sqp->ud_header.lrh.destination_lid;
|
|
|
|
|
|
|
|
sqp->ud_header.lrh.virtual_lane = 0;
|
2015-10-08 16:16:33 +08:00
|
|
|
sqp->ud_header.bth.solicited_event = !!(wr->wr.send_flags & IB_SEND_SOLICITED);
|
2012-08-03 16:40:40 +08:00
|
|
|
ib_get_cached_pkey(ib_dev, sqp->qp.port, 0, &pkey);
|
|
|
|
sqp->ud_header.bth.pkey = cpu_to_be16(pkey);
|
|
|
|
if (sqp->qp.mlx4_ib_qp_type == MLX4_IB_QPT_TUN_SMI_OWNER)
|
2015-10-08 16:16:33 +08:00
|
|
|
sqp->ud_header.bth.destination_qpn = cpu_to_be32(wr->remote_qpn);
|
2012-08-03 16:40:40 +08:00
|
|
|
else
|
|
|
|
sqp->ud_header.bth.destination_qpn =
|
mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
Previously, the structure of a guest's proxy QPs followed the
structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
qp1 port 2, ...). The guest then did offset calculations on the
sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
This is now changed so that the guest does no offset calculations
regarding proxy or tunnel QPs to use. This change frees the PPF from
needing to adhere to a specific order in allocating proxy and tunnel
QPs.
Now QUERY_FUNC_CAP provides each port individually with its proxy
qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
used directly where required (with no offset calculations).
To accomplish this change, several fields were added to the phys_caps
structure for use by the PPF and by non-SR-IOV mode:
base_sqpn -- in non-sriov mode, this was formerly sqp_start.
base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.
The current code in the PPF still adheres to the previous layout of
sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this
layout without affecting VF or (paravirtualized) PF code.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-03 16:40:57 +08:00
|
|
|
cpu_to_be32(mdev->dev->caps.qp0_tunnel[sqp->qp.port - 1]);
|
2012-08-03 16:40:40 +08:00
|
|
|
|
|
|
|
sqp->ud_header.bth.psn = cpu_to_be32((sqp->send_psn++) & ((1 << 24) - 1));
|
2014-05-29 21:31:03 +08:00
|
|
|
if (mlx4_is_master(mdev->dev)) {
|
|
|
|
if (mlx4_get_parav_qkey(mdev->dev, sqp->qp.mqp.qpn, &qkey))
|
|
|
|
return -EINVAL;
|
|
|
|
} else {
|
|
|
|
if (vf_get_qp0_qkey(mdev->dev, sqp->qp.mqp.qpn, &qkey))
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
2012-08-03 16:40:40 +08:00
|
|
|
sqp->ud_header.deth.qkey = cpu_to_be32(qkey);
|
|
|
|
sqp->ud_header.deth.source_qpn = cpu_to_be32(sqp->qp.mqp.qpn);
|
|
|
|
|
|
|
|
sqp->ud_header.bth.opcode = IB_OPCODE_UD_SEND_ONLY;
|
|
|
|
sqp->ud_header.immediate_present = 0;
|
|
|
|
|
|
|
|
header_size = ib_ud_header_pack(&sqp->ud_header, sqp->header_buf);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Inline data segments may not cross a 64 byte boundary. If
|
|
|
|
* our UD header is bigger than the space available up to the
|
|
|
|
* next 64 byte boundary in the WQE, use two inline data
|
|
|
|
* segments to hold the UD header.
|
|
|
|
*/
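/*
 * Worked example (illustrative numbers only, assuming MLX4_INLINE_ALIGN
 * is the 64-byte boundary referred to above): if the inline payload
 * starts 16 bytes into a 64-byte chunk, spc = 64 - 16 = 48.  A 40-byte
 * UD header then fits in a single inline segment, while a 72-byte
 * header is split into one 48-byte segment and one 24-byte segment.
 */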
|
|
|
|
spc = MLX4_INLINE_ALIGN -
|
|
|
|
((unsigned long) (inl + 1) & (MLX4_INLINE_ALIGN - 1));
|
|
|
|
if (header_size <= spc) {
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31 | header_size);
|
|
|
|
memcpy(inl + 1, sqp->header_buf, header_size);
|
|
|
|
i = 1;
|
|
|
|
} else {
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31 | spc);
|
|
|
|
memcpy(inl + 1, sqp->header_buf, spc);
|
|
|
|
|
|
|
|
inl = (void *) (inl + 1) + spc;
|
|
|
|
memcpy(inl + 1, sqp->header_buf + spc, header_size - spc);
|
|
|
|
/*
|
|
|
|
* Need a barrier here to make sure all the data is
|
|
|
|
* visible before the byte_count field is set.
|
|
|
|
* Otherwise the HCA prefetcher could grab the 64-byte
|
|
|
|
* chunk with this inline segment and get a valid (!=
|
|
|
|
* 0xffffffff) byte count but stale data, and end up
|
|
|
|
* generating a packet with bad headers.
|
|
|
|
*
|
|
|
|
* The first inline segment's byte_count field doesn't
|
|
|
|
* need a barrier, because it comes after a
|
|
|
|
* control/MLX segment and therefore is at an offset
|
|
|
|
* of 16 mod 64.
|
|
|
|
*/
|
|
|
|
wmb();
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31 | (header_size - spc));
|
|
|
|
i = 2;
|
|
|
|
}
|
|
|
|
|
|
|
|
*mlx_seg_len =
|
|
|
|
ALIGN(i * sizeof (struct mlx4_wqe_inline_seg) + header_size, 16);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-01-14 23:50:41 +08:00
|
|
|
#define MLX4_ROCEV2_QP1_SPORT 0xC000
|
2015-10-08 16:16:33 +08:00
|
|
|
static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_ud_wr *wr,
|
2008-04-17 12:09:28 +08:00
|
|
|
void *wqe, unsigned *mlx_seg_len)
|
2007-05-09 09:00:38 +08:00
|
|
|
{
|
2010-01-27 21:57:03 +08:00
|
|
|
struct ib_device *ib_dev = sqp->qp.ibqp.device;
|
2007-05-09 09:00:38 +08:00
|
|
|
struct mlx4_wqe_mlx_seg *mlx = wqe;
|
2014-03-12 18:00:37 +08:00
|
|
|
struct mlx4_wqe_ctrl_seg *ctrl = wqe;
|
2007-05-09 09:00:38 +08:00
|
|
|
struct mlx4_wqe_inline_seg *inl = wqe + sizeof *mlx;
|
2015-10-08 16:16:33 +08:00
|
|
|
struct mlx4_ib_ah *ah = to_mah(wr->ah);
|
2010-08-26 22:19:22 +08:00
|
|
|
union ib_gid sgid;
|
2007-05-09 09:00:38 +08:00
|
|
|
u16 pkey;
|
|
|
|
int send_size;
|
|
|
|
int header_size;
|
2007-06-19 00:23:47 +08:00
|
|
|
int spc;
|
2007-05-09 09:00:38 +08:00
|
|
|
int i;
|
2012-08-03 16:40:40 +08:00
|
|
|
int err = 0;
|
2013-02-26 01:17:13 +08:00
|
|
|
u16 vlan = 0xffff;
|
2013-02-26 01:02:03 +08:00
|
|
|
bool is_eth;
|
|
|
|
bool is_vlan = false;
|
|
|
|
bool is_grh;
|
2016-01-14 23:50:41 +08:00
|
|
|
bool is_udp = false;
|
|
|
|
int ip_version = 0;
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
send_size = 0;
|
2015-10-08 16:16:33 +08:00
|
|
|
for (i = 0; i < wr->wr.num_sge; ++i)
|
|
|
|
send_size += wr->wr.sg_list[i].length;
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2010-10-25 12:08:52 +08:00
|
|
|
is_eth = rdma_port_get_link_layer(sqp->qp.ibqp.device, sqp->qp.port) == IB_LINK_LAYER_ETHERNET;
|
|
|
|
is_grh = mlx4_ib_ah_grh_present(ah);
|
2010-08-26 22:19:22 +08:00
|
|
|
if (is_eth) {
|
2016-01-14 23:50:41 +08:00
|
|
|
struct ib_gid_attr gid_attr;
|
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
if (mlx4_is_mfunc(to_mdev(ib_dev)->dev)) {
|
|
|
|
/* When multi-function is enabled, the ib_core gid
|
|
|
|
* indexes don't necessarily match the hw ones, so
|
|
|
|
* we must use our own cache */
|
2014-03-12 18:00:37 +08:00
|
|
|
err = mlx4_get_roce_gid_from_slave(to_mdev(ib_dev)->dev,
|
|
|
|
be32_to_cpu(ah->av.ib.port_pd) >> 24,
|
|
|
|
ah->av.ib.gid_index, &sgid.raw[0]);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2012-08-03 16:40:40 +08:00
|
|
|
} else {
|
|
|
|
err = ib_get_cached_gid(ib_dev,
|
|
|
|
be32_to_cpu(ah->av.ib.port_pd) >> 24,
|
2015-10-15 23:38:45 +08:00
|
|
|
ah->av.ib.gid_index, &sgid,
|
2016-01-14 23:50:41 +08:00
|
|
|
&gid_attr);
|
|
|
|
if (!err) {
|
|
|
|
if (gid_attr.ndev)
|
|
|
|
dev_put(gid_attr.ndev);
|
|
|
|
if (!memcmp(&sgid, &zgid, sizeof(sgid)))
|
|
|
|
err = -ENOENT;
|
|
|
|
}
|
|
|
|
if (!err) {
|
|
|
|
is_udp = gid_attr.gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP;
|
|
|
|
if (is_udp) {
|
|
|
|
if (ipv6_addr_v4mapped((struct in6_addr *)&sgid))
|
|
|
|
ip_version = 4;
|
|
|
|
else
|
|
|
|
ip_version = 6;
|
|
|
|
is_grh = false;
|
|
|
|
}
|
|
|
|
} else {
|
2012-08-03 16:40:40 +08:00
|
|
|
return err;
|
2016-01-14 23:50:41 +08:00
|
|
|
}
|
2012-08-03 16:40:40 +08:00
|
|
|
}
|
2014-03-10 17:33:05 +08:00
|
|
|
if (ah->av.eth.vlan != cpu_to_be16(0xffff)) {
|
2013-12-13 00:03:14 +08:00
|
|
|
vlan = be16_to_cpu(ah->av.eth.vlan) & 0x0fff;
|
|
|
|
is_vlan = 1;
|
|
|
|
}
|
2010-08-26 22:19:22 +08:00
|
|
|
}
|
2015-12-23 20:56:56 +08:00
|
|
|
err = ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh,
|
2016-01-14 23:50:41 +08:00
|
|
|
ip_version, is_udp, 0, &sqp->ud_header);
|
2015-12-23 20:56:56 +08:00
|
|
|
if (err)
|
|
|
|
return err;
|
2010-10-25 12:08:52 +08:00
|
|
|
|
|
|
|
if (!is_eth) {
|
|
|
|
sqp->ud_header.lrh.service_level =
|
|
|
|
be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
|
|
|
|
sqp->ud_header.lrh.destination_lid = ah->av.ib.dlid;
|
|
|
|
sqp->ud_header.lrh.source_lid = cpu_to_be16(ah->av.ib.g_slid & 0x7f);
|
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2016-01-14 23:50:41 +08:00
|
|
|
if (is_grh || (ip_version == 6)) {
|
2007-05-09 09:00:38 +08:00
|
|
|
sqp->ud_header.grh.traffic_class =
|
2010-10-25 12:08:52 +08:00
|
|
|
(be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 20) & 0xff;
|
2007-05-09 09:00:38 +08:00
|
|
|
sqp->ud_header.grh.flow_label =
|
2010-10-25 12:08:52 +08:00
|
|
|
ah->av.ib.sl_tclass_flowlabel & cpu_to_be32(0xfffff);
|
|
|
|
sqp->ud_header.grh.hop_limit = ah->av.ib.hop_limit;
|
2014-03-12 18:00:37 +08:00
|
|
|
if (is_eth)
|
|
|
|
memcpy(sqp->ud_header.grh.source_gid.raw, sgid.raw, 16);
|
|
|
|
else {
|
2012-08-03 16:40:40 +08:00
|
|
|
if (mlx4_is_mfunc(to_mdev(ib_dev)->dev)) {
|
|
|
|
/* When multi-function is enabled, the ib_core gid
|
|
|
|
* indexes don't necessarily match the hw ones, so
|
|
|
|
* we must use our own cache */
|
|
|
|
sqp->ud_header.grh.source_gid.global.subnet_prefix =
|
|
|
|
to_mdev(ib_dev)->sriov.demux[sqp->qp.port - 1].
|
|
|
|
subnet_prefix;
|
|
|
|
sqp->ud_header.grh.source_gid.global.interface_id =
|
|
|
|
to_mdev(ib_dev)->sriov.demux[sqp->qp.port - 1].
|
|
|
|
guid_cache[ah->av.ib.gid_index];
|
|
|
|
} else
|
|
|
|
ib_get_cached_gid(ib_dev,
|
|
|
|
be32_to_cpu(ah->av.ib.port_pd) >> 24,
|
|
|
|
ah->av.ib.gid_index,
|
2015-10-15 23:38:45 +08:00
|
|
|
&sqp->ud_header.grh.source_gid, NULL);
|
2014-03-12 18:00:37 +08:00
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
memcpy(sqp->ud_header.grh.destination_gid.raw,
|
2010-10-25 12:08:52 +08:00
|
|
|
ah->av.ib.dgid, 16);
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
|
|
|
|
2016-01-14 23:50:41 +08:00
|
|
|
if (ip_version == 4) {
|
|
|
|
sqp->ud_header.ip4.tos =
|
|
|
|
(be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 20) & 0xff;
|
|
|
|
sqp->ud_header.ip4.id = 0;
|
|
|
|
sqp->ud_header.ip4.frag_off = htons(IP_DF);
|
|
|
|
sqp->ud_header.ip4.ttl = ah->av.eth.hop_limit;
|
|
|
|
|
|
|
|
memcpy(&sqp->ud_header.ip4.saddr,
|
|
|
|
sgid.raw + 12, 4);
|
|
|
|
memcpy(&sqp->ud_header.ip4.daddr, ah->av.ib.dgid + 12, 4);
|
|
|
|
sqp->ud_header.ip4.check = ib_ud_ip4_csum(&sqp->ud_header);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (is_udp) {
|
|
|
|
sqp->ud_header.udp.dport = htons(ROCE_V2_UDP_DPORT);
|
|
|
|
sqp->ud_header.udp.sport = htons(MLX4_ROCEV2_QP1_SPORT);
|
|
|
|
sqp->ud_header.udp.csum = 0;
|
|
|
|
}
|
|
|
|
|
2007-05-09 09:00:38 +08:00
|
|
|
mlx->flags &= cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE);
|
2010-10-25 12:08:52 +08:00
|
|
|
|
|
|
|
if (!is_eth) {
|
|
|
|
mlx->flags |= cpu_to_be32((!sqp->qp.ibqp.qp_num ? MLX4_WQE_MLX_VL15 : 0) |
|
|
|
|
(sqp->ud_header.lrh.destination_lid ==
|
|
|
|
IB_LID_PERMISSIVE ? MLX4_WQE_MLX_SLR : 0) |
|
|
|
|
(sqp->ud_header.lrh.service_level << 8));
|
2012-08-03 16:40:40 +08:00
|
|
|
if (ah->av.ib.port_pd & cpu_to_be32(0x80000000))
|
|
|
|
mlx->flags |= cpu_to_be32(0x1); /* force loopback */
|
2010-10-25 12:08:52 +08:00
|
|
|
mlx->rlid = sqp->ud_header.lrh.destination_lid;
|
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
switch (wr->wr.opcode) {
|
2007-05-09 09:00:38 +08:00
|
|
|
case IB_WR_SEND:
|
|
|
|
sqp->ud_header.bth.opcode = IB_OPCODE_UD_SEND_ONLY;
|
|
|
|
sqp->ud_header.immediate_present = 0;
|
|
|
|
break;
|
|
|
|
case IB_WR_SEND_WITH_IMM:
|
|
|
|
sqp->ud_header.bth.opcode = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE;
|
|
|
|
sqp->ud_header.immediate_present = 1;
|
2015-10-08 16:16:33 +08:00
|
|
|
sqp->ud_header.immediate_data = wr->wr.ex.imm_data;
|
2007-05-09 09:00:38 +08:00
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2010-10-25 12:08:52 +08:00
|
|
|
if (is_eth) {
|
2014-03-12 18:00:37 +08:00
|
|
|
struct in6_addr in6;
|
2016-01-14 23:50:41 +08:00
|
|
|
u16 ether_type;
|
2012-04-29 22:04:24 +08:00
|
|
|
u16 pcp = (be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 29) << 13;
|
|
|
|
|
2016-01-14 23:50:41 +08:00
|
|
|
ether_type = (!is_udp) ? MLX4_IB_IBOE_ETHERTYPE :
|
|
|
|
(ip_version == 4 ? ETH_P_IP : ETH_P_IPV6);
|
|
|
|
|
2012-04-29 22:04:24 +08:00
|
|
|
mlx->sched_prio = cpu_to_be16(pcp);
|
2010-10-25 12:08:52 +08:00
|
|
|
|
2016-01-14 23:47:38 +08:00
|
|
|
ether_addr_copy(sqp->ud_header.eth.smac_h, ah->av.eth.s_mac);
|
2010-10-25 12:08:52 +08:00
|
|
|
memcpy(sqp->ud_header.eth.dmac_h, ah->av.eth.mac, 6);
|
2014-03-12 18:00:37 +08:00
|
|
|
memcpy(&ctrl->srcrb_flags16[0], ah->av.eth.mac, 2);
|
|
|
|
memcpy(&ctrl->imm, ah->av.eth.mac + 2, 4);
|
|
|
|
memcpy(&in6, sgid.raw, sizeof(in6));
|
mlx4: Implement IP based gids support for RoCE/SRIOV
Since there is no connection between the MAC/VLAN and the GID
when using IP-based addressing, the proxy QP1 (running on the
slave) must pass the source-mac, destination-mac, and vlan_id
information separately from the GID. Additionally, the Host
must pass the remote source-mac and vlan_id back to the slave,
This is achieved as follows:
Outgoing MADs:
1. Source MAC: obtained from the CQ completion structure
(struct ib_wc, smac field).
2. Destination MAC: obtained from the tunnel header
3. vlan_id: obtained from the tunnel header.
Incoming MADs
1. The source (i.e., remote) MAC and vlan_id are passed in
the tunnel header to the proxy QP1.
VST mode support:
For outgoing MADs, the vlan_id obtained from the header is
discarded, and the vlan_id specified by the Hypervisor is used
instead.
For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the
"invalid" vlan (0xffff) is substituted when forwarding to the slave.
Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 18:00:41 +08:00
|
|
|
|
2014-09-11 19:11:17 +08:00
|
|
|
|
2010-10-25 12:08:52 +08:00
|
|
|
if (!memcmp(sqp->ud_header.eth.smac_h, sqp->ud_header.eth.dmac_h, 6))
|
|
|
|
mlx->flags |= cpu_to_be32(MLX4_WQE_CTRL_FORCE_LOOPBACK);
|
2010-08-26 22:19:22 +08:00
|
|
|
if (!is_vlan) {
|
2016-01-14 23:50:41 +08:00
|
|
|
sqp->ud_header.eth.type = cpu_to_be16(ether_type);
|
2010-08-26 22:19:22 +08:00
|
|
|
} else {
|
2016-01-14 23:50:41 +08:00
|
|
|
sqp->ud_header.vlan.type = cpu_to_be16(ether_type);
|
2010-08-26 22:19:22 +08:00
|
|
|
sqp->ud_header.vlan.tag = cpu_to_be16(vlan | pcp);
|
|
|
|
}
|
2010-10-25 12:08:52 +08:00
|
|
|
} else {
|
|
|
|
sqp->ud_header.lrh.virtual_lane = !sqp->qp.ibqp.qp_num ? 15 : 0;
|
|
|
|
if (sqp->ud_header.lrh.destination_lid == IB_LID_PERMISSIVE)
|
|
|
|
sqp->ud_header.lrh.source_lid = IB_LID_PERMISSIVE;
|
|
|
|
}
|
2015-10-08 16:16:33 +08:00
|
|
|
sqp->ud_header.bth.solicited_event = !!(wr->wr.send_flags & IB_SEND_SOLICITED);
|
2007-05-09 09:00:38 +08:00
|
|
|
if (!sqp->qp.ibqp.qp_num)
|
|
|
|
ib_get_cached_pkey(ib_dev, sqp->qp.port, sqp->pkey_index, &pkey);
|
|
|
|
else
|
2015-10-08 16:16:33 +08:00
|
|
|
ib_get_cached_pkey(ib_dev, sqp->qp.port, wr->pkey_index, &pkey);
|
2007-05-09 09:00:38 +08:00
|
|
|
sqp->ud_header.bth.pkey = cpu_to_be16(pkey);
|
2015-10-08 16:16:33 +08:00
|
|
|
sqp->ud_header.bth.destination_qpn = cpu_to_be32(wr->remote_qpn);
|
2007-05-09 09:00:38 +08:00
|
|
|
sqp->ud_header.bth.psn = cpu_to_be32((sqp->send_psn++) & ((1 << 24) - 1));
|
2015-10-08 16:16:33 +08:00
|
|
|
sqp->ud_header.deth.qkey = cpu_to_be32(wr->remote_qkey & 0x80000000 ?
|
|
|
|
sqp->qkey : wr->remote_qkey);
|
2007-05-09 09:00:38 +08:00
|
|
|
sqp->ud_header.deth.source_qpn = cpu_to_be32(sqp->qp.ibqp.qp_num);
|
|
|
|
|
|
|
|
header_size = ib_ud_header_pack(&sqp->ud_header, sqp->header_buf);
|
|
|
|
|
|
|
|
if (0) {
|
2012-04-29 22:04:26 +08:00
|
|
|
pr_err("built UD header of size %d:\n", header_size);
|
2007-05-09 09:00:38 +08:00
|
|
|
for (i = 0; i < header_size / 4; ++i) {
|
|
|
|
if (i % 8 == 0)
|
2012-04-29 22:04:26 +08:00
|
|
|
pr_err(" [%02x] ", i * 4);
|
|
|
|
pr_cont(" %08x",
|
|
|
|
be32_to_cpu(((__be32 *) sqp->header_buf)[i]));
|
2007-05-09 09:00:38 +08:00
|
|
|
if ((i + 1) % 8 == 0)
|
2012-04-29 22:04:26 +08:00
|
|
|
pr_cont("\n");
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
2012-04-29 22:04:26 +08:00
|
|
|
pr_err("\n");
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
|
|
|
|
2007-06-19 00:23:47 +08:00
|
|
|
/*
|
|
|
|
* Inline data segments may not cross a 64 byte boundary. If
|
|
|
|
* our UD header is bigger than the space available up to the
|
|
|
|
* next 64 byte boundary in the WQE, use two inline data
|
|
|
|
* segments to hold the UD header.
|
|
|
|
*/
|
|
|
|
spc = MLX4_INLINE_ALIGN -
|
|
|
|
((unsigned long) (inl + 1) & (MLX4_INLINE_ALIGN - 1));
|
|
|
|
if (header_size <= spc) {
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31 | header_size);
|
|
|
|
memcpy(inl + 1, sqp->header_buf, header_size);
|
|
|
|
i = 1;
|
|
|
|
} else {
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31 | spc);
|
|
|
|
memcpy(inl + 1, sqp->header_buf, spc);
|
|
|
|
|
|
|
|
inl = (void *) (inl + 1) + spc;
|
|
|
|
memcpy(inl + 1, sqp->header_buf + spc, header_size - spc);
|
|
|
|
/*
|
|
|
|
* Need a barrier here to make sure all the data is
|
|
|
|
* visible before the byte_count field is set.
|
|
|
|
* Otherwise the HCA prefetcher could grab the 64-byte
|
|
|
|
* chunk with this inline segment and get a valid (!=
|
|
|
|
* 0xffffffff) byte count but stale data, and end up
|
|
|
|
* generating a packet with bad headers.
|
|
|
|
*
|
|
|
|
* The first inline segment's byte_count field doesn't
|
|
|
|
* need a barrier, because it comes after a
|
|
|
|
* control/MLX segment and therefore is at an offset
|
|
|
|
* of 16 mod 64.
|
|
|
|
*/
|
|
|
|
wmb();
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31 | (header_size - spc));
|
|
|
|
i = 2;
|
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2008-04-17 12:09:28 +08:00
|
|
|
*mlx_seg_len =
|
|
|
|
ALIGN(i * sizeof (struct mlx4_wqe_inline_seg) + header_size, 16);
|
|
|
|
return 0;
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq)
|
|
|
|
{
|
|
|
|
unsigned cur;
|
|
|
|
struct mlx4_ib_cq *cq;
|
|
|
|
|
|
|
|
cur = wq->head - wq->tail;
|
2007-06-18 23:13:48 +08:00
|
|
|
if (likely(cur + nreq < wq->max_post))
|
2007-05-09 09:00:38 +08:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
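/*
 * Slow path: re-read the producer/consumer indices under the CQ lock,
 * so that a completion that is concurrently advancing wq->tail is
 * observed before we report an overflow.
 */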
cq = to_mcq(ib_cq);
|
|
|
|
spin_lock(&cq->lock);
|
|
|
|
cur = wq->head - wq->tail;
|
|
|
|
spin_unlock(&cq->lock);
|
|
|
|
|
2007-06-18 23:13:48 +08:00
|
|
|
return cur + nreq >= wq->max_post;
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
|
|
|
|
2008-07-23 23:12:26 +08:00
|
|
|
static __be32 convert_access(int acc)
|
|
|
|
{
|
2013-02-07 00:19:15 +08:00
|
|
|
return (acc & IB_ACCESS_REMOTE_ATOMIC ?
|
|
|
|
cpu_to_be32(MLX4_WQE_FMR_AND_BIND_PERM_ATOMIC) : 0) |
|
|
|
|
(acc & IB_ACCESS_REMOTE_WRITE ?
|
|
|
|
cpu_to_be32(MLX4_WQE_FMR_AND_BIND_PERM_REMOTE_WRITE) : 0) |
|
|
|
|
(acc & IB_ACCESS_REMOTE_READ ?
|
|
|
|
cpu_to_be32(MLX4_WQE_FMR_AND_BIND_PERM_REMOTE_READ) : 0) |
|
2008-07-23 23:12:26 +08:00
|
|
|
(acc & IB_ACCESS_LOCAL_WRITE ? cpu_to_be32(MLX4_WQE_FMR_PERM_LOCAL_WRITE) : 0) |
|
|
|
|
cpu_to_be32(MLX4_WQE_FMR_PERM_LOCAL_READ);
|
|
|
|
}
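/*
 * Illustrative mapping (derived directly from the code above): e.g.
 * convert_access(IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE)
 * returns MLX4_WQE_FMR_AND_BIND_PERM_REMOTE_WRITE |
 * MLX4_WQE_FMR_PERM_LOCAL_WRITE | MLX4_WQE_FMR_PERM_LOCAL_READ,
 * already in big-endian WQE form; local read permission is always
 * included.
 */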
|
|
|
|
|
2015-10-14 00:11:27 +08:00
|
|
|
static void set_reg_seg(struct mlx4_wqe_fmr_seg *fseg,
|
|
|
|
struct ib_reg_wr *wr)
|
|
|
|
{
|
|
|
|
struct mlx4_ib_mr *mr = to_mmr(wr->mr);
|
|
|
|
|
|
|
|
fseg->flags = convert_access(wr->access);
|
|
|
|
fseg->mem_key = cpu_to_be32(wr->key);
|
|
|
|
fseg->buf_list = cpu_to_be64(mr->page_map);
|
|
|
|
fseg->start_addr = cpu_to_be64(mr->ibmr.iova);
|
|
|
|
fseg->reg_len = cpu_to_be64(mr->ibmr.length);
|
|
|
|
fseg->offset = 0; /* XXX -- is this just for ZBVA? */
|
|
|
|
fseg->page_size = cpu_to_be32(ilog2(mr->ibmr.page_size));
|
|
|
|
fseg->reserved[0] = 0;
|
|
|
|
fseg->reserved[1] = 0;
|
|
|
|
}
|
|
|
|
|
2008-07-23 23:12:26 +08:00
|
|
|
static void set_local_inv_seg(struct mlx4_wqe_local_inval_seg *iseg, u32 rkey)
|
|
|
|
{
|
2013-02-07 00:19:07 +08:00
|
|
|
memset(iseg, 0, sizeof(*iseg));
|
|
|
|
iseg->mem_key = cpu_to_be32(rkey);
|
2008-07-23 23:12:26 +08:00
|
|
|
}
|
|
|
|
|
2007-07-19 02:47:55 +08:00
|
|
|
static __always_inline void set_raddr_seg(struct mlx4_wqe_raddr_seg *rseg,
|
|
|
|
u64 remote_addr, u32 rkey)
|
|
|
|
{
|
|
|
|
rseg->raddr = cpu_to_be64(remote_addr);
|
|
|
|
rseg->rkey = cpu_to_be32(rkey);
|
|
|
|
rseg->reserved = 0;
|
|
|
|
}
|
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
static void set_atomic_seg(struct mlx4_wqe_atomic_seg *aseg,
|
|
|
|
struct ib_atomic_wr *wr)
|
2007-07-19 02:47:55 +08:00
|
|
|
{
|
2015-10-08 16:16:33 +08:00
|
|
|
if (wr->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP) {
|
|
|
|
aseg->swap_add = cpu_to_be64(wr->swap);
|
|
|
|
aseg->compare = cpu_to_be64(wr->compare_add);
|
|
|
|
} else if (wr->wr.opcode == IB_WR_MASKED_ATOMIC_FETCH_AND_ADD) {
|
|
|
|
aseg->swap_add = cpu_to_be64(wr->compare_add);
|
|
|
|
aseg->compare = cpu_to_be64(wr->compare_add_mask);
|
2007-07-19 02:47:55 +08:00
|
|
|
} else {
|
2015-10-08 16:16:33 +08:00
|
|
|
aseg->swap_add = cpu_to_be64(wr->compare_add);
|
2007-07-19 02:47:55 +08:00
|
|
|
aseg->compare = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
}
|
|
|
|
|
2010-04-14 22:23:39 +08:00
|
|
|
static void set_masked_atomic_seg(struct mlx4_wqe_masked_atomic_seg *aseg,
|
2015-10-08 16:16:33 +08:00
|
|
|
struct ib_atomic_wr *wr)
|
2010-04-14 22:23:39 +08:00
|
|
|
{
|
2015-10-08 16:16:33 +08:00
|
|
|
aseg->swap_add = cpu_to_be64(wr->swap);
|
|
|
|
aseg->swap_add_mask = cpu_to_be64(wr->swap_mask);
|
|
|
|
aseg->compare = cpu_to_be64(wr->compare_add);
|
|
|
|
aseg->compare_mask = cpu_to_be64(wr->compare_add_mask);
|
2010-04-14 22:23:39 +08:00
|
|
|
}
|
|
|
|
|
2007-07-19 02:47:55 +08:00
|
|
|
static void set_datagram_seg(struct mlx4_wqe_datagram_seg *dseg,
|
2015-10-08 16:16:33 +08:00
|
|
|
struct ib_ud_wr *wr)
|
2007-07-19 02:47:55 +08:00
|
|
|
{
|
2015-10-08 16:16:33 +08:00
|
|
|
memcpy(dseg->av, &to_mah(wr->ah)->av, sizeof (struct mlx4_av));
|
|
|
|
dseg->dqpn = cpu_to_be32(wr->remote_qpn);
|
|
|
|
dseg->qkey = cpu_to_be32(wr->remote_qkey);
|
|
|
|
dseg->vlan = to_mah(wr->ah)->av.eth.vlan;
|
|
|
|
memcpy(dseg->mac, to_mah(wr->ah)->av.eth.mac, 6);
|
2007-07-19 02:47:55 +08:00
|
|
|
}
|
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
static void set_tunnel_datagram_seg(struct mlx4_ib_dev *dev,
|
|
|
|
struct mlx4_wqe_datagram_seg *dseg,
|
2015-10-08 16:16:33 +08:00
|
|
|
struct ib_ud_wr *wr,
|
2014-05-29 21:31:02 +08:00
|
|
|
enum mlx4_ib_qp_type qpt)
|
2012-08-03 16:40:40 +08:00
|
|
|
{
|
2015-10-08 16:16:33 +08:00
|
|
|
union mlx4_ext_av *av = &to_mah(wr->ah)->av;
|
2012-08-03 16:40:40 +08:00
|
|
|
struct mlx4_av sqp_av = {0};
|
|
|
|
int port = *((u8 *) &av->ib.port_pd) & 0x3;
|
|
|
|
|
|
|
|
/* force loopback */
|
|
|
|
sqp_av.port_pd = av->ib.port_pd | cpu_to_be32(0x80000000);
|
|
|
|
sqp_av.g_slid = av->ib.g_slid & 0x7f; /* no GRH */
|
|
|
|
sqp_av.sl_tclass_flowlabel = av->ib.sl_tclass_flowlabel &
|
|
|
|
cpu_to_be32(0xf0000000);
|
|
|
|
|
|
|
|
memcpy(dseg->av, &sqp_av, sizeof (struct mlx4_av));
|
2014-05-29 21:31:02 +08:00
|
|
|
if (qpt == MLX4_IB_QPT_PROXY_GSI)
|
|
|
|
dseg->dqpn = cpu_to_be32(dev->dev->caps.qp1_tunnel[port - 1]);
|
|
|
|
else
|
|
|
|
dseg->dqpn = cpu_to_be32(dev->dev->caps.qp0_tunnel[port - 1]);
|
mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
Previously, the structure of a guest's proxy QPs followed the
structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
qp1 port 2, ...). The guest then did offset calculations on the
sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
This is now changed so that the guest does no offset calculations
regarding proxy or tunnel QPs to use. This change frees the PPF from
needing to adhere to a specific order in allocating proxy and tunnel
QPs.
Now QUERY_FUNC_CAP provides each port individually with its proxy
qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
used directly where required (with no offset calculations).
To accomplish this change, several fields were added to the phys_caps
structure for use by the PPF and by non-SR-IOV mode:
base_sqpn -- in non-sriov mode, this was formerly sqp_start.
base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.
The current code in the PPF still adheres to the previous layout of
sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this
layout without affecting VF or (paravirtualized) PF code.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-03 16:40:57 +08:00
|
|
|
/* Use QKEY from the QP context, which is set by master */
|
|
|
|
dseg->qkey = cpu_to_be32(IB_QP_SET_QKEY);
|
2012-08-03 16:40:40 +08:00
|
|
|
}
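/*
 * For proxied special QPs the datagram segment above is built from a copy
 * of the AH with the force-loopback bit set in port_pd and the GRH
 * stripped, and the destination QP number is replaced by the per-port
 * qp0/qp1 tunnel QP taken from the device caps.  The QKEY is deliberately
 * IB_QP_SET_QKEY so the value configured by the master in the QP context
 * is used instead of a caller-supplied one.
 */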
|
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
static void build_tunnel_header(struct ib_ud_wr *wr, void *wqe, unsigned *mlx_seg_len)
|
2012-08-03 16:40:40 +08:00
|
|
|
{
|
|
|
|
struct mlx4_wqe_inline_seg *inl = wqe;
|
|
|
|
struct mlx4_ib_tunnel_header hdr;
|
2015-10-08 16:16:33 +08:00
|
|
|
struct mlx4_ib_ah *ah = to_mah(wr->ah);
|
2012-08-03 16:40:40 +08:00
|
|
|
int spc;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
memcpy(&hdr.av, &ah->av, sizeof hdr.av);
|
2015-10-08 16:16:33 +08:00
|
|
|
hdr.remote_qpn = cpu_to_be32(wr->remote_qpn);
|
|
|
|
hdr.pkey_index = cpu_to_be16(wr->pkey_index);
|
|
|
|
hdr.qkey = cpu_to_be32(wr->remote_qkey);
|
mlx4: Implement IP based gids support for RoCE/SRIOV
Since there is no connection between the MAC/VLAN and the GID
when using IP-based addressing, the proxy QP1 (running on the
slave) must pass the source-mac, destination-mac, and vlan_id
information separately from the GID. Additionally, the Host
must pass the remote source-mac and vlan_id back to the slave.
This is achieved as follows:
Outgoing MADs:
1. Source MAC: obtained from the CQ completion structure
(struct ib_wc, smac field).
2. Destination MAC: obtained from the tunnel header
3. vlan_id: obtained from the tunnel header.
Incoming MADs
1. The source (i.e., remote) MAC and vlan_id are passed in
the tunnel header to the proxy QP1.
VST mode support:
For outgoing MADs, the vlan_id obtained from the header is
discarded, and the vlan_id specified by the Hypervisor is used
instead.
For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the
"invalid" vlan (0xffff) is substituted when forwarding to the slave.
Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 18:00:41 +08:00
|
|
|
memcpy(hdr.mac, ah->av.eth.mac, 6);
|
|
|
|
hdr.vlan = ah->av.eth.vlan;
|
2012-08-03 16:40:40 +08:00
|
|
|
|
|
|
|
spc = MLX4_INLINE_ALIGN -
|
|
|
|
((unsigned long) (inl + 1) & (MLX4_INLINE_ALIGN - 1));
|
|
|
|
if (sizeof (hdr) <= spc) {
|
|
|
|
memcpy(inl + 1, &hdr, sizeof (hdr));
|
|
|
|
wmb();
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31 | sizeof (hdr));
|
|
|
|
i = 1;
|
|
|
|
} else {
|
|
|
|
memcpy(inl + 1, &hdr, spc);
|
|
|
|
wmb();
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31 | spc);
|
|
|
|
|
|
|
|
inl = (void *) (inl + 1) + spc;
|
|
|
|
memcpy(inl + 1, (void *) &hdr + spc, sizeof (hdr) - spc);
|
|
|
|
wmb();
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31 | (sizeof (hdr) - spc));
|
|
|
|
i = 2;
|
|
|
|
}
|
|
|
|
|
|
|
|
*mlx_seg_len =
|
|
|
|
ALIGN(i * sizeof (struct mlx4_wqe_inline_seg) + sizeof (hdr), 16);
|
|
|
|
}
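/*
 * build_tunnel_header() above packs a struct mlx4_ib_tunnel_header (AH,
 * remote QPN/QKEY, pkey index, MAC and VLAN) as inline data.  If the
 * header would cross an MLX4_INLINE_ALIGN boundary it is split across two
 * inline segments; either way each byte_count word (bit 31 = inline flag)
 * is written only after a wmb(), so the HCA prefetcher can never see a
 * valid count in front of stale payload bytes.
 */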
|
|
|
|
|
2007-09-20 00:52:25 +08:00
|
|
|
static void set_mlx_icrc_seg(void *dseg)
|
|
|
|
{
|
|
|
|
u32 *t = dseg;
|
|
|
|
struct mlx4_wqe_inline_seg *iseg = dseg;
|
|
|
|
|
|
|
|
t[1] = 0;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Need a barrier here before writing the byte_count field to
|
|
|
|
* make sure that all the data is visible before the
|
|
|
|
* byte_count field is set. Otherwise, if the segment begins
|
|
|
|
* a new cacheline, the HCA prefetcher could grab the 64-byte
|
|
|
|
* chunk and get a valid (!= 0xffffffff) byte count but
|
|
|
|
* stale data, and end up sending the wrong data.
|
|
|
|
*/
|
|
|
|
wmb();
|
|
|
|
|
|
|
|
iseg->byte_count = cpu_to_be32((1 << 31) | 4);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg)
|
2007-07-19 02:46:27 +08:00
|
|
|
{
|
|
|
|
dseg->lkey = cpu_to_be32(sg->lkey);
|
|
|
|
dseg->addr = cpu_to_be64(sg->addr);
|
2007-09-20 00:52:25 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Need a barrier here before writing the byte_count field to
|
|
|
|
* make sure that all the data is visible before the
|
|
|
|
* byte_count field is set. Otherwise, if the segment begins
|
|
|
|
* a new cacheline, the HCA prefetcher could grab the 64-byte
|
|
|
|
* chunk and get a valid (!= 0xffffffff) byte count but
|
|
|
|
* stale data, and end up sending the wrong data.
|
|
|
|
*/
|
|
|
|
wmb();
|
|
|
|
|
|
|
|
dseg->byte_count = cpu_to_be32(sg->length);
|
2007-07-19 02:46:27 +08:00
|
|
|
}
|
|
|
|
|
2007-10-10 10:59:05 +08:00
|
|
|
static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg)
|
|
|
|
{
|
|
|
|
dseg->byte_count = cpu_to_be32(sg->length);
|
|
|
|
dseg->lkey = cpu_to_be32(sg->lkey);
|
|
|
|
dseg->addr = cpu_to_be64(sg->addr);
|
|
|
|
}
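/*
 * Two data-segment writers: set_data_seg() is used on the send queue and
 * deliberately writes byte_count last, behind a barrier, because the HCA
 * may prefetch a cacheline as soon as it looks valid (see the comment in
 * that function); __set_data_seg() is used on the receive queue, where no
 * such ordering hazard exists, so it fills the fields in natural order.
 */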
|
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
static int build_lso_seg(struct mlx4_wqe_lso_seg *wqe, struct ib_ud_wr *wr,
|
2009-01-17 04:47:47 +08:00
|
|
|
struct mlx4_ib_qp *qp, unsigned *lso_seg_len,
|
2009-11-13 03:19:44 +08:00
|
|
|
__be32 *lso_hdr_sz, __be32 *blh)
|
2008-04-17 12:09:27 +08:00
|
|
|
{
|
2015-10-08 16:16:33 +08:00
|
|
|
unsigned halign = ALIGN(sizeof *wqe + wr->hlen, 16);
|
2008-04-17 12:09:27 +08:00
|
|
|
|
2009-11-13 03:19:44 +08:00
|
|
|
if (unlikely(halign > MLX4_IB_CACHE_LINE_SIZE))
|
|
|
|
*blh = cpu_to_be32(1 << 6);
|
2008-04-17 12:09:27 +08:00
|
|
|
|
|
|
|
if (unlikely(!(qp->flags & MLX4_IB_QP_LSO) &&
|
2015-10-08 16:16:33 +08:00
|
|
|
wr->wr.num_sge > qp->sq.max_gs - (halign >> 4)))
|
2008-04-17 12:09:27 +08:00
|
|
|
return -EINVAL;
|
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
memcpy(wqe->header, wr->header, wr->hlen);
|
2008-04-17 12:09:27 +08:00
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
*lso_hdr_sz = cpu_to_be32(wr->mss << 16 | wr->hlen);
|
2008-04-17 12:09:27 +08:00
|
|
|
*lso_seg_len = halign;
|
|
|
|
return 0;
|
|
|
|
}
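/*
 * build_lso_seg() above copies wr->hlen bytes of LSO header into the WQE,
 * padded to a 16-byte multiple.  If the resulting segment is larger than
 * MLX4_IB_CACHE_LINE_SIZE the "big LSO header" bit is reported back via
 * *blh and later folded into owner_opcode.  *lso_hdr_sz (MSS and header
 * length) is not written into the WQE here; mlx4_ib_post_send() patches it
 * in through lso_wqe only after all data segments have been written.
 */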
|
|
|
|
|
2008-07-23 23:12:26 +08:00
|
|
|
static __be32 send_ieth(struct ib_send_wr *wr)
|
|
|
|
{
|
|
|
|
switch (wr->opcode) {
|
|
|
|
case IB_WR_SEND_WITH_IMM:
|
|
|
|
case IB_WR_RDMA_WRITE_WITH_IMM:
|
|
|
|
return wr->ex.imm_data;
|
|
|
|
|
|
|
|
case IB_WR_SEND_WITH_INV:
|
|
|
|
return cpu_to_be32(wr->ex.invalidate_rkey);
|
|
|
|
|
|
|
|
default:
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
}
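/*
 * send_ieth() above selects the control segment's immediate field: the
 * caller's immediate data for the *_WITH_IMM opcodes, the rkey to
 * invalidate for IB_WR_SEND_WITH_INV, and zero otherwise.
 */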
|
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
static void add_zero_len_inline(void *wqe)
|
|
|
|
{
|
|
|
|
struct mlx4_wqe_inline_seg *inl = wqe;
|
|
|
|
memset(wqe, 0, 16);
|
|
|
|
inl->byte_count = cpu_to_be32(1 << 31);
|
|
|
|
}
|
|
|
|
|
2007-05-09 09:00:38 +08:00
|
|
|
int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
|
|
|
|
struct ib_send_wr **bad_wr)
|
|
|
|
{
|
|
|
|
struct mlx4_ib_qp *qp = to_mqp(ibqp);
|
|
|
|
void *wqe;
|
|
|
|
struct mlx4_wqe_ctrl_seg *ctrl;
|
2007-09-20 00:52:25 +08:00
|
|
|
struct mlx4_wqe_data_seg *dseg;
|
2007-05-09 09:00:38 +08:00
|
|
|
unsigned long flags;
|
|
|
|
int nreq;
|
|
|
|
int err = 0;
|
2008-01-28 16:40:59 +08:00
|
|
|
unsigned ind;
|
|
|
|
int uninitialized_var(stamp);
|
|
|
|
int uninitialized_var(size);
|
2008-05-17 05:28:30 +08:00
|
|
|
unsigned uninitialized_var(seglen);
|
2009-01-17 04:47:47 +08:00
|
|
|
__be32 dummy;
|
|
|
|
__be32 *lso_wqe;
|
|
|
|
__be32 uninitialized_var(lso_hdr_sz);
|
2009-11-13 03:19:44 +08:00
|
|
|
__be32 blh;
|
2007-05-09 09:00:38 +08:00
|
|
|
int i;
|
2015-02-08 17:49:34 +08:00
|
|
|
struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2016-01-14 23:50:42 +08:00
|
|
|
if (qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI) {
|
|
|
|
struct mlx4_ib_sqp *sqp = to_msqp(qp);
|
|
|
|
|
|
|
|
if (sqp->roce_v2_gsi) {
|
|
|
|
struct mlx4_ib_ah *ah = to_mah(ud_wr(wr)->ah);
|
|
|
|
struct ib_gid_attr gid_attr;
|
|
|
|
union ib_gid gid;
|
|
|
|
|
|
|
|
if (!ib_get_cached_gid(ibqp->device,
|
|
|
|
be32_to_cpu(ah->av.ib.port_pd) >> 24,
|
|
|
|
ah->av.ib.gid_index, &gid,
|
|
|
|
&gid_attr)) {
|
|
|
|
if (gid_attr.ndev)
|
|
|
|
dev_put(gid_attr.ndev);
|
|
|
|
qp = (gid_attr.gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) ?
|
|
|
|
to_mqp(sqp->roce_v2_gsi) : qp;
|
|
|
|
} else {
|
|
|
|
pr_err("Failed to get gid at index %d. RoCEv2 will not work properly\n",
|
|
|
|
ah->av.ib.gid_index);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-10-31 01:53:54 +08:00
|
|
|
spin_lock_irqsave(&qp->sq.lock, flags);
|
2015-02-08 17:49:34 +08:00
|
|
|
if (mdev->dev->persist->state & MLX4_DEVICE_STATE_INTERNAL_ERROR) {
|
|
|
|
err = -EIO;
|
|
|
|
*bad_wr = wr;
|
|
|
|
nreq = 0;
|
|
|
|
goto out;
|
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2008-01-28 16:40:59 +08:00
|
|
|
ind = qp->sq_next_wqe;
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
for (nreq = 0; wr; ++nreq, wr = wr->next) {
|
2009-01-17 04:47:47 +08:00
|
|
|
lso_wqe = &dummy;
|
2009-11-13 03:19:44 +08:00
|
|
|
blh = 0;
|
2009-01-17 04:47:47 +08:00
|
|
|
|
2007-05-09 09:00:38 +08:00
|
|
|
if (mlx4_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq)) {
|
|
|
|
err = -ENOMEM;
|
|
|
|
*bad_wr = wr;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (unlikely(wr->num_sge > qp->sq.max_gs)) {
|
|
|
|
err = -EINVAL;
|
|
|
|
*bad_wr = wr;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2007-06-18 23:13:48 +08:00
|
|
|
ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1));
|
2008-01-28 16:40:59 +08:00
|
|
|
qp->sq.wrid[(qp->sq.head + nreq) & (qp->sq.wqe_cnt - 1)] = wr->wr_id;
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
ctrl->srcrb_flags =
|
|
|
|
(wr->send_flags & IB_SEND_SIGNALED ?
|
|
|
|
cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE) : 0) |
|
|
|
|
(wr->send_flags & IB_SEND_SOLICITED ?
|
|
|
|
cpu_to_be32(MLX4_WQE_CTRL_SOLICITED) : 0) |
|
2008-04-17 12:01:10 +08:00
|
|
|
((wr->send_flags & IB_SEND_IP_CSUM) ?
|
|
|
|
cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM |
|
|
|
|
MLX4_WQE_CTRL_TCP_UDP_CSUM) : 0) |
|
2007-05-09 09:00:38 +08:00
|
|
|
qp->sq_signal_bits;
|
|
|
|
|
2008-07-23 23:12:26 +08:00
|
|
|
ctrl->imm = send_ieth(wr);
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
wqe += sizeof *ctrl;
|
|
|
|
size = sizeof *ctrl / 16;
|
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
switch (qp->mlx4_ib_qp_type) {
|
|
|
|
case MLX4_IB_QPT_RC:
|
|
|
|
case MLX4_IB_QPT_UC:
|
2007-05-09 09:00:38 +08:00
|
|
|
switch (wr->opcode) {
|
|
|
|
case IB_WR_ATOMIC_CMP_AND_SWP:
|
|
|
|
case IB_WR_ATOMIC_FETCH_AND_ADD:
|
2010-04-14 22:23:39 +08:00
|
|
|
case IB_WR_MASKED_ATOMIC_FETCH_AND_ADD:
|
2015-10-08 16:16:33 +08:00
|
|
|
set_raddr_seg(wqe, atomic_wr(wr)->remote_addr,
|
|
|
|
atomic_wr(wr)->rkey);
|
2007-05-09 09:00:38 +08:00
|
|
|
wqe += sizeof (struct mlx4_wqe_raddr_seg);
|
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
set_atomic_seg(wqe, atomic_wr(wr));
|
2007-05-09 09:00:38 +08:00
|
|
|
wqe += sizeof (struct mlx4_wqe_atomic_seg);
|
2007-07-19 02:47:55 +08:00
|
|
|
|
2007-05-09 09:00:38 +08:00
|
|
|
size += (sizeof (struct mlx4_wqe_raddr_seg) +
|
|
|
|
sizeof (struct mlx4_wqe_atomic_seg)) / 16;
|
2010-04-14 22:23:39 +08:00
|
|
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
case IB_WR_MASKED_ATOMIC_CMP_AND_SWP:
|
2015-10-08 16:16:33 +08:00
|
|
|
set_raddr_seg(wqe, atomic_wr(wr)->remote_addr,
|
|
|
|
atomic_wr(wr)->rkey);
|
2010-04-14 22:23:39 +08:00
|
|
|
wqe += sizeof (struct mlx4_wqe_raddr_seg);
|
|
|
|
|
2015-10-08 16:16:33 +08:00
|
|
|
set_masked_atomic_seg(wqe, atomic_wr(wr));
|
2010-04-14 22:23:39 +08:00
|
|
|
wqe += sizeof (struct mlx4_wqe_masked_atomic_seg);
|
|
|
|
|
|
|
|
size += (sizeof (struct mlx4_wqe_raddr_seg) +
|
|
|
|
sizeof (struct mlx4_wqe_masked_atomic_seg)) / 16;
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
break;
|
|
|
|
|
|
|
|
case IB_WR_RDMA_READ:
|
|
|
|
case IB_WR_RDMA_WRITE:
|
|
|
|
case IB_WR_RDMA_WRITE_WITH_IMM:
|
2015-10-08 16:16:33 +08:00
|
|
|
set_raddr_seg(wqe, rdma_wr(wr)->remote_addr,
|
|
|
|
rdma_wr(wr)->rkey);
|
2007-05-09 09:00:38 +08:00
|
|
|
wqe += sizeof (struct mlx4_wqe_raddr_seg);
|
|
|
|
size += sizeof (struct mlx4_wqe_raddr_seg) / 16;
|
|
|
|
break;
|
2008-07-23 23:12:26 +08:00
|
|
|
|
|
|
|
case IB_WR_LOCAL_INV:
|
2009-06-06 01:36:24 +08:00
|
|
|
ctrl->srcrb_flags |=
|
|
|
|
cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
|
2008-07-23 23:12:26 +08:00
|
|
|
set_local_inv_seg(wqe, wr->ex.invalidate_rkey);
|
|
|
|
wqe += sizeof (struct mlx4_wqe_local_inval_seg);
|
|
|
|
size += sizeof (struct mlx4_wqe_local_inval_seg) / 16;
|
|
|
|
break;
|
|
|
|
|
2015-10-14 00:11:27 +08:00
|
|
|
case IB_WR_REG_MR:
|
|
|
|
ctrl->srcrb_flags |=
|
|
|
|
cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
|
|
|
|
set_reg_seg(wqe, reg_wr(wr));
|
|
|
|
wqe += sizeof(struct mlx4_wqe_fmr_seg);
|
|
|
|
size += sizeof(struct mlx4_wqe_fmr_seg) / 16;
|
|
|
|
break;
|
|
|
|
|
2007-05-09 09:00:38 +08:00
|
|
|
default:
|
|
|
|
/* No extra segments required for sends */
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
case MLX4_IB_QPT_TUN_SMI_OWNER:
|
2015-10-08 16:16:33 +08:00
|
|
|
err = build_sriov_qp0_header(to_msqp(qp), ud_wr(wr),
|
|
|
|
ctrl, &seglen);
|
2012-08-03 16:40:40 +08:00
|
|
|
if (unlikely(err)) {
|
|
|
|
*bad_wr = wr;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
wqe += seglen;
|
|
|
|
size += seglen / 16;
|
|
|
|
break;
|
|
|
|
case MLX4_IB_QPT_TUN_SMI:
|
|
|
|
case MLX4_IB_QPT_TUN_GSI:
|
|
|
|
/* this is a UD qp used in MAD responses to slaves. */
|
2015-10-08 16:16:33 +08:00
|
|
|
set_datagram_seg(wqe, ud_wr(wr));
|
2012-08-03 16:40:40 +08:00
|
|
|
/* set the forced-loopback bit in the data seg av */
|
|
|
|
*(__be32 *) wqe |= cpu_to_be32(0x80000000);
|
|
|
|
wqe += sizeof (struct mlx4_wqe_datagram_seg);
|
|
|
|
size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
|
|
|
|
break;
|
|
|
|
case MLX4_IB_QPT_UD:
|
2015-10-08 16:16:33 +08:00
|
|
|
set_datagram_seg(wqe, ud_wr(wr));
|
2007-05-09 09:00:38 +08:00
|
|
|
wqe += sizeof (struct mlx4_wqe_datagram_seg);
|
|
|
|
size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
|
2008-04-17 12:09:27 +08:00
|
|
|
|
|
|
|
if (wr->opcode == IB_WR_LSO) {
|
2015-10-08 16:16:33 +08:00
|
|
|
err = build_lso_seg(wqe, ud_wr(wr), qp, &seglen,
|
|
|
|
&lso_hdr_sz, &blh);
|
2008-04-17 12:09:27 +08:00
|
|
|
if (unlikely(err)) {
|
|
|
|
*bad_wr = wr;
|
|
|
|
goto out;
|
|
|
|
}
|
2009-01-17 04:47:47 +08:00
|
|
|
lso_wqe = (__be32 *) wqe;
|
2008-04-17 12:09:27 +08:00
|
|
|
wqe += seglen;
|
|
|
|
size += seglen / 16;
|
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
break;
|
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
case MLX4_IB_QPT_PROXY_SMI_OWNER:
|
2015-10-08 16:16:33 +08:00
|
|
|
err = build_sriov_qp0_header(to_msqp(qp), ud_wr(wr),
|
|
|
|
ctrl, &seglen);
|
2012-08-03 16:40:40 +08:00
|
|
|
if (unlikely(err)) {
|
|
|
|
*bad_wr = wr;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
wqe += seglen;
|
|
|
|
size += seglen / 16;
|
|
|
|
/* to start tunnel header on a cache-line boundary */
|
|
|
|
add_zero_len_inline(wqe);
|
|
|
|
wqe += 16;
|
|
|
|
size++;
|
2015-10-08 16:16:33 +08:00
|
|
|
build_tunnel_header(ud_wr(wr), wqe, &seglen);
|
2012-08-03 16:40:40 +08:00
|
|
|
wqe += seglen;
|
|
|
|
size += seglen / 16;
|
|
|
|
break;
|
|
|
|
case MLX4_IB_QPT_PROXY_SMI:
|
|
|
|
case MLX4_IB_QPT_PROXY_GSI:
|
|
|
|
/* If we are tunneling special qps, this is a UD qp.
|
|
|
|
* In this case we first add a UD segment targeting
|
|
|
|
* the tunnel qp, and then add a header with address
|
|
|
|
* information */
|
2015-10-08 16:16:33 +08:00
|
|
|
set_tunnel_datagram_seg(to_mdev(ibqp->device), wqe,
|
|
|
|
ud_wr(wr),
|
2014-05-29 21:31:02 +08:00
|
|
|
qp->mlx4_ib_qp_type);
|
2012-08-03 16:40:40 +08:00
|
|
|
wqe += sizeof (struct mlx4_wqe_datagram_seg);
|
|
|
|
size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
|
2015-10-08 16:16:33 +08:00
|
|
|
build_tunnel_header(ud_wr(wr), wqe, &seglen);
|
2012-08-03 16:40:40 +08:00
|
|
|
wqe += seglen;
|
|
|
|
size += seglen / 16;
|
|
|
|
break;
|
|
|
|
|
|
|
|
case MLX4_IB_QPT_SMI:
|
|
|
|
case MLX4_IB_QPT_GSI:
|
2015-10-08 16:16:33 +08:00
|
|
|
err = build_mlx_header(to_msqp(qp), ud_wr(wr), ctrl,
|
|
|
|
&seglen);
|
2008-04-17 12:09:28 +08:00
|
|
|
if (unlikely(err)) {
|
2007-05-09 09:00:38 +08:00
|
|
|
*bad_wr = wr;
|
|
|
|
goto out;
|
|
|
|
}
|
2008-04-17 12:09:28 +08:00
|
|
|
wqe += seglen;
|
|
|
|
size += seglen / 16;
|
2007-05-09 09:00:38 +08:00
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2007-09-20 00:52:25 +08:00
|
|
|
/*
|
|
|
|
* Write data segments in reverse order, so as to
|
|
|
|
* overwrite cacheline stamp last within each
|
|
|
|
* cacheline. This avoids issues with WQE
|
|
|
|
* prefetching.
|
|
|
|
*/
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2007-09-20 00:52:25 +08:00
|
|
|
dseg = wqe;
|
|
|
|
dseg += wr->num_sge - 1;
|
|
|
|
size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
/* Add one more inline data segment for ICRC for MLX sends */
|
2012-08-03 16:40:40 +08:00
|
|
|
if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
|
|
|
|
qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
|
|
|
|
qp->mlx4_ib_qp_type &
|
|
|
|
(MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
|
2007-09-20 00:52:25 +08:00
|
|
|
set_mlx_icrc_seg(dseg + 1);
|
2007-05-09 09:00:38 +08:00
|
|
|
size += sizeof (struct mlx4_wqe_data_seg) / 16;
|
|
|
|
}
|
|
|
|
|
2007-09-20 00:52:25 +08:00
|
|
|
for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
|
|
|
|
set_data_seg(dseg, wr->sg_list + i);
|
|
|
|
|
2009-01-17 04:47:47 +08:00
|
|
|
/*
|
|
|
|
* Possibly overwrite stamping in cacheline with LSO
|
|
|
|
* segment only after making sure all data segments
|
|
|
|
* are written.
|
|
|
|
*/
|
|
|
|
wmb();
|
|
|
|
*lso_wqe = lso_hdr_sz;
|
|
|
|
|
2016-07-20 03:16:54 +08:00
|
|
|
ctrl->qpn_vlan.fence_size = (wr->send_flags & IB_SEND_FENCE ?
|
|
|
|
MLX4_WQE_CTRL_FENCE : 0) | size;
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure descriptor is fully written before
|
|
|
|
* setting ownership bit (because HW can start
|
|
|
|
* executing as soon as we do).
|
|
|
|
*/
|
|
|
|
wmb();
|
|
|
|
|
2007-05-19 23:51:58 +08:00
|
|
|
if (wr->opcode < 0 || wr->opcode >= ARRAY_SIZE(mlx4_ib_opcode)) {
|
2012-02-10 00:52:50 +08:00
|
|
|
*bad_wr = wr;
|
2007-05-09 09:00:38 +08:00
|
|
|
err = -EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
ctrl->owner_opcode = mlx4_ib_opcode[wr->opcode] |
|
2009-11-13 03:19:44 +08:00
|
|
|
(ind & qp->sq.wqe_cnt ? cpu_to_be32(1 << 31) : 0) | blh;
|
2007-06-18 23:13:48 +08:00
|
|
|
|
2008-01-28 16:40:59 +08:00
|
|
|
stamp = ind + qp->sq_spare_wqes;
|
|
|
|
ind += DIV_ROUND_UP(size * 16, 1U << qp->sq.wqe_shift);
|
|
|
|
|
2007-06-18 23:13:48 +08:00
|
|
|
/*
|
|
|
|
* We can improve latency by not stamping the last
|
|
|
|
* send queue WQE until after ringing the doorbell, so
|
|
|
|
* only stamp here if there are still more WQEs to post.
|
2008-01-28 16:40:59 +08:00
|
|
|
*
|
|
|
|
* Same optimization applies to padding with NOP wqe
|
|
|
|
* in case of WQE shrinking (used to prevent wrap-around
|
|
|
|
* in the middle of WR).
|
2007-06-18 23:13:48 +08:00
|
|
|
*/
|
2008-01-28 16:40:59 +08:00
|
|
|
if (wr->next) {
|
|
|
|
stamp_send_wqe(qp, stamp, size * 16);
|
|
|
|
ind = pad_wraparound(qp, ind);
|
|
|
|
}
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
if (likely(nreq)) {
|
|
|
|
qp->sq.head += nreq;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure that descriptors are written before
|
|
|
|
* doorbell record.
|
|
|
|
*/
|
|
|
|
wmb();
|
|
|
|
|
|
|
|
writel(qp->doorbell_qpn,
|
|
|
|
to_mdev(ibqp->device)->uar_map + MLX4_SEND_DOORBELL);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure doorbells don't leak out of SQ spinlock
|
|
|
|
* and reach the HCA out of order.
|
|
|
|
*/
|
|
|
|
mmiowb();
|
2007-06-18 23:13:48 +08:00
|
|
|
|
2008-01-28 16:40:59 +08:00
|
|
|
stamp_send_wqe(qp, stamp, size * 16);
|
|
|
|
|
|
|
|
ind = pad_wraparound(qp, ind);
|
|
|
|
qp->sq_next_wqe = ind;
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
|
|
|
|
2007-10-31 01:53:54 +08:00
|
|
|
spin_unlock_irqrestore(&qp->sq.lock, flags);
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
return err;
|
|
|
|
}
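/*
 * Rough usage sketch for the send path above (illustrative only; dma_addr,
 * len, mr_lkey and cookie are placeholders the caller already owns):
 *
 *	struct ib_sge sge = { .addr = dma_addr, .length = len, .lkey = mr_lkey };
 *	struct ib_send_wr wr = {
 *		.wr_id      = cookie,
 *		.sg_list    = &sge,
 *		.num_sge    = 1,
 *		.opcode     = IB_WR_SEND,
 *		.send_flags = IB_SEND_SIGNALED,
 *	};
 *	struct ib_send_wr *bad_wr;
 *	int ret = ib_post_send(qp, &wr, &bad_wr);
 *
 * On failure *bad_wr points at the first request that was not posted; the
 * requests before it have already been queued and will still complete.
 */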
|
|
|
|
|
|
|
|
int mlx4_ib_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *wr,
|
|
|
|
struct ib_recv_wr **bad_wr)
|
|
|
|
{
|
|
|
|
struct mlx4_ib_qp *qp = to_mqp(ibqp);
|
|
|
|
struct mlx4_wqe_data_seg *scat;
|
|
|
|
unsigned long flags;
|
|
|
|
int err = 0;
|
|
|
|
int nreq;
|
|
|
|
int ind;
|
2012-08-03 16:40:40 +08:00
|
|
|
int max_gs;
|
2007-05-09 09:00:38 +08:00
|
|
|
int i;
|
2015-02-08 17:49:34 +08:00
|
|
|
struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
max_gs = qp->rq.max_gs;
|
2007-05-09 09:00:38 +08:00
|
|
|
spin_lock_irqsave(&qp->rq.lock, flags);
|
|
|
|
|
2015-02-08 17:49:34 +08:00
|
|
|
if (mdev->dev->persist->state & MLX4_DEVICE_STATE_INTERNAL_ERROR) {
|
|
|
|
err = -EIO;
|
|
|
|
*bad_wr = wr;
|
|
|
|
nreq = 0;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2007-06-18 23:13:48 +08:00
|
|
|
ind = qp->rq.head & (qp->rq.wqe_cnt - 1);
|
2007-05-09 09:00:38 +08:00
|
|
|
|
|
|
|
for (nreq = 0; wr; ++nreq, wr = wr->next) {
|
2010-01-07 04:51:30 +08:00
|
|
|
if (mlx4_wq_overflow(&qp->rq, nreq, qp->ibqp.recv_cq)) {
|
2007-05-09 09:00:38 +08:00
|
|
|
err = -ENOMEM;
|
|
|
|
*bad_wr = wr;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (unlikely(wr->num_sge > qp->rq.max_gs)) {
|
|
|
|
err = -EINVAL;
|
|
|
|
*bad_wr = wr;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
scat = get_recv_wqe(qp, ind);
|
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
if (qp->mlx4_ib_qp_type & (MLX4_IB_QPT_PROXY_SMI_OWNER |
|
|
|
|
MLX4_IB_QPT_PROXY_SMI | MLX4_IB_QPT_PROXY_GSI)) {
|
|
|
|
ib_dma_sync_single_for_device(ibqp->device,
|
|
|
|
qp->sqp_proxy_rcv[ind].map,
|
|
|
|
sizeof (struct mlx4_ib_proxy_sqp_hdr),
|
|
|
|
DMA_FROM_DEVICE);
|
|
|
|
scat->byte_count =
|
|
|
|
cpu_to_be32(sizeof (struct mlx4_ib_proxy_sqp_hdr));
|
|
|
|
/* use dma lkey from upper layer entry */
|
|
|
|
scat->lkey = cpu_to_be32(wr->sg_list->lkey);
|
|
|
|
scat->addr = cpu_to_be64(qp->sqp_proxy_rcv[ind].map);
|
|
|
|
scat++;
|
|
|
|
max_gs--;
|
|
|
|
}
|
|
|
|
|
2007-10-10 10:59:05 +08:00
|
|
|
for (i = 0; i < wr->num_sge; ++i)
|
|
|
|
__set_data_seg(scat + i, wr->sg_list + i);
|
2007-05-09 09:00:38 +08:00
|
|
|
|
2012-08-03 16:40:40 +08:00
|
|
|
if (i < max_gs) {
|
2007-05-09 09:00:38 +08:00
|
|
|
scat[i].byte_count = 0;
|
|
|
|
scat[i].lkey = cpu_to_be32(MLX4_INVALID_LKEY);
|
|
|
|
scat[i].addr = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
qp->rq.wrid[ind] = wr->wr_id;
|
|
|
|
|
2007-06-18 23:13:48 +08:00
|
|
|
ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
|
2007-05-09 09:00:38 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
if (likely(nreq)) {
|
|
|
|
qp->rq.head += nreq;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Make sure that descriptors are written before
|
|
|
|
* doorbell record.
|
|
|
|
*/
|
|
|
|
wmb();
|
|
|
|
|
|
|
|
*qp->db.db = cpu_to_be32(qp->rq.head & 0xffff);
|
|
|
|
}
|
|
|
|
|
|
|
|
spin_unlock_irqrestore(&qp->rq.lock, flags);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
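/*
 * Shape of the receive path above: one receive WQE per WR.  For proxy
 * special QPs the first scatter entry is reserved for the tunnelled
 * mlx4_ib_proxy_sqp_hdr (which is why max_gs is reduced by one), the
 * caller's SGEs follow, and the first unused entry is terminated with
 * MLX4_INVALID_LKEY.  The doorbell record is updated only after a wmb()
 * so the hardware never sees the new head before the WQEs themselves.
 */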
|
2007-06-21 17:27:47 +08:00
|
|
|
|
|
|
|
static inline enum ib_qp_state to_ib_qp_state(enum mlx4_qp_state mlx4_state)
|
|
|
|
{
|
|
|
|
switch (mlx4_state) {
|
|
|
|
case MLX4_QP_STATE_RST: return IB_QPS_RESET;
|
|
|
|
case MLX4_QP_STATE_INIT: return IB_QPS_INIT;
|
|
|
|
case MLX4_QP_STATE_RTR: return IB_QPS_RTR;
|
|
|
|
case MLX4_QP_STATE_RTS: return IB_QPS_RTS;
|
|
|
|
case MLX4_QP_STATE_SQ_DRAINING:
|
|
|
|
case MLX4_QP_STATE_SQD: return IB_QPS_SQD;
|
|
|
|
case MLX4_QP_STATE_SQER: return IB_QPS_SQE;
|
|
|
|
case MLX4_QP_STATE_ERR: return IB_QPS_ERR;
|
|
|
|
default: return -1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline enum ib_mig_state to_ib_mig_state(int mlx4_mig_state)
|
|
|
|
{
|
|
|
|
switch (mlx4_mig_state) {
|
|
|
|
case MLX4_QP_PM_ARMED: return IB_MIG_ARMED;
|
|
|
|
case MLX4_QP_PM_REARM: return IB_MIG_REARM;
|
|
|
|
case MLX4_QP_PM_MIGRATED: return IB_MIG_MIGRATED;
|
|
|
|
default: return -1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int to_ib_qp_access_flags(int mlx4_flags)
|
|
|
|
{
|
|
|
|
int ib_flags = 0;
|
|
|
|
|
|
|
|
if (mlx4_flags & MLX4_QP_BIT_RRE)
|
|
|
|
ib_flags |= IB_ACCESS_REMOTE_READ;
|
|
|
|
if (mlx4_flags & MLX4_QP_BIT_RWE)
|
|
|
|
ib_flags |= IB_ACCESS_REMOTE_WRITE;
|
|
|
|
if (mlx4_flags & MLX4_QP_BIT_RAE)
|
|
|
|
ib_flags |= IB_ACCESS_REMOTE_ATOMIC;
|
|
|
|
|
|
|
|
return ib_flags;
|
|
|
|
}
|
|
|
|
|
2010-08-26 22:19:22 +08:00
|
|
|
static void to_ib_ah_attr(struct mlx4_ib_dev *ibdev, struct ib_ah_attr *ib_ah_attr,
|
2007-06-21 17:27:47 +08:00
|
|
|
struct mlx4_qp_path *path)
|
|
|
|
{
|
2010-08-26 22:19:22 +08:00
|
|
|
struct mlx4_dev *dev = ibdev->dev;
|
|
|
|
int is_eth;
|
|
|
|
|
2007-07-15 20:00:09 +08:00
|
|
|
memset(ib_ah_attr, 0, sizeof *ib_ah_attr);
|
2007-06-21 17:27:47 +08:00
|
|
|
ib_ah_attr->port_num = path->sched_queue & 0x40 ? 2 : 1;
|
|
|
|
|
|
|
|
if (ib_ah_attr->port_num == 0 || ib_ah_attr->port_num > dev->caps.num_ports)
|
|
|
|
return;
|
|
|
|
|
2010-08-26 22:19:22 +08:00
|
|
|
is_eth = rdma_port_get_link_layer(&ibdev->ib_dev, ib_ah_attr->port_num) ==
|
|
|
|
IB_LINK_LAYER_ETHERNET;
|
|
|
|
if (is_eth)
|
|
|
|
ib_ah_attr->sl = ((path->sched_queue >> 3) & 0x7) |
|
|
|
|
((path->sched_queue & 4) << 1);
|
|
|
|
else
|
|
|
|
ib_ah_attr->sl = (path->sched_queue >> 2) & 0xf;
|
|
|
|
|
2007-06-21 17:27:47 +08:00
|
|
|
ib_ah_attr->dlid = be16_to_cpu(path->rlid);
|
|
|
|
ib_ah_attr->src_path_bits = path->grh_mylmc & 0x7f;
|
|
|
|
ib_ah_attr->static_rate = path->static_rate ? path->static_rate - 5 : 0;
|
|
|
|
ib_ah_attr->ah_flags = (path->grh_mylmc & (1 << 7)) ? IB_AH_GRH : 0;
|
|
|
|
if (ib_ah_attr->ah_flags) {
|
|
|
|
ib_ah_attr->grh.sgid_index = path->mgid_index;
|
|
|
|
ib_ah_attr->grh.hop_limit = path->hop_limit;
|
|
|
|
ib_ah_attr->grh.traffic_class =
|
|
|
|
(be32_to_cpu(path->tclass_flowlabel) >> 20) & 0xff;
|
|
|
|
ib_ah_attr->grh.flow_label =
|
2007-07-18 09:37:38 +08:00
|
|
|
be32_to_cpu(path->tclass_flowlabel) & 0xfffff;
|
2007-06-21 17:27:47 +08:00
|
|
|
memcpy(ib_ah_attr->grh.dgid.raw,
|
|
|
|
path->rgid, sizeof ib_ah_attr->grh.dgid.raw);
|
|
|
|
}
|
|
|
|
}
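/*
 * to_ib_ah_attr() above decodes the firmware path record: bit 6 of
 * sched_queue selects the port, the SL is recovered differently for
 * Ethernet (RoCE) and IB link layers, and the GRH fields are filled in
 * only when the GRH bit (bit 7 of grh_mylmc) is set.
 */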
|
|
|
|
|
|
|
|
int mlx4_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr, int qp_attr_mask,
|
|
|
|
struct ib_qp_init_attr *qp_init_attr)
|
|
|
|
{
|
|
|
|
struct mlx4_ib_dev *dev = to_mdev(ibqp->device);
|
|
|
|
struct mlx4_ib_qp *qp = to_mqp(ibqp);
|
|
|
|
struct mlx4_qp_context context;
|
|
|
|
int mlx4_state;
|
2008-04-17 12:09:34 +08:00
|
|
|
int err = 0;
|
|
|
|
|
|
|
|
mutex_lock(&qp->mutex);
|
2007-06-21 17:27:47 +08:00
|
|
|
|
|
|
|
if (qp->state == IB_QPS_RESET) {
|
|
|
|
qp_attr->qp_state = IB_QPS_RESET;
|
|
|
|
goto done;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = mlx4_qp_query(dev->dev, &qp->mqp, &context);
|
2008-04-17 12:09:34 +08:00
|
|
|
if (err) {
|
|
|
|
err = -EINVAL;
|
|
|
|
goto out;
|
|
|
|
}
|
2007-06-21 17:27:47 +08:00
|
|
|
|
|
|
|
mlx4_state = be32_to_cpu(context.flags) >> 28;
|
|
|
|
|
2008-04-17 12:09:34 +08:00
|
|
|
qp->state = to_ib_qp_state(mlx4_state);
|
|
|
|
qp_attr->qp_state = qp->state;
|
2007-06-21 17:27:47 +08:00
|
|
|
qp_attr->path_mtu = context.mtu_msgmax >> 5;
|
|
|
|
qp_attr->path_mig_state =
|
|
|
|
to_ib_mig_state((be32_to_cpu(context.flags) >> 11) & 0x3);
|
|
|
|
qp_attr->qkey = be32_to_cpu(context.qkey);
|
|
|
|
qp_attr->rq_psn = be32_to_cpu(context.rnr_nextrecvpsn) & 0xffffff;
|
|
|
|
qp_attr->sq_psn = be32_to_cpu(context.next_send_psn) & 0xffffff;
|
|
|
|
qp_attr->dest_qp_num = be32_to_cpu(context.remote_qpn) & 0xffffff;
|
|
|
|
qp_attr->qp_access_flags =
|
|
|
|
to_ib_qp_access_flags(be32_to_cpu(context.params2));
|
|
|
|
|
|
|
|
if (qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC) {
|
2010-08-26 22:19:22 +08:00
|
|
|
to_ib_ah_attr(dev, &qp_attr->ah_attr, &context.pri_path);
|
|
|
|
to_ib_ah_attr(dev, &qp_attr->alt_ah_attr, &context.alt_path);
|
2007-06-21 17:27:47 +08:00
|
|
|
qp_attr->alt_pkey_index = context.alt_path.pkey_index & 0x7f;
|
|
|
|
qp_attr->alt_port_num = qp_attr->alt_ah_attr.port_num;
|
|
|
|
}
|
|
|
|
|
|
|
|
qp_attr->pkey_index = context.pri_path.pkey_index & 0x7f;
|
2007-07-18 09:37:38 +08:00
|
|
|
if (qp_attr->qp_state == IB_QPS_INIT)
|
|
|
|
qp_attr->port_num = qp->port;
|
|
|
|
else
|
|
|
|
qp_attr->port_num = context.pri_path.sched_queue & 0x40 ? 2 : 1;
|
2007-06-21 17:27:47 +08:00
|
|
|
|
|
|
|
/* qp_attr->en_sqd_async_notify is only applicable in modify qp */
|
|
|
|
qp_attr->sq_draining = mlx4_state == MLX4_QP_STATE_SQ_DRAINING;
|
|
|
|
|
|
|
|
qp_attr->max_rd_atomic = 1 << ((be32_to_cpu(context.params1) >> 21) & 0x7);
|
|
|
|
|
|
|
|
qp_attr->max_dest_rd_atomic =
|
|
|
|
1 << ((be32_to_cpu(context.params2) >> 21) & 0x7);
|
|
|
|
qp_attr->min_rnr_timer =
|
|
|
|
(be32_to_cpu(context.rnr_nextrecvpsn) >> 24) & 0x1f;
|
|
|
|
qp_attr->timeout = context.pri_path.ackto >> 3;
|
|
|
|
qp_attr->retry_cnt = (be32_to_cpu(context.params1) >> 16) & 0x7;
|
|
|
|
qp_attr->rnr_retry = (be32_to_cpu(context.params1) >> 13) & 0x7;
|
|
|
|
qp_attr->alt_timeout = context.alt_path.ackto >> 3;
|
|
|
|
|
|
|
|
done:
|
|
|
|
qp_attr->cur_qp_state = qp_attr->qp_state;
|
2007-07-18 11:59:02 +08:00
|
|
|
qp_attr->cap.max_recv_wr = qp->rq.wqe_cnt;
|
|
|
|
qp_attr->cap.max_recv_sge = qp->rq.max_gs;
|
|
|
|
|
2007-06-21 17:27:47 +08:00
|
|
|
if (!ibqp->uobject) {
|
2007-07-18 11:59:02 +08:00
|
|
|
qp_attr->cap.max_send_wr = qp->sq.wqe_cnt;
|
|
|
|
qp_attr->cap.max_send_sge = qp->sq.max_gs;
|
|
|
|
} else {
|
|
|
|
qp_attr->cap.max_send_wr = 0;
|
|
|
|
qp_attr->cap.max_send_sge = 0;
|
2007-06-21 17:27:47 +08:00
|
|
|
}
|
|
|
|
|
2007-07-18 11:59:02 +08:00
|
|
|
/*
|
|
|
|
* We don't support inline sends for kernel QPs (yet), and we
|
|
|
|
* don't know what userspace's value should be.
|
|
|
|
*/
|
|
|
|
qp_attr->cap.max_inline_data = 0;
|
|
|
|
|
|
|
|
qp_init_attr->cap = qp_attr->cap;
|
|
|
|
|
2008-07-15 14:48:48 +08:00
|
|
|
qp_init_attr->create_flags = 0;
|
|
|
|
if (qp->flags & MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK)
|
|
|
|
qp_init_attr->create_flags |= IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK;
|
|
|
|
|
|
|
|
if (qp->flags & MLX4_IB_QP_LSO)
|
|
|
|
qp_init_attr->create_flags |= IB_QP_CREATE_IPOIB_UD_LSO;
|
|
|
|
|
2013-11-07 21:25:17 +08:00
|
|
|
if (qp->flags & MLX4_IB_QP_NETIF)
|
|
|
|
qp_init_attr->create_flags |= IB_QP_CREATE_NETIF_QP;
|
|
|
|
|
2012-08-23 22:09:03 +08:00
|
|
|
qp_init_attr->sq_sig_type =
|
|
|
|
qp->sq_signal_bits == cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE) ?
|
|
|
|
IB_SIGNAL_ALL_WR : IB_SIGNAL_REQ_WR;
|
|
|
|
|
2008-04-17 12:09:34 +08:00
|
|
|
out:
|
|
|
|
mutex_unlock(&qp->mutex);
|
|
|
|
return err;
|
2007-06-21 17:27:47 +08:00
|
|
|
}
|
|
|
|
|