Commit Graph

156 Commits

Author SHA1 Message Date
Jack Morgenstein 5a03108689 net/mlx4_core: Adjust command timeouts to conform to the firmware spec
The firmware spec states that the timeout for all commands should be 60 seconds.

In the past, the spec indicated that there were several classes of timeout
(short, medium, and long).  The driver has these different timeout classes.
We leave the class differentiation in the driver as-is (to protect against any
future spec changes), but set the timeout for all classes to be 60 seconds.

In addition, we fix a few commands which had hard-coded numeric timeouts specified.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
Jack Morgenstein f0ce061508 net/mlx4_core: Add reserved lkey for VFs to QUERY_FUNC_CAP
The reserved lKey is different for each VF.
A base lkey value is returned in QUERY_DEV_CAP at offset 0x98.

The reserved L_key value for a VF is:
    VF_lkey = base_lkey + (VF_number << 8).

This VF L_key value should be returned in QUERY_FUNC_CAP
(opcode-modifier = 0) at offset 0x48.

To indicate that the lkey value at offset 0x48 is valid, the Hypervisor
sets a flag bit in dword 0x0, offset 27 in the QUERY_FUNC_CAP wrapper
function.

When the VF calls QUERY_FUNC_CAP, it should check if this flag bit is set.
If it is set, the VF should take the reserved lkey value at offset 0x48.
If the bit is not set, the VF should not use a reserved lkey
(i.e., should set its reserved lkey value to 0).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
Jack Morgenstein be6a6b43b5 net/mlx4_core: Add bad-cable event support
If the firmware can detect a bad cable, allow it to generate an
event, and print the problem in the log.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
Ido Shamay c3f2511fea net/mlx4: Cache line CQE/EQE stride fixes
This commit contains 2 fixes for the 128B CQE/EQE stride feaure.
Wei found that mlx4_QUERY_HCA function marked the wrong capability
in flags (64B CQE/EQE), when CQE/EQE stride feature was enabled.
Also added small fix in initial CQE ownership bit assignment, when CQE
is size is not default 32B.

Fixes: 77507aa24 (net/mlx4: Enable CQE/EQE stride support)
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-16 15:23:53 -05:00
Or Gerlitz c78e25edbf net/mlx4_core: Avoid double dumping of the PF device capabilities
To support asymmetric EQ allocations, we should query the device
capabilities prior to enabling SRIOV. As a side effect of adding that,
we are dumping the PF device capabilities twice. Avoid that by moving
the printing into a helper function which is called once.

Fixes: 7ae0e400cd ('net/mlx4_core: Flexible (asymmetric) allocation of
		     EQs and MSI-X vectors for PF/VFs')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-15 11:34:54 -05:00
Matan Barak 7d077cd34e net/mlx4: Add support for A0 steering
Add the required firmware commands for A0 steering and a way to enable
that. The firmware support focuses on INIT_HCA, QUERY_HCA, QUERY_PORT,
QUERY_DEV_CAP and QUERY_FUNC_CAP commands. Those commands are used
to configure and query the device.

The different A0 DMFS (steering) modes are:

Static - optimized performance, but flow steering rules are
limited. This mode should be choosed explicitly by the user
in order to be used.

Dynamic - this mode should be explicitly choosed by the user.
In this mode, the FW works in optimized steering mode as long as
it can and afterwards automatically drops to classic (full) DMFS.

Disable - this mode should be explicitly choosed by the user.
The user instructs the system not to use optimized steering, even if
the FW supports Dynamic A0 DMFS (and thus will be able to use optimized
steering in Default A0 DMFS mode).

Default - this mode is implicitly choosed. In this mode, if the FW
supports Dynamic A0 DMFS, it'll work in this mode. Otherwise, it'll
work at Disable A0 DMFS mode.

Under SRIOV configuration, when the A0 steering mode is enabled,
older guest VF drivers who aren't using the RX QP allocation flag
(MLX4_RESERVE_A0_QP) will get a QP from the general range and
fail when attempting to register a steering rule. To avoid that,
the PF context behaviour is changed once on A0 static mode, to
require support for the allocation flag in VF drivers too.

In order to enable A0 steering, we use log_num_mgm_entry_size param.
If the value of the parameter is not positive, we treat the absolute
value of log_num_mgm_entry_size as a bit field. Setting bit 2 of this
bit field enables static A0 steering.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:36 -05:00
Matan Barak 431df8c7e9 net/mlx4: Refactor QUERY_PORT
Currently QUERY_PORT is done as a part of QUERY_DEV_CAP firmware command.

Since we would like to use it without querying all device capabilities,
extract this part to be a function of its own.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak d57febe1a4 net/mlx4: Add A0 hybrid steering
A0 hybrid steering is a form of high performance flow steering.
By using this mode, mlx4 cards use a fast limited table based steering,
in order to enable fast steering of unicast packets to a QP.

In order to implement A0 hybrid steering we allocate resources
from different zones:
(1) General range
(2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.

When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP,
we try hard to allocate the QP from range (2). Otherwise, we try hard not
to allocate from this  range. However, when the system is pushed to its
limits and one needs every resource, the allocator uses every region it can.

Meaning, when we run out of raw-eth qps, the allocator allocates from the
general range (and the special-A0 area is no longer active). If we run out
of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that
is also exhausted, the allocator will allocate from the general range
(and the A0 region is no longer active).

Note that if a raw-eth qp is allocated from the general range, it attempts
to allocate the range such that bits 6 and 7 (blueflame bits) in the
QP number are not set.

When the feature is used in SRIOV, the VF has to notify the PF what
kind of QP attributes it needs. In order to do that, along with the
"Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According
to the combination of these bits, the PF tries to allocate a suitable QP.

In order to maintain backward compatibility (with older PFs), the PF
notifies which QP attributes it supports via QUERY_FUNC_CAP command.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Eugenia Emantayev ddae0349fd net/mlx4: Change QP allocation scheme
When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.

The current Ethernet driver code reserves a Tx QP range with 256b alignment.

This is wrong because if there are more than 64 Tx QPs in use,
QPNs >= base + 65 will have bits 6/7 set.

This problem is not specific for the Ethernet driver, any entity that
tries to reserve more than 64 BF-enabled QPs should fail. Also, using
ranges is not necessary here and is wasteful.

The new mechanism introduced here will support reservation for
"Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
(when hypervisors support WC in VMs). The flow we use is:

1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
   and request "BF enabled QPs" if BF is supported for the function

2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0]  - number of QPs
b. param1[31-24] - flags controlling QPs reservation

Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
bits 6 and 7 unset in order to be used in Ethernet.

Bits 24-30 of the flags are currently reserved.

When a function tries to allocate a QP, it states the required attributes
for this QP. Those attributes are considered "best-effort". If an attribute,
such as Ethernet BF enabled QP, is a must-have attribute, the function has
to check that attribute is supported before trying to do the allocation.

In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
which are unsupported. If SRIOV is used, the PF validates those attributes
and masks out unsupported attributes as well. In order to notify VFs which
attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
mailbox is filled by the PF, which notifies which QP allocation attributes
it supports.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Or Gerlitz 383677da43 net/mlx4_core: Mask out host side virtualization features for guests
When VFs (guests in this context) issue the QUERY_DEV_CAP command, they
need not be told that host side virtualization features such as VST, FSM
(MAC anti-spoofing) and running > 80 VFs are supported by the device.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:34 -05:00
Matan Barak de966c5928 net/mlx4_core: Support more than 64 VFs
We now allow up to 126 VFs. Note though that certain firmware
versions only allow up to 80 VFs. Moreover, old HCAs only support 64 VFs.
In these cases, we limit the maximum number of VFs to 64.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:22 -05:00
Matan Barak 7ae0e400cd net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs
Previously, the driver queried the firmware in order to get the number
of supported EQs. Under SRIOV, since this was done before the driver
notified the firmware how many VFs it actually needs, the firmware had
to take into account a worst case scenario and always allocated four EQs
per VF, where one was used for events while the others were used for completions.

Now, when the firmware supports the asymmetric allocation scheme, denoted
by exposing num_sys_eqs > 0 (--> MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the
QUERY_FUNC command to query the firmware before enabling SRIOV. Thus we
can get more EQs and MSI-X vectors per function.

Moreover, when running in the new firmware/driver mode, the limitation
that the number of EQs should be a power of two is lifted.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:21 -05:00
Matan Barak e8c4265bea net/mlx4_core: Add QUERY_FUNC firmware command
QUERY_FUNC firmware command could be used in order to query the
number of EQs, reserved EQs, etc for a specific function.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:19 -05:00
Matan Barak 225c6c8c6b net/mlx4_core: Use correct variable type for mlx4_slave_cap
We've used an incorrect type for the loop counter and the
mlx4_QUERY_FUNC_CAP function. The current input modifier
is either a port or a boolean.
Since the number of ports is always a positive value < 255,
we should use u8 instead of an integer with casting.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:16 -05:00
Matan Barak 7c68dd435b net/mlx4_core: Fix wrong reading of reserved_eqs
We mistakenly read the reserved_eqs field as a standard
numeric value rather than a log2 value.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:15 -05:00
Matan Barak d475c95b4b net/mlx4_core: Add retrieval of CONFIG_DEV parameters
Add code to issue CONFIG_DEV "get" firmware command.

This command is used in order to obtain certain parameters used for
supporting various RX checksumming options and vxlan UDP port.

The GET operation is allowed for VFs too.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:28:14 -05:00
Saeed Mahameed 6e80669998 net/mlx4_core: Prevent VF from changing port configuration
Added wrapper to the ACCESS_REG command for handling guest HW
registers access, preventing write operations, but do allow reads.

This will prevent SRIOV guests to change port PTYS configuration,
such as speed/advertised link modes.

Fixes: adbc7ac5c1 ('net/mlx4_core: Introduce ACCESS_REG CMD [...]')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:28:13 -05:00
Saeed Mahameed a53e3e8c1d net/mlx4_core: Add ethernet backplane autoneg device capability
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:00 -04:00
Saeed Mahameed adbc7ac5c1 net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap
Adding ACCESS REG mlx4 command and use it to implement Query method for
PTYS (Port Type and Speed Register).
Query and store eth_prot_ctrl dev cap.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:00 -04:00
Jack Morgenstein a7401b9cf3 net/mlx4_core: Protect QUERY_PORT wrapper from untrusted guests
The function mlx4_QUERY_PORT_wrapper implements only the
QUERY_PORT "general" case (opcode modifier = 0).

Verify that the opcode modifier is zero, and also that the
input modifier contains only the port number in bits 0..7
(all other bits should be zero).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 16:27:49 -04:00
Ido Shamay 77507aa249 net/mlx4_core: Enable CQE/EQE stride support
This feature is intended for archs having cache line larger then 64B.

Since our CQE/EQEs are generally 64B in those systems, HW will write
twice to the same cache line consecutively, causing pipe locks due to
he hazard prevention mechanism. For elements in a cyclic buffer, writes
are consecutive, so entries smaller than a cache line should be
avoided, especially if they are written at a high rate.

Reduce consecutive writes to same cache line in CQs/EQs, by allowing the
driver to increase the distance between entries so that each will reside
in a different cache line. Until the introduction of this feature, there
were two types of CQE/EQE:

1. 32B stride and context in the [0-31] segment
2. 64B stride and context in the [32-63] segment

This feature introduces two additional types:

3. 128B stride and context in the [0-31] segment (128B cache line)
4. 256B stride and context in the [0-31] segment (256B cache line)

Modify the mlx4_core driver to query the device for the CQE/EQE cache
line stride capability and to enable that capability when the host
cache line size is larger than 64 bytes (supported cache lines are
128B and 256B).

The mlx4 IB driver and libmlx4 need not be aware of this change. The PF
context behaviour is changed to require this change in VF drivers
running on such archs.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:30:10 -04:00
Jack Morgenstein 114840c3d2 mlx4_core: Add support for secure-host and SMP firewall
Secure-host is the general term for the capability of a device
to protect itself and the subnet from malicious host software.

This is achieved by:
1. Not allowing un-trusted entities to access device configuration
   registers, directly (through pci_cr or pci_conf) and indirectly
   (through MADs).

2. Hiding M_Key from untrusted entities.

3. Preventing the modification of GUID0 by un-trusted entities

4. Not allowing drivers on untrusted hosts to receive nor to transmit
   packets over QP0 (SMP Firewall).

The secure-host capability depends on firmware handling all QP0
packets, and not passing these packets up to the driver. Any information
required by the driver for proper operation (e.g., SM lid) is passed
via events generated by the firmware while processing QP0 MADs.

Driver support mainly requires using the MAD_DEMUX FW command at startup,
where the feature is enabled/disabled through a procedure described in
the Mellanox HCA tools package.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

[ Fix error path in mlx4_setup_hca to go to err_mcg_table_free. - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-05 07:40:22 -07:00
Linus Torvalds f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Jack Morgenstein 99ec41d0a4 mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPs
This commit adds the infrastructure for enabling selected VFs to
operate SMI (QP0) MADs without restriction.

Additionally, for these enabled VFs, their QP0 proxy and tunnel QPs
are MLX QPs.  As such, they operate over VL15.  Therefore, they are
not affected by "credit" problems or changes in the VLArb table (which
may shut down VL0).

Non-enabled VFs may only create UD proxy QP0 qps (which are forced by
the hypervisor to send packets using the q-key it assigns and places
in the qp-context).  Thus, non-enabled VFs will not pose a security
risk.  The hypervisor discards any privileged MADs it receives from
these non-enabled VFs.

By default, all VFs are NOT enabled, and must explicitly be enabled
by the administrator.

The sysfs interface which operates the VF enablement infrastructure
is provided in the next commit.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:13:09 -07:00
Jack Morgenstein bc82878baa mlx4_core: Fix incorrect FLAGS1 bitmap test in mlx4_QUERY_FUNC_CAP
Commit eb17711bc1 ("net/mlx4_core: Introduce nic_info new flag in
QUERY_FUNC_CAP") did:

	if (func_cap->flags1 & QUERY_FUNC_CAP_FLAGS1_OFFSET) {

which should be:

	if (func_cap->flags1 & QUERY_FUNC_CAP_FLAGS1_FORCE_VLAN) {

Fix that.

Fixes: eb17711bc1 ("net/mlx4_core: Introduce nic_info new flag in QUERY_FUNC_CAP")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:12:58 -07:00
Joe Perches 1a91de2883 mellanox: Logging message cleanups
Use a more current logging style.

o Coalesce formats
o Add missing spaces for coalesced formats
o Align arguments for modified formats
o Add missing newlines for some logging messages
o Use DRV_NAME as part of format instead of %s, DRV_NAME to
  reduce overall text.
o Use ..., ##__VA_ARGS__ instead of args... in macros
o Correct a few format typos
o Use a single line message where appropriate

Signed-off-by: Joe Perches <joe@perches.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08 23:42:02 -04:00
Or Gerlitz d18f141a1a mlx4: Add support for CONFIG_DEV command
Introduce the CONFIG_DEV firmware command which we will use to
configure the UDP port assumed by the firmware for the VXLAN offloads.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:29:35 -04:00
Matan Barak 449fc48866 net/mlx4: Adapt code for N-Port VF
Adds support for N-Port VFs, this includes:
1. Adding support in the wrapped FW command
	In wrapped commands, we need to verify and convert
	the slave's port into the real physical port.
	Furthermore, when sending the response back to the slave,
	a reverse conversion should be made.
2. Adjusting sqpn for QP1 para-virtualization
	The slave assumes that sqpn is used for QP1 communication.
	If the slave is assigned to a port != (first port), we need
	to adjust the sqpn that will direct its QP1 packets into the
	correct endpoint.
3. Adjusting gid[5] to modify the port for raw ethernet
	In B0 steering, gid[5] contains the port. It needs
	to be adjusted into the physical port.
4. Adjusting number of ports in the query / ports caps in the FW commands
	When a slave queries the hardware, it needs to view only
	the physical ports it's assigned to.
5. Adjusting the sched_qp according to the port number
	The QP port is encoded in the sched_qp, thus in modify_qp we need
	to encode the correct port in sched_qp.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:18:30 -04:00
David S. Miller 85dcce7a73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	drivers/net/xen-netback/netback.c

Both the r8152 and netback conflicts were simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:31:55 -04:00
Or Gerlitz 56cb456746 net/mlx4_core: Fix wrong dump of the vxlan offloads device capability
Fix the value used to dump the vxlan offloads device capability to align
with the MLX4_DEV_CAP_FLAG2_yyy definition. While on that, add dump to
the IPoIB flow-steering device capability and fix small typo.

The vxlan cap value wasn't fully handled when a conflict was resolved
between MLX4_DEV_CAP_FLAG2_DMFS_IPOIB coming from the IB tree to
MLX4_DEV_CAP_FLAG2_VXLAN_OFFLOADS coming from net-next.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 16:12:41 -04:00
Jack Morgenstein b6ffaeffae mlx4: In RoCE allow guests to have multiple GIDS
The GIDs are statically distributed, as follows:
PF: gets 16 GIDs
VFs:  Remaining GIDS are divided evenly between VFs activated by the driver.
      If the division is not even, lower-numbered VFs get an extra GID.

For an IB interface, the number of gids per guest remains as before: one gid per guest.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:57:14 -04:00
Amir Vadai 57352ef4f5 net/mlx4_core: Fix memory access error in mlx4_QUERY_DEV_CAP_wrapper()
Fix a regression introduced by [1]. outbox was accessed instead of
outbox->buf. Typo was copy-pasted to [2] and [3].

[1] - cc1ade9 mlx4_core: Disable memory windows for virtual functions
[2] - 4de6580 mlx4_core: Add support for steerable IB UD QPs
[3] - 7ffdf72 net/mlx4_core: Add basic support for TCP/IP offloads under
      tunneling

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 17:09:04 -05:00
Eyal Perry 28d222bbaa net/mlx4_core: Fix sparse warning
This patch force conversion to u32 to fix the following sparse warning:
drivers/net/ethernet/mellanox/mlx4/fw.c:1822:53: warning: restricted __be32
degrades to integer

Casting to u32 is safe here, because token will be returned as is
from the hardware without any modification.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-02 20:04:01 -05:00
Linus Torvalds 4ba9920e5e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) BPF debugger and asm tool by Daniel Borkmann.

 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.

 3) Correct reciprocal_divide and update users, from Hannes Frederic
    Sowa and Daniel Borkmann.

 4) Currently we only have a "set" operation for the hw timestamp socket
    ioctl, add a "get" operation to match.  From Ben Hutchings.

 5) Add better trace events for debugging driver datapath problems, also
    from Ben Hutchings.

 6) Implement auto corking in TCP, from Eric Dumazet.  Basically, if we
    have a small send and a previous packet is already in the qdisc or
    device queue, defer until TX completion or we get more data.

 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.

 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
    Borkmann.

 9) Share IP header compression code between Bluetooth and IEEE802154
    layers, from Jukka Rissanen.

10) Fix ipv6 router reachability probing, from Jiri Benc.

11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.

12) Support tunneling in GRO layer, from Jerry Chu.

13) Allow bonding to be configured fully using netlink, from Scott
    Feldman.

14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
    already get the TCI.  From Atzm Watanabe.

15) New "Heavy Hitter" qdisc, from Terry Lam.

16) Significantly improve the IPSEC support in pktgen, from Fan Du.

17) Allow ipv4 tunnels to cache routes, just like sockets.  From Tom
    Herbert.

18) Add Proportional Integral Enhanced packet scheduler, from Vijay
    Subramanian.

19) Allow openvswitch to mmap'd netlink, from Thomas Graf.

20) Key TCP metrics blobs also by source address, not just destination
    address.  From Christoph Paasch.

21) Support 10G in generic phylib.  From Andy Fleming.

22) Try to short-circuit GRO flow compares using device provided RX
    hash, if provided.  From Tom Herbert.

The wireless and netfilter folks have been busy little bees too.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
  net/cxgb4: Fix referencing freed adapter
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  fib_frontend: fix possible NULL pointer dereference
  rtnetlink: remove IFLA_BOND_SLAVE definition
  rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
  qlcnic: update version to 5.3.55
  qlcnic: Enhance logic to calculate msix vectors.
  qlcnic: Refactor interrupt coalescing code for all adapters.
  qlcnic: Update poll controller code path
  qlcnic: Interrupt code cleanup
  qlcnic: Enhance Tx timeout debugging.
  qlcnic: Use bool for rx_mac_learn.
  bonding: fix u64 division
  rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
  sfc: Use the correct maximum TX DMA ring size for SFC9100
  Add Shradha Shah as the sfc driver maintainer.
  net/vxlan: Share RX skb de-marking and checksum checks with ovs
  tulip: cleanup by using ARRAY_SIZE()
  ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
  net/cxgb4: Don't retrieve stats during recovery
  ...
2014-01-25 11:17:34 -08:00
Matan Barak 4de6580360 mlx4_core: Add support for steerable IB UD QPs
This patch adds support for allocating IB UD QPs that we can steer
traffic from.  We introduce a new firmware command FLOW_STEERING_IB_UC_QP_RANGE
and a capability bit.

This command isn't supported for VFs.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:06:50 -08:00
Or Gerlitz 7ffdf726cf net/mlx4_core: Add basic support for TCP/IP offloads under tunneling
Add the low-level device commands and definitions used for TCP/IP HW offloads
of tunneled/vxlan traffic which are supported by the ConnectX3-pro NIC.

This is done through the following elements:

 - read tunneling device caps in QUERY_DEV_CAP
 - add helper function to do SET_PORT for tunneling
 - add DMFS VXLAN steering rule definitions
 - add CQE and WQE checksum offload field definitions

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 14:31:43 -05:00
Hadar Hen Zion 8e1a28e8e6 net/mlx4_core: Expose physical port id as PF/VF capability
Add the infrastructure needed to support ndo_get_phys_port_id which
allows users to identify to which physical port a net-device is connected
to by reading a unique port id.
This will work for VFs and PFs.
The driver uses a new device capability - phys_port_id, The PF driver
reads the port phys_port_id from Firmware and stores it. The VF driver
reads the port phys_port_id from the PF using QUERY_FUNC_CAP command.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 19:04:43 -05:00
Hadar Hen Zion eb17711bc1 net/mlx4_core: Introduce nic_info new flag in QUERY_FUNC_CAP
Set nic_info field in QUERY_FUNC_CAP, which designates
supplementary NIC information is provided by the hypervisor.
When set, the following fields are valid: nic_num_rings,
nic_indirection_tbl_sz, cur_mac and phys_port_id.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 19:04:43 -05:00
Hadar Hen Zion 73e74ab4e0 net/mlx4_core: Rename QUERY_FUNC_CAP fields
Use correct names for QUERY_FUNC_CAP fields: flags0 and flags1.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 19:04:43 -05:00
Hadar Hen Zion 7b25d81b7f net/mlx4_core: Remove zeroed out of explicit QUERY_FUNC_CAP fields
All mailboxes are already zeroed by commit:
571b8b9 net/mlx4_core: Initialize all mailbox buffers to zero before use
Remove explicit zero set for force mac and force vlan fields in
mlx4_QUERY_FUNC_CAP_wrapper

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 19:04:43 -05:00
Linus Torvalds 9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Jack Morgenstein 571b8b92c7 net/mlx4_core: Initialize all mailbox buffers to zero before use
To guarantee that all unused fields in all FW commands for both inboxes
and outboxes are zeroed out, initialize the mailbox buffer to all zeroes.

This is especially important for SRIOV comm-channel virtual commands
(such as QUERY_FUNC_CAP), where if new fields are added to support new
features, the driver can depend on older kernels passing zeroes in these
fields.

In addition to zeroing out the mailbox buffer at allocation time, all
(now unnecessary) calls to memset by the callers of
mlx4_alloc_cmd_mailbox() are removed.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:22:47 -05:00
Jack Morgenstein eb456a68c6 net/mlx4_core: Fix quota handling in the QUERY_FUNC_CAP wrapper
In current kernels, the mlx4 driver running on a VM does not
differentiate between max resource numbers for the HCA and
max quotas -- it simply takes the quota values passed to it
as max-resource values.

However, the driver actually requires the VFs to be aware of
the actual number of resources that the HCA was initialized with,
for QPs, CQs, SRQs and MPTs.

For QPs, CQs and SRQs, the reason is that in completion handling
the driver must know which of the 24 bits are the actual resource
number, and which are "padding" bits.

For MPTs, also, the driver assumes knowledge of the number of MPTs
in the system.

The previous commit fixes the quota logic on the VM for the quota values
passed to it by QUERY_FUNC_CAPS.

For QPs, CQs, SRQs, and MPTs, it takes the max resource numbers
from QUERY_HCA (and not QUERY_FUNC_CAPS).  The quotas passed
in QUERY_FUNC_CAPS are used to report max resource number values
in the response to ib_query_device.

However, the Hypervisor driver must consider that VMs
may be running previous kernels, and compatibility must be preserved.

To resolve the incompatibility with previous kernels running on VMs,
we deprecated the quota fields in mlx4_QUERY_FUNC_CAP.  In the
deprecated fields, we pass the max-resource values from INIT_HCA

The quota fields are moved to a new location, and the current kernel
driver takes the proper values from that location. There is
also a new flag in dword 0, bit 28 of the mlx4_QUERY_FUNC_CAP mailbox;
if this flag is set, the (VM) driver takes the quota values from the
new location.

VMs running previous kernels will work properly, except that the max resource
numbers reported in ib_query_device for these resources will be
too high.  The Hypervisor driver will, however, enforce the quotas
for these VMs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 16:19:08 -05:00
Jack Morgenstein 5a0d0a6161 mlx4: Structures and init/teardown for VF resource quotas
This is step #1 for implementing SRIOV resource quotas for VFs.

Quotas are implemented per resource type for VFs and the PF, to prevent
any entity from simply grabbing all the resources for itself and leaving
the other entities unable to obtain such resources.

Resources which are allocated using quotas:  QPs, CQs, SRQs, MPTs, MTTs, MAC,
                                             VLAN, and Counters.

The quota system works as follows:
Each entity (VF or PF) is given a max number of a given resource (its quota),
and a guaranteed minimum number for each resource (starvation prevention).

For QPs, CQs, SRQs, MPTs and MTTs:
50% of the available quantity for the resource is divided equally among
the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
The other 50% is the "free pool", allocated on a first-come-first-serve basis.
For each VF/PF, resources are first allocated from its "guaranteed-minimum"
pool. When that pool is exhausted, the driver attempts to allocate from
the resource "free-pool".

The quota (i.e., max) for the VFs and the PF is:
  The free-pool amount (50% of the real max) + the guaranteed minimum

For MACs:
  Guarantee 2 MACs per VF/PF per port. As a result, since we have only
  128 MACs per port, reduce the allowable number of VFs from 64 to 63.
  Any remaining MACs are put into a free pool.

For VLANs:
  For the PF, the per-port quota is 128 and guarantee is 64
     (to allow the PF to register at least a VLAN per VF in VST mode).
  For the VFs, the per-port quota is 64 and the guarantee is 0.
      We assume that VGT VFs are trusted not to abuse the VLAN resource.

For Counters:
  For all functions (PF and VFs), the quota is 128 and the guarantee is 0.

In this patch, we define the needed structures, which are added to the
resource-tracker struct.  In addition, we do initialization
for the resource quota, and adjust the query_device response to use quotas
rather than resource maxima.

As part of the implementation, we introduce a new field in
mlx4_dev: quotas.  This field holds the resource quotas used
to report maxima to the upper layers (ib_core, via query_device).

The HCA maxima of these values are passed to the VFs (via
QUERY_HCA) so that they may continue to use these in handling
QPs, CQs, SRQs and MPTs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 16:19:07 -05:00
Amir Vadai 39e210fd23 net/mlx4: Unused local variable in mlx4_opreq_action
Clean up warning added by commit fe6f700d "net/mlx4_core: Respond to
operation request by firmware".

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:10:50 -04:00
Or Gerlitz 5930e8d0ab net/mlx4: Fix typo, move similar defs to same location
Small code cleanup:

1. change MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN to MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN

2. put MLX4_SET_PORT_PRIO2TC and MLX4_SET_PORT_SCHEDULER in the same union with the
   other MLX4_SET_PORT_yyy

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:10:50 -04:00
Masanari Iida 6d3be300c6 treewide: Fix typo in printk
Correct spelling typo within various part of the kernel

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:24:22 +02:00
David S. Miller 0e76a3a587 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge net into net-next to setup some infrastructure Eric
Dumazet needs for usbnet changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03 21:36:46 -07:00
Or Gerlitz 0508ad6468 net/mlx4_core: Don't give VFs MAC addresses which are derived from the PF MAC
If the user has not assigned a MAC address to a VM, then don't give it MAC which
is based on the PF one. The current derivation scheme is wrong and leads to VM
MAC collisions when the number of cards/hypervisors becomes big enough.

Instead, just give it zeros and let them figure out what to do with that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 15:09:35 -07:00
Yevgeny Petrilin fe6f700d6c net/mlx4_core: Respond to operation request by firmware
This commit adds new firmware command and new firmware event.  The firmware
raises the MLX4_EVENT_TYPE_OP_REQUIRED event in order to signal the driver it
needs to perform an administrative operation throughout the MLX4_CMD_GET_OP_REQ
command. At the moment the supported operation is adding/removing multicast
entries which are used by the firmware for handling NCSI traffic in B0
steering mode.

Also, had to swap the order of mlx4_init_mcg_table() and
mlx4_init_eq_table() to make sure that driver will get events only after
resources are initialized to handle it.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-29 01:12:40 -07:00
Jack Morgenstein b01978cacf net/mlx4_core: Dynamic VST to VST vlan/qos changes
Within VST mode, enable modifying the vlan and/or qos
for a VF without requiring unbind/rebind.

This requires firmware which supports the UPDATE_QP command.
(If the command is not available, we fall back to requiring
unbind/bind to activate these changes).

To avoid race conditions with modify-qp on QPs that are affected
by update-qp, this operation is performed on the comm_wq.

If the update operation succeeds for all the necessary QPs, a
vlan_unregister is performed for the abandoned vlan id.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 13:10:22 -07:00
Rony Efraim 948e306d7d net/mlx4: Add VF link state support
Add support to change the link state of VF (vPort)

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 17:51:04 -07:00
Or Gerlitz c418253f12 net/mlx4_core: Keep VF assigned MAC in the PF admin table
MAC addresses assigned by the PF to VFs were not kept in the PF driver
admin table. As a result, displaying the VF MACs from the PF interface
to user space showed zero address where in fact the VF got non-zero
address from the PF, fix that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04 12:58:24 -07:00
Or Gerlitz 4e8cf5b8a1 net/mlx4_core: Add missing report on VST and spoof-checking dev caps
Commits e6b6a23 "net/mlx4: Add VF MAC spoof checking support" and
3f7fb021 "net/mlx4: Add set VF default vlan ID and priority support"
missed reporting in the device capabilities dump when these features
are actually supported. Also two too noisy debug messages which produce
message on every QP opened by a VF, were left in the code, fix that.

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-11 16:12:44 -07:00
Rony Efraim e6b6a23163 net/mlx4: Add VF MAC spoof checking support
Add ndo_set_vf_spoofchk support

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-26 23:29:14 -04:00
Rony Efraim 3f7fb021d0 net/mlx4: Add set VF default vlan ID and priority support
Add support to ndo_set_vf_vlan in the driver. Once this call is used the vport
is considered to be in VST mode. In this mode, the PPF driver configures
Ethernet QPs created by this VF to use this vlan id and priority. Currently
RoCE isn't supported on that mode.

The special values of VID=4095 or VID=0,UP=0 are considered as VGT.

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-26 23:29:13 -04:00
Rony Efraim 0eb62b93cb net/mlx4: Add structures to keep VF Ethernet ports information
This patch add struct mlx4_vport_state where all the parameters related
to management of VFs port (virtual ports of the NIC eswitch) are kept.

The driver keeps an administrative and operational copy of the settings.
The current administrative copy becomes operational on the event of probing
a VF either on a VM or on the host.

Signed-off-by: Rony Efraim <ronye@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-26 23:29:13 -04:00
Amir Vadai 30b40c31c2 net/mlx4_core: Disable HW timestamping for VFs
Disable timestamp capability on virtual functions.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-26 23:29:13 -04:00
Eugenia Emantayev ddd8a6c12d net/mlx4_core: Read HCA frequency and map internal clock
Read HCA frequency, read PCI clock bar and offset, map internal clock to
PCI bar.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-24 16:30:13 -04:00
Eugenia Emantayev d998735f44 net/mlx4_core: Add timestamping device capability
Add new device capability for timestamping support and query FW to retrieve it.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-24 16:30:13 -04:00
Or Gerlitz 540b3a39ee net/mlx4_en: Enable DCB ETS ops only when supported by the firmware
Enable the DCB ETS ops only when supported by the firmware. For older firmware/cards
which don't support ETS, advertize only PFC DCB ops.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:55:46 -04:00
Or Gerlitz 4d531aa8ab net/mlx4_core: Added proper description for two device capabilities
Added readable description for the DPDP and port sensing device capabilities.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:55:46 -04:00
Jack Morgenstein 0081c8f381 net/mlx4_core: Turn off device-managed FS bit in dev-cap wrapper if DMFS is not enabled
Older kernels detect DMFS (device-managed flow steering) from the HCA
device capability directly, regardless of whether the capability was
enabled in INIT_HCA, this is fixed by commit 7b8157bed "mlx4_core: Adjustments
to Flow Steering activation logic for SR-IOV"

To protect against guests running kernels without this fix, the host driver
should turn off the DMFS capability bit in mlx4_QUERY_DEV_CAP_wrapper.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 15:52:03 -05:00
Linus Torvalds 70a3a06d01 Main batch of InfiniBand/RDMA changes for 3.9:
- SRP error handling fixes from Bart Van Assche
  - Implementation of memory windows for mlx4 from Shani Michaeli
  - Lots of cxgb4 HW driver fixes from Vipul Pandya
  - Make iSER work for virtual functions, other fixes from Or Gerlitz
  - Fix for bug in qib HW driver from Mike Marciniszyn
  - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman
  - Various cleanups and warning fixes from Julia Lawall, Paul Bolle, Wei Yongjun
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJRLPNoAAoJEENa44ZhAt0h0bMP/1xqlgDR3DGdoUoV4yTd6O9G
 Uccdb7og5o5tedVo+8xZz01y4at99P8FWW4hJ4os6k8n6yKoBVwo7qjN+BOR+JG5
 q8+O+ynUSIg4tGrb5sXcMnKXAXbw/vkftMWYNA41cbrM24DTYzB/2SLpvhwbFoTT
 tdQc2tgz5QaDqzWbagyCR4+k/IgO+Llrz/RvIdtz4dsTnTDogN7QCoSffX8n/Lpb
 DxtyXK4sdl3DAtd3CsIdsB/TSMb3RkHLCoSvmrWlLnqMdsbRxVnCVfBm4BOghW3J
 Y2K3joRoCjjIZSRNs/i0FMFkT/jbCXg1oXg9ek/a6YFNcgyk7z8iGyXrRY7fOnno
 8U2SfxJ69YpVYeJr+DSjaeHcmjsaYU7NN7JPxzvPKcJOIsxQJ/euJDXAXau3lEQY
 o9/p4JsGty0WHi1NanyygvghvBAoP1C5/59Sl4bHH5gckPyJT1kinPSCTT76YXGS
 WkSHg2mDhiJHy7Pnuy85iZldPoy2/5z09/I4aGMeL+8kUZbD4iFqzXIJU0HTsAim
 EONoRXDhIcN5DNVSVH1ig6nJ2a7Vhov4Z0r/vB8P4KhslBcqFwf2leC0eCoe5mNt
 SzcKhqosZDXoL8AwzpntzGIOid8pWmHbUx/PgIcoVXPjtl0h2ULNIFoYYyMZ3cyU
 AyN2tSiUZVddTV1/aKGL
 =RAQw
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband update from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.9:

   - SRP error handling fixes from Bart Van Assche

   - Implementation of memory windows for mlx4 from Shani Michaeli

   - Lots of cxgb4 HW driver fixes from Vipul Pandya

   - Make iSER work for virtual functions, other fixes from Or Gerlitz

   - Fix for bug in qib HW driver from Mike Marciniszyn

   - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman

   - Various cleanups and warning fixes from Julia Lawall, Paul Bolle,
     Wei Yongjun"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (41 commits)
  IB/mlx4: Advertise MW support
  IB/mlx4: Support memory window binding
  mlx4: Implement memory windows allocation and deallocation
  mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
  mlx4_core: Disable memory windows for virtual functions
  IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
  IB/srp: Fail I/O requests if the transport is offline
  IB/srp: Avoid endless SCSI error handling loop
  IB/srp: Avoid sending a task management function needlessly
  IB/srp: Track connection state properly
  IB/mlx4: Remove redundant NULL check before kfree
  IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable
  IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool
  IB/iser: Enable iser when FMRs are not supported
  IB/iser: Avoid error prints on EAGAIN registration failures
  IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
  IB/uverbs: Implement memory windows support in uverbs
  IB/core: Add "type 2" memory windows support
  mlx4_core: Propagate MR deregistration failures to caller
  mlx4_core: Rename MPT-related functions to have mpt_ prefix
  ...
2013-02-26 11:41:08 -08:00
Shani Michaeli e448834e35 mlx4_core: Enable memory windows in {INIT, QUERY}_HCA
Add memory windows-related code to INIT_HCA and QUERY_HCA.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 10:44:31 -08:00
Shani Michaeli cc1ade94ee mlx4_core: Disable memory windows for virtual functions
Do not enable memory windows allocation for virtual functions.

In addition, add a few safety checks, such as:

* Verifying the PD of a new MPT matches the VF.
* Making sure binding memory window isn't enabled for FMRs, and
  that new memory windows are not FMR themselves.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 10:44:31 -08:00
Matan Barak 955154fa33 net/mlx4_en: Don't reassign port mac address on firmware that supports it
Mac reassignments should only be done when not supported by the firmware. To
accomplish that, checking firmware capability bit to know whether we should
reassign macs in the driver.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:47 -05:00
Hadar Hen Zion 23537b732f net/mlx4_core: Use firmware driven flow steering hash mode
The Firmware dynamically changes flow steering hash configuration from covering
L2 only to "full" L2/L3/L4 mode needed.  The dynamic change allows the driver
to set hard coded hash configuration which is changed by the firmware from L2
to L2/L3/L4 when attaching the first L3/L4 flow steering rule and back to L2
when there are no more such rules.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 12:48:47 -05:00
Jack Morgenstein 7b8157bedc mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV
Separate flow steering capability detection from the decision to activate.

For the master (and for native), detect the flow steering capability
in mlx4_dev_cap, but activate the appropriate steering type in a new
function choose_flow_steering() based on detected data.

For VFs, activate flow steering based on what was actually activated
by the master, where that info is obtained via QUERY_HCA. This fixes
the current VF detection which is wrongly based on QUERY_DEV_CAP.

Also, for SR-IOV mode, if flow steering may be activated, do so only
if the max number of QPs per rule is sufficient to satisfy one
subscription per VF.  If not, fall back to B0 mode. This is needed to
serve registrations done by L2 network drivers such as mlx4_en and
IPoIB when the network stack attempts to join to multicast groups such
as all-hosts or the IPoIB broadcast group.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 11:47:14 -08:00
Or Gerlitz 08ff32352d mlx4: 64-byte CQE/EQE support
ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs).  Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline.  This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQES the driver will configure the HW to
work in 64-byte mode.

The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.

Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.

In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.

The SRIOV related flow is as follows

1. the PPF does the detection of the new capabilities using
   QUERY_DEV_CAP command.

2. the PPF activates the new capabilities using INIT_HCA.

3. the VF detects if the PPF activated the capabilities using
   QUERY_HCA, and if this is the case activates them for itself too.

Note that the VF detects that it must be aware to the new PF behaviour
using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.

User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows to work with unmodified libmlx4 on older devices (e.g
A0, B0) which don't support 64-byte CQEs.

In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-26 10:19:17 -08:00
Roland Dreier d172f5a4ab Merge branches 'cma', 'cxgb4', 'ipoib', 'mlx4', 'mlx4-sriov', 'nes', 'qib' and 'srp' into for-linus 2012-10-02 07:43:59 -07:00
Roland Dreier 69612b9ff4 mlx4_core: Trivial readability fix: "0X30" -> "0x30"
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:45 -07:00
Jack Morgenstein 47605df953 mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
Previously, the structure of a guest's proxy QPs followed the
structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
qp1 port 2, ...).  The guest then did offset calculations on the
sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().

This is now changed so that the guest does no offset calculations
regarding proxy or tunnel QPs to use.  This change frees the PPF from
needing to adhere to a specific order in allocating proxy and tunnel
QPs.

Now QUERY_FUNC_CAP provides each port individually with its proxy
qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
used directly where required (with no offset calculations).

To accomplish this change, several fields were added to the phys_caps
structure for use by the PPF and by non-SR-IOV mode:

    base_sqpn -- in non-sriov mode, this was formerly sqp_start.
    base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
    base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.

The current code in the PPF still adheres to the previous layout of
sqps, proxy-sqps and tunnel-sqps.  However, the PPF can change this
layout without affecting VF or (paravirtualized) PF code.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:43 -07:00
Jack Morgenstein 026149cbaa mlx4: Activate SR-IOV mode for IB
Remove the error returns for IB ports from mlx4_ib_add,
mlx4_INIT_PORT_wrapper, and mlx4_CLOSE_PORT_wrapper.

Currently, SRIOV is supported only for devices for which the
link layer is IB on all ports; RoCE support will be added later.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:42 -07:00
Jack Morgenstein 980e90010f mlx4_core: INIT/CLOSE port logic for IB ports in SR-IOV mode
Normally, INIT_PORT and CLOSE_PORT are invoked when special QP0
transitions to RTR, or transitions to ERR/RESET respectively.

In SR-IOV mode, however, the master is also paravirtualized.  This in
turn requires that we not do INIT_PORT until the entire QP0 path (real
QP0 and proxy QP0) is ready to receive.  When the real QP0 goes down,
we should indicate that the port is not active.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:40 -07:00
Jack Morgenstein 2a4fae148c IB/mlx4: Propagate P_Key and guid change port management events to slaves
P_Key change and guid change events are not of interest to all slaves,
but only to those slaves which "see" the table slots whose contents
have change.

For example, if the guid at port 1, index 5 has changed in the PPF, we
wish to propagate the gid-change event only to the function which has
that guid index mapped to its port/guid table (in this case it is
slave #5). Other functions should not get the event, since the event
does not affect them.

Similarly with P_Keys -- P_Key change events are forwarded only to
slaves which have that P_Key index mapped to their virtual P_Key table.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:38 -07:00
Jack Morgenstein e2c76824ca mlx4_core: Add proxy and tunnel QPs to the reserved QP area
In addition, pass the proxy and tunnel QP numbers to slaves so the
driver can perform special QP paravirtualization.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:31 -07:00
Dotan Barak 426dd00d76 mlx4_core: Fix wrong offset in parsing query device caps response
The wrong offset was used when parsing the number of XRCs in
mlx4_QUERY_DEV_CAP().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:08 -07:00
Linus Torvalds 5dedb9f3bd InfiniBand/RDMA changes for the 3.6 merge window:
- Updates to the qib low-level driver
  - First chunk of changes for SR-IOV support for mlx4 IB
  - RDMA CM support for IPv6-only binding
  - Other misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
 a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
 ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
 xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
 wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
 ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
 goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
 +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
 ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
 I7h9iJ1OMKFZ8bzmHsWb
 =f09j
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Updates to the qib low-level driver
 - First chunk of changes for SR-IOV support for mlx4 IB
 - RDMA CM support for IPv6-only binding
 - Other misc cleanups and fixes

Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c

* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
  IB/qib: checkpatch fixes
  IB/qib: Add congestion control agent implementation
  IB/qib: Reduce sdma_lock contention
  IB/qib: Fix an incorrect log message
  IB/qib: Fix QP RCU sparse warnings
  mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
  mlx4_core: Allow guests to have IB ports
  mlx4_core: Implement mechanism for reserved Q_Keys
  net/mlx4_core: Free ICM table in case of error
  IB/cm: Destroy idr as part of the module init error flow
  mlx4_core: Remove double function declarations
  IB/mlx4: Fill the masked_atomic_cap attribute in query device
  IB/mthca: Fill in sq_sig_type in query QP
  IB/mthca: Warning about event for non-existent QPs should show event type
  IB/qib: Fix sparse RCU warnings in qib_keys.c
  net/mlx4_core: Initialize IB port capabilities for all slaves
  mlx4: Use port management change event instead of smp_snoop
  IB/qib: RCU locking for MR validation
  IB/qib: Avoid returning EBUSY from MR deregister
  IB/qib: Fix UC MR refs for immediate operations
  ...
2012-07-24 13:56:26 -07:00
Jack Morgenstein 6634961c14 mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
To allow easy paravirtualization of P_Key and GID table sizes, keep
paravirtualized sizes in mlx4_dev->caps, but save the actual physical
sizes from FW in struct: mlx4_dev->phys_cap.

In addition, in SR-IOV mode, do the following:

1. Reduce reported P_Key table size by 1.
   This is done to reserve the highest P_Key index for internal use,
   for declaring an invalid P_Key in P_Key paravirtualization.
   We require a P_Key index which always contain an invalid P_Key
   value for this purpose (i.e., one which cannot be modified by
   the subnet manager).  The way to do this is to reduce the
   P_Key table size reported to the subnet manager by 1, so that
   it will not attempt to access the P_Key at index #127.

2. Paravirtualize the GID table size to 1. Thus, each guest sees
   only a single GID (at its paravirtualized index 0).

In addition, since we are paravirtualizing the GID table size to 1, we
add paravirtualization of the master GID event here (i.e., we do not
do ib_dispatch_event() for the GUID change event on the master, since
its (only) GUID never changes).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:52:23 -07:00
Jack Morgenstein 105c320f6a mlx4_core: Allow guests to have IB ports
Modify mlx4_dev_cap to allow IB support when SR-IOV is active.  Modify
mlx4_slave_cap to set the "rdma-supported" bit in its flags area, and
pass that to the guests (this is done in QUERY_FUNC_CAP and its
wrapper).

However, we don't activate IB support quite yet -- we leave the error
return at the start of mlx4_ib_add in the mlx4_ib driver.

In addition, set "protected fmr supported" bit to zero in the
QUERY_FUNC_CAP wrapper.

Finally, in the QUERY_FUNC_CAP wrapper, we needed to add code which
checks for the port type (IB or Ethernet).  Previously, this was not
an issue, since only Ethernet ports were supported.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:51:37 -07:00
Jack Morgenstein 00f5ce99dc mlx4: Use port management change event instead of smp_snoop
The port management change event can replace smp_snoop.  If the
capability bit for this event is set in dev-caps, the event is used
(by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
mask in the MAP_EQ fw command).  In this case, when the driver passes
incoming SMP PORT_INFO SET mads to the FW, the FW generates port
management change events to signal any changes to the driver.

If the FW generates these events, smp_snoop shouldn't be invoked in
ib_process_mad(), or duplicate events will occur (once from the
FW-generated event, and once from smp_snoop).

In the case where the FW does not generate port management change
events smp_snoop needs to be invoked to create these events.  The flow
in smp_snoop has been modified to make use of the same procedures as
in the fw-generated-event event case to generate the port management
events (LID change, Client-rereg, Pkey change, and/or GID change).

Port management change event handling required changing the
mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
(last argument) had to be changed to unsigned long in order to
accomodate passing the EQE pointer.

We also needed to move the definition of struct mlx4_eqe from
net/mlx4.h to file device.h -- to make it available to the IB driver,
to handle port management change events.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 09:47:10 -07:00
Jack Morgenstein 752a50cab6 mlx4_core: Pass an invalid PCI id number to VFs
Currently, VFs have 0 in their dev->caps.function field.  This is a
valid pci id (usually of the PF).  Instead, pass an invalid PCI id to
the VF via QUERY_FW, so that if the value gets accessed in the VF
driver, we'll catch the problem.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:05 -07:00
Hadar Hen Zion 0ff1fb654b {NET, IB}/mlx4: Add device managed flow steering firmware API
The driver is modified to support three operation modes.

If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.

When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.

When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.

Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion c96d97f4d1 net/mlx4: Set steering mode according to device capabilities
Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.

This also set the grounds for add new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.

A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.

B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering,

The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Jack Morgenstein 401453a31e net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP
The "!mlx4_is_slave" is totally confusing.  Fix with
constant MLX4_CMD_NATIVE, which is the intended behavior.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein b91cb3ebcd net/mlx4_core: Fixes for VF / Guest startup flow
- pass the following parameters:
  - firmware version (added QUERY_FW paravirtualization for that)

  - disable Blueflame on slaves. KVM disables write combining on guests,
    and we get better performance without BF in this case. (This requires
    QUERY_DEV_CAP paravirtualization, also in this commit)

  - max qp rdma as destination

- get rid of a chunk of "if (0)" dead code

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Linus Torvalds c23ddf7857 InfiniBand/RDMA changes for the 3.5 merge window:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
  - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
    applications to send and receive arbitrary packets directly to/from
    the hardware
  - Add "doorbell drop" handling to the cxgb4 driver
  - A fairly large batch of qib hardware driver changes
  - A few fixes for lockdep-detected issues
  - A few other miscellaneous fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
 4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
 zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
 jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
 mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
 yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
 vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
 aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
 UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
 F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
 HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
 syKfN0VjiDRtJ+pxayi3
 =yWfr
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
 - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
   applications to send and receive arbitrary packets directly to/from
   the hardware
 - Add "doorbell drop" handling to the cxgb4 driver
 - A fairly large batch of qib hardware driver changes
 - A few fixes for lockdep-detected issues
 - A few other miscellaneous fixes and cleanups

Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.

* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
  RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
  IB/mlx4: Fix mlx4_ib_add() error flow
  IB/core: Fix IB_SA_COMP_MASK macro
  IB/iser: Fix error flow in iser ep connection establishment
  IB/mlx4: Increase the number of vectors (EQs) available for ULPs
  RDMA/cxgb4: Add query_qp support
  RDMA/cxgb4: Remove kfifo usage
  RDMA/cxgb4: Use vmalloc() for debugfs QP dump
  RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
  RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
  RDMA/cxgb4: Add DB Overflow Avoidance
  RDMA/cxgb4: Add debugfs RDMA memory stats
  cxgb4: DB Drop Recovery for RDMA and LLD queues
  cxgb4: Common platform specific changes for DB Drop Recovery
  cxgb4: Detect DB FULL events and notify RDMA ULD
  RDMA/cxgb4: Drop peer_abort when no endpoint found
  RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
  mlx4_core: Change bitmap allocator to work in round-robin fashion
  RDMA/nes: Don't call event handler if pointer is NULL
  RDMA/nes: Fix for the ORD value of the connecting peer
  ...
2012-05-21 17:54:55 -07:00
Jack Morgenstein 8bac9ede68 net/mlx4_core: Fix init_port mask state for slaves
In function mlx4_INIT_PORT_wrapper, the port state mask for the
slave is only set if we are invoking the INIT_PORT fw command.

However, the reference count for the (initialized) port is
incremented anyway.

This creates a problem in that when we have multiple slaves,
then the CLOSE_PORT command will never be invoked. The
reason is that in the CLOSE_PORT wrapper, if the port-state
mask is zero for the slave (which it is), the wrapper returns
without doing anything. The only slave which will not return
immediately in the CLOSE_PORT wrapper is that slave for which
INIT_PORT was invoked.

The fix is to not have the port-state mask setting depend
on the logic for calling the INIT_PORT fw command.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Shlomo Pongratz b3416f4476 mlx4_core: Add second capabilities flags field
This patch adds a 64-bit flags2 features member to struct mlx4_dev to
export further features of the hardware.  The original flags field
tracks features whose support bits are advertised by the firmware in
offsets 0x40 and 0x44 of the query device capabilities command.
flags2 will track features whose support bits are scattered at various
offsets.

RSS support is the first feature to be exported through flags2.  RSS
capabilities are located at offset 0x2e.  The size of the RSS
indirection table is also given in this offset.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-08 11:54:32 -07:00
Yevgeny Petrilin be6736ba1f net/mlx4: remove unused functions
Removing functions that are no longer in use, but still exist

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:18 -05:00
Doug Ledford 22c8bff6fa mlx4_core: Exported functions can't be static
At least on powerpc, it breaks the build if exported functions are
static.  Fix some static exported functions introduced with the mlx4
SR-IOV support added in 3.3-rc1.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-22 23:00:38 -08:00
Marcel Apfelbaum eb41049f2f mlx4_core: removed function index from vf.
The Virtual Functions should not be aware their function number.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:43 -05:00
Yevgeny Petrilin 1c015b3b82 mlx4_core: Elaborating limitation on VF port options
Showing which capabilities are not passed to VF
when executing QUERY_PORT

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-03 12:49:16 -05:00
Yevgeny Petrilin 95f56e7aa8 mlx4_core: limiting VF port options
At the moment VFs can only operate in Eth mode.
In addition we don't want the VF to attempt link sensing,
so we block this option as well.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-29 15:08:41 -05:00
Rusty Russell eb93992207 module_param: make bool parameters really bool (net & drivers/net)
module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 22:27:29 -05:00
Yevgeny Petrilin 8d0fc7b611 mlx4_core: Changing link sensing logic
New FW can give clues to driver regarding default port type
and whether or not we should default to link sensing on the port.

2 bits are added to QUERY_PORT command:
1. suggested_type: This bit gives a hint whether the default port type should be
   IB or Ethernet.
   The driver will use this hint in case the user didn't specify explicitly the link layer
   type he wants to set.
2. default_sense: If this bit is set, we would sense the port type on start-up
   and default the port to link sensing

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 14:57:07 -05:00
Jack Morgenstein ab9c17a009 mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet
1. Added module parameters sr_iov and probe_vf for controlling enablement of
   SRIOV mode.
2. Increased default max num-qps, num-mpts and log_num_macs to accomodate
   SRIOV mode
3. Added port_type_array as a module parameter to allow driver startup with
   ports configured as desired.
   In SRIOV mode, only ETH is supported, and this array is ignored; otherwise,
   for the case where the FW supports both port types (ETH and IB), the
   port_type_array parameter is used.
   By default, the port_type_array is set to configure both ports as IB.
4. When running in sriov mode, the master needs to initialize the ICM eq table
   to hold the eq's for itself and also for all the slaves.
5. mlx4_set_port_mask() now invoked from mlx4_init_hca, instead of in mlx4_dev_cap.
6. Introduced sriov VF (slave) device startup/teardown logic (mainly procedures
   mlx4_init_slave, mlx4_slave_exit, mlx4_slave_cap, mlx4_slave_exit and flow
   modifications in __mlx4_init_one, mlx4_init_hca, and mlx4_setup_hca).
   VFs obtain their startup information from the PF (master) device via the
   comm channel.
7. In SRIOV mode (both PF and VF), MSI_X must be enabled, or the driver
   aborts loading the device.
8. Do not allow setting port type via sysfs when running in SRIOV mode.
9. mlx4_get_ownership:  Currently, only one PF is supported by the driver.
   If the HCA is burned with FW which enables more than one PF, only one
   of the PFs is allowed to run.  The first one up grabs a FW ownership
   semaphone -- all other PFs will find that semaphore taken, and the
   driver will not allow them to run.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Liran Liss <liranl@mellanox.co.il>
Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:08 -05:00
Marcel Apfelbaum 2b8fb2867c mlx4_core: mtts resources units changed to offset
In the previous implementation mtts are managed by:
1. order     - log(mtt segments), 'mtt segment' groups several mtts together.
2. first_seg - segment location relative to mtt table.
In the current implementation:
1. order     - log(mtts) rather than segments
2. offset    - mtt index in mtt table

Note: The actual mtt allocation is made in segments but it is
      transparent to callers.

Rational: The mtt resource holders are not interested on how the allocation
          of mtt is done, but rather on how they will use it.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:07 -05:00
Marcel Apfelbaum 5cc914f108 mlx4_core: Added FW commands and their wrappers for supporting SRIOV
The following commands are added here:
1. QUERY_FUNC_CAP and its wrapper.  This function is used by VFs when
   they start up to receive configuration information from the PF, such
   as resource quotas for this VF, which ports should be used (currently
   two), what protocol is running on the port (currently Ethernet ONLY,
   or port not active).

2. QUERY_PORT and its wrapper. Previously, this FW command was invoked directly
   by the ETH driver (en_port.c) using mlx4_cmd_box. Virtualization is now
   required here (the VF's MAC address must be substituted for the PFs
   MAC address returned by the FW). We changed the invocation
   in the ETH driver to use mlx4_QUERY_PORT, and added the wrapper.

3. QUERY_HCA. Used by the VF to determine how the HCA was initialized.
   For now, we need only the multicast table member entry size
   (log2_mc_table_entry_sz, in the ConnectX PRM).  No wrapper is needed
   here, because the data may be passed as is to the VF without modification).

   In this command, we have added a GLOBAL_CAPS field for passing required
   configuration information from FW to a VF (this field is to allow safely
                   adding new SRIOV capabilities which require support in VF drivers, too).
   Bits will set here by FW in response to PF-driver configuration commands which
   will activate as yet undefined new SRIOV features. The VF will test to see that
   all required capabilities indicated by this field are supported (i.e., if a bit
   is set and the VF driver does not recognize that bit, it must abort
   its initialization).  Currently, no bits are set.

4. Added a CLOSE_PORT wrapper.  The PF context needs to keep track of how many VF contexts
   have the port open.  The PF context will not actually issue the FW close port command
   until the last port user issues a CLOSE_PORT request.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:06 -05:00