Commit Graph

43 Commits

Author SHA1 Message Date
Linus Torvalds 1e30c1b386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates and fixes from David Miller:

1) Reinstate the no-ref optimization for input route lookups in ipv4 to
   fix some routing cache removal perf regressions.

2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet.

3) Get RX hash value from correct place in be2net driver, from
   Sarveshwar Bandi.

4) Validation of FIB cached routes missing critical check, from Eric
   Dumazet.

5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
  ipv6: Early TCP socket demux
  ipv4: Fix input route performance regression.
  pch_gbe: vlan skb len fix
  pch_gbe: add extra clean tx
  pch_gbe: fix transmit watchdog timeout
  ixgbe: fix panic while dumping packets on Tx hang with IOMMU
  be2net: Fix to parse RSS hash from Receive completions correctly.
  net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
  hyperv: Add error handling to rndis_filter_device_add()
  hyperv: Add a check for ring_size value
  ipv4: rt_cache_valid must check expired routes
  net/pch_gpe: Cannot disable ethernet autonegation
  qeth: repair crash in qeth_l3_vlan_rx_kill_vid()
  netiucv: cleanup attribute usage
  net: wiznet add missing HAS_IOMEM dependency
  be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC
  mlx4: Add support for EEH error recovery
  cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN
  wanmain: comparing array with NULL
  caif: fix NULL pointer check
  ...
2012-07-26 18:09:01 -07:00
Kleber Sacilotto de Souza 57dbf29a54 mlx4: Add support for EEH error recovery
Currently the mlx4 drivers don't have the necessary callbacks to
implement EEH errors detection and recovery, so the PCI layer uses the
probe and remove callbacks to try to recover the device after an error on
the bus. However, these callbacks have race conditions with the internal
catastrophic error recovery functions, which will also detect the error
and this can cause the system to crash if both EEH and catas functions
try to reset the device.

This patch adds the necessary error recovery callbacks and makes sure
that the internal catastrophic error functions will not try to reset the
device in such scenarios. It also adds some calls to
pci_channel_offline() to suppress reads/writes on the bus when the slot
cannot accept I/O operations so we prevent unnecessary accesses to the
bus and speed up the device removal.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-25 15:24:13 -07:00
Linus Torvalds 5dedb9f3bd InfiniBand/RDMA changes for the 3.6 merge window:
- Updates to the qib low-level driver
  - First chunk of changes for SR-IOV support for mlx4 IB
  - RDMA CM support for IPv6-only binding
  - Other misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
 a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
 ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
 xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
 wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
 ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
 goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
 +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
 ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
 I7h9iJ1OMKFZ8bzmHsWb
 =f09j
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Updates to the qib low-level driver
 - First chunk of changes for SR-IOV support for mlx4 IB
 - RDMA CM support for IPv6-only binding
 - Other misc cleanups and fixes

Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c

* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
  IB/qib: checkpatch fixes
  IB/qib: Add congestion control agent implementation
  IB/qib: Reduce sdma_lock contention
  IB/qib: Fix an incorrect log message
  IB/qib: Fix QP RCU sparse warnings
  mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
  mlx4_core: Allow guests to have IB ports
  mlx4_core: Implement mechanism for reserved Q_Keys
  net/mlx4_core: Free ICM table in case of error
  IB/cm: Destroy idr as part of the module init error flow
  mlx4_core: Remove double function declarations
  IB/mlx4: Fill the masked_atomic_cap attribute in query device
  IB/mthca: Fill in sq_sig_type in query QP
  IB/mthca: Warning about event for non-existent QPs should show event type
  IB/qib: Fix sparse RCU warnings in qib_keys.c
  net/mlx4_core: Initialize IB port capabilities for all slaves
  mlx4: Use port management change event instead of smp_snoop
  IB/qib: RCU locking for MR validation
  IB/qib: Avoid returning EBUSY from MR deregister
  IB/qib: Fix UC MR refs for immediate operations
  ...
2012-07-24 13:56:26 -07:00
Jack Morgenstein 6634961c14 mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
To allow easy paravirtualization of P_Key and GID table sizes, keep
paravirtualized sizes in mlx4_dev->caps, but save the actual physical
sizes from FW in struct: mlx4_dev->phys_cap.

In addition, in SR-IOV mode, do the following:

1. Reduce reported P_Key table size by 1.
   This is done to reserve the highest P_Key index for internal use,
   for declaring an invalid P_Key in P_Key paravirtualization.
   We require a P_Key index which always contain an invalid P_Key
   value for this purpose (i.e., one which cannot be modified by
   the subnet manager).  The way to do this is to reduce the
   P_Key table size reported to the subnet manager by 1, so that
   it will not attempt to access the P_Key at index #127.

2. Paravirtualize the GID table size to 1. Thus, each guest sees
   only a single GID (at its paravirtualized index 0).

In addition, since we are paravirtualizing the GID table size to 1, we
add paravirtualization of the master GID event here (i.e., we do not
do ib_dispatch_event() for the GUID change event on the master, since
its (only) GUID never changes).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:52:23 -07:00
Jack Morgenstein 105c320f6a mlx4_core: Allow guests to have IB ports
Modify mlx4_dev_cap to allow IB support when SR-IOV is active.  Modify
mlx4_slave_cap to set the "rdma-supported" bit in its flags area, and
pass that to the guests (this is done in QUERY_FUNC_CAP and its
wrapper).

However, we don't activate IB support quite yet -- we leave the error
return at the start of mlx4_ib_add in the mlx4_ib driver.

In addition, set "protected fmr supported" bit to zero in the
QUERY_FUNC_CAP wrapper.

Finally, in the QUERY_FUNC_CAP wrapper, we needed to add code which
checks for the port type (IB or Ethernet).  Previously, this was not
an issue, since only Ethernet ports were supported.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:51:37 -07:00
Jack Morgenstein 396f2feb05 mlx4_core: Implement mechanism for reserved Q_Keys
The SR-IOV special QP tunneling mechanism uses proxy special QPs
(instead of the real special QPs) for MADs on guests.  These proxy QPs
send their packets to a "tunnel" QP owned by the master.  The master
then forwards the MAD (after any required paravirtualization) to the
real special QP, which sends out the MAD.

For security reasons (i.e., to prevent guests from sending MADs to
tunnel QPs belonging to other guests), each proxy-tunnel QP pair is
assigned a unique, reserved, Q_Key.  These Q_Keys are available only
for proxy and tunnel QPs -- if the guest tries to use these Q_Keys
with other QPs, it will fail.

This patch introduces a mechanism for reserving a block of 64K Q_Keys
for proxy/tunneling use.

The patch introduces also two new fields into mlx4_dev: base_sqpn and
base_tunnel_sqpn.

In SR-IOV mode, the QP numbers for the "real," proxy, and tunnel sqps
are added to the reserved QPN area (so that they will not change).
There are 8 special QPs per port in the HCA, and each of them is
assigned both a proxy and a tunnel QP, for each VF and for the PF as
well in SR-IOV mode.

The QPNs for these QPs are arranged as follows:
 1. The real SQP numbers (8)
 2. The proxy SQPs (8 * (max number of VFs + max number of PFs)
 3. The tunnel SQPs (8 * (max number of VFs + max number of PFs)

To support these QPs, two new fields are added to struct mlx4_dev:

  base_sqp:  this is the QP number of the first of the real SQPs
  base_tunnel_sqp: this is the qp number of the first qp in the tunnel
                   sqp region. (On guests, this is the first tunnel
                   sqp of the 8 which are assigned to that guest).

In addition, in SR-IOV mode, sqp_start is the number of the first
proxy SQP in the proxy SQP region.  (In guests, this is the first
proxy SQP of the 8 which are assigned to that guest)

Note that in non-SR-IOV mode, there are no proxies and no tunnels.
In this case, sqp_start is set to sqp_base -- which minimizes code
changes.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:35:55 -07:00
Jack Morgenstein 2aca1172c2 net/mlx4_core: Initialize IB port capabilities for all slaves
With IB SR-IOV, each slave has its own separate copy of the port
capabilities flags.  For example, the master can run a subnet manager
(which causes the IsSM bit to be set in the master's port
capabilities) without affecting the port capabilities seen by the
slaves (the IsSM bit will be seen as cleared in the slaves).

Also add a static inline mlx4_master_func_num() to enhance readability
of the code.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 09:57:06 -07:00
Hadar Hen Zion 0ff1fb654b {NET, IB}/mlx4: Add device managed flow steering firmware API
The driver is modified to support three operation modes.

If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.

When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.

When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.

Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion c96d97f4d1 net/mlx4: Set steering mode according to device capabilities
Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.

This also set the grounds for add new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.

A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.

B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering,

The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Yuval Mintz 90b1ebe7af mlx4: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 03:06:44 -07:00
Yevgeny Petrilin 9858d2d1ac net/mlx4: Use single completion vector after NOP failure
Fix a crash at the error flow of NOP command which caused the driver to try and use
a completion vector which wasn't allocated.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
Jack Morgenstein 6230bb234d net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap
The range check was performed after using the port number.

Reverse this to prevent a potential array overflow.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein b91cb3ebcd net/mlx4_core: Fixes for VF / Guest startup flow
- pass the following parameters:
  - firmware version (added QUERY_FW paravirtualization for that)

  - disable Blueflame on slaves. KVM disables write combining on guests,
    and we get better performance without BF in this case. (This requires
    QUERY_DEV_CAP paravirtualization, also in this commit)

  - max qp rdma as destination

- get rid of a chunk of "if (0)" dead code

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Marcel Apfelbaum 3fc929e2d6 net/mlx4_core: Fix number of EQs used in ICM initialisation
In SRIOV mode, the number of EQs used when computing the total ICM size
was incorrect.

To fix this, we do the following:
1. We add a new structure to mlx4_dev, mlx4_phys_caps, to contain physical HCA
   capabilities.  The PPF uses the phys capabilities when it computes things
   like ICM size.

   The dev_caps structure will then contain the paravirtualized values, making
   bookkeeping much easier in SRIOV mode. We add a structure rather than a
   single parameter because there will be other fields in the phys_caps.

   The first field we add to the mlx4_phys_caps structure is num_phys_eqs.

2. In INIT_HCA, when running in SRIOV mode, the "log_num_eqs" parameter
   passed to the FW is the number of EQs per VF/PF; each function (PF or VF)
   has this number of EQs available.

   However, the total number of EQs which must be allowed for in the ICM is
   (1 << log_num_eqs) * (#VFs + #PFs).  Rather than compute this quantity,
   we allocate ICM space for 1024 EQs (which is the device maximum
   number of EQs, and which is the value we place in the mlx4_phys_caps structure).

   For INIT_HCA, however, we use the per-function number of EQs as described
   above.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Linus Torvalds c23ddf7857 InfiniBand/RDMA changes for the 3.5 merge window:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
  - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
    applications to send and receive arbitrary packets directly to/from
    the hardware
  - Add "doorbell drop" handling to the cxgb4 driver
  - A fairly large batch of qib hardware driver changes
  - A few fixes for lockdep-detected issues
  - A few other miscellaneous fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
 4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
 zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
 jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
 mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
 yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
 vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
 aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
 UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
 F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
 HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
 syKfN0VjiDRtJ+pxayi3
 =yWfr
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
 - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
   applications to send and receive arbitrary packets directly to/from
   the hardware
 - Add "doorbell drop" handling to the cxgb4 driver
 - A fairly large batch of qib hardware driver changes
 - A few fixes for lockdep-detected issues
 - A few other miscellaneous fixes and cleanups

Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.

* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
  RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
  IB/mlx4: Fix mlx4_ib_add() error flow
  IB/core: Fix IB_SA_COMP_MASK macro
  IB/iser: Fix error flow in iser ep connection establishment
  IB/mlx4: Increase the number of vectors (EQs) available for ULPs
  RDMA/cxgb4: Add query_qp support
  RDMA/cxgb4: Remove kfifo usage
  RDMA/cxgb4: Use vmalloc() for debugfs QP dump
  RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
  RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
  RDMA/cxgb4: Add DB Overflow Avoidance
  RDMA/cxgb4: Add debugfs RDMA memory stats
  cxgb4: DB Drop Recovery for RDMA and LLD queues
  cxgb4: Common platform specific changes for DB Drop Recovery
  cxgb4: Detect DB FULL events and notify RDMA ULD
  RDMA/cxgb4: Drop peer_abort when no endpoint found
  RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
  mlx4_core: Change bitmap allocator to work in round-robin fashion
  RDMA/nes: Don't call event handler if pointer is NULL
  RDMA/nes: Fix for the ORD value of the connecting peer
  ...
2012-05-21 17:54:55 -07:00
Jack Morgenstein ba062d5219 net/mlx4_core: Add XRC domains and counters to resource tracker
Add missing resource tracking for XRC domains and complete the tracking for HCA
network flow counters.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:59 -04:00
Jack Morgenstein b8924951f6 net/mlx4_core: Fix potential kernel Oops in res tracker during Dom0 driver unload
Currently the slave and master resources are deleted after master freed
all bitmaps. If any resources were not properly cleaned up during the
shutdown process, an Oops would result.

Fix so that delete slave (only) resources during cleanup. Master resources
are cleaned up during unload process, and need not separately be cleaned.

Note that during cleanup, we need to split the resource-tracker freeing
functionality.

Before removing all the bitmaps, we free any leftover slave resources.
However, we can only remove the resource tracker linked list after
all bitmap frees, since some of the freeing functions (e.g.,
mlx4_cleanup_eq_table) use paravirtualized FW commands which expect
the resource tracker linked list to be present.

Found-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein 681372a7a3 net/mlx4_core: Do not reset module-parameter num_vfs when fail to enable sriov
Consider the following scenario: 2 HCAs, where only one of which can run SRIOV.

If we reset the module parameter, all the VFs of the SRIOV HCA will be
claimed by the PPF host (-- the code relies on num_vfs being non-zero
to avoid this claiming, and num_vfs was reset when pci_enable_sriov failed
for the non-SRIOV HCA).

The solution is not to touch the num_vfs parameter.

Also, eliminate the unneeded check of num_vfs when disabling sriov
(the dev flag bit is sufficient).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Shlomo Pongratz b3416f4476 mlx4_core: Add second capabilities flags field
This patch adds a 64-bit flags2 features member to struct mlx4_dev to
export further features of the hardware.  The original flags field
tracks features whose support bits are advertised by the firmware in
offsets 0x40 and 0x44 of the query device capabilities command.
flags2 will track features whose support bits are scattered at various
offsets.

RSS support is the first feature to be exported through flags2.  RSS
capabilities are located at offset 0x2e.  The size of the RSS
indirection table is also given in this offset.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-08 11:54:32 -07:00
Linus Torvalds 0c2fe82a9b InfiniBand/RDMA changes for the 3.4 merge window. Nothing big really
stands out; by patch count lots of fixes to the mlx4 driver plus some
 cleanups and fixes to the core and other drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
 CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
 12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
 A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
 SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
 oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
 Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
 tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
 3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
 ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
 R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
 3etEZZRwROAyYKcwN01p
 =DbLx
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
 "Nothing big really stands out; by patch count lots of fixes to the
  mlx4 driver plus some cleanups and fixes to the core and other
  drivers."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
  mlx4_core: Scale size of MTT table with system RAM
  mlx4_core: Allow dynamic MTU configuration for IB ports
  IB/mlx4: Fix info returned when querying IBoE ports
  IB/mlx4: Fix possible missed completion event
  mlx4_core: Report thermal error events
  mlx4_core: Fix one more static exported function
  IB: Change CQE "csum_ok" field to a bit flag
  RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
  RDMA/cxgb3: Don't pass irq flags to flush_qp()
  mlx4_core: Get rid of redundant ext_port_cap flags
  RDMA/ucma: Fix AB-BA deadlock
  IB/ehca: Fix ilog2() compile failure
  IB: Use central enum for speed instead of hard-coded values
  IB/iser: Post initial receive buffers before sending the final login request
  IB/iser: Free IB connection resources in the proper place
  IB/srp: Consolidate repetitive sysfs code
  IB/srp: Use pr_fmt() and pr_err()/pr_warn()
  IB/core: Fix SDR rates in sysfs
  mlx4: Enforce device max FMR maps in FMR alloc
  IB/mlx4: Set bad_wr for invalid send opcode
  ...
2012-03-21 10:33:42 -07:00
Roland Dreier 42872c7a5e Merge branches 'misc' and 'mlx4' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
	drivers/net/ethernet/mellanox/mlx4/main.c
	include/linux/mlx4/device.h
2012-03-12 16:25:28 -07:00
Or Gerlitz 096335b3f9 mlx4_core: Allow dynamic MTU configuration for IB ports
Set the MTU for IB ports in the driver instead of using the firmware
default of 2KB (the driver defaults to 4KB).  Allow for dynamic mtu
configuration through a new, per-port sysfs entry.

Since there's a dependency between the port MTU and the max number of
HW VLs the port can support, apply a mim/max approach, using a loop
that goes down from the highest possible number of VLs to the lowest,
using the firmware return status to know whether the requested number
of VLs is possible with a given MTU.

For now, as with the dynamic link type change / VPI support, the sysfs
entry to change the mtu is exposed only when NOT running in SR-IOV
mode.  To allow changing the MTU for the master in SR-IOV mode,
primary-function-initiated FLR (Function Level Reset) needs to be
implemented.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:59 -07:00
Roland Dreier e10903b087 mlx4_core: Fix one more static exported function
Commit 22c8bff6fa ("mlx4_core: Exported functions can't be static")
fixed most of this up, but forgot about mlx4_is_slave_active().  Fix
this one too.

Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:58 -07:00
Or Gerlitz 8154c07fe1 mlx4_core: Get rid of redundant ext_port_cap flags
While doing the work for commit a6f7feae6d ("IB/mlx4: pass SMP
vendor-specific attribute MADs to firmware") we realized that the
firmware would respond on all sorts of vendor-specific MADs.
Therefore commit 97285b7817 ("mlx4_core: Add extended port
capabilities support") adds redundant code into the driver, since
there's no real reaon to maintain the extended capabilities of the
port, as they can be queried on demand (e.g the FDR10 capability).

This patch reverts commit 97285b7817 and removes the check for
extended caps from the mlx4_ib driver port query flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-06 17:25:18 -08:00
David S. Miller ff4783ce78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/sfc/rx.c

Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 21:55:51 -05:00
Eli Cohen a5bbe892da mlx4: Enforce device max FMR maps in FMR alloc
ConnectX devices have a limit on the number of mappings that can be
done on an FMR before having to call sync_tpt.  The current
mlx4_ib driver reports the limit correctly in max_map_per_fmr in
.query_device(), but mlx4_core doesn't check it when actually
allocating FMRs.

Add a max_fmr_maps field to struct mlx4_caps and enforce this maximum
value on FMR allocations.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-26 01:43:37 -08:00
Yevgeny Petrilin 1e0f03d57d mlx4_core: Fixing array indexes when setting port types
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-24 01:52:45 -05:00
Yevgeny Petrilin 3d8f93083b mlx4: Setting new port types after all interfaces unregistered
In port type change flow, need to set the new port types only after
all interfaces have finished the unregister process.
Otherwise, during unregister, one of the interfaces might issue a SET_PORT
command with wrong port types, it can cause bad FW behavior.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 15:27:24 -05:00
Yevgeny Petrilin 730c41d5ba mlx4: Replacing pool_lock with mutex
Under the spinlock we call request_irq(), which allocates memory with GFP_KERNEL,
This causes the following trace when DEBUG_SPINLOCK is enabled, it can cause
the following trace:

 BUG: spinlock wrong CPU on CPU#2, ethtool/2595
 lock: ffff8801f9cbc2b0, .magic: dead4ead, .owner: ethtool/2595, .owner_cpu: 0
 Pid: 2595, comm: ethtool Not tainted 3.0.18 #2
 Call Trace:
 spin_bug+0xa2/0xf0
 do_raw_spin_unlock+0x71/0xa0
 _raw_spin_unlock+0xe/0x10
 mlx4_assign_eq+0x12b/0x190 [mlx4_core]
 mlx4_en_activate_cq+0x252/0x2d0 [mlx4_en]
 ? mlx4_en_activate_rx_rings+0x227/0x370 [mlx4_en]
 mlx4_en_start_port+0x189/0xb90 [mlx4_en]
 mlx4_en_set_ringparam+0x29a/0x340 [mlx4_en]
 dev_ethtool+0x816/0xb10
 ? dev_get_by_name_rcu+0xa4/0xe0
 dev_ioctl+0x2b5/0x470
 handle_mm_fault+0x1cd/0x2d0
 sock_do_ioctl+0x5d/0x70
 sock_ioctl+0x79/0x2f0
 do_vfs_ioctl+0x8c/0x340
 sys_ioctl+0xa1/0xb0
 system_call_fastpath+0x16/0x1b

Replacing with mutex, which is enough in this case.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 15:27:23 -05:00
Jack Morgenstein 3d7474734b mlx4_core: Do not map BF area if capability is 0
BF can be disabled in some cases, the capability field, bf_reg_size is set
to zero in this case. Don't map the BF area in this case, it would cause
failures.  In addition, leaving the BF area unmapped
also alerts the ETH driver to not use BF.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-20 19:26:34 -05:00
Eugenia Emantayev 45b5136551 mlx4: remove unused field high_prios
Remove unnecessary field high_prios from mlx4_steer struct and initialization

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Marcel Apfelbaum 9fd7a1e147 mlx4_core: Fix mtt profile issue
Num mtts from profile is really the number of mtt segments.
Thus, in make profile, to get the proper number of MTT entries,
must multiply num_mtts by mtts per segment.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:44 -05:00
Marcel Apfelbaum eb41049f2f mlx4_core: removed function index from vf.
The Virtual Functions should not be aware their function number.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:43 -05:00
Yevgeny Petrilin 46c4674754 mlx4_core: using array index for sense_allowed
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-29 15:08:41 -05:00
Rusty Russell eb93992207 module_param: make bool parameters really bool (net & drivers/net)
module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 22:27:29 -05:00
Yevgeny Petrilin 8d0fc7b611 mlx4_core: Changing link sensing logic
New FW can give clues to driver regarding default port type
and whether or not we should default to link sensing on the port.

2 bits are added to QUERY_PORT command:
1. suggested_type: This bit gives a hint whether the default port type should be
   IB or Ethernet.
   The driver will use this hint in case the user didn't specify explicitly the link layer
   type he wants to set.
2. default_sense: If this bit is set, we would sense the port type on start-up
   and default the port to link sensing

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 14:57:07 -05:00
Yevgeny Petrilin 58a60168d1 mlx4: capability for link sensing
For ConnectX3 devices, we allow link sensing only if FW explicitly
reported it supports the feature.
For older versions (ConnectX1 and 2), if the card supports both link layer types
(Ethenet and Infiniband), link sensing is supported.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 14:57:06 -05:00
Jack Morgenstein ab9c17a009 mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet
1. Added module parameters sr_iov and probe_vf for controlling enablement of
   SRIOV mode.
2. Increased default max num-qps, num-mpts and log_num_macs to accomodate
   SRIOV mode
3. Added port_type_array as a module parameter to allow driver startup with
   ports configured as desired.
   In SRIOV mode, only ETH is supported, and this array is ignored; otherwise,
   for the case where the FW supports both port types (ETH and IB), the
   port_type_array parameter is used.
   By default, the port_type_array is set to configure both ports as IB.
4. When running in sriov mode, the master needs to initialize the ICM eq table
   to hold the eq's for itself and also for all the slaves.
5. mlx4_set_port_mask() now invoked from mlx4_init_hca, instead of in mlx4_dev_cap.
6. Introduced sriov VF (slave) device startup/teardown logic (mainly procedures
   mlx4_init_slave, mlx4_slave_exit, mlx4_slave_cap, mlx4_slave_exit and flow
   modifications in __mlx4_init_one, mlx4_init_hca, and mlx4_setup_hca).
   VFs obtain their startup information from the PF (master) device via the
   comm channel.
7. In SRIOV mode (both PF and VF), MSI_X must be enabled, or the driver
   aborts loading the device.
8. Do not allow setting port type via sysfs when running in SRIOV mode.
9. mlx4_get_ownership:  Currently, only one PF is supported by the driver.
   If the HCA is burned with FW which enables more than one PF, only one
   of the PFs is allowed to run.  The first one up grabs a FW ownership
   semaphone -- all other PFs will find that semaphore taken, and the
   driver will not allow them to run.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Liran Liss <liranl@mellanox.co.il>
Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:08 -05:00
Marcel Apfelbaum 2b8fb2867c mlx4_core: mtts resources units changed to offset
In the previous implementation mtts are managed by:
1. order     - log(mtt segments), 'mtt segment' groups several mtts together.
2. first_seg - segment location relative to mtt table.
In the current implementation:
1. order     - log(mtts) rather than segments
2. offset    - mtt index in mtt table

Note: The actual mtt allocation is made in segments but it is
      transparent to callers.

Rational: The mtt resource holders are not interested on how the allocation
          of mtt is done, but rather on how they will use it.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:07 -05:00
Eugenia Emantayev 0ec2c0f86d mlx4: Traffic steering management support for SRIOV
Let multicast/unicast attaching flow go through resource tracker.
The PF is the one responsible for managing all the steering entries.
Define and use module parameter that determines the number of qps
per multicast group.
Minor changes in function calls according to changed prototype.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:07 -05:00
Jack Morgenstein 65dab25deb mlx4: Extanding port_mask functionality
Port mask now has additional state.
Port can be set as "none". In this case neither the mlx4_en or mlx4_ib
drivers take ownership of the port.
In multifunction mode there is an option to set the vfs as single ported devices.
(in single function mode, both physical ports belong to same function)

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:05 -05:00
Linus Torvalds f470f8d4e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
  mlx4_core: Deprecate log_num_vlan module param
  IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
  IB/mlx4: Enable 4K mtu for IBoE
  RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
  RDMA/cxgb4: Serialize calls to CQ's comp_handler
  RDMA/cxgb3: Serialize calls to CQ's comp_handler
  IB/qib: Fix issue with link states and QSFP cables
  IB/mlx4: Configure extended active speeds
  mlx4_core: Add extended port capabilities support
  IB/qib: Hold links until tuning data is available
  IB/qib: Clean up checkpatch issue
  IB/qib: Remove s_lock around header validation
  IB/qib: Precompute timeout jiffies to optimize latency
  IB/qib: Use RCU for qpn lookup
  IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
  IB/qib: Decode path MTU optimization
  IB/qib: Optimize RC/UC code by IB operation
  IPoIB: Use the right function to do DMA unmap pages
  RDMA/cxgb4: Use correct QID in insert_recv_cqe()
  RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
  ...
2011-11-01 10:51:38 -07:00
Jeff Kirsher 5a2cc190eb mlx4: Move the Mellanox driver
Moves the Mellanox driver into drivers/net/ethernet/mellanox/ and
make the necessary Kconfig and Makefile changes.

CC: Roland Dreier <roland@kernel.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-08-11 02:41:35 -07:00