When bringing the port up, performing a SENSE_PORT command
To try and check to which physical link type (IB or Ethernet) the physical
port is connected.
In case there is no valid link partner, the port will come up as its
supported default.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
route: Take the right src and dst addresses in ip_route_newports
ipv4: Fix nexthop caching wrt. scoping.
ipv4: Invalidate nexthop cache nh_saddr more correctly.
net: fix pch_gbe section mismatch warning
ipv4: fix fib metrics
mlx4_en: Removing HW info from ethtool -i report.
net_sched: fix THROTTLED/RUNNING race
drivers/net/a2065.c: Convert release_resource to release_region/release_mem_region
drivers/net/ariadne.c: Convert release_resource to release_region/release_mem_region
bonding: fix rx_handler locking
myri10ge: fix rmmod crash
mlx4_en: updated driver version to 1.5.4.1
mlx4_en: Using blue flame support
mlx4_core: reserve UARs for userspace consumers
mlx4_core: maintain available field in bitmap allocator
mlx4: Add blue flame support for kernel consumers
mlx4_en: Enabling new steering
mlx4: Add support for promiscuous mode in the new steering model.
mlx4: generalization of multicast steering.
mlx4_en: Reporting HW revision in ethtool -i
...
Using blue flame can improve latency by allowing the HW to more efficiently
access the WQE. This patch presents two functions that are used to allocate or
release HW resources for using blue flame; the caller need to supply a struct
mlx4_bf object when allocating resources. Consumers that make use of this API
should post doorbells to the UAR object pointed by the initialized struct
mlx4_bf;
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
For Ethernet mode only,
When we want to register QP as promiscuous, it must be added to all the
existing steering entries and also to the default one.
The promiscuous QP might also be on of "real" QPs,
which means we need to monitor every entry to avoid duplicates and ensure
we close an entry when all it has is promiscuous QPs.
Same mechanism both for unicast and multicast.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The same packet steering mechanism would be used both for IB and Ethernet,
Both multicasts and unicasts.
This commit prepares the general infrastructure for this.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
HW revision is derived from device ID and rev id.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver queries the FW for WOL support.
Ethtool get/set_wol is implemented accordingly.
Only magic packets are supported at the time.
Signed-off-by: Igor Yarovinsky <igory@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding a pool of MSI-X vectors and EQs that can be used explicitly by mlx4_core
customers (mlx4_ib, mlx4_en). The consumers will assign their own names to the
interrupt vectors. Those vectors are not opened at mlx4 device initialization,
opened by demand.
Changed the max number of possible EQs according to the new scheme, no longer relies on
on number of cores.
The new functionality is exposed through mlx4_assign_eq() and mlx4_release_eq().
Customers that do not use the new API will get completion vectors as before.
Signed-off-by: Markuze Alex <markuze@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default, each device is assumed to be able only handle 64 KB chunks
during DMA. By giving the segment size a larger value, the block layer
will coalesce more S/G entries together for SRP, allowing larger
requests with the same sg_tablesize setting. The block layer is the
only direct user of it, though a few IOMMU drivers reference it as
well for their *_map_sg coalescing code. pci-gart_64 on x86, and a
smattering on on sparc, powerpc, and ia64.
Since other IB protocols could potentially see larger segments with
this, let's check those:
- iSER is fine, because you limit your maximum request size to 512
KB, so we'll never overrun the page vector in struct iser_page_vec
(128 entries currently). It is independent of the DMA segment size,
and handles multi-page segments already.
- IPoIB is fine, as it maps each page individually, and doesn't use
ib_dma_map_sg().
- RDS appears to do the right thing and has no dependencies on DMA
segment size, but I don't claim to have done a complete audit.
- NFSoRDMA and 9p are OK -- they do not use ib_dma_map_sg(), so they
doesn't care about the coalescing.
- Lustre's ko2iblnd does not care about coalescing -- it properly
walks the returned sg list.
This patch ups the value on Mellanox hardware to 1 GB, which matches
reported firmware limits on mlx4.
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Some systems have PCI addresses that don't fit in unsigned long (eg some
32-bit PowerPC 440 systems have 36-bit bus addresses). Fix up mlx4 drivers
by using phys_addr_t where appropriate, so we don't truncate any PCI
resource addresses before ioremapping them.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (63 commits)
IB/qib: clean up properly if pci_set_consistent_dma_mask() fails
IB/qib: Allow driver to load if PCIe AER fails
IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set
IB/qib: Fix extra log level in qib_early_err()
RDMA/cxgb4: Remove unnecessary KERN_<level> use
RDMA/cxgb3: Remove unnecessary KERN_<level> use
IB/core: Add link layer type information to sysfs
IB/mlx4: Add VLAN support for IBoE
IB/core: Add VLAN support for IBoE
IB/mlx4: Add support for IBoE
mlx4_en: Change multicast promiscuous mode to support IBoE
mlx4_core: Update data structures and constants for IBoE
mlx4_core: Allow protocol drivers to find corresponding interfaces
IB/uverbs: Return link layer type to userspace for query port operation
IB/srp: Sync buffer before posting send
IB/srp: Use list_first_entry()
IB/srp: Reduce number of BUSY conditions
IB/srp: Eliminate two forward declarations
IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144
IB: Replace EXTRA_CFLAGS with ccflags-y
...
As there are now machines with a lot more physical memory, we need to
be able to register more memory. This patch lifts the upper limit of
log_mtts_per_seg from 5 to 7, increasing the amount of memory that can
be registered.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Adding capability for RSS for UDP traffic, hashing is done based on
IP addresses and UDP port number.
The support depends on HW/FW capabilities.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The report now based on query from FW, giving the correct tranciever type
and link speed.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
The selftest includes 5 features:
1. Interrupt test: Executing commands and receiving command completion
on all our interrupt vectors.
2. Link test: Verifying we are connected to valid link partner.
3. Speed test: Check that we negotiated link speed correctly.
4. Registers test: Activate HW health check command.
5. Loopback test: Send a packet on loopback interface and catch it on RX side.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove near duplication of format string constants by using the newly
introduced vsprintf extention %pV to reduce text by 20k or so.
$ size drivers/net/mlx4/built-in.o*
text data bss dec hex filename
161367 1866 48784 212017 33c31 drivers/net/mlx4/built-in.o
142621 1866 46248 190735 2e90f drivers/net/mlx4/built-in.o.new
Use printk_once as appropriate.
Convert printks to pr_<level>, some bare printks now use pr_cont.
Remove now unused #define PFX.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
This fixes a sysfs lockdep warning in the mlx4 code.
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use DEFINE_PCI_DEVICE_TABLE() so we get place PCI ids table into correct section
in every case.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If mlx4_init_port_info() fails, cleanup the initialized ports only.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add MT26438 (ConnectX EN 40GigE PCIe 2.0 5GT/s) to the list of supported
devices.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
On the error path of mlx4_init_hca(), mlx4_close_hca() is called,
followed by mlx4_free_icms() and mlx4_UNMAP_FA(). But both those
functions are also called from mlx4_close_hca(), which leads to a
double free.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current implementation allocates a single host page for EQ context
memory, which was OK when we only allocated a few EQs. However, since
we now allocate an EQ for each CPU core, this patch removes the
hard-coded limit (which we exceed with 4 KB pages and 128 byte EQ
context entries with 32 CPUs) and uses the same ICM table code as all
other context tables, which ends up simplifying the code quite a bit
while fixing the problem.
This problem was actually hit in practice on a dual-socket Nehalem box
with 16 real hardware threads and sufficiently odd ACPI tables that it
shows on boot
SMP: Allowing 32 CPUs, 16 hotplug CPUs
so num_possible_cpus() ends up 32, and mlx4 ends up creating 33 MSI-X
interrupts and 33 EQs. This mlx4 bug means that mlx4 can't even
initialize at all on this quite mainstream system.
Cc: <stable@kernel.org>
Reported-by: Eli Cohen <eli@mellanox.co.il>
Tested-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The old code used two calls to pci_request_region() to get the two BARs
for the mlx4 device, for no particularly good reason. Clean up the code
a little by converting this to a single call to pci_request_regions().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
MT26468 (PCI ID 0x6764) devices can expose multiple physical
functions. The current driver only handles the primary physical
function. For other functions, the QUERY_FW firmware command will
fail with the CMD_STAT_MULTI_FUNC_REQ error code. Don't try to drive
such devices, but print a message saying the driver is skipping those
devices rather than just "QUERY_FW command failed."
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
[ Rather than keeping unsupported devices bound to the driver, simply
print a more informative error message and exit - Roland ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current MTT allocator uses kmalloc() to allocate a buffer for its
buddy allocator, and thus is limited in the amount of MTT segments
that it can control. As a result, the size of memory that can be
registered is limited too. This patch uses a module parameter to
control the number of MTT entries that each segment represents,
allowing more memory to be registered with the same number of
segments.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When the msi_x option is enabled but pci_enable_msix() fails (not
enough vectors are available etc), the entries array was not freed on
the error path.
Signed-off-by: Nicolas Morey-Chaisemartin <nicolas.morey-chaisemartin@ext.bull.net>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a port's link is down (except to driver restart) and the port is
configured for auto sensing, we try to sense port link type (Ethernet
or InfiniBand) in order to determine how to initialize the port. If
the port type needs to be changed, all mlx4 for the device interfaces
are unregistered and then registered again with the new port
types. Sensing is done with intervals of 3 seconds.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Recent cpumask changes changed num_possible_cpus() from returning an int
to returning an unsigned int. This means that doing
min(num_possible_cpus(), <int expression>)
now produces a warning like
drivers/net/mlx4/main.c: In function 'mlx4_enable_msi_x':
drivers/net/mlx4/main.c:915: warning: comparison of distinct pointer types lacks a cast
Fix this by using min_t(int, ...).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When using MSI-X mode, create a completion event queue for each CPU.
Report the number of completion EQs in a new struct mlx4_caps member,
num_comp_vectors, and extend the mlx4_cq_alloc() interface with a
vector parameter so that consumers can specify which completion EQ
should be used to report events for the CQ being created.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 7ff93f8b ("mlx4_core: Multiple port type support") introduced
support for different port types. As part of that support, SET_PORT
is invoked to set the port type during driver startup. However, as a
side-effect, for IB ports the invocation of this command also sets the
port's capability mask to zero (losing the default value set by FW).
To fix this, get the default ib port capabilities (via a MAD_IFC Port
Info query) during driver startup, and save them for use in the
mlx4_SET_PORT command when setting the port-type to Infiniband.
This patch fixes problems with subnet manager (SM) failover such as
<https://bugs.openfabrics.org/show_bug.cgi?id=1183>, which occurred
because the IsTrapSupported bit in the capability mask was zeroed.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Multi-protocol adapters support different port types. Each consumer
of mlx4_core queries for supported port types; in particular mlx4_ib
can no longer assume that all physical ports belong to it. Port type
is configured through a sysfs interface. When the type of a port is
changed, all mlx4 interfaces are unregistered, and then registered
again with the new port types.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for managing MAC and VLAN filters for each port.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Get maximum ethernet MTU and default MAC address from the firmware
QUERY_DEV_CAP command.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
For ethernet support, we need to reserve QPs for the ethernet and
fibre channel driver. The QPs are reserved at the end of the QP
table. (This way we assure that they are aligned to their size)
We need to consider these reserved ranges in bitmap creation, so we
extend the mlx4 bitmap utility functions to allow reserved ranges at
both the bottom and the top of the range.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Update existing Mellanox copyright lines to 2008, and add such lines
to files where they are missing.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for the following operations to mlx4 when device firmware
supports them:
- Send with invalidate and local invalidate send queue work requests;
- Allocate/free fast register MRs;
- Allocate/free fast register MR page lists;
- Fast register MR send queue work requests;
- Local DMA L_Key.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There was a bug in some versions of the mlx4 driver in
mlx4_alloc_fmr(), which hardcoded the minimum acceptable page_shift to
be 12. However, new ConnectX firmware can support a minimum
page_shift of 9 (log_pg_sz of 9 returned by QUERY_DEV_LIM) -- so with
old drivers, ib_fmr_alloc() would fail for ULPs using the device
minimum when creating FMRs.
To preserve firmware compatibility with released mlx4 drivers, the
firmware will continue to return 12 as before for log_page_sz in
QUERY_DEV_CAP for these drivers. However, to enable new drivers to
take advantage of the available smaller page size, the mlx4 driver now
first sets the log_pg_sz to the device minimum by setting a
log_page_sz value to 0 via the MOD_STAT_CFG command and then reading
the real minimum via QUERY_DEV_CAP.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In addition to mlx4_ib, there will be ethernet and FC consumers of
mlx4_core, so move the code for managing kernel doorbells into the
core module to avoid having to duplicate this multiple times.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4 hardware does not support external DDR memory. Moreover, UAR
area (BAR 2) can change depending on FW version.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With the advent large clusters which utilize multicore hosts, 64K QPs
is not enough. We should increase the default maximum for QPs to 128K.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>