Commit Graph

106 Commits

Author SHA1 Message Date
Zhu Yanjun 8a18e911d0 IB: remove duplicate header files
In hfi.h, the header file opa_addr.h is included twice.
In vt.h, the header file mmap.h is included twice.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:46:03 -04:00
Kamenee Arumugam 0719007663 IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
HFI's counters SendWaitCnt and SendWaitVlCnt are in units
of TXE cycle time (at 805MHz). OPA counters PortXmitWait and
PortVLXmtWait are in units of flit times.
Convert the counter values to flit units using following
conversion formula:

PortXmitWait =
	SendWaitCnt * 2 * (4 /link_width) * (25 Gbps /link_speed)
PortVLXmitWait =
	SendWaitVLCnt * 2 * (4 /link_width) * (25 Gbps /link_speed)

At link up or downgrade events, the link width can change. To ensure
accurate counter calculations, sample the counters after the events,
during counter requests, and then aggregate the OPA counters.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:30 -07:00
Sebastian Sanchez ca85bb1ca9 IB/hfi1: Remove unnecessary fecn and becn fields
packet->fecn and packet->becn are calculated in the hot path
and are never used. Remove these fields as they show to be
costly in a profile. Also, remove initialization for
becn and fecn in process_ecn() as they're unconditionally
assigned in the function and ensure fecn and becn variables
use a boolean type.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:29 -07:00
Sebastian Sanchez 6d6b8848c8 IB/hfi1: Optimize packet type comparison using 9B and bypass code paths
The packet type comparison used to find out if a packet is a bypass
packet in the hot path is an expensive operation as seen in a profile.

Determine packet's pkey and migration bit through the bypass and 9B
code paths instead.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:28 -07:00
Michael J. Ruhl 82a9792656 IB/hfi1: Re-order IRQ cleanup to address driver cleanup race
The pci_request_irq() interfaces always adds the IRQF_SHARED bit to
all IRQ requests.

When the kernel is built with CONFIG_DEBUG_SHIRQ config flag, if the
IRQF_SHARED bit is set, a call to the IRQ handler is made from the
__free_irq() function. This is testing a race condition between the
IRQ cleanup and an IRQ racing the cleanup.  The HFI driver should be
able to handle this race, but does not.

This race can cause traces that start with this footprint:

BUG: unable to handle kernel NULL pointer dereference at   (null)
Call Trace:
 <hfi1 irq handler>
 ...
 __free_irq+0x1b3/0x2d0
 free_irq+0x35/0x70
 pci_free_irq+0x1c/0x30
 clean_up_interrupts+0x53/0xf0 [hfi1]
 hfi1_start_cleanup+0x122/0x190 [hfi1]
 postinit_cleanup+0x1d/0x280 [hfi1]
 remove_one+0x233/0x250 [hfi1]
 pci_device_remove+0x39/0xc0

Export IRQ cleanup function so it can be called from other modules.

Using the exported cleanup function:

  Re-order the driver cleanup code to clean up IRQ resources before
  other resources, eliminating the race.

  Re-order error path for init so that the race does not occur.

Reduce severity on spurious error message for SDMA IRQs to info.

Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Patel Jay P <jay.p.patel@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
Michael J. Ruhl 11f0e89710 IB/{hfi1, qib}: Fix a concurrency issue with device name in logging
The get_unit_name() function crafts a string based on the device name
and the device unit number.  It then stores this in a static variable.

This has concurrency issues as can be seen with this log:

hfi1 0000:02:00.0: hfi1_1: read_idle_message: read idle message 0x203
hfi1 0000:01:00.0: hfi1_1: read_idle_message: read idle message 0x203

The PCI device ID (0000:02:00.0 vs. 0000:01:00.0) is correct for the
message, but the device string hfi1_1 is incorrect (it should be
hfi1_0 for the second log message).

Remove get_unit_name() function.

Instead, use the rvt accessor rvt_get_ibdev_name() to get the IB name
string.

Clean up any hfi1_early_xx calls that can now use the new path.

QIB has the same (qib_get_unit_name()) issue.  Updating as necessary.

Remove qib_get_unit_name() function.

Update log message that has redundant device name.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:34:55 -05:00
Michael J. Ruhl 06f2597f75 IB/{rdmavt, hfi1, qib}: Remove get_card_name() downcall
rdmavt has a down call to client drivers to retrieve a crafted card
name.

This name should be the IB defined name.

Rather than craft the name each time it is needed, simply retrieve
the IB allocated name from the IB device.

Update the function name to reflect its application.

Clean up driver code to match this change.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:34:55 -05:00
Jason Gunthorpe 76a895d9e1 Merge branch 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
Patches for 4.16 that are dependent on patches sent to 4.15-rc.

These are small clean ups for the vmw_pvrdma and i40iw drivers.

* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
  RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
  RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
  RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
  RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
  RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
  i40iw: Change accelerated flag to bool
2017-12-27 21:50:46 -07:00
Don Hiatt 2e903b611b IB/hfi1: Change slid arg in ingress_pkey_table_fail to 32bit
Change the slid arg to ingress_pkey_table_fail() to a full
32Bits and do not convert to 16Bits in caller. This is so we
can keep everything 32bit in the kernel and only change to
16bit at the uapi boundary.

Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:46:11 -07:00
Michael J. Ruhl 4c009af473 IB/hfi: Only read capability registers if the capability exists
During driver init, various registers are saved to allow restoration
after an FLR or gen3 bump.  Some of these registers are not available
in some circumstances (i.e. Virtual machines).

This bug makes the driver unusable when the PCI device is passed into
a VM, it fails during probe.

Delete unnecessary register read/write, and only access register if
the capability exists.

Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: a618b7e40a ("IB/hfi1: Move saving PCI values to a separate function")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 10:42:08 -07:00
Jan Sokolowski 641f348bbd IB/hfi1: Allow MgmtAllowed on B2B setups
HFI's are hard-wired to send Device Info frames with
MgmtAllowed bit set to 0. This means in B2B setups,
MgmtAllowed would never be allowed, which prevents
remote opa management tools from working properly.

Assume MgmtAllowed if a neighbor is also an HFI.

Fixes: 98b9ee2002 ("IB/hfi1: Cache neighbor secure data after link up")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:53:56 -05:00
Mike Marciniszyn 1b311f8931 IB/hfi1: Add tx_opcode_stats like the opcode_stats
This patch adds tx_opcode_stats to parallel the
(rx)opcode_stats in the debugfs.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30 14:51:36 -04:00
Bart Van Assche e2fdbc2368 IB/hfi1: Define hfi1_handle_cnp_tbl[] once
Move the hfi1_handle_cnp_tbl[] from a header file to a .c file
such that only one copy ends up in the hfi1 kernel module. This
patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:06 -04:00
Michael J. Ruhl d7d626179f IB/hfi1: Fix incorrect available receive user context count
The addition of the VNIC contexts to num_rcv_contexts changes the
meaning of the sysfs value nctxts from available user contexts, to
user contexts + reserved VNIC contexts.

User applications that use nctxts are now broken.

Update the calculation so that VNIC contexts are used only if there are
hardware contexts available, and do not silently affect nctxts.

Update code to use the calculated VNIC context number.

Update the sysfs value nctxts to be available user contexts only.

Fixes: 2280740f01 ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Niranjana Vishwanathapura <Niranjana.Vishwanathapura@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <Stable@vger.kernel.org> #v4.12+
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-04 15:39:44 -04:00
Mike Marciniszyn e08aa59476 IB/hfi1: Fix output trace issues from 16B change
The 16B changes to the output side of the header trace introduced
two issues:

1. An uninitialized field "l4" for 9B packets

   This field needs to be given a value of 0 for 9B
   packets to insure a correct 9B trace.

   The fix adds a new define to insure that there is a dummy
   default for 9B packets to insure the correct string
   is decoded.

2. Use of entry vs. __entry in field references

Fixes: Commit 863cf89d47 ("IB/hfi1: Add 16B trace support")
Reported-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-04 15:39:44 -04:00
Michael J. Ruhl d59075ad1e IB/hfi1: Add a safe wrapper for _rcd_get_by_index
hfi1_rcd_get_by_index assumes that the given index is in the correct
range.  In most cases this is correct because the index is bounded by
a loop.  For these cases, adding a range check to the function is
redundant.

For the use case that is not bounded by the loop range, a _safe wrapper
function is needed to validate the index before accessing the rcd array.

Add a _safe wrapper to _get_by_index to validate the index range.

Update appropriate call sites with the new _safe function.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:34:13 -04:00
Jan Sokolowski 6fab2a88f7 IB/hfi1: Remove unused hfi1_cpulist variables
Following variables: hfi1_cpulist and hfi1_cpulist_count
are unused. Remove them.

Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:34:13 -04:00
Michael J. Ruhl 21e5acc064 IB/hfi1: Inline common calculation
Calculating the offset to a context is done several times throughout
the code.  Create a common inlined function for doing this
calculation.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:34:13 -04:00
Ira Weiny 156d24d700 IB/hfi1: Remove unused link_default variable
devdata->link_default is no longer variable

Maintain number of holes by moving dc_shutdown

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:34:13 -04:00
Michael J. Ruhl 05cb18fda9 IB/hfi1: Update HFI to use the latest PCI API
The HFI PCI IRQ code uses an obsolete PCI API.  Update the code to use
the new PCI IRQ API and any necessary changes because of the new API.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 11:34:13 -04:00
Grzegorz Morys de42de80d7 IB/hfi1: Ratelimit prints from sdma_interrupt
Ratelimit error prints from sdma_interrupt function
that could swarm dmesg otherwise.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Grzegorz Morys <grzegorz.morys@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:18 -04:00
Kaike Wan bf808b5039 IB/hfi1: Add kernel receive context info to debugfs
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:38 -04:00
Jakub Byczkowski d392a673e7 IB/hfi1: Remove pstate from hfi1_pportdata
Do not track physical state separately from host_link_state.
Deduce physical state from host_link_state when required.
Change cache_physical_state to log_physical_state to make
sure host_link_state reflects hardwares physical state properly.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:38 -04:00
Jakub Byczkowski 9161860463 IB/hfi1: Add flag for platform config scratch register read
Add flag in pport data structure to determine when platform config was
read from scratch registers. Change conditions in parse_platform_config
and get_platform_config_field to use the new flag.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt 566d53a826 IB/hfi1: Enhance PIO/SDMA send for 16B
PIO/SDMA send logic now uses the hdr_type field to determine
the type of packet that has been constructed. Based on the hdr_type,
certain things such as PBC flags, padding count and the LT extra
trailing bytes are determined.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt 5b6cabb0db IB/hfi1: Add 16B RC/UC support
Add 16B bypass packet support for RC/UC traffic types.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Dasaratharaman Chandramouli 51e658f5dd IB/rdmavt, hfi1, qib: Enhance rdmavt and hfi1 to use 32 bit lids
Increase lid used in hfi1 driver to 32 bits. qib continues
to use 16 bit lids.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt 863cf89d47 IB/hfi1: Add 16B trace support
Add trace support to 16B bypass packets during send and
receive.

Sample input header trace:
<idle>-0     [000] d.h. 271742.509477: input_ibhdr: [0000:05:00.0] (16B)
len:24 sc:0 dlid:0xf0000b slid:0x10002 age:0 becn:0 fecn:0 l4:10 rc:0
sc:0 pkey:0x8001 entropy:0x0000 op:0x65,UD_SEND_ONLY_WITH_IMMEDIATE se:0
m:1 pad:3 tver:0 qpn:0xffffff a:0 psn:0x00000001 hlen:248 deth qkey
0x01234567 sqpn 0x000004

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt 88733e3b84 IB/hfi1: Add 16B UD support
Add 16B bypass packet support for UD traffic types.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt d98bb7f7e6 IB/hfi1: Determine 9B/16B L2 header type based on Address handle
When address handle attributes are initialized, the LIDs are
transformed to be in the 32 bit LID space.
When constructing the header, hfi1 driver will look at the LID
to determine the packet header to be created.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt 5786adf3fd IB/hfi1: Add support to process 16B header errors
Enhance hdr_rcverr() to also handle errors during
16B bypass packet receive.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt 72c07e2b67 IB/hfi1: Add support to receive 16B bypass packets
We introduce a struct hfi1_16b_header to support 16B headers.
16B bypass packets are received by the driver and processed
similar to 9B packets. Add basic support to handle 16B packets.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Michael J. Ruhl d295dbeb2a IB/hf1: User context locking is inconsistent
There is a mixture of mutex and spinlocks to protect receive context
(rcd/uctxt) information.  This is not used consistently.

Use the mutex to protect device receive context information only.
Use the spinlock to protect sub context information only.

Protect access to items in the rcd array with a spinlock and
reference count.

Remove spinlock around dd->rcd array cleanup.  Since interrupts are
disabled and cleaned up before this point, this lock is not useful.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Michael J. Ruhl f2a3bc00a0 IB/hfi1: Protect context array set/clear with spinlock
The rcd array can be accessed from user context or during interrupts.
Protecting this with a mutex isn't a good idea because the mutex should
not be used from an IRQ.

Protect the allocation and freeing of rcd array elements with a
spinlock.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Bartlomiej Dudek 64a296f579 IB/hfi1: Use host_link_state to read state when DC is shut down
When DC is shut down (by e.g.  disconnecting the cable), the
driver should use host_link_state to get port's current
physical state. This is due to the fact that physical state
is read from DC's CSRs and when DC is shut down and state is
changed, its registers are not impacted.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Byczkowski, Jakub 02a222c7f6 IB/hfi1: Remove lstate from hfi1_pportdata
Do not track logical state separately from host_link_state. Deduce
logical state from host_link_state when required. Transitions in
set_link_state and goto_offline already make sure host_link_state
reflects hardware's logical state properly.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Sebastian Sanchez 626c077c02 IB/hfi1: Prevent link down request double queuing
When link interrupts occur, multiple link down requests
could be queued up when only one is needed. This could get
the hfi1 out of sync with its link partner during LNI.

Only allow one link down request to be queued at any one time.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Sebastian Sanchez 71d47008ca IB/hfi1: Create workqueue for link events
Currently, link down interrupts queue link entries
on a workqueue intended for sending events only.
Create a workqueue for queuing link events.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Bartlomiej Dudek a618b7e40a IB/hfi1: Move saving PCI values to a separate function
During PCIe initialization some registers' values from
PCI config space are saved in order to restore them later
(i.e. after reset). Restoring those value is done by a
function called restore_pci_variables, while saving them
is put directly into function hfi1_pcie_ddinit.
Move saving values to a separate function in the image
of restoring functionality.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Michael J. Ruhl e6f7622df1 IB/hfi1: Size rcd array index correctly and consistently
The array index for the rcd array is sized several different ways
throughout the code.

Use the user interface size (u16) as the standard size and update the
necessary code to reflect this.

u16 is large enough for the largest amount of supported contexts.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Michael J. Ruhl 91d970abe8 IB/hfi1: Remove unused user context data members
Several data members of the user context have become unused over time.
Cleaning them up.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:54 -04:00
Mike Marciniszyn cb51c5d2cd IB/hfi1: Fix bar0 mapping to use write combining
When the debugpat kernel boot flag is turned on the following
traces are printed:

[ 1884.793168] x86/PAT: Overlap at 0x90000000-0x92000000
[ 1884.803510] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track uncached-minus, req write-combining, ret uncached-minus
[ 1884.818167] hfi1 0000:05:00.0: hfi1_0: WC Remapped RcvArray:
ffffc9000a980000

The ioremap_wc() clearly is not returning a write combining mapping due
to an overlap where the RcvArray is mapped in a uncached mapping prior
to creating the proposed write combining mapping.

The patch replaces the single base register for uncached CSRs that
used to overlap the RcvArray with two mappings.   One, kregbase1, from the
bar0 up to the RcvArray and another, kregbase2, from the end of the
RcvArray to the pio send buffer space.  A new dd field, base2_start,
is used to convert the zero-based offset in the CSR routines to the
correct kregbase1/kregbase2 mapping.  A single direct write of the
RcvArray CSRs is replaced with hfi1_put_tid() to insure correct access
using the new disjoint mapping.

Additionally, the kregend field is deleted since it is only ever written.

patdebug now shows the RcvArray as write combining:
[   35.688990] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track write-combining, req write-combining, ret write-combining

To insulate from any potential issues with write combining, all
writeq are now flushed in hfi1_put_tid() and rcv_array_wc_fill().

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:54 -04:00
Bartlomiej Dudek c53df62c7a IB/hfi1: Check return values from PCI config API calls
Ensure that return values from kernel PCI config access
API calls in HFI driver are checked and react properly if
they are not expected (i.e. not successful).

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:01:36 -04:00
Michael J. Ruhl f683c80ca6 IB/hfi1: Resolve kernel panics by reference counting receive contexts
Base receive contexts can be used by sub contexts.  Because of this,
resources for the context cannot be completely freed until all sub
contexts are done using the base context.

Introduce a reference count so that the base receive context can be
freed only when all sub contexts are done with it.

Use the provided function call for setting default send context
integrity rather than the manual method.

The cleanup path does not set all variables back to NULL after freeing
resources.  Since the clean up code can get called more than once,
(e.g. during context close and on the error path), it is necessary to
make sure that all the variables are NULLed.

Possible crash are:

BUG: unable to handle kernel paging request at 0000000001908900
IP: read_csr+0x24/0x30 [hfi1]
RIP: 0010:read_csr+0x24/0x30 [hfi1]
Call Trace:
 sc_disable+0x40/0x110 [hfi1]
 hfi1_file_close+0x16f/0x360 [hfi1]
 __fput+0xe7/0x210
 ____fput+0xe/0x10

or

kernel BUG at mm/slub.c:3877!
RIP: 0010:kfree+0x14f/0x170
Call Trace:
 hfi1_free_ctxtdata+0x19a/0x2b0 [hfi1]
 ? hfi1_user_exp_rcv_grp_free+0x73/0x80 [hfi1]
 hfi1_file_close+0x20f/0x360 [hfi1]
 __fput+0xe7/0x210
 ____fput+0xe/0x10

Fixes: Commit 62239fc6e5 ("IB/hfi1: Clean up on context initialization failure")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:13 -04:00
Byczkowski, Jakub bec7c79cd8 IB/hfi1: Modify handling of physical link state by Host Driver
Ensure states returned to the Fabric Manager are consistent with
the OPA specification by caching the physical state along with the
logical state.

Reviewed-by: Stuart Summers <john.s.summers@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrzej Kotlowski <andrzej.kotlowski@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:12 -04:00
Michael J. Ruhl bb7dde8784 IB/hfi1: Replace deprecated pci functions with new API
pci_enable_msix_range() and pci_disable_msix() have been deprecated.
Updating to the new pci_alloc_irq_vectors() interface.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:58:11 -04:00
Don Hiatt 9039746cdf IB/hfi1: Setup common IB fields in hfi1_packet struct
We move many common IB fields into the hfi1_packet structure and
set them up in a single function. This allows us to set the fields
in a single place and not deal with them throughout the driver.

Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:56:33 -04:00
Don Hiatt 228d2af1b7 IB/hfi1: Separate input/output header tracing
Calls to trace incoming packets will now receive the packet
context as parameter. This enables trace support for future
packet types.

Header trace output is in the format <field>:<value>
which makes parsing easier.

input_ibhdr trace before change:
<idle>-0     [001] d.h.  5904.250925: input_ibhdr: [0000:05:00.0] vl 0
lver 0 sl 0 lnh 2,LRH_BTH dlid 0002 len 18 slid 0001 op
0x64,UD_SEND_ONLY se 0 m 0 pad 0 tver 0 pkey 0xffff f 0 b 0 qpn 0x000001
a 0 psn 0x000001b2 deth qkey 0x80010000 sqpn 0x000001

input_ibhdr trace after change:
<idle>-0     [001] d.h.  6655.714488: input_ibhdr: [0000:05:00.0] (IB)
len:124 sc:0 dlid:0x0001 slid:0x0002 lnh:2,LRH_BTH lver:0 sl:0  age:0
becn:0 fecn:0 l4:0 rc:0 entropy:0 op:0x64,UD_SEND_ONLY se:0 m:0 pad:0
tver:0 pkey:0x7fff f:0 b:0 qpn:0x000001 a:0 psn:0x00000036 hlen:8 deth
qkey:0x80010000 sqpn:0x000001

Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27 16:56:33 -04:00
Byczkowski, Jakub b3e6b4bdbb RDMA/hfi1: Defer setting VL15 credits to link-up interrupt
Keep VL15 credits at 0 during LNI, before link-up. Store
VL15 credits value during verify cap interrupt and set
in after link-up. This addresses an issue where VL15 MAD
packets could be sent by one side of the link before
the other side is ready to receive them.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-01 17:04:20 -04:00
Linus Torvalds 3341713c67 Updates #2 for 4.12 kernel merge window
- mlx5/IPoIB fixup patch
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZC7idAAoJELgmozMOVy/d8bEP/31hq3Vr1unNJYjU8yEaWI0l
 D/k3ZSJcnLaeoUE5Ts69pYVkncAM/g55bmS80bKyNONRJp0BMU0MxmmzQC/iMqMM
 LAvDf3Qyv8rTrEgjjmjWVQrP/v7K0+teD6YMmTkfB8fWqhi/vR+84Sv3c50GUXpL
 OZMrn6p/IXptWa24kvi5/mKNSmeOBQkp/1lRxyi6bhAcFhjwQZoO4tDkOL6Aj1oA
 FFeUcBDwMsOjbqhH59sM9ryOnGXPEwJUnLOPfq36pIrFzyUtbbcmFfMmefjJWMLK
 XW4VBXywYLzkCvr2tGeXMukJzroUPQhD6G/4uwnFlwad0tovlBCwPmOmXHbnhRbh
 JmriZaQEGj43o3yH/HHSeRvbwJ7e71xbH0tW21K9cB/CuYTixwlvYAMd218qXQ+k
 Q6qBRueWlgsaDatlkuRV0ezrILvGQhn6QgEbCUIluDjKp/d6aRtI4IM60Y5G02U/
 WLvnMQJ/MaGQXPVR1gWqWpy1ojvj/mAeTQf6aKw1Icjd3oSPgAI06B/uThXoWxh/
 FyvDl/+HNW2StxPudQPKWkMjamaD/cToSa4HjLjdPER9HiCHmUru6nRZOufhwtwh
 q2NEPx5kraRQNEMqjpopzNIWPpGBNBm+htkqXmTRpDdfvKgQ29vL49GCcV1VHtj9
 TpYkqu/6ZVPItAAfyklh
 =Ihe9
 -----END PGP SIGNATURE-----
mergetag object 67cf3623e0
 type commit
 tag for-next
 tagger Doug Ledford <dledford@redhat.com> 1493940800 -0400
 
 Updates #3 for 4.12 kernel merge window
 
 - The hfi1 15 patch set that landed late
 - IPoIB get_link_ksettings which landed late because I asked for a
   respin
 - One late rxe change
 - One -rc worthy fix that's in early
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZC7q5AAoJELgmozMOVy/dungQAKpWG7sq+23PhsXDTOGtKSRg
 6jrgt5lG+V4twAJogXazxYsanPLQOE6ItKUFaBqEh2fULGyzXyPv1PsLNAVi4MYR
 6vbsMh2qCW8+dy+uEBveirQzdwdJr8U7g1mvJYPDr1Rmaw852Sq+PbId3nEy/cFK
 boMn/zhSSJuF8VsTfFCHx1gVdRN7n7gclqBP7Phq4bzdZ8E0u5GIPjmPDePUTpwH
 r3yFSC3VV5JGGgOSUc66oPPoIgraAG9EQezn01Llk5n+HPJ3eM91spV1Y9tZ7EgA
 toNW+bxEmanFYtuR+qzd0ctozUt2Ke6AUdTgcZcSCFkIan2ZOssJ0y2UN3CsNxlQ
 rYcOBe5KBNcFVb5E3NoeG+iHo9jT86fBqzYheHKmQJy38xalAlOoX6mLOvn4pPzo
 DFYDd6PpEU9SQSmhpPj1UbhvCOzecBaV3cRF/TihnqlOrRiKtd/YCccqT5GguPCB
 RVW6h1qQ94rmr/k6gobT7lNDWn2bZAgUupUD19Nw+7YwauxjGhSFsXW357NVcs4B
 3a/uLNVvnoyIjm62Sm8YFz9Bz3pRutfP9b6pSJ5V4P7dBgXYFZU35buViwcAj5fL
 DNGx4a8Poi3VQR8lQjpK6MRCWAibUY27zW1nbWQrQ9TdcHhK4sJC9Mqe41W4FXpU
 qQLkYBChz+mkCzE44bN/
 =H9f5
 -----END PGP SIGNATURE-----

Merge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "As mentioned in my first pull request, this is the subsequent pull
  requests I had. This is all I have, and in fact this cleans out the
  RDMA subsystem's entire patchworks queue of kernel changes that are
  ready to go (well, it did for the weekend anyway, a few new patches
  are in, but they'll be coming during the -rc cycle).

  The first tag contains a single patch that would have conflicted if
  taken from my tree or DaveM's tree as it needed our trees merged to
  come cleanly.

  The second tag contains the patch series from Intel plus three other
  stragllers that came in late last week. I took them because it allowed
  me to legitimately claim that the RDMA patchworks queue was, for a
  short time, 100% cleared of all waiting kernel patches, woohoo! :-).

  I have it under my for-next tag, so it did get 0day and linux- next
  over the end of last week, and linux-next did show one minor conflict.

  Summary:

  'for-linus' tag:
   - mlx5/IPoIB fixup patch

  'for-next' tag:
   - the hfi1 15 patch set that landed late
   - IPoIB get_link_ksettings which landed late because I asked for a
     respin
   - one late rxe change
   - one -rc worthy fix that's in early"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx5: Enable IPoIB acceleration

* tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  rxe: expose num_possible_cpus() cnum_comp_vectors
  IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
  IB/hfi1: Clean up on context initialization failure
  IB/hfi1: Fix an assign/ordering issue with shared context IDs
  IB/hfi1: Clean up context initialization
  IB/hfi1: Correctly clear the pkey
  IB/hfi1: Search shared contexts on the opened device, not all devices
  IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
  IB/hfi1: Use filedata rather than filepointer
  IB/hfi1: Name function prototype parameters
  IB/hfi1: Fix a subcontext memory leak
  IB/hfi1: Return an error on memory allocation failure
  IB/hfi1: Adjust default eager_buffer_size to 8MB
  IB/hfi1: Get rid of divide when setting the tx request header
  IB/hfi1: Fix yield logic in send engine
  IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
  IB/hfi1: Fix checks for Offline transient state
  IB/ipoib: add get_link_ksettings in ethtool
2017-05-08 20:07:29 -07:00