Commit Graph

5026 Commits

Author SHA1 Message Date
Bart Van Assche 8f5ba10ed4 IB core: Fix ib_sg_to_pages()
On 12/03/2015 01:18 AM, Christoph Hellwig wrote:
> The patch looks good to me, but while we touch this area, how about
> throwing in a few cosmetic fixes as well?

How about the patch below ? In that version of the ib_sg_to_pages() fix
these concerns have been addressed and additionally to more bugs have been fixed.

------------

[PATCH] IB core: Fix ib_sg_to_pages()

Fix the code for detecting gaps. A gap occurs not only if the
second or later scatterlist element is not aligned but also if
any scatterlist element other than the last does not end at a
page boundary.

In the code for coalescing contiguous elements, ensure that
mr->length is correct and that last_page_addr is up-to-date.

Ensure that this function returns a negative
error code instead of zero if the first set_page() call fails.

Fixes: commit 4c67e2bfc8 ("IB/core: Introduce new fast registration API")
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:12 -05:00
Bart Van Assche 57b0be9c0f IB/srp: Fix srp_map_sg_fr()
After dma_map_sg() has been called the return value of that function
must be used as the number of elements in the scatterlist instead of
scsi_sg_count().

Fixes: commit f7f7aab1a5 ("IB/srp: Convert to new registration API")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org> # v4.4+
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Bart Van Assche a745f4f410 IB/srp: Fix indirect data buffer rkey endianness
Detected by sparse.

Fixes: commit 330179f2fa ("IB/srp: Register the indirect data buffer descriptor")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org> # v4.3+
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Christoph Hellwig fc925518aa IB/srp: Initialize dma_length in srp_map_idb
Without this sg_dma_len will return 0 on architectures tha have
the dma_length field.

Fixes: commit f7f7aab1a5 ("IB/srp: Convert to new registration API")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Sagi Grimberg 09c0c0bea5 IB/srp: Fix possible send queue overflow
When using work request based memory registration (fast_reg)
we must reserve SQ entries for registration and invalidation
in addition to send operations. Each IO consumes 3 SQ entries
(registration, send, invalidation) so we need to allocate 3x
larger send-queue instead of 2x.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Bart Van Assche 4d59ad2995 IB/srp: Fix a memory leak
If srp_connect_ch() returns a positive value then that is considered
by its caller as a connection failure but this does not result in a
scsi_host_put() call and additionally causes the srp_create_target()
function to return a positive value while it should return a negative
value. Avoid all this confusion and additionally fix a memory leak by
ensuring that srp_connect_ch() always returns a value that is <= 0.
This patch avoids that a rejected login triggers the following memory
leak:

unreferenced object 0xffff88021b24a220 (size 8):
  comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s)
  hex dump (first 8 bytes):
    68 6f 73 74 35 38 00 a5                          host58..
  backtrace:
    [<ffffffff8151014a>] kmemleak_alloc+0x7a/0xc0
    [<ffffffff81165c1e>] __kmalloc_track_caller+0xfe/0x160
    [<ffffffff81260d2b>] kvasprintf+0x5b/0x90
    [<ffffffff81260e2d>] kvasprintf_const+0x8d/0xb0
    [<ffffffff81254b0c>] kobject_set_name_vargs+0x3c/0xa0
    [<ffffffff81337e3c>] dev_set_name+0x3c/0x40
    [<ffffffff81355757>] scsi_host_alloc+0x327/0x4b0
    [<ffffffffa03edc8e>] srp_create_target+0x4e/0x8a0 [ib_srp]
    [<ffffffff8133778b>] dev_attr_store+0x1b/0x20
    [<ffffffff811f27fa>] sysfs_kf_write+0x4a/0x60
    [<ffffffff811f1e8e>] kernfs_fop_write+0x14e/0x180
    [<ffffffff81176eef>] __vfs_write+0x2f/0xf0
    [<ffffffff811771e4>] vfs_write+0xa4/0x100
    [<ffffffff81177c64>] SyS_write+0x54/0xc0
    [<ffffffff8151b257>] entry_SYSCALL_64_fastpath+0x12/0x6f

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Kaike Wan 3ebd2fd0d0 IB/sa: Put netlink request into the request list before sending
It was found by Saurabh Sengar that the netlink code tried to allocate
memory with GFP_KERNEL while holding a spinlock. While it is possible
to fix the issue by replacing GFP_KERNEL with GFP_ATOMIC, it is better
to get rid of the spinlock while sending the packet. However, in order
to protect against a race condition that a quick response may be received
before the request is put on the request list, we need to put the request
on the list first.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reported-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 16:43:01 -05:00
Arnd Bergmann 2c63d1072a IB/iser: use sector_div instead of do_div
do_div is the wrong way to divide a sector_t, as it is less
efficient when sector_t is 32-bit wide. With the upcoming
do_div optimizations, the kernel starts warning about this:

drivers/infiniband/ulp/iser/iser_verbs.c:1296:4: note: in expansion of macro 'do_div'
include/asm-generic/div64.h:224:22: warning: passing argument 1 of '__div64_32' from incompatible pointer type

This changes the code to use sector_div instead, which always
produces optimal code.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 16:42:33 -05:00
Mike Marciniszyn d144da8c6f IB/core: use RCU for uverbs id lookup
The current implementation gets a spin_lock, and at any scale with
qib and hfi1 post send, the lock contention grows exponentially
with the number of QPs.

idr_find() is RCU compatibile, so read doesn't need the lock.

Change to use rcu_read_lock() and rcu_read_unlock() in
__idr_get_uobj().

kfree_rcu() is used to insure a grace period between the
idr removal and actual free.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 16:39:26 -05:00
Easwar Hariharan 57ab251213 IB/qib: Minor fixes to qib per SFF 8636
Minor errors found via code inspection during future development.
SFF 8636 defines bit position 2 to hold the status indication of
QSFP memory paging. The mask used to test for the value was
incorrect and is fixed in this patch. Additionally, the dump
function had a mismatch between the field being printed out and
the field used to source the data which was fixed.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reported-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 16:36:00 -05:00
Mike Marciniszyn 1d784b890c IB/core: Fix user mode post wr corruption
Commit e622f2f4ad ("IB: split struct ib_send_wr")
introduced a regression for HCAs whose user mode post
sends go through ib_uverbs_post_send().

The code didn't account for the fact that the first sge is
offset by an operation dependent length.  The allocation did,
but the pointer to the destination sge list is computed without
that knowledge.  The sge list copy_from_user() then corrupts
fields in the work request

Store the operation dependent length in a local variable and
compute the sge list copy_from_user() destination using that length.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 16:22:14 -05:00
Ira Weiny 785f742223 IB/qib: Fix qib_mr structure
struct qib_mr requires the mr member be the last because struct
qib_mregion contains a dynamic array at the end.  The additions
of members should have been placed before this structure as the
comment noted.

Failure to do so was causing random memory corruption.  Reproducing
this bug was easy to do by running the client and server of
ib_write_bw -s 8 -n 5 on the same node.

This BUG() was tripped in a slab debug kernel:

kernel BUG at mm/slab.c:2572!

Fixes: 38071a461f ("IB/qib: Support the new memory registration API")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 16:22:14 -05:00
Or Gerlitz f1b4e12a9a IB/mlx4: Use the VF base-port when demuxing mad from wire
Under HA mode, it's possible that the VF registered its GID
(and expects to get mads through the PV scheme) on a port which is
different from the one this mad arrived on, due to HA fail over.

Therefore, if the gid is not matched on the port that the packet arrived
on, check for a match on the other port if HA mode is active -- and if a
match is found on the other port, continue processing the mad using that
other port.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-06 22:40:45 -05:00
Linus Torvalds d83763f4a6 SCSI misc on 20151113
Sorry for the delay in this patch which was mostly caused by getting the
 merger of the mpt2/mpt3sas driver, which was seen as an essential item of
 maintenance work to do before the drivers diverge too much.  Unfortunately,
 this caused a compile failure (detected by linux-next), which then had to be
 fixed up and incubated.  In addition to the mpt2/3sas rework, there are
 updates from pm80xx, lpfc, bnx2fc, hpsa, ipr, aacraid, megaraid_sas, storvsc
 and ufs plus an assortment of changes including some year 2038 issues, a fix
 for a remove before detach issue in some drivers and a couple of other minor
 issues.
 
 Signed-off-by: James Bottomley <JBottomley@Odin.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJWRnpiAAoJEDeqqVYsXL0MJd0IAIkNP1q/6ksMw/lam2UlbdxN
 zFsFOIhGM3xLnBFehbCx7JW3cmybA7WC5jhMjaa1OeJmYHLpwbTzvVwrLs2l/0y1
 /S+G0wwxROZfIKj/1EO3oKbSPCu9N3lStxOnmNXt5PSUEzAXrTqSNTtnMf2ZGh7j
 bQxTWEJe+66GckgGw4ozTXJHWXqM/Zs/FsYjn2h/WzFhFv8utr7zRSzHMVjcqpLG
 TK/Lt03hiTGqiignKrV9W6JzdGTWf2LGIsj/njgR0dxv59cNH8PrHIjZyknSBxgn
 lblinsCjxDHWnP4BSfTi9MQG1lEiZiWO3Y6TKkKJTgxZ9M0Eitspc+cLOiJ1mpg=
 =HvQf
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull final round of SCSI updates from James Bottomley:
 "Sorry for the delay in this patch which was mostly caused by getting
  the merger of the mpt2/mpt3sas driver, which was seen as an essential
  item of maintenance work to do before the drivers diverge too much.
  Unfortunately, this caused a compile failure (detected by linux-next),
  which then had to be fixed up and incubated.

  In addition to the mpt2/3sas rework, there are updates from pm80xx,
  lpfc, bnx2fc, hpsa, ipr, aacraid, megaraid_sas, storvsc and ufs plus
  an assortment of changes including some year 2038 issues, a fix for a
  remove before detach issue in some drivers and a couple of other minor
  issues"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits)
  mpt3sas: fix inline markers on non inline function declarations
  sd: Clear PS bit before Mode Select.
  ibmvscsi: set max_lun to 32
  ibmvscsi: display default value for max_id, max_lun and max_channel.
  mptfusion: don't allow negative bytes in kbuf_alloc_2_sgl()
  scsi: pmcraid: replace struct timeval with ktime_get_real_seconds()
  mvumi: 64bit value for seconds_since1970
  be2iscsi: Fix bogus WARN_ON length check
  scsi_scan: don't dump trace when scsi_prep_async_scan() is called twice
  mpt3sas: Bump mpt3sas driver version to 09.102.00.00
  mpt3sas: Single driver module which supports both SAS 2.0 & SAS 3.0 HBAs
  mpt2sas, mpt3sas: Update the driver versions
  mpt3sas: setpci reset kernel oops fix
  mpt3sas: Added OEM Gen2 PnP ID branding names
  mpt3sas: Refcount fw_events and fix unsafe list usage
  mpt3sas: Refcount sas_device objects and fix unsafe list usage
  mpt3sas: sysfs attribute to report Backup Rail Monitor Status
  mpt3sas: Ported WarpDrive product SSS6200 support
  mpt3sas: fix for driver fails EEH, recovery from injected pci bus error
  mpt3sas: Manage MSI-X vectors according to HBA device type
  ...
2015-11-13 20:35:54 -08:00
Linus Torvalds 9aa3d651a9 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "This series contains HCH's changes to absorb configfs attribute
  ->show() + ->store() function pointer usage from it's original
  tree-wide consumers, into common configfs code.

  It includes usb-gadget, target w/ drivers, netconsole and ocfs2
  changes to realize the improved simplicity, that now renders the
  original include/target/configfs_macros.h CPP magic for fabric drivers
  and others, unnecessary and obsolete.

  And with common code in place, new configfs attributes can be added
  easier than ever before.

  Note, there are further improvements in-flight from other folks for
  v4.5 code in configfs land, plus number of target fixes for post -rc1
  code"

In the meantime, a new user of the now-removed old configfs API came in
through the char/misc tree in commit 7bd1d4093c ("stm class: Introduce
an abstraction for System Trace Module devices").

This merge resolution comes from Alexander Shishkin, who updated his stm
class tracing abstraction to account for the removal of the old
show_attribute and store_attribute methods in commit 517982229f
("configfs: remove old API") from this pull.  As Alexander says about
that patch:

 "There's no need to keep an extra wrapper structure per item and the
  awkward show_attribute/store_attribute item ops are no longer needed.

  This patch converts policy code to the new api, all the while making
  the code quite a bit smaller and easier on the eyes.

  Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>"

That patch was folded into the merge so that the tree should be fully
bisectable.

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits)
  configfs: remove old API
  ocfs2/cluster: use per-attribute show and store methods
  ocfs2/cluster: move locking into attribute store methods
  netconsole: use per-attribute show and store methods
  target: use per-attribute show and store methods
  spear13xx_pcie_gadget: use per-attribute show and store methods
  dlm: use per-attribute show and store methods
  usb-gadget/f_serial: use per-attribute show and store methods
  usb-gadget/f_phonet: use per-attribute show and store methods
  usb-gadget/f_obex: use per-attribute show and store methods
  usb-gadget/f_uac2: use per-attribute show and store methods
  usb-gadget/f_uac1: use per-attribute show and store methods
  usb-gadget/f_mass_storage: use per-attribute show and store methods
  usb-gadget/f_sourcesink: use per-attribute show and store methods
  usb-gadget/f_printer: use per-attribute show and store methods
  usb-gadget/f_midi: use per-attribute show and store methods
  usb-gadget/f_loopback: use per-attribute show and store methods
  usb-gadget/ether: use per-attribute show and store methods
  usb-gadget/f_acm: use per-attribute show and store methods
  usb-gadget/f_hid: use per-attribute show and store methods
  ...
2015-11-13 20:04:17 -08:00
Christoph Hellwig 64d513ac31 scsi: use host wide tags by default
This patch changes the !blk-mq path to the same defaults as the blk-mq
I/O path by always enabling block tagging, and always using host wide
tags.  We've had blk-mq available for a few releases so bugs with
this mode should have been ironed out, and this ensures we get better
coverage of over tagging setup over different configs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-11-09 17:11:57 -08:00
Linus Torvalds ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Linus Torvalds ab9f2faf8f Initial 4.4 merge window submission
- "Checksum offload support in user space" enablement
 - Misc cxgb4 fixes, add T6 support
 - Misc usnic fixes
 - 32 bit build warning fixes
 - Misc ocrdma fixes
 - Multicast loopback prevention extension
 - Extend the GID cache to store and return attributes of GIDs
 - Misc iSER updates
 - iSER clustering update
 - Network NameSpace support for rdma CM
 - Work Request cleanup series
 - New Memory Registration API
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWPO5UAAoJELgmozMOVy/dSCQP/iX2ImMZOS3VkOYKhLR3dSv8
 4vTEiYIoAT1JEXiPpiabuuACwotcZcMRk9kZ0dcWmBoFusTzKJmoDOkgAYd95XqY
 EsAyjqtzUGNNMjH5u5W+kdbaFdH9Ktq7IJvspRlJuvzC47Srax+qBxX01jrAkDgh
 4PoA3hEa2KkvkDjY2Mhvk9EWd/uflO9Ky6o0D8jUQkWtEvKBRyDjQLk30oW6wHX9
 pTWqww3dD0EXTrR+PDA88v2saKH1kZFU1Nt2eU8Bw+zlJM8hcX6U7PfRX0g3HT/J
 o+7ejTdLPWFDH35gJOU+KE519f1JbwfRjPJCqbOC9IttBB7iHSbhcpQLpWv4JV1x
 agdBeDA3TGQj3dHb2SkYMlWXCBp7q8UCbVGvvirTFzGSGU73sc6hhP+vCKvPQIlE
 Ah5tUqD7Y3mOBjvuDeIzKMLXILd5d3cH+m7Laytrf5e7fJPmBRZyOkcMh0QVElyl
 mKo+PFjghgeTFb405J7SDDw/vThVyN9HyIt7AGEzObaajzOOk9R1hkQr46XVy9TK
 yi58fl85yQ2n6TWV6NRnvkQoMy/N2HAEuXk/7HtO0PabV5w3Lo0zvXB9SnVrrVEm
 58FWRBYCWorVSdSacuDnPm0iz45WSRIb9G9sBlhEC93eXRq2rSBoy4RvyLeliHFH
 hllyhNNolI6FJ64j07Xm
 =bBIY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is my initial round of 4.4 merge window patches.  There are a few
  other things I wish to get in for 4.4 that aren't in this pull, as
  this represents what has gone through merge/build/run testing and not
  what is the last few items for which testing is not yet complete.

   - "Checksum offload support in user space" enablement
   - Misc cxgb4 fixes, add T6 support
   - Misc usnic fixes
   - 32 bit build warning fixes
   - Misc ocrdma fixes
   - Multicast loopback prevention extension
   - Extend the GID cache to store and return attributes of GIDs
   - Misc iSER updates
   - iSER clustering update
   - Network NameSpace support for rdma CM
   - Work Request cleanup series
   - New Memory Registration API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
  IB/core, cma: Make __attribute_const__ declarations sparse-friendly
  IB/core: Remove old fast registration API
  IB/ipath: Remove fast registration from the code
  IB/hfi1: Remove fast registration from the code
  RDMA/nes: Remove old FRWR API
  IB/qib: Remove old FRWR API
  iw_cxgb4: Remove old FRWR API
  RDMA/cxgb3: Remove old FRWR API
  RDMA/ocrdma: Remove old FRWR API
  IB/mlx4: Remove old FRWR API support
  IB/mlx5: Remove old FRWR API support
  IB/srp: Dont allocate a page vector when using fast_reg
  IB/srp: Remove srp_finish_mapping
  IB/srp: Convert to new registration API
  IB/srp: Split srp_map_sg
  RDS/IW: Convert to new memory registration API
  svcrdma: Port to new memory registration API
  xprtrdma: Port to new memory registration API
  iser-target: Port to new memory registration API
  IB/iser: Port to new fast registration API
  ...
2015-11-07 13:33:07 -08:00
Mel Gorman 71baba4b92 mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM
__GFP_WAIT was used to signal that the caller was in atomic context and
could not sleep.  Now it is possible to distinguish between true atomic
context and callers that are not willing to sleep.  The latter should
clear __GFP_DIRECT_RECLAIM so kswapd will still wake.  As clearing
__GFP_WAIT behaves differently, there is a risk that people will clear the
wrong flags.  This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly
indicate what it does -- setting it allows all reclaim activity, clearing
them prevents it.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mel Gorman d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Linus Torvalds 9cf5c095b6 asm-generic cleanups
The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig
 to clean up various abuses of headers in there. The patch to rename the
 io-64-nonatomic-*.h headers caused some conflicts with new users, so I
 added a workaround that we can remove in the next merge window.
 
 The only other patch is a warning fix from Marek Vasut
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAVjzaf2CrR//JCVInAQImmhAA20fZ91sUlnA5skKNPT1phhF6Z7UF2Sx5
 nPKcHQD3HA3lT1OKfPBYvCo+loYflvXFLaQThVylVcnE/8ecAEMtft4nnGW2nXvh
 sZqHIZ8fszTB53cynAZKTjdobD1wu33Rq7XRzg0ugn1mdxFkOzCHW/xDRvWRR5TL
 rdQjzzgvn2PNlqFfHlh6cZ5ykShM36AIKs3WGA0H0Y/aYsE9GmDOAUp41q1mLXnA
 4lKQaIxoeOa+kmlsUB0wEHUecWWWJH4GAP+CtdKzTX9v12bGNhmiKUMCETG78BT3
 uL8irSqaViNwSAS9tBxSpqvmVUsa5aCA5M3MYiO+fH9ifd7wbR65g/wq39D3Pc01
 KnZ3BNVRW5XSA3c86pr8vbg/HOynUXK8TN0lzt6rEk8bjoPBNEDy5YWzy0t6reVe
 wX65F+ver8upjOKe9yl2Jsg+5Kcmy79GyYjLUY3TU2mZ+dIdScy/jIWatXe/OTKZ
 iB4Ctc4MDe9GDECmlPOWf98AXqsBUuKQiWKCN/OPxLtFOeWBvi4IzvFuO8QvnL9p
 jZcRDmIlIWAcDX/2wMnLjV+Hqi3EeReIrYznxTGnO7HHVInF555GP51vFaG5k+SN
 smJQAB0/sostmC1OCCqBKq5b6/li95/No7+0v0SUhJJ5o76AR5CcNsnolXesw1fu
 vTUkB/I66Hk=
 =dQKG
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
 "The asm-generic changes for 4.4 are mostly a series from Christoph
  Hellwig to clean up various abuses of headers in there.  The patch to
  rename the io-64-nonatomic-*.h headers caused some conflicts with new
  users, so I added a workaround that we can remove in the next merge
  window.

  The only other patch is a warning fix from Marek Vasut"

* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: temporarily add back asm-generic/io-64-nonatomic*.h
  asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
  gpio-mxc: stop including <asm-generic/bug>
  n_tracesink: stop including <asm-generic/bug>
  n_tracerouter: stop including <asm-generic/bug>
  mlx5: stop including <asm-generic/kmap_types.h>
  hifn_795x: stop including <asm-generic/kmap_types.h>
  drbd: stop including <asm-generic/kmap_types.h>
  move count_zeroes.h out of asm-generic
  move io-64-nonatomic*.h out of asm-generic
2015-11-06 14:22:15 -08:00
David S. Miller b75ec3af27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-11-01 00:15:30 -04:00
Bart Van Assche db7489e076 IB/core, cma: Make __attribute_const__ declarations sparse-friendly
Move the __attribute_const__ declarations such that sparse understands
that these apply to the function itself and not to the return type.
This avoids that sparse reports error messages like the following:

drivers/infiniband/core/verbs.c:73:12: error: symbol 'ib_event_msg' redeclared with different type (originally declared at include/rdma/ib_verbs.h:470) - different modifiers

Fixes: 2b1b5b6012 ("IB/core, cma: Nice log-friendly string helpers")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-30 17:57:49 -04:00
Sagi Grimberg 39bfc271bd IB/core: Remove old fast registration API
No callers and no providers left, go ahead and remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-29 11:43:47 -04:00
Sagi Grimberg 13d3e895fa RDMA/nes: Remove old FRWR API
No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:32:29 -04:00
Sagi Grimberg b8533eccc8 IB/qib: Remove old FRWR API
No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:32:29 -04:00
Sagi Grimberg d3cfd002e6 iw_cxgb4: Remove old FRWR API
No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:32:29 -04:00
Sagi Grimberg 94e585cb74 RDMA/cxgb3: Remove old FRWR API
No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg 191cfed565 RDMA/ocrdma: Remove old FRWR API
No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg e761c67fbf IB/mlx4: Remove old FRWR API support
No ULP uses it anymore, go ahead and remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg dd01e66a6c IB/mlx5: Remove old FRWR API support
No ULP uses it anymore, go ahead and remove it.
Keep only the local invalidate part of the handlers.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg 9a21be531c IB/srp: Dont allocate a page vector when using fast_reg
The new fast registration API does not reuqire a page vector
so we can't avoid allocating it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg 51c2b8e2ac IB/srp: Remove srp_finish_mapping
No callers left, remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg f7f7aab1a5 IB/srp: Convert to new registration API
Instead of constructing a page list, call ib_map_mr_sg
and post a new ib_reg_wr. srp_map_finish_fr now returns
the number of sg elements registered.

Remove srp_finish_mapping since no one is calling it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg 26630e8a09 IB/srp: Split srp_map_sg
This is a preparation patch for the new registration API
conversion. It splits srp_map_sg per registration strategy
(srp_map_sg[fmr|fr|dma]. On its own it adds some code duplication,
but it makes the API switch easier to comprehend.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 16c2d702f2 iser-target: Port to new memory registration API
Remove fastreg page list allocation as the page vector
is now private to the provider. Instead of constructing
the page list and fast_req work request, call ib_map_mr_sg
and construct ib_reg_wr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 3940588500 IB/iser: Port to new fast registration API
Remove fastreg page list allocation as the page vector
is now private to the provider. Instead of constructing
the page list and fast_req work request, call ib_map_mr_sg
and construct ib_reg_wr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 0ba24dd39a RDMA/nes: Support the new memory registration API
Support the new memory registration API by allocating a
private page list array in nes_mr and populate it when
nes_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating IB_WR_FAST_REG_MR handling and take the
needed information from different places:
- page_size, iova, length (ib_mr)
- page array (nes_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 38071a461f IB/qib: Support the new memory registration API
Support the new memory registration API by allocating a
private page list array in qib_mr and populate it when
qib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating qib_fastreg_mr just take the needed information
from different places:
- page_size, iova, length (ib_mr)
- page array (qib_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 8376b86de7 iw_cxgb4: Support the new memory registration API
Support the new memory registration API by allocating a
private page list array in c4iw_mr and populate it when
c4iw_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating build_fastreg just take the needed information
from different places:
- page_size, iova, length (ib_mr)
- page array (c4iw_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 14fb4171ab RDMA/cxgb3: Support the new memory registration API
Support the new memory registration API by allocating a
private page list array in iwch_mr and populate it when
iwch_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating build_fastreg just take the needed information
from different places:
- page_size, iova, length (ib_mr)
- page array (iwch_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 2eaa1c5647 RDMA/ocrdma: Support the new memory registration API
Support the new memory registration API by allocating a
private page list array in ocrdma_mr and populate it when
ocrdma_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating IB_WR_FAST_REG_MR, but take the needed
information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (ocrdma_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 1b2cd0fc67 IB/mlx4: Support the new memory registration API
Support the new memory registration API by allocating a
private page list array in mlx4_ib_mr and populate it when
mlx4_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by setting the exact WQE as IB_WR_FAST_REG_MR, just take the
needed information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (mlx4_ib_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:17 -04:00
Sagi Grimberg 8a187ee52b IB/mlx5: Support the new memory registration API
Support the new memory registration API by allocating a
private page list array in mlx5_ib_mr and populate it when
mlx5_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by setting the exact WQE as IB_WR_FAST_REG_MR, just take the
needed information from different places:
- page_size, iova, length, access flags (ib_mr)
- page array (mlx5_ib_mr)
- key (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:17 -04:00
Sagi Grimberg a706000916 IB/mlx5: Remove dead fmr code
Just function declarations - no need for those
laying arround. If for some reason someone will want
FMR support in mlx5, it should be easy enough to restore
a few structs.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:17 -04:00
Sagi Grimberg 4c67e2bfc8 IB/core: Introduce new fast registration API
The new fast registration  verb ib_map_mr_sg receives a scatterlist
and converts it to a page list under the verbs API thus hiding
the specific HW mapping details away from the consumer.

The provider drivers are provided with a generic helper ib_sg_to_pages
that converts a scatterlist into a vector of page addresses. The
drivers can still perform any HW specific page address setting
by passing a set_page function pointer which will be invoked for
each page address. This allows drivers to avoid keeping a shadow
page vectors and convert them to HW specific translations by doing
extra copies.

This API will allow ULPs to remove the duplicated code of constructing
a page vector from a given sg list.

The send work request ib_reg_wr also shrinks as it will contain only
mr, key and access flags in addition.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:17 -04:00
Doug Ledford 63e8790d39 Merge branch 'wr-cleanup' into k.o/for-4.4 2015-10-28 22:23:34 -04:00
Doug Ledford eb14ab3ba1 Merge branch 'wr-cleanup' of git://git.infradead.org/users/hch/rdma into wr-cleanup
Signed-off-by: Doug Ledford <dledford@redhat.com>

Conflicts:
	drivers/infiniband/ulp/isert/ib_isert.c - Commit 4366b19ca5
	(iser-target: Change the recv buffers posting logic) changed the
	logic in isert_put_datain() and had to be hand merged
2015-10-28 22:21:09 -04:00
Guy Shapiro 95893dde99 IB/ucma: Take the network namespace from the process
Add support for network namespaces from user space. This is done by passing
the network namespace of the process instead of init_net.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 12:32:48 -04:00
Guy Shapiro fa20105e09 IB/cma: Add support for network namespaces
Add support for network namespaces in the ib_cma module. This is
accomplished by:

1. Adding network namespace parameter for rdma_create_id. This parameter is
   used to populate the network namespace field in rdma_id_private.
   rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside
   of ib_cma, when listening on an ID and when looking for an ID for an
   incoming request.
3. Decrementing the reference count for the appropriate network namespace
   when calling rdma_destroy_id.

In order to preserve the current behavior init_net is passed when calling
from other modules.

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 12:32:48 -04:00
Haggai Eran 4be74b42a6 IB/cma: Separate port allocation to network namespaces
Keep a struct for each network namespace containing the IDRs for the RDMA
CM port spaces. The struct is created dynamically using the generic_net
mechanism.

This patch is internal infrastructure work for the following patches. In
this patch, init_net is statically used as the network namespace for
the new port-space API.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 12:32:48 -04:00
Guy Shapiro 565edd1d55 IB/addr: Pass network namespace as a parameter
Add network namespace support to the ib_addr module. For that, all the
address resolution and matching should be done using the appropriate
namespace instead of init_net.

This is achieved by:

1. Adding an explicit network namespace argument to exported function that
   require a namespace.
2. Saving the namespace in the rdma_addr_client structure.
3. Using it when calling networking functions.

In order to preserve the behavior of calling modules, &init_net is
passed as the parameter in calls from other modules. This is modified as
namespace support is added on more levels.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 12:32:47 -04:00
Sagi Grimberg 630c3183ce IB/iser: Enable SG clustering
iser is perfectly capable supporting SG clustering as it translates
the SG list to a page vector. Enabling SG clustering can dramatically
reduce the number of SG elements, which doesn't make much of a difference
at this point, but with arbitrary SG list support, reducing the
number of SG elements can benefit greatly as as it would reduce
the length of the HW descriptors array.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 12:26:06 -04:00
Sagi Grimberg dd0107a089 IB/iser: set block queue_virt_boundary
The block layer can reliably guarantee that SG lists won't
contain gaps (page unaligned) if a driver set the queue
virt_boundary.

With this setting the block layer will:
- refuse merges if bios are not aligned to the virtual boundary
- split bios/requests that are not aligned to the virtual boundary
- or, bounce buffer SG_IOs that are not aligned to the virtual boundary

Since iser is working in 4K page size, set the virt_boundary to
4K pages. With this setting, we can now safely remove the bounce
buffering logic in iser.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 12:26:06 -04:00
Linus Torvalds 410694e214 Changes for 4.3-rc6
6 serious fixes:
 
 1) Hold the mutex around the find and corresponding update of our gid
 2) The ifa list is rcu protected, copy its contents under rcu to avoid
    using a freed structure
 3) On error, netdev might be null, so check it before trying to release it
 4) On init, if workqueue alloc fails, fail init
 5) The new demux patches exposed a bug in mlx5 and ipath drivers, we need
    to use the payload P_Key to determine the P_Key the packet arrived on
    because the hardware doesn't tell us the truth
 6) Due to a couple convoluted error flows, it is possible for the CM to
    trigger a use_after_free and a double_free of rb nodes.  Add two
    checks to prevent that.  This code has worked for 10+ years.  It is
    likely that some of the recent changes have caused this issue to
    surface.  The current patch will protect us from nasty events for
    now while we track down why this is just now showing up.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWJ+zYAAoJELgmozMOVy/dpvIP/iDBxRy0tw8jRdJQNF7EUZ5S
 a40aXun/J7Ywl6znvWG/WepDriMDe/9n9qIll8ObZtn7VuNlCZhp/7fx0Bn+fVnR
 c4j1cIdhWeja7Wg1xI6K75kXzae8hIwn8aombaFwf4NhUH6qkqGMt72Wp0Y9KeIv
 d903VyDwEJgETLAgUPaS1STurEuYcQocESvuer66Qp704Q+EhSbYKf9UgiRT6bOm
 36ilI5vxtpPeDpI/YT4ubyD2R7r6XPTGRGs3R6/3W6vRaGLNJ6jq4+Fhthx137G2
 NoddGmBdxemt2u0hWwLKcMGb5JNFbOoHuOrJmH9AZO6OyIH8XzILeEiXosdRjX/d
 U7OC29fE1Ge5wBNwwm+rJx79HgOVyu1vc5AlcE8vbtImIugJT8OzNIbZoAMcm8CB
 RUieyn78PGrkZ1KjCbVpiJ7u6S07LBLq/GqPHTnZ7aen3klP3pm8rAfskM7quEvf
 UD3kFP3kFQKZ8PG0Q27XtkKixoBU0vie9YgF2ZLBbEfJpJDKT9HF3lr1nP2AjN2a
 23FxUMzFCncE2ewA7tXwbHkCY7x5zn/Og0xdbRKdw/q/EvVu9VLZz/eTASLT+we0
 v3YZQdkdSnU0rwULZY+26h1PeHba/aoeoN6x/tmlKLQvaCfhkAsoXDl+MGfY4oLk
 1f5iWBgn4kseBXg+N/c9
 =fpXl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull infiniband fixes from Doug Ledford:
 "It's late in the game, I know, but these fixes seemed important enough
  to warrant a late pull request.  They all involve oopses or use after
  frees or corruptions.

  Six serious fixes:

   - Hold the mutex around the find and corresponding update of our gid

   - The ifa list is rcu protected, copy its contents under rcu to avoid
     using a freed structure

   - On error, netdev might be null, so check it before trying to
     release it

   - On init, if workqueue alloc fails, fail init

   - The new demux patches exposed a bug in mlx5 and ipath drivers, we
     need to use the payload P_Key to determine the P_Key the packet
     arrived on because the hardware doesn't tell us the truth

   - Due to a couple convoluted error flows, it is possible for the CM
     to trigger a use_after_free and a double_free of rb nodes.  Add two
     checks to prevent that.  This code has worked for 10+ years.  It is
     likely that some of the recent changes have caused this issue to
     surface.  The current patch will protect us from nasty events for
     now while we track down why this is just now showing up"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/cm: Fix rb-tree duplicate free and use-after-free
  IB/cma: Use inner P_Key to determine netdev
  IB/ucma: check workqueue allocation before usage
  IB/cma: Potential NULL dereference in cma_id_from_event
  IB/core: Fix use after free of ifa
  IB/core: Fix memory corruption in ib_cache_gid_set_default_gid
2015-10-24 07:28:05 +09:00
Bart Van Assche 6c760b3dd5 iser-target: Remove an unused variable
Detected this by compiling with W=1.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22 18:37:47 -04:00
Bart Van Assche 78fc3fc4cc IB/iser: Remove an unused variable
Detected this by compiling with W=1.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22 18:37:14 -04:00
Matan Barak 10e07f13c0 IB/core: Remove smac and vlan id from path record
The GID cache accompanies every GID with attributes.
The GID attributes link the GID with its netdevice, which could be
resolved to smac and vlan id easily. Since we've added the netdevice
(ifindex and net) to the path record, storing the L2 attributes is
duplicated data and hence these attributes are removed.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:18 -04:00
Matan Barak aa744cc01f IB/core: Remove smac and vlan id from qp_attr and ah_attr
Smac and vlan id could be resolved from the GID attribute, and thus
these attributes aren't needed anymore. Removing them.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:18 -04:00
Matan Barak 5c266b2304 IB/cm: Remove the usage of smac and vid of qp_attr and cm_av
The cm and cma don't need to explicitly handle vlan and smac,
as they are resolved from the GID index now. Removing this
portion of code.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:18 -04:00
Matan Barak dbf727de74 IB/core: Use GID table in AH creation and dmac resolution
Previously, vlan id and source MAC were used from QP attributes. Since
the net device is now stored in the GID attributes, they could be used
instead of getting this information from the QP attributes.

IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed
because there is no known libibverbs that uses them.

This commit also modifies the vendors (mlx4, ocrdma) drivers in order
to use the new approach.

ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com>

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak 99b27e3b5d IB/cache: Add ib_find_gid_by_filter cache API
GID cache API users might want to search for GIDs with specific
attributes rather than just specifying GID, net device and port.
This is used in a later patch, where we find the sgid index by
L2 Ethernet attributes.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak abae1b71dd IB/cma: cma_validate_port should verify the port and netdevice
Previously, cma_validate_port searched for GIDs in IB cache and then
tried to verify the found port. This could fail when there are
identical GIDs on both ports. In addition, netdevice should be taken
into account when searching the GID table.
Fixing cma_validate_port to search only the relevant port's cache
and netdevice.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak c2c6ff1345 IB/cm: cm_init_av_by_path should find a GID by its netdevice
Previously, the CM has searched the cache for any sgid_index whose
GID matches the path's GID. Since the path record stores the net
device, the CM should now search only for GIDs which originated from
this net device.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak ba36e37fd3 IB/core: Add netdev to path record
In order to find the sgid_index, one could just query the IB cache
with the correct GID and netdevice. Therefore, instead of storing
the L2 attributes directly in the path, we only store the
ifindex and net and use them later to get the sgid_index.
The vlan_id and smac L2 attributes are removed in a later patch.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak d300ec528b IB/core: Expose and rename ib_find_cached_gid_by_port cache API
Sometime consumers might want to search for a GID in a specific port.
For example, when a WC arrives and we want to search the GID
that matches that port - it's better to search only the relevant
port.
Exposing and renaming ib_cache_gid_find_by_port in order to match
the naming convention of the module.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Matan Barak 55ee3ab2e4 IB/core: Add netdev and gid attributes paramteres to cache
Adding an ability to query the IB cache by a netdev and get the
attributes of a GID. These parameters are necessary in order to
successfully resolve the required GID (when the netdevice is known)
and get the Ethernet L2 attributes from a GID.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:48:17 -04:00
Eran Ben Elisha fbfb6625ea IB/mlx4: Add support for blocking multicast loopback QP creation user flag
MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is now supported downstream.

In addition, this flag was supported only for IB_QPT_UD, now, with the
new implementation it is supported for all QP types.

Support IB_USER_VERBS_EX_CMD_CREATE_QP in order to get the flag from
user space using the extension create qp command.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:16:47 -04:00
Eran Ben Elisha 7b59f0f951 IB/mlx4: Add counter based implementation for QP multicast loopback block
Current implementation for MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is not
supported when link layer is Ethernet.

This patch will add counter based implementation for multicast loopback
prevention. HW can drop multicast loopback packets if sender QP counter
index is equal to receiver QP counter index. If qp flag
MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is set and link layer is Ethernet,
create a new counter and attach it to the QP so it will continue
receiving multicast loopback traffic but it's own.

The decision if to create a new counter is being made at the qp
modification to RTR after the QP's port is set. When QP is destroyed or
moved back to reset state, delete the counter.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:16:47 -04:00
Eran Ben Elisha 3ba8e31d5a IB/mlx4: Add IB counters table
This is an infrastructure step for allocating and attaching more than
one counter to QPs on the same port. Allocate a counters table and
manage the insertion and removals of the counters in load and unload of
mlx4 IB.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:16:47 -04:00
Eran Ben Elisha ddf9529be1 IB/core: Allow setting create flags in QP init attribute
Allow setting IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK at create_flags in
ib_uverbs_create_qp_ex.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:16:46 -04:00
Eran Ben Elisha 6d8a74972b IB/core: Extend ib_uverbs_create_qp
ib_uverbs_ex_create_qp follows the extension verbs
mechanism. New features (for example, QP creation flags
field which is added in a downstream patch) could used
via user-space libraries without breaking the ABI.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:16:46 -04:00
Hariprasad S 963cab5082 iw_cxgb4: Adds support for T6 adapter
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 23:16:38 -04:00
Selvin Xavier c6a7b0d7a5 RDMA/ocrdma: Bump up ocrdma version number to 11.0.0.0
Updating the version number to 11.0.0.0

Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:28:19 -04:00
Devesh Sharma af74d1956f RDMA/ocrdma: Prevent CQ-Doorbell floods
Changing CQ-Doorbell(DB) logic to prevent DB floods, it is supposed to be
pressed only if any hw CQE is polled. If cq-arm was requested
previously then don't bother about number of hw CQEs polled and
arm the CQ.

Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:28:19 -04:00
Naga Irrinki aeb922df2c RDMA/ocrdma: Check resource ids received in Async CQE
Some versions of the FW sends wrong QP or CQ IDs in the
Async CQE. Adding a check to see whether qp or cq structures
associated with the CQE is valid.

Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:28:19 -04:00
Selvin Xavier fb16d8c49e RDMA/ocrdma: Avoid a possible crash in ocrdma_rem_port_stats
debugfs_remove should be called before freeing the driver
stats resources to avoid any crash during ocrdma_remove.

Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:28:19 -04:00
Selvin Xavier 5a85f5e9d4 RDMA/ocrdma: Cleanup unused device list and rcu variables
ocrdma_dev_list is not used by the driver. So removing
the references of this variable. dev->rcu was introduced
for the ipv6 notifier for GID management. This is no longer
required as the GID management is outside the HW driver.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:28:18 -04:00
Hariprasad S 3dd9a5dc24 iw_cxgb4: reverse the ord/ird in the ESTABLISHED upcall
The ESTABLISHED event should have the peer's ord/ird so
swap the values in the event before the upcall.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:16:10 -04:00
Hariprasad S f57b780c00 iw_cxgb4: fix misuse of ep->ord for minimum ird calculation
When calculating the minimum ird in c4iw_accept_cr(), we need to always
have a value of at least 1 if the RTR message is a 0B read.  The code
was
incorrectly using ep->ord for this logic which was incorrectly adjusting
the ird and causing incorrect ord/ird negotiation when using MPAv2 to
negotiate these values.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:16:10 -04:00
Hariprasad S 158c776dba iw_cxgb4: pass the ord/ird in connect reply events
This allows client ULPs to get the negotiated ord/ird which is useful
to avoid stalling the SQ due to exceeding the ORD.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:16:10 -04:00
Hariprasad S 99718e59fa iw_cxgb4: detect fatal errors while creating listening filters
In c4iw_create_listen(), if we're using listen filters, then bail out
of the busy loop if the device becomes fatally dead

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 17:16:10 -04:00
Arnd Bergmann 5d1e623591 IB/core: avoid 32-bit warning
The INIT_UDATA() macro requires a pointer or unsigned long argument for
both input and output buffer, and all callers had a cast from when
the code was merged until a recent restructuring, so now we get

core/uverbs_cmd.c: In function 'ib_uverbs_create_cq':
core/uverbs_cmd.c:1481:66: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

This makes the code behave as before by adding back the cast to
unsigned long.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 565197dd8f ("IB/core: Extend ib_uverbs_create_cq")
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 16:56:44 -04:00
Arnd Bergmann b61e564af8 RDMA/cxgb4: re-fix 32-bit build warning
Casting a pointer to __be64 produces a warning on 32-bit architectures:

drivers/infiniband/hw/cxgb4/mem.c:147:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    req->wr.wr_lo = (__force __be64)&wr_wait;

This was fixed at least twice for this driver in different places,
and accidentally reverted once more. This puts the correct version
back in place.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 6198dd8d7a ("iw_cxgb4: 32b platform fixes")
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 16:56:28 -04:00
Geliang Tang 68a5e60436 IB/iser: fix a comment typo
Just fix a typo in the code comment.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 16:41:54 -04:00
Insu Yun fe274c5aed usnic: correctly handle kzalloc return value
Since kzalloc returns memory address, not error code,
it should be checked whether it is null or not.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 16:41:19 -04:00
Insu Yun 2c79dad895 usnic: correctly check failed allocation
Since ib_alloc_device returns allocated memory address, not error,
it should be checked as IS_NULL, not IS_ERR_OR_NULL.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 16:41:19 -04:00
Doug Ledford fc81a06965 Merge branch 'k.o/for-4.3-v1' into k.o/for-4.4
Pick up the late fixes from the 4.3 cycle so we have them in our
next branch.
2015-10-21 16:40:21 -04:00
Doron Tsur 0ca81a2840 IB/cm: Fix rb-tree duplicate free and use-after-free
ib_send_cm_sidr_rep could sometimes erase the node from the sidr
(depending on errors in the process). Since ib_send_cm_sidr_rep is
called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
could be either erased from the rb_tree twice or not erased at all.
Fixing that by making sure it's erased only once before freeing
cm_id_priv.

Fixes: a977049dac ('[PATCH] IB: Add the kernel CM implementation')
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 15:43:12 -04:00
Haggai Eran ab3964ad2a IB/cma: Use inner P_Key to determine netdev
When discussing the patches to demux ids in rdma_cm instead of ib_cm, it
was decided that it is best to use the P_Key value in the packet headers.
However, the mlx5 and ipath drivers are currently unable to send correct
P_Key values in GMP headers. They always send using a single P_Key that is
set during the GSI QP initialization.

Change the rdma_cm code to look at the P_Key value that is part of the
packet payload as a workaround. Once the drivers are fixed this patch can
be reverted.

Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20 14:16:51 -04:00
Sasha Levin 0174b381ca IB/ucma: check workqueue allocation before usage
Allocating a workqueue might fail, which wasn't checked so far and would
lead to NULL ptr derefs when an attempt to use it was made.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20 13:35:51 -04:00
Haggai Eran b3b51f9f6f IB/cma: Potential NULL dereference in cma_id_from_event
If the lookup of a listening ID failed for an AF_IB request, the code
would try to call dev_put() on a NULL net_dev.

Fixes: be688195bd ("IB/cma: Fix net_dev reference leak with failed
requests")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20 13:13:42 -04:00
Matan Barak 3909642034 IB/core: Fix use after free of ifa
When using ifup/ifdown while executing enum_netdev_ipv4_ips,
ifa could become invalid and cause use after free error.
Fixing it by protecting with RCU lock.

Fixes: 03db3a2d81 ('IB/core: Add RoCE GID table management')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-20 13:10:46 -04:00
David S. Miller 26440c835f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	net/ipv4/inet_connection_sock.c
	net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-20 06:08:27 -07:00
Ivan Vecera 47ea032533 drivers/net: get rid of unnecessary initializations in .get_drvinfo()
Many drivers initialize uselessly n_priv_flags, n_stats, testinfo_len,
eedump_len & regdump_len fields in their .get_drvinfo() ethtool op.
It's not necessary as these fields is filled in ethtool_get_drvinfo().

v2: removed unused variable
v3: removed another unused variable

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16 00:24:10 -07:00
Linus Torvalds 58bd6e0602 Changes for 4.3-rc5
- Work around connection namespace lookup bug related to RoCE
 - Change usnic license to Dual GPL/BSD (was intended to be that way
   all along, but wasn't clear, permission from contributors was
   chased down)
 - Fix an issue between NFSoRDMA and mlx5 that could cause an oops
 - Fix leak of sendonly multicast groups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWHoT/AAoJELgmozMOVy/deK4QALETCToLcR5RRDR+QleFUvby
 FnP91Pu9zGOoiuP25FT5Ny0YAmTHd1KiDQBQHRe/NrYDCH2M/q8jFJSWZLwGrG6q
 8GYc1ieozGQMZvId3ZJnqUJUTEyJu9QtpiFFZJYJHriP6OShP1GiHJ/XTN9dvJ/u
 xcmViAYYIjjScjaY1MuYpxKITFwfZE0HtdvK7zzq+F9cpfmC//Zc0Po4V4o4Y9V3
 14WgbWZyhehmECKwN95hIY1pLySadgcCxoeUDHclQ3efKLar4tEC3SOM2QZsnNRc
 qlCHEZYeB5TRo0dF/2CYUMLfUHkMjnUpW2BiVDbQfmPio7lGUjh2SBAQjI5i1dEQ
 Wg69JH1TV7BYfRiwe7n49P/BJ2vIhCR2UjQrHjilZ/h6DPSfKy29hVSvOzb5xLeJ
 mwl/KSKxlfT2Z1SZy0yMlJfCm8tjPwf6WhOVwkFRAhYHD3Yf31EMVzD7gTtW2MXO
 n5S80k5ccJlXniPWjaqerhjOZHmwHViBmHNlN4zlDCRZeT9IuKDj5mi31f7HC4gx
 WqJtSjRxydpbGPKROHI4vrmfARPAKNrKhj8BiqxO5Cja+TiS2QeXXr+fbRwETrLS
 TjXWNfS3Boy564AJ8Gfug2wfBcHwY+31Uv2a6nrMmKi+wwVexF/ENOb/x9WHZrVo
 VqQVI2lUBH2LsmzadD9c
 =usb1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "We have four batched up patches for the current rc kernel.

  Two of them are small fixes that are obvious.

  One of them is larger than I would like for a late stage rc pull, but
  we found an issue in the namespace lookup code related to RoCE and
  this works around the issue for now (we allow a lookup with a
  namespace to succeed on RoCE since RoCE namespaces aren't implemented
  yet).  This will go away in 4.4 when we put in support for namespaces
  in RoCE devices.

  The last one is large in terms of lines, but is all legal and no
  functional changes.  Cisco needed to update their files to be more
  specific about their license.  They had intended the files to be dual
  licensed as GPL/BSD all along, and specified that in their module
  license tag, but their file headers were not up to par.  They
  contacted all of the contributors to get agreement and then submitted
  a patch to update the license headers in the files.

  Summary:

   - Work around connection namespace lookup bug related to RoCE

   - Change usnic license to Dual GPL/BSD (was intended to be that way
     all along, but wasn't clear, permission from contributors was
     chased down)

   - Fix an issue between NFSoRDMA and mlx5 that could cause an oops

   - Fix leak of sendonly multicast groups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/ipoib: For sendonly join free the multicast group on leave
  IB/cma: Accept connection without a valid netdev on RoCE
  xprtrdma: Don't require LOCAL_DMA_LKEY support for fastreg
  usnic: add missing clauses to BSD license
2015-10-15 13:44:35 -07:00
Doron Tsur 17b38fb890 IB/core: Fix memory corruption in ib_cache_gid_set_default_gid
When ib_cache_gid_set_default_gid is called from several threads,
updating the table could make find_gid fail, therefore a negative
index will be retruned and an invalid table entry will be used.
Locking find_gid as well fixes this problem.

Fixes: 03db3a2d81 ('IB/core: Add RoCE GID table management')
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-15 12:35:54 -04:00
Christoph Hellwig adec640e03 mlx5: stop including <asm-generic/kmap_types.h>
<linux/highmem.h> is the placace the get the kmap type flags, asm-generic
files are generic implementations only to be used by architecture code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-10-15 00:21:10 +02:00
Christoph Hellwig 2eafd72939 target: use per-attribute show and store methods
This also allows to remove the target-specific old configfs macros, and
gets rid of the target_core_fabric_configfs.h header which only had one
function declaration left that could be moved to a better place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-10-13 22:17:49 -07:00
Christoph Lameter 0b5c9279e5 IB/ipoib: For sendonly join free the multicast group on leave
When we leave the multicast group on expiration of a neighbor we
do not free the mcast structure. This results in a memory leak
that causes ib_dealloc_pd to fail and print a WARN_ON message
and backtrace.

Fixes: bd99b2e05c (IB/ipoib: Expire sendonly multicast joins)
Signed-off-by: Christoph Lameter <cl@linux.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-13 16:43:59 -04:00
Christoph Hellwig 25556ae6b9 IB: remove xrc_remote_srq_num from struct ib_send_wr
The field is only initialized in mlx, but never used.

If we want to add proper XRC support it should be done with a new
struct ib_xrc_wr.

This shrinks the various WR structures by another 4 bytes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Haggai Eran <haggaie@mellanox.com>
2015-10-08 11:09:11 +01:00
Christoph Hellwig e622f2f4ad IB: split struct ib_send_wr
This patch split up struct ib_send_wr so that all non-trivial verbs
use their own structure which embedds struct ib_send_wr.  This dramaticly
shrinks the size of a WR for most common operations:

sizeof(struct ib_send_wr) (old):	96

sizeof(struct ib_send_wr):		48
sizeof(struct ib_rdma_wr):		64
sizeof(struct ib_atomic_wr):		96
sizeof(struct ib_ud_wr):		88
sizeof(struct ib_fast_reg_wr):		88
sizeof(struct ib_bind_mw_wr):		96
sizeof(struct ib_sig_handover_wr):	80

And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:

sizeof(struct ib_fastreg_wr):		64

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
2015-10-08 11:09:10 +01:00
Haggai Eran b8cab5dab1 IB/cma: Accept connection without a valid netdev on RoCE
The netdev checks recently added to RDMA CM expect a valid netdev to be
found for both InfiniBand and RoCE, but the code that find a netdev is
only implemented for InfiniBand.

Currently RoCE doesn't provide an API to find the netdev matching a
given set of parameters, so this patch just disables the netdev enforcement
for each incoming connections when the link layer is RoCE.

Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to RDMA CM")
Reported-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-06 14:25:16 -04:00
Jeff Squyres 3805eade3b usnic: add missing clauses to BSD license
The usnic_verbs kernel module was clearly marked with the following in
its code:

  MODULE_LICENSE("Dual BSD/GPL");

However, we accidentally left a few clauses of the BSD text out of the
license header in all the source files.  This commit fixes that: all
the files are properly dual BSD/GPL-licensed.  Contributors that might
have been confused by this have been contacted to get their permission
and are Cc:ed here.

Cc: Benoit Taine <benoit.taine@lip6.fr>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Masanari Iida <standby24x7@gmail.com>
Cc: Matan Barak <matanb@mellanox.com>
Cc: Michael Wang <yun.wang@profitbricks.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-06 13:43:25 -04:00
Linus Torvalds 46c8217c4a Changes for 4.3-rc4
- Fixes for mlx5 related issues
 - Fixes for ipoib multicast handling
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWCfALAAoJELgmozMOVy/dc+MQAKoD6echYpTkWE0otMuHQcYf
 zMaVVots+JdRKpA6OqHYQHgKGA80z21BpnjGYwcwB5zB1zPrJwz4vxwGlOBHt01T
 xLBReFgSKyJlgOWLXKfPx4bXUdivOBKm203wY0dh+/dC/VROGYoiXYTmSDsfsuKa
 8OXT1kWgzRVLtqwqj5GSkgWvtFZ28CjKh6d9egjqcj9tpbh2UupQDZzMyOtZ52X6
 Nz/Vo3u4T7qjzlhHOlCwHCDw+97x0yvmvLY1mWweGPfKOnxtXjkzQmTQEpyzU5Mo
 EwcqJucrBnmjbLAIBMrbR1mzTUQeD4dHz1jx+EzWE0lVnRL3twe1UaY40176sNlm
 aCBA4bIOQ242r3IJ++ss15ol1k5hu7PYKRn9Q8d2sSbQGcSnCHe/YOutQQ+FTEFG
 yE9xiLL+pgT8koauROnxg66E3HDM78NGTpjP3EuG4r2Qwa1iFANPfDB6kikuv8bO
 rG3qUJcloEPvfatZY+h5QC4UCoB0/W1DAhlfzE3tPBYPmhSEgQDfEOzXTKDakeF0
 VB903bYrOL3CVOun4I7fLrDc1leVeiAUKqO2orZs3qIpRWvAKyV/VjolAusMv2+F
 /4xPyh95AEMTFfmZogOCofQFk3eOnkWpLdrVTYCKy3i6NVBoy2wHldrl+LuCAN/m
 r/DNRBmazShashbeU6wg
 =8+cX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
 - Fixes for mlx5 related issues
 - Fixes for ipoib multicast handling

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/ipoib: increase the max mcast backlog queue
  IB/ipoib: Make sendonly multicast joins create the mcast group
  IB/ipoib: Expire sendonly multicast joins
  IB/mlx5: Remove pa_lkey usages
  IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY
  IB/iser: Add module parameter for always register memory
  xprtrdma: Replace global lkey with lkey local to PD
2015-10-01 16:38:52 -04:00
Bodong Wang 070b399723 IB/mlx4: Report checksum offload cap for RAW QP when query device
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-28 22:10:15 -04:00
Linus Torvalds c91d707295 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This includes a iser-target series from Jenny + Sagi @ Mellanox that
  addresses the few remaining active I/O shutdown bugs, along with a
  patch to support zero-copy for immediate data payloads that gives a
  nice performance improvement for small block WRITEs.

  Also included are some recent >= v4.2 regression bug-fixes.  The most
  notable is a RCU conversion regression for SPC-3 PR registrations, and
  recent removal of obsolete RFC-3720 markers that introduced a login
  regression bug with MSFT iSCSI initiators.

  Thanks to everyone who has been testing + reporting bugs for v4.x"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: Avoid OFMarker + IFMarker negotiation
  target: Make TCM_WRITE_PROTECT failure honor D_SENSE bit
  target: Fix target_sense_desc_format NULL pointer dereference
  target: Propigate backend read-only to core_tpg_add_lun
  target: Fix PR registration + APTPL RCU conversion regression
  iser-target: Skip data copy if all the command data comes as immediate
  iser-target: Change the recv buffers posting logic
  iser-target: Fix pending connections handling in target stack shutdown sequnce
  iser-target: Remove np_ prefix from isert_np members
  iser-target: Remove unused variables
  iser-target: Put the reference on commands waiting for unsol data
  iser-target: remove command with state ISTATE_REMOVE
2015-09-26 21:02:42 -04:00
Doug Ledford 2866196f29 IB/ipoib: increase the max mcast backlog queue
When performing sendonly joins, we queue the packets that trigger
the join until the join completes.  This may take on the order of
hundreds of milliseconds.  It is easy to have many more than three
packets come in during that time.  Expand the maximum queue depth
in order to try and prevent dropped packets during the time it
takes to join the multicast group.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25 22:30:24 -04:00
Doug Ledford c3852ab0e6 IB/ipoib: Make sendonly multicast joins create the mcast group
Since IPoIB should, as much as possible, emulate how multicast
sends work on Ethernet for regular TCP/IP apps, there should be
no requirement to subscribe to a multicast group before your
sends are properly sent.  However, due to the difference in how
multicast is handled on InfiniBand, we must join the appropriate
multicast group before we can send to it.  Previously we tried
not to trigger the auto-create feature of the subnet manager when
doing this because we didn't have tracking of these sendonly
groups and the auto-creation might never get undone.  The previous
patch added timing to these sendonly joins and allows us to
leave them after a reasonable idle expiration time.  So supply
all of the information needed to auto-create group.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25 14:46:58 -04:00
Christoph Lameter bd99b2e05c IB/ipoib: Expire sendonly multicast joins
On neighbor expiration, check to see if the neighbor was actually a
sendonly multicast join, and if so, leave the multicast group as we
expire the neighbor.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25 14:43:19 -04:00
Sagi Grimberg 81fb5e26a9 IB/mlx5: Remove pa_lkey usages
Since mlx5 driver cannot rely on registration using the
reserved lkey (global_dma_lkey) it used to allocate a private
physical address lkey for each allocated pd.
Commit 96249d70dd ("IB/core: Guarantee that a local_dma_lkey
is available") just does it in the core layer so we can go ahead
and use that.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25 10:46:51 -04:00
Sagi Grimberg c6790aa9f4 IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY
Commit 96249d70dd ("IB/core: Guarantee that a local_dma_lkey
is available") allows ULPs that make use of the local dma key to keep
working as before by allocating a DMA MR with local permissions and
converted these consumers to use the MR associated with the PD
rather then device->local_dma_lkey.

ConnectIB has some known issues with memory registration
using the local_dma_lkey (SEND, RDMA, RECV seems to work ok).

Thus don't expose support for it (remove device->local_dma_lkey
setting), and take advantage of the above commit such that no regression
is introduced to working systems.

The local_dma_lkey support will be restored in CX4 depending on FW
capability query.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25 10:46:51 -04:00
Sagi Grimberg 3cffd93017 IB/iser: Add module parameter for always register memory
This module parameter forces memory registration even for
a continuous memory region. It is true by default as sending
an all-physical rkey with remote permissions might be insecure.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25 10:46:51 -04:00
Linus Torvalds 72714841b7 Changes for 4.3-rc1
- Move ehca driver to staging/rdma and schedule for deletion
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV9xxNAAoJELgmozMOVy/dI+QP/R0afI1zc7pJe1D2RmrZWnAH
 HDWSNYJ8eVk25ptnaOes9kV6yuOV1nGiyCIGqUXk8Q7rVYbIIz0nK81S+gPLMpTU
 /la28bmUz7szvEtY+gFHZCuQvCiwjM+N7t39aehIMCc/vYnIdrOXuYMhB8kd6flL
 2mpnUFmfqMZLGh4RY1PQB9cNwVW9yYOVnG+1Z/M26wWl3NIDe4jiNoqV/jFvEf2v
 FyzjcYfjexVA47XOsfGy512Vlb+8DTtFeCz/TRl9e8tYCSHOmTcXv7jLywRoA1zn
 NNwJnXhliM9/EDqtbJMymYDAm9vzcfvXPCx1OOisVWxJxg9IF3p2/uPLeIKUSXZl
 15mfOzoTuNXlunHDQYbzC4+P2zs+uU2k6ivpd7HZgyiBkUXzftYh/6gIJcJOreU6
 1ndt2fK5hcG/Was+v6kkYaw0x6ZpawyLqHMs32V1xVcbmM5P1OFFOQFx1KDYhHH8
 WmOlcd/XmIjNY5o3ySQHSKjO6Ar0IfNyzpU3jUzg2AFbZv8m2BgYEhQOnrggxBtM
 deFbDlxMnIrsIJtsHBmr5Y/nsgIYpkxLK4xUod76yZSRuyjLCH/2C0otubS1w1vH
 4unU9zJXboBi7naPE5/fdIPMhwZu1X47xUyrwGksz3pHyiRpnhq55t+AWpsHCrg+
 +XYexztBX/cPReacpWJR
 =4XzR
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma driver move from Doug Ledford:
 "This is a move only, no functional changes.

  I tried to get it in prior to the rc1 release, but we were waiting on
  IBM to get back to us that they were OK with the deprecation and
  eventual removal of this driver.  That OK didn't materialize until
  last week, so integration and testing time pushed us beyond the rc1
  release.

  Summary:

   - Move ehca driver to staging/rdma and schedule for deletion"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/ehca: Deprecate driver, move to staging, schedule deletion
2015-09-16 09:16:20 -07:00
Jenny Derzhavetz 9fd60088ff iser-target: Skip data copy if all the command data comes as immediate
Given that supporting zcopy immediate data for all IOs requires
iser driver to use its own buffer allocations, we settle with
avoiding data copy for IOs with data length of up to 8K (which
is more latency sensitive anyway).

This trims IO write latency by up to 3us and increase IOPs
by up to 40% by saving CPU time doing sg_copy_from_buffer
(8K IO size is the obvious winner here).

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15 15:47:31 -07:00
Jenny Derzhavetz 4366b19ca5 iser-target: Change the recv buffers posting logic
iser target batches post recv operations to avoid
the overhead of acquiring the recv queue lock and
posting a HW doorbell for each command.

We change it to be per command in order to support
zcopy immediate data for IOs that fits in the 8K
transfer boundary (in the next patch).

(Fix minor patch fuzz due to ib_mr removal - nab)

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15 15:47:29 -07:00
Jenny Derzhavetz bd3792205a iser-target: Fix pending connections handling in target stack shutdown sequnce
Instead of handing a connection to the iscsi stack
for processing right after accepting (rdma_accept) we only hand
the connection to the iscsi core after we reached to a connected
state (ESTABLISHED CM event). This will prevent two error scenrios:

1. race between rdma connection teardown and iscsi login sequence
   reported by Nic in: (ce9a9fc20a "iser-target: Fix REJECT CM event
   use-after-free OOPs")

2. target stack shutdown sequence race with constant login attempts by
   multiple initiators.

We address this by maintaining two queues at the isert_np level:
- accepted: connections that were accepted but have not reached
  connected state (might get rejected, unreachable or error).
- pending: connections in connected state, but have yet to handed
  to the iscsi core for login processing. iser connections are promoted
  to the pending queue only from the accepted queue.

This way the iscsi core now will only handle functional iser connections
and once we shutdown the target stack, we look for any stales that
got left behind so we can safely release them.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15 15:47:27 -07:00
Jenny Derzhavetz ed8cb0a437 iser-target: Remove np_ prefix from isert_np members
These are always referenced from np-> so no need
for the prefix.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15 15:47:25 -07:00
Jenny Derzhavetz f27dfa1f0e iser-target: Remove unused variables
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15 15:47:23 -07:00
Jenny Derzhavetz 3e03c4b01d iser-target: Put the reference on commands waiting for unsol data
The iscsi target core teardown sequence calls wait_conn for
all active commands to finish gracefully by:
- move the queue-pair to error state
- drain all the completions
- wait for the core to finish handling all session commands

However, when tearing down a session while there are sequenced
commands that are still waiting for unsolicited data outs, we can
block forever as these are missing an extra reference put.

We basically need the equivalent of iscsit_free_queue_reqs_for_conn()
which is called after wait_conn has returned. Address this by an
explicit walk on conn_cmd_list and put the extra reference.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15 15:47:21 -07:00
Jenny Derzhavetz a4c15cd957 iser-target: remove command with state ISTATE_REMOVE
As documented in iscsit_sequence_cmd:
/*
 * Existing callers for iscsit_sequence_cmd() will silently
 * ignore commands with CMDSN_LOWER_THAN_EXP, so force this
 * return for CMDSN_MAXCMDSN_OVERRUN as well..
 */

We need to silently finish a command when it's in ISTATE_REMOVE.
This fixes an teardown hang we were seeing where a mis-behaved
initiator (triggered by allocation error injections) sent us a
cmdsn which was lower than expected.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15 15:47:19 -07:00
Linus Torvalds 05c78081d2 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Here are the outstanding target-pending updates for v4.3-rc1.

  Mostly bug-fixes and minor changes this round.  The fallout from the
  big v4.2-rc1 RCU conversion have (thus far) been minimal.

  The highlights this round include:

   - Move sense handling routines into scsi_common code (Sagi)

   - Return ABORTED_COMMAND sense key for PI errors (Sagi)

   - Add tpg_enabled_sendtargets attribute for disabled iscsi-target
     discovery (David)

   - Shrink target struct se_cmd by rearranging fields (Roland)

   - Drop iSCSI use of mutex around max_cmd_sn increment (Roland)

   - Replace iSCSI __kernel_sockaddr_storage with sockaddr_storage (Andy +
     Chris)

   - Honor fabric max_data_sg_nents I/O transfer limit (Arun + Himanshu +
     nab)

   - Fix EXTENDED_COPY >= v4.1 regression OOPsen (Alex + nab)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (37 commits)
  target: use stringify.h instead of own definition
  target/user: Fix UFLAG_UNKNOWN_OP handling
  target: Remove no-op conditional
  target/user: Remove unused variable
  target: Fix max_cmd_sn increment w/o cmdsn mutex regressions
  target: Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sess
  target/qla2xxx: Honor max_data_sg_nents I/O transfer limit
  target/iscsi: Replace __kernel_sockaddr_storage with sockaddr_storage
  target/iscsi: Replace conn->login_ip with login_sockaddr
  target/iscsi: Keep local_ip as the actual sockaddr
  target/iscsi: Fix np_ip bracket issue by removing np_ip
  target: Drop iSCSI use of mutex around max_cmd_sn increment
  qla2xxx: Update tcm_qla2xxx module description to 24xx+
  iscsi-target: Add tpg_enabled_sendtargets for disabled discovery
  drivers: target: Drop unlikely before IS_ERR(_OR_NULL)
  target: check DPO/FUA usage for COMPARE AND WRITE
  target: Shrink struct se_cmd by rearranging fields
  target: Remove cmd->se_ordered_id (unused except debug log lines)
  target: add support for START_STOP_UNIT SCSI opcode
  target: improve unsupported opcode message
  ...
2015-09-11 19:00:42 -07:00
Doug Ledford 447e9a4d27 IB/ehca: Deprecate driver, move to staging, schedule deletion
The ehca driver is only supported on IBM machines with a custom EBus.
As they have opted to build their newer machines using more industry
standard technology and haven't really been pushing EBus capable
machines for a while, this driver can now safely be moved to the
staging area and scheduled for eventual removal.  This plan was brought
to IBM's attention and received their sign-off.

Cc: alexs@linux.vnet.ibm.com
Cc: hnguyen@de.ibm.com
Cc: raisch@de.ibm.com
Cc: stefan.roscher@de.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-11 18:13:35 -04:00
Kirill A. Shutemov 7cbea8dc01 mm: mark most vm_operations_struct const
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
structs should be constant.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Linus Torvalds 26d2177e97 Changes for 4.3
- Create drivers/staging/rdma
 - Move amso1100 driver to staging/rdma and schedule for deletion
 - Move ipath driver to staging/rdma and schedule for deletion
 - Add hfi1 driver to staging/rdma and set TODO for move to regular tree
 - Initial support for namespaces to be used on RDMA devices
 - Add RoCE GID table handling to the RDMA core caching code
 - Infrastructure to support handling of devices with differing
   read and write scatter gather capabilities
 - Various iSER updates
 - Kill off unsafe usage of global mr registrations
 - Update SRP driver
 - Misc. mlx4 driver updates
 - Support for the mr_alloc verb
 - Support for a netlink interface between kernel and user space cache
   daemon to speed path record queries and route resolution
 - Ininitial support for safe hot removal of verbs devices
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV7v8wAAoJELgmozMOVy/d2dcP/3PXnGFPgFGJODKE6VCZtTvj
 nooNXRKXjxv470UT5DiAX7SNcBxzzS7Zl/Lj+831H9iNXUyzuH31KtBOAZ3W03vZ
 yXwCB2caOStSldTRSUUvPe2aIFPnyNmSpC4i6XcJLJMCFijKmxin5pAo8qE44BQU
 yjhT+wC9P6LL5wZXsn/nFIMLjOFfu0WBFHNp3gs5j59paxlx5VeIAZk16aQZH135
 m7YCyicwrS8iyWQl2bEXRMon2vlCHlX2RHmOJ4f/P5I0quNcGF2+d8Yxa+K1VyC5
 zcb3OBezz+wZtvh16yhsDfSPqHWirljwID2VzOgRSzTJWvQjju8VkwHtkq6bYoBW
 egIxGCHcGWsD0R5iBXLYr/tB+BmjbDObSm0AsR4+JvSShkeVA1IpeoO+19162ixE
 n6CQnk2jCee8KXeIN4PoIKsjRSbIECM0JliWPLoIpuTuEhhpajftlSLgL5hf1dzp
 HrSy6fXmmoRj7wlTa7DnYIC3X+ffwckB8/t1zMAm2sKnIFUTjtQXF7upNiiyWk4L
 /T1QEzJ2bLQckQ9yY4v528SvBQwA4Dy1amIQB7SU8+2S//bYdUvhysWPkdKC4oOT
 WlqS5PFDCI31MvNbbM3rUbMAD8eBAR8ACw9ZpGI/Rffm5FEX5W3LoxA8gfEBRuqt
 30ZYFuW8evTL+YQcaV65
 =EHLg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull inifiniband/rdma updates from Doug Ledford:
 "This is a fairly sizeable set of changes.  I've put them through a
  decent amount of testing prior to sending the pull request due to
  that.

  There are still a few fixups that I know are coming, but I wanted to
  go ahead and get the big, sizable chunk into your hands sooner rather
  than waiting for those last few fixups.

  Of note is the fact that this creates what is intended to be a
  temporary area in the drivers/staging tree specifically for some
  cleanups and additions that are coming for the RDMA stack.  We
  deprecated two drivers (ipath and amso1100) and are waiting to hear
  back if we can deprecate another one (ehca).  We also put Intel's new
  hfi1 driver into this area because it needs to be refactored and a
  transfer library created out of the factored out code, and then it and
  the qib driver and the soft-roce driver should all be modified to use
  that library.

  I expect drivers/staging/rdma to be around for three or four kernel
  releases and then to go away as all of the work is completed and final
  deletions of deprecated drivers are done.

  Summary of changes for 4.3:

   - Create drivers/staging/rdma
   - Move amso1100 driver to staging/rdma and schedule for deletion
   - Move ipath driver to staging/rdma and schedule for deletion
   - Add hfi1 driver to staging/rdma and set TODO for move to regular
     tree
   - Initial support for namespaces to be used on RDMA devices
   - Add RoCE GID table handling to the RDMA core caching code
   - Infrastructure to support handling of devices with differing read
     and write scatter gather capabilities
   - Various iSER updates
   - Kill off unsafe usage of global mr registrations
   - Update SRP driver
   - Misc  mlx4 driver updates
   - Support for the mr_alloc verb
   - Support for a netlink interface between kernel and user space cache
     daemon to speed path record queries and route resolution
   - Ininitial support for safe hot removal of verbs devices"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits)
  IB/ipoib: Suppress warning for send only join failures
  IB/ipoib: Clean up send-only multicast joins
  IB/srp: Fix possible protection fault
  IB/core: Move SM class defines from ib_mad.h to ib_smi.h
  IB/core: Remove unnecessary defines from ib_mad.h
  IB/hfi1: Add PSM2 user space header to header_install
  IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY
  mlx5: Fix incorrect wc pkey_index assignment for GSI messages
  IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
  IB/uverbs: reject invalid or unknown opcodes
  IB/cxgb4: Fix if statement in pick_local_ip6adddrs
  IB/sa: Fix rdma netlink message flags
  IB/ucma: HW Device hot-removal support
  IB/mlx4_ib: Disassociate support
  IB/uverbs: Enable device removal when there are active user space applications
  IB/uverbs: Explicitly pass ib_dev to uverbs commands
  IB/uverbs: Fix race between ib_uverbs_open and remove_one
  IB/uverbs: Fix reference counting usage of event files
  IB/core: Make ib_dealloc_pd return void
  IB/srp: Create an insecure all physical rkey only if needed
  ...
2015-09-09 08:33:31 -07:00
Linus Torvalds 4e4adb2f46 NFS client updates for Linux 4.3
Highlights include:
 
 Stable patches:
 - Fix atomicity of pNFS commit list updates
 - Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY)
 - nfs_set_pgio_error sometimes misses errors
 - Fix a thinko in xs_connect()
 - Fix borkage in _same_data_server_addrs_locked()
 - Fix a NULL pointer dereference of migration recovery ops for v4.2 client
 - Don't let the ctime override attribute barriers.
 - Revert "NFSv4: Remove incorrect check in can_open_delegated()"
 - Ensure flexfiles pNFS driver updates the inode after write finishes
 - flexfiles must not pollute the attribute cache with attrbutes from the DS
 - Fix a protocol error in layoutreturn
 - Fix a protocol issue with NFSv4.1 CLOSE stateids
 
 Bugfixes + cleanups
 - pNFS blocks bugfixes from Christoph
 - Various cleanups from Anna
 - More fixes for delegation corner cases
 - Don't fsync twice for O_SYNC/IS_SYNC files
 - Fix pNFS and flexfiles layoutstats bugs
 - pnfs/flexfiles: avoid duplicate tracking of mirror data
 - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues.
 - pnfs/flexfiles: error handling retries a layoutget before fallback to MDS
 
 Features:
 - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong
 - More RDMA client transport improvements from Chuck
 - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs
   from the SUNRPC, Lustre and core infiniband tree.
 - Optimise away the close-to-open getattr if there is no cached data
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV7chgAAoJEGcL54qWCgDyqJQP/3kto9VXnXcatC382jF9Pfj5
 F55XeSnviOXH7CyiKA4nSBhnxg/sLuWOTpbkVI/4Y+VyWhLby9h+mtcKURHOlBnj
 d5BFoPwaBVDnUiKlHFQDkRjIyxjj2Sb6/uEb2V/u3v+3znR5AZZ4lzFx4cD85oaz
 mcru7yGiSxaQCIH6lHExcCEKXaDP5YdvS9YFsyQfv2976JSaQHM9ZG04E0v6MzTo
 E5wwC4CLMKmhuX9kmQMj85jzs1ASAKZ3N7b4cApTIo6F8DCDH0vKQphq/nEQC497
 ECjEs5/fpxtNJUpSBu0gT7G4LCiW3PzE7pHa+8bhbaAn9OzxIR5+qWujKsfGYQhO
 Oomp3K9zO6omshAc5w4MkknPpbImjoZjGAj/q/6DbtrDpnD7DzOTirwYY2yX0CA8
 qcL81uJUb8+j4jJj4RTO+lTUBItrM1XTqTSd/3eSMr5DDRVZj+ERZxh17TaxRBZL
 YrbrLHxCHrcbdSbPlovyvY+BwjJUUFJRcOxGQXLmNYR9u92fF59rb53bzVyzcRRO
 wBozzrNRCFL+fPgfNPLEapIb6VtExdM3rl2HYsJGckHj4DPQdnoB3ytIT9iEFZEN
 +/rE14XEZte7kuH3OP4el2UsP/hVsm7A49mtwrkdbd7rxMWD6XfQUp8DAggWUEtI
 1H6T7RC1Y6wsu0X1fnVz
 =knJA
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable patches:
   - Fix atomicity of pNFS commit list updates
   - Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY)
   - nfs_set_pgio_error sometimes misses errors
   - Fix a thinko in xs_connect()
   - Fix borkage in _same_data_server_addrs_locked()
   - Fix a NULL pointer dereference of migration recovery ops for v4.2
     client
   - Don't let the ctime override attribute barriers.
   - Revert "NFSv4: Remove incorrect check in can_open_delegated()"
   - Ensure flexfiles pNFS driver updates the inode after write finishes
   - flexfiles must not pollute the attribute cache with attrbutes from
     the DS
   - Fix a protocol error in layoutreturn
   - Fix a protocol issue with NFSv4.1 CLOSE stateids

  Bugfixes + cleanups
   - pNFS blocks bugfixes from Christoph
   - Various cleanups from Anna
   - More fixes for delegation corner cases
   - Don't fsync twice for O_SYNC/IS_SYNC files
   - Fix pNFS and flexfiles layoutstats bugs
   - pnfs/flexfiles: avoid duplicate tracking of mirror data
   - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation
     issues
   - pnfs/flexfiles: error handling retries a layoutget before fallback
     to MDS

  Features:
   - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from
     Kinglong
   - More RDMA client transport improvements from Chuck
   - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr()
     verbs from the SUNRPC, Lustre and core infiniband tree.
   - Optimise away the close-to-open getattr if there is no cached data"

* tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits)
  NFSv4: Respect the server imposed limit on how many changes we may cache
  NFSv4: Express delegation limit in units of pages
  Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files"
  NFS: Optimise away the close-to-open getattr if there is no cached data
  NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb
  NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error()
  nfs: Remove unneeded checking of the return value from scnprintf
  nfs: Fix truncated client owner id without proto type
  NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid
  NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid
  NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid()
  NFSv4.1/flexfiles: Fix freeing of mirrors
  NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file
  NFSv4.1/pnfs: Handle LAYOUTGET return values correctly
  NFSv4.1/pnfs: Don't ask for a read layout for an empty file.
  NFSv4.1: Fix a protocol issue with CLOSE stateids
  NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors
  SUNRPC: Prevent SYN+SYNACK+RST storms
  SUNRPC: xs_reset_transport must mark the connection as disconnected
  NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload
  ...
2015-09-07 14:02:24 -07:00
Jason Gunthorpe d1178cbcdc IB/ipoib: Suppress warning for send only join failures
We expect send only joins to fail, it just means there are no listeners
for the group. The correct thing to do is silently drop the packet
at source.

Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet
to 224.0.0.22, and then a warning level kmessage like this:

 ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22

If there is no IP router listening to IGMP.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 17:11:05 -04:00
Doug Ledford c3acdc06a9 IB/ipoib: Clean up send-only multicast joins
Even though we don't expect the group to be created by the SM we
sill need to provide all the parameters to force the SM to validate
they are correct.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 17:05:58 -04:00
Sagi Grimberg 7fbc67df2c IB/srp: Fix possible protection fault
srp_destroy_qp is designed to indicate we are safe to continue with
freeing the channel resources by modifying the qp error state,
posting a dummy wr on the queue-pair and waiting for it to flush.
This also holds for the channel registration pool as we are unmapping
the memory region when handling a scsi response. Destroying the
channel registration pool before we make sure we processed all the
inflight IO might introduce a use-after-free of the registration pool.

This use-after-free is demonstrated in the stack trace below where
srp is trying to unmap a used FMR after the fmr_pool was already destroyed.

general protection fault: 0000 [#1] SMP
RIP: 0010:[<ffffffff8151121b>]  [<ffffffff8151121b>] _raw_spin_lock_irqsave+0x1b/0x50
Call Trace:
 [<ffffffffa055d88a>] ib_fmr_pool_unmap+0x1a/0xb0 [ib_core]
 [<ffffffffa06c00ed>] srp_unmap_data.isra.28+0x17d/0x250 [ib_srp]
 [<ffffffffa06c01eb>] srp_free_req+0x2b/0x60 [ib_srp]
 [<ffffffffa06c0c94>] srp_recv_completion+0x174/0x580 [ib_srp]
 [<ffffffffa04580fe>] mlx4_eq_int+0x4de/0xe50 [mlx4_core]
 [<ffffffffa0458b00>] mlx4_msi_x_interrupt+0x10/0x20 [mlx4_core]
 [<ffffffff810abc45>] handle_irq_event_percpu+0x35/0x1b0
 [<ffffffff810abdf2>] handle_irq_event+0x32/0x50
 [<ffffffff810ae5cf>] handle_edge_irq+0x6f/0x120
 [<ffffffff8100455a>] handle_irq+0x1a/0x30
 [<ffffffff8151b475>] do_IRQ+0x45/0xb0
 [<ffffffff8151162d>] common_interrupt+0x6d/0x6d
 [<ffffffff813e4d2f>] cpuidle_enter_state+0x4f/0xc0
 [<ffffffff813e4e6c>] cpuidle_idle_call+0xcc/0x210
 [<ffffffff8100b9ea>] arch_cpu_idle+0xa/0x30
 [<ffffffff810ab1e1>] cpu_startup_entry+0xe1/0x270
 [<ffffffff81030b3a>] start_secondary+0x21a/0x2c0

Reported-by: Eliott Kespi <eliottk@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 15:59:48 -04:00
Ira Weiny 0629cb06cd IB/core: Move SM class defines from ib_mad.h to ib_smi.h
When the hfi1 driver was added these definitions were moved from the qib driver
to ib_mad.h to be used by both qib and hfi1.  They should have been moved to
ib_smi.h instead.

Fixes: d4ab347005 ("IB/core: Add core header changes needed for OPA")
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 15:50:32 -04:00
Sagi Grimberg b636401f0e mlx5: Fix incorrect wc pkey_index assignment for GSI messages
Since patch series "Demux IB CM requests in the rdma_cm module" the
P_Key index is taken from the work completion rather than the message
itself.

The HCA provides us with the message P_Key. In order to provide the
P_Key index, we need to look it up. Given that this is relevant only
for GSI messages (session establishments) which is less performance critical,
micro-optimize against the GSI (is_qp1) branch.

Fixes: 4c21b5bcef ("IB/cma: Add net_dev and private data checks to
RDMA CM")
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 14:50:06 -04:00
Haggai Eran 11d748045c IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
The mlx5_ib_reg_user_mr() function will attempt to call clean_mr() in
its error flow even though there is never a case where the error flow
occurs with a valid MR pointer to destroy.

Remove the clean_mr() call and the incorrect comment above it.

Fixes: b4cfe447d4 ("IB/mlx5: Implement on demand paging by adding
support for MMU notifiers")
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 14:42:54 -04:00
Christoph Hellwig b632ffa7ce IB/uverbs: reject invalid or unknown opcodes
We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure.  Reject all those not explicitly handled so that we
can't pass invalid information to drivers.

Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 14:25:24 -04:00
Nicholas Krause 54b9a96f10 IB/cxgb4: Fix if statement in pick_local_ip6adddrs
This fixes an if statement checking the return value of the function
get_lladdr for success in the function pick_local_ip6addrs to instead
of directly checking the return value of this call check the opposite
as get_lladdr returns zero for success which would incorrectly make
this if statement block not execute with the current if statement
check.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03 14:03:59 -04:00
Kaike Wan ba13b5f8f8 IB/sa: Fix rdma netlink message flags
The flags to ibnl_put_msg should be NLM_F_REQUEST instead of GFP_KERNEL.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-02 13:58:54 -04:00
Yishai Hadas e1c30298cc IB/ucma: HW Device hot-removal support
Currently, IB/cma remove_one flow blocks until all user descriptor managed by
IB/ucma are released. This prevents hot-removal of IB devices. This patch
allows IB/cma to remove devices regardless of user space activity. Upon getting
the RDMA_CM_EVENT_DEVICE_REMOVAL event we close all the underlying HW resources
for the given ucontext. The ucontext itself is still alive till its explicit
destroying by its creator.

Running applications at that time will have some zombie device, further
operations may fail.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:41 -04:00
Yishai Hadas ae184ddeca IB/mlx4_ib: Disassociate support
Implements the IB core disassociate_ucontext API. The driver detaches the HW
resources for a given user context to prevent a dependency between application
termination and device disconnecting. This is done by managing the VMAs that
were mapped to the HW bars such as door bell and blueflame. When need to detach
remap them to an arbitrary kernel page returned by the zap API.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:40 -04:00
Yishai Hadas 036b106357 IB/uverbs: Enable device removal when there are active user space applications
Enables the uverbs_remove_one to succeed despite the fact that there are
running IB applications working with the given ib device.  This
functionality enables a HW device to be unbind/reset despite the fact that
there are running user space applications using it.

It exposes a new IB kernel API named 'disassociate_ucontext' which lets
a driver detaching its HW resources from a given user context without
crashing/terminating the application. In case a driver implemented the
above API and registered with ib_uverb there will be no dependency between its
device to its uverbs_device. Upon calling remove_one of ib_uverbs the call
should return after disassociating the open HW resources without waiting to
clients disconnecting. In case driver didn't implement this API there will be no
change to current behaviour and uverbs_remove_one will return only when last
client has disconnected and reference count on uverbs device became 0.

In case the lower driver device was removed any application will
continue working over some zombie HCA, further calls will ended with an
immediate error.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:40 -04:00
Yishai Hadas 057aec0d23 IB/uverbs: Explicitly pass ib_dev to uverbs commands
Done in preparation for deploying RCU for the device removal
flow. Allows isolating the RCU handling to the uverb_main layer and
keeping the uverbs_cmd code as is.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:40 -04:00
Yishai Hadas 35d4a0b63d IB/uverbs: Fix race between ib_uverbs_open and remove_one
Fixes: 2a72f21226 ("IB/uverbs: Remove dev_table")

Before this commit there was a device look-up table that was protected
by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
it was dropped and container_of was used instead, it enabled the race
with remove_one as dev might be freed just after:
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
before the kref_get.

In addition, this buggy patch added some dead code as
container_of(x,y,z) can never be NULL and so dev can never be NULL.
As a result the comment above ib_uverbs_open saying "the open method
will either immediately run -ENXIO" is wrong as it can never happen.

The solution follows Jason Gunthorpe suggestion from below URL:
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html

cdev will hold a kref on the parent (the containing structure,
ib_uverbs_device) and only when that kref is released it is
guaranteed that open will never be called again.

In addition, fixes the active count scheme to use an atomic
not a kref to prevent WARN_ON as pointed by above comment
from Jason.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:40 -04:00
Yishai Hadas 03c40442a0 IB/uverbs: Fix reference counting usage of event files
Fix the reference counting usage to be handled in the event file
creation/destruction function, instead of being done by the caller.
This is done for both async/non-async event files.

Based on Jason Gunthorpe report at https://www.mail-archive.com/
linux-rdma@vger.kernel.org/msg24680.html:
"The existing code for this is broken, in ib_uverbs_get_context all
the error paths between ib_uverbs_alloc_event_file and the
kref_get(file->ref) are wrong - this will result in fput() which will
call ib_uverbs_event_close, which will try to do kref_put and
ib_unregister_event_handler - which are no longer paired."

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:39 -04:00
Jason Gunthorpe 7dd78647a2 IB/core: Make ib_dealloc_pd return void
The majority of callers never check the return value, and even if they
did, they can't do anything about a failure.

All possible failure cases represent a bug in the caller, so just
WARN_ON inside the function instead.

This fixes a few random errors:
 net/rd/iw.c infinite loops while it fails. (racing with EBUSY?)

This also lays the ground work to get rid of error return from the
drivers. Most drivers do not error, the few that do are broken since
it cannot be handled.

Since uverbs can legitimately make use of EBUSY, open code the
check.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:39 -04:00
Bart Van Assche 03f6fb93fd IB/srp: Create an insecure all physical rkey only if needed
The SRP initiator only needs this if the insecure register_always=N
performance optimization is enabled, or if FRWR/FMR is not supported
in the driver.

Do not create an all physical MR unless it is needed to support
either of those modes. Default register_always to true so the out of
the box configuration does not create an insecure all physical MR.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[bvanassche: reworked and rebased this patch]
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:39 -04:00
Bart Van Assche 330179f2fa IB/srp: Register the indirect data buffer descriptor
Instead of always using the global rkey for the indirect data
buffer descriptor, register that descriptor with the HCA if
the kernel module parameter register_always has been set to Y.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:38 -04:00
Bart Van Assche 002f15674c IB/srp: Introduce srp_device.use_fmr
Introduce the variable srp_device.use_fmr. Leave out the dev->has_fr /
dev->has_fmr and ch->fr_pool / ch->fmr_pool checks since these are
redundant. This patch does not change any functionality but makes the
source code easier to read.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:38 -04:00
Bart Van Assche 3ae95da883 IB/srp: Remove use_mr argument from srp_map_sg_entry()
Move the srp_map_desc() call from inside srp_map_sg_entry() to
srp_map_sg() such that the use_mr argument can be removed from
srp_map_sg_entry().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:38 -04:00
Bart Van Assche 0e0d3a4800 IB/srp: Remove the memory registration backtracking code
Mapping a discontiguous sg-list requires multiple memory regions
and hence can exhaust the memory region pool. The SRP initiator
already handles this by temporarily reducing the queue depth. This
means that it is safe to remove the memory registration backtracking
code. This patch has been tested with direct I/O sizes up to 256 MB.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:37 -04:00
Bart Van Assche f731ed6293 IB/srp: Add memory descriptor array pointer range checking
Although most paths through which a request is submitted check
block layer parameters like the max_segments limit, these are
not checked when an SG_IO or direct I/O request is submitted.
Hence add a range check for the memory descriptor array pointer.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:37 -04:00
Bart Van Assche 7e85c91970 IB/srp: Use multiple registrations for large memory regions
Instead of using the global rkey for large memory regions, use
multiple registrations. See also the while (dma_len) loop further
down in srp_map_sg_entry().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:37 -04:00
Bart Van Assche 186fbc6689 IB/srp: Re-enable FMR for non-page aligned buffers
During a discussion in 2011 nobody recalled why FMR was not used for
non-page aligned buffers (see also
http://thread.gmane.org/gmane.linux.drivers.rdma/7149). Re-enable FMR
for such buffers. For the reason why the srp_map_fmr() function needs
to be modified, see also patch "IB/srp: rework mapping engine to use
multiple FMR entries" (commit ID 8f26c9ff9cd0; January 2011).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:36 -04:00
Jason Gunthorpe 5a783956c2 ib_srpt: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:35 -04:00
Jason Gunthorpe e6bf5f48d2 IB/srp: Use pd->local_dma_lkey
Replace all leys with  pd->local_dma_lkey. This driver does not support
iWarp, so this is safe.

The insecure use of ib_get_dma_mr is thus isolated to an rkey, and will
have to be fixed separately.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:35 -04:00
Jason Gunthorpe 34efc7dfbd iser-target: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:35 -04:00
Jason Gunthorpe 256b7ad273 IB/iser: Use pd->local_dma_lkey
Replace all leys with  pd->local_dma_lkey. This driver does not support
iWarp, so this is safe.

The insecure use of ib_get_dma_mr is thus isolated to an rkey, and this
looks trivially fixed by forcing the use of registration in a future
patch.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:34 -04:00
Jason Gunthorpe b37c788f59 IB/mlx5: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:34 -04:00
Jason Gunthorpe 7dd9757628 IB/mlx4: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:34 -04:00
Jason Gunthorpe 77b1f99660 IB/ipoib: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:34 -04:00
Jason Gunthorpe 4be90bc60d IB/mad: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:33 -04:00
Jason Gunthorpe 96249d70dd IB/core: Guarantee that a local_dma_lkey is available
Every single ULP requires a local_dma_lkey to do anything with
a QP, so let us ensure one exists for every PD created.

If the driver can supply a global local_dma_lkey then use that, otherwise
ask the driver to create a local use all physical memory MR associated
with the new PD.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:33 -04:00
Sagi Grimberg 7332bed085 IB/iser: Chain all iser transaction send work requests
Chaning of send work requests benefits performance by
reducing the send queue lock contention (acquired in
ib_post_send) and saves us HW doorbells which is posted
only once.

Currently, in normal IO flows iser does not chain the CDB send
work request with the registration work request. Also in PI
flows, signature work requests are not chained as well.

Lets chain those and post only once.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:33 -04:00
Sagi Grimberg 1b16c9894b IB/iser: Add debug prints to the various memory registration methods
Easier to debug when we have the registration details.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:32 -04:00
Sagi Grimberg df749cdc45 IB/iser: Support up to 8MB data transfer in a single command
iser support up to 512KB data transfer in a single scsi command.
This means that larger IOs will split to different request. While
iser can easily saturate FDR/EDR wires, some arrays are fine tuned
for 1MB (or larger) IO sizes, hence add an option to support larger
transfers (up to 8MB) if the device allows it.

Given that a few target implementations don't support data transfers
of more than 512KB by default and the fact that larger IO sizes require
more resources, we introduce a module parameter to determine the
maximum number of 512B sectors in a single scsi command.
Users that are interested in larger transfers can change this value given
that the target supports larger transfers.

At the moment, iser works in 4K pages granularity, In a later stage
we will get it to work with system page size instead.

IO operations that consists of N pages will need a page vector
of size N+1 in case the first SG element contains an offset. Given
that some devices allocates memory regions in powers of 2, this
means that allocating a region with N+1 pages, will result in
region resources allocation of the next power of 2. Since we don't
want that to happen, in case we are in the limit of IO size supported
and the first SG element has an offset, we align the SG list using a
bounce buffer (which is OK given that this is not likely to happen a lot).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:32 -04:00
Sagi Grimberg f8db651da2 IB/iser: Pass registration pool a size parameter
Hard coded for now. This will allow to allocate different
sized MRs depending on the IO size needed (and device
capabilities).

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:32 -04:00
Sagi Grimberg 32467c420b IB/iser: Unify fast memory registration flows
iser_reg_rdma_mem_[fastreg|fmr] share a lot of code, and
logically do the same thing other than the buffer registration
method itself (iser_fast_reg_mr vs. iser_fast_reg_fmr).
The DIF logic is not implemented in the FMR flow as there is no
existing device that supports FMRs and Signature feature.

This patch unifies the flow in a single routine iser_reg_rdma_mem
and just split to fmr/frwr for the buffer registration itself.

Also, for symmetry reasons, unify iser_unreg_rdma_mem (which will
call the relevant device specific unreg routine).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:31 -04:00
Sagi Grimberg 81722909c8 IB/iser: Make reg_desc_get a per device routine
As for fmrs we will hold a single registration descriptor
as no need for multiple like in the frwr mode (descriptor
for each task). This change helps unifying the duplicate
registration code paths.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:31 -04:00
Sagi Grimberg 7d0483c927 IB/iser: Rename iser_reg_page_vec to iser_fast_reg_fmr
Also, change a name of a local variable.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:31 -04:00
Adir Lev 2b3bf95810 IB/iser: Maintain connection fmr_pool under a single registration descriptor
This will allow us to unify the memory registration code path between
the various methods which vary by the device capabilities. This change
will make it easier and less intrusive to remove fmr_pools from the
code when we'd want to.

The reason we use a single descriptor is to avoid taking a
redundant spinlock when working with FMRs.

We also change the signature of iser_reg_page_vec to make it match
iser_fast_reg_mr (and the future indirect registration method).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:30 -04:00
Sagi Grimberg 385ad87d4b IB/iser: Introduce iser registration pool struct
Instead of having it a part of the connection structure,
have it be under a dedicated (embedded) structure in the
connection. A logical separation of the registration pool
and the connection structure.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:30 -04:00
Sagi Grimberg eb6ea8c36c IB/iser: Move fastreg descriptor allocation to iser_create_fastreg_desc
Don't have the caller allocate the structure and worry about
freeing it in case the routine failed.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:30 -04:00
Sagi Grimberg 48afbff673 IB/iser: Introduce iser_reg_ops
Move all the per-device function pointers to an easy
extensible iser_reg_ops structure that contains all
the iser registration operations.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:29 -04:00
Sagi Grimberg 8c18ed03a9 IB/iser: Remove dead code in fmr_pool alloc/free
In the past the we always tried to allocate an fmr_pool
and if it failed on ENOSYS (not supported) then we continued
with dma mr. This is not the case anymore and if we tried to
allocate an fmr_pool then it is supported and we expect to succeed.

Also, the check if fmr_pool is allocated when free is called is
redundant as well as we are guaranteed it exists.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:29 -04:00
Sagi Grimberg 5190cc2664 IB/iser: Rename struct fast_reg_descriptor -> iser_fr_desc
Avoid struct names without iser_ prefix.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:29 -04:00
Sagi Grimberg d711d81d64 IB/iser: Introduce struct iser_reg_resources
Have fast_reg_descriptor hold struct iser_reg_resources
(mr, frpl, valid flag). This will be useful when the
actual buffer registration routines will be passed with
the needed registration resources (i.e. iser_reg_resources)
without being aware of their nature (i.e. data or protection).

In order to achieve this, we remove reg_indicators flags container
and place specific flags (mr_valid) within iser_reg_resources struct.
We also place the sig_mr_valid and sig_protcted flags in iser_pi_context.

This patch also modifies iser_fast_reg_mr to receive the
reg_resources instead of the fast_reg_descriptor and a data/protection
indicator.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:28 -04:00
Sagi Grimberg ea18f5d777 IB/iser: Remove an unneeded print for unaligned memory
We can do it in iser_aligned_data_len instead and
it will save us an argument that is passed to
fall_to_counce_buf just for the print.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:28 -04:00
Sagi Grimberg b9abd8d21d IB/iser: Remove a redundant always-false condition
We always call iser_initialize_task_headers() and set
the header tx_sg.lkey to the device mr lkey, so no
point in checking it in iser_create_send_desc().

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:28 -04:00
Sagi Grimberg 8d5944d803 IB/iser: Fix possible bogus DMA unmapping
If iser_initialize_task_headers() routine failed before
dma mapping, we should not attempt to unmap in cleanup_task().

Fixes: 7414dde0a6 (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:28 -04:00
Sagi Grimberg 02816a8b88 IB/iser: Get rid of un-maintained counters
We don't update those anywhere in the code and they
seem pretty useless (no one seem to care about those).

qp_tx_queue_full: We never should get this
fmr_map_not_avail: We can never get to this
eh_abort_cnt: We don't monitor aborts

Go ahead and remove them.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:27 -04:00
Sagi Grimberg d16739055b IB/iser: Fix missing return status check in iser_send_data_out
Since commit "IB/iser: Fix race between iser connection teardown..."
iser_initialize_task_headers() might fail, so we need to check that.

Fixes: 7414dde0a6 (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:27 -04:00
Sagi Grimberg 1156cc80f8 IB/iser: Remove '.' from log message
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:27 -04:00
Sagi Grimberg 74ce897b7c IB/iser: Change minor assignments and logging prints
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:26 -04:00
Jenny Falkovich db0a6cbd21 IB/iser: Change some module parameters to be RO
While we're at it, use permission defines instead
of octal values and rearrange a little bit.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:26 -04:00
Kaike Wan 2ca546b92a IB/sa: Route SA pathrecord query through netlink
This patch routes a SA pathrecord query to netlink first and processes the
response appropriately. If a failure is returned, the request will be sent
through IB. The decision whether to route the request to netlink first is
determined by the presence of a listener for the local service netlink
multicast group. If the user-space local service netlink multicast group
listener is not present, the request will be sent through IB, just like
what is currently being done.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:26 -04:00
Kaike Wan 5d2657708e IB/sa: Allocate SA query with kzalloc
Replace kmalloc with kzalloc so that all uninitialized fields in SA query
will be zero-ed out to avoid unintentional consequence. This prepares the
SA query structure to accept new fields in the future.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:25 -04:00
Kaike Wan bc10ed7d3d IB/core: Add rdma netlink helper functions
This patch adds a function to check if listeners for a netlink multicast
group are present. It also adds a function to receive netlink response
messages.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:25 -04:00
Bart Van Assche bc44bd1d86 IB/srp: Stop the scsi_eh_<n> and scsi_tmf_<n> threads if login fails
scsi_host_alloc() not only allocates memory for a SCSI host but also
creates the scsi_eh_<n> kernel thread and the scsi_tmf_<n> workqueue.
Stop these threads if login fails by calling scsi_host_put().

Reported-by: Konstantin Krotov <kkv@clodo.ru>
Fixes: fb49c8bbaa ("Remove an extraneous scsi_host_put() from an error path")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.19
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:24 -04:00
Bart Van Assche 713ef24e41 IB/srp: Bump driver version and release date
Since version 1.0 e.g. scsi-mq has been added. Since this is
a significant change, bump the driver version and release date.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:24 -04:00
Bart Van Assche c257ea6f9f IB/srp: Handle partial connection success correctly
Avoid that the following kernel warning is reported if the SRP
target system accepts fewer channels per connection than what
was requested by the initiator system:

WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xb1/0x120 [ib_srp]()
Call Trace:
[<ffffffff8105d67f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8105d6da>] warn_slowpath_null+0x1a/0x20
[<ffffffffa05419e1>] srp_destroy_qp+0xb1/0x120 [ib_srp]
[<ffffffffa05445fb>] srp_create_ch_ib+0x19b/0x420 [ib_srp]
[<ffffffffa0545257>] srp_create_target+0x7d7/0xa94 [ib_srp]
[<ffffffff8138dac0>] dev_attr_store+0x20/0x30
[<ffffffff812079ef>] sysfs_write_file+0xef/0x170
[<ffffffff81191fc4>] vfs_write+0xb4/0x130
[<ffffffff8119276f>] sys_write+0x5f/0xa0
[<ffffffff815a0a59>] system_call_fastpath+0x16/0x1b

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:24 -04:00
Bart Van Assche e6300cbd9b IB/srp: Constify a function argument
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:23 -04:00
Ariel Nahum 799cdaf8a9 IB/mlx4: Fix incorrect cq flushing in error state
When handling a device internal error, the driver is responsible to
drain the completion queue with flush errors.

In case a completion queue was assigned to multiple send queues, the
driver iterates over the send queues and generates flush errors of
inflight wqes. The driver must correctly pass the wc array with an
offset as a result of the previous send queue iteration. Not doing so
will overwrite previously set completions and return a wrong number
of polled completions which includes ones which were not correctly set.

Fixes: 35f05dabf9 (IB/mlx4: Reset flow support for IB kernel ULPs)
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:23 -04:00
Noa Osherovich 5e99b139f1 IB/mlx4: Use correct SL on AH query under RoCE
The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.

Fixes: fa417f7b52 ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:23 -04:00
Jack Morgenstein 2b135db3e8 IB/mlx4: Forbid using sysfs to change RoCE pkeys
The pkey mapping for RoCE must remain the default mapping:
VFs:
  virtual index 0 = mapped to real index 0 (0xFFFF)
  All others indices: mapped to a real pkey index containing an
                      invalid pkey.
PF:
  virtual index i = real index i.

Don't allow users to change these mappings using files found in
sysfs.

Fixes: c1e7e46612 ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:22 -04:00
Jack Morgenstein 2cb8e7f86e IB/mlx4: Demote mcg message from warning to debug
The mcg "too many pending requests" warning message fills the log
when OpenSM is downed. Demote the message from  warning level to
debug level.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:22 -04:00
Jack Morgenstein 90c1d8b635 IB/mlx4: Fix potential deadlock when sending mad to wire
send_mad_to_wire takes the same spinlock that is taken in
the interrupt context.  Therefore, it needs irqsave/restore.

Fixes: b9c5d6a643 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:22 -04:00
Doug Ledford b8071ad893 IB/core: Remove needless bracketization
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:21 -04:00
Somnath Kotur cc36929e73 RDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/Core
1.Change query_gid hook to return value from IB/Core GID
  management APIs.
2.Get rid of all the netdev notifier chain subscription code as well
  as maintenance of SGID Table in memory.
3.Implement get_netdev hook in driver.

Signed-off-by: Somnath Kotur <somnath.kotur@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:21 -04:00
Moni Shoua 5070cd2239 IB/mlx4: Replace mechanism for RoCE GID management
Manage RoCE gid table with logic in IB/core, which is common to all
vendors, and remove the mechanism from the mlx4 IB driver.
Since management of the GID cache may lead to index mismatch with the
hardware GID table, a translation between indexes is required when
modifying a QP or creating an address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:21 -04:00
Moni Shoua e26be1bfef IB/mlx4: Implement ib_device callbacks
get_netdev: get the net_device on the physical port of the IB transport port. In
port aggregation mode it is required to return the netdev of the active port.

modify_gid: note for a change in the RoCE gid cache. Handle this by writing to
the harsware GID table. It is possible that indexes in cahce and hardware tables
won't match so a translation is required when modifying a QP or creating an
address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:20 -04:00
Matan Barak 238fdf48f2 IB/core: Add RoCE table bonding support
Handling bonding and other devices require us to all all GIDs of the
net-devices which are upper-devices of the RoCE port related
net-device.

Active-backup configurations imposes even more challenges as the
default GID should only be set on the active devices (this is
necessary as otherwise the same MAC could be used for several
slaves and thus several slaves will have identical GIDs).

Managing these configurations are done by listening to:
(a) NETDEV_CHANGEUPPER event
	(1) if a related net-device is linked, delete all inactive
	    slaves default GIDs and add the upper device GIDs.
	(2) if a related net-device is unlinked, delete all upper GIDs
	    and add the default GIDs.
(b) NETDEV_BONDING_FAILOVER:
	(1) delete the bond GIDs from inactive slaves
	(2) delete the inactive slave's default GIDs
	(3) Add the bond GIDs to the active slave.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:20 -04:00
Dan Carpenter 98d25afa97 IB/core: missing curly braces in ib_find_gid()
Smatch says that, based on the indenting, we should probably add curly
braces here.

Fixes: 03db3a2d81 ('IB/core: Add RoCE GID table management')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:12:09 -04:00
Matan Barak 03db3a2d81 IB/core: Add RoCE GID table management
RoCE GIDs are based on IP addresses configured on Ethernet net-devices
which relate to the RDMA (RoCE) device port.

Currently, each of the low-level drivers that support RoCE (ocrdma,
mlx4) manages its own RoCE port GID table. As there's nothing which is
essentially vendor specific, we generalize that, and enhance the RDMA
core GID cache to do this job.

In order to populate the GID table, we listen for events:

(a) netdev up/down/change_addr events - if a netdev is built onto
    our RoCE device, we need to add/delete its IPs. This involves
    adding all GIDs related to this ndev, add default GIDs, etc.

(b) inet events - add new GIDs (according to the IP addresses)
    to the table.

For programming the port RoCE GID table, providers must implement
the add_gid and del_gid callbacks.

RoCE GID management requires us to state the associated net_device
alongside the GID. This information is necessary in order to manage
the GID table. For example, when a net_device is removed, its
associated GIDs need to be removed as well.

RoCE mandates generating a default GID for each port, based on the
related net-device's IPv6 link local. In contrast to the GID based on
the regular IPv6 link-local (as we generate GID per IP address),
the default GID is also available when the net device is down (in
order to support loopback).

Locking is done as follows:
The patch modify the GID table code both for new RoCE drivers
implementing the add_gid/del_gid callbacks and for current RoCE and
IB drivers that do not. The flows for updating the table are
different, so the locking requirements are too.

While updating RoCE GID table, protection against multiple writers is
achieved via mutex_lock(&table->lock). Since writing to a table
requires us to find an entry (possible a free entry) in the table and
then modify it, this mutex protects both the find_gid and write_gid
ensuring the atomicity of the action.
Each entry in the GID cache is protected by rwlock. In RoCE, writing
(usually results from netdev notifier) involves invoking the vendor's
add_gid and del_gid callbacks, which could sleep.
Therefore, an invalid flag is added for each entry. Updates for RoCE are
done via a workqueue, thus sleeping is permitted.

In IB, updates are done in write_lock_irq(&device->cache.lock), thus
write_gid isn't allowed to sleep and add_gid/del_gid are not called.

When passing net-device into/out-of the GID cache, the device
is always passed held (dev_hold).

The code uses a single work item for updating all RDMA devices,
following a netdev or inet notifier.

The patch moves the cache from being a client (which was incorrect,
as the cache is part of the IB infrastructure) to being explicitly
initialized/freed when a device is registered/removed.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:50 -04:00