Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
/Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
fgN5n987wru91ewdX4gW
=PAcy
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes"
This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.
That said, I suspect the sequence in question should probably use
"mod_delayed_work()". I just did the minimal "don't use deprecated
functions" fixup, though.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
IB/qib: Fix local access validation for user MRs
mlx4_core: Disable SENSE_PORT for multifunction devices
mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
IB/srp: Avoid having aborted requests hang
IB/srp: Fix use-after-free in srp_reset_req()
IB/qib: Add a qib driver version
RDMA/nes: Fix compilation error when nes_debug is enabled
RDMA/nes: Print hardware resource type
RDMA/nes: Fix for crash when TX checksum offload is off
RDMA/nes: Cosmetic changes
RDMA/nes: Fix for incorrect MSS when TSO is on
RDMA/nes: Fix incorrect resolving of the loopback MAC address
mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
mlx4_core: Trivial cleanups to driver log messages
mlx4_core: Trivial readability fix: "0X30" -> "0x30"
IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
mlx4: Paravirtualize Node Guids for slaves
mlx4: Activate SR-IOV mode for IB
...
Pull networking changes from David Miller:
1) GRE now works over ipv6, from Dmitry Kozlov.
2) Make SCTP more network namespace aware, from Eric Biederman.
3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.
4) Make openvswitch network namespace aware, from Pravin B Shelar.
5) IPV6 NAT implementation, from Patrick McHardy.
6) Server side support for TCP Fast Open, from Jerry Chu and others.
7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
Borkmann.
8) Increate the loopback default MTU to 64K, from Eric Dumazet.
9) Use a per-task rather than per-socket page fragment allocator for
outgoing networking traffic. This benefits processes that have very
many mostly idle sockets, which is quite common.
From Eric Dumazet.
10) Use up to 32K for page fragment allocations, with fallbacks to
smaller sizes when higher order page allocations fail. Benefits are
a) less segments for driver to process b) less calls to page
allocator c) less waste of space.
From Eric Dumazet.
11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.
12) VXLAN device driver, one way to handle VLAN issues such as the
limitation of 4096 VLAN IDs yet still have some level of isolation.
From Stephen Hemminger.
13) As usual there is a large boatload of driver changes, with the scale
perhaps tilted towards the wireless side this time around.
Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
hyperv: Add buffer for extended info after the RNDIS response message.
hyperv: Report actual status in receive completion packet
hyperv: Remove extra allocated space for recv_pkt_list elements
hyperv: Fix page buffer handling in rndis_filter_send_request()
hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
hyperv: Fix the max_xfer_size in RNDIS initialization
vxlan: put UDP socket in correct namespace
vxlan: Depend on CONFIG_INET
sfc: Fix the reported priorities of different filter types
sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
sfc: Fix loopback self-test with separate_tx_channels=1
sfc: Fix MCDI structure field lookup
sfc: Add parentheses around use of bitfield macro arguments
sfc: Fix null function pointer in efx_sriov_channel_type
vxlan: virtual extensible lan
igmp: export symbol ip_mc_leave_group
netlink: add attributes to fdb interface
tg3: unconditionally select HWMON support when tg3 is enabled.
Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
gre: fix sparse warning
...
Pull workqueue changes from Tejun Heo:
"This is workqueue updates for v3.7-rc1. A lot of activities this
round including considerable API and behavior cleanups.
* delayed_work combines a timer and a work item. The handling of the
timer part has always been a bit clunky leading to confusing
cancelation API with weird corner-case behaviors. delayed_work is
updated to use new IRQ safe timer and cancelation now works as
expected.
* Another deficiency of delayed_work was lack of the counterpart of
mod_timer() which led to cancel+queue combinations or open-coded
timer+work usages. mod_delayed_work[_on]() are added.
These two delayed_work changes make delayed_work provide interface
and behave like timer which is executed with process context.
* A work item could be executed concurrently on multiple CPUs, which
is rather unintuitive and made flush_work() behavior confusing and
half-broken under certain circumstances. This problem doesn't
exist for non-reentrant workqueues. While non-reentrancy check
isn't free, the overhead is incurred only when a work item bounces
across different CPUs and even in simulated pathological scenario
the overhead isn't too high.
All workqueues are made non-reentrant. This removes the
distinction between flush_[delayed_]work() and
flush_[delayed_]_work_sync(). The former is now as strong as the
latter and the specified work item is guaranteed to have finished
execution of any previous queueing on return.
* In addition to the various bug fixes, Lai redid and simplified CPU
hotplug handling significantly.
* Joonsoo introduced system_highpri_wq and used it during CPU
hotplug.
There are two merge commits - one to pull in IRQ safe timer from
tip/timers/core and the other to pull in CPU hotplug fixes from
wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."
Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.
Tejun pointed out a few of them, I fixed a couple more.
* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
workqueue: remove @delayed from cwq_dec_nr_in_flight()
workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
workqueue: use __cpuinit instead of __devinit for cpu callbacks
workqueue: rename manager_mutex to assoc_mutex
workqueue: WORKER_REBIND is no longer necessary for idle rebinding
workqueue: WORKER_REBIND is no longer necessary for busy rebinding
workqueue: reimplement idle worker rebinding
workqueue: deprecate __cancel_delayed_work()
workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
workqueue: use mod_delayed_work() instead of __cancel + queue
workqueue: use irqsafe timer for delayed_work
workqueue: clean up delayed_work initializers and add missing one
workqueue: make deferrable delayed_work initializer names consistent
workqueue: cosmetic whitespace updates for macro definitions
workqueue: deprecate system_nrt[_freezable]_wq
workqueue: deprecate flush[_delayed]_work_sync()
...
When P_Key tables potentially contain both full and partial membership
copies for the same P_Key, we need a function to find the index for an
exact (16-bit) P_Key.
This is necessary when the master forwards QP1 MADs sent by guests.
If the guest has sent the MAD with a limited membership P_Key, we need
to to forward the MAD using the same limited membership P_Key. Since
the master may have both the limited and the full member P_Keys in its
table, we must make sure to retrieve the limited membership P_Key in
this case.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Extend the cached and non-cached P_Key table lookups to handle limited
and full membership of the same P_Key to co-exist in the P_Key table.
This is necessary for SR-IOV, to allow for some guests would to have
the full membership P_Key in their virtual P_Key table, while other
guests on the same physical HCA would have the limited one.
To support this, we need both the limited and full membership P_Keys
to be present in the master's (hypervisor physical port) P_Key table.
The algorithm for handling P_Key tables which contain both the limited
and the full membership versions of the same P_Key works as follows:
When scanning the P_Key table for a 15-bit P_Key:
A. If there is a full member version of that P_Key anywhere in the
table, return its index (even if a limited-member version of the
P_Key exists earlier in the table).
B. If the full member version is not in the table, but the
limited-member version is in the table, return the index of the
limited P_Key.
Signed-off-by: Liran Liss <liranl@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
CMA multicast joins for the IPoIB port space need to use the same
component mask used by the ipoib driver. Otherwise, it's possible for
the CMA to create a group to which a join made by ipoib will fail, or
vise-versa. Some of the component mask fields set by ipoib weren't
set by the CMA, fix that.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).
Suggested by David S. Miller.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that cancel_delayed_work() can be safely called from IRQ handlers,
there's no reason to use __cancel_delayed_work(). Use
cancel_delayed_work() instead of __cancel_delayed_work() and mark the
latter deprecated.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Roland Dreier <roland@kernel.org>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Now that mod_delayed_work() is safe to call from IRQ handlers,
__cancel_delayed_work() followed by queue_delayed_work() can be
replaced with mod_delayed_work().
Most conversions are straight-forward except for the following.
* net/core/link_watch.c: linkwatch_schedule_work() was doing a quite
elaborate dancing around its delayed_work. Collapse it such that
linkwatch_work is queued for immediate execution if LW_URGENT and
existing timer is kept otherwise.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Convert delayed_work users doing cancel_delayed_work() followed by
queue_delayed_work() to mod_delayed_work().
Most conversions are straight-forward. Ones worth mentioning are,
* drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
use mod_delayed_work() and cancel loop in
edac_mc_reset_delay_period() is dropped.
* drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
watchdog is active or not. @fan_watchdog_active and related code
dropped.
* drivers/power/charger-manager.c: Seemingly a lot of
delayed_work_pending() abuse going on here.
[delayed_]work_pending() are unsynchronized and racy when used like
this. I converted one instance in fullbatt_handler(). Please
conver the rest so that it invokes workqueue APIs for the intended
target state rather than trying to game work item pending state
transitions. e.g. if timer should be modified - call
mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().
* drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
simplified. Note that round_jiffies() calls in this function are
meaningless. round_jiffies() work on absolute jiffies not delta
delay used by delayed_work.
v2: Tomi pointed out that __cancel_delayed_work() users can't be
safely converted to mod_delayed_work(). They could be calling it
from irq context and if that happens while delayed_work_timer_fn()
is running, it could deadlock. __cancel_delayed_work() users are
dropped.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Dreier <roland@kernel.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
It is possible for asynchronous RDMA_CM_EVENT_ESTABLISHED events to be
generated with ctx->uid == 0, because ucma_set_event_context() copies
ctx->uid to the event structure outside of ctx->file->mut. This leads
to a crash in the userspace library, since it gets a bogus event.
Fix this by taking the mutex a bit earlier in ucma_event_handler.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Sean Hefty <Sean.Hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- Updates to the qib low-level driver
- First chunk of changes for SR-IOV support for mlx4 IB
- RDMA CM support for IPv6-only binding
- Other misc cleanups and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
+JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
I7h9iJ1OMKFZ8bzmHsWb
=f09j
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- Updates to the qib low-level driver
- First chunk of changes for SR-IOV support for mlx4 IB
- RDMA CM support for IPv6-only binding
- Other misc cleanups and fixes
Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c
* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
IB/qib: checkpatch fixes
IB/qib: Add congestion control agent implementation
IB/qib: Reduce sdma_lock contention
IB/qib: Fix an incorrect log message
IB/qib: Fix QP RCU sparse warnings
mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
mlx4_core: Allow guests to have IB ports
mlx4_core: Implement mechanism for reserved Q_Keys
net/mlx4_core: Free ICM table in case of error
IB/cm: Destroy idr as part of the module init error flow
mlx4_core: Remove double function declarations
IB/mlx4: Fill the masked_atomic_cap attribute in query device
IB/mthca: Fill in sq_sig_type in query QP
IB/mthca: Warning about event for non-existent QPs should show event type
IB/qib: Fix sparse RCU warnings in qib_keys.c
net/mlx4_core: Initialize IB port capabilities for all slaves
mlx4: Use port management change event instead of smp_snoop
IB/qib: RCU locking for MR validation
IB/qib: Avoid returning EBUSY from MR deregister
IB/qib: Fix UC MR refs for immediate operations
...
Clean the idr as part of the error flow since it is a resource too.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
These macros will be reused by the mlx4 SRIOV-IB CM paravirtualization
code, and there is no reason to have them declared both in the IB core
in the mlx4 IB driver.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This query is needed for SRIOV alias GUID support.
The query is implemented per the IB Spec definition
in section 15.2.5.18 (GuidInfoRecord).
Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Provide an option for the user to specify that listens should only
accept connections where the incoming address family matches that of
the locally bound address. This is used to support the equivalent of
IPV6_V6ONLY socket option, which allows an app to only accept
connection requests directed to IPv6 addresses.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The rdma_cm maps IPv4 and IPv6 addresses to the same service ID. This
prevents apps from listening only for IPv4 or IPv6 addresses. It also
results in an app binding to an IPv4 address receiving connection
requests for an IPv6 address.
Change this to match socket behavior: restrict listens on IPv4
addresses to only IPv4 addresses, and if a listen is on an IPv6
address, allow it to receive either IPv4 or IPv6 addresses, based on
its address family binding.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The RDMA CM uses a single port space for all associated (tcp, udp,
etc.) port bindings, regardless of the address family that the user
binds to. The result is that if a user binds to AF_INET, but does not
specify an IP address, the bind will occur for AF_INET6. This causes
an attempt to bind to the same port using AF_INET6 to fail, and
connection requests to AF_INET6 will match with the AF_INET listener.
Align the behavior with sockets and restrict the bind to AF_INET only.
If a user binds to AF_INET6, we bind the port to AF_INET6 and
AF_INET depending on the value of bindv6only.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds the following structure:
struct netlink_kernel_cfg {
unsigned int groups;
void (*input)(struct sk_buff *skb);
struct mutex *cb_mutex;
};
That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.
I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.
That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.
This patch also adapts all callers to use this new interface.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/caif/caif_hsi.c
drivers/net/usb/qmi_wwan.c
The qmi_wwan merge was trivial.
The caif_hsi.c, on the other hand, was not. It's a conflict between
1c385f1fdf ("caif-hsi: Replace platform
device with ops structure.") in the net-next tree and commit
39abbaef19 ("caif-hsi: Postpone init of
HIS until open()") in the net tree.
I did my best with that one and will ask Sjur to check it out.
Signed-off-by: David S. Miller <davem@davemloft.net>
Change || check to the intended && when checking the QP type in a
received connection request against the listening endpoint.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
- Add generic and mlx4 support for "raw" QPs: allow suitably privileged
applications to send and receive arbitrary packets directly to/from
the hardware
- Add "doorbell drop" handling to the cxgb4 driver
- A fairly large batch of qib hardware driver changes
- A few fixes for lockdep-detected issues
- A few other miscellaneous fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
syKfN0VjiDRtJ+pxayi3
=yWfr
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
- Add generic and mlx4 support for "raw" QPs: allow suitably privileged
applications to send and receive arbitrary packets directly to/from
the hardware
- Add "doorbell drop" handling to the cxgb4 driver
- A fairly large batch of qib hardware driver changes
- A few fixes for lockdep-detected issues
- A few other miscellaneous fixes and cleanups
Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.
* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
IB/mlx4: Fix mlx4_ib_add() error flow
IB/core: Fix IB_SA_COMP_MASK macro
IB/iser: Fix error flow in iser ep connection establishment
IB/mlx4: Increase the number of vectors (EQs) available for ULPs
RDMA/cxgb4: Add query_qp support
RDMA/cxgb4: Remove kfifo usage
RDMA/cxgb4: Use vmalloc() for debugfs QP dump
RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
RDMA/cxgb4: Add DB Overflow Avoidance
RDMA/cxgb4: Add debugfs RDMA memory stats
cxgb4: DB Drop Recovery for RDMA and LLD queues
cxgb4: Common platform specific changes for DB Drop Recovery
cxgb4: Detect DB FULL events and notify RDMA ULD
RDMA/cxgb4: Drop peer_abort when no endpoint found
RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
mlx4_core: Change bitmap allocator to work in round-robin fashion
RDMA/nes: Don't call event handler if pointer is NULL
RDMA/nes: Fix for the ORD value of the connecting peer
...
Commit bc3e53f682 ("mm: distinguish between mlocked and pinned
pages") introduced a separate counter for pinned pages and used it in
the IB stack. However, in ib_umem_get() the pinned counter is
incremented, but ib_umem_release() wrongly decrements the locked
counter. Fix this.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
IB_QPT_RAW_PACKET allows applications to build a complete packet,
including L2 headers, when sending; on the receive side, the HW will
not strip any headers.
This QP type is designed for userspace direct access to Ethernet; for
example by applications that do TCP/IP themselves. Only processes
with the NET_RAW capability are allowed to create raw packet QPs (the
name "raw packet QP" is supposed to suggest an analogy to AF_PACKET /
SOL_RAW sockets).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The following lockdep problem was reported by Or Gerlitz <ogerlitz@mellanox.com>:
[ INFO: possible recursive locking detected ]
3.3.0-32035-g1b2649e-dirty #4 Not tainted
---------------------------------------------
kworker/5:1/418 is trying to acquire lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0138a41>] rdma_destroy_i d+0x33/0x1f0 [rdma_cm]
but task is already holding lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disable_ca llback+0x24/0x45 [rdma_cm]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/5:1/418:
#0: (ib_cm){.+.+.+}, at: [<ffffffff81042ac1>] process_one_work+0x210/0x4a 6
#1: ((&(&work->work)->work)){+.+.+.}, at: [<ffffffff81042ac1>] process_on e_work+0x210/0x4a6
#2: (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disab le_callback+0x24/0x45 [rdma_cm]
stack backtrace:
Pid: 418, comm: kworker/5:1 Not tainted 3.3.0-32035-g1b2649e-dirty #4
Call Trace:
[<ffffffff8102b0fb>] ? console_unlock+0x1f4/0x204
[<ffffffff81068771>] __lock_acquire+0x16b5/0x174e
[<ffffffff8106461f>] ? save_trace+0x3f/0xb3
[<ffffffff810688fa>] lock_acquire+0xf0/0x116
[<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff81364351>] mutex_lock_nested+0x64/0x2ce
[<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffff81065abc>] ? trace_hardirqs_on+0xd/0xf
[<ffffffffa0138a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffffa0139c02>] cma_req_handler+0x418/0x644 [rdma_cm]
[<ffffffffa012ee88>] cm_process_work+0x32/0x119 [ib_cm]
[<ffffffffa0130299>] cm_req_handler+0x928/0x982 [ib_cm]
[<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
[<ffffffffa0130326>] cm_work_handler+0x33/0xfe5 [ib_cm]
[<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
[<ffffffff81042b6e>] process_one_work+0x2bd/0x4a6
[<ffffffff81042ac1>] ? process_one_work+0x210/0x4a6
[<ffffffff813669f3>] ? _raw_spin_unlock_irq+0x2b/0x40
[<ffffffff8104316e>] worker_thread+0x1d6/0x350
[<ffffffff81042f98>] ? rescuer_thread+0x241/0x241
[<ffffffff81046a32>] kthread+0x84/0x8c
[<ffffffff8136e854>] kernel_thread_helper+0x4/0x10
[<ffffffff81366d59>] ? retint_restore_args+0xe/0xe
[<ffffffff810469ae>] ? __init_kthread_worker+0x56/0x56
[<ffffffff8136e850>] ? gs_change+0xb/0xb
The actual locking is fine, since we're dealing with different locks,
but from the same lock class. cma_disable_callback() acquires the
listening id mutex, whereas rdma_destroy_id() acquires the mutex for
the new connection id. To fix this, delay the call to
rdma_destroy_id() until we've released the listening id mutex.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add names for our lockdep classes, so instead of having to decipher
lockdep output with mysterious names:
Chain exists of:
key#14 --> key#11 --> key#13
lockdep will give us something nicer:
Chain exists of:
SRQ-uobj --> PD-uobj --> CQ-uobj
Signed-off-by: Roland Dreier <roland@purestorage.com>
Just as we don't allow PDs, CQs, etc. to be destroyed if there are QPs
that are attached to them, don't let a QP be destroyed if there are
multicast group(s) attached to it. Use the existing usecnt field of
struct ib_qp which was added by commit 0e0ec7e ("RDMA/core: Export
ib_open_qp() to share XRC TGT QPs") to track this.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Conflicts:
drivers/net/ethernet/intel/e1000e/param.c
drivers/net/wireless/iwlwifi/iwl-agn-rx.c
drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
drivers/net/wireless/iwlwifi/iwl-trans.h
Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell. In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.
In e1000e a conflict arose in the validation code for settings of
adapter->itr. 'net-next' had more sophisticated logic so that
logic was used.
Signed-off-by: David S. Miller <davem@davemloft.net>
- fix memory leak in mlx4
- fix two problems with new MAD response generation code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJPmYfMAAoJEENa44ZhAt0h06IP/jPJeH855wC2RgWXY2Fd+b6w
K/DQVE2MaX2f8EHuAB5MKoC0zzchKNu0GTMbrZMizbocDBmIU6ibF3MKx2rycSBC
FI906X7FFAeSyaywrI+k+ZGxVQYAck1QFDr44WiyJ400r8BGbx9ZECKvIyEwjaUa
RpRn4Pui4IjMV7GFqiziUsa9uo4xKYagjNfeInAZXelzIN76Coeae28dAIE2rPT/
BTcrtQ2JMG6H4AVGTImTP+Kqk1Cm7apBeHS4+SGWWTdw1WH4IcMqPmxHBUUtKTWt
UnlgpHquuC913VpkbkGeog32iAndhHgAkfs67Rs0IFwck127Il/DiPK/zir2zkZ2
YVilzYIxrSUYqPZYfKFcrKsnjfM+HJxBEPCfDjxfG0TjrVS8tnc75Uq1NvIt6e42
lcIclLDyZtdsMLmJlsTsT5cF7HlThSjpZDSgmMi2eeOI0RKoqRyyGGwiAIq7Ud2a
oRa+bheh4pBD2lvzd62uEkgMkTaToAz3I/vlbOTSwxYg2skoHUpwNPCsmstsQ7RU
BHi7Nl5DLNT5lNBTIouWvbilpXs8eXwgcNrQcWSYaPeY2aDsGAAdi9Q4BadENao9
dmxOtqjryctX7U7LGDRmNDIfiyPY5AodDUJjFux1JwqbdSCGf1Lz7uvDWu7gqL9V
TCJFAK54okexpoyV6Vgv
=iRrT
-----END PGP SIGNATURE-----
Merge tag 'ib-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband fixes from Roland Dreier:
"A few fixes for regressions introduced in 3.4-rc1:
- fix memory leak in mlx4
- fix two problems with new MAD response generation code"
* tag 'ib-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mlx4: Fix memory leaks in ib_link_query_port()
IB/mad: Don't send response for failed MADs
IB/mad: Set 'D' bit in response for unhandled MADs
Commit 0b30704304 ("IB/mad: Return error response for unsupported
MADs") does not failed MADs (eg those that return
IB_MAD_RESULT_FAILURE) properly -- these MADs should be silently
discarded. (We should not force the lower-layer drivers to return
SUCCESS | CONSUMED in this case, since the MAD is NOT successful).
Unsupported MADs are not failures -- they return SUCCESS, but with an
"unsupported error" status value inside the response MAD.
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 0b30704304 ("IB/mad: Return error response for unsupported
MADs") does not handle directed-route MADs properly -- it fails to set
the 'D' bit in the response MAD status field. This is a problem for
SmInfo MADs when the receiver does not have an SM running.
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This results in code with less boiler plate that is a bit easier
to read.
Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes it clearer which sysctls are relative to your current network
namespace.
This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.
This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both tagged traffic and untagged traffic use tc tool mapping.
Treat RDMA TOS same as IP TOS when mapping to SL
Signed-off-by: Amir Vadai <amirv@mellanox.com>
CC: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e9319b0cb0 ("IB/core: Fix SDR rates in sysfs") changed our
sysfs rate attribute to return EINVAL to userspace if the underlying
device driver returns an invalid rate. Apparently some drivers do this
when the link is down and some userspace pukes if it gets an error when
reading this attribute, so avoid a regression by not return an error to
match the old code.
Signed-off-by: Roland Dreier <roland@purestorage.com>
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.
Signed-off-by: David S. Miller <davem@davemloft.net>
stands out; by patch count lots of fixes to the mlx4 driver plus some
cleanups and fixes to the core and other drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
3etEZZRwROAyYKcwN01p
=DbLx
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
"Nothing big really stands out; by patch count lots of fixes to the
mlx4 driver plus some cleanups and fixes to the core and other
drivers."
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
mlx4_core: Scale size of MTT table with system RAM
mlx4_core: Allow dynamic MTU configuration for IB ports
IB/mlx4: Fix info returned when querying IBoE ports
IB/mlx4: Fix possible missed completion event
mlx4_core: Report thermal error events
mlx4_core: Fix one more static exported function
IB: Change CQE "csum_ok" field to a bit flag
RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
RDMA/cxgb3: Don't pass irq flags to flush_qp()
mlx4_core: Get rid of redundant ext_port_cap flags
RDMA/ucma: Fix AB-BA deadlock
IB/ehca: Fix ilog2() compile failure
IB: Use central enum for speed instead of hard-coded values
IB/iser: Post initial receive buffers before sending the final login request
IB/iser: Free IB connection resources in the proper place
IB/srp: Consolidate repetitive sysfs code
IB/srp: Use pr_fmt() and pr_err()/pr_warn()
IB/core: Fix SDR rates in sysfs
mlx4: Enforce device max FMR maps in FMR alloc
IB/mlx4: Set bad_wr for invalid send opcode
...
When destroying a listening cmid, the iwcm first marks the state of
the cmid as DESTROYING, then releases the lock and calls into the
iWARP provider to destroy the endpoint. Since the cmid is not locked,
its possible for the iWARP provider to pass a connection request event
to the iwcm, which will be silently dropped by the iwcm. This causes
the iWARP provider to never free up the resources from this connection
because the assumption is the iwcm will accept or reject this connection.
The solution is to reject these connection requests.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When we destroy a cm_id, we must purge associated events from the
event queue. If the cm_id is for a listen request, we also purge
corresponding pending connect requests. This requires destroying
the cm_id's associated with the connect requests by calling
rdma_destroy_id(). rdma_destroy_id() blocks until all outstanding
callbacks have completed.
The issue is that we hold file->mut while purging events from the
event queue. We also acquire file->mut in our event handler. Calling
rdma_destroy_id() while holding file->mut can lead to a deadlock,
since the event handler callback cannot acquire file->mut, which
prevents rdma_destroy_id() from completing.
Fix this by moving events to purge from the event queue to a temporary
list. We can then release file->mut and call rdma_destroy_id()
outside of holding any locks.
Bug report by Or Gerlitz <ogerlitz@mellanox.com>:
[ INFO: possible circular locking dependency detected ]
3.3.0-rc5-00008-g79f1e43-dirty #34 Tainted: G I
tgtd/9018 is trying to acquire lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
but task is already holding lock:
(&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&file->mut){+.+.+.}:
[<ffffffff810682f3>] lock_acquire+0xf0/0x116
[<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
[<ffffffffa0247636>] ucma_event_handler+0x148/0x1dc [rdma_ucm]
[<ffffffffa035a79a>] cma_ib_handler+0x1a7/0x1f7 [rdma_cm]
[<ffffffffa0333e88>] cm_process_work+0x32/0x119 [ib_cm]
[<ffffffffa03362ab>] cm_work_handler+0xfb8/0xfe5 [ib_cm]
[<ffffffff810423e2>] process_one_work+0x2bd/0x4a6
[<ffffffff810429e2>] worker_thread+0x1d6/0x350
[<ffffffff810462a6>] kthread+0x84/0x8c
[<ffffffff81369624>] kernel_thread_helper+0x4/0x10
-> #0 (&id_priv->handler_mutex){+.+.+.}:
[<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752
[<ffffffff810682f3>] lock_acquire+0xf0/0x116
[<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
[<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm]
[<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm]
[<ffffffff810df6ef>] fput+0x117/0x1cf
[<ffffffff810dc76e>] filp_close+0x6d/0x78
[<ffffffff8102b667>] put_files_struct+0xbd/0x17d
[<ffffffff8102b76d>] exit_files+0x46/0x4e
[<ffffffff8102d057>] do_exit+0x299/0x75d
[<ffffffff8102d599>] do_group_exit+0x7e/0xa9
[<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555
[<ffffffff81001717>] do_signal+0x39/0x634
[<ffffffff81001d39>] do_notify_resume+0x27/0x69
[<ffffffff81361c03>] retint_signal+0x46/0x83
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&file->mut);
lock(&id_priv->handler_mutex);
lock(&file->mut);
lock(&id_priv->handler_mutex);
*** DEADLOCK ***
1 lock held by tgtd/9018:
#0: (&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm]
stack backtrace:
Pid: 9018, comm: tgtd Tainted: G I 3.3.0-rc5-00008-g79f1e43-dirty #34
Call Trace:
[<ffffffff81029e9c>] ? console_unlock+0x18e/0x207
[<ffffffff81066433>] print_circular_bug+0x28e/0x29f
[<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752
[<ffffffff810682f3>] lock_acquire+0xf0/0x116
[<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
[<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
[<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm]
[<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm]
[<ffffffff810df6ef>] fput+0x117/0x1cf
[<ffffffff810dc76e>] filp_close+0x6d/0x78
[<ffffffff8102b667>] put_files_struct+0xbd/0x17d
[<ffffffff8102b5cc>] ? put_files_struct+0x22/0x17d
[<ffffffff8102b76d>] exit_files+0x46/0x4e
[<ffffffff8102d057>] do_exit+0x299/0x75d
[<ffffffff8102d599>] do_group_exit+0x7e/0xa9
[<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555
[<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff81001717>] do_signal+0x39/0x634
[<ffffffff8135e037>] ? printk+0x3c/0x45
[<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff81361803>] ? _raw_spin_unlock_irq+0x2b/0x40
[<ffffffff81039011>] ? set_current_blocked+0x44/0x49
[<ffffffff81361bce>] ? retint_signal+0x11/0x83
[<ffffffff81001d39>] do_notify_resume+0x27/0x69
[<ffffffff8118a1fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81361c03>] retint_signal+0x46/0x83
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The kernel IB stack uses one enumeration for IB speed, which wasn't
explicitly specified in the verbs header file. Add that enum, and use
it all over the code.
The IB speed/width notation is also used by iWARP and IBoE HW drivers,
which use the convention of rate = speed * width to advertise their
port link rate.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 71eeba16 ("IB: Add new InfiniBand link speeds") introduced a bug
where eg the rate for IB 4X SDR links iss displayed as "8.5 Gb/sec"
instead of "10 Gb/sec" as it used to be. Fix that.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Davem considers that the argument list of this interface is getting
out of control. This patch tries to address this issue following
his proposal:
struct netlink_dump_control c = { .dump = dump, .done = done, ... };
netlink_dump_start(..., &c);
Suggested by David S. Miller.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set up a response with appropriate error status and send it for MADs
that are not supported by a specific class/version.
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Swapna Thete <swapna.thete@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
After reporting a new connection request to user space, the rdma_ucm
will discard subsequent events until the user has associated a user
space idenfier with the kernel cm_id. This is needed to avoid
reporting a reject/disconnect event to the user for a request that
they may not have processed.
The user space identifier is set once the user tries to accept the
connection request. However, the following race exists in ucma_accept():
ctx->uid = cmd.uid;
<events may be reported now>
ret = rdma_accept(ctx->cm_id, ...);
Once ctx->uid has been set, new events may be reported to the user.
While the above mentioned race is avoided, there is an issue that the
user _may_ receive a reject/disconnect event if rdma_accept() fails,
depending on when the event is processed. To simplify the use of
rdma_accept(), discard all events unless rdma_accept() succeeds.
This problem was discovered based on questions from Roland Dreier
<roland@purestorage.com>.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We have just been investigating kernel panics related to
cq->ibcq.event_handler() completion calls. The problem is that
ib_destroy_qp() fails with -EBUSY.
Further investigation revealed qp->usecnt is not initialized. This
counter was introduced in linux-3.2 by commit 0e0ec7e063
("RDMA/core: Export ib_open_qp() to share XRC TGT QPs") but it only
gets initialized for IB_QPT_XRC_TGT, but it is checked in
ib_destroy_qp() for any QP type.
Fix this by initializing qp->usecnt for every QP we create.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Sven Breuner <sven.breuner@itwm.fraunhofer.de>
[ Initialize qp->usecnt in uverbs too. - Sean ]
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Now we must provide the IP destination address, and a reference has
to be dropped when we're done with the entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
reiserfs: Properly display mount options in /proc/mounts
vfs: prevent remount read-only if pending removes
vfs: count unlinked inodes
vfs: protect remounting superblock read-only
vfs: keep list of mounts for each superblock
vfs: switch ->show_options() to struct dentry *
vfs: switch ->show_path() to struct dentry *
vfs: switch ->show_devname() to struct dentry *
vfs: switch ->show_stats to struct dentry *
switch security_path_chmod() to struct path *
vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
vfs: trim includes a bit
switch mnt_namespace ->root to struct mount
vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
vfs: opencode mntget() mnt_set_mountpoint()
vfs: spread struct mount - remaining argument of next_mnt()
vfs: move fsnotify junk to struct mount
vfs: move mnt_devname
vfs: move mnt_list to struct mount
vfs: switch pnode.h macros to struct mount *
...
Clean up sparse warnings in the rdma core layer.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix endianness bugs reported by sparse in the RDMA core stack. Note
that these are real bugs, but don't affect any existing code to the
best of my knowledge. The mlid issue would only affect kernel users
of rdma_join_multicast which have the rdma_cm attach/detach its QP.
There are no current in tree users that do this. (rdma_join_multicast
may be used called by user space applications, which does not have
this issue.) And the pkey setting is simply returned as
informational.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add a missing 16-bit reserved field between ap_status and info fields.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Userspace verbs multicast attach/detach operations on a QP are done
while holding the rwsem of the QP for reading. That's not sufficient
since a reader lock allows more than one reader to acquire the
lock. However, multicast attach/detach does list manipulation that
can corrupt the list if multiple threads run in parallel.
Fix this by acquiring the rwsem as a writer to serialize attach/detach
operations. Add idr_write_qp() and put_qp_write() to encapsulate
this.
This fixes oops seen when running applications that perform multicast
joins/leaves.
Reported by: Mike Dubman <miked@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
both callers of device_get_devnode() are only interested in lower 16bits
and nobody tries to return anything wider than 16bit anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Conflicts:
net/bluetooth/l2cap_core.c
Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.
Signed-off-by: David S. Miller <davem@davemloft.net>
private_data_len is defined as a u8. If the user specifies a large
private_data size (> 220 bytes), we will calculate a total length that
exceeds 255, resulting in private_data_len wrapping back to 0. This
can lead to overwriting random kernel memory. Avoid this by verifying
that the resulting size fits into a u8.
Reported-by: B. Thery <benjamin.thery@bull.net>
Addresses: <http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2335>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
IPV4 should do exactly what the IPV6 code does here, which is
use the neighbour obtained via the dst entry.
And now that the two code paths do the same thing, use a common
helper function to perform the operation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Roland Dreier <roland@purestorage.com>
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
Commit f2c31e32b3 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.
Many thanks to Marc Aurele who provided a nice bug report and feedback.
Reported-by: Marc Aurele La France <tsi@ualberta.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
C assignment can handle struct in6_addr copying.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
mlx4_core: Deprecate log_num_vlan module param
IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
IB/mlx4: Enable 4K mtu for IBoE
RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
RDMA/cxgb4: Serialize calls to CQ's comp_handler
RDMA/cxgb3: Serialize calls to CQ's comp_handler
IB/qib: Fix issue with link states and QSFP cables
IB/mlx4: Configure extended active speeds
mlx4_core: Add extended port capabilities support
IB/qib: Hold links until tuning data is available
IB/qib: Clean up checkpatch issue
IB/qib: Remove s_lock around header validation
IB/qib: Precompute timeout jiffies to optimize latency
IB/qib: Use RCU for qpn lookup
IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
IB/qib: Decode path MTU optimization
IB/qib: Optimize RC/UC code by IB operation
IPoIB: Use the right function to do DMA unmap pages
RDMA/cxgb4: Use correct QID in insert_recv_cqe()
RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
...
Some kernel components pin user space memory (infiniband and perf) (by
increasing the page count) and account that memory as "mlocked".
The difference between mlocking and pinning is:
A. mlocked pages are marked with PG_mlocked and are exempt from
swapping. Page migration may move them around though.
They are kept on a special LRU list.
B. Pinned pages cannot be moved because something needs to
directly access physical memory. They may not be on any
LRU list.
I recently saw an mlockalled process where mm->locked_vm became
bigger than the virtual size of the process (!) because some
memory was accounted for twice:
Once when the page was mlocked and once when the Infiniband
layer increased the refcount because it needt to pin the RDMA
memory.
This patch introduces a separate counter for pinned pages and
accounts them seperately.
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Mike Marciniszyn <infinipath@qlogic.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These were getting it implicitly via device.h --> module.h but
we are going to stop that when we clean up the headers.
Fix these in advance so the tree remains biscect-clean.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
They get it via module.h (via device.h) but we want to clean that up.
When we do, we'll get things like:
CC [M] drivers/infiniband/core/sysfs.o
sysfs.c:361: error: 'S_IRUGO' undeclared here (not in a function)
sysfs.c:654: error: 'S_IWUSR' undeclared here (not in a function)
so add in the stat header it is using explicitly in advance.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Allow processes that share the same XRC domain to open an existing
shareable QP. This permits those processes to receive events on the
shared QP and transfer ownership, so that any process may modify the
QP. The latter allows the creating process to exit, while a remaining
process can still transition it for path migration purposes.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
XRC TGT QPs are shared resources among multiple processes. Since the
creating process may exit, allow other processes which share the same
XRC domain to open an existing QP. This allows us to transfer
ownership of an XRC TGT QP to another process.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Because an XRC TGT QP can end up being shared among multiple
processes, don't have the ib_cm automatically send a DREQ when the
userspace process that owns the ib_cm_id exits. Disconnect can be
initiated by the user directly; otherwise, the owner of the XRC INI QP
controls the connection.
Note that as a result of the process exiting, the ib_cm will stop
tracking the XRC connection on the target side. For the purposes of
disconnecting, this isn't a big deal. The ib_cm will respond to the
DREQ appropriately. For other messages, mainly LAP, the CM will
reject the request, since there's no one available to route the
request to.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Allow users to connect XRC QPs through the rdma_cm.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Allow the user to indicate the QP type separately from the port space
when allocating an rdma_cm_id. With RDMA_PS_IB, there is no longer a
1:1 relationship between the QP type and port space, so we need to
switch on the QP type to select between UD and connected QPs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add RDMA_PS_IB. XRC QP types will use the IB port space when operating
over the RDMA CM. For the 'IP protocol' field value, we select 0x3F,
which is listed as being for 'any local network'.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The XRC annex was updated to have XRC behave more like RD. Specifically,
the XRC TGT QPN moves from the local QPN to local EECN field. Lookup of
SRQN is done using the REQ/REP protocol.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Update the REQ and REP messages to support XRC connection setup
according to the XRC Annex. Several existing fields must be set to 0 or
1 when connecting XRC QPs, and a reserved field is changed to an
extended transport type.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Allow user space to operate on XRC TGT QPs the same way as other types
of QPs, with one notable exception: since XRC TGT QPs may be shared
among multiple processes, the XRC TGT QP is allowed to exist beyond the
lifetime of the creating process.
The process that creates the QP is allowed to destroy it, but if the
process exits without destroying the QP, then the QP will be left bound
to the lifetime of the XRCD.
TGT QPs are not associated with CQs or a PD.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
XRC INI QPs are similar to send only RC QPs. Allow user space to create
INI QPs. Note that INI QPs do not require receive CQs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We require additional information to create XRC SRQs than we can
exchange using the existing create SRQ ABI. Provide an enhanced create
ABI for extended SRQ types.
Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
and Roland Dreier <roland@purestorage.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Allow user space to create XRC domains. Because XRCDs are expected to
be shared among multiple processes, we use inodes to identify an XRCD.
Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
XRC TGT QPs are intended to be shared among multiple users and
processes. Allow the destruction of an XRC TGT QP to be done explicitly
through ib_destroy_qp() or when the XRCD is destroyed.
To support destroying an XRC TGT QP, we need to track TGT QPs with the
XRCD. When the XRCD is destroyed, all tracked XRC TGT QPs are also
cleaned up.
To avoid stale reference issues, if a user is holding a reference on a
TGT QP, we increment a reference count on the QP. The user releases the
reference by calling ib_release_qp. This releases any access to the QP
from a user above verbs, but allows the QP to continue to exist until
destroyed by the XRCD.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).
XRC communication is between an initiator (INI) QP and a target (TGT)
QP. Target QPs are associated with SRQs through an XRCD. An XRC TGT QP
behaves like a receive-only RD QP. XRC INI QPs behave similarly to RC
QPs, except that work requests posted to an XRC INI QP must specify the
remote SRQ that is the target of the work request.
We define two new QP types for XRC, to distinguish between INI and TGT
QPs, and update the core layer to support XRC QPs.
This patch is derived from work by Jack Morgenstein
<jackm@dev.mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).
XRC defines SRQs that are specifically used by XRC connections. Expand
the SRQ code to support XRC SRQs. An XRC SRQ is currently restricted to
only XRC use according to the IB XRC Annex.
Portions of this patch were derived from work by
Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second. Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>