Add kernel support for userspace calling poll CQ, request CQ
notification, post send, post receive, post SRQ receive, create AH and
destroy AH commands. These commands allow us to support userspace
verbs for devices that can't perform these operations directly from
userspace (eg the PathScale HCA).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Give each device a uverbs_cmd_mask, so that a low-level driver can
control which methods may be called on behalf of userspace.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the SA query module's initialization fails for a device, then that
device won't have a struct ib_sa_device associated. We should fail SA
queries in that case, rather than blindly dereferencing the NULL
pointer we get back from ib_get_client_data().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There is a bug in ib_mad_init_device(): if ib_agent_port_open() fails
for a given port, then the current code doesn't call ib_mad_port_close()
for that port.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reject userspace memory registrations with invalid permission flags:
"local write" is required if "remote write" or "remote atomic" is also
requested.
Pointed out by Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add abi_version attribute to uverbs class devices to allow for
ABI versioning of device-specific interfaces.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
By waiting to add resources to our lists until after the last
operation that can fail, we don't have to remove them from their lists
in the error path. Also, we should hold the idr mutex until we know
whether resource creation has succeed or failed, to avoid someone
finding a resource in our table before we're ready.
Loosely based on work by Robert Walsh <rjwalsh@pathscale.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Introduce new userspace verbs ABI version 3. This eliminates some
unneeded commands, and adds support for user-created completion
channels. This cleans up problems with file leaks on error paths, and
also makes sure that file descriptors are always installed into the
correct process.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add checks so that we only allow multicast attach/detach with
a valid multicast GID and the correct QP type.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- added typedef unsigned int __nocast gfp_t;
- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change
generated code (from gcc point of view we replaced unsigned int with
typedef) and documents what's going on far better.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Al Viro pointed out that the current IB userspace verbs interface
allows userspace to cause mischief by closing file descriptors before
we're ready, or issuing the same command twice at the same time. This
patch closes those races, and fixes other obvious problems such as a
module reference leak.
Some other interface bogosities will require an ABI change to fix
properly, so I'm deferring those fixes until 2.6.15.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Based on simplification idea from Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Clean up code by using enums instead of hard-coded magic numbers.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We need to subtract off the header length from our payload
length when sending multi-packet SA messages.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the CM REQ handling function gets to error2, then it frees
cm_id_priv->timewait_info. But the next line goes through
ib_destroy_cm_id() -> ib_send_cm_rej() -> cm_reset_to_idle(),
which ends up calling cm_cleanup_timewait(), which dereferences the
pointer we just freed. Make sure we clear cm_id_priv->timewait_info
after freeing it, so that doesn't happen.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Changes to CM to support CM and port redirection (REJ reason 24).
Signed-off-by: John Kingman <kingman <at> storagegear.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
completion events after destroying a CQ, QP or SRQ. We do this by
sweeping the event lists before returning from a destroy calls, and
then return the number of events already reported before the destroy
call. This allows userspace wait until it has processed all events
for an object returned from the kernel before it frees its context for
the object.
The ABI of the destroy CQ, destroy QP and destroy SRQ commands has to
change to return the event count, so bump the ABI version from 1 to 2.
The userspace libibverbs library has already been updated to handle
both the old and new ABI versions.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
SA: Move SA attributes to ib_sa.h so are accessible to more than
sa_query.c. Also, remove deprecated attributes and add one missing one.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch converts kcalloc(1, ...) calls to use the new kzalloc() function.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new config option INFINIBAND_USER_MAD to control whether we
build ib_umad. Change INFINIBAND_USER_VERBS to INFINIBAND_USER_ACCESS,
and have it control ib_ucm and ib_uat as well as ib_uverbs.
Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- Fix payload length of middle RMPP sent segments. Middle payload
lengths should be 0 on the send side.
(This is perhaps a compliance and should not be an interop issue as
middle payload lengths are supposed to be ignored on receive).
- Fix length in first segment of multipacket sends
(This is a compliance issue but does not affect at least OpenIB to
OpenIB RMPP transfers).
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- Add user specified context to all uCM events. Users will not retrieve
any events associated with the context after destroying the corresponding
cm_id.
- Provide the ib_cm_init_qp_attr() call to userspace clients of the CM.
This call may be used to set QP attributes properly before modifying the QP.
- Fixes some error handling synchonization and cleanup issues.
- Performs some minor code cleanup.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Using ib_get_client_data in SA event handler performs a list scan.
It's better to use container_of to get the sa device directly.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move the InfiniBand headers from drivers/infiniband/include to include/rdma.
This allows InfiniBand-using code to live elsewhere, and lets us remove the
ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix deadlock condition resulting from trying to destroy a cm_id
from the context of a CM thread. The synchronization around the
ucm context structure is simplified as a result, and some simple
code cleanup is included.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add SRQ support to userspace verbs module. This adds several commands
and associated structures, but it's OK to do this without bumping the
ABI version because the commands are added at the end of the list so
they don't change the existing numbering. There are two cases to
worry about:
1. New kernel, old userspace. This is OK because old userspace simply
won't try to use the new SRQ commands. None of the old commands are
changed.
2. Old kernel, new userspace. This works perfectly as long as
userspace doesn't try to use SRQ commands. If userspace tries to
use SRQ commands, it will get EINVAL, which is perfectly
reasonable: the kernel doesn't support SRQs, so we couldn't do any
better.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
changing CONFIG_LOCALVERSION rebuilds too much, for no appearent reason.
Remove unneeded includes of <linux/version.h>.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change ib_mad_thread_completion_handler to conform to ib_comp_handler
declaration.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make sure that all FMRs are unmapped before we deallocate them so that
we don't leak references to our protection domain when destroying an
FMR pool. (Bug reported by Guy German <guyg@voltaire.com>)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make some lawyers happy and add copyright notices for people who
forgot to include them when they actually touched the code.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a use-after-free bug in userspace verbs cleanup: we can't touch
mr->device after we free mr by calling ib_dereg_mr().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Only print debug messages when debug_level is set.
Eliminate NULL checks prior to calling kfree.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Libor Michalek <libor@topspin.com>
Eliminate sparse warnings in SA client
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hook up userspace CM to the make system
Signed-off-by: Libor Michalek <libor@topspin.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Include the patch openib-general changing class_simple to class.
Signed-off-by: Tom Duffy <tduffy@sun.com>
Cc: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add kernel portion of user CM implementation
Signed-off-by: Libor Michalek <libor@topspin.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implementation for RMPP support in user MAD
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the kernel CM implementation
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add Service Record support to SA client
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace be32_to_cpup with be32_to_cpu and fix bug referencing pointer rather
than value in ib_create_ah_from_wc().
Signed-off-by: Tom Duffy <tduffy@sun.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Added new call: ib_create_ah_from_wc. Call will allocate an address handle
given work completion information, including any received GRH.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixed locking to handle error posting MAD send work requests. Fixed handling
canceling a MAD with an active work request.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Optimize canceling a MAD.
- Eliminate searching timeout list in cancel case.
- Remove duplicate calls to queue work item.
- Eliminate resending a MAD before MAD is completed.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add new MAD layer call to modify (ib_modify_mad) the timeout of a sent MAD,
and simplify cancel code.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eliminate MAD cache leak associated with local completions. Also, when
canceling MAD, empty local completion list as well.
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Simplify calling of list_del.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add implementation for ib_coalesce_recv_mad. Also, clear allocated MAD data
buffer in ib_create_send_mad.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor cleanup during startup and shutdown
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes an issue processing a sent MAD after it has timed out or been canceled.
The race occurs when a response MAD matches with the send request. The
request could time out or be canceled after the response MAD matches with the
request, but before the request completion can be processed.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Have ib_mad_send_wr_private reference the private agent structure directly,
rather than the exposed agent definition. Remove unneeded parameters to
functions and simplify code were possible from this change.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move saving of user's send wr_id to better match layering of received response
handling.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Combine response_mad() and solicited_mad() routines into a single function and
simplify/encapsulate its usage.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add new helper routines for allocating MADs for sending and formatting a send
WR.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Automatically allocate a MR when registering a MAD agent.
MAD clients are modified to use this updated API.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change some functions to return void rather than an int since they are always
returning 0, thus making checking return values rather pointless.
Signed-off-by: Tom Duffy <tduffy@sun.com>
Signed-off-by: Libor Michalek <libor@topspin.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for O_ASYNC notifications on userspace verbs
completion and asynchronous event file descriptors.
Signed-off-by: Gleb Natapov <glebn@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Hook up InfiniBand userspace verbs to Kconfig and the make system.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for pinning userspace memory regions and returning a list of pages
in the region. This includes tracking pinned memory against vm_locked and
preventing unprivileged users from exceeding RLIMIT_MEMLOCK.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the core of the InfiniBand userspace verbs implementation, including
creating character device nodes, dispatching requests from userspace, and
passing event notifications back up to userspace.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update kernel InfiniBand midlayer to compile against the updated API for
low-level drivers. This just amounts to passing NULL for all
userspace-related parameters, and setting userspace-related structure members
to NULL.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix handling of fields with size_bits == 64. Pointed out by Hal Rosenstock.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use a copy of the id we'll return to the consumer so that we don't
dereference query->sa_query after calling send_mad(). A completion may
occur very quickly and end up freeing the query before we get to do
anything after send_mad().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sysfs: fix the rest of the kernel so if an attribute doesn't
implement show or store method read/write will return
-EIO instead of 0 or -EINVAL or -EPERM.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Free all unclaimed MAD receive buffers when userspace closes our file so we
don't leak memory.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Check if a client passes a NULL callback into an SA query, and if so, never
call back. This fixes an oops if someone unloads ib_ipoib and ib_sa in
rapid succession. ib_ipoib does an MCMember delete with a NULL callback
and 0 timeout on unload, which is usually fine since the delete completes
successfully. However, if ib_sa is unloaded immediately afterwards, the
delete will be canceled and ib_sa will try to call the (now already
unloaded) ib_ipoib module back with the cancel completion, which triggers
the oops.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix handling of MAD agent registrations with mgmt_class == 0. In this case
ib_umad should pass a NULL registration request to the MAD core rather than a
request with mgmt_class set to 0.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mask bits correctly from jhash result in ib_fmr_hash() so that the
computed bucket index is within our hash table. This fixes an SDP
crash.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eliminate no longer needed include files
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace the *wc field in ib_mad_recv_wc from pointing to a structure on the
stack to one allocated with the received MAD buffer. This allows a client to
access the *wc field after their receive completion handler has returned.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!