64 Commits

Author SHA1 Message Date
Roland Dreier bc1db9af73 IB: Explicitly rule out llseek to avoid BKL in default_llseek()
Several RDMA user-access drivers have file_operations structures with
no .llseek method set.  None of the drivers actually do anything with
f_pos, so this means llseek is essentially a NOP, instead of returning
an error as leaving other file_operations methods unimplemented would
do.  This is mostly harmless, except that a NULL .llseek means that
default_llseek() is used, and this function grabs the BKL, which we
would like to avoid.

Since llseek does nothing useful on these files, we would like it to
return an error to userspace instead of silently grabbing the BKL and
succeeding.  For nearly all of the file types, we take the
belt-and-suspenders approach of setting the .llseek method to
no_llseek and also calling nonseekable_open(); the exception is the
uverbs_event files, which are created with anon_inode_getfile(), which
already sets f_mode the same way as nonseekable_open() would.
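
What userspace sees after this change can be sketched with any
non-seekable descriptor.  The example below is only an illustration: it
uses a pipe as a stand-in for the RDMA device files (an assumption made
so that it runs without IB hardware), and the helper name
seek_errno_on_pipe() is made up for this sketch.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to seek on a non-seekable descriptor and report the errno that
 * lseek() sets.  A pipe stands in for the RDMA device files here.
 * Returns the errno from lseek(), 0 if the seek unexpectedly
 * succeeded, or -1 if the pipe could not be created.
 */
int seek_errno_on_pipe(void)
{
	int fds[2];
	int err = 0;

	if (pipe(fds) != 0)
		return -1;

	errno = 0;
	if (lseek(fds[0], 0, SEEK_SET) == (off_t)-1)
		err = errno;

	close(fds[0]);
	close(fds[1]);
	return err;
}
```

On a pipe, lseek() fails with ESPIPE, which is the same error a file
marked non-seekable reports instead of silently succeeding.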

This work is motivated by Arnd Bergmann's bkl-removal tree.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-04-21 12:17:38 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities to include
those headers directly instead of assuming availability.  As this
conversion needs to touch a large number of source files, the
following script is used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surroundings.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have a fitting include block), it prints
  out an error message indicating which .h file needs to be added to
  the file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition, and for others adding it to the
   implementation .h or the embedding .c file was more appropriate.
   This step added inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of the
specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Andi Kleen 0933e2d98d driver core: Convert some drivers to CLASS_ATTR_STRING
Convert some drivers that export a single string as a class attribute
to the new class_attr_string functions. This removes redundant
code all over.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:48 -08:00
Andi Kleen 28812fe11a driver-core: Add attribute argument to class_attribute show/store
Passing the attribute to the low level IO functions allows all kinds
of cleanups, by sharing low level IO code without requiring
a separate function for every piece of data.

Also drivers can extend the attributes with their own data fields
and use them in the low level function.

This makes the class attributes the same as sysdev_class attributes
and plain attributes.

This will allow further cleanups in drivers.

Full tree sweep converting all users.
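
The idea of sharing one low level function across attributes can be
sketched in userspace as follows.  This is a simplified illustration,
not the kernel's definitions: struct attr, struct string_attr,
show_string() and the local container_of() macro are stand-ins invented
for the example, and the real show/store prototypes take more
arguments.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a class attribute. */
struct attr {
	const char *name;
	int (*show)(struct attr *attr, char *buf, size_t len);
};

/* An attribute extended with its own data field. */
struct string_attr {
	struct attr attr;
	const char *str;
};

/* Local re-implementation of the well-known kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * One show routine shared by every string attribute: because the
 * attribute itself is passed in, the routine can find the data that
 * belongs to it instead of needing one function per string.
 */
int show_string(struct attr *attr, char *buf, size_t len)
{
	struct string_attr *sa =
		container_of(attr, struct string_attr, attr);

	return snprintf(buf, len, "%s\n", sa->str);
}
```

Any number of struct string_attr instances can point their show member
at show_string(); each call prints the string stored next to the
attribute it was invoked on.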

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:48 -08:00
Linus Torvalds 0f2cc4ecd8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  init: Open /dev/console from rootfs
  mqueue: fix typo "failues" -> "failures"
  mqueue: only set error codes if they are really necessary
  mqueue: simplify do_open() error handling
  mqueue: apply mathematics distributivity on mq_bytes calculation
  mqueue: remove unneeded info->messages initialization
  mqueue: fix mq_open() file descriptor leak on user-space processes
  fix race in d_splice_alias()
  set S_DEAD on unlink() and non-directory rename() victims
  vfs: add NOFOLLOW flag to umount(2)
  get rid of ->mnt_parent in tomoyo/realpath
  hppfs can use existing proc_mnt, no need for do_kern_mount() in there
  Mirror MS_KERNMOUNT in ->mnt_flags
  get rid of useless vfsmount_lock use in put_mnt_ns()
  Take vfsmount_lock to fs/internal.h
  get rid of insanity with namespace roots in tomoyo
  take check for new events in namespace (guts of mounts_poll()) to namespace.c
  Don't mess with generic_permission() under ->d_lock in hpfs
  sanitize const/signedness for udf
  nilfs: sanitize const/signedness in dealing with ->d_name.name
  ...

Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-04 08:15:33 -08:00
Al Viro b1e4594ba0 switch infiniband uverbs to anon_inodes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:27 -05:00
Roland Dreier fe8875e5a4 Merge branch 'misc' into for-next
Conflicts:
	drivers/infiniband/core/uverbs_main.c
2010-03-01 23:52:31 -08:00
Roland Dreier a265e5587f IB/uverbs: Use anon_inodes instead of private infinibandeventfs
The anon_inodes interface has been split to allow creating a bare
(non-installed) file pointer and also extended to allow specifying
O_RDONLY in the flags.  This makes it a suitable replacement for the
private "infinibandeventfs" pseudo-filesystem used by uverbs, and this
replacement saves a small chunk of boilerplate code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 16:51:20 -08:00
Alexander Chiang 9afed76d59 IB/uverbs: Whitespace cleanup
Clean up the errors as shown when 'let c_space_errors=1' is set in vim.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:42 -08:00
Alexander Chiang 6d6a0e71ee IB/uverbs: Increase maximum devices supported
Some large systems may support more than IB_UVERBS_MAX_DEVICES
(currently 32).

This change allows us to support more devices in a backwards-compatible
manner.  The first IB_UVERBS_MAX_DEVICES keep the same major/minor
device numbers that they've always had.

If there are more than IB_UVERBS_MAX_DEVICES, we then dynamically
request a new major device number (new minors start at 0).

This change increases the maximum number of HCAs to 64 (from 32).
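
The numbering scheme described above can be sketched as a small
mapping function.  This is only an illustration under simplifying
assumptions: devices are numbered in arrival order and never removed,
and struct dev_slot and slot_for_device() are hypothetical names, not
the driver's code.

```c
#include <stdbool.h>

/* Limits taken from the commit message: 32 fixed slots, 64 in total. */
#define MAX_FIXED_DEVICES	32	/* IB_UVERBS_MAX_DEVICES */
#define MAX_TOTAL_DEVICES	64

struct dev_slot {
	bool overflow;			/* true: dynamically requested major */
	unsigned int minor_offset;	/* offset from that region's first minor */
};

/*
 * Map the n-th device (0-based) to a region and minor offset.
 * Returns 0 on success, -1 if n exceeds the supported maximum.
 */
int slot_for_device(unsigned int n, struct dev_slot *slot)
{
	if (n >= MAX_TOTAL_DEVICES)
		return -1;

	slot->overflow = n >= MAX_FIXED_DEVICES;
	slot->minor_offset = slot->overflow ? n - MAX_FIXED_DEVICES : n;
	return 0;
}
```

Devices 0 through 31 stay in the fixed region with unchanged offsets,
device 32 is the first one in the overflow region with offset 0, and
device 64 is rejected.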

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:41 -08:00
Alexander Chiang ddbd688301 IB/uverbs: use stack variable 'base' in ib_uverbs_add_one
This change is not useful by itself, but sets us up for a future change
that allows us to support more than IB_UVERBS_MAX_DEVICES in a system.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:40 -08:00
Alexander Chiang 38707980c4 IB/uverbs: Use stack variable 'devnum' in ib_uverbs_add_one
This change is not useful by itself, but it sets us up for a future
change that allows us to dynamically allocate device numbers in case
we have more than IB_UVERBS_MAX_DEVICES in the system.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:40 -08:00
Alexander Chiang 2a72f21226 IB/uverbs: Remove dev_table
dev_table's raison d'etre was to associate an inode back to a struct
ib_uverbs_device.

However, now that we've converted ib_uverbs_device to contain an
embedded cdev (instead of a *cdev), we can use the container_of()
macro and cast back to the containing device.

There's no longer any need for dev_table, so get rid of it.
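
Why an embedded cdev makes the table unnecessary can be sketched in
userspace.  The structures below are stubs invented for the example
(the real struct cdev and struct ib_uverbs_device have more members),
and container_of() is re-implemented locally.

```c
#include <stddef.h>

/* Userspace stand-ins; the real structures have more members. */
struct cdev_stub {
	int dummy;
};

struct uverbs_device_stub {
	int devnum;
	struct cdev_stub cdev;	/* embedded, not a pointer */
};

/* Local re-implementation of the well-known kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Given the embedded cdev, recover the structure that contains it. */
struct uverbs_device_stub *dev_from_cdev(struct cdev_stub *cdev)
{
	return container_of(cdev, struct uverbs_device_stub, cdev);
}
```

Since the address of the embedded member determines the address of its
container, no separate lookup table is needed to get from one to the
other.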

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:39 -08:00
Alexander Chiang 055422ddbb IB/uverbs: Convert *cdev to cdev in struct ib_uverbs_device
Instead of storing a pointer to a cdev, embed the entire struct cdev.

This change allows us to use the container_of() macro in
ib_uverbs_open() in a future patch.

This change increases the size of struct ib_uverbs_device to 168 bytes
across 3 cachelines from 80 bytes in 2 cachelines.  However, we
rearrange the members so that everything fits into the first cacheline
except for the struct cdev. Finally, we don't touch the cdev in any
fastpaths, so this change shouldn't negatively affect performance.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:39 -08:00
Al Viro 2c48b9c455 switch alloc_file() to passing struct path
... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Alexey Dobriyan a99bbaf5ee headers: remove sched.h from poll.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-04 15:05:10 -07:00
Jack Morgenstein b1b8afb833 IB/uverbs: Return ENOSYS for unimplemented commands (not EINVAL)
Since the original commit 883a99c7 ("[IB] uverbs: Add a mask of device
methods allowed for userspace"), the uverbs core returns EINVAL for
commands not implemented by a specific low-level driver.

This creates a problem that there is no way to tell the difference
between an unimplemented command and an implemented one which is
incorrectly invoked (which also returns EINVAL).

The fix is to have unimplemented commands return ENOSYS.
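
The distinction can be sketched with a small dispatcher.  This is an
illustration only: the command table, the handler and the function name
dispatch() are hypothetical, and only the idea of a per-device mask of
allowed commands comes from the commit referenced above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*cmd_handler)(void);

static int cmd_ok(void)
{
	return 0;
}

/* Hypothetical command table: slots 0 and 1 exist, slot 2 is a hole. */
static const cmd_handler cmd_table[] = { cmd_ok, cmd_ok, NULL };

#define CMD_TABLE_SIZE (sizeof(cmd_table) / sizeof(cmd_table[0]))

/*
 * Dispatch a command for a device whose supported commands are given
 * as a bitmask.  Unknown commands give -EINVAL; known commands that
 * this device does not implement give -ENOSYS.
 */
int dispatch(unsigned int cmd, uint64_t device_cmd_mask)
{
	if (cmd >= CMD_TABLE_SIZE || !cmd_table[cmd])
		return -EINVAL;

	if (!(device_cmd_mask & (UINT64_C(1) << cmd)))
		return -ENOSYS;

	return cmd_table[cmd]();
}
```

With the two checks returning different errors, a caller can tell "this
device does not implement the command" apart from "the command was
invoked incorrectly".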

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:24 -07:00
Roland Dreier 6276e08a9b IB: Use DEFINE_SPINLOCK() for static spinlocks
Rather than just defining static spinlock_t variables and then
initializing them later in init functions, simply define them with
DEFINE_SPINLOCK() and remove the calls to spin_lock_init().  This cleans
up the source a tad and also shrinks the compiled code; eg on x86-64:

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
function                                     old     new   delta
ib_uverbs_init                               336     326     -10
ib_mad_init_module                           147     137     -10
ib_sa_init                                   123     103     -20

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:23 -07:00
Al Viro 233e70f422 saner FASYNC handling on file close
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.

So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set.  And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-01 09:49:46 -07:00
Greg Kroah-Hartman 91bd418fdc device create: infiniband: convert device_create_drvdata to device_create
Now that device_create() has been audited, rename things back to the
original call to be sane.

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16 09:24:42 -07:00
Roland Dreier f3781d2e89 RDMA: Remove subversion $Id tags
They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Jonathan Corbet 2fceef397f Merge commit 'v2.6.26' into bkl-removal
2008-07-14 15:29:34 -06:00
Roland Dreier 5b2d281acb IB/uverbs: BKL is not needed for ib_uverbs_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-04 10:32:28 -06:00
Jonathan Corbet 057e7c7ff9 infiniband: more BKL pushdown
Be extra-cautious and protect the remaining open() functions.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:51 -06:00
Jack Morgenstein fb77bcef9f IB/uverbs: Fix check of is_closed flag in ib_uverbs_async_handler()
Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
files") changed the way that closed files are handled in the uverbs
code.  However, after the conversion, the is_closed flag is checked
incorrectly in ib_uverbs_async_handler().  As a result, no async
events are ever passed to applications.

Found by: Ronni Zimmerman <ronniz@mellanox.co.il>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-18 15:36:38 -07:00
Greg Kroah-Hartman 6c06aec248 IB: fix race in device_create
There is a race from when a device is created with device_create() and
then the drvdata is set with a call to dev_set_drvdata() in which a
sysfs file could be open, yet the drvdata will be NULL, causing all
sorts of bad things to happen.

This patch fixes the problem by using the new function,
device_create_drvdata().

Cc: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-20 13:31:55 -07:00
Tony Jones f4e91eb4a8 IB: convert struct class_device to struct device
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Roland Dreier a7dab9e887 IB/uverbs: Use alloc_file() instead of get_empty_filp()
Christoph Hellwig wants to unexport get_empty_filp(), which is an ugly
internal interface.  Change the modular user in ib_uverbs_alloc_event_file()
to use the better alloc_file() interface; this makes the code cleaner too.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier 1ae5c187ac IB/uverbs: Don't store struct file * for event files
The file member of struct ib_uverbs_event_file was only used to keep
track of whether the file had been closed or not.  The only thing we
ever did with the value was check if it was NULL or not.  Simplify the
code and get rid of the need to keep track of the struct file * we
allocate by replacing the file member with an is_closed member.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier 04d29b0ede IB/uverbs: Make ib_uverbs_release_event_file() static
ib_uverbs_release_event_file() is only used in uverbs_main.c, so make it
static to that file.  Also move the definition before the first use, so
a forward declaration is not needed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00
Roland Dreier f7c6a7b5d5 IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_umem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Michael S. Tsirkin f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Arjan van de Ven 2b8693c061 [PATCH] mark struct file_operations const 3
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Josef Sipek 1cfd6e648b [PATCH] struct path: convert infiniband
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:46 -08:00
Jack Morgenstein fd60ae404f IB/uverbs: Avoid a crash on device hot remove
Wait until all users have closed their device context before allowing
device unregistration to complete.  This prevents a crash caused by
referring to stale data structures.

A better solution would be to have a way to revoke contexts rather
than waiting for userspace to close the context, but that's a much
bigger change that will have to wait.  For now let's at least avoid
the crash.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 10:56:42 -07:00
Linus Torvalds 61b9175808 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/iser: iSER Kconfig and Makefile
  IB/iser: iSER handling of memory for RDMA
  IB/iser: iSER RDMA CM (CMA) and IB verbs interaction
  IB/iser: iSER initiator iSCSI PDU and TX/RX
  IB/iser: iSCSI iSER transport provider high level code
  IB/iser: iSCSI iSER transport provider header file
  IB/uverbs: Remove unnecessary list_del()s
  IB/uverbs: Don't free wr list when it's known to be empty
2006-06-25 16:07:58 -07:00
David Howells 454e2398be [PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers.  For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing.  In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

 (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
     pointer argument and return an integer, so most filesystems have to change
     very little.

 (*) If one of the convenience functions is not used, then get_sb() should
     normally call simple_set_mnt() to instantiate the vfsmount. This will
     always return 0, and so can be tail-called from get_sb().

 (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
     dcache upon superblock destruction rather than shrink_dcache_anon().

     This is required because the superblock may now have multiple trees that
     aren't actually bound to s_root, but that still need to be cleaned up. The
     currently called functions assume that the whole tree is rooted at s_root,
     and that anonymous dentries are not the roots of trees which results in
     dentries being left unculled.

     However, with the way NFS superblock sharing is currently set to be
     implemented, these assumptions are violated: the root of the filesystem is
     simply a dummy dentry and inode (the real inode for '/' may well be
     inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
     with child trees.

     [*] Anonymous until discovered from another tree.

 (*) The documentation has been adjusted, including the additional bit of
     changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Roland Dreier 9b8efc0242 IB/uverbs: Remove unnecessary list_del()s
In ib_uverbs_cleanup_ucontext(), when iterating through the lists of
objects, there's no reason to do list_del() to remove the objects,
since both the objects and the lists that contain them are about to be
freed anyway.  Since list_del() is a moderately big inline function,
getting rid of this extra work saves quite a bit of .text:

add/remove: 0/0 grow/shrink: 1/2 up/down: 3/-217 (-214)
function                                     old     new   delta
ib_uverbs_comp_handler                       225     228      +3
ib_uverbs_async_handler                      256     255      -1
ib_uverbs_close                              905     689    -216

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:47:27 -07:00
Roland Dreier 9ead190bfd IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex.  This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.

Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.
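
The locking pattern described above can be sketched in userspace.
This is a simplified illustration, not the driver's code: a fixed-size
array stands in for the idr, a C11 atomic_flag spinlock stands in for
the kernel spinlock, the reference count is a plain int guarded by that
lock, and the two-step deletion is not modelled.  All names are
hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define TABLE_SIZE 8

struct uobject {
	int refcount;	/* guarded by table_lock in this sketch */
};

static atomic_flag table_lock = ATOMIC_FLAG_INIT;
static struct uobject *table[TABLE_SIZE];

static void table_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void table_spin_unlock(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Publish an object under an id; the table holds the initial reference. */
int table_insert(unsigned int id, struct uobject *obj)
{
	int ret = -1;

	table_spin_lock();
	if (id < TABLE_SIZE && !table[id]) {
		obj->refcount = 1;
		table[id] = obj;
		ret = 0;
	}
	table_spin_unlock();
	return ret;
}

/* Hold the lock only long enough to find the object and take a reference. */
struct uobject *lookup_get(unsigned int id)
{
	struct uobject *obj = NULL;

	table_spin_lock();
	if (id < TABLE_SIZE && table[id]) {
		obj = table[id];
		obj->refcount++;
	}
	table_spin_unlock();
	return obj;
}

/* Drop one reference; returns 1 if it was the last. */
int uobject_put(struct uobject *obj)
{
	int last;

	table_spin_lock();
	last = (--obj->refcount == 0);
	table_spin_unlock();
	return last;
}
```

The work done on the object between lookup_get() and uobject_put()
happens outside the lock, so threads operating on different objects
only contend for the duration of the lookup itself.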

This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention.  However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.

Surprisingly, these changes even shrink the object code:

add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:44:49 -07:00
Dotan Barak 8bdb0e8632 IB/uverbs: Support for query SRQ from userspace
Add support to uverbs to handle querying userspace SRQs (shared
receive queues), including adding an ABI for marshalling requests and
responses.  The kernel midlayer already has the underlying
ib_query_srq() function.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:14 -08:00
Dotan Barak 7ccc9a24e0 IB/uverbs: Support for query QP from userspace
Add support to uverbs to handle querying userspace QPs (queue pairs),
including adding an ABI for marshalling requests and responses.  The
kernel midlayer already has the underlying ib_query_qp() function.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:14 -08:00
Roland Dreier a74cd4af0b IB: Whitespace cleanups
Remove trailing whitespace and fix indentation that uses spaces
instead of tabs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:13 -08:00
Roland Dreier 33b9b3ee97 IB: Add userspace support for resizing CQs
Add support to uverbs to handle resizing userspace CQs (completion
queues), including adding an ABI for marshalling requests and
responses.  The kernel midlayer already has ib_resize_cq().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:07 -08:00
Michael S. Tsirkin cc76e33ec9 IB/uverbs: Flush scheduled work before unloading module
uverbs might schedule work to clean up when a file is closed.  Make
sure that this work runs before allowing module text to go away.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-17 09:41:47 -08:00
Ingo Molnar 95ed644fd1 IB: convert from semaphores to mutexes
semaphore to mutex conversion by Ingo and Arjan's script.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Sanity-checked on real IB hardware ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-13 14:51:39 -08:00
Jack Morgenstein f4e401562c IB/uverbs: track multicast group membership for userspace QPs
uverbs needs to track which multicast groups each QP is
attached to, in order to properly detach when cleanup
is performed on device file close.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 16:57:01 -08:00
Roland Dreier de6eb66b56 [IB] kzalloc() conversions
Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35
source lines and about 500 bytes of text.
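
The shape of the conversion can be sketched with the userspace
analogue, malloc()+memset() versus calloc().  This is not kernel code:
struct item and both helper names are invented for the example, and
calloc() is used here only because kzalloc() is not available outside
the kernel.

```c
#include <stdlib.h>
#include <string.h>

struct item {
	int id;
	char name[16];
};

/* Before: allocate, then clear by hand. */
struct item *item_alloc_old(void)
{
	struct item *it = malloc(sizeof(*it));

	if (!it)
		return NULL;
	memset(it, 0, sizeof(*it));
	return it;
}

/* After: one call that returns zeroed memory. */
struct item *item_alloc_new(void)
{
	return calloc(1, sizeof(struct item));
}
```

Both helpers return an equivalent zero-filled object; the second form
drops the separate memset() call at every allocation site.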

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-02 07:23:14 -08:00
Roland Dreier 7162a3e0db [IB] uverbs: Avoid NULL pointer deref on CQ async event
Userspace CQs that have no completion event channel attached end up
with their cq_context set to NULL.  However, asynchronous events like
"CQ overrun" can still occur on such CQs, so add a uverbs_file member
to struct ib_ucq_object that we can follow to deliver these events.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-31 07:10:32 -08:00
Roland Dreier 4cce3390c9 [IB] fix up class_device_create() calls
Fix class_device_create() calls to match the new prototype which
takes a parent device pointer.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-28 16:38:15 -07:00
Roland Dreier 70a30e16a8 [IB] uverbs: Fix device lifetime problems
Move ib_uverbs module to using cdev_alloc() and class_device_create()
so that we can handle device lifetime properly.  Now we can make sure
we keep all of our data structures around until the last way to reach
them is gone.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-28 15:38:26 -07:00