OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Sunil Mushran	17104683d2	ocfs2: Update default cluster timeouts Lots of people are having trouble with the default timeouts, which are too low. These new values are derived from an informal survey taken on ocfs2-users, as well as data from bug reports. This should reduce the amount of cluster disconnects and subsequent fencing seen during normal workloads. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:45 -08:00
Jan Kara	634bf74d1e	ocfs2: printf fixes Explicitely convert loff_t to long long in printf. Just for sure... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:45 -08:00
Jan Kara	32c3c0e2e5	ocfs2: Use generic_file_llseek We should use generic_file_llseek() and not default_llseek() so that s_maxbytes gets properly checked when seeking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:45 -08:00
Jan Kara	d2849fb294	ocfs2: Safer read_inline_data() In ocfs2_read_inline_data() we should store file size in loff_t. Although the file size should fit in 32 bits we cannot be sure in case filesystem is corrupted. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:44 -08:00
Jan Kara	5fa0613ea5	ocfs2: Silence false lockdep warnings Create separate lockdep lock classes for system file's i_mutexes. They are used to guard allocations and similar things and thus rank differently than i_mutex of a regular file or directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:44 -08:00
Mark Fasheh	53fc622b9e	[PATCH 2/2] ocfs2: cluster aware flock() Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new mount option, "localflocks" is added so that users can revert to old functionality as need be. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:43 -08:00
Mark Fasheh	cf8e06f1a8	[PATCH 1/2] ocfs2: add flock lock type This adds a new dlmglue lock type which is intended to back flock() requests. Since these locks are driven from userspace, usage rules are much more liberal than the typical Ocfs2 internal cluster lock. As a result, we can't make use of most dlmglue features - lock caching and lock level optimizations in particular. Additionally, userspace is free to deadlock itself, so we have to deal with that in the same way as the rest of the kernel - by allowing a signal to abort a lock request. In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock() does it's own dlm coordination. We still use the same helper functions though, so duplicated code is kept to a minimum. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:43 -08:00
Sunil Mushran	2fbe8d1ebe	ocfs2: Local alloc window size changeable via mount option Local alloc is a performance optimization in ocfs2 in which a node takes a window of bits from the global bitmap and then uses that for all small local allocations. This window size is fixed to 8MB currently. This patch allows users to specify the window size in MB including disabling it by passing in 0. If the number specified is too large, the fs will use the default value of 8MB. mount -o localalloc=X /dev/sdX /mntpoint Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:43 -08:00
Mark Fasheh	d147b3d630	ocfs2: Support commit= mount option Mostly taken from ext3. This allows the user to set the jbd commit interval, in seconds. The default of 5 seconds stays the same, but now users can easily increase the commit interval. Typically, this would be increased in order to benefit performance at the expense of data-safety. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:42 -08:00
Mark Fasheh	0957f00796	ocfs2: Add missing permission checks Check that an online resize is being driven by a user with permission to change system resource limits. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:05:19 -08:00
Tao Ma	7909f2bf83	[PATCH 2/2] ocfs2: Implement group add for online resize This patch adds the ability for a userspace program to request that a properly formatted cluster group be added to the main allocation bitmap for an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD. On a high level, this is similar to ext3, but we use a different ioctl as the structure which has to be passed through is different. During an online resize, tunefs.ocfs2 will format any new cluster groups which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on each one. Kernel verifies that the core cluster group information is valid and then does the work of linking it into the global allocation bitmap. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 15:04:24 -08:00
Tao Ma	d659072f73	[PATCH 1/2] ocfs2: Add group extend for online resize This patch adds the ability for a userspace program to request an extend of last cluster group on an Ocfs2 file system. The request is made via ioctl, OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but is obviously Ocfs2 specific. tunefs.ocfs2 would call this for an online-resize operation if the last cluster group isn't full. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:53:35 -08:00
Hans de Goede	23c3e290fb	[SCSI] usbstorage: use last_sector_bug flag universally This patch sets the last_sector_bug flag to 1 for all USB disks. This is needed to makes the cardreader on various HP multifunction printers work. Since the performance impact is negible we set this flag for all USB disks to avoid an unusual_devs.h nightmare. Signed-off-by: Hans de Goede <j.w.r.degoede@hhs.nl> Acked-by: Matthew Dharm <mdharm-usb@one-eyed-alien.net> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-01-25 16:50:31 -06:00
Tao Ma	7f68fc2821	ocfs2: Reserve ioctl range We need to reserve at least two ioctls for online-resize. Reserve a small range of ioctls for Ocfs2 use in Documentation/ioctl-number.txt. This should give us enough room for future growth. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:57 -08:00
Tao Ma	e9d578a8f2	ocfs2: Initalize bitmap_cpg of ocfs2_super to be the maximum. This value is initialized from global_bitmap->id2.i_chain.cl_cpg. If there is only 1 group, it will be equal to the total clusters in the volume. So as for online resize, it should change for all the nodes in the cluster. It isn't easy and there is no corresponding lock for it. bitmap_cpg is only used in 2 areas: 1. Check whether the suballoc is too large for us to allocate from the global bitmap, so it is little used. And now the suballoc size is 2048, it rarely meet this situation and the check is almost useless. 2. Calculate which group a cluster belongs to. We use it during truncate to figure out which cluster group an extent belongs too. But we should be OK if we increase it though as the cluster group calculated shouldn't change and we only ever have a small bitmap_cpg on file systems with a single cluster group. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:54 -08:00
Herbert Xu	946fef4e14	[CRYPTO] hifn795x: Disallow built-in hifn795x when HW_RANDOM is m Currently it is possible to select HW_RANDOM as a module and have hifn795x built-in. This causes a build problem because hifn795x will then call hwrng_register which isn't built-in. This patch introduces a new config option to control the hifn795x RNG which lets us avoid this problem. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2008-01-26 09:48:44 +11:00
Mark Fasheh	1252c434e3	ocfs2: Documentation update Remove 'readpages' from the list in ocfs2.txt. Instead of having two identical lists, I just removed the list in the OCFS2 section of fs/Kconfig and added a pointer to Documentation/filesystems/ocfs2.txt. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:42 -08:00
Mark Fasheh	628a24f5bd	ocfs2: Readpages support Add ->readpages support to Ocfs2. This is rather trivial - all it required is a small update to ocfs2_get_block (for mapping full extents via b_size) and an ocfs2_readpages() function which partially mirrors ocfs2_readpage(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:48:12 -08:00
Joel Becker	d69a3ad6a0	dlm: Split lock mode and flag constants into a sharable header. This allows others to use the DLM constants without being tied to the function API of fs/dlm. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:46:04 -08:00
Mark Fasheh	e63aecb651	ocfs2: Rename ocfs2_meta_[un]lock Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:46:01 -08:00
Mark Fasheh	c934a92d05	ocfs2: Remove data locks The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:57 -08:00
Mark Fasheh	f1f540688e	ocfs2: Add data downconvert worker to inode lock In order to extend inode lock coverage to inode data, we use the same data downconvert worker with only a small modification to only do work for regular files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:54 -08:00
Mark Fasheh	34d024f843	ocfs2: Remove mount/unmount votes The node maps that are set/unset by these votes are no longer relevant, thus we can remove the mount and umount votes. Since those are the last two remaining votes, we can also remove the entire vote infrastructure. The vote thread has been renamed to the downconvert thread, and the small amount of functionality related to managing it has been moved into fs/ocfs2/dlmglue.c. All references to votes have been removed or updated. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:45:34 -08:00
Linus Torvalds	99f1c97dbd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (81 commits) RDMA/cxgb3: Fix the T3A workaround checks IB/ipath: Remove unnecessary cast IPoIB: Constify seq_operations function pointer tables RDMA/cxgb3: Mark QP as privileged based on user capabilities RDMA/cxgb3: Fix page shift calculation in build_phys_page_list() RDMA/cxgb3: Flush the receive queue when closing IB/ipath: Trivial simplification of ipath_make_ud_req() IB/mthca: Update latest "native Arbel" firmware revision IPoIB: Remove redundant check of netif_queue_stopped() in xmit handler IB/ipath: Add mappings from HW register to PortInfo port physical state IB/ipath: Changes to support PIO bandwidth check on IBA7220 IB/ipath: Minor cleanup of unused fields and chip-specific errors IB/ipath: New sysfs entries to control 7220 features IB/ipath: Add new chip-specific functions to older chips, consistent init IB/ipath: Remove unused MDIO interface code IB/ehca: Prevent RDMA-related connection failures on some eHCA2 hardware IB/ehca: Add "port connection autodetect mode" IB/ehca: Define array to store SMI/GSI QPs IB/ehca: Remove CQ-QP-link before destroying QP in error path of create_qp() IB/iser: Add change_queue_depth method ...	2008-01-25 14:41:24 -08:00
Mark Fasheh	6f7b056ea9	ocfs2: Remove fs dependency on ocfs2_heartbeat module Now that the dlm exposes domain information to us, we don't need generic node up / node down callbacks. And since the DLM is only telling us when a node goes down unexpectedly, we no longer need to optimize away node down callbacks via the umount map. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:36:40 -08:00
Mark Fasheh	6561168cb4	ocfs2_dlm: Call node eviction callbacks from heartbeat handler With this, a dlm client can take advantage of the group protocol in the dlm to get full notification whenever a node within the dlm domain leaves unexpectedly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-01-25 14:36:40 -08:00
Steve Wise	8176d297c7	RDMA/cxgb3: Fix the T3A workaround checks Correctly work around T3A issues by checking "hwtype != T3A" instead of "hwtype == T3B". This will be needed for new hardware types. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:47 -08:00
Jan Engelhardt	f7fca1e8a8	IB/ipath: Remove unnecessary cast Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:46 -08:00
Jan Engelhardt	1cf18d5aab	IPoIB: Constify seq_operations function pointer tables Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:46 -08:00
Steve Wise	c6b5b50474	RDMA/cxgb3: Mark QP as privileged based on user capabilities This is needed to support zero-stag properly. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:45 -08:00
Steve Wise	d08ca26cee	RDMA/cxgb3: Fix page shift calculation in build_phys_page_list() The existing logic incorrectly maps this buffer list: 0: addr 0x10001000, size 0x1000 1: addr 0x10002000, size 0x1000 To this bogus page list: 0: 0x10000000 1: 0x10002000 The shift calculation must also take into account the address of the first entry masked by the page_mask as well as the last address+size rounded up to the next page size. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:45 -08:00
Steve Wise	856b592504	RDMA/cxgb3: Flush the receive queue when closing - for kernel mode cqs, call event notification handler when flushing. - flush QP when moving from RTS -> CLOSING. - fix logic to identify a kernel mode qp. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:45 -08:00
Ralph Campbell	4e1e93a418	IB/ipath: Trivial simplification of ipath_make_ud_req() Move the increment of s_hdrwords into the existing if block that tests if we're doing a send with immediate, to save one test of the opcode. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:44 -08:00
Roland Dreier	950529e5c6	IB/mthca: Update latest "native Arbel" firmware revision Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:44 -08:00
Krishna Kumar	48fe5e594c	IPoIB: Remove redundant check of netif_queue_stopped() in xmit handler qdisc_run() now tests for queue_stopped() before calling __qdisc_run(), and the same check is done in every iteration of __qdisc_run(), so another check is not required in the driver xmit. This means that ipoib_start_xmit() no longer needs to test netif_queue_stopped(); the test was added to fix earlier kernels, where the networking stack did not guarantee that the xmit method of an LLTX driver would not be called after the queue was stopped, but current kernels do provide this guarantee. To validate, I put a debug in the TX_BUSY path which never hit with 64 threads running overnight exercising this code a few 100 million times. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:44 -08:00
Ralph Campbell	3d68ea3261	IB/ipath: Add mappings from HW register to PortInfo port physical state Add new mappings from port physical state (a HW register value) to the IB SubnGet(PortInfo) port physical state. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:44 -08:00
Dave Olson	6ac50727bd	IB/ipath: Changes to support PIO bandwidth check on IBA7220 The IBA7220 uses a count-based triggering mechanism, and therefore can't use the same bandwidth verification mechanism as older chips. To support the 7220, allow enabling and disabling armlaunch errors on application request. Minor robustness improvements as well. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:43 -08:00
Dave Olson	ddb70c83a5	IB/ipath: Minor cleanup of unused fields and chip-specific errors Clean up some unused header fields, minor related cleanup. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:43 -08:00
Michael Albaugh	359193ef43	IB/ipath: New sysfs entries to control 7220 features IBA7220 includes many more configurable IB settings. Getting/setting these is now grouped into a pair of chip specific functions accessed via function pointers. Provide sysfs access to these settings. Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:32 -08:00
Dave Olson	c4bce8032e	IB/ipath: Add new chip-specific functions to older chips, consistent init This adds the new (sometimes empty) chip-specific functions to the older chips, and makes the initialization and related functions consistent across all 3 chips. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:45 -08:00
Dave Olson	7387273307	IB/ipath: Remove unused MDIO interface code This code has been unused for some time, but still had leftovers from when it was used. Signed-off-by: Dave Olson <dave.olson@qlogic.com Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:44 -08:00
Joachim Fenkes	2ec8e66241	IB/ehca: Prevent RDMA-related connection failures on some eHCA2 hardware Some HW revisions of eHCA2 may cause an RC connection to break if they received RDMA Reads over that connection before. This can be prevented by assuring that, after the first RDMA Read, the QP receives a new RDMA Read every few million link packets. Include code into the driver that inserts an empty (size 0) RDMA Read into the message stream every now and then if the consumer doesn't post them frequently enough. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:44 -08:00
Hoang-Nam Nguyen	bbdd267ef2	IB/ehca: Add "port connection autodetect mode" This patch enhances ehca with a capability to "autodetect" the ports being connected physically. In order to utilize that function the module option nr_ports must be set to -1 (default is 2 - two ports). This feature is experimental and will made the default later. More detail: If the user connects only one port to the switch, current code requires 1) port one to be connected and 2) module option nr_ports=1 to be given. If autodetect is enabled, ehca will not wait at creation of the GSI QP for the respective port to become active. Since firmware does not accept modify_qp() while the port is down at initialization, we need to cache all calls to modify_qp() for the SMI/GSI QP and just return a good return code. When a port is activated and we get a PORT_ACTIVE event, we replay the cached modify-qp() parms and re-trigger any posted recv WRs. Only then do we forward the PORT_ACTIVE event to registered clients. The result of this autodetect patch is that all ports will be accessible by the users. Depending on their respective cabling only those ports that are connected properly will become operable. If a user tries to modify a regular QP of a non-connected port, modify_qp() will fail. Furthermore, ibv_devinfo should show the port state accordingly. Note that this patch primarily improves the loading behaviour of ehca. If the cable is removed while the driver is operating and plugged in again, firmware will handle that properly by sending an appropriate async event. Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:44 -08:00
Hoang-Nam Nguyen	b8b50e353b	IB/ehca: Define array to store SMI/GSI QPs Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:44 -08:00
Hoang-Nam Nguyen	0c86e280fe	IB/ehca: Remove CQ-QP-link before destroying QP in error path of create_qp() Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:43 -08:00
Erez Zilber	6410627eb9	IB/iser: Add change_queue_depth method Add a .change_queue_depth handler to the scsi_host_template in the iSER driver. iscsi_change_queue_depth was added to iscsi_tcp in order to solve the problem of queue depth which was too high for some targets. It is also applicable for iSER. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:43 -08:00
Erez Zilber	a4ef1451df	IB/iser: Print information about unhandled RDMA CM events Some RDMA CM events are not supported or not handled in iSER. This patch adds some info (printk) for the user about them. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:43 -08:00
Olaf Kirch	a3cd7d9070	IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs When a FMR is released via ib_fmr_pool_unmap(), the FMR usually ends up on the free_list rather than the dirty_list (because we allow a certain number of remappings before actually requiring a flush). However, ib_fmr_batch_release() only looks at dirty_list when flushing out old mappings. This means that when ib_fmr_pool_flush() is used to force a flush of the FMR pool, some dirty FMRs that have not reached their maximum remap count will not actually be flushed. Fix this by flushing all FMRs that have been used at least once in ib_fmr_batch_release(). Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:43 -08:00
Olaf Kirch	a656eb758f	IB/fmr_pool: Flush serial numbers can get out of sync Normally, the serial numbers for flush requests and flushes executed for an FMR pool should be in sync. However, if the FMR pool flushes dirty FMRs because the dirty_watermark was reached, we wake up the cleanup thread and let it do its stuff. As a side effect, the cleanup thread increments pool->flush_ser, which leaves it one higher than pool->req_ser. The next time the user calls ib_flush_fmr_pool(), the cleanup thread will be woken up, but ib_flush_fmr_pool() won't wait for the flush to complete because flush_ser is already past req_ser. This means the FMRs that the user expects to be flushed may not have all been flushed when the function returns. Fix this by telling the cleanup thread to do work exclusively by incrementing req_ser, and by moving the comparison of dirty_len and dirty_watermark into ib_fmr_pool_unmap(). Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>	2008-01-25 14:15:42 -08:00
Roland Dreier	2fe7e6f7c9	IB/umad: Simplify and fix locking In addition to being overly complex, the locking in user_mad.c is broken: there were multiple reports of deadlocks and lockdep warnings. In particular it seems that a single thread may end up trying to take the same rwsem for reading more than once, which is explicitly forbidden in the comments in <linux/rwsem.h>. To solve this, we change the locking to use plain mutexes instead of rwsems. There is one mutex per open file, which protects the contents of the struct ib_umad_file, including the array of agents and list of queued packets; and there is one mutex per struct ib_umad_port, which protects the contents, including the list of open files. We never hold the file mutex across calls to functions like ib_unregister_mad_agent(), which can call back into other ib_umad code to queue a packet, and we always hold the port mutex as long as we need to make sure that a device is not hot-unplugged from under us. This even makes things nicer for users of the -rt patch, since we remove calls to downgrade_write() (which is not implemented in -rt). Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:42 -08:00

1 2 3 4 5 ...

77206 Commits All Branches Search

77206 Commits

All Branches