OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Alexander Schmidt	6c02eed930	IB/ehca: Rename goto label in ehca_poll_cq_one() Rename the "poll_cq_one_read_cqe" goto label to what it actually does, namely "repoll". Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-12 11:34:58 -07:00
Alexander Schmidt	51ad241af4	IB/ehca: Update qp_state on cached modify_qp() Since the introduction of the port auto-detect mode for ehca, calls to modify_qp() may be cached in the device driver when the ports are not activated yet. When a modify_qp() call is cached, the qp state remains untouched until the port is activated, which will leave the qp in the reset state. In the reset state, however, it is not allowed to post SQ WQEs, which confuses applications like ib_mad. The solution for this problem is to immediately set the qp state as requested by modify_qp(), even when the call is cached. Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-12 11:34:58 -07:00
David J. Wilder	b1404069f6	IPoIB/cm: Use vmalloc() to allocate rx_rings There are users that are running UDP applications that require a large receive queue size in order to get good performance. To prevent allocation failures for rx_rings when using non-SRQ mode and large recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to allocate rx_rings. Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-08 15:51:29 -07:00
Linus Torvalds	273b257839	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/mad: Test ib_create_send_mad() return with IS_ERR(), not == NULL IB/mlx4: Allow 4K messages for UD QPs mlx4_core: Add ethernet fields to CQE struct IB/ipath: Fix printk format warnings RDMA/cxgb3: Fix deadlock initializing iw_cxgb3 device RDMA/cxgb3: Fix up MW access rights RDMA/cxgb3: Fix QP capabilities RDMA/cma: Remove padding arrays by using struct sockaddr_storage IB/ipath: Use unsigned long for irq flags IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr()	2008-08-07 18:14:07 -07:00
Roland Dreier	06a91a02e9	Merge branches 'cma', 'cxgb3', 'ipath', 'ipoib', 'mad' and 'mlx4' into for-linus	2008-08-07 14:12:03 -07:00
Julien Brunel	cd55ef5a10	IB/mad: Test ib_create_send_mad() return with IS_ERR(), not == NULL In case of error, the function ib_create_send_mad() returns an ERR pointer, but never returns a NULL pointer. So testing the return value for error should be done with IS_ERR, not by comparing with NULL. A simplified version of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @correct_null_test@ expression x,E; statement S1, S2; @@ x = ib_create_send_mad(...) <... when != x = E if ( ( - x@p2 != NULL + ! IS_ERR ( x ) \| - x@p2 == NULL + IS_ERR( x ) ) ) S1 else S2 ...> ? x = E; // </smpl> Signed-off-by: Julien Brunel <brunel@diku.dk> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-07 14:11:56 -07:00
Alex Naslednikov	6e0d733d92	IB/mlx4: Allow 4K messages for UD QPs Current code limits the max message size to 2K for UD QPs, while MTU might be as big as 4K. This patch sets the maximum message size to 4K, which is needed for UD to work correctly on fabrics with a 4K MTU. Signed-off-by: Alex Naslednikov <xalex@mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-07 14:06:50 -07:00
Yevgeny Petrilin	f780a9f119	mlx4_core: Add ethernet fields to CQE struct Add ethernet-related fields to struct mlx4_cqe so that the mlx4_en ethernet NIC driver can share the same definition. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-06 20:14:06 -07:00
Alexander Beregalov	70117b9e86	IB/ipath: Fix printk format warnings ipath_driver.c:1260: warning: format '%Lx' expects type 'long long unsigned int', but argument 6 has type 'long unsigned int' ipath_driver.c:1459: warning: format '%Lx' expects type 'long long unsigned int', but argument 4 has type 'u64' ipath_intr.c:358: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64' ipath_intr.c:358: warning: format '%Lu' expects type 'long long unsigned int', but argument 6 has type 'u64' ipath_intr.c:1119: warning: format '%Lx' expects type 'long long unsigned int', but argument 5 has type 'u64' ipath_intr.c:1119: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64' ipath_intr.c:1123: warning: format '%Lx' expects type 'long long unsigned int', but argument 3 has type 'u64' ipath_intr.c:1130: warning: format '%Lx' expects type 'long long unsigned int', but argument 4 has type 'u64' ipath_iba7220.c:1032: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' ipath_iba7220.c:1045: warning: format '%llX' expects type 'long long unsigned int', but argument 3 has type 'u64' ipath_iba7220.c:2506: warning: format '%Lu' expects type 'long long unsigned int', but argument 4 has type 'u64' Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-04 11:12:18 -07:00
Steve Wise	be43324d8b	RDMA/cxgb3: Fix deadlock initializing iw_cxgb3 device Running 'ifconfig up' on the cxgb3 interface with iw_cxgb3 loaded causes a deadlock. The rtnl lock is already held in this path. The function fw_supports_fastreg() was introduced in 2.6.27 to conditionally set the IB_DEVICE_MEM_MGT_EXTENSIONS bit iff the firmware was at 7.0 or greater, and this function also acquires the rtnl lock, which thus causes a deadlock. Further, if iw_cxgb3 is loaded _after_ the nic interface is brought up, then the deadlock does not occur and therefore fw_supports_fastreg() does need to grab the rtnl lock in that path. It turns out this code is all useless anyway. The low level driver will NOT allow the open if the firmware isn't 7.0, so iw_cxgb3 can always set the MEM_MGT_EXTENSIONS bit. Simplify... Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-04 11:08:37 -07:00
Steve Wise	1c355a6e80	RDMA/cxgb3: Fix up MW access rights - MWs don't have local read/write permissions. - Set the MW_BIND enabled bit if a MR has MW_BIND access. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-04 11:05:43 -07:00
Steve Wise	5f0f66b022	RDMA/cxgb3: Fix QP capabilities - Set the stag0 and fastreg capability bits only for kernel qps. - QP_PRIV flag is no longer used, so don't set it. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-04 11:04:42 -07:00
Roland Dreier	3f44675439	RDMA/cma: Remove padding arrays by using struct sockaddr_storage There are a few places where the RDMA CM code handles IPv6 by doing struct sockaddr addr; u8 pad[sizeof(struct sockaddr_in6) - sizeof(struct sockaddr)]; This is fragile and ugly; handle this in a better way with just struct sockaddr_storage addr; [ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to switch to struct sockaddr_storage and get rid of padding arrays in struct rdma_addr. ] Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-08-04 11:02:14 -07:00
Stephen Rothwell	b8b572e101	powerpc: Move include files to arch/powerpc/include/asm from include/asm-powerpc. This is the result of a mkdir arch/powerpc/include/asm git mv include/asm-powerpc/* arch/powerpc/include/asm Followed by a few documentation/comment fixups and a couple of places where <asm-powerpc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-04 12:02:00 +10:00
Niels de Vos	61a2d07d3f	Remove newline from the description of module parameters Some module parameters with only one line have the '\n' at the end of the description. This is not needed nor wanted as after the description the type (i.e. int) is followed by a newline. Some modules contain a multi-line description, these are not affected by this patch. Signed-off-by: Niels de Vos <niels.devos@wincor-nixdorf.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: Ed L. Cashin <ecashin@coraid.com> Cc: Dave Airlie <airlied@linux.ie> Cc: Roland Dreier <rolandd@cisco.com> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-01 12:46:41 -07:00
Vegard Nossum	52fd8ca6ad	IB/ipath: Use unsigned long for irq flags A few functions in the ipath driver incorrectly use unsigned int to hold irq flags for spin_lock_irqsave(). This patch was generated using the Coccinelle framework with the following semantic patch: @@ expression lock; identifier flags; expression subclass; @@ - unsigned int flags; + unsigned long flags; ... <+... ( spin_lock_irqsave(lock, flags) \| _spin_lock_irqsave(lock) \| spin_unlock_irqrestore(lock, flags) \| _spin_unlock_irqrestore(lock, flags) \| read_lock_irqsave(lock, flags) \| _read_lock_irqsave(lock) \| read_unlock_irqrestore(lock, flags) \| _read_unlock_irqrestore(lock, flags) \| write_lock_irqsave(lock, flags) \| _write_lock_irqsave(lock) \| write_unlock_irqrestore(lock, flags) \| _write_unlock_irqrestore(lock, flags) \| spin_lock_irqsave_nested(lock, flags, subclass) \| _spin_lock_irqsave_nested(lock, subclass) \| spin_unlock_irqrestore(lock, flags) \| _spin_unlock_irqrestore(lock, flags) \| _raw_spin_lock_flags(lock, flags) \| __raw_spin_lock_flags(lock, flags) ) ...+> Cc: Ralph Campbell <ralph.campbell@qlogic.com> Cc: Julia Lawall <julia@diku.dk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-30 09:29:06 -07:00
Roland Dreier	e08198169e	IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr() wr->sg_list should be set to the sge pointer passed in, not priv->cm.rx_sge. Reported-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-30 07:21:46 -07:00
Linus Torvalds	8be1a6d6c7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files mlx4_core: Add VLAN tag field to WQE control segment struct RDMA/nes: CM connection setup/teardown rework IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG IPoIB/cm: Connected mode is no longer EXPERIMENTAL RDMA/ucm: BKL is not needed for ib_ucm_open() RDMA/ucma: BKL is not needed for ucma_open()	2008-07-26 20:40:36 -07:00
Roland Dreier	cc9969c967	Merge branches 'bkl-removal', 'ipoib', 'mlx4' and 'nes' into for-linus	2008-07-26 13:59:47 -07:00
FUJITA Tomonori	8d8bb39b9e	dma-mapping: add the device argument to dma_mapping_error() Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. [akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
Jack Morgenstein	51a379d0c8	mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files Update existing Mellanox copyright lines to 2008, and add such lines to files where they are missing. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-25 10:32:52 -07:00
Faisal Latif	6492cdf3a2	RDMA/nes: CM connection setup/teardown rework Major rework of CM connection setup/teardown. We had a number of issues with MPI applications not starting/terminating properly over time. With these changes we were able to run longer on larger clusters. * Remove memory allocation from nes_connect() and nes_cm_connect(). * Fix mini_cm_dec_refcnt_listen() when destroying listener. * Remove unnecessary code from schedule_nes_timer() and nes_cm_timer_tick(). * Functionalize mini_cm_recv_pkt() and process_packet(). * Clean up cm_node->ref_count usage. * Reuse skbs if available. Signed-off-by: Faisal Latif <flatif@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-24 20:50:45 -07:00
Roland Dreier	9905922446	IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG The help text for INFINIBAND_IPOIB_DEBUG refers to "ipoib_debugfs," which no longer exists. Correct this to talk about the files under debugfs that are really created. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-24 20:37:25 -07:00
Roland Dreier	99c3a5a9e3	IPoIB/cm: Connected mode is no longer EXPERIMENTAL Connected mode is now tested and used by lots of people. No need to hide it under CONFIG_EXPERIMENTAL. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-24 20:37:25 -07:00
Roland Dreier	5ba18b186c	RDMA/ucm: BKL is not needed for ib_ucm_open() Remove explicit cycle_kernel_lock() call and document why the code is safe. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-24 20:36:59 -07:00
Roland Dreier	f7a6117ee5	RDMA/ucma: BKL is not needed for ucma_open() Remove explicit lock_kernel() calls and document why the code is safe. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-24 20:36:59 -07:00
Linus Torvalds	5c402355ad	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: MAINTAINERS: Remove Glenn Streiff from NetEffect entry mlx4_core: Improve error message when not enough UAR pages are available IB/mlx4: Add support for memory management extensions and local DMA L_Key IB/mthca: Keep free count for MTT buddy allocator mlx4_core: Keep free count for MTT buddy allocator mlx4_code: Add missing FW status return code IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg mlx4_core: Add module parameter to enable QoS support RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one() IB/ehca: Release mutex in error path of alloc_small_queue_page() IB/ehca: Use default value for Local CA ACK Delay if FW returns 0 IB/ehca: Filter PATH_MIG events if QP was never armed IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event	2008-07-24 12:56:07 -07:00
Roland Dreier	2cc177364e	Merge branches 'bkl-removal', 'cma', 'ehca', 'for-2.6.27', 'mlx4', 'mthca' and 'nes' into for-linus	2008-07-24 08:38:47 -07:00
Linus Torvalds	26dcce0fab	Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) NR_CPUS: Replace NR_CPUS in speedstep-centrino.c cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP NR_CPUS: Replace NR_CPUS in cpufreq userspace routines NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target cpumask: Provide a generic set of CPUMASK_ALLOC macros cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr Revert "cpumask: introduce new APIs" cpumask: make for_each_cpu_mask a bit smaller net: Pass reference to cpumask variable in net/sunrpc/svc.c ... Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually	2008-07-23 18:37:44 -07:00
Roland Dreier	95d04f0735	IB/mlx4: Add support for memory management extensions and local DMA L_Key Add support for the following operations to mlx4 when device firmware supports them: - Send with invalidate and local invalidate send queue work requests; - Allocate/free fast register MRs; - Allocate/free fast register MR page lists; - Fast register MR send queue work requests; - Local DMA L_Key. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-23 08:12:26 -07:00
Roland Dreier	e8bb4beb2b	IB/mthca: Keep free count for MTT buddy allocator MTT entries are allocated with a buddy allocator, which just keeps bitmaps for each level of the buddy table. However, all free space starts out at the highest order, and small allocations start scanning from the lowest order. When the lowest order tables have no free space, this can lead to scanning potentially millions of bits before finding a free entry at a higher order. We can avoid this by just keeping a count of how many free entries each order has, and skipping the bitmap scan when an order is completely empty. This provides a nice performance boost for a negligible increase in memory usage. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:20:05 -07:00
Roland Dreier	47b374752a	IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg Make the struct name consistent with other WQE segment struct types defined in <linux/mlx4/qp.h>. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:19:39 -07:00
Dotan Barak	1ca8d15619	RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes Remove IB_ACCESS_LOCAL_WRITE from qp.qp_access_flags because this attribute is only used to set remote permissions. Signed-off-by: Dotan Barak <dotanba@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:18:34 -07:00
Or Gerlitz	01b3fc8b15	IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures Print the return code of ib_sa_path_rec_get() if it fails to help debug errors. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:18:34 -07:00
Ralph Campbell	64b784b583	IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one() If update_sm_ah() fails, it leaves the port's sm_ah as NULL. Then if the device or module is removed, ib_sa_remove_one() will dereference a NULL pointer when it calls kref_put(). Fix this by testing if sm_ah is NULL before dropping the reference. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:18:33 -07:00
Julia Lawall	1a867c33bb	IB/ehca: Release mutex in error path of alloc_small_queue_page() The pd->lock mutex is released on a successful return, so it should be released on an error return as well. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression l; @@ mutex_lock(l); ... when != mutex_unlock(l) when any when strict ( if (...) { ... when != mutex_unlock(l) + mutex_unlock(l); return ...; } \| mutex_unlock(l); ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:18:10 -07:00
Joachim Fenkes	593e4d4a05	IB/ehca: Use default value for Local CA ACK Delay if FW returns 0 Some firmware versions report a Local CA ACK Delay of 0. In that case, return a more sensible default value of 12 (-> 16 msec) instead. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:18:08 -07:00
Joachim Fenkes	5b673b71c8	IB/ehca: Filter PATH_MIG events if QP was never armed Certain firmware versions sometimes cause spurious PATH_MIG events to occur during QP creation. Filter these events by making sure PATH_MIG events are only handed down when they actually make sense (i.e. when the QP has been armed at least once). Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:18:07 -07:00
Or Gerlitz	2f5de15128	IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event Enhance iser to act upon notification on network stack changes that make its RDMA connection unaligned with the link used by the stack for the <src,dst> IPs used to establish the connection. When RDMA_CM_EVENT_ADDR_CHANGE arrives, just disconnect the connection, assuming that the user space iscsid daemon will reconnect, and the new connection will be aligned with the IP stack. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:16:21 -07:00
Amir Vadai	38ca83a588	RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event Consumers that want to re-use their QPs in new connections need to know when the QP has exited the timewait state. Report the timewait event through the rdma_cm. Signed-off-by: Amir Vadai <amirv@mellanox.co.il> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:14:23 -07:00
Or Gerlitz	dd5bdff83b	RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event Add an RDMA_CM_EVENT_ADDR_CHANGE event that can be used by rdma-cm consumers that wish to have their RDMA sessions always use the same links (e.g. <hca/port>) as the IP stack does. In the current code, this does not happen when bonding is used and a fail-over has happened but the IB link used by an already existing session is operating fine. Use the netevent notification for sensing that a change has happened in the IP stack, then scan the rdma-cm ID list to see if there is an ID that is "misaligned" with respect to the IP stack, and deliver RDMA_CM_EVENT_ADDR_CHANGE for this ID. The consumer can act on the event or just ignore it. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-22 14:14:22 -07:00
Greg Kroah-Hartman	110cf374a8	infiniband: make cm_device use a struct device and not a kobject. This object really should be a struct device, or at least contain a pointer to a struct device, as it is trying to create a separate device tree outside of the main device tree. This patch fixes this problem. It is needed for the class core rework that is being done in the driver core. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:49 -07:00
Greg Kroah-Hartman	d4c4196f24	infiniband: rename "device" to "ib_device" in cm_device This pointer really is a struct ib_device, not a struct device, so name it properly to help prevent confusion. This makes the followon patch in this series much smaller and easier to understand as well. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:49 -07:00
Greg Kroah-Hartman	c76d3d28c3	device create: infiniband: convert device_create to device_create_drvdata device_create() is race-prone, so use the race-free device_create_drvdata() instead as device_create() is going away. Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:43 -07:00
Ingo Molnar	eb6a12c242	Merge branch 'linus' into cpus4096-for-linus Conflicts: net/sunrpc/svc.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-21 17:19:50 +02:00
Ingo Molnar	bb2c018b09	Merge branch 'linus' into cpus4096 Conflicts: drivers/acpi/processor_throttling.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:00:54 +02:00
David S. Miller	49997d7515	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/powerpc/booting-without-of.txt drivers/atm/Makefile drivers/net/fs_enet/fs_enet-main.c drivers/pci/pci-acpi.c net/8021q/vlan.c net/iucv/iucv.c	2008-07-18 02:39:39 -07:00
Linus Torvalds	89a93f2f48	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits) [SCSI] scsi_dh: fix kconfig related build errors [SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of [SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h [SCSI] make struct scsi_{host,target}_type static [SCSI] fix locking in host use of blk_plug_device() [SCSI] zfcp: Cleanup external header file [SCSI] zfcp: Cleanup code in zfcp_erp.c [SCSI] zfcp: zfcp_fsf cleanup. [SCSI] zfcp: consolidate sysfs things into one file. [SCSI] zfcp: Cleanup of code in zfcp_aux.c [SCSI] zfcp: Cleanup of code in zfcp_scsi.c [SCSI] zfcp: Move status accessors from zfcp to SCSI include file. [SCSI] zfcp: Small QDIO cleanups [SCSI] zfcp: Adapter reopen for large number of unsolicited status [SCSI] zfcp: Fix error checking for ELS ADISC requests [SCSI] zfcp: wait until adapter is finished with ERP during auto-port [SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver [SCSI] sg: Add target reset support [SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC [SCSI] sd: Move scsi_disk() accessor function to sd.h ...	2008-07-15 18:58:04 -07:00
Ingo Molnar	82638844d9	Merge branch 'linus' into cpus4096 Conflicts: arch/x86/xen/smp.c kernel/sched_rt.c net/iucv/iucv.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 00:29:07 +02:00
Ingo Molnar	6c9fcaf2ee	Merge branch 'core/rcu' into core/rcu-for-linus	2008-07-15 21:10:12 +02:00
David S. Miller	b9e4085768	netdev: Do not use TX lock to protect address lists. Now that we have a specific lock to protect the network device unicast and multicast lists, remove extraneous grabs of the TX lock in cases where the code only needs address list protection. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-15 00:15:08 -07:00
David S. Miller	e308a5d806	netdev: Add netdev->addr_list_lock protection. Add netif_addr_{lock,unlock}{,_bh}() helpers. Use them to protect operations that operate on or read the network device unicast and multicast address lists. Also use them in cases where the code simply wants to block calls into the driver's ->set_rx_mode() and ->set_multicast_list() methods. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-15 00:13:44 -07:00
Eli Cohen	f507d28bff	IB/mlx4: Use kzalloc() for new QPs so flags are initialized to 0 Current code uses kmalloc() and then just does a bitwise OR operation on qp->flags in create_qp_common(), which means that qp->flags may potentially have some unintended bits set. This patch uses kzalloc() and avoids further explicit clearing of structure members, which also shrinks the code: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-65 (-65) function old new delta create_qp_common 2024 1959 -65 Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:53 -07:00
Or Gerlitz	de910bd921	RDMA/cma: Simplify locking needed for serialization of callbacks The RDMA CM has some logic in place to make sure that callbacks on a given CM ID are delivered to the consumer in a serialized manner. Specifically it has code to protect against a device removal racing with a running callback function. This patch simplifies this logic by using a mutex per ID instead of a wait queue and atomic variable. This means that cma_disable_remove() now is more properly named to cma_disable_callback(), and cma_enable_remove() can now be removed because it just would become a trivial wrapper around mutex_unlock(). Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:53 -07:00
Or Gerlitz	64c5e613b9	RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr Keep a pointer to the local (src) netdevice in struct rdma_dev_addr, and copy it in as part of rdma_copy_addr(). Use rdma_translate_ip() in cma_new_conn_id() to reduce some code duplication and also make sure the src_dev member gets set. In a high-availability configuration the netdevice pointer can be used by the RDMA CM to align RDMA sessions to use the same links as the IP stack does under fail-over and route change cases. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:53 -07:00
Steve Wise	4ab928f692	RDMA/cxgb3: Fixes for zero STag Handling the zero STag in receive work request requires some extra logic in the driver: - Only set the QP_PRIV bit for kernel mode QPs. - Add a zero STag build function for recv wrs. The uP needs a PBL allocated and passed down in the recv WR so it can construct a HW PBL for the zero STag S/G entries. Note: we need to place a few restrictions on zero STag usage because of this: 1) all SGEs in a recv WR must either be zero STag or not. No mixing. 2) an individual SGE length cannot exceed 128MB for a zero-stag SGE. This should be OK since it's not really practical to allocate such a large chunk of pinned contiguous DMA mapped memory. - Add an optimized non-zero-STag recv wr format for kernel users. This is needed to optimize both zero and non-zero STag cracking in the recv path for kernel users. - Remove the iwch_ prefix from the static build functions. - Bump required FW version. Signed-off-by: Steve Wise <swise@opengridcomputing.com>	2008-07-14 23:48:53 -07:00
Steve Wise	96f15c0353	RDMA/core: Add local DMA L_Key support - Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0 STag support and IB HCAs to indicate reserved L_Key support. - Add a u32 local_dma_lkey member to struct ib_device. Drivers fill this in with the appropriate local DMA L_Key (if they support it). - Fix up the drivers using this flag. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:53 -07:00
Roland Dreier	aed012279d	IB/mthca: Fix check of max_send_sge for special QPs The MLX transport requires two extra gather entries for sends (one for the header and one for the checksum at the end, as the comment says). However the code checked that max_recv_sge was not too big, instead of checking max_send_sge as it should have. Fix the code to check the correct condition. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:52 -07:00
Roland Dreier	c036925ac0	IB/mthca: Use round_jiffies() for catastrophic error polling timer Exactly when the catastrophic error polling timer function runs is not important, so use round_jiffies() to save unnecessary wakeups. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:52 -07:00
Roland Dreier	4522e08ced	IB/mthca: Remove "stop" flag for catastrophic error polling timer Since we use del_timer_sync() anyway, there's no need for an additional flag to tell the timer not to rearm. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:52 -07:00
Eli Cohen	bc3a290b51	IPoIB: Double default RX/TX ring sizes Increase IPoIB ring sizes to twice their original sizes (RX: 128->256, TX: 64->128) to act as a shock absorber for high traffic peaks. With the current settings, we have seen cases that there are many calls to netif_stop_queue(), which causes degradation in throughput. Also, larger receive buffer sizes help IPoIB in CM mode to avoid experiencing RNR NAK conditions due to insufficient receive buffers at the SRQ. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:52 -07:00
Eli Cohen	e112373fd6	IPoIB/cm: Reduce connected mode TX object size Since IPoIB connected mode does not support NETIF_F_SG, we only have one DMA mapping per send, so we don't need a mapping[] array. Define a new struct with a single u64 mapping member and use it for the CM tx_ring. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:52 -07:00
Ralph Campbell	df8666198d	IB/ipath: Use IEEE OUI for vendor_id reported by ibv_query_device() The IB spec for SubnGet(NodeInfo) and query HCA says that the vendor ID field should be the IEEE OUI assigned to the vendor. The ipath driver was returning the PCI vendor ID instead. This will affect applications which call ibv_query_device(). The old value was 0x001fc1 or 0x001077, the new value is 0x001175. The vendor ID doesn't appear to be exported via /sys so that should reduce possible compatibility issues. I'm only aware of Open MPI as a major application which depends on this change, and they have made necessary adjustments. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:52 -07:00
Eli Cohen	bd3606715e	IPoIB: Use dev_set_mtu() to change mtu When the driver sets the MTU of the net device outside of its change_mtu method, it should make use of dev_set_mtu() instead of directly setting the mtu field of struct netdevice. Otherwise functions registered to be called upon MTU change will not get called (this is done through call_netdevice_notifiers() in dev_set_mtu()). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:51 -07:00
Eli Cohen	c8c2afe360	IPoIB: Use rtnl lock/unlock when changing device flags Use of this lock is required to synchronize changes to the netdevice's data structs. Also move the call to ipoib_flush_paths() after the modification of the netdevice flags in set_mode(). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:51 -07:00
Roland Dreier	9eae554c17	IPoIB: Get rid of ipoib_mcast_detach() wrapper ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just use the core API in the one place that does a multicast group detach. add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105) function old new delta ipoib_mcast_leave 357 319 -38 ipoib_mcast_detach 67 - -67 Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:50 -07:00
Eli Cohen	d0de13622d	IPoIB: Only set Q_Key once: after joining broadcast group The current code will set the Q_Key for any join of a non-sendonly multicast group. The operation involves a modify QP operation, which is fairly heavyweight, and is only really required after the join of the broadcast group. Fix this by adding a parameter to ipoib_mcast_attach() to control when the Q_Key is set. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:50 -07:00
Eli Cohen	5892eff91a	IPoIB: Remove priv->mcast_mutex No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast since these operations are synchronized at the HW driver layer. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:50 -07:00
Eli Cohen	c03d4731b5	IPoIB: Remove unused IPOIB_MCAST_STARTED code The IPOIB_MCAST_STARTED flag is not used at all since commit `b3e2749b` ("IPoIB: Don't drop multicast sends when they can be queued"), so remove it. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:50 -07:00
Steve Wise	70fe1796a5	RDMA/cxgb3: Set rkey field for new memory windows in iwch_alloc_mw() Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:49 -07:00
Roland Dreier	8294f29767	RDMA/nes: Get rid of ring_doorbell parameter of nes_post_cqp_request() Every caller of nes_post_cqp_request() passed it NES_CQP_REQUEST_RING_DOORBELL, so just remove that parameter and always ring the doorbell. Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Faisal Latif <flatif@neteffect.com>	2008-07-14 23:48:49 -07:00
Jon Mason	52c8084b74	RDMA/cxgb3: Propagate HW page size capabilities cxgb3 does not currently report the page size capabilities, and incorrectly reports them internally. This version changes the bit-shifting to a static value (per Steve's request). Signed-off-by: Jon Mason <jon@opengridcomputing.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:49 -07:00
Roland Dreier	1ff66e8c1f	RDMA/nes: Encapsulate logic nes_put_cqp_request() The iw_nes driver repeats the logic if (atomic_dec_and_test(&cqp_request->refcount)) { if (cqp_request->dynamic) { kfree(cqp_request); } else { spin_lock_irqsave(&nesdev->cqp.lock, flags); list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs); spin_unlock_irqrestore(&nesdev->cqp.lock, flags); } } over and over. Wrap this up in functions nes_free_cqp_request() and nes_put_cqp_request() to simplify such code. In addition to making the source smaller and more readable, this shrinks the compiled code quite a bit: add/remove: 2/0 grow/shrink: 0/13 up/down: 164/-1692 (-1528) function old new delta nes_free_cqp_request - 147 +147 nes_put_cqp_request - 17 +17 nes_modify_qp 2316 2293 -23 nes_hw_modify_qp 737 657 -80 nes_dereg_mr 945 860 -85 flush_wqes 501 416 -85 nes_manage_apbvt 648 560 -88 nes_reg_mr 1117 1026 -91 nes_cqp_ce_handler 927 769 -158 nes_alloc_mw 1052 884 -168 nes_create_qp 5314 5141 -173 nes_alloc_fmr 2212 2035 -177 nes_destroy_cq 1097 918 -179 nes_create_cq 2787 2598 -189 nes_dealloc_mw 762 566 -196 Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Faisal Latif <flatif@neteffect.com>	2008-07-14 23:48:49 -07:00
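The "drop a reference, then either free or recycle" pattern that the nes commit above wraps into helpers can be sketched in userspace C. This is only an illustration, not the iw_nes code: the `sketch_` names, the singly linked free list, and the boolean return value are assumptions made for the example, and the spinlock that protects the real `cqp_avail_reqs` list is left out because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CQP request; not the driver's struct. */
struct sketch_req {
	atomic_int refcount;
	bool dynamic;            /* true: heap-allocated, false: from the pool */
	struct sketch_req *next; /* free-list link (the driver uses a list_head) */
};

/* Hypothetical pool of reusable requests. */
struct sketch_pool {
	struct sketch_req *avail;
	int avail_count;
};

/* Release a request whose last reference is gone. */
static void sketch_free_req(struct sketch_pool *pool, struct sketch_req *req)
{
	if (req->dynamic) {
		free(req);
	} else {
		/* The driver takes a spinlock and appends to the tail here;
		 * this sketch pushes to the head without locking. */
		req->next = pool->avail;
		pool->avail = req;
		pool->avail_count++;
	}
}

/* Drop one reference; returns true if this call released the request. */
static bool sketch_put_req(struct sketch_pool *pool, struct sketch_req *req)
{
	assert(pool != NULL && req != NULL);
	/* atomic_fetch_sub returns the old value, so 1 means we hit zero. */
	if (atomic_fetch_sub(&req->refcount, 1) == 1) {
		sketch_free_req(pool, req);
		return true;
	}
	return false;
}
```

Callers then replace each open-coded block with a single `sketch_put_req()` call, which is where the code-size savings reported in the commit come from.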
Moni Shoua	ee1e2c82c2	IPoIB: Refresh paths instead of flushing them on SM change events The patch tries to solve the problem of device going down and paths being flushed on an SM change event. The method is to mark the paths as candidates for refresh (by setting the new valid flag to 0), and wait for an ARP probe to trigger a new path record query. The solution requires a different and less intrusive handling of SM change event. For that, the second argument of the flush function changes its meaning from a boolean flag to a level. In most cases, SM failover doesn't cause LID change so traffic won't stop. In the rare cases of LID change, the remote host (the one that hadn't changed its LID) will lose connectivity until paths are refreshed. This is no worse than the current state. In fact, preventing the device from going down saves packets that otherwise would be lost. Signed-off-by: Moni Levy <monil@voltaire.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:49 -07:00
Joachim Fenkes	038919f296	IB/ehca: Make device table externally visible This gives ehca an autogenerated modalias and therefore enables automatic loading. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:49 -07:00
Vladimir Sokolovsky	af40da894e	IPoIB: add LRO support Add "ipoib_use_lro" module parameter to enable LRO and an "ipoib_lro_max_aggr" module parameter to set the max number of packets to be aggregated. Make LRO controllable and LRO statistics accessible through ethtool. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:48 -07:00
Ron Livne	1240673405	IPoIB: Use multicast loopback blocking if available Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if supported by the underlying device. This creates an improvement of up to 39% in bandwidth when sending multicast packets with IPoIB, and an improvement of 12% in CPU usage. Signed-off-by: Ron Livne <ronli@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:48 -07:00
Ron Livne	521e575b9a	IB/mlx4: Add support for blocking multicast loopback packets Add support for handling the IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK flag by using the per-multicast group loopback blocking feature of mlx4 hardware. Signed-off-by: Ron Livne <ronli@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:48 -07:00
Steve Wise	14cc180f7b	RDMA/cxgb3: Add support for protocol statistics - Add a new rdma ctl command called RDMA_GET_MIB to the cxgb3 low level driver to obtain the protocol mib from the rnic hardware. - Add new iw_cxgb3 provider method to get the MIB from the low level driver. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:48 -07:00
Steve Wise	7f624d023b	RDMA/core: Add iWARP protocol statistics attributes in sysfs This patch adds a sysfs attribute group called "proto_stats" under /sys/class/infiniband/$device/ and populates this group with protocol statistics if they exist for a given device. Currently, only iWARP stats are defined, but the code is designed to allow InfiniBand protocol stats if they become available. These stats are per-device and more importantly -not- per port. Details: - Add union rdma_protocol_stats in ib_verbs.h. This union allows defining transport-specific stats. Currently only iwarp stats are defined. - Add struct iw_protocol_stats to define the current set of iwarp protocol stats. - Add new ib_device method called get_proto_stats() to return protocol statistics. - Add logic in core/sysfs.c to create iwarp protocol stats attributes if the device is an RNIC and has a get_proto_stats() method. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:48 -07:00
Roland Dreier	a7d834c4bc	IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq() For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(), and these two callers are not synchronized against each other. However, ipoib_cm_post_receive_nonsrq() always reuses the same receive work request and scatter list structures, so multiple callers can end up stepping on each other, which leads to posting garbled work requests. Fix this by having the caller pass in the ib_recv_wr and ib_sge structures to use, and allocating new local structures in ipoib_cm_nonsrq_init_rx(). Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam Nguyen <hnguyen@de.ibm.com>. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:47 -07:00
Roland Dreier	468f2239bc	RDMA/cma: Add missing newlines to printk()s Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Sean Hefty <sean.hefty@intel.com>	2008-07-14 23:48:47 -07:00
Roland Dreier	eec8845d29	RDMA/cxgb3: Remove write-only iwch_rnic_attributes fields The members struct iwch_rnic_attributes.vendor_id and .vendor_part_id are write-only, so we might as well get rid of them. Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Steve Wise <swise@opengridcomputing.com>	2008-07-14 23:48:47 -07:00
Steve Wise	97d1cc8055	RDMA/cxgb3: Fix up some ib_device_attr fields - set fw_ver - set hw_ver - set max_qp_wr to something reasonable - set max_cqe to something reasonable Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:47 -07:00
Stefan Roscher	6f7bc01a73	IB/ehca: In case of lost interrupts, trigger EOI to reenable interrupts During corner case testing, we noticed that some versions of ehca do not properly transition to interrupt done in special load situations. This can be resolved by periodically triggering EOI through H_EOI, if EQEs are pending. Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:47 -07:00
Joachim Fenkes	3e255eac56	IB/ehca: Reject receive work requests if QP is in RESET state Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:47 -07:00
Roland Dreier	7c27f35820	IB/mlx4: Remove extra code for RESET->ERR QP state transition Commit `65adfa91` ("IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions") added some extra code to handle a QP state transition from RESET to ERROR. However, the latest 1.2.1 version of the IB spec has clarified that this transition is actually not allowed, so we can remove this extra code again. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:46 -07:00
Roland Dreier	d3809ad097	IB/mthca: Remove extra code for RESET->ERR QP state transition Commit `b18aad71` ("IB/mthca: Fix RESET to ERROR transition") added some extra code to handle a QP state transition from RESET to ERROR. However, the latest 1.2.1 version of the IB spec has clarified that this transition is actually not allowed, so we can remove this extra code again. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:46 -07:00
Ralph Campbell	e5a5e7d59a	IB/core: Reset to error QP state transition is not allowed I was reviewing the QP state transition diagram in the IB 1.2.1 spec and the code for qp_state_table[], and noticed that the code allows a QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes for figure 124 (pg 457) specifically says that this transition isn't allowed. This is a clarification from earlier versions of the IB spec, which were ambiguous in this area and suggested that the RESET to ERR transition was allowed. Fix up the qp_state_table[] to make RESET->ERR not allowed. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:46 -07:00
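The effect of the qp_state_table[] change above can be illustrated with a reduced lookup table. This is a sketch under stated assumptions, not the kernel's table: it models only the row for transitions out of RESET, the `sketch_`/`SK_` names are hypothetical, and the real table also carries attribute masks per QP type that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the QP states named in the IB spec. */
enum sketch_qp_state {
	SK_RESET, SK_INIT, SK_RTR, SK_RTS, SK_SQD, SK_SQE, SK_ERR,
	SK_NUM_STATES
};

/* Allowed next states when the current state is RESET.
 * Entries not listed default to false, so RESET->ERR is rejected. */
static const bool sketch_from_reset[SK_NUM_STATES] = {
	[SK_RESET] = true,
	[SK_INIT]  = true,
};

/* Returns true if a QP in RESET may move to 'next'. */
static bool sketch_reset_transition_ok(enum sketch_qp_state next)
{
	if ((unsigned)next >= SK_NUM_STATES)
		return false;
	return sketch_from_reset[next];
}

/* Small self-check showing the intended behaviour. */
void sketch_qp_state_demo(void)
{
	assert(sketch_reset_transition_ok(SK_INIT));
	assert(!sketch_reset_transition_ok(SK_ERR));
}
```

Encoding validity as table data rather than as branches means a spec clarification like this one becomes a one-entry change.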
Eli Cohen	6578cf3398	IB/mlx4: Pass congestion management class MADs to the HCA ConnectX HCAs support the IB_MGMT_CLASS_CONG_MGMT management class, so process MADs of this class through the MAD_IFC firmware command. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:45 -07:00
Eli Cohen	d1f2cd895f	IB/mlx4: Configure QPs' max message size based on real device capability ConnectX returns the max message size it supports through the QUERY_DEV_CAP firmware command. When modifying a QP to RTR, the max message size for the QP must be specified. This value must not exceed the value declared through QUERY_DEV_CAP. The current code ignores the max allowed size and unconditionally sets the value to 2^31. This patch sets all QPs to the max value allowed as returned from firmware. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:45 -07:00
Steve Wise	e7e5582999	RDMA/cxgb3: MEM_MGT_EXTENSIONS support - set IB_DEVICE_MEM_MGT_EXTENSIONS capability bit if fw supports it. - set max_fast_reg_page_list_len device attribute. - add iwch_alloc_fast_reg_mr function. - add iwch_alloc_fastreg_pbl - add iwch_free_fastreg_pbl - adjust the WQ depth for kernel mode work queues to account for fastreg possibly taking 2 WR slots. - add fastreg_mr work request support. - add local_inv work request support. - add send_with_inv and send_with_se_inv work request support. - removed useless duplicate enums/defines for TPT/MW/MR stuff. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:45 -07:00
Steve Wise	00f7ec36c9	RDMA/core: Add memory management extensions support This patch adds support for the IB "base memory management extension" (BMME) and the equivalent iWARP operations (which the iWARP verbs mandates all devices must implement). The new operations are: - Allocate an ib_mr for use in fast register work requests. - Allocate/free a physical buffer lists for use in fast register work requests. This allows device drivers to allocate this memory as needed for use in posting send requests (eg via dma_alloc_coherent). - New send queue work requests: * send with remote invalidate * fast register memory region * local invalidate memory region * RDMA read with invalidate local memory region (iWARP only) Consumer interface details: - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added to indicate device support for these features. - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV, IB_WR_RDMA_READ_WITH_INV are added. - A new consumer API function, ib_alloc_mr() is added to allocate fast register memory regions. - New consumer API functions, ib_alloc_fast_reg_page_list() and ib_free_fast_reg_page_list() are added to allocate and free device-specific memory for fast registration page lists. - A new consumer API function, ib_update_fast_reg_key(), is added to allow the key portion of the R_Key and L_Key of a fast registration MR to be updated. Consumers call this if desired before posting a IB_WR_FAST_REG_MR work request. Consumers can use this as follows: - MR is allocated with ib_alloc_mr(). - Page list memory is allocated with ib_alloc_fast_reg_page_list(). - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key(). - MR made VALID and bound to a specific page list via ib_post_send(IB_WR_FAST_REG_MR) - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV), ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with invalidate operation. 
- MR is deallocated with ib_dereg_mr() - page lists dealloced via ib_free_fast_reg_page_list(). Applications can allocate a fast register MR once, and then can repeatedly bind the MR to different physical block lists (PBLs) via posting work requests to a send queue (SQ). For each outstanding MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be allocated (the fast_reg_page_list is owned by the low-level driver from the consumer posting a work request until the request completes). Thus pipelining can be achieved while still allowing device-specific page_list processing. The 32-bit fast register memory key/STag is composed of a 24-bit index and an 8-bit key. The application can change the key each time it fast registers thus allowing more control over the peer's use of the key/STag (ie it can effectively be changed each time the rkey is rebound to a page list). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:45 -07:00
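The 24-bit index plus 8-bit key layout described above can be sketched as a small bit-masking helper. This is a userspace illustration rather than the kernel's ib_update_fast_reg_key(): `struct sketch_mr` and `sketch_update_key()` are hypothetical names, and placing the key in the low 8 bits of the 32-bit value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the key fields of a memory region. */
struct sketch_mr {
	uint32_t lkey;
	uint32_t rkey;
};

/* Replace the 8-bit key portion of both keys, keeping the 24-bit index.
 * Assumes the key occupies the low 8 bits. */
static inline void sketch_update_key(struct sketch_mr *mr, uint8_t newkey)
{
	assert(mr != 0);
	mr->lkey = (mr->lkey & 0xffffff00u) | newkey;
	mr->rkey = (mr->rkey & 0xffffff00u) | newkey;
}
```

A consumer would call such a helper before posting each fast register work request, so that a peer holding a stale key value can no longer address the region once it is rebound.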
Eli Cohen	f89271da32	IPoIB: Copy small received SKBs in connected mode The connected mode implementation in the IPoIB driver has a large overhead in the way SKBs are handled in the receive flow. It usually allocates an SKB as big as the one used for the currently received SKB and moves unused fragments from the old SKB to the new one. This involves a loop on all the remaining fragments and incurs overhead on the CPU. This patch, for small SKBs, allocates an SKB just large enough to contain the received data and copies to it the data from the received SKB. The newly allocated SKB is passed to the stack and the old SKB is reposted. When running netperf, UDP small messages, without this patch I get: UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 14.4.3.178 (14.4.3.178) port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 114688 128 10.00 5142034 0 526.31 114688 10.00 1130489 115.71 With this patch I get both send and receive at ~315 mbps. The reason that send performance actually slows down is as follows: When using this patch, the overhead of the CPU for handling RX packets is dramatically reduced. As a result, we do not experience RNR NAK messages from the receiver which cause the connection to be closed and reopened again; when the patch is not used, the receiver cannot handle the packets fast enough so there is less time to post new buffers and hence the mentioned RNR NAKs. So what happens is that the application thinks it posted a certain number of packets for transmission but these packets are flushed and do not really get transmitted. Since the connection gets opened and closed many times, each time netperf gets the CPU time that otherwise would have been given to IPoIB to actually transmit the packets. 
This can be verified when looking at the port counters -- the output of ifconfig and the output of netperf (this is for the case without the patch): tx packets ========== port counter: 1,543,996 ifconfig: 1,581,426 netperf: 5,142,034 rx packets ========== netperf: 1,130,489 Signed-off-by: Eli Cohen <eli@mellanox.co.il>	2008-07-14 23:48:44 -07:00
Roland Dreier	f3781d2e89	RDMA: Remove subversion $Id tags They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:44 -07:00
Robert P. J. Day	fd91b1bf1b	IB/ipath: Simplify code using ARRAY_SIZE() macro Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:44 -07:00
Eli Cohen	9670e55391	IB/mlx4: Optimize QP stamping The idea is that for QPs with fixed size work requests (eg selective signaling QPs), before stamping the WQE, we read the value of the DS field, which gives the effective size of the descriptor as used in the previous post. Then we stamp only that area, since the rest of the descriptor is already stamped. When initializing the send queue buffer, make sure the DS field is initialized to the max descriptor size so that the subsequent stamping will be done on the entire descriptor area. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:44 -07:00
Moni Shoua	164ba0893c	IB/sa: Fail requests made while creating new SM AH This patch solves a race that occurs after an event occurs that causes the SA query module to flush its SM address handle (AH). When SM AH becomes invalid and needs an update it is handled by the global workqueue. On the other hand this event is also handled in the IPoIB driver by queuing work in the ipoib_workqueue that does multicast joins. Although queuing is in the right order, it is done to 2 different workqueues and so there is no guarantee that the first to be queued is the first to be executed. This causes a problem because IPoIB may end up sending a request to the old SM, which will take a long time to time out (since the old SM is gone); this leads to a much longer than necessary interruption in multicast traffic. The patch sets the SA query module's SM AH to NULL when the event occurs, and until update_sm_ah() is done, any request that needs sm_ah fails with -EAGAIN return status. For consumers, the patch doesn't make things worse. Before the patch, MADs are sent to the wrong SM so the request gets lost. Consumers can be improved if they examine the return code and respond to EAGAIN properly but even without an improvement the situation is not getting worse. Signed-off-by: Moni Levy <monil@voltaire.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:43 -07:00
Sean Hefty	a947491709	RDMA: Fix license text The license text for several files references a third software license that was inadvertently copied in. Update the license to what was intended. This update was based on a request from HP. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:43 -07:00
Christophe Jaillet	929555a2ba	RDMA/nes: Remove unnecessary memset() Remove an explicit memset(..., 0, ...) of a 'listener' structure allocated with kzalloc(). Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr> Acked-by: Faisal Latif <faisal@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-07-14 23:48:43 -07:00


1883 Commits