Commit Graph

1066 Commits

Author SHA1 Message Date
David S. Miller 0d6c4a2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/param.c
	drivers/net/wireless/iwlwifi/iwl-agn-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans.h

Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell.  In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.

In e1000e a conflict arose in the validation code for settings of
adapter->itr.  'net-next' had more sophisticated logic so that
logic was used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 23:35:40 -04:00
Linus Torvalds ebcf596d89 A few fixes for regressions introduced in 3.4-rc1:
- fix memory leak in mlx4
  - fix two problems with new MAD response generation code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPmYfMAAoJEENa44ZhAt0h06IP/jPJeH855wC2RgWXY2Fd+b6w
 K/DQVE2MaX2f8EHuAB5MKoC0zzchKNu0GTMbrZMizbocDBmIU6ibF3MKx2rycSBC
 FI906X7FFAeSyaywrI+k+ZGxVQYAck1QFDr44WiyJ400r8BGbx9ZECKvIyEwjaUa
 RpRn4Pui4IjMV7GFqiziUsa9uo4xKYagjNfeInAZXelzIN76Coeae28dAIE2rPT/
 BTcrtQ2JMG6H4AVGTImTP+Kqk1Cm7apBeHS4+SGWWTdw1WH4IcMqPmxHBUUtKTWt
 UnlgpHquuC913VpkbkGeog32iAndhHgAkfs67Rs0IFwck127Il/DiPK/zir2zkZ2
 YVilzYIxrSUYqPZYfKFcrKsnjfM+HJxBEPCfDjxfG0TjrVS8tnc75Uq1NvIt6e42
 lcIclLDyZtdsMLmJlsTsT5cF7HlThSjpZDSgmMi2eeOI0RKoqRyyGGwiAIq7Ud2a
 oRa+bheh4pBD2lvzd62uEkgMkTaToAz3I/vlbOTSwxYg2skoHUpwNPCsmstsQ7RU
 BHi7Nl5DLNT5lNBTIouWvbilpXs8eXwgcNrQcWSYaPeY2aDsGAAdi9Q4BadENao9
 dmxOtqjryctX7U7LGDRmNDIfiyPY5AodDUJjFux1JwqbdSCGf1Lz7uvDWu7gqL9V
 TCJFAK54okexpoyV6Vgv
 =iRrT
 -----END PGP SIGNATURE-----

Merge tag 'ib-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband fixes from Roland Dreier:
 "A few fixes for regressions introduced in 3.4-rc1:
   - fix memory leak in mlx4
   - fix two problems with new MAD response generation code"

* tag 'ib-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Fix memory leaks in ib_link_query_port()
  IB/mad: Don't send response for failed MADs
  IB/mad: Set 'D' bit in response for unhandled MADs
2012-04-26 15:35:35 -07:00
Jack Morgenstein a9e7432319 IB/mad: Don't send response for failed MADs
Commit 0b30704304 ("IB/mad: Return error response for unsupported
MADs") does not failed MADs (eg those that return
IB_MAD_RESULT_FAILURE) properly -- these MADs should be silently
discarded. (We should not force the lower-layer drivers to return
SUCCESS | CONSUMED in this case, since the MAD is NOT successful).
Unsupported MADs are not failures -- they return SUCCESS, but with an
"unsupported error" status value inside the response MAD.

Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-04-24 16:08:57 -07:00
Jack Morgenstein 840777de53 IB/mad: Set 'D' bit in response for unhandled MADs
Commit 0b30704304 ("IB/mad: Return error response for unsupported
MADs") does not handle directed-route MADs properly -- it fails to set
the 'D' bit in the response MAD status field.  This is a problem for
SmInfo MADs when the receiver does not have an SM running.

Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-04-24 16:06:50 -07:00
Eric W. Biederman ec8f23ce0f net: Convert all sysctl registrations to register_net_sysctl
This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman 5dd3df105b net: Move all of the network sysctls without a namespace into init_net.
This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:17 -04:00
David S. Miller 011e3c6325 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-12 19:41:23 -04:00
Amir Vadai 366cddb402 IB/rdma_cm: TOS <=> UP mapping for IBoE
Both tagged traffic and untagged traffic use tc tool mapping.
Treat RDMA TOS same as IP TOS when mapping to SL

Signed-off-by: Amir Vadai <amirv@mellanox.com>
CC: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
Roland Dreier 0559d8dc13 IB/core: Don't return EINVAL from sysfs rate attribute for invalid speeds
Commit e9319b0cb0 ("IB/core: Fix SDR rates in sysfs") changed our
sysfs rate attribute to return EINVAL to userspace if the underlying
device driver returns an invalid rate.  Apparently some drivers do this
when the link is down and some userspace pukes if it gets an error when
reading this attribute, so avoid a regression by not return an error to
match the old code.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-04-02 10:57:31 -07:00
David S. Miller 4e24ffa4d9 infiniband: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:42 -04:00
Linus Torvalds 0c2fe82a9b InfiniBand/RDMA changes for the 3.4 merge window. Nothing big really
stands out; by patch count lots of fixes to the mlx4 driver plus some
 cleanups and fixes to the core and other drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
 CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
 12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
 A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
 SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
 oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
 Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
 tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
 3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
 ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
 R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
 3etEZZRwROAyYKcwN01p
 =DbLx
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
 "Nothing big really stands out; by patch count lots of fixes to the
  mlx4 driver plus some cleanups and fixes to the core and other
  drivers."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
  mlx4_core: Scale size of MTT table with system RAM
  mlx4_core: Allow dynamic MTU configuration for IB ports
  IB/mlx4: Fix info returned when querying IBoE ports
  IB/mlx4: Fix possible missed completion event
  mlx4_core: Report thermal error events
  mlx4_core: Fix one more static exported function
  IB: Change CQE "csum_ok" field to a bit flag
  RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
  RDMA/cxgb3: Don't pass irq flags to flush_qp()
  mlx4_core: Get rid of redundant ext_port_cap flags
  RDMA/ucma: Fix AB-BA deadlock
  IB/ehca: Fix ilog2() compile failure
  IB: Use central enum for speed instead of hard-coded values
  IB/iser: Post initial receive buffers before sending the final login request
  IB/iser: Free IB connection resources in the proper place
  IB/srp: Consolidate repetitive sysfs code
  IB/srp: Use pr_fmt() and pr_err()/pr_warn()
  IB/core: Fix SDR rates in sysfs
  mlx4: Enforce device max FMR maps in FMR alloc
  IB/mlx4: Set bad_wr for invalid send opcode
  ...
2012-03-21 10:33:42 -07:00
Roland Dreier f0e88aeb19 Merge branches 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iser', 'mad', 'nes', 'qib', 'srp' and 'srpt' into for-next 2012-03-19 09:50:33 -07:00
Steve Wise 3eae7c9f97 RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
When destroying a listening cmid, the iwcm first marks the state of
the cmid as DESTROYING, then releases the lock and calls into the
iWARP provider to destroy the endpoint.  Since the cmid is not locked,
its possible for the iWARP provider to pass a connection request event
to the iwcm, which will be silently dropped by the iwcm.  This causes
the iWARP provider to never free up the resources from this connection
because the assumption is the iwcm will accept or reject this connection.

The solution is to reject these connection requests.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-07 15:14:53 -08:00
Hefty, Sean 186834b5de RDMA/ucma: Fix AB-BA deadlock
When we destroy a cm_id, we must purge associated events from the
event queue.  If the cm_id is for a listen request, we also purge
corresponding pending connect requests.  This requires destroying
the cm_id's associated with the connect requests by calling
rdma_destroy_id().  rdma_destroy_id() blocks until all outstanding
callbacks have completed.

The issue is that we hold file->mut while purging events from the
event queue.  We also acquire file->mut in our event handler.  Calling
rdma_destroy_id() while holding file->mut can lead to a deadlock,
since the event handler callback cannot acquire file->mut, which
prevents rdma_destroy_id() from completing.

Fix this by moving events to purge from the event queue to a temporary
list.  We can then release file->mut and call rdma_destroy_id()
outside of holding any locks.

Bug report by Or Gerlitz <ogerlitz@mellanox.com>:

    [ INFO: possible circular locking dependency detected ]
    3.3.0-rc5-00008-g79f1e43-dirty #34 Tainted: G          I

    tgtd/9018 is trying to acquire lock:
     (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]

    but task is already holding lock:
     (&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm]

    which lock already depends on the new lock.


    the existing dependency chain (in reverse order) is:

    -> #1 (&file->mut){+.+.+.}:
           [<ffffffff810682f3>] lock_acquire+0xf0/0x116
           [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
           [<ffffffffa0247636>] ucma_event_handler+0x148/0x1dc [rdma_ucm]
           [<ffffffffa035a79a>] cma_ib_handler+0x1a7/0x1f7 [rdma_cm]
           [<ffffffffa0333e88>] cm_process_work+0x32/0x119 [ib_cm]
           [<ffffffffa03362ab>] cm_work_handler+0xfb8/0xfe5 [ib_cm]
           [<ffffffff810423e2>] process_one_work+0x2bd/0x4a6
           [<ffffffff810429e2>] worker_thread+0x1d6/0x350
           [<ffffffff810462a6>] kthread+0x84/0x8c
           [<ffffffff81369624>] kernel_thread_helper+0x4/0x10

    -> #0 (&id_priv->handler_mutex){+.+.+.}:
           [<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752
           [<ffffffff810682f3>] lock_acquire+0xf0/0x116
           [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
           [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
           [<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm]
           [<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm]
           [<ffffffff810df6ef>] fput+0x117/0x1cf
           [<ffffffff810dc76e>] filp_close+0x6d/0x78
           [<ffffffff8102b667>] put_files_struct+0xbd/0x17d
           [<ffffffff8102b76d>] exit_files+0x46/0x4e
           [<ffffffff8102d057>] do_exit+0x299/0x75d
           [<ffffffff8102d599>] do_group_exit+0x7e/0xa9
           [<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555
           [<ffffffff81001717>] do_signal+0x39/0x634
           [<ffffffff81001d39>] do_notify_resume+0x27/0x69
           [<ffffffff81361c03>] retint_signal+0x46/0x83

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&file->mut);
                                   lock(&id_priv->handler_mutex);
                                   lock(&file->mut);
      lock(&id_priv->handler_mutex);

     *** DEADLOCK ***

    1 lock held by tgtd/9018:
     #0:  (&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm]

    stack backtrace:
    Pid: 9018, comm: tgtd Tainted: G          I  3.3.0-rc5-00008-g79f1e43-dirty #34
    Call Trace:
     [<ffffffff81029e9c>] ? console_unlock+0x18e/0x207
     [<ffffffff81066433>] print_circular_bug+0x28e/0x29f
     [<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752
     [<ffffffff810682f3>] lock_acquire+0xf0/0x116
     [<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
     [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
     [<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
     [<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155
     [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
     [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
     [<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm]
     [<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm]
     [<ffffffff810df6ef>] fput+0x117/0x1cf
     [<ffffffff810dc76e>] filp_close+0x6d/0x78
     [<ffffffff8102b667>] put_files_struct+0xbd/0x17d
     [<ffffffff8102b5cc>] ? put_files_struct+0x22/0x17d
     [<ffffffff8102b76d>] exit_files+0x46/0x4e
     [<ffffffff8102d057>] do_exit+0x299/0x75d
     [<ffffffff8102d599>] do_group_exit+0x7e/0xa9
     [<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555
     [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
     [<ffffffff81001717>] do_signal+0x39/0x634
     [<ffffffff8135e037>] ? printk+0x3c/0x45
     [<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155
     [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
     [<ffffffff81361803>] ? _raw_spin_unlock_irq+0x2b/0x40
     [<ffffffff81039011>] ? set_current_blocked+0x44/0x49
     [<ffffffff81361bce>] ? retint_signal+0x11/0x83
     [<ffffffff81001d39>] do_notify_resume+0x27/0x69
     [<ffffffff8118a1fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
     [<ffffffff81361c03>] retint_signal+0x46/0x83

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-05 12:27:57 -08:00
Or Gerlitz 2e96691c31 IB: Use central enum for speed instead of hard-coded values
The kernel IB stack uses one enumeration for IB speed, which wasn't
explicitly specified in the verbs header file.  Add that enum, and use
it all over the code.

The IB speed/width notation is also used by iWARP and IBoE HW drivers,
which use the convention of rate = speed * width to advertise their
port link rate.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-05 09:25:16 -08:00
Roland Dreier e9319b0cb0 IB/core: Fix SDR rates in sysfs
Commit 71eeba16 ("IB: Add new InfiniBand link speeds") introduced a bug 
where eg the rate for IB 4X SDR links iss displayed as "8.5 Gb/sec" 
instead of "10 Gb/sec" as it used to be.  Fix that.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-27 09:15:08 -08:00
Pablo Neira Ayuso 80d326fab5 netlink: add netlink_dump_control structure for netlink_dump_start()
Davem considers that the argument list of this interface is getting
out of control. This patch tries to address this issue following
his proposal:

struct netlink_dump_control c = { .dump = dump, .done = done, ... };

netlink_dump_start(..., &c);

Suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 14:10:06 -05:00
Swapna Thete 0b30704304 IB/mad: Return error response for unsupported MADs
Set up a response with appropriate error status and send it for MADs
that are not supported by a specific class/version.

Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Swapna Thete <swapna.thete@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-25 17:47:32 -08:00
David S. Miller d5ef8a4d87 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/infiniband/hw/nes/nes_cm.c

Simple whitespace conflict.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 23:32:28 -05:00
Roland Dreier f36ae34238 Merge branches 'cma', 'ipath', 'misc', 'mlx4', 'nes' and 'qib' into for-next 2012-01-30 16:18:21 -08:00
Sean Hefty 9ced69ca52 RDMA/ucma: Discard all events for new connections until accepted
After reporting a new connection request to user space, the rdma_ucm
will discard subsequent events until the user has associated a user
space idenfier with the kernel cm_id.  This is needed to avoid
reporting a reject/disconnect event to the user for a request that
they may not have processed.

The user space identifier is set once the user tries to accept the
connection request.  However, the following race exists in ucma_accept():

	ctx->uid = cmd.uid;
	<events may be reported now>
	ret = rdma_accept(ctx->cm_id, ...);

Once ctx->uid has been set, new events may be reported to the user.
While the above mentioned race is avoided, there is an issue that the
user _may_ receive a reject/disconnect event if rdma_accept() fails,
depending on when the event is processed.  To simplify the use of
rdma_accept(), discard all events unless rdma_accept() succeeds.

This problem was discovered based on questions from Roland Dreier
<roland@purestorage.com>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-27 10:07:48 -08:00
Bernd Schubert e47e321a35 RDMA/core: Fix kernel panic by always initializing qp->usecnt
We have just been investigating kernel panics related to
cq->ibcq.event_handler() completion calls.  The problem is that
ib_destroy_qp() fails with -EBUSY.

Further investigation revealed qp->usecnt is not initialized.  This
counter was introduced in linux-3.2 by commit 0e0ec7e063
("RDMA/core: Export ib_open_qp() to share XRC TGT QPs") but it only
gets initialized for IB_QPT_XRC_TGT, but it is checked in
ib_destroy_qp() for any QP type.

Fix this by initializing qp->usecnt for every QP we create.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Sven Breuner <sven.breuner@itwm.fraunhofer.de>

[ Initialize qp->usecnt in uverbs too.  - Sean ]

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-27 09:20:10 -08:00
David Miller 02b619555a infiniband: Convert dst_fetch_ha() over to dst_neigh_lookup().
Now we must provide the IP destination address, and a reference has
to be dropped when we're done with the entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-25 21:30:36 -05:00
Linus Torvalds 48fa57ac2c infiniband changes for 3.3 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPBQQDAAoJEENa44ZhAt0hy1EP/A/dz741mEd2QZxYHgloK8XU
 sSoHvq+vDxHnOvBrDuaHXT47FoY+OSVE+ESeJJQJ9L+B6g3yacP3hNSIcguXFs8Z
 v011AZAeRvQx3bLu5R9+eDL+YTyotkR0sl/huoLkSwlqrEqGA85eLqf5RSQdxYZf
 iC1ZXfg0KTrtb6rBvohNcijpmIVEe83SWfnD/ZuCGuWq++DyVJxzECnR7p5D8a9q
 eMJEnIKVEIpqkqXrPQr/blVSfGQL54QuUdYtoKAS8ZW6BzjIwCGUmKdoT1vaNqX5
 sIntxXMcgIgE2r0y/nDK+QIFS4U784eUevIC/LeunbhWUEQX05f3l6+V566/T9hX
 lvp5M6aonsSSvtqrVi6SF5rvSHFlwPpvAY3+jhjXKLpZ5OxMqf/ZlTN1xN4bin1A
 whGnznU+51Tjzph6Or8iXo5yExDUQhowX1Z3CYDmh/UqzKHRqaFAuiC071r8GZW3
 BEOV9yf/+qPsgtXAiO4jSKlLrOJbMgEI4BoITXTO9HvZH9dHGXDYLvULdHDmFaBi
 XLg5zcAjou24855miv/gnBQzDc0NWW184BGS9hPE9zmbQlJr7gA4zI0Eggtj+3MO
 7z/SLTrxKSfjZJR8Z3cGsnBjCs1VFqV+YQnTkyZYLORLf4F3RbDLe6aJQ+9WBA1g
 86J11MjrG30erg3gbXun
 =8ZmW
 -----END PGP SIGNATURE-----

Merge tag 'infiniband-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

infiniband changes for 3.3 merge window

* tag 'infiniband-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  rdma/core: Fix sparse warnings
  RDMA/cma: Fix endianness bugs
  RDMA/nes: Fix terminate during AE
  RDMA/nes: Make unnecessarily global nes_set_pau() static
  RDMA/nes: Change MDIO bus clock to 2.5MHz
  IB/cm: Fix layout of APR message
  IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE
  IB/qib: Default some module parameters optimally
  IB/qib: Optimize locking for get_txreq()
  IB/qib: Fix a possible data corruption when receiving packets
  IB/qib: Eliminate 64-bit jiffies use
  IB/qib: Fix style issues
  IB/uverbs: Protect QP multicast list
2012-01-08 14:05:48 -08:00
Linus Torvalds 972b2c7199 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
  reiserfs: Properly display mount options in /proc/mounts
  vfs: prevent remount read-only if pending removes
  vfs: count unlinked inodes
  vfs: protect remounting superblock read-only
  vfs: keep list of mounts for each superblock
  vfs: switch ->show_options() to struct dentry *
  vfs: switch ->show_path() to struct dentry *
  vfs: switch ->show_devname() to struct dentry *
  vfs: switch ->show_stats to struct dentry *
  switch security_path_chmod() to struct path *
  vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
  vfs: trim includes a bit
  switch mnt_namespace ->root to struct mount
  vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
  vfs: opencode mntget() mnt_set_mountpoint()
  vfs: spread struct mount - remaining argument of next_mnt()
  vfs: move fsnotify junk to struct mount
  vfs: move mnt_devname
  vfs: move mnt_list to struct mount
  vfs: switch pnode.h macros to struct mount *
  ...
2012-01-08 12:19:57 -08:00
Roland Dreier 1583676d9e Merge branches 'cma', 'misc', 'mlx4', 'nes', 'qib' and 'uverbs' into for-next 2012-01-04 09:18:20 -08:00
Sean Hefty c89d1bedf8 rdma/core: Fix sparse warnings
Clean up sparse warnings in the rdma core layer.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-04 09:17:45 -08:00
Sean Hefty 46ea5061c7 RDMA/cma: Fix endianness bugs
Fix endianness bugs reported by sparse in the RDMA core stack.  Note
that these are real bugs, but don't affect any existing code to the
best of my knowledge.  The mlid issue would only affect kernel users
of rdma_join_multicast which have the rdma_cm attach/detach its QP.
There are no current in tree users that do this. (rdma_join_multicast
may be used called by user space applications, which does not have
this issue.)  And the pkey setting is simply returned as
informational.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-04 09:13:52 -08:00
Eli Cohen 6f233d300d IB/cm: Fix layout of APR message
Add a missing 16-bit reserved field between ap_status and info fields.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-03 21:04:18 -08:00
Eli Cohen e214a0fe2b IB/uverbs: Protect QP multicast list
Userspace verbs multicast attach/detach operations on a QP are done
while holding the rwsem of the QP for reading.  That's not sufficient
since a reader lock allows more than one reader to acquire the
lock.  However, multicast attach/detach does list manipulation that
can corrupt the list if multiple threads run in parallel.

Fix this by acquiring the rwsem as a writer to serialize attach/detach
operations.  Add idr_write_qp() and put_qp_write() to encapsulate
this.

This fixes oops seen when running applications that perform multicast
joins/leaves.

Reported by: Mike Dubman <miked@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-03 20:36:48 -08:00
Al Viro 2c9ede55ec switch device_get_devnode() and ->devnode() to umode_t *
both callers of device_get_devnode() are only interested in lower 16bits
and nobody tries to return anything wider than 16bit anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:55 -05:00
David S. Miller abb434cb05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23 17:13:56 -05:00
Sean Hefty 04ded16724 RDMA/cma: Verify private data length
private_data_len is defined as a u8.  If the user specifies a large
private_data size (> 220 bytes), we will calculate a total length that
exceeds 255, resulting in private_data_len wrapping back to 0.  This
can lead to overwriting random kernel memory.  Avoid this by verifying
that the resulting size fits into a u8.

Reported-by: B. Thery <benjamin.thery@bull.net>
Addresses: <http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2335>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-12-19 09:15:33 -08:00
David Miller 51d4597451 infiniband: addr: Consolidate code to fetch neighbour hardware address from dst.
IPV4 should do exactly what the IPV6 code does here, which is
use the neighbour obtained via the dst entry.

And now that the two code paths do the same thing, use a common
helper function to perform the operation.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05 15:20:19 -05:00
David Miller 2721745501 net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05 15:20:19 -05:00
David S. Miller b3613118eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-02 13:49:21 -05:00
Eric Dumazet 580da35a31 IB: Fix RCU lockdep splats
Commit f2c31e32b3 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France <tsi@ualberta.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-29 13:37:11 -08:00
Alexey Dobriyan 4e3fd7a06d net: remove ipv6_addr_copy()
C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22 16:43:32 -05:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds f470f8d4e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
  mlx4_core: Deprecate log_num_vlan module param
  IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
  IB/mlx4: Enable 4K mtu for IBoE
  RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
  RDMA/cxgb4: Serialize calls to CQ's comp_handler
  RDMA/cxgb3: Serialize calls to CQ's comp_handler
  IB/qib: Fix issue with link states and QSFP cables
  IB/mlx4: Configure extended active speeds
  mlx4_core: Add extended port capabilities support
  IB/qib: Hold links until tuning data is available
  IB/qib: Clean up checkpatch issue
  IB/qib: Remove s_lock around header validation
  IB/qib: Precompute timeout jiffies to optimize latency
  IB/qib: Use RCU for qpn lookup
  IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
  IB/qib: Decode path MTU optimization
  IB/qib: Optimize RC/UC code by IB operation
  IPoIB: Use the right function to do DMA unmap pages
  RDMA/cxgb4: Use correct QID in insert_recv_cqe()
  RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
  ...
2011-11-01 10:51:38 -07:00
Roland Dreier 504255f8d0 Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next 2011-11-01 09:37:08 -07:00
Christoph Lameter bc3e53f682 mm: distinguish between mlocked and pinned pages
Some kernel components pin user space memory (infiniband and perf) (by
increasing the page count) and account that memory as "mlocked".

The difference between mlocking and pinning is:

A. mlocked pages are marked with PG_mlocked and are exempt from
   swapping. Page migration may move them around though.
   They are kept on a special LRU list.

B. Pinned pages cannot be moved because something needs to
   directly access physical memory. They may not be on any
   LRU list.

I recently saw an mlockalled process where mm->locked_vm became
bigger than the virtual size of the process (!) because some
memory was accounted for twice:

Once when the page was mlocked and once when the Infiniband
layer increased the refcount because it needt to pin the RDMA
memory.

This patch introduces a separate counter for pinned pages and
accounts them seperately.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Mike Marciniszyn <infinipath@qlogic.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:46 -07:00
Paul Gortmaker b108d9764c infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE
These were getting it implicitly via device.h --> module.h but
we are going to stop that when we clean up the headers.

Fix these in advance so the tree remains biscect-clean.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:35 -04:00
Paul Gortmaker e4dd23d753 infiniband: Fix up module files that need to include module.h
They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:35 -04:00
Paul Gortmaker fc87af74af infiniband: Fix up users implicitly relying on getting stat.h
They get it via module.h (via device.h) but we want to clean that up.
When we do, we'll get things like:

  CC [M]  drivers/infiniband/core/sysfs.o
  sysfs.c:361: error: 'S_IRUGO' undeclared here (not in a function)
  sysfs.c:654: error: 'S_IWUSR' undeclared here (not in a function)

so add in the stat header it is using explicitly in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:34 -04:00
Sean Hefty 42849b2697 RDMA/uverbs: Export ib_open_qp() capability to user space
Allow processes that share the same XRC domain to open an existing
shareable QP.  This permits those processes to receive events on the
shared QP and transfer ownership, so that any process may modify the
QP.  The latter allows the creating process to exit, while a remaining
process can still transition it for path migration purposes.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:50:56 -07:00
Sean Hefty 0e0ec7e063 RDMA/core: Export ib_open_qp() to share XRC TGT QPs
XRC TGT QPs are shared resources among multiple processes.  Since the
creating process may exit, allow other processes which share the same
XRC domain to open an existing QP.  This allows us to transfer
ownership of an XRC TGT QP to another process.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:49:51 -07:00
Sean Hefty 2622e18ef4 IB/cm: Do not automatically disconnect XRC TGT QPs
Because an XRC TGT QP can end up being shared among multiple
processes, don't have the ib_cm automatically send a DREQ when the
userspace process that owns the ib_cm_id exits.  Disconnect can be
initiated by the user directly; otherwise, the owner of the XRC INI QP
controls the connection.

Note that as a result of the process exiting, the ib_cm will stop
tracking the XRC connection on the target side.  For the purposes of
disconnecting, this isn't a big deal.  The ib_cm will respond to the
DREQ appropriately.  For other messages, mainly LAP, the CM will
reject the request, since there's no one available to route the
request to.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:42:06 -07:00
Sean Hefty 18c441a6c3 RDMA/cma: Support XRC QPs
Allow users to connect XRC QPs through the rdma_cm.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:41:36 -07:00
Sean Hefty 638ef7a6c6 RDMA/ucm: Allow user to specify QP type when creating id
Allow the user to indicate the QP type separately from the port space
when allocating an rdma_cm_id.  With RDMA_PS_IB, there is no longer a
1:1 relationship between the QP type and port space, so we need to
switch on the QP type to select between UD and connected QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:40:36 -07:00
Sean Hefty 2d2e941529 RDMA/cm: Define new RDMA port space specific to IB
Add RDMA_PS_IB.  XRC QP types will use the IB port space when operating
over the RDMA CM.  For the 'IP protocol' field value, we select 0x3F,
which is listed as being for 'any local network'.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:39:52 -07:00
Sean Hefty ef70044647 IB/cm: Update XRC support based on XRC annex errata
The XRC annex was updated to have XRC behave more like RD. Specifically,
the XRC TGT QPN moves from the local QPN to local EECN field.  Lookup of
SRQN is done using the REQ/REP protocol.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:38:35 -07:00
Sean Hefty d26a360b77 IB/cm: Update protocol to support XRC
Update the REQ and REP messages to support XRC connection setup
according to the XRC Annex.  Several existing fields must be set to 0 or
1 when connecting XRC QPs, and a reserved field is changed to an
extended transport type.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:37:55 -07:00
Sean Hefty b93f3c1872 RDMA/uverbs: Export XRC TGT QPs to user space
Allow user space to operate on XRC TGT QPs the same way as other types
of QPs, with one notable exception: since XRC TGT QPs may be shared
among multiple processes, the XRC TGT QP is allowed to exist beyond the
lifetime of the creating process.

The process that creates the QP is allowed to destroy it, but if the
process exits without destroying the QP, then the QP will be left bound
to the lifetime of the XRCD.

TGT QPs are not associated with CQs or a PD.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:37:07 -07:00
Sean Hefty 9977f4f64b RDMA/uverbs: Export XRC INI QPs to userspace
XRC INI QPs are similar to send only RC QPs.  Allow user space to create
INI QPs.  Note that INI QPs do not require receive CQs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:32:27 -07:00
Sean Hefty 8541f8de05 RDMA/uverbs: Export XRC SRQs to user space
We require additional information to create XRC SRQs than we can
exchange using the existing create SRQ ABI.  Provide an enhanced create
ABI for extended SRQ types.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
and Roland Dreier <roland@purestorage.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:29:18 -07:00
Sean Hefty 53d0bd1e7f RDMA/uverbs: Export XRC domains to user space
Allow user space to create XRC domains.  Because XRCDs are expected to
be shared among multiple processes, we use inodes to identify an XRCD.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:21:24 -07:00
Sean Hefty d3d72d909e RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD
XRC TGT QPs are intended to be shared among multiple users and
processes.  Allow the destruction of an XRC TGT QP to be done explicitly
through ib_destroy_qp() or when the XRCD is destroyed.

To support destroying an XRC TGT QP, we need to track TGT QPs with the
XRCD.  When the XRCD is destroyed, all tracked XRC TGT QPs are also
cleaned up.

To avoid stale reference issues, if a user is holding a reference on a
TGT QP, we increment a reference count on the QP.  The user releases the
reference by calling ib_release_qp.  This releases any access to the QP
from a user above verbs, but allows the QP to continue to exist until
destroyed by the XRCD.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:20:27 -07:00
Sean Hefty b42b63cf0d RDMA/core: Add XRC QPs
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC communication is between an initiator (INI) QP and a target (TGT)
QP.  Target QPs are associated with SRQs through an XRCD.  An XRC TGT QP
behaves like a receive-only RD QP.  XRC INI QPs behave similarly to RC
QPs, except that work requests posted to an XRC INI QP must specify the
remote SRQ that is the target of the work request.

We define two new QP types for XRC, to distinguish between INI and TGT
QPs, and update the core layer to support XRC QPs.

This patch is derived from work by Jack Morgenstein
<jackm@dev.mellanox.co.il>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:16:19 -07:00
Sean Hefty 418d51307d RDMA/core: Add XRC SRQ type
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC defines SRQs that are specifically used by XRC connections.  Expand
the SRQ code to support XRC SRQs.  An XRC SRQ is currently restricted to
only XRC use according to the IB XRC Annex.

Portions of this patch were derived from work by
Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:14:31 -07:00
Sean Hefty 96104eda01 RDMA/core: Add SRQ type field
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second.  Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:13:26 -07:00
Sean Hefty 59991f94eb RDMA/core: Add XRC domain support
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

A few new concepts are introduced to support this.  This patch adds:

 - A new device capability flag, IB_DEVICE_XRC, which low-level
   drivers set to indicate that a device supports XRC.
 - A new object type, XRC domains (struct ib_xrcd), and new verbs
   ib_alloc_xrcd()/ib_dealloc_xrcd().  XRCDs are used to limit which
   XRC SRQs an incoming message can target.

This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-12 10:32:26 -07:00
Marcel Apfelbaum 71eeba161d IB: Add new InfiniBand link speeds
Introduce support for the following extended speeds:

FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with
        64b/66b encoding rather than 8b/10b encoding.
FDR:    IBA extended speed 14.0625 Gbps.
EDR:    IBA extended speed 25.78125 Gbps.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-11 11:53:47 -07:00
Kumar Sanghvi 3ebeebc38b RDMA/iwcm: Propagate ird/ord values upwards
Update struct iw_cm_event to support propagating the ird/ord values
upwards to the application.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:37:52 -07:00
Hefty, Sean caf6e3f221 RDMA/ucm: Removed checks for unsigned value < 0
cmd is unsigned, no need to check for < 0.  Found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:05 -07:00
Hefty, Sean b7ab0b19a8 IB/mad: Verify mgmt class in received MADs
If a received MAD contains an invalid or reserved mgmt class, we will
attempt to access method_table outside of its range.  Add a check to
ensure that mgmt class falls within the handled range.

Found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:05 -07:00
Hefty, Sean f45ee80eb0 RDMA/cma: Check for NULL conn_param in rdma_accept
Check that conn_param is not null before dereferencing it when
processing rdma_accept().  This is necessary to prevent a possible
system crash, which can be caused by user space.

Problem found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:04 -07:00
Hefty, Sean 9595480c5d RDMA/cma: Fix crash in cma_req_handler
The RDMA CM uses the local qp_type to determine how to process an
incoming request.  This can result in an incoming REQ being treated as
a SIDR REQ and vice versa.  Fix this by switching off the event type
instead, and for good measure verify that the listener supports the
incoming connection request.

This problem showed up when a user space application mismatched the QP
types between a client and server app.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:32:33 -07:00
Linus Torvalds ece236ce2f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
  IB/qib: Defer HCA error events to tasklet
  mlx4_core: Bump the driver version to 1.0
  RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
  IB/mlx4: Support PMA counters for IBoE
  IB/mlx4: Use flow counters on IBoE ports
  IB/pma: Add include file for IBA performance counters definitions
  mlx4_core: Add network flow counters
  mlx4_core: Fix location of counter index in QP context struct
  mlx4_core: Read extended capabilities into the flags field
  mlx4_core: Extend capability flags to 64 bits
  IB/mlx4: Generate GID change events in IBoE code
  IB/core: Add GID change event
  RDMA/cma: Don't allow IPoIB port space for IBoE
  RDMA: Allow for NULL .modify_device() and .modify_port() methods
  IB/qib: Update active link width
  IB/qib: Fix potential deadlock with link down interrupt
  IB/qib: Add sysfs interface to read free contexts
  IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
  IB/qib: Remove double define
  IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
  ...
2011-07-22 14:50:12 -07:00
Roland Dreier 4460207561 Merge branches 'cma', 'cxgb4', 'ipath', 'misc', 'mlx4', 'mthca', 'qib' and 'srp' into for-next 2011-07-22 11:56:11 -07:00
Or Gerlitz 761d90ed4c IB/core: Add GID change event
Add IB GID change event type.  This is needed for IBoE when the HW
driver updates the GID (e.g when new VLANs are added/deleted) table
and the change should be reflected to the IB core cache.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:30 -07:00
Moni Shoua 2efdd6a038 RDMA/cma: Don't allow IPoIB port space for IBoE
This patch fixes a kernel crash in cma_set_qkey().

When the link layer is Ethernet, it is wrong to use IPoIB port space
since no IPoIB interface is available.  Specifically, setting the
Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and
an SA query, which doesn't make sense over Ethernet.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 16:50:25 -07:00
Bart Van Assche 10e1b54bbb RDMA: Allow for NULL .modify_device() and .modify_port() methods
These methods don't make sense for iWARP devices, so rather than
forcing them to implement stubs, just return -ENOSYS in the core if
the hardware driver doesn't set .modify_device and/or .modify_port.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 16:44:30 -07:00
Jack Morgenstein 0c9361fcde RDMA/cma: Avoid assigning an IS_ERR value to cm_id pointer in CMA id object
Avoid assigning an IS_ERR value to the cm_id pointer.  This fixes a
few anomalies in the error flow due to confusion about checking for
NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value
every time we wish to determine if the cma_id object has a cm device
associated with it.

Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can
check directly for the existence of the device pointer -- for a
non-NULL check, makes no difference if it is the iwarp or the ib
pointer).

Finally, make a few code changes here to improve coding consistency.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 09:44:55 -07:00
David S. Miller 69cce1d140 net: Abstract dst->neighbour accesses behind helpers.
dst_{get,set}_neighbour()

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:35 -07:00
David S. Miller 6a7ebdf2fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/bluetooth/l2cap_core.c
2011-07-14 07:56:40 -07:00
Goldwyn Rodrigues b2bc478219 RDMA: Check for NULL mode in .devnode methods
Commits 71c29bd5c2 ("IB/uverbs: Add devnode method to set path/mode")
and c3af0980ce ("IB: Add devnode methods to cm_class and umad_class")
added devnode methods that set the mode.

However, these methods don't check for a NULL mode, and so we get a
crash when unloading modules because devtmpfs_delete_node() calls
device_get_devnode() with mode == NULL.

Add the missing checks.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
[ Also fix cm.c.  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-04 15:53:28 -07:00
Greg Rose c7ac8679be rtnetlink: Compute and store minimum ifinfo dump size
The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-06-09 20:38:07 -07:00
Linus Torvalds 4c171acc20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cma: Save PID of ID's owner
  RDMA/cma: Add support for netlink statistics export
  RDMA/cma: Pass QP type into rdma_create_id()
  RDMA: Update exported headers list
  RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
  RDMA/nes: Add a check for strict_strtoul()
  RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
  RDMA/cxgb4: Use completion objects for event blocking
  IB/srp: Fix integer -> pointer cast warnings
  IB: Add devnode methods to cm_class and umad_class
  IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
  IB/uverbs: Add devnode method to set path/mode
  RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
  RDMA: Add netlink infrastructure
  RDMA: Add error handling to ib_core_init()
2011-05-26 12:13:57 -07:00
Roland Dreier 8dc4abdf4c Merge branches 'cma', 'cxgb3', 'cxgb4', 'misc', 'nes', 'netlink', 'srp' and 'uverbs' into for-next 2011-05-25 13:47:20 -07:00
Nir Muchtar 83e9502d8d RDMA/cma: Save PID of ID's owner
Save the PID associated with an RDMA CM ID for reporting via netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Nir Muchtar 753f618ae0 RDMA/cma: Add support for netlink statistics export
Add callbacks and data types for statistics export of all current
devices/ids.  The schema for RDMA CM is a series of netlink messages.
Each one contains an rdma_cm_stat struct.  Additionally, two netlink
attributes are created for the addresses for each message (if
applicable).

Their types used are:
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
sockaddr_* structs are encapsulated within these attributes.

In other words, every transaction contains a series of messages like:

-------message 1-------
struct rdma_cm_id_stats {
       __u32 qp_num;
       __u32 bound_dev_if;
       __u32 port_space;
       __s32 pid;
       __u8 cm_state;
       __u8 node_type;
       __u8 port_num;
       __u8 reserved;
}
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
-------end 1-------
-------message 2-------
struct rdma_cm_id_stats
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
-------end 2-------

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Sean Hefty b26f9b9949 RDMA/cma: Pass QP type into rdma_create_id()
The RDMA CM currently infers the QP type from the port space selected
by the user.  In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type.  For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Nir Muchtar 550e5ca77e RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
in an exported header so that it can be exported via RDMA netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:22 -07:00
Roland Dreier c3af0980ce IB: Add devnode methods to cm_class and umad_class
We want the ucmX, umadX and issmX device nodes to show up under
/dev/infiniband, and additionally ucmX should have mode 0666.  Add
appropriate devnode methods to their class structs for this.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-23 11:24:28 -07:00
Ira Weiny c8367c4cd9 IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
We had a script which was looping through the devices returned from
ibstat and attempted to register a SMI agent on an ethernet device.
This caused a kernel panic for IBoE devices that don't have QP0.

Fix this by checking if the QP exists before using it.

Signed-off-by: Ira Weiny <weiny2@llnl.gov>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-23 11:15:48 -07:00
Roland Dreier 71c29bd5c2 IB/uverbs: Add devnode method to set path/mode
We want udev to create a device node under /dev/infiniband with
permission 0666 for uverbsX devices, so add a devnode method to set the
appropriate info.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-23 11:10:05 -07:00
Roland Dreier 04ea2f8197 RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
We want udev to create a device node under /dev/infiniband with
permission 0666 for rdma_cm, so add that info to our struct miscdevice.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2011-05-23 10:48:43 -07:00
Linus Torvalds 06f4e926d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
  macvlan: fix panic if lowerdev in a bond
  tg3: Add braces around 5906 workaround.
  tg3: Fix NETIF_F_LOOPBACK error
  macvlan: remove one synchronize_rcu() call
  networking: NET_CLS_ROUTE4 depends on INET
  irda: Fix error propagation in ircomm_lmp_connect_response()
  irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
  irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
  rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
  be2net: Kill set but unused variable 'req' in lancer_fw_download()
  irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
  atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
  rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
  rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
  rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
  rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
  pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
  isdn: capi: Use pr_debug() instead of ifdefs.
  tg3: Update version to 3.119
  tg3: Apply rx_discards fix to 5719/5720
  ...

Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
as per Davem.
2011-05-20 13:43:21 -07:00
Roland Dreier b2cbae2c24 RDMA: Add netlink infrastructure
Add basic RDMA netlink infrastructure that allows for registration of
RDMA clients for which data is to be exported and supplies message
construction callbacks.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>

[ Reorganize a few things, add CONFIG_NET dependency.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-20 11:46:11 -07:00
Nir Muchtar fd75c789ab RDMA: Add error handling to ib_core_init()
Fail RDMA midlayer initialization if sysfs setup fails.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-20 11:46:10 -07:00
David S. Miller 5fc3590c81 infiniband: Remove rt->rt_src usage in addr4_resolve()
Use an explicit flow key and fetch it from there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10 13:32:48 -07:00
Roland Dreier d0c49bf391 RDMA/iwcm: Get rid of enum iw_cm_event_status
The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places;
cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4
drivers -- only nes was using the enum values (with the mild consequence
that all nes connection failures were treated as generic errors rather
than reported as timeouts or rejections).

We can fix this confusion by getting rid of enum iw_cm_event_status and
using a plain int for struct iw_cm_event.status, and converting nes to
use -Exxx as the other iWARP drivers do.

This also gets rid of the warning

    drivers/infiniband/core/cma.c: In function 'cma_iw_handler':
    drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status'
    drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status'
    drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status'

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Faisal Latif <faisal.latif@intel.com>
2011-05-09 22:23:57 -07:00
Hefty, Sean a9bb79128a RDMA/cma: Add an ID_REUSEADDR option
Lustre requires that clients bind to a privileged port number before
connecting to a remote server.  On larger clusters (typically more
than about 1000 nodes), the number of privileged ports is exhausted,
resulting in lustre being unusable.

To handle this, we add support for reusable addresses to the rdma_cm.
This mimics the behavior of the socket option SO_REUSEADDR.  A user
may set an rdma_cm_id to reuse an address before calling
rdma_bind_addr() (explicitly or implicitly).  If set, other
rdma_cm_id's may be bound to the same address, provided that they all
have reuse enabled, and there are no active listens.

If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
will only succeed if there are no other id's bound to that same
address.  The reuse option is exported to user space.  The behavior of
the kernel reuse implementation was verified against that given by
sockets.

This patch is derived from a path by Ira Weiny <weiny2@llnl.gov>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:06:10 -07:00
Hefty, Sean 43b752daae RDMA/cma: Fix handling of IPv6 addressing in cma_use_port
cma_use_port() assumes that the sockaddr is an IPv4 address.  Since
IPv6 addressing is supported (and also to support other address
families) make the code more generic in its address handling.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09 22:06:10 -07:00
Michael Heinz 1eba843dd7 IB/mad: Improve an error message so error code is included
Signed-off-by: Michael Heinz <michael.heinz@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-18 09:42:20 -07:00
Sean Hefty 1bdd6384c2 RDMA/addr: Fix return of uninitialized ret value
Commit b23dd4fe42 ("ipv4: Make output route lookup return rtable
directly") resulted in leaving ret uninitialized, where it may later
be returned.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-17 17:00:19 -07:00
Linus Torvalds 7a6362800c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
  bonding: enable netpoll without checking link status
  xfrm: Refcount destination entry on xfrm_lookup
  net: introduce rx_handler results and logic around that
  bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
  bonding: wrap slave state work
  net: get rid of multiple bond-related netdevice->priv_flags
  bonding: register slave pointer for rx_handler
  be2net: Bump up the version number
  be2net: Copyright notice change. Update to Emulex instead of ServerEngines
  e1000e: fix kconfig for crc32 dependency
  netfilter ebtables: fix xt_AUDIT to work with ebtables
  xen network backend driver
  bonding: Improve syslog message at device creation time
  bonding: Call netif_carrier_off after register_netdevice
  bonding: Incorrect TX queue offset
  net_sched: fix ip_tos2prio
  xfrm: fix __xfrm_route_forward()
  be2net: Fix UDP packet detected status in RX compl
  Phonet: fix aligned-mode pipe socket buffer header reserve
  netxen: support for GbE port settings
  ...

Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
2011-03-16 16:29:25 -07:00
Sean Hefty a396d43a35 RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one
rdma_destroy_id currently uses the global rdma cm 'lock' to test if an
rdma_cm_id has been bound to a device.  This prevents an active
address resolution callback handler from assigning a device to the
rdma_cm_id after rdma_destroy_id checks for one.

Instead, we can replace the use of the global lock around the check to
the rdma_cm_id device pointer by setting the id state to destroying,
then flushing all active callbacks.  The latter is accomplished by
acquiring and releasing the handler_mutex.  Any active handler will
complete first, and any newly scheduled handlers will find the
rdma_cm_id in an invalid state.

In addition to optimizing the current locking scheme, the use of the
rdma_cm_id mutex is a more intuitive synchronization mechanism than
that of the global lock.  These changes are based on feedback from
Doug Ledford <dledford@redhat.com> while he was trying to debug a
crash in the rdma cm destroy path.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-15 10:57:34 -07:00
Sean Hefty 8d8ac86564 IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state
This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
Vadai <amirv@mellanox.com>:

	When destroying a cm_id from a context of a work queue and if
	the lap_state of this cm_id is IB_CM_LAP_SENT, we need to
	release the reference of this id that was taken upon the send
	of the LAP message.  Otherwise, if the expected APR message
	gets lost, it is only after a long time that the reference
	will be released, while during that the work handler thread is
	not available to process other things.

It turns out that we need to cancel any pending LAP messages whenever
we transition out of the IB_CM_ESTABLISH state.  This occurs when
disconnecting - either sending or receiving a DREQ.  It can also
happen in a corner case where we receive a REJ message after sending
an RTU, followed by a LAP.  Add checks and cancel any outstanding LAP
messages in these three cases.

Canceling the LAP when sending a DREQ fixes the destroy problem
reported by Moni.  When a cm_id is destroyed in the IB_CM_ESTABLISHED
state, it sends a DREQ to the remote side to notify the peer that the
connection is going away.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-15 10:56:12 -07:00
Sean Hefty 29963437a4 IB/cm: Bump reference count on cm_id before invoking callback
When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
refcount of the cm_id is initialized to 1.  However, cm_process_work
will decrement the refcount after invoking all callbacks.  The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.

If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id.  This can lead to a crash when the cm callback thread tries
to access the cm_id.

This problem was noticed as part of a larger investigation with kernel
crashes in the rdma_cm when running on a real time OS.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-15 10:56:12 -07:00
Sean Hefty 25ae21a101 RDMA/cma: Fix crash in request handlers
Doug Ledford and Red Hat reported a crash when running the rdma_cm on
a real-time OS.  The crash has the following call trace:

    cm_process_work
       cma_req_handler
          cma_disable_callback
          rdma_create_id
             kzalloc
             init_completion
          cma_get_net_info
          cma_save_net_info
          cma_any_addr
             cma_zero_addr
          rdma_translate_ip
             rdma_copy_addr
          cma_acquire_dev
             rdma_addr_get_sgid
             ib_find_cached_gid
             cma_attach_to_dev
          ucma_event_handler
             kzalloc
             ib_copy_ah_attr_to_user
          cma_comp

[ preempted ]

    cma_write
        copy_from_user
        ucma_destroy_id
           copy_from_user
           _ucma_find_context
           ucma_put_ctx
           ucma_free_ctx
              rdma_destroy_id
                 cma_exch
                 cma_cancel_operation
                 rdma_node_get_transport

        rt_mutex_slowunlock
        bad_area_nosemaphore
        oops_enter

They were able to reproduce the crash multiple times with the
following details:

    Crash seems to always happen on the:
            mutex_unlock(&conn_id->handler_mutex);
    as conn_id looks to have been freed during this code path.

An examination of the code shows that a race exists in the request
handlers.  When a new connection request is received, the rdma_cm
allocates a new connection identifier.  This identifier has a single
reference count on it.  If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory.  However, the request
handlers may still be in the process of running.  When control returns
to the request handlers, they can attempt to access the newly created
identifiers.

Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-03-15 10:00:28 -07:00
David S. Miller 4c9483b2fb ipv6: Convert to use flowi6 where applicable.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:54 -08:00
David S. Miller 1d28f42c1b net: Put flowi_* prefix on AF independent members of struct flowi
I intend to turn struct flowi into a union of AF specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:44 -08:00
David S. Miller 78fbfd8a65 ipv4: Create and use route lookup helpers.
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:42 -08:00
David S. Miller b23dd4fe42 ipv4: Make output route lookup return rtable directly.
Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 14:31:35 -08:00
Roland Dreier e51c7b1ab0 Merge branches 'amso1100', 'cma', 'cxgb4', 'misc', 'mlx4' and 'qib' into for-next 2011-01-29 20:45:04 -08:00
Tejun Heo 96e61fa55e RDMA: Update missed conversion of flush_scheduled_work()
Commit f06267104d ("RDMA: Update workqueue usage") introduced ib_wq
and removed the use of flush_scheduled_work(); however, during the merge
process one chunk was lost in ib_sa_remove_one().  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-01-28 16:39:08 -08:00
Steve Wise e86f8b06f5 RDMA/ucma: Copy iWARP route information on queries
For iWARP rdma_cm ids, the "route" information is the L2 src and
next hop addresses.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-01-28 16:34:05 -08:00
Tejun Heo f06267104d RDMA: Update workqueue usage
* ib_wq is added, which is used as the common workqueue for infiniband
  instead of the system workqueue.  All system workqueue usages
  including flush_scheduled_work() callers are converted to use and
  flush ib_wq.

* cancel_delayed_work() + flush_scheduled_work() converted to
  cancel_delayed_work_sync().

* qib_wq is removed and ib_wq is used instead.

This is to prepare for deprecation of flush_scheduled_work().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-16 21:16:31 -08:00
David S. Miller 17f7f4d9fc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/ipv4/fib_frontend.c
2010-12-26 22:37:05 -08:00
Dan Carpenter 7182afea8d IB/uverbs: Handle large number of entries in poll CQ
In ib_uverbs_poll_cq() code there is a potential integer overflow if
userspace passes in a large cmd.ne.  The calls to kmalloc() would
allocate smaller buffers than intended, leading to memory corruption.
There iss also an information leak if resp wasn't all used.
Unprivileged userspace may call this function, although only if an
RDMA device that uses this function is present.

Fix this by copying CQ entries one at a time, which avoids the
allocation entirely, and also by moving this copying into a function
that makes sure to initialize all memory copied to userspace.

Special thanks to Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
for his help and advice.

Cc: <stable@kernel.org>
Signed-off-by: Dan Carpenter <error27@gmail.com>

[ Monkey around with things a bit to avoid bad code generation by gcc
  when designated initializers are used.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-08 15:23:49 -08:00
Vasiliy Kulikov 91a4d157d0 IB: Fix information leak in marshalling code
ib_ucm_init_qp_attr() and ucma_init_qp_attr() pass struct ib_uverbs_qp_attr
with reserved, qp_state, {ah_attr,alt_ah_attr}{reserved,->grh.reserved}
fields uninitialized to copy_to_user().  This leads to leaking of
contents of kernel stack memory to userspace.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-12-01 16:33:18 -08:00
Or Gerlitz f55864a4f4 IB/pack: Remove some unused code added by the IBoE patches
Remove unused functions added by commit ff7f5aab35 ("IB/pack: IBoE UD
packet packing support").

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
2010-12-01 16:30:18 -08:00
Eric Dumazet 22f4fbd9bd infiniband: remove dev_base_lock use
dev_base_lock is the legacy way to lock the device list, and is planned
to disappear. (writers hold RTNL, readers hold RCU lock)

Convert rdma_translate_ip() and update_ipv6_gids() to RCU locking.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:41:56 -08:00
Eric Dumazet 72cdd1d971 net: get rid of rtable->idev
It seems idev field in struct rtable has no special purpose, but adding
extra atomic ops.

We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).

infiniband case is solved using dst.dev instead of idev->dev

Removal of this field means routing without route cache is now using
shared data, percpu data, and only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.

About 5% speedup on routing test.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 10:29:40 -08:00
Roland Dreier 116e9535fe Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', 'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next 2010-10-26 16:09:11 -07:00
Eli Cohen 8ad330a002 IB/core: Add link layer type information to sysfs
Since an IB transport port may use either IB or Ethernet as its link layer,
add the file /sys/class/infiniband/<device>/ports/<port_num>/link_layer to
show the link layer for the port.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-25 10:20:39 -07:00
Eli Cohen af7bd46376 IB/core: Add VLAN support for IBoE
Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the
GID derived from a link local address in the following way:

    GID[11] GID[12] contain the VLAN ID when the GID contains a VLAN.

The 3 bits user priority field of the packets are identical to the 3
bits of the SL.

In case of rdma_cm apps, the TOS field is used to generate the SL
field by doing a shift right of 5 bits effectively taking to 3 MS bits
of the TOS field.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-25 10:20:39 -07:00
Eli Cohen 2420b60b1d IB/uverbs: Return link layer type to userspace for query port operation
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-25 10:20:39 -07:00
Steve Wise 97cb7e40c6 RDMA/ucma: Allow tuning the max listen backlog
For iWARP connections, the connect request is carried in a TCP payload
on an already established TCP connection.  So if the ucma's backlog is
full, the connection request is transmitted and acked at the TCP level
by the time the connect request gets dropped in the ucma.  The end
result is the connection gets rejected by the iWARP provider.
Further, a 32 node 256NP OpenMPI job will generate > 128 connect
requests on some ranks.

This patch increases the default max backlog to 1024, and adds a
sysctl variable so the backlog can be adjusted at run time.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-23 13:41:40 -07:00
Eli Cohen ff7f5aab35 IB/pack: IBoE UD packet packing support
Add support for packing IBoE packet headers.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>

[ Clean up and fix ib_ud_header_init() a bit.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-14 12:41:29 -07:00
Eli Cohen 3c86aa70bf RDMA/cm: Add RDMA CM support for IBoE devices
Add support for IBoE device binding and IP --> GID resolution.  Path
resolving and multicast joining are implemented within cma.c by
filling in the responses and running callbacks in the CMA work queue.

IP --> GID resolution always yields IPv6 link local addresses; remote
GIDs are derived from the destination MAC address of the remote port.
Multicast GIDs are always mapped to multicast MACs as is done in IPv6.
(IPv4 multicast is enabled by translating IPv4 multicast addresses to
IPv6 multicast as described in
<http://www.mail-archive.com/ipng@sunroof.eng.sun.com/msg02134.html>.)

Some helper functions are added to ib_addr.h.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13 15:46:43 -07:00
Eli Cohen fac70d5191 IB/mad: IBoE supports only QP1 (no QP0)
Since IBoE is using Ethernet as its link layer, there is no central
management entity so there is need for QP0.  QP1 is still needed since
it handles communications between CM agents.  This patch will skip QP0
and create only QP1 for IBoE ports.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13 09:38:11 -07:00
Animesh K Trivedi 26012f0750 RDMA/iwcm: Fix hang in uninterruptible wait on cm_id destroy
A process can get stuck in an uninterruptible wait in the
kernel while destroying a cm_id when iw_cm_connect() fails:

For example, When creation of a PD fails but the user continues with
an attempt to connect to the server without checking the return value,
in iw_cm_connect() a NULL qp is found so the call fails.  However the
IWCM_F_CONNECT_WAIT bit is not cleared.  destroy_cm_id() then waits
forever for IWCM_F_CONNECT_WAIT to be cleared.

The same problem exists on the passive side with the accept call.

Fix this by clearing the bit and waking up any waiters in the
appropriate spots.

Signed-off-by: Animesh Trivedi <atr@zurich.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-11 20:24:04 -07:00
Thomas Gleixner 557d0540b9 IB/umad: Make user_mad semaphore a real one
Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-28 20:52:21 -07:00
Eli Cohen a3f5adaf49 IB/core: Add link layer property to ports
This patch allows ports to have different link layers:
IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET.  This is required
for adding IBoE (InfiniBand-over-Ethernet, aka RoCE) support.  For
devices that do not provide an implementation for querying the link
layer property of a port, we return a default value based on the
transport: RMA_TRANSPORT_IB nodes will return IB_LINK_LAYER_INFINIBAND
and RDMA_TRANSPORT_IWARP nodes will return IB_LINK_LAYER_ETHERNET.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-09-27 17:51:10 -07:00
Linus Torvalds 3cc08fc35d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (42 commits)
  IB/qib: Add missing <linux/slab.h> include
  IB/ehca: Drop unnecessary NULL test
  RDMA/nes: Fix confusing if statement indentation
  IB/ehca: Init irq tasklet before irq can happen
  RDMA/nes: Fix misindented code
  RDMA/nes: Fix showing wqm_quanta
  RDMA/nes: Get rid of "set but not used" variables
  RDMA/nes: Read firmware version from correct place
  IB/srp: Export req_lim via sysfs
  IB/srp: Make receive buffer handling more robust
  IB/srp: Use print_hex_dump()
  IB: Rename RAW_ETY to RAW_ETHERTYPE
  RDMA/nes: Fix two sparse warnings
  RDMA/cxgb3: Make needlessly global iwch_l2t_send() static
  IB/iser: Make needlessly global iser_alloc_rx_descriptors() static
  RDMA/cxgb4: Add timeouts when waiting for FW responses
  IB/qib: Fix race between qib_error_qp() and receive packet processing
  IB/qib: Limit the number of packets processed per interrupt
  IB/qib: Allow writes to the diag_counters to be able to clear them
  IB/qib: Set cfgctxts to number of CPUs by default
  ...
2010-08-07 17:08:02 -07:00
Aleksey Senin a2ebf07ae5 IB: Rename RAW_ETY to RAW_ETHERTYPE
Change abbreviated IB_QPT_RAW_ETY to IB_QPT_RAW_ETHERTYPE to make
the special QP type easier to understand.

cf http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04530.html

Signed-off-by: Aleksey Senin <alekseys@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-08-04 10:44:19 -07:00
Sean Hefty 50a025c69e IB/cm: Check LAP state before sending an MRA
NULL pointer dereferences in ib_cm_init_qp_attr() were seen by some
users.  From a crash dump, I determined that we died in
cm_init_qp_rts_attr() (it's inlined, so it doesn't show up in the
traceback) on the line labeled below:

static int cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv,
                               struct ib_qp_attr *qp_attr,
                               int *qp_attr_mask)
{
        ........
        if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT) {
                .....
        } else {
               *qp_attr_mask = IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE;
               qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num; <-die


The problem is that the rdma_cm can call ib_send_cm_mra() after a
connection has been established.  The ib_cm incorrectly assumes that
the MRA is in response to a LAP (load alternate path) message, even
though no LAP message has been received.  The ib_cm needs to check the
lap_state before sending an MRA if the cm_id state is established.

Reported-by: Arthur Kepner <akepner@sgi.com>
Reported-by: Josh England <jjengla@gmail.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-07-28 15:18:24 -07:00
Roland Dreier f400e5b38a IB/umad: Remove unused-but-set variable 'already_dead'
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-07-14 13:25:04 -07:00
Changli Gao d8d1f30b95 net-next: remove useless union keyword
remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10 23:31:35 -07:00
Julia Lawall e642df6a0b IB/ucm: Use memdup_user()
Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
2010-05-25 21:10:57 -07:00
Roland Dreier acdc30b56a Merge branches 'cxgb4', 'misc', 'mlx4', 'nes' and 'qib' into for-next 2010-05-25 09:54:03 -07:00
Roland Dreier 1693395511 IB/mad: Make needlessly global mad_sendq_size/mad_recvq_size static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-05-23 21:39:31 -07:00
Ralph Campbell 9a6edb60ec IB/core: Allow device-specific per-port sysfs files
Add a new parameter to ib_register_device() so that low-level device
drivers can pass in a pointer to a callback function that will be
called for each port that is registered in sysfs.  This allows
low-level device drivers to create files in

    /sys/class/infiniband/<hca>/ports/<N>/

without having to poke through the internals of the RDMA sysfs handling.

There is no need for an unregister function since the kobject
reference will go to zero when ib_unregister_device() is called.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-05-21 10:34:44 -07:00
Roland Dreier ffebedb7ab Merge branches 'amso1100', 'bkl', 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'iser', 'masked-atomics', 'misc', 'mthca' and 'nes' into for-next 2010-05-15 20:06:01 -07:00
Julia Lawall 9893e742a0 IB/core: Use kmemdup() instead of kmalloc()+memcpy()
Use kmemdup when some other buffer is immediately copied into the
allocated region.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
statement S;
@@

-  to = \(kmalloc\|kzalloc\)(size,flag);
+  to = kmemdup(from,size,flag);
   if (to==NULL || ...) S
-  memcpy(to, from, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-05-15 20:05:07 -07:00
Tetsuo Handa 5d7220e8dc RDMA/cma: Randomize local port allocation
Randomize local port allocation in the way sctp_get_port_local() does.
Update rover at the end of loop since we're likely to pick a valid port
on the first try.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-04-21 16:18:40 -07:00
Roland Dreier bc1db9af73 IB: Explicitly rule out llseek to avoid BKL in default_llseek()
Several RDMA user-access drivers have file_operations structures with
no .llseek method set.  None of the drivers actually do anything with
f_pos, so this means llseek is essentially a NOP, instead of returning
an error as leaving other file_operations methods unimplemented would
do.  This is mostly harmless, except that a NULL .llseek means that
default_llseek() is used, and this function grabs the BKL, which we
would like to avoid.

Since llseek does nothing useful on these files, we would like it to
return an error to userspace instead of silently grabbing the BKL and
succeeding.  For nearly all of the file types, we take the
belt-and-suspenders approach of setting the .llseek method to
no_llseek and also calling nonseekable_open(); the exception is the
uverbs_event files, which are created with anon_inode_getfile(), which
already sets f_mode the same way as nonseekable_open() would.

This work is motivated by Arnd Bergmann's bkl-removal tree.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-04-21 12:17:38 -07:00
Linus Torvalds 0eddb519b9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Check correct variable for allocation failure
  RDMA/nes: Correct cap.max_inline_data assignment in nes_query_qp()
  RDMA/cm: Set num_paths when manually assigning path records
  IB/cm: Fix device_create() return value check
2010-04-09 11:53:06 -07:00
Roland Dreier 5091b35388 Merge branches 'cma', 'misc', 'mlx4' and 'nes' into for-linus 2010-04-09 09:14:21 -07:00
Sean Hefty ae2d9293d7 RDMA/cm: Set num_paths when manually assigning path records
When manually assigning the path records to use for a connection, save
the number of paths that were set.  Otherwise, checks against num_path
will show 0, even though path record data is available.

This was discovered by manually setting the path records from user
space, then querying the kernel to see if the correct path records
were assigned, only to discover that the kernel returned 0 path
records to the query.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-04-07 14:13:22 -07:00
Jani Nikula 3e340c05c0 IB/cm: Fix device_create() return value check
Use IS_ERR() instead of comparing to NULL.

Signed-off-by: Jani Nikula <ext-jani.1.nikula@nokia.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-03-31 14:26:52 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Greg Kroah-Hartman 21e3bde964 sysfs: fix sysfs lockdep warning in infiniband code
This fixes a sysfs lockdep warning in the infiniband code.

Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-19 07:12:13 -07:00
Linus Torvalds 122ce878dc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/nes: Fix CX4 link problem in back-to-back configuration
  RDMA/nes: Clear stall bit before destroying NIC QP
  RDMA/nes: Set assume_aligned_header bit
  RDMA/cxgb3: Wait at least one schedule cycle during device removal
  IB/mad: Ignore iWARP devices on device removal
  IPoIB: Include return code in trace message for ib_post_send() failures
  IPoIB: Fix TX queue lockup with mixed UD/CM traffic
2010-03-13 14:38:31 -08:00
Steve Wise 070e140c4c IB/mad: Ignore iWARP devices on device removal
When an iWARP device is unloaded, the ib_mad module logs errors.  It
should be ignoring iWARP devices on device removal just like it does
on device add.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-03-11 14:00:08 -08:00
Emese Revfy 52cf25d0ab Driver core: Constify struct sysfs_ops in struct kobj_type
Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Andi Kleen 0933e2d98d driver core: Convert some drivers to CLASS_ATTR_STRING
Convert some drivers who export a single string as class attribute
to the new class_attr_string functions. This removes redundant
code all over.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:48 -08:00
Andi Kleen 28812fe11a driver-core: Add attribute argument to class_attribute show/store
Passing the attribute to the low level IO functions allows all kinds
of cleanups, by sharing low level IO code without requiring
an own function for every piece of data.

Also drivers can extend the attributes with own data fields
and use that in the low level function.

This makes the class attributes the same as sysdev_class attributes
and plain attributes.

This will allow further cleanups in drivers.

Full tree sweep converting all users.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:48 -08:00
Akinobu Mita 19b629f581 infiniband: use for_each_set_bit()
Replace open-coded loop with for_each_set_bit().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:23 -08:00
Linus Torvalds 0f2cc4ecd8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  init: Open /dev/console from rootfs
  mqueue: fix typo "failues" -> "failures"
  mqueue: only set error codes if they are really necessary
  mqueue: simplify do_open() error handling
  mqueue: apply mathematics distributivity on mq_bytes calculation
  mqueue: remove unneeded info->messages initialization
  mqueue: fix mq_open() file descriptor leak on user-space processes
  fix race in d_splice_alias()
  set S_DEAD on unlink() and non-directory rename() victims
  vfs: add NOFOLLOW flag to umount(2)
  get rid of ->mnt_parent in tomoyo/realpath
  hppfs can use existing proc_mnt, no need for do_kern_mount() in there
  Mirror MS_KERNMOUNT in ->mnt_flags
  get rid of useless vfsmount_lock use in put_mnt_ns()
  Take vfsmount_lock to fs/internal.h
  get rid of insanity with namespace roots in tomoyo
  take check for new events in namespace (guts of mounts_poll()) to namespace.c
  Don't mess with generic_permission() under ->d_lock in hpfs
  sanitize const/signedness for udf
  nilfs: sanitize const/signedness in dealing with ->d_name.name
  ...

Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-04 08:15:33 -08:00
Al Viro b1e4594ba0 switch infiniband uverbs to anon_inodes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 14:07:27 -05:00
Roland Dreier fe8875e5a4 Merge branch 'misc' into for-next
Conflicts:
	drivers/infiniband/core/uverbs_main.c
2010-03-01 23:52:31 -08:00
Roland Dreier a265e5587f IB/uverbs: Use anon_inodes instead of private infinibandeventfs
The anon_inodes interface has been split to allow creating a bare
(non-installed) file pointer and also extended to allow specifying
O_RDONLY in the flags.  This makes it a suitable replacement for the
private "infinibandeventfs" pseudo-filesystem used by uverbs, and this
replacement saves a small chunk of boilerplate code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 16:51:20 -08:00
Eli Cohen 920d706c89 IB/core: Fix and clean up ib_ud_header_init()
ib_ud_header_init() first clears header and then fills up the various
fields.  Later on, it tests header->immediate_present, which it has
already cleared, so the condition is always false.  Fix this by adding
an immediate_present parameter and setting header->immediate_present
as is done with grh_present.  Also remove unused calculation of
header_len.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 14:54:10 -08:00
Alexander Chiang cdb8e43889 IB/ucm: Clean whitespace errors
As shown when 'let c_space_errors=1' is set in vim.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:48 -08:00
Alexander Chiang daa913580e IB/ucm: Increase maximum devices supported
Some large systems may support more than IB_UCM_MAX_DEVICES
(currently 32).

This change allows us to support more devices in a backwards-compatible
manner. the first IB_UCM_MAX_DEVICES keep the same major/minor device
numbers they've always had.

If there are more than IB_UCM_MAX_DEVICES, then we dynamically request
a new major device number (new minors start at 0).

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:48 -08:00
Alexander Chiang 31d14b6e10 IB/ucm: Use stack variable 'base' in ib_ucm_add_one
This change is not useful by itself, but sets us up for a future
change that allows us to support more than IB_UCM_MAX_DEVICES.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:47 -08:00
Alexander Chiang dd08f702dd IB/ucm: Use stack variable 'devnum' in ib_ucm_add_one
This change is not useful by itself, but sets us up for a future
change that allows us to dynamically allocate device numbers in case
we have more than IB_UCM_MAX_DEVICES in the system.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:47 -08:00
Alexander Chiang d3f2c67f2d IB/umad: Clean whitespace
Clean errors as shown when 'let c_space_errors=1' is set in vim.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:46 -08:00
Alexander Chiang 8698d3fecc IB/umad: Increase maximum devices supported
Some large systems may support more than IB_UMAD_MAX_PORTS
(currently 64).

This change allows us to support more ports in a backwards-compatible
manner.  The first IB_UMAD_MAX_PORTS keep the same major/minor device
numbers they've always had.

If there are more than IB_UMAD_MAX_PORTS, we then dynamically request
a new major device number (new minors start at 0).

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:45 -08:00
Alexander Chiang dc2ed5e3c9 IB/umad: Use stack variable 'base' in ib_umad_init_port
This change is not useful by itself, but sets us up for a future change
that allows us to support more than IB_UMAD_MAX_PORTS in a system.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:45 -08:00
Alexander Chiang d451b8df9f IB/umad: Use stack variable 'devnum' in ib_umad_init_port
This change is not useful by itself, but sets us up for a future
change that allows us to dynamically allocate device numbers in case
we have more than IB_UMAD_MAX_PORTS in the system.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:44 -08:00
Alexander Chiang 6aa2a86ec4 IB/umad: Remove port_table[]
We no longer need this data structure, as it was used to associate an
inode back to a struct ib_umad_port during ->open().  But now that
we're embedding a struct cdev in struct ib_umad_port, we can use the
container_of() macro to go from the inode back to the device instead.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:44 -08:00
Alexander Chiang 2b937afcab IB/umad: Convert *cdev to cdev in struct ib_umad_port
Instead of storing pointers to cdev and sm_cdev, embed the full
structures instead.

This change allows us to use the container_of() macro in ib_umad_open()
and ib_umad_sm_open() in a future patch.

This change increases the size of struct ib_umad_port to 320 bytes
from 128.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:43 -08:00
Alexander Chiang 9afed76d59 IB/uverbs: Whitespace cleanup
Clean up the errors as shown when 'let c_space_errors=1' is set in vim.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:42 -08:00
Alexander Chiang 830a387138 IB/uverbs: Pack struct ib_uverbs_event_file tighter
Eliminate some padding in the structure by rearranging the members.
sizeof(struct ib_uverbs_event_file) is now 72 bytes (from 80) and
more members now fit in the first cacheline.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:42 -08:00
Alexander Chiang 6d6a0e71ee IB/uverbs: Increase maximum devices supported
Some large systems may support more than IB_UVERBS_MAX_DEVICES
(currently 32).

This change allows us to support more devices in a backwards-compatible
manner.  The first IB_UVERBS_MAX_DEVICES keep the same major/minor
device numbers that they've always had.

If there are more than IB_UVERBS_MAX_DEVICES, we then dynamically
request a new major device number (new minors start at 0).

This change increases the maximum number of HCAs to 64 (from 32).

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:41 -08:00
Alexander Chiang ddbd688301 IB/uverbs: use stack variable 'base' in ib_uverbs_add_one
This change is not useful by itself, but sets us up for a future change
that allows us to support more than IB_UVERBS_MAX_DEVICES in a system.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:40 -08:00
Alexander Chiang 38707980c4 IB/uverbs: Use stack variable 'devnum' in ib_uverbs_add_one
This change is not useful by itself, but it sets us up for a future
change that allows us to dynamically allocate device numbers in case
we have more than IB_UVERBS_MAX_DEVICES in the system.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:40 -08:00
Alexander Chiang 2a72f21226 IB/uverbs: Remove dev_table
dev_table's raison d'etre was to associate an inode back to a struct
ib_uverbs_device.

However, now that we've converted ib_uverbs_device to contain an
embedded cdev (instead of a *cdev), we can use the container_of()
macro and cast back to the containing device.

There's no longer any need for dev_table, so get rid of it.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:39 -08:00
Alexander Chiang 055422ddbb IB/uverbs: Convert *cdev to cdev in struct ib_uverbs_device
Instead of storing a pointer to a cdev, embed the entire struct cdev.

This change allows us to use the container_of() macro in
ib_uverbs_open() in a future patch.

This change increases the size of struct ib_uverbs_device to 168 bytes
across 3 cachelines from 80 bytes in 2 cachelines.  However, we
rearrange the members so that everything fits into the first cacheline
except for the struct cdev. Finally, we don't touch the cdev in any
fastpaths, so this change shouldn't negatively affect performance.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:39 -08:00
Jiri Slaby ccbe9f0b11 RDMA: Use rlimit helpers
Make sure compiler won't do weird things with limits by using the
rlimit helpers added in 3e10e716 ("resource: add helpers for fetching
rlimits").  E.g. fetching them twice may return 2 different values
after writable limits are implemented.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-11 15:40:48 -08:00
Sean Hefty 8523c04809 RDMA/cm: Revert association of an RDMA device when binding to loopback
Revert the following change from commit 6f8372b6 ("RDMA/cm: fix
loopback address support")

   The defined behavior of rdma_bind_addr is to associate an RDMA
   device with an rdma_cm_id, as long as the user specified a non-
   zero address.  (ie they weren't just trying to reserve a port)
   Currently, if the loopback address is passed to rdma_bind_addr,
   no device is associated with the rdma_cm_id.  Fix this.

It turns out that important apps such as Open MPI depend on
rdma_bind_addr() NOT associating any RDMA device when binding to a
loopback address.  Open MPI is being updated to deal with this, but at
least until a new Open MPI release is available, maintain the previous
behavior: allow rdma_bind_addr() to succeed, but do not bind to a
device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-10 12:00:48 -08:00
Robert P. J. Day fd4582a399 IB/addr: Correct CONFIG_IPv6 to CONFIG_IPV6
Correct misspelled "CONFIG_IPv6" that was introduced in commit
d14714df ("IB/addr: Fix IPv6 routing lookup").  The config variable
should be all uppercase.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>

[ This was my fault when I munged the original patch.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-01-06 13:16:30 -08:00
Linus Torvalds bac5e54c29 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
  direct I/O fallback sync simplification
  ocfs: stop using do_sync_mapping_range
  cleanup blockdev_direct_IO locking
  make generic_acl slightly more generic
  sanitize xattr handler prototypes
  libfs: move EXPORT_SYMBOL for d_alloc_name
  vfs: force reval of target when following LAST_BIND symlinks (try #7)
  ima: limit imbalance msg
  Untangling ima mess, part 3: kill dead code in ima
  Untangling ima mess, part 2: deal with counters
  Untangling ima mess, part 1: alloc_file()
  O_TRUNC open shouldn't fail after file truncation
  ima: call ima_inode_free ima_inode_free
  IMA: clean up the IMA counts updating code
  ima: only insert at inode creation time
  ima: valid return code from ima_inode_alloc
  fs: move get_empty_filp() deffinition to internal.h
  Sanitize exec_permission_lite()
  Kill cached_lookup() and real_lookup()
  Kill path_lookup_open()
  ...

Trivial conflicts in fs/direct-io.c
2009-12-16 12:04:02 -08:00
Al Viro 2c48b9c455 switch alloc_file() to passing struct path
... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:42 -05:00
Roland Dreier 14f369d1d6 Merge branches 'amso1100', 'cma', 'cxgb3', 'ehca', 'ipath', 'ipoib', 'iser', 'misc', 'mlx4' and 'nes' into for-next 2009-12-15 23:39:25 -08:00
Roel Kluin df42245a3c IB/uverbs: Fix return of PTR_ERR() of wrong pointer in ib_uverbs_get_context()
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 14:30:44 -08:00
Sean Hefty d14714df61 IB/addr: Fix IPv6 routing lookup
Include link scope as part of address resolution.  Combine local
and remote address resolution into a single, simpler code path.
Fix error checking in the IPv6 routing lookups.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Fix up cma_check_linklocal() for !IPV6 case.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 16:46:25 -08:00
Sean Hefty 923c100ef0 IB/addr: Simplify resolving IPv4 addresses
Merge resolve local/remote address resolution into a single
data flow to ensure consistent access and use of the local routing
tables.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 13:26:51 -08:00
Sean Hefty 6f8372b69c RDMA/cm: fix loopback address support
The RDMA CM is intended to support the use of a loopback address
when establishing a connection; however, the behavior of the CM
when loopback addresses are used is confusing and does not always
work, depending on whether loopback was specified by the server,
the client, or both.

The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address.  (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdam_bind_addr,
no device is associated with the rdma_cm_id.  Fix this.

If a loopback address is specified by the client as the destination
address for a connection, it will fail to establish a connection.
This is true even if the server is listing across all addresses or
on the loopback address itself.  The issue is that the server tries
to translate the IP address carried in the REQ message to a local
net_device address, which fails.  The translation is not needed in
this case, since the REQ carries the actual HW address that should
be used.

Finally, cleanup loopback support to be more transport neutral.
Replace separate calls to get/set the sgid and dgid from the
device address to a single call that behaves correctly depending
on the format of the device address.  And support both IPv4 and
IPv6 address formats.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Fixed RDS build by s/ib_addr_get/rdma_addr_get/  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 13:26:06 -08:00
Sean Hefty c4315d85f9 IB/addr: Store net_device type instead of translating to RDMA transport
The struct rdma_dev_addr stores net_device address information:
the source device address, destination hardware address, and
broadcast address.  For consistency, store the net_device type
rather than converting it to the rdma_node_type.

The type indicates the format of the various hardware addresses,
which is what we're concerned with, and not the RDMA node type
that the address may map to.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:57:18 -08:00
Sean Hefty d2e0886245 IB/addr: Verify source and destination address families match
If a source address is provided, verify that the address family matches
that of the destination address.  If the source is not specified, use the
same address family as the destination.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:55:22 -08:00
Sean Hefty 6266ed6e41 RDMA/cma: Replace net_device pointer with index
Provide the device interface when resolving route information to
ensure that the correct outbound device is used.  This will also
simplify processing of sin6_scope_id for IPv6 support.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthrope@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:55:22 -08:00
Jason Gunthorpe e2e626972e RDMA/cma: Fix AF_INET6 support in multicast joining
If joining to an AF_INET6 address, we need to map the address to a MGID
in the same way as the IP stack.  The old code would just fall through to
the IPv4 case and generate garbage.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:55:22 -08:00
Jason Gunthorpe 1c9b281997 RDMA/cma: Correct detection of SA Created MGID
RDMA CM treats AF_INET6 addresses that are either 0 or prefixed with
FF1x:A01B::/32 as MGIDs, but the detection for the prefix was buggy;
fix it up.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:55:21 -08:00
Eric Dumazet 0f9ea5d2ab RDMA/addr: Use appropriate locking with for_each_netdev()
for_each_netdev() should be used with RTNL or dev_base_lock held,
or else we risk a crash.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-18 14:24:34 -08:00
Sean Hefty a7ca1f00ed RDMA/ucma: Add option to manually set IB path
Export rdma_set_ib_paths to user space to allow applications to
manually set the IB path used for connections.  This allows
alternative ways for a user space application or library to obtain
path record information, including retrieving path information
from cached data, avoiding direct interaction with the IB SA.
The IB SA is a single, centralized entity that can limit scaling
on large clusters running MPI applications.

Future changes to the rdma cm can expand on this framework to
support the full range of features allowed by the IB CM, such as
separate forward and reverse paths and APM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-16 09:30:33 -08:00
Alexey Dobriyan d43c36dc6b headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-10-11 11:20:58 -07:00
David J. Wilder 85f20b39fd RDMA/addr: Fix resolution of local IPv6 addresses
This patch allows a local IPv6 address to be resolved by rdma_cm.

To reproduce the problem:

 $ rping -s -v -a ::0  &
 $ rping -c -v -a <IPv6 address local to this system>
 rdma_resolve_addr error -1

Local IPv6 address was obtained with "ip addr show ib0"

Addresses: https://bugs.openfabrics.org/show_bug.cgi?id=1759
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-10-07 16:03:18 -07:00
Steve Wise 54e05f15cc RDMA/iwcm: Don't call provider reject func with irqs disabled
In commit cb58160e ("RDMA/iwcm: Reject the connection when the cm_id
is destroyed") a call to the provider's reject handler was added to
destroy_cm_id() to fix a provider endpoint leak.  This call needs to
be done with interrupts enabled.  So unlock and relock around this
call.  This is safe because:

1) the provider will do nothing with this endpoint until the iwcm either
   accepts or rejects.
2) the lock is only released after the iwcm state is changed, so an
   errant iwcm app that is destroying -and- rejecting the connection
   concurrently will get a failure on one of the calls.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-10-07 15:38:12 -07:00
Alexey Dobriyan a99bbaf5ee headers: remove sched.h from poll.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-04 15:05:10 -07:00
Roland Dreier 0e442afd92 IB/mad: Fix lock-lock-timer deadlock in RMPP code
Holding agent->lock across cancel_delayed_work() (which does
del_timer_sync()) in ib_cancel_rmpp_recvs() leads to lockdep reports of
possible lock-timer deadlocks if a consumer ever does something that
connects agent->lock to a lock taken in IRQ context (cf
http://marc.info/?l=linux-rdma&m=125243699026045).

Fix this by changing the list items to a new state "CANCELING" while
holding the lock, and then canceling the delayed work without holding
the lock.  If the delayed work runs after the lock is dropped, it will
see the state is CANCELING and return immediately, so the list will
stay stable while we traverse it with the lock not held.

Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-23 11:10:15 -07:00
Roland Dreier 73f526da02 Merge branch 'mad' into for-linus
Conflicts:
	drivers/infiniband/core/mad.c
2009-09-10 21:19:45 -07:00
Steve Wise cb58160e72 RDMA/iwcm: Reject the connection when the cm_id is destroyed
If the cm_id of a connect request is destroyed prior to the ULP
accepting or rejecting the connection, then the provider never cleans
up the connection.  The iwcm should explicitly reject these
connections if the cm_id is destroyed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-09 11:37:38 -07:00
Hal Rosenstock b76aabc395 IB/mad: Allow tuning of QP0 and QP1 sizes
MADs are UD and can be dropped if there are no receives posted, so
allow receive queue size to be set with a module parameter in case the
queue needs to be lengthened.  Send side tuning is done for symmetry
with receive.

Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-07 08:28:48 -07:00
Roland Dreier 6b2eef8fd7 IB/mad: Fix possible lock-lock-timer deadlock
Lockdep reported a possible deadlock with cm_id_priv->lock,
mad_agent_priv->lock and mad_agent_priv->timed_work.timer; this
happens because the mad module does

	cancel_delayed_work(&mad_agent_priv->timed_work);

while holding mad_agent_priv->lock.  cancel_delayed_work() internally
does del_timer_sync(&mad_agent_priv->timed_work.timer).

This can turn into a deadlock because mad_agent_priv->lock is taken
inside cm_id_priv->lock, so we can get the following set of contexts
that deadlock each other:

 A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
 B: holding mad_agent_priv->lock, waiting for del_timer_sync()
 C: interrupt during mad_agent_priv->timed_work.timer that takes
    cm_id_priv->lock

Fix this by using the new __cancel_delayed_work() interface (which
internally does del_timer() instead of del_timer_sync()) in all the
places where we are holding a lock.

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-07 08:27:50 -07:00
Jack Morgenstein b1b8afb833 IB/uverbs: Return ENOSYS for unimplemented commands (not EINVAL)
Since the original commit 883a99c7 ("[IB] uverbs: Add a mask of device
methods allowed for userspace"), the uverbs core returns EINVAL for
commands not implemented by a specific low-level driver.

This creates a problem that there is no way to tell the difference
between an unimplemented command and an implemented one which is
incorrectly invoked (which also returns EINVAL).

The fix is to have unimplemented commands return ENOSYS.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:24 -07:00
Yossi Etigin e1d7806df3 IB/core: Fix send multicast group leave retry
Until now, retries were only sent when joining a multicast group. This
patch will adds retries when leaving a multicast group as well.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:24 -07:00
Roland Dreier 6276e08a9b IB: Use DEFINE_SPINLOCK() for static spinlocks
Rather than just defining static spinlock_t variables and then
initializing them later in init functions, simply define them with
DEFINE_SPINLOCK() and remove the calls to spin_lock_init().  This cleans
up the source a tad and also shrinks the compiled code; eg on x86-64:

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
function                                     old     new   delta
ib_uverbs_init                               336     326     -10
ib_mad_init_module                           147     137     -10
ib_sa_init                                   123     103     -20

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:23 -07:00
Roland Dreier 60f2b652f5 IB/mad: Check hop count field in directed route MAD to avoid array overflow
The hop count field in a directed route MAD is only allowed to be in the
range 0 to 63 (by spec).  Check that this really is the case to avoid
accessing outside the bounds of the hop array.

Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05 20:24:10 -07:00
Peter Huewe 716abb1fdf RDMA: Add __init/__exit macros to addr.c and cma.c
Add __init and __exit annotations to the module_init/module_exit
functions from drivers/infiniband/core/addr.c and cma.c.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-23 10:38:42 -07:00
Greg Kroah-Hartman 3f7c58a05f infiniband: remove driver_data direct access of struct device
In the near future, the driver core is going to not allow direct access
to the driver_data pointer in struct device.  Instead, the functions
dev_get_drvdata() and dev_set_drvdata() should be used.  These functions
have been around since the beginning, so are backwards compatible with
all older kernel versions.


Cc: general@lists.openfabrics.org
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15 21:30:26 -07:00
Yossi Etigin d2ca39f262 RDMA/cma: Create cm id even when IB port is down
When doing rdma_resolve_addr(), if the relevant IB port is down, the
function fails and the cm_id is not bound to the correct device.
Therefore, application does not have a device handle and cannot wait
for the port to become active.  The function fails because the
underlying IPoIB interface is not joined to the broadcast group and
therefore the SA does not have a multicast record to take a Q_Key
from.

The fix is to use lazy Q_Key resolution - cma_set_qkey() will set
id_priv->qkey if it was not set, and will be called just before the
Q_Key is really required.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-04-08 13:42:33 -07:00
Yossi Etigin 84adeee9aa RDMA/cma: Use rate from IPoIB broadcast when joining IPoIB multicast groups
When joining an IPoIB multicast group, use the same rate as in the
broadcast group.  Otherwise, if the RDMA CM creates this group before
IPoIB does, it might get a different rate.  This will cause IPoIB to
fail joining to the same group later on, because IPoIB uses strict
rate selection.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-04-01 13:55:32 -07:00
Roland Dreier 09f98bafea Merge branches 'cxgb3', 'endian', 'ipath', 'ipoib', 'iser', 'mad', 'misc', 'mlx4', 'mthca', 'nes' and 'sysfs' into for-next 2009-03-24 20:44:41 -07:00
Roland Dreier 6432f36684 IB: Remove useless ibdev_is_alive() tests from sysfs code
Some attribute show functions test ibdev_is_alive() to make sure that
it's OK to access device state.  However, the sysfs attributes will
not be registered until the device is fully initialized, and they'll
be unregistered before anything is torn down, so ibdev_is_alive()
doesn't do anything useful.  Remove it.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-03-04 15:22:39 -08:00
Jack Morgenstein 6b708b3dde IB/sa_query: Fix AH leak due to update_sm_ah() race
Our testing uncovered a race condition in ib_sa_event():

	spin_lock_irqsave(&port->ah_lock, flags);
	if (port->sm_ah)
		kref_put(&port->sm_ah->ref, free_sm_ah);
	port->sm_ah = NULL;
	spin_unlock_irqrestore(&port->ah_lock, flags);

	schedule_work(&sa_dev->port[event->element.port_num -
				    sa_dev->start_port].update_task);

If two events occur back-to-back (e.g., client-reregister and LID
change), both may pass the spinlock-protected code above before the
scheduled work updates the port->sm_ah handle.  Then if the scheduled
work ends up running twice, the second operation will then find a
non-NULL port->sm_ah, and will simply overwrite it in update_sm_ah --
resulting in an AH leak.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-03-03 14:30:01 -08:00
Ralph Campbell 4780c1953f IB/mad: Fix ib_post_send_mad() returning 0 with no generate send comp
If ib_post_send_mad() returns 0, the API guarantees that there will be
a callback to send_buf->mad_agent->send_handler() so that the sender
can call ib_free_send_mad().  Otherwise, the ib_mad_send_buf will be
leaked and the mad_agent reference count will never go to zero and the
IB device module cannot be unloaded.  The above can happen without
this patch if process_mad() returns (IB_MAD_RESULT_SUCCESS |
IB_MAD_RESULT_CONSUMED).

If process_mad() returns IB_MAD_RESULT_SUCCESS and there is no agent
registered to receive the mad being sent, handle_outgoing_dr_smp()
returns zero which causes a MAD packet which is at the end of the
directed route to be incorrectly sent on the wire but doesn't cause a
hang since the HCA generates a send completion.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-03-03 14:22:17 -08:00
Ralph Campbell d9620a4c82 IB/mad: initialize mad_agent_priv before putting on lists
There is a potential race in ib_register_mad_agent() where the struct
ib_mad_agent_private is not fully initialized before it is added to
the list of agents per IB port. This means the ib_mad_agent_private
could be seen before the refcount, spin locks, and linked lists are
initialized.  The fix is to initialize the structure earlier.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-02-27 14:44:32 -08:00
Ralph Campbell 1d9bc6d648 IB/mad: Fix null pointer dereference in local_completions()
handle_outgoing_dr_smp() can queue a struct ib_mad_local_private
*local on the mad_agent_priv->local_work work queue with
local->mad_priv == NULL if device->process_mad() returns
IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY and
(!ib_response_mad(&mad_priv->mad.mad) ||
!mad_agent_priv->agent.recv_handler).

In this case, local_completions() will be called with local->mad_priv
== NULL. The code does check for this case and skips calling
recv_mad_agent->agent.recv_handler() but recv == 0 so
kmem_cache_free() is called with a NULL pointer.

Also, since recv isn't reinitialized each time through the loop, it
can cause a memory leak if recv should have been zero.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
2009-02-27 10:34:30 -08:00
Roland Dreier 9206dff157 IB: Remove sysfs files before unregistering device
Move the ib_device_unregister_sysfs() call from ib_dealloc_device() to
ib_unregister_device().  The old code allows device unregister to
proceed even if some sysfs files are open, which leaves a window where
userspace can open a file before a device is removed but then end up
reading the file after the device is removed, which leads to various
kernel crashes either because the device data structure is freed or
because the low-level driver code is gone after module removal.

By not returning from ib_unregister_device() until after all sysfs
entries are removed, we make sure that data structures and/or module
code is not freed until after all sysfs access is done.

Reported-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-02-25 13:27:46 -08:00
Harvey Harrison 9c3da09917 IB: Remove __constant_{endian} uses
The base versions handle constant folding just fine, use them
directly.  The replacements are OK in the include/ files as they are
not exported to userspace so we don't need the __ prefixed versions.

This patch does not affect code generation at all.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-17 17:11:57 -08:00
Kay Sievers d927e38c6c infiniband: struct device - replace bus_id with dev_name(), dev_set_name()
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06 10:44:39 -08:00
Roland Dreier 2c4ab6243f RDMA/addr: Fix build breakage when IPv6 is disabled
Commit 38617c64 ("RDMA/addr: Add support for translating IPv6
addresses") broke the build when CONFIG_IPV6=n, because the ib_addr
module unconditionally attempted to call ipv6_chk_addr() and other
IPv6 functions that are not defined when IPv6 is disabled.  Fix this
by only building IPv6 support if CONFIG_IPV6 is turned on, and
add a Kconfig dependency to prevent the ib_addr code from being built
in when IPv6 is built modular.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-12-29 23:37:14 -08:00
Linus Torvalds 0191b625ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
  net: Allow dependancies of FDDI & Tokenring to be modular.
  igb: Fix build warning when DCA is disabled.
  net: Fix warning fallout from recent NAPI interface changes.
  gro: Fix potential use after free
  sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
  sfc: When disabling the NIC, close the device rather than unregistering it
  sfc: SFT9001: Add cable diagnostics
  sfc: Add support for multiple PHY self-tests
  sfc: Merge top-level functions for self-tests
  sfc: Clean up PHY mode management in loopback self-test
  sfc: Fix unreliable link detection in some loopback modes
  sfc: Generate unique names for per-NIC workqueues
  802.3ad: use standard ethhdr instead of ad_header
  802.3ad: generalize out mac address initializer
  802.3ad: initialize ports LACPDU from const initializer
  802.3ad: remove typedef around ad_system
  802.3ad: turn ports is_individual into a bool
  802.3ad: turn ports is_enabled into a bool
  802.3ad: make ntt bool
  ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
  ...

Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
2008-12-28 12:49:40 -08:00
Aleksey Senin 1f5175adea RDMA/cma: Add IPv6 support
Handle AF_INET6 cases where required, and use struct sockaddr_storage
wherever an IPv6 address might be stored.

Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-12-24 10:16:45 -08:00
Aleksey Senin 38617c64bf RDMA/addr: Add support for translating IPv6 addresses
Add support for translating AF_INET6 addresses to the IB address
translation service.  This requires using struct sockaddr_storage
instead of struct sockaddr wherever an IPv6 address might be stored,
and adding cases to handle IPv6 in addition to IPv4 to the various
translation functions.

Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-12-24 10:16:37 -08:00
David S. Miller 9eeda9abd1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/ath5k/base.c
	net/8021q/vlan_core.c
2008-11-06 22:43:03 -08:00
Al Viro 233e70f422 saner FASYNC handling on file close
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.

So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set.  And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-01 09:49:46 -07:00
Harvey Harrison 5b095d9892 net: replace %p6 with %pI6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 12:52:50 -07:00
Harvey Harrison 8867cd7c86 infiniband: use %p6 for printing message ids
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 23:02:35 -07:00
Linus Torvalds 724bdd097e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/ehca: Reject dynamic memory add/remove when ehca adapter is present
  IB/ehca: Fix reported max number of QPs and CQs in systems with >1 adapter
  IPoIB: Set netdev offload features properly for child (VLAN) interfaces
  IPoIB: Clean up ethtool support
  mlx4_core: Add Ethernet PCI device IDs
  mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC
  mlx4_core: Multiple port type support
  mlx4_core: Ethernet MAC/VLAN management
  mlx4_core: Get ethernet MTU and default address from firmware
  mlx4_core: Support multiple pre-reserved QP regions
  Update NetEffect maintainer emails to Intel emails
  RDMA/cxgb3: Remove cmid reference on tid allocation failures
  IB/mad: Use krealloc() to resize snoop table
  IPoIB: Always initialize poll_timer to avoid crash on unload
  IB/ehca: Don't allow creating UC QP with SRQ
  mlx4_core: Add QP range reservation support
  RDMA/ucma: Test ucma_alloc_multicast() return against NULL, not with IS_ERR()
2008-10-23 08:16:03 -07:00
Roland Dreier 56f2fdaade Merge branches 'cma', 'cxgb3', 'ehca', 'ipoib', 'mad', 'mlx4' and 'nes' into for-next 2008-10-22 15:56:41 -07:00
Parag Warudkar 01e8ef11bc x86: sysfs: kill owner field from attribute
Tejun's commit 7b595756ec made sysfs
attribute->owner unnecessary.  But the field was left in the structure to
ease the merge.  It's been over a year since that change and it is now
time to start killing attribute->owner along with its users - one arch at
a time!

This patch is attempt #1 to get rid of attribute->owner only for
CONFIG_X86_64 or CONFIG_X86_32 .  We will deal with other arches later on
as and when possible - avr32 will be the next since that is something I
can test.  Compile (make allyesconfig / make allmodconfig / custom config)
and boot tested.

akpm: the idea is that we put the declaration of sttribute.owner inside
`#ifndef CONFIG_X86'.  But that proved to be too ambitious for now because
new usages kept on turning up in subsystem trees.

[akpm: remove the ifdef for now]
Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:42 -07:00
Greg Kroah-Hartman 91bd418fdc device create: infiniband: convert device_create_drvdata to device_create
Now that device_create() has been audited, rename things back to the
original call to be sane.

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16 09:24:42 -07:00
Roland Dreier 528051746b IB/mad: Use krealloc() to resize snoop table
Use krealloc() instead of kmalloc() followed by memcpy() when resizing
the MAD module's snoop table.

Also put parentheses around the new table size to avoid calculating
the wrong size to allocate, which fixes a bug pointed out by Haven
Hash <haven.hash@isilon.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-14 14:05:36 -07:00
Julien Brunel 6aea938f54 RDMA/ucma: Test ucma_alloc_multicast() return against NULL, not with IS_ERR()
In case of error, the function ucma_alloc_multicast() returns a NULL
pointer, but never returns an ERR pointer.  So after a call to this
function, an IS_ERR test should be replaced by a NULL test.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@match bad_is_err_test@
expression x, E;
@@

x = ucma_alloc_multicast(...)
... when != x = E
IS_ERR(x)
// </smpl>

Signed-off-by:  Julien Brunel <brunel@diku.dk>
Signed-off-by:  Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-10 12:00:19 -07:00
Roland Dreier eedd5d0a70 Merge branches 'cma', 'cxgb3', 'ehca', 'ipath', 'ipoib', 'mad', 'misc', 'mlx4', 'mthca' and 'nes' into for-next 2008-10-09 17:41:15 -07:00
Hefty, Sean a7e80ce26c IB/cm: Correctly free cm_device structure
commit 110cf374 ("infiniband: make cm_device use a struct device and
not a kobject.") introduced a memory leak, since it deleted
cm_release_dev_obj(), which was where cm_dev was freed.  Fix this by
freeing the leaked structure after calling device_unregister().

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-30 10:36:54 -07:00
Michael Brooks 7097228c54 IB/mad: Don't discard BMA responses in kernel
This fixes the problem of incoming BMA responses being dropped due to
a bad "is response" check.  Fix the test to use the ib_response_mad()
predicate, which correctly handles BMA MADs.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=988>.

Signed-off-by: Michael Brooks <michael.brooks@qlogic.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-20 20:06:16 -07:00
Roland Dreier 06a91a02e9 Merge branches 'cma', 'cxgb3', 'ipath', 'ipoib', 'mad' and 'mlx4' into for-linus 2008-08-07 14:12:03 -07:00
Julien Brunel cd55ef5a10 IB/mad: Test ib_create_send_mad() return with IS_ERR(), not == NULL
In case of error, the function ib_create_send_mad() returns an ERR
pointer, but never returns a NULL pointer.  So testing the return
value for error should be done with IS_ERR, not by comparing with
NULL.

A simplified version of the semantic patch that makes this change is
as follows:

(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@correct_null_test@
expression x,E;
statement S1, S2;
@@
x = ib_create_send_mad(...)
<... when != x = E
if (
(
- x@p2 != NULL
+ ! IS_ERR ( x )
|
- x@p2 == NULL
+ IS_ERR( x )
)
 )
S1
else S2
...>
? x = E;
// </smpl>

Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-07 14:11:56 -07:00
Roland Dreier 3f44675439 RDMA/cma: Remove padding arrays by using struct sockaddr_storage
There are a few places where the RDMA CM code handles IPv6 by doing

	struct sockaddr		addr;
	u8			pad[sizeof(struct sockaddr_in6) -
				    sizeof(struct sockaddr)];

This is fragile and ugly; handle this in a better way with just

	struct sockaddr_storage	addr;

[ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
  switch to struct sockaddr_storage and get rid of padding arrays in
  struct rdma_addr. ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-04 11:02:14 -07:00
Roland Dreier 5ba18b186c RDMA/ucm: BKL is not needed for ib_ucm_open()
Remove explicit cycle_kernel_lock() call and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:36:59 -07:00
Roland Dreier f7a6117ee5 RDMA/ucma: BKL is not needed for ucma_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-24 20:36:59 -07:00
Linus Torvalds 5c402355ad Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  MAINTAINERS: Remove Glenn Streiff from NetEffect entry
  mlx4_core: Improve error message when not enough UAR pages are available
  IB/mlx4: Add support for memory management extensions and local DMA L_Key
  IB/mthca: Keep free count for MTT buddy allocator
  mlx4_core: Keep free count for MTT buddy allocator
  mlx4_code: Add missing FW status return code
  IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
  mlx4_core: Add module parameter to enable QoS support
  RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
  IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
  IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
  IB/ehca: Release mutex in error path of alloc_small_queue_page()
  IB/ehca: Use default value for Local CA ACK Delay if FW returns 0
  IB/ehca: Filter PATH_MIG events if QP was never armed
  IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
  RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
  RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
2008-07-24 12:56:07 -07:00
Roland Dreier 2cc177364e Merge branches 'bkl-removal', 'cma', 'ehca', 'for-2.6.27', 'mlx4', 'mthca' and 'nes' into for-linus 2008-07-24 08:38:47 -07:00
Dotan Barak 1ca8d15619 RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
Remove IB_ACCESS_LOCAL_WRITE from qp.qp_access_flags because this
attribute is only used to set remote permissions.

Signed-off-by: Dotan Barak <dotanba@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:34 -07:00
Ralph Campbell 64b784b583 IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
If update_sm_ah() fails, it leaves the port's sm_ah as NULL.  Then if
the device or module is removed, ib_sa_remove_one() will dereference a
NULL pointer when it calls kref_put().  Fix this by testing if sm_ah
is NULL before dropping the reference.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:33 -07:00
Amir Vadai 38ca83a588 RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
Consumers that want to re-use their QPs in new connections need to
know when the QP has exited the timewait state.  Report the timewait
event through the rdma_cm.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:14:23 -07:00
Or Gerlitz dd5bdff83b RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
Add an RDMA_CM_EVENT_ADDR_CHANGE event can be used by rdma-cm
consumers that wish to have their RDMA sessions always use the same
links (eg <hca/port>) as the IP stack does.  In the current code, this
does not happen when bonding is used and fail-over happened but the IB
link used by an already existing session is operating fine.

Use the netevent notification for sensing that a change has happened
in the IP stack, then scan the rdma-cm ID list to see if there is an
ID that is "misaligned" with respect to the IP stack, and deliver
RDMA_CM_EVENT_ADDR_CHANGE for this ID.  The consumer can act on the
event or just ignore it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:14:22 -07:00
Greg Kroah-Hartman 110cf374a8 infiniband: make cm_device use a struct device and not a kobject.
This object really should be a struct device, or at least contain a
pointer to a struct device, as it is trying to create a separate device
tree outside of the main device tree.  This patch fixes this problem.

It is needed for the class core rework that is being done in the driver
core.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:49 -07:00
Greg Kroah-Hartman d4c4196f24 infiniband: rename "device" to "ib_device" in cm_device
This pointer really is a struct ib_device, not a struct device, so name
it properly to help prevent confusion.

This makes the followon patch in this series much smaller and easier to
understand as well.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:49 -07:00
Or Gerlitz de910bd921 RDMA/cma: Simplify locking needed for serialization of callbacks
The RDMA CM has some logic in place to make sure that callbacks on a
given CM ID are delivered to the consumer in a serialized manner.
Specifically it has code to protect against a device removal racing
with a running callback function.

This patch simplifies this logic by using a mutex per ID instead of a
wait queue and atomic variable.  This means that cma_disable_remove()
now is more properly named to cma_disable_callback(), and
cma_enable_remove() can now be removed because it just would become a
trivial wrapper around mutex_unlock().

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Or Gerlitz 64c5e613b9 RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr
Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr().  Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.

In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00
Steve Wise 7f624d023b RDMA/core: Add iWARP protocol statistics attributes in sysfs
This patch adds a sysfs attribute group called "proto_stats" under
/sys/class/infiniband/$device/ and populates this group with protocol
statistics if they exist for a given device.  Currently, only iWARP
stats are defined, but the code is designed to allow InfiniBand
protocol stats if they become available.  These stats are per-device
and more importantly -not- per port.

Details:

- Add union rdma_protocol_stats in ib_verbs.h.  This union allows
  defining transport-specific stats.  Currently only iwarp stats are
  defined.

- Add struct iw_protocol_stats to define the current set of iwarp
  protocol stats.

- Add new ib_device method called get_proto_stats() to return protocol
  statistics.

- Add logic in core/sysfs.c to create iwarp protocol stats attributes
  if the device is an RNIC and has a get_proto_stats() method.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Roland Dreier 468f2239bc RDMA/cma: Add missing newlines to printk()s
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2008-07-14 23:48:47 -07:00
Ralph Campbell e5a5e7d59a IB/core: Reset to error QP state transition is not allowed
I was reviewing the QP state transition diagram in the IB 1.2.1 spec
and the code for qp_state_table[], and noticed that the code allows a
QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
for figure 124 (pg 457) specifically says that this transition isn't
allowed.  This is a clarification from earlier versions of the IB
spec, which were ambiguous in this area and suggested that the RESET
to ERR transition was allowed.

Fix up the qp_state_table[] to make RESET->ERR not allowed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:46 -07:00
Steve Wise 00f7ec36c9 RDMA/core: Add memory management extensions support
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement).  The new operations are:

 - Allocate an ib_mr for use in fast register work requests.

 - Allocate/free a physical buffer lists for use in fast register work
   requests.  This allows device drivers to allocate this memory as
   needed for use in posting send requests (eg via dma_alloc_coherent).

 - New send queue work requests:
   * send with remote invalidate
   * fast register memory region
   * local invalidate memory region
   * RDMA read with invalidate local memory region (iWARP only)

Consumer interface details:

 - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
   to indicate device support for these features.

 - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
   IB_WR_RDMA_READ_WITH_INV are added.

 - A new consumer API function, ib_alloc_mr() is added to allocate
   fast register memory regions.

 - New consumer API functions, ib_alloc_fast_reg_page_list() and
   ib_free_fast_reg_page_list() are added to allocate and free
   device-specific memory for fast registration page lists.

 - A new consumer API function, ib_update_fast_reg_key(), is added to
   allow the key portion of the R_Key and L_Key of a fast registration
   MR to be updated.  Consumers call this if desired before posting
   a IB_WR_FAST_REG_MR work request.

Consumers can use this as follows:

 - MR is allocated with ib_alloc_mr().

 - Page list memory is allocated with ib_alloc_fast_reg_page_list().

 - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().

 - MR made VALID and bound to a specific page list via
   ib_post_send(IB_WR_FAST_REG_MR)

 - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
   ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
   invalidate operation.

 - MR is deallocated with ib_dereg_mr()

 - page lists dealloced via ib_free_fast_reg_page_list().

Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ).  For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.

The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key.  The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Roland Dreier f3781d2e89 RDMA: Remove subversion $Id tags
They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Moni Shoua 164ba0893c IB/sa: Fail requests made while creating new SM AH
This patch solves a race that occurs after an event occurs that causes
the SA query module to flush its SM address handle (AH).  When SM AH
becomes invalid and needs an update it is handled by the global
workqueue.  On the other hand this event is also handled in the IPoIB
driver by queuing work in the ipoib_workqueue that does multicast
joins.  Although queuing is in the right order, it is done to 2
different workqueues and so there is no guarantee that the first to be
queued is the first to be executed.

This causes a problem because IPoIB may end up sending an request to
the old SM, which will take a long time to time out (since the old SM
is gone); this leads to a much longer than necessary interruption in
multicast traffer.

The patch sets the SA query module's SM AH to NULL when the event
occurs, and until update_sm_ah() is done, any request that needs sm_ah
fails with -EAGAIN return status.

For consumers, the patch doesn't make things worse.  Before the patch,
MADs are sent to the wrong SM so the request gets lost.  Consumers can
be improved if they examine the return code and respond to EAGAIN
properly but even without an improvement the situation is not getting
worse.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Sean Hefty a947491709 RDMA: Fix license text
The license text for several files references a third software license
that was inadvertently copied in.  Update the license to what was
intended.  This update was based on a request from HP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Jonathan Corbet 2fceef397f Merge commit 'v2.6.26' into bkl-removal 2008-07-14 15:29:34 -06:00
Roland Dreier feae1ef116 IB/umad: BKL is not needed for ib_umad_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-11 16:40:58 -06:00
Roland Dreier 5b2d281acb IB/uverbs: BKL is not needed for ib_uverbs_open()
Remove explicit lock_kernel() calls and document why the code is safe.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-04 10:32:28 -06:00
Arnd Bergmann 6b0ee363b2 infiniband-ucma: BKL pushdown
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-06-20 14:05:57 -06:00
Jonathan Corbet f2b9857eee Add a bunch of cycle_kernel_lock() calls
All of the open() functions which don't need the BKL on their face may
still depend on its acquisition to serialize opens against driver
initialization.  So make those functions acquire then release the BKL to be
on the safe side.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:53 -06:00
Jonathan Corbet 057e7c7ff9 infiniband: more BKL pushdown
Be extra-cautious and protect the remaining open() functions.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:51 -06:00
Jonathan Corbet d21c95c569 Add "no BKL needed" comments to several drivers
This documents the fact that somebody looked at the relevant open()
functions and concluded that, due to their trivial nature, no locking was
needed.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-20 14:05:50 -06:00
Jack Morgenstein fb77bcef9f IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler()
Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
files") changed the way that closed files are handled in the uverbs
code.  However, after the conversion, is_closed flag is checked
incorrectly in ib_uverbs_async_handler().  As a result, no async
events are ever passed to applications.

Found by: Ronni Zimmerman <ronniz@mellanox.co.il>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-18 15:36:38 -07:00
Roland Dreier 8079ffa0e1 IB/umem: Avoid sign problems when demoting npages to integer
On a 64-bit architecture, if ib_umem_get() is called with a size value
that is so big that npages is negative when cast to int, then the
length of the page list passed to get_user_pages(), namely

	min_t(int, npages, PAGE_SIZE / sizeof (struct page *))

will be negative, and get_user_pages() will immediately return 0 (at
least since 900cf086, "Be more robust about bad arguments in
get_user_pages()").  This leads to an infinite loop in ib_umem_get(),
since the code boils down to:

	while (npages) {
		ret = get_user_pages(...);
		npages -= ret;
	}

Fix this by taking the minimum as unsigned longs, so that the value of
npages is never truncated.

The impact of this bug isn't too severe, since the value of npages is
checked against RLIMIT_MEMLOCK, so a process would need to have an
astronomical limit or have CAP_IPC_LOCK to be able to trigger this,
and such a process could already cause lots of mischief.  But it does
let buggy userspace code cause a kernel lock-up; for example I hit
this with code that passes a negative value into a memory registartion
function where it is promoted to a huge u64 value.

Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-06 21:38:37 -07:00
Linus Torvalds c2448278e3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
  IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()
  MAINTAINERS: Add cxgb3 and iw_cxgb3 NIC and iWARP driver entries
  IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
  IB/mthca: Fix max_sge value returned by query_device
  RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()
  IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()
  IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate
  IB/ipath: Fix printk format for ipath_sdma_status
2008-05-23 11:11:44 -07:00
Dave Olson 5a4f2b6752 IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
If a low-level driver returns IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED,
handle_outgoing_dr_smp() doesn't clean up properly.  The fix is to
kfree the local data and break, rather than falling through.  This was
observed with the ipath driver, but could happen with any driver.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1027>.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-23 10:52:59 -07:00
Greg Kroah-Hartman 6c06aec248 IB: fix race in device_create
There is a race from when a device is created with device_create() and
then the drvdata is set with a call to dev_set_drvdata() in which a
sysfs file could be open, yet the drvdata will be NULL, causing all
sorts of bad things to happen.

This patch fixes the problem by using the new function,
device_create_drvdata().

Cc: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-20 13:31:55 -07:00
Arthur Kepner cb9fbc5c37 IB: expand ib_umem_get() prototype
Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Linus Torvalds e80ab411e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
  SCSI: convert struct class_device to struct device
  DRM: remove unused dev_class
  IB: rename "dev" to "srp_dev" in srp_host structure
  IB: convert struct class_device to struct device
  memstick: convert struct class_device to struct device
  driver core: replace remaining __FUNCTION__ occurrences
  sysfs: refill attribute buffer when reading from offset 0
  PM: Remove destroy_suspended_device()
  Firmware: add iSCSI iBFT Support
  PM: Remove legacy PM (fix)
  Kobject: Replace list_for_each() with list_for_each_entry().
  SYSFS: Explicitly include required header file slab.h.
  Driver core: make device_is_registered() work for class devices
  PM: Convert wakeup flag accessors to inline functions
  PM: Make wakeup flags available whenever CONFIG_PM is set
  PM: Fix misuse of wakeup flag accessors in serial core
  Driver core: Call device_pm_add() after bus_add_device() in device_add()
  PM: Handle device registrations during suspend/resume
  block: send disk "change" event for rescan_partitions()
  sysdev: detect multiple driver registrations
  ...

Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
2008-04-21 15:49:58 -07:00
Tony Jones f4e91eb4a8 IB: convert struct class_device to struct device
This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Matthew Wilcox 6188e10d38 Convert asm/semaphore.h users to linux/semaphore.h
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-18 22:22:54 -04:00
Eli Cohen 2dd5716227 IB/core: Add support for modify CQ
Add support for modifying CQ parameters for controlling event
generation moderation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Roland Dreier 0f39cf3d54 IB/core: Add support for "send with invalidate" work requests
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions.  Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated.  Add this new union to struct
ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().

Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.

Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit.  The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing).  Remove the flag from
all existing drivers that set it until we know which ones are OK.

The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.

This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Dotan Barak 7ce5eacb45 IB/core: Check optional verbs before using them
Make sure that a device implements the modify_srq and reg_phys_mr
optional methods before calling them.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:28 -07:00
Eli Cohen b846f25aa2 IB/core: Add creation flags to struct ib_qp_init_attr
Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to create a pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO.  The create_flags member will also be useful for XRC
and ehca low-latency QP support.

Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Robert P. J. Day 157de22946 IB: Use shorter list_splice_init() for brevity
Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
Julia Lawall 10f32065a2 RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0
The function rdma_create_id() always returns either a valid pointer or
a value made with ERR_PTR, so its result should be tested with IS_ERR,
not with a test for 0.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@

E = rdma_create_id(...)
... when != E = E1
if@p (E) S else S1

@n@
position a.p;
expression E,E1;
statement S,S1;
@@

E = NULL
... when != E = E1
if@p (E) S else S1

@depends on !n@
expression E;
statement S,S1;
position a.p;
@@

* if@p (E)
  S else S1
//</smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:25 -07:00
Roland Dreier a7dab9e887 IB/uverbs: Use alloc_file() instead of get_empty_filp()
Christoph Hellwig wants to unexport get_empty_filp(), which is an ugly
internal interface.  Change the modular user in ib_uverbs_alloc_event_file()
to use the better alloc_file() interface; this makes the code cleaner too.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier 1ae5c187ac IB/uverbs: Don't store struct file * for event files
The file member of struct ib_uverbs_event_file was only used to keep
track of whether the file had been closed or not.  The only thing we
ever did with the value was check if it was NULL or not.  Simplify the
code and get rid of the need to keep track of the struct file * we
allocate by replacing the file member with an is_closed member.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:08 -07:00
Roland Dreier 9cda779cc2 RDMA/ucma: Endian annotation
Add __force cast of node_guid to __u64, since we are sticking it into a
structure whose definition is shared with userspace.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:07 -07:00
Roland Dreier a88f488857 IB/cm: Endianness annotations
Mostly update the RB tree comparisons to force __be types to normal
integers, but the change to cm_format_sidr_req() is a real fix:
param->path->pkey is already __be16.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2008-04-16 21:01:07 -07:00
Al Viro 1b90c137cc trivial endianness annotations: infiniband core
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30 14:20:24 -07:00
Steve Wise d7c1fbd660 RDMA/iwcm: Don't access a cm_id after dropping reference
cm_work_handler() can access cm_id_priv after it drops its reference
by calling iwch_deref_id(), which might cause it to be freed.  The fix
is to look at whether IWCM_F_CALLBACK_DESTROY is set _before_ dropping
the reference.  Then if it was set, free the cm_id on this thread.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-10 21:22:22 -07:00
Pete Wyckoff 331552925d IB/fmr_pool: Flush all dirty FMRs from ib_fmr_pool_flush()
Commit a3cd7d90 ("IB/fmr_pool: ib_fmr_pool_flush() should flush all
dirty FMRs") caused a regression for iSER and was reverted in
e5507736.

This change attempts to redo the original patch so that all used FMR
entries are flushed when ib_flush_fmr_pool() is called without
affecting the normal FMR pool cleaning thread.  Simply move used
entries from the clean list onto the dirty list in ib_flush_fmr_pool()
before letting the cleanup thread do its job.

Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:31:48 -08:00
Pete Wyckoff 35fb5340e3 Revert "IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs"
This reverts commit a3cd7d9070.

The original commit breaks iSER reliably, making it complain:

    iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11

The FMR cleanup thread runs ib_fmr_batch_release() as dirty entries
build up.  This commit causes clean but used FMR entries also to be
purged.  During that process, another thread can see that there are no
free FMRs and fail, even though there should always have been enough
available.

Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:29:19 -08:00
Sean Hefty 84ba284cd7 IB/cm: Flush workqueue when removing device
When a CM MAD is received, it is queued to a CM workqueue for
processing.  The queued work item references the port and device on
which the MAD was received.  If that device is removed from the system
before the work item can execute, the work item will reference freed
memory.

To fix this, flush the workqueue after unregistering to receive MAD,
and before the device is be freed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-29 13:27:52 -08:00
Li Zefan c7482b81c8 IB: Fix return value in ib_device_register_sysfs()
If kobject_create_and_add() fails and returns NULL, the current code
in ib_device_register_sysfs() does not set ret and hence returns 0.
Set ret to -ENOMEM for this failure, so that the caller knows that
ib_device_register_sysfs() actually failed.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-15 15:05:05 -08:00
Sean Hefty ead595aeb0 RDMA/cma: Do not issue MRA if user rejects connection request
There's an undesirable interaction with issuing MRA requests to
increase connection timeouts and the listen backlog.

When the rdma_cm receives a connection request, it queues an MRA with
the ib_cm.  (The ib_cm will send an MRA if it receives a duplicate
REQ.)  The rdma_cm will then create a new rdma_cm_id and give that to
the user, which in this case is the rdma_user_cm.

If the listen backlog maintained in the rdma_user_cm is full, it
destroys the rdma_cm_id, which in turns destroys the ib_cm_id.  The
ib_cm_id generates a REJ because the state of the ib_cm_id has changed
to MRA sent, versus REQ received.  When the backlog is full, we just
want to drop the REQ so that it is retried later.

Fix this by deferring queuing the MRA until after the user of the
rdma_cm has examined the connection request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 15:30:41 -08:00
Roland Dreier 7c7a9bccd2 IB/cm: Fix infiniband_cm class kobject ref counting
Commit 9af57b7a ("IB/cm: Add basic performance counters") introduced a
bug in how the reference count for cm_class.subsys.kobj was handled:
the path that released a device did a kobject_put() on that kobject, but
there was no kobject_get() in the path the handles adding a device.  So
the reference count ended up too low, which leads to bad things.  Fix up
and simplify the reference counting to avoid this.

(Actually, I introduced the bug when fixing the patch up to match some
of Greg's kobject changes, but who's counting)

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:27 -08:00
Roland Dreier ab64b96067 IB/cm: Remove debug printk()s that snuck upstream
Pesky little devils, sneaking around...

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12 14:38:27 -08:00
Or Gerlitz 1d96354e61 IB/fmr_pool: Allocate page list for pool FMRs only when caching enabled
Allocate memory for the page_list field of struct ib_pool_fmr only
when caching is enabled for the FMR pool, since the field is not used
otherwise.  This can save significant amounts of memory for large
pools with caching turned off.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:44 -08:00
Sean Hefty 3971c9f6db IB/cm: Add interim support for routed paths
Paths with hop_limit > 1 indicate that the connection will be routed
between IB subnets.  Update the subnet local field in the CM REQ based
on the hop_limit value.  In addition, if the path is routed, then set
the LIDs in the REQ to the permissive LIDs.  This is used to indicate
to the passive side that it should use the LIDs in the received local
route header (LRH) associated with the REQ when programming the QP.

This is a temporary work-around to the IB CM to support IB router
development until the IB router specification is completed.  It is not
anticipated that this work-around will cause any interoperability
issues with existing stacks or future stacks that will properly
support IB routers when defined.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:42 -08:00
Denis V. Lunev f206351a50 [NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:07 -08:00
Denis V. Lunev 1ab352768f [NETNS]: Add namespace parameter to ip_dev_find.
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 15:11:04 -08:00
Joe Perches 6360a02af1 [IPV4] drivers/infiniband: Use ipv4_is_<type>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:17 -08:00
Olaf Kirch a3cd7d9070 IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs
When a FMR is released via ib_fmr_pool_unmap(), the FMR usually ends
up on the free_list rather than the dirty_list (because we allow a
certain number of remappings before actually requiring a flush).

However, ib_fmr_batch_release() only looks at dirty_list when flushing
out old mappings.  This means that when ib_fmr_pool_flush() is used to
force a flush of the FMR pool, some dirty FMRs that have not reached
their maximum remap count will not actually be flushed.

Fix this by flushing all FMRs that have been used at least once in
ib_fmr_batch_release().

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Olaf Kirch a656eb758f IB/fmr_pool: Flush serial numbers can get out of sync
Normally, the serial numbers for flush requests and flushes executed
for an FMR pool should be in sync.

However, if the FMR pool flushes dirty FMRs because the
dirty_watermark was reached, we wake up the cleanup thread and let it
do its stuff.  As a side effect, the cleanup thread increments
pool->flush_ser, which leaves it one higher than pool->req_ser.  The
next time the user calls ib_flush_fmr_pool(), the cleanup thread will
be woken up, but ib_flush_fmr_pool() won't wait for the flush to
complete because flush_ser is already past req_ser.  This means the
FMRs that the user expects to be flushed may not have all been flushed
when the function returns.

Fix this by telling the cleanup thread to do work exclusively by
incrementing req_ser, and by moving the comparison of dirty_len and
dirty_watermark into ib_fmr_pool_unmap().

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
2008-01-25 14:15:42 -08:00
Roland Dreier 2fe7e6f7c9 IB/umad: Simplify and fix locking
In addition to being overly complex, the locking in user_mad.c is
broken: there were multiple reports of deadlocks and lockdep warnings.
In particular it seems that a single thread may end up trying to take
the same rwsem for reading more than once, which is explicitly
forbidden in the comments in <linux/rwsem.h>.

To solve this, we change the locking to use plain mutexes instead of
rwsems.  There is one mutex per open file, which protects the contents
of the struct ib_umad_file, including the array of agents and list of
queued packets; and there is one mutex per struct ib_umad_port, which
protects the contents, including the list of open files.  We never
hold the file mutex across calls to functions like ib_unregister_mad_agent(),
which can call back into other ib_umad code to queue a packet, and we
always hold the port mutex as long as we need to make sure that a
device is not hot-unplugged from under us.

This even makes things nicer for users of the -rt patch, since we
remove calls to downgrade_write() (which is not implemented in -rt).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:42 -08:00
Sean Hefty 5851bb893e RDMA/cma: Override default responder_resources with user value
By default, the responder_resources parameter is set to that received
in a connection request.  The passive side may override this value
when accepting the connection.  Use the value provided by the passive
side when transitioning the QP to RTR state, rather than the value
given in the connect request.  Without this change, the RTR transition
may fail if the passive side supports fewer responder_resources than
that in the request.

For code consistency and to protect against QP destruction, restructure
overriding initiator_depth to match how responder_resources is set.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:41 -08:00
Rolf Manderscheid a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Sean Hefty 88314e4dda RDMA/cma: add support for rdma_migrate_id()
This is based on user feedback from Doug Ledford at RedHat:

Events that occur on an rdma_cm_id are reported to userspace through an
event channel.  Connection request events are reported on the event
channel associated with the listen.  When the connection is accepted, a
new rdma_cm_id is created and automatically uses the listen event
channel.  This is suboptimal where the user only wants listen events on
that channel.

Additionally, it may be desirable to have events related to connection
establishment use a different event channel than those related to
already established connections.

Allow the user to migrate an rdma_cm_id between event channels. All
pending events associated with the rdma_cm_id are moved to the new event
channel.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:32 -08:00
Vladimir Sokolovsky 45d9478da1 RDMA/cma: Reenable device removal on passive side
Enable conn_id remove on the passive side after connection
establishment.  This corrects an issue where the IB driver can't be
unloaded after running applications over RDS.  The 'dev_remove' counter
does not reach 0 for established connections on the passive side.

This problem is limited to device removal, and only occurs on the
passive side if there are established connections.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:31 -08:00
Sean Hefty b61d92d8ae IB/mad: Fix incorrect access to items on local_list
In cancel_mads(), MADs are moved from the wait_list and local_list
to a cancel_list for processing.  However, the structures on these two
lists are not the same.  The wait_list references struct
ib_mad_send_wr_private, but local_list references struct
ib_mad_local_private.  Cancel_mads() treats all items moved to the
cancel_list as struct ib_mad_send_wr_private.  This leads to a system
crash when requests are moved from the local_list to the cancel_list.

Fix this by leaving local_list alone.  All requests on the local_list
have completed are just awaiting processing by a queued worker thread.

Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>.
Problem with local_list access reported by Robert Reynolds
<rreynolds@opengridcomputing.com>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:31 -08:00
Sean Hefty 9af57b7a27 IB/cm: Add basic performance counters
Add performance/debug counters to track sent/received messages, retries,
and duplicates.  Counters are tracked per CM message type, per port.

The counters are always enabled, so intrusive state tracking is not done.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:30 -08:00
Sean Hefty 4fc8cd4919 IB/mad: Report number of times a mad was retried
To allow ULPs to tune timeout values and capture retry statistics,
report the number of times that a mad send operation was retried.

For RMPP mads, report the total number of times that the any portion
(send window) of the send operation was retried.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:30 -08:00
Sean Hefty 547af76521 IB/multicast: Report errors on multicast groups if P_key changes
P_key changes can invalidate multicast groups.  Report errors on all
multicast groups affected by a pkey change.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:29 -08:00
Steve Welch 727792da2b IB/mad: Enable loopback of DR SMP responses from userspace
The local loopback of an outgoing DR SMP response is limited to those
that originate at the driver specific SMA implementation during the
driver specific process_mad() function.  This patch enables a
returning DR SMP originating in userspace (or elsewhere) to be
delivered to the local managment stack.  In this specific case the
driver process_mad() function does not consume or process the MAD, so
a reponse mad has not be created and the original MAD must manually be
copied to the MAD buffer that is to be handed off to the local agent.

Signed-off-by: Steve Welch <swelch@systemfabricworks.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Ralph Campbell 3828ff457a IB/mad: Remove redundant NULL pointer check in ib_mad_recv_done_handler()
In ib_mad_recv_done_handler(), the response pointer is checked for
NULL after allocating it.  It is then checked again in the local
process_mad() path but there is no possibility of it changing in
between.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Acked-by: Hal Rosenstock <hal@xsigo.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Steve Wise 8d8293cfb3 RDMA/iwcm: Set initiator depth and responder resources to device max values
Set the initiator depth and responder resources to the device max
values for new connect request events in the iWARP connection manager.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:25 -08:00
Greg Kroah-Hartman c10997f657 Kobject: convert drivers/* from kobject_unregister() to kobject_put()
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman 35be068198 Kobject: change drivers/infiniband to use kobject_init_and_add
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <mshefty@ichips.intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:26 -08:00
Linus Torvalds 53173920da Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/fmr_pool: Stop ib_fmr threads from contributing to load average
  IB/ipath: Fix incorrect use of sizeof on msg buffer (function argument)
  IB/ipath: Limit length checksummed in eeprom
  IB/ipath: Fix a race where s_last is updated without lock held
  IB/mlx4: Lock SQ lock in mlx4_ib_post_send()
  IPoIB/cm: Fix receive QP cleanup
2007-10-30 15:26:56 -07:00
Anton Blanchard 3f776e8a25 IB/fmr_pool: Stop ib_fmr threads from contributing to load average
I noticed my machine was at a constant load average of 1. This was
because ib_create_fmr_pool calls kthread_create but does not
immediately wake the thread up.

Change to using kthread_run so we enter ib_fmr_cleanup_thread(), set
TASK_INTERRUPTIBLE, then go to sleep.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-30 14:57:43 -07:00
Jens Axboe 642f149031 SG: Change sg_set_page() to take length and offset argument
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.

Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-24 11:20:47 +02:00
Linus Torvalds 0b776eb542 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
  IPoIB/cm: Use common CQ for CM send completions
  IB/uverbs: Fix checking of userspace object ownership
  IB/mlx4: Sanity check userspace send queue sizes
  IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
  IB/ehca: Enable large page MRs by default
  IB/ehca: Change meaning of hca_cap_mr_pgsize
  IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
  IB/ehca: Fix masking error in {,re}reg_phys_mr()
  IB/ehca: Supply QP token for SRQ base QPs
  IPoIB: Use round_jiffies() for ah_reap_task
  RDMA/cma: Fix deadlock destroying listen requests
  RDMA/cma: Add locking around QP accesses
  IB/mthca: Avoid alignment traps when writing doorbells
  mlx4_core: Kill mlx4_write64_raw()
2007-10-23 09:56:11 -07:00
Jens Axboe 45711f1af6 [SG] Update drivers to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:53 +02:00
Roland Dreier cbfb50e6e2 IB/uverbs: Fix checking of userspace object ownership
Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex")
rewrote how userspace objects are looked up in the uverbs module's
idrs, and introduced a severe bug in the process: there is no checking
that an operation is being performed by the right process any more.
Fix this by adding the missing check of uobj->context in __idr_get_uobj().

Apparently everyone is being very careful to only touch their own
objects, because this bug was introduced in June 2006 in 2.6.18, and
has gone undetected until now.

Cc: stable <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-19 20:01:43 -07:00
Anton Arapov a25de534f8 [INET]: Justification for local port range robustness.
There is a justifying patch for Stephen's patches. Stephen's patches
disallows using a port range of one single port and brakes the meaning
of the 'remaining' variable, in some places it has different meaning.
My patch gives back the sense of 'remaining' variable. It should mean
how many ports are remaining and nothing else. Also my patch allows
using a single port.

  I sure we must be able to use mentioned port range, this does not
restricted by documentation and does not brake current behavior.

usefull links:
Patches posted by Stephen Hemminger
  http://marc.info/?l=linux-netdev&m=119206106218187&w=2
  http://marc.info/?l=linux-netdev&m=119206109918235&w=2

Andrew Morton's comment
  http://marc.info/?l=linux-kernel&m=119248225007737&w=2

1. Allows using a port range of one single port.
2. Gives back sense of 'remaining' variable.

Signed-off-by: Anton Arapov <aarapov@redhat.com>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-18 22:00:17 -07:00
Sean Hefty d02d1f5359 RDMA/cma: Fix deadlock destroying listen requests
Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>.
The deadlock occurs when a connection request arrives at the same
time that a wildcard listen is being destroyed.

A wildcard listen maintains per device listen requests for each
RDMA device in the system.  The per device listens are automatically
added and removed when RDMA devices are inserted or removed from
the system.

When a wildcard listen is destroyed, rdma_destroy_id() acquires
the rdma_cm's device mutex ('lock') to protect against hot-plug
events adding or removing per device listens.  It then tries to
destroy the per device listens by calling ib_destroy_cm_id() or
iw_destroy_cm_id().  It does this while holding the device mutex.

However, if the underlying iw/ib CM reports a connection request
while this is occurring, the rdma_cm callback function will try
to acquire the same device mutex.  Since we're in a callback,
the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until
their callback thread returns, but the callback is blocked waiting for
the device mutex.

Fix this by re-working how per device listens are destroyed.  Use
rdma_destroy_id(), which avoids the deadlock, in place of
cma_destroy_listen().  Additional synchronization is added to handle
device hot-plug events and ensure that the id is not destroyed twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:25:49 -07:00
Sean Hefty c5483388bb RDMA/cma: Add locking around QP accesses
If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically
transition the QP through its states (RTR, RTS, error, etc.)  While the
QP state transitions are occurring, the QP itself must remain valid.
Provide locking around the QP pointer to prevent its destruction while
accessing the pointer.

This fixes an issue reported by Olaf Kirch from Oracle that resulted in
a system crash:

"An incoming connection arrives and we decide to tear down the nascent
 connection.  The remote ends decides to do the same.  We start to shut
 down the connection, and call rdma_destroy_qp on our cm_id. ... Now
 apparently a 'connect reject' message comes in from the other host,
 and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED.
 It calls cma_modify_qp_err, which for some odd reason tries to modify
 the exact same QP we just destroyed."

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:25:00 -07:00
Kay Sievers 7eff2e7a8b Driver core: change add_uevent_var to use a struct
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It does no longer require the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.

Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:01 -07:00
Linus Torvalds ce9d3c9a6a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
  mlx4_core: Fix section mismatches
  IPoIB: Allow setting policy to ignore multicast groups
  IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
  IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
  IB/ipath: Remove redundant link state checks
  IB/ipath: Fix IB_EVENT_PORT_ERR event
  IB/ipath: Better handling of unexpected GPIO interrupts
  IB/ipath: Maintain active time on all chips
  IB/ipath: Fix QHT7040 serial number check
  IB/ipath: Indicate a couple of chip bugs to userspace
  IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
  IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
  IB/ipath: Remove duplicate copy of LMC
  IB/ipath: Add ability to set the LMC via the sysfs debugging interface
  IB/ipath: Optimize completion queue entry insertion and polling
  IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
  IB/ipath: Generate flush CQE when QP is in error state
  IB/ipath: Remove redundant code
  IB/ipath: Future proof eeprom checksum code (contents reading)
  IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
  ...
2007-10-11 19:43:13 -07:00
Stephen Hemminger 227b60f510 [INET]: local port range robustness
Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 17:30:46 -07:00
Sean Hefty dcb3f974da RDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retries
Automatically queue MRA message to decrease the number of retries sent
by the remote side during connection establishment.  This also has the
effect of increasing the overall connection timeout without using a
longer retry time in the case of dropped packets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Sean Hefty de98b693e9 IB/cm: Modify interface to send MRAs in response to duplicate messages
The IB CM provides a message received acknowledged (MRA) message that
can be sent to indicate that a REQ or REP message has been received, but
will require more time to process than the timeout specified by those
messages.  In many cases, the application may not know how long it will
take to respond to a CM message, but the majority of the time, it will
usually respond before a retry has been sent.  Rather than sending an
MRA in response to all messages just to handle the case where a longer
timeout is needed, it is more efficient to queue the MRA for sending in
case a duplicate message is received.

This avoids sending an MRA when it is not needed, but limits the number
of times that a REQ or REP will be resent.  It also provides for a
simpler implementation than generating the MRA based on a timer event.
(That is, trying to send the MRA after receiving the first REQ or REP if
a response has not been generated, so that it is received at the remote
side before a duplicate REQ or REP has been received)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:17 -07:00
Roland Dreier 04d29b0ede IB/uverbs: Make ib_uverbs_release_event_file() static
ib_uverbs_release_event_file() is only used in uverbs_main.c, so make it
static to that file.  Also move the definition before the first use, so
a forward declaration is not needed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00
Roland Dreier a394f83bdf IB/umad: Fix bit ordering and 32-on-64 problems on big endian systems
The declaration of struct ib_user_mad_reg_req.method_mask[] exported
to userspace was an array of __u32, but the kernel internally treated
it as a bitmap made up of longs.  This makes a difference for 64-bit
big-endian kernels, where numbering the bits in an array of__u32 gives:

    |31.....0|63....31|95....64|127...96|

while numbering the bits in an array of longs gives:

    |63..............0|127............64|

64-bit userspace can handle this by just treating method_mask[] as an
array of longs, but 32-bit userspace is really stuck: the meaning of
the bits in method_mask[] depends on whether the kernel is 32-bit or
64-bit, and there's no sane way for userspace to know that.

Fix this by updating <rdma/ib_user_mad.h> to make it clear that
method_mask[] is an array of longs, and using a compat_ioctl method to
convert to an array of 64-bit longs to handle the 32-on-64 problem.
This fixes the interface description to match existing behavior (so
working binaries continue to work) in almost all situations, and gives
consistent semantics in the case of 32-bit userspace that can run on
either a 32-bit or 64-bit kernel, so that the same binary can work for
both 32-on-32 and 32-on-64 systems.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:15 -07:00
Roland Dreier 2be8e3ee8e IB/umad: Add P_Key index support
Add support for setting the P_Key index of sent MADs and getting the
P_Key index of received MADs.  This requires a change to the layout of
the ABI structure struct ib_user_mad_hdr, so to avoid breaking
compatibility, we default to the old (unchanged) ABI and add a new
ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware
of the new ABI to opt into using it.

We plan on switching to the new ABI by default in a year or so, and
this patch adds a warning that is printed when an application uses the
old ABI, to push people towards converting to the new ABI.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Hal Rosenstock <hal@xsigo.com>
2007-10-09 19:59:15 -07:00
Ralph Campbell 57cb61d587 IB/core: Fix handling of multicast response failures
I was looking at the code for multicast.c and noticed that
ib_sa_join_multicast() calls queue_join() which puts the
request at the front of the group->pending_list.  If this
is a second request, it seems like it would interfere with
process_join_error() since group->last_join won't point
to the member at the head of the pending_list. The sequence
would thus be:

1. ib_sa_join_multicast()
   puts member1 on head of pending_list and starts work thread
2. mcast_work_handler()
   calls send_join() which sets group->last_join to member1
3. ib_sa_join_multicast()
   puts member2 on head of pending_list
4. join operation for member1 receives failures response from SA.
5. join_handler() is called with error status
6. process_join_error() fails to process member1 since
   it doesn't match the first entry in the group->pending_list.

The impact is that the failed join request is tossed.  The second
request is processed, and after it completes, the original request ends
up being retried.

This change also results in join requests being processed in FIFO
order.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:14 -07:00
Steve Wise 935ef2d7a2 RDMA/cma: Use neigh_event_send() to start neighbour discovery
Calling arp_send() to initiate neighbour discovery (ND) doesn't do the
full ND protocol.  Namely, it doesn't handle retransmitting the arp
request if it is dropped. The function neigh_event_send() does all
this.  Without doing full ND, RDMA address resolution fails in the
presence of dropped ARP broadcast packets.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Joachim Fenkes c8d8beea03 IB/umem: Add hugetlb flag to struct ib_umem
During ib_umem_get(), determine whether all pages from the memory
region are hugetlb pages and report this in the "hugetlb" member.
Low-level drivers can use this information if they need it.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Sean Hefty 7ce86409ad RDMA/ucma: Allow user space to set service type
Export the ability to set the type of service to user space.  Model
the interface after setsockopt.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty a81c994d5e RDMA/cma: Add ability to specify type of service
Provide support to specify a type of service for a communication
identifier.  A new function call is used when dealing with IPv4
addresses.  For IPv6 addresses, the ToS is specified through the
traffic class field in the sockaddr_in6 structure.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ The comments Eitan Zahavi and myself have made over the v1 post at 
  <http://lists.openfabrics.org/pipermail/general/2007-August/039247.html>
  were fully addressed. ]
 
Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> 
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty 733d65fe33 IB/sa: Add new QoS fields to path record
The QoS annex defines new fields for path records.  Add them to the
ib_sa for consumers that want to use them.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Ali Ayoub 3c10c7c929 IB/sa: Error handling thinko fix
ib_create_send_mad() returns an error code pointer on error, not NULL.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:07 -07:00
Anton Blanchard 8a68bbe31d IB/fmr_pool: Clean up some error messages in fmr_pool.c
A number of printks in fmr_pool.c dont have newlines, eg:

    fmr_create failed for FMR 0<5>FS-Cache: Loaded

Fix them up.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:05 -07:00
Roland Dreier 65d470b3ea IB: find_first_zero_bit() takes unsigned pointer
Fix sparse warning

    drivers/infiniband/core/device.c:142:6: warning: incorrect type in argument 1 (different signedness)
    drivers/infiniband/core/device.c:142:6:    expected unsigned long const *addr
    drivers/infiniband/core/device.c:142:6:    got long *[assigned] inuse

by making the local variable inuse unsigned.  Does not affect generated
code at all.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:04 -07:00
Dotan Barak 92ddc447ce IB: Move the macro IB_UMEM_MAX_PAGE_CHUNK() to umem.c
After moving the definition of struct ib_umem_chunk from ib_verbs.h to
ib_umem.h there isn't any reason for the macro IB_UMEM_MAX_PAGE_CHUNK
to stay in ib_verbs.h.  Move the macro to umem.c, the only place where
it is used.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:18 -07:00
Sean Hefty 38d5af9565 IB/mad: Fix address handle leak in mad_rmpp
The address handle associated with dual-sided RMPP direction switch
ACKs is never destroyed.  Free the AH for ACKs which fall into this
category.

Problem was reported by Dotan Barak <dotanb@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Hal Rosenstock 8fc394b197 IB/mad: agent_send_response() should be void
Nothing looks at the return value of agent_send_response(), so there's
no point in returning anything.

Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Hal Rosenstock 86dfbecdea IB/mad: Fix memory leak in switch handling in ib_mad_recv_done_handler()
If agent_send_response() returns an error, we shouldn't do anything
differently than if it succeeds; setting response to NULL just means
that the response buffer gets leaked.

Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Hal Rosenstock 445d68070c IB/mad: Fix error path if response alloc fails in ib_mad_recv_done_handler()
If ib_mad_recv_done_handler() fails to allocate response, then it just
printed a warning and continued, which leads to an oops if the MAD is
being handled for a switch device, because the switch code uses
response without checking for NULL.  Fix this by bailing out of the
function if the allocation fails.

Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Roland Dreier 5399891052 IB/sa: Don't need to check for default P_Key twice
Now that ib_find_pkey() ignores the membership bit of P_Keys, there's no
need for ib_sa to look for both 0x7fff and 0xffff in a port's P_Key table.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Moni Shoua 36026ecc20 IB/core: Ignore membership bit in ib_find_pkey()
ib_find_pkey() is used as a replacement for ib_find_cached_pkey(), and
the original function ignored the membership bit when searching for a
P_Key, so ib_find_pkey() should ignore the bit too.

In particular, IPoIB turns on the P_Key membership bit of limited
membership P_Keys when creating a child interface and looks for the
full membership P_key.  This broke if a port was a partial member of a
partition when IPoIB switched from ib_find_cached_pkey() to
ib_find_pkey(), and this change fixes things again.

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:17 -07:00
Paul Mundt 20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Yoann Padioleau dd00cc486a some kmalloc/memset ->kzalloc (tree wide)
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).

Here is a short excerpt of the semantic patch performing
this transformation:

@@
type T2;
expression x;
identifier f,fld;
expression E;
expression E1,E2;
expression e1,e2,e3,y;
statement S;
@@

 x =
- kmalloc
+ kzalloc
  (E1,E2)
  ...  when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
- memset((T2)x,0,E1);

@@
expression E1,E2,E3;
@@

- kzalloc(E1 * E2,E3)
+ kcalloc(E1,E2,E3)

[akpm@linux-foundation.org: get kcalloc args the right way around]
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Acked-by: Pierre Ossman <drzeus-list@drzeus.cx>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:50 -07:00
Dotan Barak 8f076531cd RDMA/cma: Remove local write permission from QP access flags
Local write permission makes no sense as part of the QP access flags,
since the access flags only control what the remote end of the
connection is allowed to do.  Remove the code in the RDMA CM that
initializes qp_access_flags with IB_ACCESS_LOCAL_WRITE.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 20:30:22 -07:00
Roland Dreier 454a01e7f4 IB/cm: Make internal function cm_get_ack_delay() static
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:43 -07:00
Linus Torvalds 0cdf6990e9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (76 commits)
  IB: Update MAINTAINERS with Hal's new email address
  IB/mlx4: Implement query SRQ
  IB/mlx4: Implement query QP
  IB/cm: Send no match if a SIDR REQ does not match a listen
  IB/cm: Fix handling of duplicate SIDR REQs
  IB/cm: cm_msgs.h should include ib_cm.h
  IB/cm: Include HCA ACK delay in local ACK timeout
  IB/cm: Use spin_lock_irq() instead of spin_lock_irqsave() when possible
  IB/sa: Make sure SA queries use default P_Key
  IPoIB: Recycle loopback skbs instead of freeing and reallocating
  IB/mthca: Replace memset(<addr>, 0, PAGE_SIZE) with clear_page(<addr>)
  IPoIB/cm: Fix warning if IPV6 is not enabled
  IB/core: Take sizeof the correct pointer when calling kmalloc()
  IB/ehca: Improve latency by unlocking after triggering the hardware
  IB/ehca: Notify consumers of LID/PKEY/SM changes after nondisruptive events
  IB/ehca: Return QP pointer in poll_cq()
  IB/ehca: Change idr spinlocks into rwlocks
  IB/ehca: Refactor sync between completions and destroy_cq using atomic_t
  IB/ehca: Lock renaming, static initializers
  IB/ehca: Report RDMA atomic attributes in query_qp()
  ...
2007-07-12 16:45:40 -07:00
Tejun Heo 7b595756ec sysfs: kill unnecessary attribute->owner
sysfs is now completely out of driver/module lifetime game.  After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners.  Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.

This patch kills now unnecessary attribute->owner.  Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.

For more info regarding lifetime rule cleanup, please read the
following message.

  http://article.gmane.org/gmane.linux.kernel/510293

(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:09:06 -07:00
Sean Hefty 6164c8cd13 IB/cm: Send no match if a SIDR REQ does not match a listen
If a SIDR REQ does not match a listen, we should reply with status
value 1 (service ID not supported), rather than dropping through to
the default case of status 2 (rejected by service provider).

Doing this also fixes a bug where the cm_id_priv is removed from the
remote_sidr_table twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:52:28 -07:00
Sean Hefty 29c2731cbf IB/cm: Fix handling of duplicate SIDR REQs
Fix handling to duplicate SIDR REQs to avoid sending a reject if a
duplicate is detected.  Duplicates should just be silently discarded.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:51:43 -07:00
Sean Hefty 5d861be8c8 IB/cm: cm_msgs.h should include ib_cm.h
cm_msgs.h uses definitions from ib_cm.h.  Include it directly, rather
than depending on a specific include order.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:50:53 -07:00
Sean Hefty 1d84612649 IB/cm: Include HCA ACK delay in local ACK timeout
The IB CM should include the HCA ACK delay when calculating the local
ACK timeout value to use for RC QPs.  If the HCA ACK delay is large
enough relative to the packet life time, then if it is not taken into
account, the calculated timeout value ends up being too small, which
can result in "retry exceeded" errors.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:50:05 -07:00
Sean Hefty 24be6e81c7 IB/cm: Use spin_lock_irq() instead of spin_lock_irqsave() when possible
The ib_cm is a little over zealous about using spin_lock_irqsave,
when spin_lock_irq would do.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:47:29 -07:00
Sean Hefty 2aec5c602c IB/sa: Make sure SA queries use default P_Key
MADs sent to the SA should use the the default P_Key (0x7fff/0xffff).
There's no requirement that the default P_Key is stored at index 0 in
the local P_Key table, so add code to the sa_query module to look up
the index of the default P_Key when creating an address handle for the
SA (which is done any time the P_Key table might change), and use this
index for all SA queries.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:45:31 -07:00
Dotan Barak 856c52a741 IB/core: Take sizeof the correct pointer when calling kmalloc()
When allocating out_mad in show_pma_counter(), take sizeof *out_mad
instead of sizeof *in_mad.  It is true that today the type of in_mad
and out_mad are the same, but this patch will give us a cleaner code.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 11:04:40 -07:00
Andrew Morton 1d3f4b905a IB: Fix ib_umem_get() when npages == 0
gcc correctly warned:

drivers/infiniband/core/umem.c: In function 'ib_umem_get':
drivers/infiniband/core/umem.c:78: warning: 'ret' may be used uninitialized in this function

Set ret to 0 in case npages == 0 and the loop isn't entered at all.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:33 -07:00
Roland Dreier 43506d954e IB: Remove garbage non-ASCII characters from comments
A few files had 0xa0 characters in comments.  Remove them so that the 
files are clean ASCII text.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:32 -07:00
Hal Rosenstock 1bae4dbf95 IB/mad: Enhance SMI for switch support
Extend the SMI with switch (intermediate hop) support. Care has been
taken to ensure that the CA (and router) code paths are changed as
little as possible.

Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 16:17:32 -07:00
Roland Dreier 24bce50803 IB/umem: Fix possible hang on process exit
If ib_umem_release() is called after ib_uverbs_close() sets context->closing,
then a process can get stuck in a D state, because the code boils down to

	if (down_write_trylock(&mm->mmap_sem))
		down_write(&mm->mmap_sem);

which is obviously a stupid instant deadlock.  Fix the code so that we
only try to take the lock once.

This bug was introduced in commit f7c6a7b5 ("IB/uverbs: Export
ib_umem_get()/ib_umem_release() to modules") which fortunately never
made it into a release, and was reported by Pete Wyckoff <pw@osc.edu>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 11:05:58 -07:00
Sean Hefty bf2944bd56 RDMA/cma: Fix initialization of next_port
next_port should be between sysctl_local_port_range[0] and [1].
However, it is initially set to a random value with get_random_bytes().  
If the value is negative when treated as a signed integer, next_port
can end up outside the expected range because of the result of the % 
operator being negative.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-07 23:24:38 -07:00
Sean Hefty d998ccce02 IB/cm: Fix stale connection detection
The ib_cm can incorrectly detect a stale connection (a new connection
request for a QPN that is already connected) as a duplicate connection
request.  Separate the handling of potential duplicate REQs from stale
connections.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29 16:07:09 -07:00
Linus Torvalds 8aee74c8ee Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/cm: Improve local id allocation
  IPoIB/cm: Fix SRQ WR leak
  IB/ipoib: Fix typos in error messages
  IB/mlx4: Check if SRQ is full when posting receive
  IB/mlx4: Pass send queue sizes from userspace to kernel
  IB/mlx4: Fix check of opcode in mlx4_ib_post_send()
  mlx4_core: Fix array overrun in dump_dev_cap_flags()
  IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions
  IB/mthca: Fix RESET to ERROR transition
  IB/mlx4: Set GRH:HopLimit when sending globally routed MADs
  IB/mthca: Set GRH:HopLimit when building MLX headers
  IB/mlx4: Fix check of max_qp_dest_rdma in modify QP
  IB/mthca: Fix use-after-free on device restart
  IB/ehca: Return proper error code if register_mr fails
  IPoIB: Handle P_Key table reordering
  IB/core: Use start_port() and end_port()
  IB/core: Add helpers for uncached GID and P_Key searches
  IB/ipath: Fix potential deadlock with multicast spinlocks
  IB/core: Free umem when mm is already gone
2007-05-21 16:19:32 -07:00
Michael S. Tsirkin 9f81036c54 IB/cm: Improve local id allocation
The IB CM uses an idr for local id allocations, with a running counter
as start_id.  This fails to generate distinct ids if

1. An id is constantly created and destroyed
2. A chunk of ids just beyond the current next_id value is occupied

This in turn leads to an increased chance of connection request being
mis-detected as a duplicate, sometimes for several retries, until
next_id gets past the block of allocated ids. This has been observed
in practice.

As a fix, remember the last id allocated and start immediately above it.
This also fixes a problem with the old code, where next_id might
overflow and become negative.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21 13:41:29 -07:00
Alexey Dobriyan e8edc6e03a Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Roland Dreier 1af4c435f3 IB/core: Use start_port() and end_port()
Clean up ib_query_port() and ib_modify_port() slightly by using the 
just-added start_port() and end_port() helpers.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:54 -07:00
Yosef Etigin 5eb620c81c IB/core: Add helpers for uncached GID and P_Key searches
Add ib_find_gid() and ib_find_pkey() functions that use uncached device
queries.  The calls might block but the returns are always up-to-date.
Cache P_Key and GID table lengths in core to avoid extra port info queries.

Signed-off-by: Yosef Etigin <yosefe@voltaire.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:53 -07:00
Eli Cohen 7b82cd8ee7 IB/core: Free umem when mm is already gone
Free umem when task's mm is already destroyed by the time
ib_umem_release gets called.

Found by Dotan Barak at Mellanox.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:53 -07:00
Sean Hefty 6c719f5c6c RDMA/cma: Add check to validate that cm_id is bound to a device
Several checks in the rdma_cm check against the state of the
cm_id, but only to validate that the cm_id is bound to an underlying
transport specific CM and an RDMA device.  Make the check explicit
in what we're trying to check for, since we're not synchronizing
against the cm_id state.

This will allow a user to disconnect a cm_id or reject a connection
after receiving a device removal event.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:10:32 -07:00
Sean Hefty be65f086f2 RDMA/cma: Fix synchronization with device removal in cma_iw_handler
The cma_iw_handler needs to validate the state of the rdma_cm_id before
processing a new connection request to ensure that a device removal is
not already being processed for the same rdma_cm_id.  Without the state
check, the user can receive simultaneous callbacks for the same cm_id, or
a callback after they've destroyed the cm_id.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:56:32 -07:00
Sean Hefty 8aa08602bd RDMA/cma: Simplify device removal handling code
Add a new routine and rename another to encapsulate common code for
synchronizing with device removal.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 13:54:49 -07:00
Roland Dreier 1bf66a3042 IB: Put rlimit accounting struct in struct ib_umem
When memory pinned with ib_umem_get() is released, ib_umem_release()
needs to subtract the amount of memory being unpinned from
mm->locked_vm.  However, ib_umem_release() may be called with
mm->mmap_sem already held for writing if the memory is being released
as part of an munmap() call, so it is sometimes necessary to defer
this accounting into a workqueue.

However, the work struct used to defer this accounting is dynamically
allocated before it is queued, so there is the possibility of failing
that allocation.  If the allocation fails, then ib_umem_release has no
choice except to bail out and leave the process with a permanently
elevated locked_vm.

Fix this by allocating the structure to defer accounting as part of
the original struct ib_umem, so there's no possibility of failing a
later allocation if creating the struct ib_umem and pinning memory
succeeds.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Roland Dreier f7c6a7b5d5 IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_mem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-08 18:00:37 -07:00
Linus Torvalds 972d45fb43 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Convert to NAPI
  IB: Return "maybe missed event" hint from ib_req_notify_cq()
  IB: Add CQ comp_vector support
  IB/ipath: Fix a race condition when generating ACKs
  IB/ipath: Fix two more spin lock problems
  IB/fmr_pool: Add prefix to all printks
  IB/srp: Set proc_name
  IB/srp: Add orig_dgid sysfs attribute to scsi_host
  IPoIB/cm: Don't crash if remote side uses one QP for both directions
  RDMA/cxgb3: Support for new abort logic
  RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
  RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
  RDMA/cxgb3: Fix TERM codes
  IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
  IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
  IB/mthca: Work around kernel QP starvation
  IB/ipath: Don't put QP in timeout queue if waiting to send
  IB/ipath: Don't call spin_lock_irq() from interrupt context
2007-05-07 12:18:21 -07:00
Michael S. Tsirkin f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Roland Dreier 1a70a05d9d IB/fmr_pool: Add prefix to all printks
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Jean Delvare 6473d160b4 PCI: Cleanup the includes of <linux/pci.h>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
  [PATCH] scatterlist.h needs types.h
  http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Joachim Fenkes 1912ffbb88 IB: Set class_dev->dev in core for nice device symlink
All RDMA drivers except ehca set class_dev->dev to their dma_device
value (ehca leaves this unset).  dma_device is the only value that
makes any sense, so move this assignment to core/sysfs.c.  This reduce
the duplicated code in the rest of the drivers and gives ehca a nice
/sys/class/infiniband/ehcaX/device symlink.

Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:38 -07:00
Hal Rosenstock de493d47d8 IB/mad: Change SMI to use enums rather than magic return codes
Clarify code by changing return values from SMI functions to named
enum values instead of magic 0/1 values.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 16:31:12 -07:00
Sean Hefty aeba84a925 IB/umad: Implement GRH handling for sent/received MADs
We need to set the SGID index for routed MADs and pass received
GRH information to userspace when a MAD is received.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty d0e7bb1418 IB/sa: Set src_path_bits correctly in ib_init_ah_from_path()
src_path_bits needs to mask off the base LID value.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Sean Hefty 9d41b7fdea IB/ucm: Simplify ib_ucm_event()
Use wait_event_interruptible() instead of a more complicated
open-coded equivalent.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:11 -07:00
Sean Hefty d92f76448c RDMA/ucma: Simplify ucma_get_event()
Use wait_event_interruptible() instead of a more complicated
open-coded equivalent.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:11 -07:00
Hal Rosenstock 9a4b65e357 IB/umad: Fix declaration of dev_map[]
The current ib_umad code never accesses bits past IB_UMAD_MAX_PORTS in
dev_map[].  We shouldn't declare it to be twice as big.

Pointed-out-by: Roland Dreier <rolandd@cisco.com>

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
2007-04-18 20:20:53 -07:00
Sean Hefty 3492856e33 RDMA/ucma: Avoid sending reject if backlog is full
Change the returned error code to ENOMEM if the connection event
backlog is full.  This prevents the ib_cm from issuing a reject
on the connection, which can allow retries to succeed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 14:58:11 -08:00
Sean Hefty cb164b8c6a RDMA/cma: Initialize rdma_bind_list in cma_alloc_any_port()
The struct rdma_bind_list fields for hlist are not being initialized,
resulting in a corrupted list.  Fix this by using kzalloc() to make
sure all pointers are NULL.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-06 12:41:44 -08:00
Sean Hefty 1836854f25 RDMA/cma: Remove unused node_guid from cma_device structure
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:35 -08:00
Sean Hefty e971b8cd19 IB/cm: Remove ca_guid from cm_device structure
The cm_device references an ib_device, which already contains the node_guid.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:33 -08:00
Sean Hefty 962063e64b RDMA/cma: Request reversible paths only
The rdma_cm requires that path records be reversible.  Set the
reversible bit when issuing an path record query.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:07 -08:00
Sean Hefty 47645d8d25 IB/core: Set hop limit in ib_init_ah_from_wc correctly
The hop_limit value in the ah_attr should be 0xFF, not the value read
from the received GRH (which should be 0).  See 13.5.4.4 in the 1.2 IB
spec.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 17:54:02 -08:00
Roland Dreier aaf1aef55f IB/uverbs: Return correct error for invalid PD in register MR
If no matching PD is found in ib_uverbs_reg_mr(), then the function
jumps to err_release without setting the return value ret.  This means
that ret will hold the return value of the call to ib_umem_get() a few
lines earlier; if the function reaches the point where it looks for
the PD, we know that ib_umem_get() must have returned 0, so
ib_uverbs_reg_mr() ends up return 0 for a bad PD ID.  Fix this by
setting ret to -EINVAL before jumping to the exit path when no PD is
found.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-22 13:16:51 -08:00
Roland Dreier 7084f8429c IB/core: Set static rate in ib_init_ah_from_path()
The static rate from the path record should be put into the address
vector -- a long time ago the rate in the address attributes needed to
be a relative rate, which required more munging, but now that the
conversion from absolute to relative is done in the low-level driver,
it's easy for ib_init_ah_from_path() to put the absolute rate in.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 15:31:24 -08:00
Roland Dreier 38abaa63bf IB/core: Fix sparse warnings about shadowed declarations
Change a couple of variable names to avoid sparse warnings about
symbols being shadowed.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:41:14 -08:00
Sean Hefty c8f6a362bf RDMA/cma: Add multicast communication support
Extend rdma_cm to support multicast communication.  Multicast support
is added to the existing RDMA_PS_UDP port space, as well as a new
RDMA_PS_IPOIB port space.  The latter port space allows joining the
multicast groups used by IPoIB, which enables offloading IPoIB traffic
to a separate QP.  The port space determines the signature used in the
MGID when joining the group.  The newly added RDMA_PS_IPOIB also
allows for unicast operations, similar to RDMA_PS_UDP.

Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
since we can no longer assume that the qkey is constant.  This requires
saving the Q_Key to use when attaching to a device, so that it is
available when creating the QP.  The Q_Key information is exported to
the user through the existing rdma_init_qp_attr() interface.

Multicast support is also exported to userspace through the rdma_ucm.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:29:07 -08:00
Sean Hefty faec2f7b96 IB/sa: Track multicast join/leave requests
The IB SA tracks multicast join/leave requests on a per port basis and
does not do any reference counting: if two users of the same port join
the same group, and one leaves that group, then the SA will remove the
port from the group even though there is one user who wants to stay a
member left.  Therefore, in order to support multiple users of the
same multicast group from the same port, we need to perform reference
counting locally.

To do this, add an multicast submodule to ib_sa to perform reference
counting of multicast join/leave operations.  Modify ib_ipoib (the
only in-kernel user of multicast) to use the new interface.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:20:02 -08:00
Steve Wise ebb90986e1 RDMA/iwcm: iw_cm_id destruction race fixes
iwcm iw_cm_id destruction race condition fixes:

- iwcm_deref_id() always wakes up if there's another reference.
- clean up race condition in cm_work_handler().
- create static void free_cm_id() which deallocs the work entries and then
  kfrees the cm_id memory.  This reduces code replication.
- rem_ref() if this is the last reference -and- the IWCM owns freeing the
  cm_id, then free it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Tim Schmielau cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Linus Torvalds 93bbad8fe1 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mthca: Always fill MTTs from CPU
  IB/mthca: Merge MR and FMR space on 64-bit systems
  IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
  IB/mthca: Give reserved MTTs a separate cache line
  IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
  RDMA/cxgb3: Add driver for Chelsio T3 RNIC
  IB: Remove redundant "_wq" from workqueue names
  RDMA/cma: Increment port number after close to avoid re-use
  IB/ehca: Fix memleak on module unloading
  IB/mthca: Work around gcc bug on sparc64
  IPoIB: Connected mode experimental support
  IB/core: Use ARRAY_SIZE macro for mandatory_table
  IB/mthca: Use correct structure size in call to memset()
2007-02-13 21:16:39 -08:00
Arjan van de Ven 2b8693c061 [PATCH] mark struct file_operations const 3
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Sean Hefty c7f743a669 IB: Remove redundant "_wq" from workqueue names
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:50 -08:00
Sean Hefty aedec08050 RDMA/cma: Increment port number after close to avoid re-use
Randomize the starting port number and avoid re-using port values
immediately after they are closed.  Instead keep track of the last
port value used and increment it every time a new port number is
assigned, to better replicate other port spaces.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:50 -08:00
Ahmed S. Darwish 9a6b090c0d IB/core: Use ARRAY_SIZE macro for mandatory_table
Use ARRAY_SIZE() macro already defined in kernel.h instead of open
coding equivalent code.

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:47 -08:00
Steve Wise 1f12667021 RDMA/addr: Handle ethernet neighbour updates during route resolution
The iWARP connection manager uses the ib_addr services to do route
resolution (neighbour discovery in the IP world).  The ib_addr
netevent callback routine, however, currently only acts on InfiniBand
neighbour updates.  It needs to act on ethernet neighbour updates as
well.

This patch just removes filtering on device type altogether and will
trigger on any neighour updates where the nud_type is valid.  This
simplifies the code some.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:57 -08:00
Michael S. Tsirkin 062dbb69f3 IB: Return qp pointer as part of ib_wc
struct ib_wc currently only includes the local QP number: this matches
the IB spec, but seems mostly useless. The following patch replaces
this with the pointer to qp itself, and updates all low level drivers
and all users.

This has the following advantages:
- Ability to get a per-qp context through wc->qp->qp_context
- Existing drivers already have the qp pointer ready in poll cq, so
  this change actually saves a tiny bit (extra memory read) on data path
  (for ehca it would actually be expensive to find the QP pointer when
  polling a CQ, but ehca does not support SRQ so we can leave wc->qp as
  NULL for ehca)
- Users that need the QP number can still get it through wc->qp->qp_num

Use case:

In IPoIB connected mode code, I have a common CQ shared by multiple
QPs.  To track connection usage, I need a way to get at some per-QP
context upon the completion, and I would like to avoid allocating
context object per work request just to stick a QP pointer into it.
With this code, I can just use wc->qp->qp_context.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:55 -08:00
Sean Hefty 0cefcf0bbc RDMA/ucma: Don't report events with invalid user context
There's a problem with how rdma cm events are reported to userspace
that can lead to application crashes.

When a new connection request arrives, a context for the connection is
allocated in the kernel.  The connection event is then reported to
userspace.  The userspace library retrieves the event and allocates
its own context for the connection.  The userspace context is
associated with the kernel's context when accepting.  This allows the
kernel to give userspace context with other events.

A problem occurs if a second event for the same connection occurs
before the user has had a chance to call accept.  The userspace
context has not yet been set, which causes the librdmacm to crash.
(This has been seen when the app takes too long to call accept,
resulting in the remote side timing out and rejecting the connection)

Fix this by ignoring events for new connections until userspace has
set their context.  This can only happen if an error occurs on a new
connection before the user accepts it.  This is okay, since the accept
will just fail later.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:20:08 -08:00
Sean Hefty 30a5ec982e RDMA/ucma: Fix struct ucma_event leak when backlog is full
We discard new connection requests while the listen backlog is full,
but leak a struct ucma_event in the process.  Free the structure in
this case.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:17:34 -08:00
Steve Wise 881a045fc5 RDMA/iwcm: iWARP connection timeouts shouldn't be reported as rejects
The iWARP CM should report timeouts as event RDMA_CM_EVENT_UNREACHABLE,
not event RDMA_CM_EVENT_REJECTED.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 20:15:58 -08:00
Ralph Campbell 1527106ff8 IB/core: Use the new verbs DMA mapping functions
Convert code in core/ to use the new DMA mapping functions for kernel
verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:28:30 -08:00
Sean Hefty 7521663857 RDMA/cma: Export rdma cm interface to userspace
Export the rdma cm interfaces to userspace via a misc device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:22 -08:00
Sean Hefty 628e5f6d39 RDMA/cma: Add support for RDMA_PS_UDP
Allow the use of UD QPs through the rdma_cm, in order to provide
address translation services for resolving IB addresses for datagram
messages using SIDR.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Sean Hefty 0fe313b000 RDMA/cma: Allow early transition to RTS to handle lost CM messages
During connection establishment, the passive side of a connection can
receive messages from the active side before the connection event has
been delivered to the user.  Allow the passive side to send messages
in response to received data before the event is delivered.  To handle
the case where the connection messages are lost, a new rdma_notify()
function is added that users may invoke to force a connection into the
established state.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Sean Hefty a1b1b61f80 RDMA/cma: Report connect info with connect events
Connection information was never given to the recipient of a
connection request or reply message.  Only the event was delivered.
Report the connection data with the event to allows user to
reject the connection based on the requested parameters, or adjust
their resources to match the request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Sean Hefty 9b2e9c0c24 RDMA/cma: Remove unneeded qp_type parameter from rdma_cm
The qp_type parameter into the rdma_cm is unneeded, and can be
misleading.  The QP type should be determined from the port space.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:21 -08:00
Roland Dreier f47e22c6e4 IB/fmr: ib_flush_fmr_pool() may wait too long
ib_flush_fmr_pool() stashes away the request generation number
properly, but then goes ahead and rereads it every time it tests
whether the flush generation number has caught up.  This means that
there is a theoretical possibility of livelock, if the request
generation number keeps getting bumped and the flush generation number
never catches up.  The fix is simple: use the request generation
number read at the beginning of the function.

Also, atomic_inc() followed by atomic_read() can be replaced with
atomic_int_return().  There's no real requirement for atomicity here
but we might as well shrink the code.

This bug was discovered using David Binderman's list of "set but never
used" warnings from icc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:19 -08:00
Josef Sipek 1cfd6e648b [PATCH] struct path: convert infiniband
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:46 -08:00
David Howells 4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Michael S. Tsirkin f469b2626f IB/ucm: Fix deadlock in cleanup
ib_ucm_cleanup_events() holds file_mutex while calling ib_destroy_cm_id().
This can deadlock since ib_destroy_cm_id() flushes event handlers, and
ib_ucm_event_handler() needs file_mutex, too.  Therefore, drop the
file_mutex during the call to ib_destroy_cm_id().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:10 -08:00
Sean Hefty e1444b5a16 IB/cm: Fix automatic path migration support
The ib_cm_establish() function is replaced with a more generic
ib_cm_notify().  This routine is used to notify the CM that failover
has occurred, so that future CM messages (LAP, DREQ) reach the remote
CM.  (Currently, we continue to use the original path)  This bumps the
userspace CM ABI.

New alternate path information is captured when a LAP message is sent
or received.  This allows QP attributes to be initialized for the user
when a new path is loaded after failover occurs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:10 -08:00
Roland Dreier 04699a1f86 RDMA/addr: list_move() cleanups
Replace a couple list_del()/list_add() combos with list_move().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Krishna Kumar c78bb8442b RDMA/addr: Fix some cancellation problems in process_req()
Fix following problems in process_req() relating to cancellation:

- Function is wrongly doing another addr_remote() when cancelled,
  which is not required.
- Make failure reporting immediate by using time_after_eq().
- On cancellation, -ETIMEDOUT was returned to the callback routine
  instead of the more appropriate -ECANCELLED (users getting notified
  may want to print/return this status, eg ucma_event_handler).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Krishna Kumar 9ab1ffa877 RDMA/iwcm: Fix comment for iwcm_deref_id() to match code
In iwcm_deref_id(), the comment says : "If the last reference is being
removed and iw_destroy_cm_id is waiting, wake up the waiting
thread". The second part of the comment, "and iw_destroy_cm_id is
waiting," is wrong, since this function either wakes the waiter
already waiting in iwcm_deref_id, or enables it (so that when
wait_for_completion() is performed later, it will immediately return).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar 715a588f42 RDMA/iwcm: Remove unnecessary function argument
Remove unnecessary cm_id_priv argument to copy_private_data(), and
change text to reflect the code.  Fix couple of typos in comments.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar 13fccdb380 RDMA/iwcm: Remove unnecessary initializations
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:08 -08:00
Krishna Kumar 83b9658623 RDMA/iwcm: Fix memory leak
If we get IW_CM_EVENT_CONNECT_REQUEST message and encounter an error
(not in the LISTEN state, cannot create an id, cannot alloc
work_entry, etc), then the memory allocated by cm_event_handler() in
the event->private_data gets leaked. Since cm_work_handler has already
put the event on the work_free_list, this allocated memory is
leaked. High backlog value can allow DoS attacks.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Krishna Kumar 33ba0fa9f3 RDMA/iwcm: Fix memory corruption bug in cm_work_handler()
Possible memory corruption scenario: after putting the work entry back
on the work_free_list, we call process_event() which dereferences
work->event, which could have been modified to another value
meanwhile.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Roland Dreier e54f81889c IB: Convert kmem_cache_t -> struct kmem_cache
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Dotan Barak e31353eaec RDMA/cm: Remove setting local write as part of QP access flags
The qp_access_flags are for remote access permissions only, so
IB_ACCESS_LOCAL_WRITE is an invalid value.  Remove it from the values
set by cm_init_qp_init_attr() and cma_init_ib_qp().

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
Eric Sesterhenn bed8bdfddd IB: kmemdup() cleanup
Replace open coded kmemdup() to save some screen space, and allow
inlining/not inlining to be triggered by gcc.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
Krishna Kumar a1a733f65b RDMA/cma: Rewrite cma_req_handler() to encapsulate common code
Rewrite cma_req_handler error handling case to encapsulate
common code.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:05 -08:00
Krishna Kumar f115db4803 RDMA/addr: Use time_after_eq() instead of time_after() in queue_req()
In queue_req(), use time_after_eq() instead of time_after()
for following reasons :

- Improves insert time if multiple entries with same time are
  present.
- set_timeout need not be called if entry with same time
  is added to the list (and that happens to be the entry
  with the smallest time), saving atomic/locking operations.
- Earlier entries with same time are deleted first (fifo).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:05 -08:00
Krishna Kumar e4022274cf RDMA/cma: Remove redundant check in cma_add_one
Remove redundant check of node_guid in cma_add_one().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:04 -08:00
Krishna Kumar e82153b54d RDMA/cma: Optimize cma_bind_loopback() to check for empty list
Optimize to test for an empty list first.  This ends up simplifying
the code too.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:04 -08:00
David Howells c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
Roland Dreier 39798695b4 IB/mad: Fix race between cancel and receive completion
When ib_cancel_mad() is called, it puts the canceled send on a list
and schedules a "flushed" callback from process context.  However,
this leaves a window where a receive completion could be processed
before the send is fully flushed.

This is fine, except that ib_find_send_mad() will find the MAD and
return it to the receive processing, which results in the sender
getting both a successful receive and a "flushed" send completion for
the same request.  Understandably, this confuses the sender, which is
expecting only one of these two callbacks, and leads to grief such as
a use-after-free in IPoIB.

Fix this by changing ib_find_send_mad() to return a send struct only
if the status is still successful (and not "flushed").  The search of
the send_list already had this check, so this patch just adds the same
check to the search of the wait_list.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-13 09:38:07 -08:00
Sean Hefty 7a118df3ea RDMA/addr: Use client registration to fix module unload race
Require registration with ib_addr module to prevent caller from
unloading while a callback is in progress.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-02 14:26:04 -08:00
Jack Morgenstein 0b26c88f29 IB/uverbs: Return sq_draining value in query_qp response
Return the sq_draining value back to user space for query_qp instead
of the en_sqd_async notify value, which is valid only for
modify_qp.  For query_qp, the draining status should returned.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-30 21:19:35 -08:00
Krishna Kumar 255d0c14b3 RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count
rdma_bind_addr() leaks a cma_dev reference count in failure case.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2006-10-30 20:52:51 -08:00
Sean Hefty 82a9c16a10 IB/cm: Send DREP in response to unmatched DREQ
Currently a DREP is only sent in response to a DREQ if a connection
has been found matching the DREQ, and it is in the proper state.  Once
a DREP is sent, the local connection moves into timewait.  Duplicate
DREQs received while in this state result in re-sending the DREP.

However, it's likely that the local connection will enter and exit
timewait before the remote side times out a lost DREP and resends a DREQ.
To handle this, we send a DREP in response to a DREQ, even if a local
connection is not found.  This avoids maintaining disconnected
id's in timewait states for excessively long times, just to handle a
lost DREP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Sean Hefty 8575329d4f IB/cm: Fix timewait crash after module unload
If the ib_cm module is unloaded while id's are still in timewait, the
CM will destroy the work queue used to process timewait.  Once the
id's exit timewait, their timers will fire, leading to a crash trying
to access the destroyed work queue.

We need to track id's that are in timewait, and cancel their deferred
work on module unload.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Krishna Kumar 3f168d2b66 RDMA/cma: Optimize error handling
Reorganize code relating to cma_get_net_info() and rdam_create_id() to
optimize error case handling (no need to alloc memory/etc. as part of
rdma_create_id() if input parameters are wrong).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:16 -07:00
Krishna Kumar 94de178ac6 RDMA/cma: Eliminate unnecessary remove_list
Eliminate remove_list by using list_del_init() instead during device
removal handling.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:16 -07:00
Sean Hefty 8f0472d331 RDMA/cma: Set status correctly on route resolution error
On reporting a route error, also include the status for the error,
rather than indicating a status of 0 when an error has occurred.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:15 -07:00
Krishna Kumar 6e35aabee1 RDMA/cma: Fix device removal race
The race is as follows:

A process : cma_process_remove() calls cma_remove_id_dev(),
	    which sets id state to CMA_DEVICE_REMOVAL and
	    calls wait_event(dev_remove).

B process : cma_req_handler() had incremented dev_remove,
	    and calls cma_acquire_ib_dev() and on failure
	    calls cma_release_remove(), which does a
	    wake_up of cma_process_remove(). Then
	    cma_req_handler() calls rdma_destroy_id();

A Process : cma_remove_id_dev() gets woken and checks the
	    state of id, and since it is still (wrongly)
	    CMA_DEVICE_REMOVAL, it calls notify_user(id)
	    and if that fails, the caller - cma_process_remove()
	    calls rdma_destroy_id(id). Two processes can
	    call rdma_destroy_id(), resulting in one
	    de-referencing kfreed id_priv.

Fix is for process B to set CMA_DESTROYING in cma_req_handler()
so that process A will return instead of doing a rdma_destroy_id().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:15 -07:00
Krishna Kumar 675a027c3d RDMA/cma: Fix leak of cm_ids in case of failures
cma_connect_ib() and cma_connect_iw() leak cm_id's in failure cases.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-02 14:52:15 -07:00
Al Viro 60cad5da57 [IPV4]: annotate inetdev.h helpers
inet_confirm_addr(), inet_ifa_byprefix(), ip_dev_find(), inet_make_mask() and
inet_ifa_match() annotated, along with inferred net-endian variables

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:05 -07:00
Alexey Dobriyan 1a1d92c10d [PATCH] Really ignore kmem_cache_destroy return value
* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:

	(void)kmem_cache_destroy(cache);

* Those who check it printk something, however, slab_error already printed
  the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
  low-level filesystem driver should make. Converted to ignore.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:10 -07:00
Al Viro d7b2004528 [PATCH] missing includes from infiniband merge
indirect chains of includes are arch-specific and can't
be relied upon...  (hell, even attempt to build it for
itanic would trigger vmalloc.h ones; err.h triggers
on e.g. alpha).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-23 11:34:43 -07:00
Krishna Kumar 9cd330d36b IB: Fix typo in kerneldoc for ib_set_client_data()
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:58 -07:00