Commit Graph

763 Commits

Author SHA1 Message Date
Or Gerlitz 01b3fc8b15 IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
Print the return code of ib_sa_path_rec_get() if it fails to help
debug errors.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:18:34 -07:00
Or Gerlitz 2f5de15128 IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
Enhance iser to act upon notification on network stack changes that
make its RDMA connection unaligned with the link used by the stack for
the <src,dst> IPs used to establish the connection.

When RDMA_CM_EVENT_ADDR_CHANGE arrives, just disconnect the
connection, assuming that the user space iscsid daemon will reconnect,
and the new connection will be aligned with the IP stack.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:16:21 -07:00
David S. Miller 49997d7515 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	Documentation/powerpc/booting-without-of.txt
	drivers/atm/Makefile
	drivers/net/fs_enet/fs_enet-main.c
	drivers/pci/pci-acpi.c
	net/8021q/vlan.c
	net/iucv/iucv.c
2008-07-18 02:39:39 -07:00
Linus Torvalds 89a93f2f48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits)
  [SCSI] scsi_dh: fix kconfig related build errors
  [SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of
  [SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h
  [SCSI] make struct scsi_{host,target}_type static
  [SCSI] fix locking in host use of blk_plug_device()
  [SCSI] zfcp: Cleanup external header file
  [SCSI] zfcp: Cleanup code in zfcp_erp.c
  [SCSI] zfcp: zfcp_fsf cleanup.
  [SCSI] zfcp: consolidate sysfs things into one file.
  [SCSI] zfcp: Cleanup of code in zfcp_aux.c
  [SCSI] zfcp: Cleanup of code in zfcp_scsi.c
  [SCSI] zfcp: Move status accessors from zfcp to SCSI include file.
  [SCSI] zfcp: Small QDIO cleanups
  [SCSI] zfcp: Adapter reopen for large number of unsolicited status
  [SCSI] zfcp: Fix error checking for ELS ADISC requests
  [SCSI] zfcp: wait until adapter is finished with ERP during auto-port
  [SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver
  [SCSI] sg: Add target reset support
  [SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
  [SCSI] sd: Move scsi_disk() accessor function to sd.h
  ...
2008-07-15 18:58:04 -07:00
David S. Miller b9e4085768 netdev: Do not use TX lock to protect address lists.
Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:15:08 -07:00
David S. Miller e308a5d806 netdev: Add netdev->addr_list_lock protection.
Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-15 00:13:44 -07:00
Eli Cohen bc3a290b51 IPoIB: Double default RX/TX ring sizes
Increase IPoIB ring sizes to twice their original sizes (RX: 128->256,
TX: 64->128) to act as a shock absorber for high traffic peaks.  With
the current settings, we have seen cases that there are many calls to
netif_stop_queue(), which causes degradation in throughput.  Also,
larger receive buffer sizes help IPoIB in CM mode to avoid experiencing
RNR NAK conditions due to insufficient receive buffers at the SRQ.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Eli Cohen e112373fd6 IPoIB/cm: Reduce connected mode TX object size
Since IPoIB connected mode does not set NETIF_F_SG, we only have one DMA
mapping per send, so we don't need a mapping[] array.  Define a new
struct with a single u64 mapping member and use it for the CM tx_ring.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:52 -07:00
Eli Cohen bd3606715e IPoIB: Use dev_set_mtu() to change mtu
When the driver sets the MTU of the net device outside of its
change_mtu method, it should make use of dev_set_mtu() instead of
directly setting the mtu field of struct net_device.  Otherwise
functions registered to be called upon MTU change will not get called
(this is done through call_netdevice_notifiers() in dev_set_mtu()).

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:51 -07:00
Eli Cohen c8c2afe360 IPoIB: Use rtnl lock/unlock when changing device flags
Use of this lock is required to synchronize changes to the netdevice's
data structs.  Also move the call to ipoib_flush_paths() after the
modification of the netdevice flags in set_mode().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:51 -07:00
Roland Dreier 9eae554c17 IPoIB: Get rid of ipoib_mcast_detach() wrapper
ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just
use the core API in the one place that does a multicast group detach.

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105)
function                                     old     new   delta
ipoib_mcast_leave                            357     319     -38
ipoib_mcast_detach                            67       -     -67

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Eli Cohen d0de13622d IPoIB: Only set Q_Key once: after joining broadcast group
The current code will set the Q_Key for any join of a non-sendonly
multicast group.  The operation involves a modify QP operation, which
is fairly heavyweight, and is only really required after the join of
the broadcast group.  Fix this by adding a parameter to ipoib_mcast_attach()
to control when the Q_Key is set.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Eli Cohen 5892eff91a IPoIB: Remove priv->mcast_mutex
No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast
since these operations are synchronized at the HW driver layer.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Eli Cohen c03d4731b5 IPoIB: Remove unused IPOIB_MCAST_STARTED code
The IPOIB_MCAST_STARTED flag is not used at all since commit b3e2749b
("IPoIB: Don't drop multicast sends when they can be queued"), so
remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:50 -07:00
Moni Shoua ee1e2c82c2 IPoIB: Refresh paths instead of flushing them on SM change events
This patch tries to solve the problem of the device going down and paths
being flushed on an SM change event.  The method is to mark the paths as
candidates for refresh (by setting the new valid flag to 0) and wait for
an ARP probe to trigger a new path record query.

The solution requires a different and less intrusive handling of SM
change event. For that, the second argument of the flush function
changes its meaning from a boolean flag to a level.  In most cases, SM
failover doesn't cause LID change so traffic won't stop.  In the rare
cases of LID change, the remote host (the one that hadn't changed its
LID) will lose connectivity until paths are refreshed. This is no
worse than the current state.  In fact, preventing the device from
going down saves packets that otherwise would be lost.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:49 -07:00
Vladimir Sokolovsky af40da894e IPoIB: add LRO support
Add "ipoib_use_lro" module parameter to enable LRO and an
"ipoib_lro_max_aggr" module parameter to set the max number of packets
to be aggregated.  Make LRO controllable and LRO statistics accessible
through ethtool.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Ron Livne 1240673405 IPoIB: Use multicast loopback blocking if available
Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if
supported by the underlying device.  This creates an improvement of up
to 39% in bandwidth when sending multicast packets with IPoIB, and an
improvement of 12% in cpu usage.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:48 -07:00
Roland Dreier a7d834c4bc IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq()
For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
and these two callers are not synchronized against each other.
However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
work request and scatter list structures, so multiple callers can end
up stepping on each other, which leads to posting garbled work
requests.

Fix this by having the caller pass in the ib_recv_wr and ib_sge
structures to use, and allocating new local structures in
ipoib_cm_nonsrq_init_rx().

Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
Nguyen <hnguyen@de.ibm.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:47 -07:00
Eli Cohen f89271da32 IPoIB: Copy small received SKBs in connected mode
The connected mode implementation in the IPoIB driver has a large
overhead in the way SKBs are handled in the receive flow.  It usually
allocates an SKB as big as the one used for the currently received SKB
and moves unused fragments from the old SKB to the new one. This
involves a loop on all the remaining fragments and incurs overhead on
the CPU.  This patch, for small SKBs, allocates an SKB just large
enough to contain the received data and copies to it the data from the
received SKB.  The newly allocated SKB is passed to the stack and the
old SKB is reposted.

When running netperf, UDP small messages, without this patch I get:

    UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
    14.4.3.178 (14.4.3.178) port 0 AF_INET
    Socket  Message  Elapsed      Messages
    Size    Size     Time         Okay Errors   Throughput
    bytes   bytes    secs            #      #   10^6bits/sec

    114688     128   10.00     5142034      0     526.31
    114688           10.00     1130489            115.71

With this patch I get both send and receive at ~315 mbps.

The reason that send performance actually slows down is as follows:
When using this patch, the overhead of the CPU for handling RX packets
is dramatically reduced.  As a result, we do not experience RNR NAK
messages from the receiver which cause the connection to be closed and
reopened again; when the patch is not used, the receiver cannot handle
the packets fast enough so there is less time to post new buffers and
hence the mentioned RNR NAKs.  So what happens is that the
application *thinks* it posted a certain number of packets for
transmission but these packets are flushed and do not really get
transmitted.  Since the connection gets opened and closed many times,
each time netperf gets the CPU time that otherwise would have been
given to IPoIB to actually transmit the packets.  This can be verified
when looking at the port counters -- the output of ifconfig and the
output of netperf (this is for the case without the patch):

    tx packets
    ==========
    port counter:   1,543,996
    ifconfig:       1,581,426
    netperf:        5,142,034

    rx packets
    ==========
    netperf:        1,130,489

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
2008-07-14 23:48:44 -07:00
Roland Dreier f3781d2e89 RDMA: Remove subversion $Id tags
They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:44 -07:00
Roland Dreier 969a60f9db IB/srp: Remove use of cached P_Key/GID queries
The SRP initiator is currently using ib_find_cached_pkey() and
ib_get_cached_gid() in situations where the uncached ib_find_pkey()
and ib_query_gid() functions serve just as well: sleeping is allowed
and performance is not an issue.  Since we want to eliminate the
cached operations in the long term, convert SRP to use the uncached
variants.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:43 -07:00
Mike Christie 8e9a20cee4 [SCSI] libiscsi, iscsi_tcp, ib_iser: fix setting of can_queue with old tools.
This patch fixes two bugs that are related.

1. Old tools did not set can_queue/cmds_max. This patch modifies
libiscsi so that when we add the host we catch this and set it
to the default.

2. iscsi_tcp thought that the scsi command that was passed to
the eh functions needed an iscsi_cmd_task allocated for it. It
only needed a mgmt task, and now it does not matter since it
all comes from the same pool and libiscsi handles this for the
drivers. ib_iser had copied iscsi_tcp's code and set can_queue
to its max - 1 to handle this. So this patch removes the max -1,
and just sets it to the max.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:29 -05:00
Mike Christie 913e5bf435 [SCSI] libiscsi, iser, tcp: remove recv_lock
The recv lock was defined so the iscsi layer could block
the recv path from processing IO during recovery. It
turns out iser just set a lock to that pointer which was pointless.

We now disconnect the transport connection before doing recovery
so we do not need the recv lock. For iscsi_tcp we still stop
the recv path in case older tools are being used.

This patch also has iscsi_itt_to_ctask user grab the session lock
and has the caller access the task with the lock or get a ref
to it in case the target is broken and sends a tmf success response
then sends data or a response for the command that was supposed to
be affected by the tmf.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:22 -05:00
Mike Christie 88dfd340b9 [SCSI] iscsi class: Add session initiatorname and ifacename sysfs attrs.
This adds two new attrs used for creating initiator ports and
binding sessions to hardware.

The session level initiatorname:

Since bnx2i does a scsi_host per host device, we need to add the
iface initiator port settings on the session, so we can create
multiple initiator ports (each with different inames) per device/scsi_host.

The current iname reflects that qla4xxx can have one iname per hba, and we are
allocating a host per session for software. The iname on the host will
remain so we can export and set the hba level qla4xxx setting.

The ifacename attr:

To bind a session to some piece of hardware, we maintain some mappings
in userspace, but during boot or iscsid restart (iscsid contains the user
space part of the driver) we need to be able to figure out which of those
host mapping abstractions maps to which sessions. This patch adds
an ifacename attr, which userspace can set to id the host side of the
endpoint across pivot_roots and iscsid restarts.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:21 -05:00
Mike Christie 412eeafa0a [SCSI] iser: Modify iser to take a iscsi_endpoint struct in ep callouts and session setup
This hooks iser into the iscsi endpoint code. Previously it handled the
lookup and allocation. This has been made generic so bnx2i and iser can
share it. It also allows us to pass iser the leading conn's ep, so we
know the ib_device being used and can set it as the scsi_host's parent.
And that allows scsi-ml to set the dma_mask based on those values.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:21 -05:00
Mike Christie 7970634b81 [SCSI] iscsi class: use device_for_each_child instead of duplicating session list
Currently we duplicate the list of sessions, because we were using the
test for if a session was on the host list to indicate if the session
was bound or unbound. We can instead use the target_id and fix up
the class so that drivers like bnx2i do not have to manage the target id
space.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:20 -05:00
Mike Christie 2261ec3d68 [SCSI] iser: handle iscsi_cmd_task rename
This handles the iscsi_cmd_task rename and renames
the iser cmd task to iser task.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:20 -05:00
Mike Christie 2747fdb257 [SCSI] iser: convert ib_iser to support merged tasks
Convert ib_iser to support merged tasks.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:19 -05:00
Mike Christie 0af967f5d4 [SCSI] libiscsi, iscsi_tcp, iser: add session cmds array accessor
Currently to get a ctask from the session cmd array, you have to
know to use the itt modifier. To make this easier on LLDs and
so in the future we can easily kill the session array and use
the host shared map instead, this patch adds a nice wrapper
to strip the itt into a session->cmds index and return a ctask.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:18 -05:00
Mike Christie b40977d95f [SCSI] iser: fix handling of scsi cmnds during recovery.
After the stop_conn callback has returned the LLD should not
touch the scsi cmds. iscsi_tcp and libiscsi use the
conn->recv_lock and suspend_rx field to halt recv path
processing, but iser does not have any protection.

This patch modifies iser so that userspace can just
call the ep_disconnect callback, which will halt
all recv IO, before calling the stop_conn callback so
we do not have to worry about the conn->recv_lock and
suspend rx field. iser just needs to stop the send side
from accessing the ib conn.

Fixup to handle when the ep poll fails and ep disconnect
is called from Erez.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:17 -05:00
Mike Christie 5d91e209fb [SCSI] iscsi: remove session/conn_data_size from iscsi_transport
This removes the session and conn data_size fields from the iscsi_transport.
Just pass in the value like with host allocation. This patch also makes
it so the LLD iscsi_conn data is allocated with the iscsi_cls_conn.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie a4804cd6eb [SCSI] iscsi: add iscsi host helpers
This finishes the host/session unbinding, by adding some helpers
to add and remove hosts and the session they manage.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie 756135215e [SCSI] iscsi: remove session and host binding in libiscsi
bnx2i allocates a host per netdevice but will use libiscsi,
so this unbinds the session from the host in that code.

This will also be useful for the iser parent device dma settings
fixes.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:16 -05:00
Mike Christie d3826721b1 [SCSI] iscsi class, iscsi drivers: remove unused iscsi_transport attrs
max_cmd_len and max_conn are not really used. max_cmd_len is
always 16 and can be set by the LLD. max_conn is always one
since we do not support MCS.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:15 -05:00
Mike Christie 40753caa36 [SCSI] iscsi class, iscsi_tcp/iser: add host arg to session creation
iscsi offload drivers (bnx2i and qla4xxx) allocate a scsi host per hba,
so the session creation path needs a shost/host_no argument.
Software iscsi/iser will follow the same behavior as before
where it allocates a host per session, but in the future iser
will probably look more like bnx2i where the host's parent is
the hardware (rnic for iser and for bnx2i it is the nic), because
it does not use a socket layer like how iscsi_tcp does.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-07-12 08:22:15 -05:00
Jack Morgenstein e1d50dce5a IPoIB: Test for NULL broadcast object in ipoib_mcast_join_finish()
We saw a kernel oops in our regression testing when a multicast "join
finish" occurred just after the interface was -- this is
<https://bugs.openfabrics.org/show_bug.cgi?id=1040>.  The test
randomly causes the HCA physical port to go down then up.

The cause of this is that ipoib_mcast_join_finish() processing happened
just after ipoib_mcast_dev_flush() was invoked (in which case the
broadcast pointer is NULL).  This patch tests for and handles the case
where priv->broadcast is NULL.

Cc: <stable@kernel.org>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-20 15:41:09 -07:00
Eli Cohen 57ce41d1d1 IB/ipoib: Fix transmit queue stalling forever
Commit f56bcd80 ("IPoIB: Use separate CQ for UD send completions")
introduced a bug where the transmit queue could get stopped and never
woken up.  The problem is that send completions are only polled at the
end of the xmit function, so if the send queue fills up and the xmit
path stops the queue, then there is no way for send completions to
ever get polled, and so the transmit queue stays stopped forever.

Fix this by arming the send CQ just before posting the last send
request that fills the send queue.  Then, when the completion event
handler is called, drain the send CQ.  Since it is possible that not
enough send completions are in the CQ, verify that the net queue
has been woken up after draining the send CQ, and if not arm a timer
and drain again at the timer function.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-30 20:02:45 -07:00
Eli Cohen b4132efa1a IPoIB: Copy child MTU from parent
When creating a child interface, copy the MTU information from the
parent.  Otherwise when the child's multicast join completes, the MTU
will not be updated since the code does

	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);

and priv->admin_mtu will be set to 0.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
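
The failure mode in the "Copy child MTU from parent" entry above can be shown in isolation. What follows is a minimal userspace sketch, not the driver code: `struct fake_priv` and `effective_mtu()` are hypothetical stand-ins for the driver's private data and for the quoted `min()` expression, and the MTU values used with it are only illustrative.

```c
/* Hypothetical stand-in for the two fields the commit message mentions. */
struct fake_priv {
	unsigned int mcast_mtu;  /* MTU learned from the multicast join */
	unsigned int admin_mtu;  /* MTU configured for the interface */
};

/* Mirrors the quoted line: dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); */
static unsigned int effective_mtu(const struct fake_priv *priv)
{
	return priv->mcast_mtu < priv->admin_mtu ? priv->mcast_mtu
						  : priv->admin_mtu;
}
```

With `admin_mtu` left at 0, the result is 0 whatever `mcast_mtu` holds; once `admin_mtu` has been copied from the parent, the value from the multicast join becomes usable.
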
Eli Cohen f56bcd8013 IPoIB: Use separate CQ for UD send completions
Use a dedicated CQ for UD send completions. Also, do not arm the UD
send CQ, which reduces the number of interrupts generated.  This patch
further reduces overhead by not calling poll CQ for every posted send
WR -- it polls only when there are 16 or more outstanding work requests.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Eli Dorfman 87528227df IB/iser: Count FMR alignment violations per session
Count FMR alignment violations per session as part of the iscsi
statistics.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Eli Dorfman 6f735e36ba IB/iser: Move high-volume debug output to higher debug level
Add another level for debug.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Shirley Ma bc7b3a36ba IPoIB: Handle 4K IB MTU for UD (datagram) mode
This patch enables IPoIB to use 4K UD messages (when the underlying
device and fabrics support a 4K MTU) by using two scatter buffers when
PAGE_SIZE is less than or equal to the HCA IB MTU size.  The first
buffer is for IPoIB header + GRH header, and the second buffer is the
IPoIB payload, which is 4K-4.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-23 11:55:45 -07:00
Tony Jones ee959b00c3 SCSI: convert struct class_device to struct device
It's big, but there doesn't seem to be a way to split it up smaller...

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:33 -07:00
Greg Kroah-Hartman 0532193746 IB: rename "dev" to "srp_dev" in srp_host structure
This sets us up to be able to convert the srp_host to use a struct
device instead of a class_device.

Based on an original patch from Tony Jones, but split up into this piece
by Greg.

Signed-off-by: Tony Jones <tonyj@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:30 -07:00
Erez Zilber 0a22ab92f5 IB/iser: Don't change itt endianness
The itt field in struct iscsi_data is not defined with any particular
endianness.  open-iscsi should use it as-is without byte-swapping it.
This fixes sparse warnings coming from doing ntohl(hdr->itt).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Roland Dreier 9fdd5e5bf6 IPoIB: Handle case when P_Key is deleted and re-added at same index
If a P_Key is deleted and then re-added at the same index, then IPoIB
gets confused because __ipoib_ib_dev_flush() only checks whether the
index is the same without checking whether the P_Key was present, so
the interface is stopped when the P_Key is deleted, but the event when
the P_Key is re-added gets ignored and the interface never gets
restarted.

Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
everywhere in IPoIB, since none of the places that look for P_Keys are
in a fast path or in non-sleeping context, and in general we want to
kill off the whole caching infrastructure eventually.  This also fixes
consistency problems caused because some IPoIB queries were cached and
some were uncached during the window where the cache was not updated.

Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this
problem and testing this fix.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Erez Zilber d97c51707d IB/iser: Release connection resources on RDMA_CM_EVENT_DEVICE_REMOVAL event
When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should
release the connection resources.

This is necessary when the IB HCA module is unloaded while open-iscsi
is still running.  Currently, iSER just BUG()s.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:35 -07:00
Eli Cohen 28d52b3cd8 IPoIB: Support modifying IPoIB CQ event moderation
This can be used to tune at run time the parameters controlling the
event (interrupt) generation rate and thus reduce the overhead
incurred by handling interrupts resulting in better throughput.  Since
IPoIB uses a single CQ for both RX and TX, RX is chosen to dictate
configuration for both RX and TX.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:33 -07:00
Eli Cohen 82c24c18af IPoIB: Add basic ethtool support
Just add the infrastructure so we can add functionality later.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:32 -07:00
Eli Cohen 40ca1988e0 IPoIB: Add LSO support
For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set
NETIF_F_TSO and use HW LSO to offload TCP segmentation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:27 -07:00
Robert P. J. Day 157de22946 IB: Use shorter list_splice_init() for brevity
Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:09:26 -07:00
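
The equivalence claimed in the list_splice_init() entry above can be checked with a small userspace model. The helpers below are simplified re-implementations written for this sketch, patterned on the kernel's list API but not copied from it, so they should be read as an assumption about the behaviour rather than as the kernel source.

```c
/* Simplified userspace model of a circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Move every entry of 'list' to the front of 'head'; 'list' is left stale. */
static void list_splice(struct list_head *list, struct list_head *head)
{
	if (!list_empty(list)) {
		struct list_head *first = list->next;
		struct list_head *last = list->prev;
		struct list_head *at = head->next;

		first->prev = head;
		head->next = first;
		last->next = at;
		at->prev = last;
	}
}

/* Splice, then reinitialise 'list' so it is empty and safe to reuse. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	list_splice(list, head);
	INIT_LIST_HEAD(list);
}
```

In this model, calling `list_splice_init(&src, &dst)` leaves `src` empty and `dst` holding the former entries of `src` in their original order, which is the same end state as `list_splice()` followed by `INIT_LIST_HEAD()`.
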
David Dillow 1e89a1946c IB/srp: Enforce protocol limit on srp_sg_tablesize
The current SRP initiator will allow unlimited s/g entries in the
indirect descriptors lists, but the entry count field in the SRP_CMD
request is 8 bits, so setting srp_sg_tablesize too large will open the
possibility of wrapping the count and generating invalid requests.

Clamp srp_sg_tablesize to the protocol limits to prevent surprises.

Reported by Martin W. Schlining III <mschlining@datadirectnet.com>.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:12 -07:00
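
The wrap described in the srp_sg_tablesize entry above is plain 8-bit arithmetic. The sketch below is illustrative only: `count_as_stored()` and `clamp_sg_tablesize()` are hypothetical helpers, and the limit of 255 is simply the largest value an 8-bit field can hold, which may differ from the exact limit the driver enforces.

```c
#include <stdint.h>

/* Assumed limit: the largest value an 8-bit entry count can carry. */
#define SG_COUNT_MAX 255u

/* What an 8-bit count field would actually hold for a requested count. */
static uint8_t count_as_stored(unsigned int entries)
{
	return (uint8_t)entries;  /* values above 255 wrap modulo 256 */
}

/* Clamp a requested table size so the stored count can never wrap. */
static unsigned int clamp_sg_tablesize(unsigned int requested)
{
	return requested > SG_COUNT_MAX ? SG_COUNT_MAX : requested;
}
```

A request for 256 entries would be stored as 0 without the clamp, which is the kind of invalid request the commit message warns about; clamping first keeps the stored count equal to the number of entries actually used.
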
Eli Cohen 6046136c74 IPoIB: Use checksum offload support if available
For HCAs that support checksum offload (ie that set IB_DEVICE_UD_IP_CSUM
in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and
use the HCA to generate and verify IP checksums.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16 21:01:10 -07:00
Roland Dreier 10313cbb92 IPoIB: Allocate priv->tx_ring with vmalloc()
Commit 7143740d ("IPoIB: Add send gather support") made struct
ipoib_tx_buf significantly larger, since the mapping member changed
from a single u64 to an array with MAX_SKB_FRAGS + 1 entries.  This
means that allocating tx_rings with kzalloc() may fail because there
is not enough contiguous memory for the new, much bigger size.  Fix
this regression by allocating the rings with vmalloc() instead.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-12 07:51:03 -07:00
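
The size growth behind the vmalloc() entry above can be estimated with a rough model. The structs below are hypothetical approximations of the ring entry before and after send gather support, and the value 18 for MAX_SKB_FRAGS is an assumption made for illustration; the real value depends on kernel version and page size.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration only. */
#define MAX_SKB_FRAGS_ASSUMED 18

/* Hypothetical model of a ring entry with a single DMA mapping. */
struct tx_buf_old {
	void *skb;
	uint64_t mapping;
};

/* Hypothetical model of a ring entry with one mapping per fragment. */
struct tx_buf_new {
	void *skb;
	uint64_t mapping[MAX_SKB_FRAGS_ASSUMED + 1];
};

/* Bytes needed for one physically contiguous ring of 'entries' buffers. */
static size_t ring_bytes(size_t entry_size, size_t entries)
{
	return entry_size * entries;
}
```

Under these assumptions each entry grows by 18 extra u64 mappings, so a ring with the same number of entries needs a correspondingly larger contiguous allocation, which is the case where kzalloc() becomes more likely to fail than vmalloc().
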
Roland Dreier 4200406b8f IPoIB/cm: Set tx_wr.num_sge in connected mode post_send()
Commit 7143740d ("IPoIB: Add send gather support") made it possible
for tx_wr.num_sge to be != 1 -- this happens if send gather support is
enabled.  However, the code in the connected mode post_send() function
assumes the old invariant, namely that tx_wr.num_sge is always 1.  Fix
this by explicitly setting tx_wr.num_sge to 1 in the CM post_send().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 18:35:20 -07:00
Or Gerlitz b3e2749bf3 IPoIB: Don't drop multicast sends when they can be queued
When set_multicast_list() is called the multicast task is restarted
and the IPOIB_MCAST_STARTED bit is cleared.  As a result for some
window of time, multicast packets are not transmitted nor queued but
rather dropped by ipoib_mcast_send().  These dropped packets are
painful in two cases:

 - bonding fail-over which both calls set_multicast_list() on the new
   active slave and sends Gratuitous ARP through that slave.

 - IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
   device and issues IGMP leave.

In both these cases, depending on the scheduling of the IPoIB
multicast task, the packets would be dropped.  As a result, in the
bonding case, the failover would not be detected by the peers until
their neighbour entry is renewed (which takes a few tens of
seconds).  In the IGMP case, the IP router doesn't get an IGMP leave
and would only learn of that from further probes on the group (also a
delay of at least a few tens of seconds).

Fix this by allowing transmission (or queuing) depending on the
IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.

Signed-off-by: Olga Shern <olgas@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-11 14:12:03 -07:00
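The gating change in the commit above ("IPoIB: Don't drop multicast sends when they can be queued") can be modelled in a few lines. This is a simplified userspace model, not the driver code: the two flags are plain booleans and the function name is hypothetical.

```python
def mcast_send_decision(oper_up, mcast_started, gate_on_oper_up):
    """Return what happens to a multicast packet under each gating policy.

    gate_on_oper_up=False models the old check (IPOIB_MCAST_STARTED);
    gate_on_oper_up=True models the new check (IPOIB_FLAG_OPER_UP).
    """
    gate = oper_up if gate_on_oper_up else mcast_started
    return "send_or_queue" if gate else "drop"


# Window right after set_multicast_list(): the interface is operationally
# up, but the multicast task has been restarted and its bit is cleared.
during_restart = dict(oper_up=True, mcast_started=False)
old_policy = mcast_send_decision(gate_on_oper_up=False, **during_restart)
new_policy = mcast_send_decision(gate_on_oper_up=True, **during_restart)
```

In the restart window the old policy drops the packet while the new one lets it be sent or queued, which is what allows the gratuitous ARP or the IGMP leave to go out.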
Arne Redlich d33ed425c6 IB/iser: Handle iser_device allocation error gracefully
"iser_device" allocation failure is "handled" with a BUG_ON() right
before dereferencing the NULL-pointer - fix this!

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
2008-03-10 21:17:51 -07:00
Arne Redlich 9a378270c0 IB/iser: Fix list iteration bug
The iteration through the list of "iser_device"s during device
lookup/creation is broken -- it might result in an infinite loop if
more than one HCA is used with iSER.  Fix this by using
list_for_each_entry() instead of the open-coded flawed list iteration
code.

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-10 21:15:49 -07:00
Pradeep Satyanarayana ec229e5e81 IPoIB/cm: Fix ipoib_cm_dev_stop() cleanup when drain times out
Commit efcd9971 ("IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()")
introduced a bug in ipoib_cm_dev_stop() when the receive drain times
out.  In that case, the function moves all the pending rx stuff into a
private list but then calls ipoib_cm_free_rx_reap_list(), which
handles a different list.

Fix this by moving everything to the rx_reap_list that will actually
get freed up.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=906>.

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-19 10:25:11 -08:00
Eli Cohen a9d1884925 IPoIB: Remove unused struct ipoib_cm_tx.ibwc member
struct ipoib_cm_tx.ibwc is unused since commit 1b524963 ("IPoIB/cm:
Use common CQ for CM send completions"), so remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
2008-02-14 10:30:50 -08:00
Jack Morgenstein 167c42655c IPoIB: On P_Key change event, reset state properly
In P_Key event handling, if the old P_Key is no longer available, the
driver must call ipoib_ib_dev_stop() -- just as it does when the P_Key
is still available (see procedure __ipoib_ib_dev_flush()).

When a P_Key becomes available, the driver will perform ipoib_open(),
which assumes that the QP is in RESET, the cm_id has been
destroyed/deleted, etc.  If ipoib_ib_dev_stop() is not called as
described above, then these assumptions will be false, and the attempt
to bring the interface up will fail.

Found by Mellanox QA.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-14 10:15:06 -08:00
Eli Cohen 7143740d26 IPoIB: Add send gather support
This patch acts as a preparation for using checksum offload for IB
devices capable of inserting/verifying checksum in IP packets.  The
patch does not actually turn on NETIF_F_SG - we defer that to the
patches adding checksum offload capabilities.

We only add support for send gathers for datagram mode, since existing
HW does not support checksum offload on connected QPs.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 14:32:37 -08:00
Eli Cohen eb14032f9e IPoIB: Add high DMA feature flag
All current InfiniBand devices can handle all DMA addresses, and it's
hard to imagine anyone would be silly enough to build a new device
that couldn't.  Therefore, enable the NETIF_F_HIGHDMA feature for IPoIB.

This has no effect for now, but is needed when we enable gather/scatter
support and checksum stateless offloads.

Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-08 13:39:26 -08:00
David Dillow 9fe4bcf45e IB/srp: Retry stale connections
When a host just goes away (crash, power loss, etc.) without tearing
down its IB connections, it can get stale connection errors when it
tries to reconnect to targets upon rebooting.  Retrying the connection
a few times will prevent sysadmins from playing the "which disk(s)
went missing?" game.

This would have made things slightly quicker when tracking down some
of the recent bugs, but it also helps quite a bit when you've got a
large number of targets hanging off a wedged server.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Or Gerlitz 7bc531dd88 IPoIB: Remove a misleading debug print
Commit 732a2170 ("IB/ipoib: Bound the net device to the ipoib_neigh
structue") left a misleading debug print (n->dev would be a bond
device only if bonding is used).  Clean it up.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
Or Gerlitz bafff97417 IPoIB: Handle bonding failover race for connected neighbours too
Move up the code that checks for a situation where the remote GID
stored in the ipoib_neigh is different than the one present in the
neighbour (handle gratuitous ARP) or that a bonding fail over has
happened but the neighbour still has a pointer to an ipoib_neigh
created by a different device than the current slave.  This will cause
the driver to apply the check also for connected mode neighbours.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04 20:20:43 -08:00
James Bottomley d3f46f39b7 [SCSI] remove use_sg_chaining
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.

Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-30 13:14:02 -06:00
Linus Torvalds 9b73e76f3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
  [SCSI] usbstorage: use last_sector_bug flag universally
  [SCSI] libsas: abstract STP task status into a function
  [SCSI] ultrastor: clean up inline asm warnings
  [SCSI] aic7xxx: fix firmware build
  [SCSI] aacraid: fib context lock for management ioctls
  [SCSI] ch: remove forward declarations
  [SCSI] ch: fix device minor number management bug
  [SCSI] ch: handle class_device_create failure properly
  [SCSI] NCR5380: fix section mismatch
  [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
  [SCSI] IB/iSER: add logical unit reset support
  [SCSI] don't use __GFP_DMA for sense buffers if not required
  [SCSI] use dynamically allocated sense buffer
  [SCSI] scsi.h: add macro for enclosure bit of inquiry data
  [SCSI] sd: add fix for devices with last sector access problems
  [SCSI] fix pcmcia compile problem
  [SCSI] aacraid: add Voodoo Lite class of cards.
  [SCSI] aacraid: add new driver features flags
  [SCSI] qla2xxx: Update version number to 8.02.00-k7.
  [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
  ...
2008-01-25 17:19:08 -08:00
Jan Engelhardt 1cf18d5aab IPoIB: Constify seq_operations function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:46 -08:00
Krishna Kumar 48fe5e594c IPoIB: Remove redundant check of netif_queue_stopped() in xmit handler
qdisc_run() now tests for queue_stopped() before calling
__qdisc_run(), and the same check is done in every iteration of
__qdisc_run(), so another check is not required in the driver xmit.
This means that ipoib_start_xmit() no longer needs to test
netif_queue_stopped(); the test was added to fix earlier kernels,
where the networking stack did not guarantee that the xmit method of
an LLTX driver would not be called after the queue was stopped, but
current kernels do provide this guarantee.

To validate, I put a debug print in the TX_BUSY path; it was never hit
with 64 threads running overnight, exercising this code a few hundred
million times.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:17:44 -08:00
Erez Zilber 6410627eb9 IB/iser: Add change_queue_depth method
Add a .change_queue_depth handler to the scsi_host_template in the
iSER driver.  iscsi_change_queue_depth was added to iscsi_tcp in order
to solve the problem of queue depth which was too high for some
targets.  It is also applicable for iSER.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
Erez Zilber a4ef1451df IB/iser: Print information about unhandled RDMA CM events
Some RDMA CM events are not supported or not handled in iSER.
This patch adds some info (printk) for the user about them.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:43 -08:00
David Dillow 7aa54bd730 IB/srp: Add identifying information to log messages
When you have multiple targets, it gets really confusing when you try
to track down who did a reset when there is no identifying information
in the log message, especially when the same extension ID is mapped
through two different local IB ports.  So, add an identifier that can
be used to track back to which local IB port/remote target pair is the
one having problems.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:38 -08:00
Pradeep Satyanarayana 586a693448 IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries
Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
entries for SRQs.  Currently IPoIB/CM implicitly assumes all HCAs will
support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
patch removes that restriction by limiting the maximum MTU in connected
mode to what the maximum number of SRQ SG entries allows.

This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
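The MTU limit described in the commit above ("IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries") is simple arithmetic: one page per scatter/gather entry. The sketch below assumes 4 KiB pages and ignores any per-packet header overhead the driver may subtract; the function name is hypothetical.

```python
PAGE_SIZE = 4096       # assumed 4 KiB pages
IPOIB_CM_RX_SG = 16    # 16 entries cover a 64 KiB MTU with 4 KiB pages


def max_cm_mtu(max_srq_sge, page_size=PAGE_SIZE):
    """Largest connected-mode MTU the SRQ can receive, one page per entry.

    Simplification: header overhead is ignored, so a real driver may
    report a slightly smaller value.
    """
    usable_entries = min(max_srq_sge, IPOIB_CM_RX_SG)
    return usable_entries * page_size
```

For example, an HCA that reports only 3 SRQ SG entries would be limited to roughly a 12 KiB MTU in this model, while one that reports 16 or more keeps the full 64 KiB.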
David Dillow fff09a8e6e IB/srp: Enable SG list chaining
By default, the SCSI mid-layer seems to send down 512KB requests
(sg_tablesize = 256), with some requests occasionally combined. By
allowing the mid-layer to chain requests, we can easily grow to 1024KB
or larger -- I've tested 4096KB I/O requests with no problems.

I looked through the DMA paths on the hardware drivers to ensure they
could take advantage of the SG chaining, and it seems that every one
except ipath uses the system's DMA routines, which have been converted
to handle chaining.  ipath looks like it should be OK, but I have no
way to test it.

Signed-off-by: David Dillow <dillowda@ornl.gov>

[ Tested on ipath.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
David Dillow 8cba207732 IB/srp: Respect target credit limit
The current SRP initiator will send requests even if it has no credits
available.  The results of sending extra requests are vendor specific,
but on some devices, overrunning credits will cost 85% of peak
performance -- e.g. 100 MB/s vs 720 MB/s.  Other devices may just drop
the requests.

This patch will tell the SCSI midlayer to queue requests if there are
fewer than two credits remaining, and will not issue a task management
request if there are no credits remaining.  The mid-layer will retry
the queued command once an outstanding command completes.

The patch also removes the unlikely() in __srp_get_tx_iu(), as it is
not at all unlikely to hit this limit under heavy load.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
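The credit rule from the commit above ("IB/srp: Respect target credit limit") can be sketched as a small state machine. This is a userspace model under stated assumptions: the class and method names are hypothetical, and `req_lim` is used here simply as the name of the remaining-credit counter.

```python
class SrpCreditModel:
    """Toy model of SRP initiator credit accounting."""

    def __init__(self, credits):
        self.req_lim = credits  # credits granted by the target

    def try_send(self, task_mgmt=False):
        """Consume a credit if the rule allows it; return True on success.

        Normal commands need at least two credits remaining, so one is
        always held back for task management.  Task management requests
        only need one.
        """
        needed = 1 if task_mgmt else 2
        if self.req_lim < needed:
            return False  # caller should queue (or fail the TM request)
        self.req_lim -= 1
        return True

    def complete(self, credit_delta=1):
        """A response arrived and returned credits to the initiator."""
        self.req_lim += credit_delta
```

Holding one credit in reserve means an abort or reset can still be sent when ordinary I/O has used up the rest of the window.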
Rolf Manderscheid a9e527e3f9 IPoIB: improve IPv4/IPv6 to IB mcast mapping functions
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:37 -08:00
Oliver Pinter 38dc732f47 IB/iser: Typo fix (s/destory/destroy/)
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:32 -08:00
Erez Zilber bd5d7a8585 IB/iser: update URLs of iSER docs
Signed-off-by: Erez Zilber <erezz@voltaire.com>
2008-01-25 14:15:32 -08:00
Joe Perches 908cf9a565 drivers/infiniband: Add missing "space"
Add missing spaces in the middle of format strings.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:26 -08:00
Pradeep Satyanarayana 68e995a295 IPoIB/cm: Add connected mode support for devices without SRQs
Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared
receive queues).  The current IPoIB connected mode support only works
on devices that support SRQs.

Fix this by adding support for using the receive queue of each
connected mode receive QP.  The disadvantage of this compared to using
an SRQ is that it means a full queue of receives must be posted for
each remote connected mode peer, which means that total memory usage
is potentially much higher than when using SRQs.  To manage this, add
a new module parameter "max_nonsrq_conn_qp" that limits the number of
connections allowed per interface.

The rest of the changes are fairly straightforward: we use a table of
struct ipoib_cm_rx to hold all the active connections, and put the
table index of the connection in the high bits of receive WR IDs.
This is needed because we cannot rely on the struct ib_wc.qp field for
non-SRQ receive completions.  Most of the rest of the changes just
test whether or not an SRQ is available, and post receives or find
received packets in the right place depending on the answer.

Cleaning up dead connections actually becomes simpler, because we do
not have to do the "last WQE reached" dance that is required to
destroy QPs attached to an SRQ.  We just move the QP to the error
state and wait for all pending receives to be flushed.

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>

[ Completely rewritten and split up, based on Pradeep's work.  Several
  bugs fixed and no doubt several bugs introduced.  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
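The commit above ("IPoIB/cm: Add connected mode support for devices without SRQs") puts the connection table index in the high bits of each receive WR ID. A minimal sketch of that packing follows; the 10-bit split and the helper names are hypothetical, chosen only to make the idea concrete.

```python
RX_SLOT_BITS = 10                       # hypothetical: low bits hold the ring slot
RX_SLOT_MASK = (1 << RX_SLOT_BITS) - 1


def pack_wr_id(conn_index, slot):
    """Combine a connection table index and a ring slot into one WR ID."""
    if not 0 <= slot <= RX_SLOT_MASK:
        raise ValueError("slot does not fit in the low bits")
    if conn_index < 0:
        raise ValueError("connection index must be non-negative")
    return (conn_index << RX_SLOT_BITS) | slot


def unpack_wr_id(wr_id):
    """Recover (connection index, ring slot) from a completed WR ID."""
    return wr_id >> RX_SLOT_BITS, wr_id & RX_SLOT_MASK
```

On a receive completion the handler can then find the owning connection from the WR ID alone, without relying on the QP field of the work completion.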
Roland Dreier efcd99717f IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()
Factor out the code for going through the rx_reap list of struct
ipoib_cm_rx and freeing each one.  This consolidates the code
duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and
reduces the risk of error when adding additional accounting.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier 7b3687df66 IPoIB/cm: Factor out ipoib_cm_create_srq()
Factor out the code to create an SRQ and allocate the receive ring in
ipoib_cm_dev_init() into a new function ipoib_cm_create_srq().  This
will make the code neater when support for devices that don't implement
SRQs is added.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier 1efb61444c IPoIB/cm: Factor out ipoib_cm_free_rx_ring()
Factor out the code to unmap/free skbs and free the receive ring in
ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring().
This function will be called from a couple of other places when
support for devices that don't implement SRQs is added.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:24 -08:00
Roland Dreier 2337f80941 IPoIB: Trivial formatting cleanups
Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:23 -08:00
Erez Zilber 90c18f3c28 [SCSI] IB/iSER: add logical unit reset support
eh_device_reset_handler was already added to scsi_host_template
in iscsi_tcp, and is now added also for iscsi_iser.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-23 13:39:43 -06:00
Olaf Kirch a8ac6311cc [SCSI] iscsi: convert xmit path to iscsi chunks
Convert xmit to iscsi chunks.

from michaelc@cs.wisc.edu:

Bug fixes, more digest integration, sg chaining conversion and other
sg wrapper changes, coding style sync up, and removal of io fields,
like pdu_sent, that are not needed.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:42 -06:00
Mike Christie f6d5180c78 [SCSI] libiscsi: fix nop handling
During root boot and shutdown the target could send us nops.
At this time iscsid cannot be running, so the target will drop
the session and the boot or shutdown will hang.

To handle this and allow us to better control when to check the network
this patch moves the nop handling to the kernel.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:35 -06:00
Mike Christie b3a7ea8d50 [SCSI] libiscsi: do not block session during logout
There is no need to block the session during logout. Since
we are going to fail the commands that were blocked, just fail them
immediately instead.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:28 -06:00
Boaz Harrosh 38ad03de3f [SCSI] libiscsi,iser: patch for AHS support
  - The default initialization of hdr_max is the minimum,
    sizeof(struct iscsi_cmd).  Once this patch goes into iser, the default
    initialization at libiscsi can be removed.
  - This is not yet full support for AHSs at the iser end, but it should be
    easy: just allocate more space at iser_desc right after iscsi_hdr.  Then,
    at transmission time, use ctask->hdr_len to retrieve the total
    size of all iscsi pdu headers.  See previous patch at iscsi_tcp.[ch].

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:25 -06:00
Mike Christie 843c0a8a76 [SCSI] libiscsi, iscsi_tcp: add device support
This patch adds logical unit reset support. This should work for ib_iser,
but I have not finished testing that driver so it is not hooked in yet.

This patch also temporarily reverts the iscsi_tcp r2t write out patch.
That code is completely rewritten in this patchset.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-11 18:28:19 -06:00
Dave Dillow ad696989b4 IB/srp: Release transport before removing host
The documented call sequence for removing a host is to call the
transport xxx_remove_host() prior to scsi_remove_host(). The SRP
transport used to crash when that order was followed, but as it is now
fixed, use the documented order.

Signed-off-by: David Dillow <dillowda@ornl.gov>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-08 12:08:10 -08:00
David Dillow b0e47c8b79 IB/srp: Fix list corruption/oops on module reload
Add a missing call to srp_remove_host() in srp_remove_one() so that we 
don't leak SRP transport class list entries.

Tested-by: David Dillow <dillowda@ornl.gov>
Acked-by: FUJITA Tomonori <tomof@acm.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-03 10:25:27 -08:00
Jack Morgenstein 1401b53acc IPoIB: Fix oops if xmit is called when priv->broadcast is NULL
If a port goes down, ipoib_ib_dev_down() is invoked -- which flushes
the mcasts (clearing priv->broadcast) and clearing the path record
cache.  If ipoib_start_xmit() is then invoked (before the broadcast
group is rejoined), a kernel oops results from attempting to access
priv->broadcast, which is still unset.

Returning NULL from path_rec_create() if priv->broadcast is NULL is a
harmless way of bypassing the problem -- the offending packet is
simply discarded "without prejudice."

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-27 15:40:10 -08:00
Erez Zilber a316b79c33 IB/iser: Add missing counter increment in iser_data_buf_aligned_len()
While adding sg chaining support to iSER, a "for" loop was replaced
with a "for_each_sg" loop. The "for" loop included the incrementation
of 2 variables. Only one of them is incremented in the current
"for_each_sg" loop. This caused iSER to think that all data is
unaligned, and all data was copied to aligned buffers.

This patch increments the missing counter inside the "for_each_sg"
loop whenever necessary.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-24 13:50:39 -08:00
Roland Dreier 09f60f8f54 IPoIB/cm: Fix receive QP cleanup
Commit 1b524963 ("IPoIB/cm: Use common CQ for CM send completions")
changed how the high-order bits of work request IDs were used, which
had the effect that IPOIB_CM_RX_DRAIN_WRID was no longer handled as a
connected mode receive completion.  This leads to the messages

    ib1: cm send completion event with wrid 1073741823 (> 64)
    ib1: RX drain timing out

when an interface with connected mode QPs is brought down.  Fix this
by making sure that both IPOIB_OP_CM and IPOIB_OP_RECV are set in
IPOIB_CM_RX_DRAIN_WRID.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-26 13:44:25 -07:00
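The failure in the commit above ("IPoIB/cm: Fix receive QP cleanup") can be reproduced with plain bit arithmetic. The flag positions and the old drain value below are assumptions (bit 31 for IPOIB_OP_RECV, bit 30 for IPOIB_OP_CM, 0x7fffffff for the old IPOIB_CM_RX_DRAIN_WRID); they are consistent with the wrid printed in the log message, but the commit text does not state them.

```python
IPOIB_OP_RECV = 1 << 31   # assumed bit position
IPOIB_OP_CM = 1 << 30     # assumed bit position

OLD_DRAIN_WRID = 0x7FFFFFFF                                # assumed old value
NEW_DRAIN_WRID = OLD_DRAIN_WRID | IPOIB_OP_CM | IPOIB_OP_RECV


def classify(wr_id):
    """Route a completion by its flag bits and strip them from the ID."""
    is_cm = bool(wr_id & IPOIB_OP_CM)
    is_recv = bool(wr_id & IPOIB_OP_RECV)
    index = wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV)
    if is_cm and is_recv:
        return "cm_recv", index
    if is_cm:
        return "cm_send", index
    if is_recv:
        return "ud_recv", index
    return "ud_send", index
```

With these values the old drain WR ID is classified as a CM send completion with index 1073741823, matching the message quoted in the commit, while the new value is routed to the CM receive path.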
Linus Torvalds 0b776eb542 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
  IPoIB/cm: Use common CQ for CM send completions
  IB/uverbs: Fix checking of userspace object ownership
  IB/mlx4: Sanity check userspace send queue sizes
  IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
  IB/ehca: Enable large page MRs by default
  IB/ehca: Change meaning of hca_cap_mr_pgsize
  IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
  IB/ehca: Fix masking error in {,re}reg_phys_mr()
  IB/ehca: Supply QP token for SRQ base QPs
  IPoIB: Use round_jiffies() for ah_reap_task
  RDMA/cma: Fix deadlock destroying listen requests
  RDMA/cma: Add locking around QP accesses
  IB/mthca: Avoid alignment traps when writing doorbells
  mlx4_core: Kill mlx4_write64_raw()
2007-10-23 09:56:11 -07:00
Jens Axboe 45711f1af6 [SG] Update drivers to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:53 +02:00
Michael S. Tsirkin 1b524963fd IPoIB/cm: Use common CQ for CM send completions
Use the same CQ for CM send completions as for all other IPoIB
completions.  This means all completions are processed via the same
NAPI polling routine.  This should help reduce the number of
interrupts for bi-directional traffic (such as TCP) and fixes "driver
is hogging interrupts" errors reported for IPoIB send side, e.g.
<https://bugs.openfabrics.org/show_bug.cgi?id=508>

To do this, keep a per-interface counter of outstanding send WRs, and
stop the interface when this counter reaches the send queue size to
avoid CQ overruns.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-19 21:39:34 -07:00
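The accounting described in the commit above ("IPoIB/cm: Use common CQ for CM send completions") amounts to a counter checked against the send queue size. The model below is a sketch with hypothetical names; in particular, restarting the queue on the first completion is a simplification, and the real driver may use a different wake-up threshold.

```python
class SendAccounting:
    """Toy model of a per-interface counter of outstanding send WRs."""

    def __init__(self, sendq_size):
        self.sendq_size = sendq_size
        self.outstanding = 0
        self.queue_stopped = False

    def post_send(self):
        """Post one send WR; stop the interface when the queue is full."""
        if self.queue_stopped:
            return False
        self.outstanding += 1
        if self.outstanding == self.sendq_size:
            self.queue_stopped = True   # stands in for stopping the netdev queue
        return True

    def send_completion(self):
        """One send WR completed; room is available again."""
        self.outstanding -= 1
        if self.queue_stopped:
            self.queue_stopped = False  # simplified wake-up policy
```

Because no more than sendq_size sends are ever outstanding, the shared CQ cannot receive more send completions than it was sized for.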
Roland Dreier fd312561ad IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
It's too hard to figure out what "!likely(...)" really means, and who
knows how compilers interpret the hint.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-17 21:54:44 -07:00
Anton Blanchard 69fc507a14 IPoIB: Use round_jiffies() for ah_reap_task
Use round_jiffies() to align the 1 second ah_reap_task with other work
and potentially save power by sleeping cores for longer.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-16 12:28:56 -07:00
Jens Axboe 53d412fce0 infiniband: sg chaining support
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-16 11:20:59 +02:00
Moni Shoua 200d1713b4 IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happening.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:20:45 -04:00
Moni Shoua 732a2170f4 IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device; it changes this skb to point to the
slave device and calls the slave hard_start_xmit function.

Under this scheme, the assumption in ipoib_neigh_destructor that for each
struct neighbour it gets, n->dev is an ipoib device, and hence that
netdev_priv(n->dev) can be cast to struct ipoib_dev_priv, is wrong.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one, when n->dev->flags has the
IFF_MASTER bit set.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-15 14:20:45 -04:00
Linus Torvalds df3d80f5a5 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (207 commits)
  [SCSI] gdth: fix CONFIG_ISA build failure
  [SCSI] esp_scsi: remove __dev{init,exit}
  [SCSI] gdth: !use_sg cleanup and use of scsi accessors
  [SCSI] gdth: Move members from SCp to gdth_cmndinfo, stage 2
  [SCSI] gdth: Setup proper per-command private data
  [SCSI] gdth: Remove gdth_ctr_tab[]
  [SCSI] gdth: switch to modern scsi host registration
  [SCSI] gdth: gdth_interrupt() gdth_get_status() & gdth_wait() fixes
  [SCSI] gdth: clean up host private data
  [SCSI] gdth: Remove virt hosts
  [SCSI] gdth: Reorder scsi_host_template intitializers
  [SCSI] gdth: kill gdth_{read,write}[bwl] wrappers
  [SCSI] gdth: Remove 2.4.x support, in-kernel changelog
  [SCSI] gdth: split out pci probing
  [SCSI] gdth: split out eisa probing
  [SCSI] gdth: split out isa probing
  gdth: Make one abuse of scsi_cmnd less obvious
  [SCSI] NCR5380: Use scsi_eh API for REQUEST_SENSE invocation
  [SCSI] usb storage: use scsi_eh API in REQUEST_SENSE execution
  [SCSI] scsi_error: Refactoring scsi_error to facilitate in synchronous REQUEST_SENSE
  ...
2007-10-15 08:19:33 -07:00
FUJITA Tomonori aebd5e476e [SCSI] transport_srp: add rport roles attribute
This adds a 'roles' attribute to rport like transport_fc. The role can
be initiator or target. That is, the initiator driver creates target
remote ports and the target driver creates initiator remote ports.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-10-12 14:37:46 -04:00
FUJITA Tomonori 3236822b1c [SCSI] ib_srp: convert to use the srp transport class
This converts ib_srp to use the srp transport class.

I don't have ib hardware so I've not tested this patch.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-10-12 14:37:42 -04:00
Linus Torvalds ce9d3c9a6a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
  mlx4_core: Fix section mismatches
  IPoIB: Allow setting policy to ignore multicast groups
  IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
  IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
  IB/ipath: Remove redundant link state checks
  IB/ipath: Fix IB_EVENT_PORT_ERR event
  IB/ipath: Better handling of unexpected GPIO interrupts
  IB/ipath: Maintain active time on all chips
  IB/ipath: Fix QHT7040 serial number check
  IB/ipath: Indicate a couple of chip bugs to userspace
  IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
  IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
  IB/ipath: Remove duplicate copy of LMC
  IB/ipath: Add ability to set the LMC via the sysfs debugging interface
  IB/ipath: Optimize completion queue entry insertion and polling
  IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
  IB/ipath: Generate flush CQE when QP is in error state
  IB/ipath: Remove redundant code
  IB/ipath: Future proof eeprom checksum code (contents reading)
  IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
  ...
2007-10-11 19:43:13 -07:00
Roland Dreier 9153f66a5b IPoIB: Fix unused variable warning
The conversion to use netdevice internal stats left an unused variable
in ipoib_neigh_free(), since there's no longer any reason to get
netdev_priv() in order to increment dropped packets.  Delete the
unused priv variable.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-10-10 16:55:30 -07:00
Roland Dreier de90351219 [IPoIB]: Convert to netdevice internal stats
Use the stats member of struct netdevice in IPoIB, so we can save
memory by deleting the stats member of struct ipoib_dev_priv, and save
code by deleting ipoib_get_stats().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:41 -07:00
Stephen Hemminger 3b04ddde02 [NET]: Move hardware header operations out of netdevice.
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:52 -07:00
Ralf Baechle 10d024c1b2 [NET]: Nuke SET_MODULE_OWNER macro.
It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it.  The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.

[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:51:13 -07:00
Stephen Hemminger bea3348eef [NET]: Make NAPI polling independent of struct net_device objects.
Several devices have multiple independent RX queues per net
device, and some have a single interrupt doorbell for several
queues.

In either case, it's easier to support layouts like that if the
structure representing the poll is independent of the net
device itself.

The signature of the ->poll() call back goes from:

	int foo_poll(struct net_device *dev, int *budget)

to

	int foo_poll(struct napi_struct *napi, int budget)

The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract).  The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.

The napi_struct is to be embedded in the device driver private data
structures.

Furthermore, it is the driver's responsibility to disable all NAPI
instances in its ->stop() device close handler.  Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.

With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.

Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.

[ Ported to current tree and all drivers converted.  Integrated
  Stephen's follow-on kerneldoc additions, and restored poll_list
  handling to the old style to fix mutual exclusion issues.  -DaveM ]

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:47:45 -07:00
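
To illustrate the ->poll() signature change described in bea3348eef, here is a minimal user-space sketch rather than kernel code. The struct napi_struct below is a stand-in with a single field, and foo_priv, rx_pending, and foo_container_of are hypothetical names chosen for the example. It shows the two points the commit message makes: the NAPI context is embedded in the driver's private data, and the poll routine simply returns how many packets it processed, leaving budget accounting to the caller.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct napi_struct; the real one has more fields. */
struct napi_struct {
    int weight;
};

/* Hypothetical driver-private data with the NAPI context embedded in it. */
struct foo_priv {
    int rx_pending;              /* packets waiting in a simulated RX ring */
    struct napi_struct napi;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define foo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * New-style poll: process at most 'budget' packets and return the count.
 * The callee does not touch any quota or budget bookkeeping.
 */
int foo_poll(struct napi_struct *napi, int budget)
{
    struct foo_priv *priv = foo_container_of(napi, struct foo_priv, napi);
    int done = 0;

    while (done < budget && priv->rx_pending > 0) {
        priv->rx_pending--;      /* "receive" one packet */
        done++;
    }
    return done;
}
```

With five packets pending and a budget of three, the first call would return 3 and a second call would return the remaining 2.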
Or Gerlitz 335a64a5a9 IPoIB: Allow setting policy to ignore multicast groups
The kernel IB stack allows (through the RDMA CM) userspace
applications to join and use multicast groups from the IPoIB MGID
range.  This allows multicast traffic to be handled directly from
userspace QPs, without going through the kernel stack, which gives
better performance for some applications.

However, to fully interoperate with IP multicast, such userspace
applications need to participate in IGMP reports and queries, or else
routers may not forward the multicast traffic to the system where the
application is running.  The simplest way to do this is to share the
kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to
join multicast groups that are being handled directly in userspace.

However, in such cases, the actual multicast traffic should not also
be handled by the IPoIB interface, because that would burn resources
handling multicast packets that will just be discarded in the kernel.

To handle this, this patch adds lookup on the database used for IB
multicast group reference counting when IPoIB is joining multicast
groups, and if a multicast group is already handled by user space,
then the IPoIB kernel driver ignores the group.  This is controlled by
a per-interface policy flag.  When the flag is set, IPoIB will not
join and attach its QP to a multicast group which already has an entry
in the database; when the flag is cleared, IPoIB will behave as before
this change.

For each IPoIB interface, the /sys/class/net/$intf/umcast attribute
controls the policy flag.  The default value is off/0.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-10 13:02:30 -07:00
Dotan Barak ede6bc04f3 IPoIB/cm: Clean up initialization of QP attr in ipoib_cm_create_tx_qp()
Make the way the QP is created in ipoib_cm_create_tx_qp()
consistent with ipoib_cm_create_rx_qp().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:18 -07:00
Roland Dreier ec2a1344ad IB/iser: Remove unnecessary includes
<asm/scatterlist.h> is not needed because everyplace it appears,
<linux/scatterlist.h> also appears.  <asm/io.h> is not needed because
nothing seems to be using device IO anyway.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:13 -07:00
Sean Hefty 247e020ee5 IB/srp: Add QoS support through service ID
Provide the target service ID when performing a path record query to
support optional QoS capability.  QoS requires support from the SA.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:12 -07:00
Sean Hefty 81668838c4 IPoIB: Specify Traffic Class with path record queries for QoS support
To support QoS within and between subnets, modify IPoIB to request
specific Traffic Class values with path record queries, using
the value associated with the IPoIB broadcast group.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ See some comments I made on this at v1 and v2 of the posts
  <http://lists.openfabrics.org/pipermail/general/2007-August/039275.html>
  <http://lists.openfabrics.org/pipermail/general/2007-September/040312.html> ]

Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:11 -07:00
Eli Cohen ca6de177ac IPoIB: Fix error path memory leak
Clean up properly if ib_query_pkey() or ib_query_gid() fail.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Eli Cohen b3ac60fc24 IPoIB: Fix typo to end statement with ';' instead of ','
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:06 -07:00
Roland Dreier ce423ef50e IPoIB: Make sure no receives are handled when stopping device
The current IPoIB code might process receive completions from
ipoib_drain_cq() when bringing down the interface.  This could cause
packets to be passed up the stack without the device's poll method
being called.  Avoid this by setting the status of any successful
completions to IB_WC_WR_FLUSH_ERR.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 19:59:04 -07:00
Jack Morgenstein 6958e827f1 IPoIB: Fix leak in ipoib_transport_dev_init() error path
ipoib_transport_dev_init() calls ipoib_cm_dev_init(), so it needs to
call ipoib_cm_dev_cleanup() to unwind that on the error path.

Found by Dotan Barak of Mellanox.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-07 12:40:56 -07:00
Raghava Kondapalli 3d1ff48da7 IB/srp: Add OUI for new Cisco targets
New Cisco IB SRP targets use the Cisco OUI 00-1b-0d but still need the
Topspin workarounds.  Add this OUI to srp_target_is_topspin().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:18 -07:00
Roland Dreier 5d7cbfd631 IB/srp: Wrap OUI checking for workarounds in helper functions
Wrap the checking for Mellanox and Topspin OUIs to decide whether to
use a workaround into helper functions.  This will make it cleaner to
add a new OUI to check (as we need to do now that some targets with a
Cisco OUI still need the Topspin workarounds).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-08-03 10:45:18 -07:00
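
The helper-function approach from 5d7cbfd631, extended with the Cisco OUI from 3d1ff48da7, might look roughly like the following user-space sketch. The function and array names are hypothetical, the choice of an 8-byte GUID whose first three bytes carry the OUI is an assumption made for the example, and the Topspin OUI value is recalled from memory rather than taken from the log above. Only the Cisco value 00-1b-0d comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Cisco OUI as given in the commit message. */
static const uint8_t sketch_cisco_oui[3] = { 0x00, 0x1b, 0x0d };
/* Topspin OUI: an assumption, not stated in the log. */
static const uint8_t sketch_topspin_oui[3] = { 0x00, 0x05, 0xad };

/*
 * Return true when the target needs the Topspin workarounds, i.e. when
 * the leading three bytes of its GUID match either OUI.
 */
bool sketch_target_is_topspin(const uint8_t guid[8])
{
    return memcmp(guid, sketch_topspin_oui, 3) == 0 ||
           memcmp(guid, sketch_cisco_oui, 3) == 0;
}
```

Keeping the comparison in one helper means a further OUI can be added in a single place instead of at every workaround site.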
Mike Christie 7974392c0b [SCSI] iscsi_tcp, ib_iser Enable module refcounting for iscsi host template
This prevents the iscsi modules from being unloaded while
there are active mounts from an iscsi target.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-07-27 09:11:45 -04:00
Paul Mundt 20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Roland Dreier 41179e2de6 IB/iser: Make a couple of functions static
Make iser_conn_release() and iser_start_rdma_unaligned_sg() static,
since they are only used in the .c file where they are defined.  In
addition to being a cleanup, this even shrinks the generated code by
allowing the single call of iser_start_rdma_unaligned_sg() to be
inlined into its callsite.  On x86_64:

add/remove: 0/1 grow/shrink: 1/0 up/down: 466/-533 (-67)
function                                     old     new   delta
iser_reg_rdma_mem                           1518    1984    +466
iser_start_rdma_unaligned_sg                 533       -    -533

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-17 18:37:42 -07:00
Linus Torvalds bc06cffdec Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (166 commits)
  [SCSI] ibmvscsi: convert to use the data buffer accessors
  [SCSI] dc395x: convert to use the data buffer accessors
  [SCSI] ncr53c8xx: convert to use the data buffer accessors
  [SCSI] sym53c8xx: convert to use the data buffer accessors
  [SCSI] ppa: coding police and printk levels
  [SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc
  [SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c
  [SCSI] remove the dead CYBERSTORMIII_SCSI option
  [SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA
  [SCSI] Clean up scsi_add_lun a bit
  [SCSI] 53c700: Remove printk, which triggers because of low scsi clock on SNI RMs
  [SCSI] sni_53c710: Cleanup
  [SCSI] qla4xxx: Fix underrun/overrun conditions
  [SCSI] megaraid_mbox: use mutex instead of semaphore
  [SCSI] aacraid: add 51245, 51645 and 52245 adapters to documentation.
  [SCSI] qla2xxx: update version to 8.02.00-k1.
  [SCSI] qla2xxx: add support for NPIV
  [SCSI] stex: use resid for xfer len information
  [SCSI] Add Brownie 1200U3P to blacklist
  [SCSI] scsi.c: convert to use the data buffer accessors
  ...
2007-07-15 16:51:54 -07:00
Sean Hefty 1d84612649 IB/cm: Include HCA ACK delay in local ACK timeout
The IB CM should include the HCA ACK delay when calculating the local
ACK timeout value to use for RC QPs.  If the HCA ACK delay is large
enough relative to the packet life time, then if it is not taken into
account, the calculated timeout value ends up being too small, which
can result in "retry exceeded" errors.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 21:50:05 -07:00
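
The effect described in 1d84612649 can be shown with a small calculation. This sketch assumes the usual InfiniBand encoding in which a timeout value n means 4.096 usec * 2^n, and assumes the target is twice the packet life time plus the HCA ACK delay. It rounds up conservatively and is not the kernel's actual computation. All function names are hypothetical, and inputs are expected to be 5-bit values (0 to 31).

```c
#include <stdint.h>

/* Smallest exponent n (capped at 31) such that 2^n >= units. */
static uint8_t sketch_ceil_log2(uint64_t units)
{
    uint8_t n = 0;

    while (n < 31 && (UINT64_C(1) << n) < units)
        n++;
    return n;
}

/* Timeout exponent when only the packet life time is considered. */
uint8_t sketch_timeout_ignoring_delay(uint8_t packet_life_time)
{
    return sketch_ceil_log2(2 * (UINT64_C(1) << packet_life_time));
}

/*
 * Timeout exponent covering a round trip (twice the packet life time)
 * plus the HCA ACK delay, all in units of 4.096 usec.
 */
uint8_t sketch_local_ack_timeout(uint8_t packet_life_time, uint8_t ca_ack_delay)
{
    uint64_t units = 2 * (UINT64_C(1) << packet_life_time) +
                     (UINT64_C(1) << ca_ack_delay);

    return sketch_ceil_log2(units);
}
```

For a packet life time of 10 and an ACK delay of 15, ignoring the delay gives an exponent of 11 while including it gives 16, which is the kind of gap that can lead to "retry exceeded" errors.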
Roland Dreier 1b844afe9e IPoIB: Recycle loopback skbs instead of freeing and reallocating
InfiniBand HCAs replicate multicast packets back to the QP that sent
them if that QP is attached to the destination multicast group.  This
means that IPoIB multicasts are often replicated back to the receive
queue of the interface that generated them.  To avoid confusing the
network stack, we drop these duplicates within the IPoIB driver.

However, there's no reason to free the skb that received the duplicate
and then immediately allocate a new skb to post to the receive queue.
We can be more efficient and just repost the same skb.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 13:43:53 -07:00
Roland Dreier 20089ca557 IPoIB/cm: Fix warning if IPV6 is not enabled
Fix

    drivers/infiniband/ulp/ipoib/ipoib_cm.c:1151: warning: unused variable 'dev'

by getting rid of the variable dev, which is only used if CONFIG_IPV6
is enabled, and replacing the one use of it with the value it is
assigned, namely priv->dev.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-10 11:18:34 -07:00
Jan Engelhardt 06cc85086e IB: Use menuconfig for InfiniBand menu
Change Kconfig objects from "menu, config" into "menuconfig" so
that the user can disable the whole feature without having to
enter the menu first.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Ralph Campbell 841adfca9c IPoIB/cm: Partial error clean up unmaps wrong address
If a page can't be allocated for the frag list of a skb, the code to
unmap the partially allocated list is off by one.  For example, if
'frags' equals one, i == 0, and the alloc_page() fails, then the old
loop would have unmapped mapping[1] which is uninitialized.  The same
would happen if the call to ib_dma_map_page() failed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-02 20:48:31 -07:00
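
The off-by-one in 841adfca9c is easier to see in a simulation. The sketch below is user-space code with hypothetical names. It assumes slot 0 of the mapping table holds the head buffer and slot i + 1 holds fragment i, which matches the mapping[1] example in the commit message, but it is not the driver's actual code. A counter records any attempt to unmap a slot that was never mapped.

```c
#include <stdbool.h>

#define SKETCH_MAX_SLOTS 8

/* Simulated mapping table: slot 0 is the head, slot i + 1 is fragment i. */
struct sketch_maps {
    bool mapped[SKETCH_MAX_SLOTS];
    int bad_unmaps;              /* unmaps of slots that were never mapped */
};

static void sketch_unmap(struct sketch_maps *m, int slot)
{
    if (!m->mapped[slot])
        m->bad_unmaps++;
    m->mapped[slot] = false;
}

/*
 * Map the head and 'frags' fragments (frags < SKETCH_MAX_SLOTS).
 * 'fail_at' is the fragment index whose setup fails, or -1 for none.
 * Returns 0 on success and -1 after unwinding a partial setup.
 */
int sketch_map_all(struct sketch_maps *m, int frags, int fail_at)
{
    int i;

    m->mapped[0] = true;
    for (i = 0; i < frags; i++) {
        if (i == fail_at)
            goto partial_error;
        m->mapped[i + 1] = true;
    }
    return 0;

partial_error:
    /* Fragments 0..i-1 succeeded, so slots 1..i are mapped; slot i + 1 is not. */
    for (; i > 0; --i)
        sketch_unmap(m, i);
    sketch_unmap(m, 0);
    return -1;
}
```

A loop that instead ran while i >= 0 and unmapped slot i + 1 would touch the never-mapped slot 1 in the frags == 1, i == 0 case from the commit message.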
Roland Dreier 13ef5f44c3 IPoIB/cm: Remove dead definition of struct ipoib_cm_id
It's completely unused.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 13:39:08 -07:00
Michael S. Tsirkin 82c3aca6ad IPoIB/cm: Fix interoperability when MTU doesn't match
IPoIB connected mode currently rejects a connection request unless the
supported MTU is >= the local netdevice MTU. This breaks
interoperability with implementations that might have tweaked
IPOIB_CM_MTU, and there's really no longer a reason to do so: this test
is just a leftover from when we did not tweak MTU per-connection.  Fix
this by making the test as permissive as possible.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 13:38:08 -07:00
Michael S. Tsirkin 3ec7393a68 IPoIB/cm: Initialize RX before moving QP to RTR
Fix a crasher bug in IPoIB CM: once a QP is in the RTR state, a
receive completion (or even an asynchronous error) might be observed
on this QP, so we have to initialize all of our receive data
structures before moving to the RTR state.

As an optimization (since modify_qp might take a long time), the
jiffies update done when moving RX to the passive_ids list is also
left in place to reduce the chance of the RX being misdetected as
stale.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-06-21 13:03:50 -07:00
FUJITA Tomonori da9c0c770e [SCSI] iscsi_iser: convert to use the data buffer accessors
iscsi_iser: convert to use the data buffer accessors

- remove the unnecessary map_single path.

- convert to use the new accessors for the sg lists and the
parameters.

TODO: use scsi_for_each_sg().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-18 19:48:43 -05:00
FUJITA Tomonori bb350d1dec [SCSI] ib_srp: convert to use the data buffer accessors
- remove the unnecessary map_single path.

- convert to use the new accessors for the sg lists and the
parameters.

Jens Axboe <jens.axboe@oracle.com> did the for_each_sg cleanup.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-07 09:02:50 -05:00
Mike Christie d8196ed218 [SCSI] iscsi class, iscsi_tcp, iser, qla4xxx: add netdevname sysfs attr
iSCSI must support software iscsi (iscsi_tcp, iser), hardware iscsi (qla4xxx),
and partial offload (broadcom). To allow each stack, driver, or port
(virtual or physical) to log into the same target portal
we use the initiator tuple [[HWADDRESS | NETDEVNAME], INITIATOR_NAME] and
the target tuple [TARGETNAME, CONN_ADDRESS, CONN_PORT] to id a session.
This patch adds the netdev name, which is used by software iscsi when
it binds a session to a netdevice using the SO_BINDTODEVICE sock opt.
It cannot use HWADDRESS because if someone did vlans then the same netdevice
will have the same mac and the initiator,target id will not be unique.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: David C Somayajulu <david.somayajulu@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-02 15:38:04 -04:00
Mike Christie 1548271ece [SCSI] libiscsi: make can_queue configurable
This patch allows us to set can_queue and cmds_per_lun from userspace
when we create the session/host. From there we can set it on a per
target basis. The patch fully converts iscsi_tcp, but only hooks
up ib_iser for cmd_per_lun since it currently has lots of preallocations
based on can_queue.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-02 15:34:46 -04:00
Mike Christie 77a23c21aa [SCSI] libiscsi: fix iscsi cmdsn allocation
The cmdsn allocation and pdu transmit code can race, and we can end
up sending a pdu with cmdsn 10 before a pdu with 5. The target will
then fail the connection/session. This patch fixes the problem by
delaying the cmdsn allocation until we are about to send the pdu.

This also removes the xmitmutex. We were using the connection xmitmutex
during error handling to handle races with mtask and ctask cleanup and
completion. For ctasks we now have nice refcounting and for the mtask,
if we hit the case where the mtask times out and it is floating
around somewhere in the driver, we end up dropping the session.
And to handle session level cleanup, we use the xmit suspend bit
along with scsi_flush_queue and the session lock to make sure
that the xmit thread is not possibly transmitting a task while
we are trying to kill it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-02 15:34:14 -04:00
Mike Christie b2c6416736 [SCSI] iscsi class, iscsi_tcp, ib_iser: add sysfs chap file
The attached patches add sysfs files for the chap settings
to the iscsi transport class, iscsi_tcp and ib_iser. This is
needed for software iscsi because there are times when iscsid
can die and it will need to reread the values it was using.
And it is needed by qla4xxx for basic management operations.
This patch does not hook in qla4xxx yet, because I am not sure
the mbx command to use.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-01 12:58:58 -04:00
Mike Christie 857ae0bdb7 [SCSI] iscsi: Some fixes in preparation for bidirectional support - total_length
- Remove shadow of request length from struct iscsi_cmd_task.
- change all users to use scsi_cmnd->request_bufflen directly

(With bidi we will use scsi-ml API to retrieve in/out length)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-01 12:58:22 -04:00
Mike Christie 8ad5781ae9 [SCSI] iscsi class, qla4xxx, iscsi_tcp, ib_iser: export/set initiator name
For iscsi root boot, software iscsi needs to know what the BIOS/OF
initiator used for the initiator name so this puts it in sysfs
for userspace to be able to pick up.

For hw iscsi, it is nice to see what the card is using.

This patch adds the new param, and hooks in qla4xxx, iscsi_tcp, and ib_iser.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: David C Somayajulu <david.somayajulu@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-01 12:56:40 -04:00
Mike Christie 0801c242a3 [SCSI] libiscsi, iscsi_tcp, ib_iser : add sw iscsi host get/set params helpers
iscsid and udev need to key off the hw address being
used so add some helpers for iser and iscsi tcp.

Also convert them

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-06-01 12:55:23 -04:00
Michael S. Tsirkin ec56dc0b7f IPoIB/cm: Fix performance regression on Mellanox
commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe
performance regression on Mellanox cards, because keeping a QP in the
error state for extended periods of time moves hardware to the slow
path (until the QP is destroyed).  For example, MPI latency goes from
~3 usecs to ~7 usecs.

Fix this by posting a send WR on one of the QPs that are being
flushed, instead of using a separate drain QP that is kept in the
error state.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=636>,
reported and bisected by Scott Weitzenkamp at Cisco and debugged by
Sasha Mikheev at Voltaire.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-29 16:07:09 -07:00
Michael S. Tsirkin 2dfbfc3712 IPoIB/cm: Drain cq in ipoib_cm_dev_stop()
Since NAPI polling is disabled while ipoib_cm_dev_stop() is running,
ipoib_cm_dev_stop() must poll the CQ itself in order to see the
packets draining.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24 14:02:40 -07:00
Michael S. Tsirkin 8fd357a6e3 IPoIB/cm: Fix timeout check in ipoib_cm_dev_stop()
time_after() was used backwards, so the timeout occurred immediately.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-24 14:02:39 -07:00
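
The bug class fixed in 8fd357a6e3 can be reproduced in user space. The sketch below uses a stand-in for the kernel's time_after(a, b), which is true when a is later than b, and the sketch_ function names are hypothetical. How the arguments were actually arranged in ipoib_cm_dev_stop() is not shown in the log, so the "backwards" variant here is only an assumed illustration of swapped arguments.

```c
#include <stdbool.h>

/*
 * User-space stand-in for time_after(a, b): true when a is later than b.
 * The signed difference keeps the comparison valid across counter wraparound.
 */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Correct check: the timeout has expired once 'now' is past 'deadline'. */
bool sketch_timed_out(unsigned long now, unsigned long deadline)
{
    return sketch_time_after(now, deadline);
}

/* Swapped arguments: true while the deadline is still in the future. */
bool sketch_timed_out_backwards(unsigned long now, unsigned long deadline)
{
    return sketch_time_after(deadline, now);
}
```

With now at 100 and a deadline of 150, the correct check reports no timeout while the swapped version reports one at once, which matches the "timeout occurred immediately" symptom.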
Michael S. Tsirkin 518b1646f8 IPoIB/cm: Fix SRQ WR leak
SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on
and off will, with time, leak out all WRs and then all connections
will start getting RNR NAKs.  Fix this in the way suggested by spec:
move the QP being destroyed to the error state, wait for "Last WQE
Reached" event and then post WR on a "drain QP" connected to the same
CQ.  Once we observe a completion on the drain QP, it's safe to call
ib_destroy_qp.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21 13:35:40 -07:00
Michael S. Tsirkin 24bd1e4e32 IB/ipoib: Fix typos in error messages
Trivial error message fixups.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-21 13:29:15 -07:00
Yosef Etigin 26bbf13ce1 IPoIB: Handle P_Key table reordering
SM reconfiguration or failover possibly causes a shuffling of the values
in the P_Key table. Right now, IPoIB only queries for the P_Key index
once when it creates the device QP, and hence there are problems if the
index of a P_Key value changes.  Fix this by using the PKEY_CHANGE event
to trigger a recheck of the P_Key index.

Signed-off-by: Yosef Etigin <yosefe@voltaire.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-19 08:51:54 -07:00
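
The recheck described in 26bbf13ce1 amounts to looking the P_Key value up again whenever the table may have been reshuffled. The following user-space sketch uses hypothetical names throughout and a plain exact-match lookup, which is a simplification and may differ from how the kernel matches P_Keys. It shows the idea: keep the P_Key value, treat the index as a cache, and refresh it on a change event.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface state: the P_Key in use and its cached index. */
struct sketch_port {
    uint16_t pkey;
    int pkey_index;              /* -1 when the P_Key is not in the table */
};

/* Return the index of 'pkey' in 'table', or -1 if it is absent. */
int sketch_find_pkey(const uint16_t *table, size_t len, uint16_t pkey)
{
    for (size_t i = 0; i < len; i++)
        if (table[i] == pkey)
            return (int)i;
    return -1;
}

/*
 * Handle a simulated P_Key change event. Returns 1 when the cached index
 * moved (so the QP would need to be reconfigured) and 0 otherwise.
 */
int sketch_on_pkey_change(struct sketch_port *port,
                          const uint16_t *table, size_t len)
{
    int idx = sketch_find_pkey(table, len, port->pkey);

    if (idx == port->pkey_index)
        return 0;
    port->pkey_index = idx;
    return 1;
}
```

If the table changes from {0xffff, 0x8001} to {0x8001, 0xffff}, a port using 0x8001 would see its index move from 1 to 0 on the first event and stay put on a repeated one.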
Michael S. Tsirkin 7c5b9ef857 IPoIB/cm: Optimize stale connection detection
In the presence of some running RX connections, we repeat
queue_delayed_work calls every 4 RX WRs, which is a waste.  It's enough
to start the stale task when the first passive connection is added, and
rerun it every IPOIB_CM_RX_DELAY as long as there are outstanding
passive connections.

This removes some code from RX data path.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-14 14:11:01 -07:00
Randy Dunlap e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Linus Torvalds 972d45fb43 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Convert to NAPI
  IB: Return "maybe missed event" hint from ib_req_notify_cq()
  IB: Add CQ comp_vector support
  IB/ipath: Fix a race condition when generating ACKs
  IB/ipath: Fix two more spin lock problems
  IB/fmr_pool: Add prefix to all printks
  IB/srp: Set proc_name
  IB/srp: Add orig_dgid sysfs attribute to scsi_host
  IPoIB/cm: Don't crash if remote side uses one QP for both directions
  RDMA/cxgb3: Support for new abort logic
  RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
  RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
  RDMA/cxgb3: Fix TERM codes
  IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
  IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
  IB/mthca: Work around kernel QP starvation
  IB/ipath: Don't put QP in timeout queue if waiting to send
  IB/ipath: Don't call spin_lock_irq() from interrupt context
2007-05-07 12:18:21 -07:00
Roland Dreier 8d1cc86a62 IPoIB: Convert to NAPI
Convert the IP-over-InfiniBand network device driver over to using
NAPI to handle completions for the main CQ.  This covers all receives
as well as datagram mode sends; send completions for connected mode
connections are still handled from interrupt context.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Michael S. Tsirkin f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Roland Dreier b7f008fdc9 IB/srp: Set proc_name
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Ishai Rabinovitz 3633b3d096 IB/srp: Add orig_dgid sysfs attribute to scsi_host
Add an orig_dgid attribute in sysfs for SRP scsi_hosts, so that
userspace can tell what the original dgid value written to the
add_target file was, even if the connection is redirected to a
different port while connecting.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Michael S. Tsirkin d6ef7d68f6 IPoIB/cm: Don't crash if remote side uses one QP for both directions
The IPoIB CM spec allows the use of a single connection in both
active->passive and passive->active directions.  The current Linux
code uses a separate connection for each direction, but if another node only
uses one connection for both directions, we oops when we try to look
up the passive connection.  Fix by checking that qp_context is
non-NULL before dereferencing it.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
2007-05-06 21:18:11 -07:00
Linus Torvalds 4f7a307dc6 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (87 commits)
  [SCSI] fusion: fix domain validation loops
  [SCSI] qla2xxx: fix regression on sparc64
  [SCSI] modalias for scsi devices
  [SCSI] sg: cap reserved_size values at max_sectors
  [SCSI] BusLogic: stop using check_region
  [SCSI] tgt: fix rdma transfer bugs
  [SCSI] aacraid: fix aacraid not finding device
  [SCSI] aacraid: Correct SMC products in aacraid.txt
  [SCSI] scsi_error.c: Add EH Start Unit retry
  [SCSI] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test.
  [SCSI] ipr: Driver version to 2.3.2
  [SCSI] ipr: Faster sg list fetch
  [SCSI] ipr: Return better qc_issue errors
  [SCSI] ipr: Disrupt device error
  [SCSI] ipr: Improve async error logging level control
  [SCSI] ipr: PCI unblock config access fix
  [SCSI] ipr: Fix for oops following SATA request sense
  [SCSI] ipr: Log error for SAS dual path switch
  [SCSI] ipr: Enable logging of debug error data for all devices
  [SCSI] ipr: Add new PCI-E IDs to device table
  ...
2007-05-05 13:30:44 -07:00
Jean Delvare 6473d160b4 PCI: Cleanup the includes of <linux/pci.h>
I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurrence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
  [PATCH] scatterlist.h needs types.h
  http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Michael S. Tsirkin 347fcfbed2 IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
If skb allocation fails when we start the device, we call
ipoib_cm_dev_stop() even though ipoib_cm_dev_open() did not run to
completion, so we pass an invalid pointer to ib_destroy_cm_id and get
an oops.

Fix by clearing cm.id on error, and testing it in cm_dev_stop().
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=561>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Linus Torvalds afc2e82c08 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (49 commits)
  IB: Set class_dev->dev in core for nice device symlink
  IB/ehca: Implement modify_port
  IB/umad: Clarify documentation of transaction ID
  IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements
  IB/mad: Change SMI to use enums rather than magic return codes
  IB/umad: Implement GRH handling for sent/received MADs
  IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr
  IB/sa: Set src_path_bits correctly in ib_init_ah_from_path()
  IB/ucm: Simplify ib_ucm_event()
  RDMA/ucma: Simplify ucma_get_event()
  IB/mthca: Simplify CQ cleaning in mthca_free_qp()
  IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory
  IB/mthca: Update HCA firmware revisions
  IB/ipath: Fix WC format drift between user and kernel space
  IB/ipath: Check that a UD work request's address handle is valid
  IB/ipath: Remove duplicate stuff from ipath_verbs.h
  IB/ipath: Check reserved memory keys
  IB/ipath: Fix unit selection when all CPU affinity bits set
  IB/ipath: Don't allow QPs 0 and 1 to be opened multiple times
  IB/ipath: Disable IB link earlier in shutdown sequence
  ...
2007-04-27 09:39:27 -07:00
Arnaldo Carvalho de Melo 459a98ed88 [SK_BUFF]: Introduce skb_reset_mac_header(skb)
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into an offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:32 -07:00
Roland Dreier 37aebbde70 IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements
There are quite a few places in ipoib_cm.c where we know IRQs are
enabled because we do something that sleeps in the same function, so
we can convert several occurrences of spin_lock_irqsave() to a plain
spin_lock_irq().  This cleans up the source a little and makes the
code smaller too:

add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48)
function                                     old     new   delta
ipoib_cm_tx_reap                             403     406      +3
ipoib_cm_stale_task                          146     145      -1
ipoib_cm_dev_stop                            173     172      -1
ipoib_cm_tx_handler                          964     956      -8
ipoib_cm_rx_handler                          956     937     -19
ipoib_cm_skb_reap                            212     190     -22

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-24 21:30:37 -07:00
Sean Hefty 46f1b3d7af IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr
To support destinations that are not on the local IB subnet, IPoIB
should include the GRH information when constructing an address
handle.  Using the existing ib_init_ah_from_path() call will do this
for us.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
2007-04-24 16:31:12 -07:00
Roland Dreier a89875fc7e IPoIB: Remove pointless opcode field from debugging output
There's no point in printing the opcode field in the completion
handling debugging output, since the type of completion is already
printed at the beginning of the line.  In fact the opcode field is not
even defined for completions with a status other than success.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:20:53 -07:00
Michael S. Tsirkin 6371ea3d48 IPoIB/cm: Fix DMA direction typo
Receive buffers need to be mapped with DMA_FROM_DEVICE.  Incorrectly
mapping with DMA_TO_DEVICE causes a hard lock on ppc64 machines with
an IOMMU.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=431>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-10 08:58:30 -07:00
Erez Zilber 1d426d6418 IB/iser: Don't defer connection failure notification to workqueue
When a connection is terminated asynchronously from the iSCSI layer's
perspective, iSER needs to notify the iSCSI layer that the connection
has failed.  This is done using a workqueue (switched to from the iSER
tasklet context).  Meanwhile, the connection object (that holds the
work struct) is released.  If the workqueue function wasn't called
yet, it will be called later with a NULL pointer, which will crash the
kernel.

The context switch (tasklet to workqueue) is not required, and
everything can be done from the iSER tasklet. This eliminates the NULL
work struct bug (and simplifies the code).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-05 09:46:04 -07:00
Linus Torvalds a26b5fce06 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/iser: Handle aborting a command after it is sent
  IB/mthca: Fix thinko in init_mr_table()
  RDMA/cxgb3: Fix resource leak in cxio_hal_init_ctrl_qp()
2007-03-28 14:00:01 -07:00
Erez Zilber 3104a2175d IB/iser: Handle aborting a command after it is sent
The SCSI midlayer may abort a command that was already sent.  If the
initiator is still trying to send the command (or data-out PDUs for
that command), the QP may time out after the midlayer times
out. Therefore, when aborting the command, iSER may still have
references for the command's buffers.  When sending these PDUs, the
sends will complete with an error and their resources will be released
then.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-26 16:35:09 -07:00
Alexey Kuznetsov ecbb416939 [NET]: Fix neighbour destructor handling.
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.

The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get their private part initialized, otherwise nobody will
clean it up.

I think this is enough for ipoib which is the only user of this thing.
Initialization of the private part of neighbor entries happens in the ipoib
start_xmit routine, which is not reached when the device is down.  But it
would be better to add explicit test for neigh->dead in any case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:01 -07:00
Michael S. Tsirkin 77d8e1efea IB/ipoib: Fix thinko in packet length checks
The packet length checks in ipoib are broken: we add 4 bytes (IPoIB
encapsulation header) when sending a packet, not 20 bytes (hardware
address length) to each packet.  Therefore, if connected mode is
enabled so that the interface MTU is larger than the multicast MTU,
IPoIB may end up trying to send too-long multicast packets.  For
example, multicast is broken if a message of size 2048 bytes is sent
on an interface with UD MTU 2048, because 2048 is bigger than the real
limit of 2044 but the code tests against the wrong limit of 2060.
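
The arithmetic in this example can be checked with a small userspace
sketch.  The constants are the ones quoted in the message; the helper
fits_ud_mtu() is hypothetical and is not the driver's actual check.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
	ENCAP_LEN   = 4,    /* IPoIB encapsulation header, per the message */
	UD_MTU      = 2048, /* UD MTU from the example */
	WRONG_LIMIT = 2060, /* limit the old check tested against, as quoted */
};

/* Hypothetical helper: true if a message of len bytes still fits in one
 * UD packet once the encapsulation header has been added. */
static bool fits_ud_mtu(size_t len)
{
	return len <= (size_t)(UD_MTU - ENCAP_LEN);
}
```

A 2048-byte message fails fits_ud_mtu() because the real limit is 2044,
yet it is below WRONG_LIMIT, which is why the old test let it through.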

This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>,
submitted by Scott Weitzenkamp <sweitzen@cisco.com>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Michael S. Tsirkin d04d01b113 IPoIB: Fix use-after-free in path_rec_completion()
The connected mode code added the possibility that a neigh struct
gets freed in the list_for_each_entry() loop in path_rec_completion(),
which causes a use-after-free.  Fix this by changing to the _safe
variant of the list walking macro.
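
The idea behind the _safe variant can be sketched in userspace C: the
successor pointer is read before the current entry may be freed.  This
is only an illustration on a plain singly linked list; struct node,
push() and reap_stale() are hypothetical names, not kernel code.

```c
#include <stdlib.h>

struct node {
	int stale;
	struct node *next;
};

/* Prepend a node; returns the new head (or the old head on failure). */
static struct node *push(struct node *head, int stale)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->stale = stale;
	n->next = head;
	return n;
}

/* Unlink and free every stale node; returns the number freed. */
static int reap_stale(struct node **head)
{
	struct node **link = head;
	int freed = 0;

	while (*link) {
		struct node *cur = *link;
		struct node *next = cur->next; /* saved before cur may be freed */

		if (cur->stale) {
			*link = next;
			free(cur);
			freed++;
		} else {
			link = &cur->next;
		}
	}
	return freed;
}
```

Reading cur->next after free(cur) would be the same use-after-free that
the plain list walking macro runs into.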

This was spotted by the Coverity checker (CID 1567).

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:40:16 -07:00
Sean Hefty e07832b662 IPoIB: Fix race in detaching from mcast group before attaching
There's a race between ipoib_mcast_leave() and ipoib_mcast_join_finish()
where we can try to detach from a multicast group before we've
attached to it.  Fix this by reordering the code in ipoib_mcast_leave
to free the multicast group first, which waits for the multicast
callback thread (which calls ipoib_mcast_join_finish()) to complete
before detaching from the group.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:32:09 -07:00
Michael S. Tsirkin 60a596dab7 IPoIB/cm: Fix reaping of stale connections
The sense of the time_after_eq() test in ipoib_cm_stale_task() is
reversed so that only non-stale connections are reaped.  Fix this by
changing to time_before_eq().
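
A userspace sketch of the wraparound-safe comparisons involved.  The
kernel macros are re-implemented here as functions, and conn_is_fresh()
with its parameters is a hypothetical helper, not the driver's code.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's jiffies comparisons: the subtraction is
 * done in unsigned arithmetic and then interpreted as signed, so the
 * result stays correct when the counter wraps. */
static bool time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

static bool time_before_eq(unsigned long a, unsigned long b)
{
	return time_after_eq(b, a);
}

/* Hypothetical: a connection is still fresh while now has not passed
 * last_used + timeout; anything else is a candidate for reaping. */
static bool conn_is_fresh(unsigned long now, unsigned long last_used,
			  unsigned long timeout)
{
	return time_before_eq(now, last_used + timeout);
}
```

Swapping time_before_eq() for time_after_eq() in conn_is_fresh() inverts
the result, which is the kind of reversed test described above.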

Noticed by Pradeep Satyanarayana <pradeep@us.ibm.com>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-22 14:32:09 -07:00
Mike Christie bf32ed33e9 [SCSI] iscsi: rename DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH
This patch renames DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH to avoid
confusion with the drivers' default values (DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH
is the iscsi RFC specific default).

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2007-03-11 11:26:50 -05:00
Shirley Ma 55c9adde13 IPoIB: Turn on interface's carrier after broadcast group is joined
Do netif_carrier_on() right after the IPv4 broadcast multicast group
is joined, rather than waiting for all of the initial set of multicast
group joins to finish.  This allows at least IPv4 traffic to limp
along on broken fabrics where not all multicast groups can be joined.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-03-08 14:59:30 -08:00
Roland Dreier a27cbe8782 IPoIB: Only handle async events for one port
An asynchronous event carries the port number that the event occurred
on, so there's no reason for an IPoIB interface to process an event
associated with a different local HCA port.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-27 07:37:49 -08:00
Roland Dreier 843613b047 IPoIB: Correct debugging output when path record lookup fails
If path_rec_completion() is passed a non-NULL path record pointer
along with an unsuccessful status value, the tracing code incorrectly
prints the (invalid) DLID from the path record rather than the more
interesting status code.  The actual logic of the function correctly
uses the path record only if the status indicates a successful lookup.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-26 12:57:08 -08:00
Roland Dreier 658bcef619 IPoIB: Remove unused local_rate tracking
Now that low-level drivers handle the conversion from an absolute rate
to a relative rate, there's no need for the IPoIB driver to keep track
of the local port's data rate.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-21 20:28:05 -08:00
Michael S. Tsirkin 1812063ba3 IPoIB/cm: Improve small message bandwidth
Avoid the overhead of freeing/reallocating and mapping/unmapping for
DMA pages that have not been written to by hardware.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-20 20:16:14 -08:00
Sean Hefty faec2f7b96 IB/sa: Track multicast join/leave requests
The IB SA tracks multicast join/leave requests on a per port basis and
does not do any reference counting: if two users of the same port join
the same group, and one leaves that group, then the SA will remove the
port from the group even though one user still wants to remain a
member.  Therefore, in order to support multiple users of the
same multicast group from the same port, we need to perform reference
counting locally.

To do this, add a multicast submodule to ib_sa to perform reference
counting of multicast join/leave operations.  Modify ib_ipoib (the
only in-kernel user of multicast) to use the new interface.
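
A minimal sketch of local reference counting, assuming one counter per
(port, group) pair.  struct mcast_group, group_join() and group_leave()
are hypothetical names and do not reflect the ib_sa interface.

```c
#include <stdbool.h>

struct mcast_group {
	int users; /* local users of this group on this port */
};

/* Returns true only for the first user, i.e. when a join request
 * actually has to be sent to the SA. */
static bool group_join(struct mcast_group *g)
{
	return g->users++ == 0;
}

/* Returns true only for the last user, i.e. when a leave request
 * actually has to be sent to the SA. */
static bool group_leave(struct mcast_group *g)
{
	return --g->users == 0;
}
```

With two users on the same port, the first leave returns false, so the
port stays in the group until the second user leaves as well.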

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 14:20:02 -08:00
Michael S. Tsirkin 8a2e65f87c IPoIB: CM error handling thinko fix
ipoib_cm_alloc_rx_skb() might be called from IRQ context, so it must
use dev_kfree_skb_any(), not kfree_skb().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:35 -08:00
Roland Dreier 551fd6122d IPoIB: Only allow root to change between datagram and connected mode
Change the permissions of the "mode" sysfs attribute to be S_IWUSR
instead of S_IWUGO.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16 13:57:33 -08:00
Linus Torvalds 93bbad8fe1 Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mthca: Always fill MTTs from CPU
  IB/mthca: Merge MR and FMR space on 64-bit systems
  IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
  IB/mthca: Give reserved MTTs a separate cache line
  IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
  RDMA/cxgb3: Add driver for Chelsio T3 RNIC
  IB: Remove redundant "_wq" from workqueue names
  RDMA/cma: Increment port number after close to avoid re-use
  IB/ehca: Fix memleak on module unloading
  IB/mthca: Work around gcc bug on sparc64
  IPoIB: Connected mode experimental support
  IB/core: Use ARRAY_SIZE macro for mandatory_table
  IB/mthca: Use correct structure size in call to memset()
2007-02-13 21:16:39 -08:00
Arjan van de Ven 2b8693c061 [PATCH] mark struct file_operations const 3
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Michael S. Tsirkin 839fcaba35 IPoIB: Connected mode experimental support
The following patch adds experimental support for IPoIB connected
mode, as defined by the draft from the IETF ipoib working group.  The
idea is to increase performance by increasing the MTU from the maximum
of 2K (theoretically 4K) supported by IPoIB on top of UD.  With this
code, I'm able to get 800MByte/sec or more with netperf without
options on a Mellanox 4x back-to-back DDR system.

Some notes on code:
1. SRQ is used for scalability to large cluster sizes
2. Only RC connections are used (UC does not support SRQ now)
3. Retry count is set to 0 since spec draft warns against retries
4. Each connection is used for data transfers in only 1 direction, so
   each connection is either active (TX) or passive (RX).  2 sides that
   want to communicate create 2 connections.
5. Each active (TX) connection has a separate CQ for send completions -
   this keeps the code simple without CQ resize and other tricks
6. To detect stale passive side connections (where the remote side is
   down), we keep an LRU list of passive connections (updated once per
   second per connection) and destroy a connection after it has been
   unused for several seconds. The LRU rule makes it possible to avoid
   scanning connections that have recently been active.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-10 08:00:48 -08:00
Al Viro b437735645 [PATCH] iscsi endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:14:07 -08:00
Greg Kroah-Hartman 43cb76d91e Network: convert network devices to use struct device instead of class_device
This lets the network core have the ability to handle suspend/resume
issues, if it wants to.

Thanks to Frederik Deweerdt <frederik.deweerdt@gmail.com> for the arm
driver fixes.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-02-07 10:37:11 -08:00
Ishai Rabinovitz 1033ff670d IB/srp: Don't wait for response when QP is in error state.
When there is a call to send_tsk_mgmt SRP posts a send and waits for 5
seconds to get a response.

When the QP is in the error state it is obvious that there will be no
response so it is quite useless to wait.  In fact, the timeout causes
SRP to wait a long time to reconnect when a QP error occurs. (Each
abort and each reset_device calls send_tsk_mgmt, which waits for the
timeout).  The following patch solves this problem by identifying the
failure and returning an immediate error code.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-04 14:11:56 -08:00
Ishai Rabinovitz a20f3a6d7e IB/srp: Check match_strdup() return
Check whether the kmalloc in match_strdup() was successful, and bail out
on looking at the token if it failed.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-22 17:03:54 -08:00
Erez Zilber f0938401f2 IB/iser: Return error code when PDUs may not be sent
iSER limits the number of outstanding PDUs to send. When this threshold
is reached, it should return an error code (-ENOBUFS) instead of setting
the suspend_tx bit (which should be used only by libiscsi).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-01-07 10:15:15 -08:00
Roland Dreier bf628dc22a IB/srp: Fix FMR mapping for 32-bit kernels and addresses above 4G
struct srp_device.fmr_page_mask was unsigned long, which means that
the top part of addresses above 4G was being chopped off on 32-bit
architectures.  Of course nothing good happens when data from SRP
targets is DMAed to the wrong place.

Fix this by changing fmr_page_mask to u64, to match the addresses
actually used by IB devices.
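
The truncation can be reproduced in userspace by modelling the 32-bit
unsigned long with uint32_t.  The 4096-byte page size and both function
names are assumptions made for illustration only.

```c
#include <stdint.h>

#define FMR_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Models the bug: a 32-bit mask is zero-extended before the AND, so
 * address bits 32 and above are cleared. */
static uint64_t mask_with_u32(uint64_t addr)
{
	uint32_t mask = ~(FMR_PAGE_SIZE - 1);

	return addr & mask;
}

/* Models the fix: a 64-bit mask keeps the upper address bits. */
static uint64_t mask_with_u64(uint64_t addr)
{
	uint64_t mask = ~(uint64_t)(FMR_PAGE_SIZE - 1);

	return addr & mask;
}
```

For the address 0x123456789, the 32-bit mask yields 0x23456000 while the
64-bit mask yields 0x123456000.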

Thanks to Brian Cain <Brian.Cain@ge.com> and David McMillen
<davem@systemfabricworks.com> for help diagnosing the bug and testing
the fix.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-15 14:01:49 -08:00
Roland Dreier 82b399133b IPoIB: Make sure struct ipoib_neigh.queue is always initialized
Move the initialization of ipoib_neigh's skb_queue into
ipoib_neigh_alloc(), since commit 2745b5b7 ("IPoIB: Fix skb leak when
freeing neighbour") makes ipoib_neigh_free() iterate over the skb_queue
to free any packets left over when freeing the ipoib_neigh structure.

This fixes a crash when freeing ipoib_neigh structures allocated in
ipoib_mcast_send(), which otherwise don't have their skb_queue
initialized.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:48:18 -08:00
Ralph Campbell 5180311fe9 IB/iser: Use the new verbs DMA mapping functions
Convert iSER to use the new verbs DMA mapping functions for kernel
verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:31:00 -08:00
Ralph Campbell 85507bcce0 IB/srp: Use new verbs IB DMA mapping functions
Convert SRP to use the new verbs DMA mapping functions for kernel
verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:30:55 -08:00
Ralph Campbell 37ccf9df97 IPoIB: Use the new verbs DMA mapping functions
Convert IPoIB to use the new DMA mapping functions
for kernel verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 14:30:48 -08:00
Roland Dreier dee234f48a IB/iser: Remove unused "write-only" variables
Remove variables that are set but then never looked at in the iSER
initiator.  These cleanups came from David Binderman's list of "set
but never used" warnings from icc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12 11:50:20 -08:00
David Howells f0d1b0b30d [PATCH] LOG2: Implement a general integer log2 facility in the kernel
This facility provides three entry points:

	ilog2()		Log base 2 of unsigned long
	ilog2_u32()	Log base 2 of u32
	ilog2_u64()	Log base 2 of u64

These facilities can either be used inside functions on dynamic data:

	int do_something(long q)
	{
		...;
		y = ilog2(q);
		...;
	}

Or can be used to statically initialise global variables with constant values:

	unsigned n = ilog2(27);

When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant.  They treat negative numbers as
unsigned.

When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.
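
For readers unfamiliar with the operation, a portable userspace sketch
of an integer log2 follows.  It is not the kernel implementation, which
uses fls() or arch instructions as described above; ilog2_sketch() is a
hypothetical name, and returning -1 for zero is an assumption of this
sketch.

```c
/* Position of the highest set bit, i.e. floor(log2(n)); -1 for n == 0. */
static int ilog2_sketch(unsigned long n)
{
	int log = -1;

	while (n) {
		n >>= 1;
		log++;
	}
	return log;
}
```

With this sketch, ilog2_sketch(27) evaluates to 4, because 16 <= 27 < 32.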

[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:51 -08:00
David Howells 9db7372445 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/ata/libata-scsi.c
	include/linux/libata.h

Further merge of Linus's head and compilation fixups.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 17:01:28 +00:00
David Howells 4c1ac1b491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	drivers/infiniband/core/iwcm.c
	drivers/net/chelsio/cxgb2.c
	drivers/net/wireless/bcm43xx/bcm43xx_main.c
	drivers/net/wireless/prism54/islpci_eth.c
	drivers/usb/core/hub.h
	drivers/usb/input/hid-core.c
	net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-12-05 14:37:56 +00:00
Al Viro a1f8e7f7fb [PATCH] severing skbuff.h -> highmem.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-12-04 02:00:29 -05:00
Michael S. Tsirkin 2745b5b713 IPoIB: Fix skb leak when freeing neighbour
ipoib_neigh_free() is sometimes called while neighbour is still alive,
so it might still have queued skbs.  Fix skb leak in this case.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Vu Pham d2fcea7d68 IB/srp: Fix memory leak on reconnect
SRP reallocates the IU buffers for tx_ring and rx_ring without freeing
the old buffers when it reconnects to a target.  Fix this by keeping
the old IU buffers around.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:09 -08:00
Roland Dreier e54f81889c IB: Convert kmem_cache_t -> struct kmem_cache
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:07 -08:00
Arne Redlich 3c8edf0eca IB/srp: Increase supported CDB size
Set the Scsi_Host's max_cmd_len from 12 (default) to 16 for
SRP. Otherwise scsi_dispatch_cmd() won't pass down certain commands
such as READ CAPACITY 16, required for supporting disks > 2TB.

Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-29 15:33:06 -08:00
David Howells c4028958b6 WorkStruct: make allyesconfig
Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:57:56 +00:00
Michael S. Tsirkin 073ae841d6 IPoIB: Clear high octet in QP number
IPoIB assumes that high (reserved) octet in the hardware address is 0,
and copies it into the QPN.  This violates RFC 4391 (which requires
that the high 8 bits are ignored on receive), and will result in an
invalid QPN being used when interoperating with IPoIB connected mode.
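
A sketch of the masking, assuming the first four bytes of the hardware
address hold the reserved octet followed by the 24-bit QPN in network
byte order.  qpn_from_hwaddr() is a hypothetical helper, not the
driver's macro.

```c
#include <stdint.h>

/* Extract the 24-bit QPN, ignoring the reserved high octet. */
static uint32_t qpn_from_hwaddr(const uint8_t *ha)
{
	uint32_t word = ((uint32_t)ha[0] << 24) |
			((uint32_t)ha[1] << 16) |
			((uint32_t)ha[2] << 8)  |
			 (uint32_t)ha[3];

	return word & 0x00ffffffu;
}
```

If the reserved octet is non-zero, for example 0x80, the mask keeps it
out of the resulting QPN.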

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-11-16 13:56:45 -08:00
Erez Zilber 2e7a742628 IB/iser: Start connection after enabling iSER
When a connection is started (a new connection or a recovered one),
iSER should prepare its resources for full-featured mode and only then
notify the iSCSI layer that it is ready to start queueing commands.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-30 20:52:50 -08:00
Roland Dreier 73fbe8be73 IPoIB: Check for DMA mapping error for TX packets
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:50:38 -07:00
Ishai Rabinovitz 01cb9bcbd3 IB/srp: Enable multiple connections to the same target
Enable multiple concurrent connections to the same SRP target:

1) Use port GUID instead of node GUID in the initiator port
   identifier.  This allows connections to be made from multiple HCA
   ports at the same time.
2) Let the user specify the identifier extension when adding the
   device.  This allows userspace to make multiple connections even
   from the same port, if it wants to.

Without this, only one connection can be made from any given HCA, even
if it has multiple ports, because we don't use multi-channel mode, so
targets will only allow one connection from a given initiator port ID.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 12:49:05 -07:00
Ishai Rabinovitz 9b0af401aa IB/srp: Remove redundant memset()
scsi_host_alloc() already allocates with kzalloc(), so the struct Scsi_Host
is zeroed out, including the private data portion.  Remove the redundant
memset that zeros this out again in the SRP initiator.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-10-10 09:51:14 -07:00
Matt LaPlante cab00891c5 Still more typo fixes
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:36:44 +02:00
Erez Zilber fd6a79a786 IB/iser: Fix the description of iSER in Kconfig
Fix the description of iSER in Kconfig.  It is not accurate.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-28 10:54:51 -07:00
Erez Zilber 74a2078061 IB/iser: DMA unmap unaligned for RDMA data before touching it
iSER uses the DMA mapping api to map the page holding the
SCSI command data to the HCA DMA address space. When the
command data is not aligned for RDMA, the data is copied
to/from an allocated buffer which in turn is used for
executing this command. The pages associated with the
command must be unmapped before being touched.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-28 10:53:18 -07:00
Erez Zilber 87e8df7a27 IB/iser: Have iSER data transaction object point to iSER conn
iSER uses a data transaction object (struct iser_dto) as part
of its IB data descriptors (struct iser_desc) management.
It also uses a hierarchy of connection structures pointing to
each other. A DTO may exist even after the iscsi_iser connection
pointed to by it is destroyed (e.g. one that is bound to a post
receive buffer which was flushed by the IB HW). Hence DTOs need
to point to the lowest connection, which is struct iser_conn.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-28 10:53:16 -07:00
Theodore Ts'o 8e18e2941c [PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_private
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86.  (It would be more on an x86_64 system).  This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).

This patch:

The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer.  Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer.  This is just a
cleanup, but it allows us to reuse the union 'u' for something where
the union will actually be used.

[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 08:26:17 -07:00
James Bottomley c9802cd957 Merge mulgrave-w:git/scsi-misc-2.6
Conflicts:

	drivers/scsi/iscsi_tcp.c
	drivers/scsi/iscsi_tcp.h

Pretty horrible merge between crypto hash consolidation
and crypto_digest_...->crypto_hash_... conversion

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-23 15:33:43 -05:00
Eli Cohen a8bfca0243 IPoIB: Add some likely/unlikely annotations in hot path
Signed-off-by: Eli Cohen <eli@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:58 -07:00
Dotan Barak 507c335046 IPoIB: Remove unused include of vmalloc.h
IPoIB doesn't use anything from <linux/vmalloc.h>, so don't include it.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:57 -07:00
Eli Cohen 5ccd025553 IPoIB: Rejoin all multicast groups after a port event
When ipoib_ib_dev_flush() is called because of a port event, the
driver needs to rejoin all multicast groups, since the flush will call
ipoib_mcast_dev_flush() (via ipoib_ib_dev_down()).  Otherwise no
(non-broadcast) multicast groups will be rejoined until the networking
core calls ->set_multicast_list again, and so multicast reception will
be broken for potentially a long time.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:56 -07:00
Roland Dreier d0df6d6d45 IPoIB: Create MCGs with all attributes required by RFC
RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says:

  If the IB multicast group does not already exist, one must be
  created first with the IPoIB link MTU.  The MGID MUST use the same
  P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
  broadcast-GID.  The rest of attributes SHOULD follow the values used
  in the broadcast-GID as well.

However, the current IPoIB driver is only setting the attributes
required by the InfiniBand spec to create a multicast group, so in
particular the MTU and HopLimit are not being set.  Add these
attributes when creating MCGs, and also set the Rate attribute, since
IPoIB pays attention to that attribute as well.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:56 -07:00
Roland Dreier 5755d6dad9 IB/iser: INFINIBAND_ISER depends on INET
iSER won't build without CONFIG_INET enabled, so make Kconfig reflect that.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:55 -07:00
Michael S. Tsirkin c1a0b23bf4 IB/sa: Require SA registration
Require users to register with SA module, to prevent the sa_query
module text from going away while an SA query callback is still
running.  Update all in-tree users for the new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:53 -07:00
Roland Dreier 2439a6e65f IPoIB: Refactor completion handling
Split up ipoib_ib_handle_wc() into ipoib_ib_handle_rx_wc() and
ipoib_ib_handle_tx_wc() to make the code easier to read.  This will
also help implement NAPI in the future.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:52 -07:00
Erez Zilber d81110285f IB/iser: Do not use FMR for a single dma entry sg
Fast Memory Registration (fmr) is used to register for rdma an sg whose
elements are not linearly sequential after dma mapping.

The IB verbs layer provides an "all dma memory MR (memory region)" which
can be used for RDMA-ing a dma linearly sequential buffer.

Change the code to use the dma mr instead of doing fmr when dma mapping
produces a single dma entry sg.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:52 -07:00
Erez Zilber e981f1d4b8 IB/iser: fix some debug prints
fix and add some debug prints related to iser
handling of memory for rdma.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:51 -07:00
Erez Zilber 8dfa0876d3 IB/iser: make FMR "page size" be 4K and not PAGE_SIZE
As iser is able to use at most one rdma operation for the
execution of a scsi command, and registration of the sg
associated with scsi command has its restrictions, the code
checks if an sg is "aligned for rdma".

Alignment for rdma is measured in "fmr page" units whose
possible resolutions are different between HCAs and can be
smaller than, equal to, or bigger than the system page size.

When the system page size is bigger than 4KB (e.g. the default
with ia64 kernels) there is a bigger chance that an sg would be
aligned for rdma if the fmr page size is 4KB.

Change the code to create FMR whose pages are of size 4KB
and to take that into account when processing the sg.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:51 -07:00
Erez Zilber 8072ec2f8f IB/iser: Limit the max size of a scsi command
Currently, the data length of a command coming down from scsi-ml
is limited only by the size of its sg list (sg_tablesize). The
max data length may be different for different page size values.
By setting max_sectors, we limit the data length to
max_sectors*512 bytes.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:50 -07:00
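
The arithmetic behind the max_sectors change above can be sketched as
follows.  This is an illustration only, not iSER code: the helper names
are hypothetical, and the sg-list bound assumes each sg entry covers at
most one page, which the commit message does not state.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Bound implied by the sg list alone, ASSUMING one page per sg entry. */
uint64_t sg_limit_bytes(uint32_t sg_tablesize, uint32_t page_size)
{
    assert(page_size != 0);
    return (uint64_t)sg_tablesize * page_size;
}

/* Bound implied by max_sectors: max_sectors * 512 bytes. */
uint64_t max_sectors_limit_bytes(uint32_t max_sectors)
{
    return (uint64_t)max_sectors * SECTOR_SIZE;
}

/* With both limits in force, the smaller one wins. */
uint64_t effective_limit_bytes(uint32_t sg_tablesize, uint32_t page_size,
                               uint32_t max_sectors)
{
    uint64_t by_sg = sg_limit_bytes(sg_tablesize, page_size);
    uint64_t by_sectors = max_sectors_limit_bytes(max_sectors);

    return by_sg < by_sectors ? by_sg : by_sectors;
}
```

Under that assumption the sg-only bound grows with the page size, while
the max_sectors bound is the same on every architecture.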
Erez Zilber 777a71dd4d IB/iser: fix a check of SG alignment for RDMA
dma mapping may include a "compaction" of the sg associated with scsi command.
Hence, the size of the maximal prefix of the SG which is aligned for rdma must be
compared against the length of the dma mapped sg (mem->dma_nents) and not against
the size of it before it was mapped (mem->size).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:49 -07:00
Tom Tucker 07ebafbaaa RDMA: iWARP Core Changes.
Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
 - Hook iWARP CM into the build system and use it in rdma_cm.
 - Convert enum ib_node_type to enum rdma_node_type, which includes
   the possibility of RDMA_NODE_RNIC, and update everything for this.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:47 -07:00
Roland Dreier 3cd965646b IB: Whitespace fixes
Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all.  Also fix a few other whitespace
bogosities.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:46 -07:00
Ishai Rabinovitz ded7f1a16d IB/srp: Add port/device attributes
Add local_ib_device and local_ib_port attributes to srp scsi_host.
These are needed when we want to connect to the same target through
multiple distinct ports.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:17:21 -07:00
Michael S. Tsirkin 9217b27b12 IB/ipoib: Fix flush/start xmit race (from code review)
Prevent flush task from freeing the ipoib_neigh pointer, while
ipoib_start_xmit() is accessing the ipoib_neigh through the pointer it
has loaded from the skb's hardware address.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:17:18 -07:00
Eli Cohen c11bd42a76 IPoIB: Retry failed send-only multicast group joins
When a send-only multicast group join fails, mcast->query must be set
to NULL.  Otherwise, IPoIB will never retry the join and the multicast
group will never be reachable.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-14 13:51:41 -07:00
Ishai Rabinovitz add7afc756 IB/srp: Don't schedule reconnect from srp
If there is a problem in the connection, the SCSI mid-layer will
eventually call srp_reset_host(), which will call srp_reconnect(), so
we do not need to schedule a call to srp_reconnect_work() from
srp_completion().

Removing this prevents srp_reset_host() from failing if a reconnect
scheduled from srp_completion() is already in progress, which in turn
was causing crashes as both SCSI midlayer and srp_reconnect() were
cancelling commands.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-14 13:51:40 -07:00
Mike Christie ffd0436ed2 [SCSI] libiscsi, iscsi_tcp, iscsi_iser: check that burst lengths are valid.
iSCSI RFC states that the first burst length must be smaller than the
max burst length. We currently assume targets will be good, but that may
not be the case, so this patch adds a check.

This patch also moves the unsol data out offset to the lib so the LLDs
do not have to track it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02 13:37:04 -05:00
James Bottomley 00dd7b7d26 Merge ../linux-2.6
Conflicts:

	arch/ia64/hp/sim/simscsi.c

Stylistic differences in two separate fixes for buffer->request_buffer
problem.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-06 12:42:33 -05:00
Or Gerlitz 8ddc7c5326 IB/ipoib: Remove broken link from Kconfig and documentation
Remove references to the IPoIB IETF working group as it has been closed.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 10:48:31 -07:00
Ishai Rabinovitz 559ce8f150 IB/srp: Work around data corruption bug on Mellanox targets
Data corruption has been seen with Mellanox SRP targets when FMRs
create a memory region with I/O virtual address != 0.  Add a
workaround that disables FMR merging for Mellanox targets (OUI 0002c9).

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 10:35:43 -07:00
Ishai Rabinovitz d916a8f1b4 IB/srp: Fix crash in srp_reconnect_target
Protect against srp_reset_device() clearing the req_queue while
srp_reconnect_target() is in progress (note that state change at
the top of srp_reconnect_target() is not sufficient for this since
srp_reset_device() ignores the state).

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-08-03 09:44:22 -07:00
Mike Christie 1c83469d36 [SCSI] iscsi bugfixes: fix oops when iser is flushing io
When we enter recovery and flush the running commands
we cannot free the connection before flushing the commands.
Some commands may have a reference to the connection
that needs to be released first. iscsi_stop was forcing
the term and suspend too early and was causing an oops
in iser, so this patch removes those callbacks altogether
and allows the LLD to handle that detail.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-07-28 11:48:32 -05:00
Michael S. Tsirkin 8a7f752125 IB/ipoib: Fix packet loss after hardware address update
The neighbour ha field may get updated without destroying the
neighbour.  In this case, the ha field gets out of sync with the
address handle stored in ipoib_neigh->ah, with the result that
the ah field would point to an incorrect path, resulting in all
packets being lost.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:18:07 -07:00
Or Gerlitz 624d01f899 IB/ipoib: Fix oops with ipoib_debug_mcast set
Need to set mcast->ah before debug code dereferences it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-07-24 09:18:07 -07:00
Michael S. Tsirkin adfaa888a2 [PATCH] fmr pool: remove unnecessary pointer dereference
ib_fmr_pool_map_phys gets the virtual address by pointer but never writes
there, and users (e.g.  srp) seem to assume this and ignore the value
returned.  This patch cleans up the API to get the VA by value, and updates
all users.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:51 -07:00
Vu Pham 6583eb3dcc [PATCH] srp: fix fmr error handling
srp_unmap_data assumes req->fmr is NULL if the request is not mapped, so we
must clean it out in case of an error.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14 21:53:50 -07:00
James Bottomley c4e00fac42 Merge ../scsi-misc-2.6
Conflicts:

	drivers/scsi/nsp32.c
	drivers/scsi/pcmcia/nsp_cs.c

Removal of randomness flag conflicts with SA_ -> IRQF_ global
replacement.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-07-03 09:41:12 -05:00
Linus Torvalds 22a3e233ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  Remove obsolete #include <linux/config.h>
  remove obsolete swsusp_encrypt
  arch/arm26/Kconfig typos
  Documentation/IPMI typos
  Kconfig: Typos in net/sched/Kconfig
  v9fs: do not include linux/version.h
  Documentation/DocBook/mtdnand.tmpl: typo fixes
  typo fixes: specfic -> specific
  typo fixes in Documentation/networking/pktgen.txt
  typo fixes: occuring -> occurring
  typo fixes: infomation -> information
  typo fixes: disadvantadge -> disadvantage
  typo fixes: aquire -> acquire
  typo fixes: mecanism -> mechanism
  typo fixes: bandwith -> bandwidth
  fix a typo in the RTC_CLASS help text
  smb is no longer maintained

Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30 15:39:30 -07:00
Andrew Morton cfa7b0d469 [PATCH] infiniband: devfs fix
Remove devfs leftovers.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 11:25:41 -07:00
Jörn Engel 6ab3d5624e Remove obsolete #include <linux/config.h>
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-30 19:25:36 +02:00
Mike Christie 358ff019b8 [SCSI] iscsi: convert iser to new set/get param fns
Convert iser to libiscsi get/set param functions.
Fix bugs in it returning old error return values and
have it expose exp_statsn.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-06-29 11:07:41 -04:00
Akinobu Mita 179e09172a [PATCH] drivers: use list_move()
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under drivers/.

Acked-by: Corey Minyard <minyard@mvista.com>
Cc: Ben Collins <bcollins@debian.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Alasdair Kergon <dm-devel@redhat.com>
Cc: Gerd Knorr <kraxel@bytesex.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Pavlic <fpavlic@de.ibm.com>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Cc: Andrew Vasquez <linux-driver@qlogic.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:18 -07:00
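
The list_move() conversion above can be illustrated with a small
user-space stand-in for the kernel list API.  This is a sketch written
for this note, not a copy of <linux/list.h>; it only shows that
list_move(A, B) is the combination of list_del(A) and list_add(A, B).

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's. */
struct list_head {
    struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry from whatever list it is on. */
void list_del(struct list_head *entry)
{
    assert(entry->next != NULL && entry->prev != NULL);
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

/* One call instead of the list_del() + list_add() pair. */
void list_move(struct list_head *entry, struct list_head *head)
{
    list_del(entry);
    list_add(entry, head);
}
```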
Or Gerlitz 3f1244a2f8 IB/iser: iSER Kconfig and Makefile
Kconfig and Makefile for iSER.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:14 -07:00
Or Gerlitz 6461f64ab5 IB/iser: iSER handling of memory for RDMA
This file contains the processing carried over an SG list associated with
a SCSI command such that it can be registered with the IB verbs. The
registration produces a network virtual address (VA) and a remote access
key (RKEY or STAG in iSER spec notation) which are used by the target for
its RDMA operation.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:12 -07:00
Or Gerlitz 1cfa0a75db IB/iser: iSER RDMA CM (CMA) and IB verbs interaction
This file contains the low level interaction with the RDMA CM
and the IB verbs, where iSER is consumer of both.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:11 -07:00
Or Gerlitz e85b24b5e7 IB/iser: iSER initiator iSCSI PDU and TX/RX
This file contains the iSER initiator processing of iSCSI PDUs - controls,
commands and data-outs along with processing of TX and RX completions.
It interacts with the lower level iser code doing the memory registration
and the cma and verbs calls.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:09 -07:00
Or Gerlitz 65e7ae7bfc IB/iser: iSCSI iSER transport provider high level code
This file contains the code that registers with the iscsi transport manager
and with the SCSI Mid Layer, where much of the provided functions to iSCSI and
SCSI are implemented in libiscsi.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:07 -07:00
Or Gerlitz 49cd5382f6 IB/iser: iSCSI iSER transport provider header file
iSER (iSCSI Extensions for RDMA) transport provider driver for the iSCSI
initiator, whose other parts (under drivers/scsi) are scsi_transport_iscsi
- the transport management module, iscsi_tcp - the TCP transport provider
module and libiscsi - a kernel library (module) implementing functionality
needed by both TCP and iSER transports. iSER is both a provider of the iSCSI
transport api and a SCSI low level driver.

This file contains internal data structures and non static service functions.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-22 07:51:05 -07:00
Linus Torvalds 4c84a39c8a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (46 commits)
  IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
  IB/mthca: Make all device methods truly reentrant
  IB/mthca: Fix memory leak on modify_qp error paths
  IB/uverbs: Factor out common idr code
  IB/uverbs: Don't decrement usecnt on error paths
  IB/uverbs: Release lock on error path
  IB/cm: Use address handle helpers
  IB/sa: Add ib_init_ah_from_path()
  IB: Add ib_init_ah_from_wc()
  IB/ucm: Get rid of duplicate P_Key parameter
  IB/srp: Factor out common request reset code
  IB/srp: Support SRP rev. 10 targets
  [SCSI] srp.h: Add I/O Class values
  IB/fmr: Use device's max_map_map_per_fmr attribute in FMR pool.
  IB/mthca: Fill in max_map_per_fmr device attribute
  IB/ipath: Add client reregister event generation
  IB/mthca: Add client reregister event generation
  IB: Move struct port_info from ipath to <rdma/ib_smi.h>
  IPoIB: Handle client reregister events
  IB: Add client reregister event type
  ...
2006-06-19 19:01:59 -07:00
Herbert Xu 932ff279a4 [NET]: Add netif_tx_lock
Various drivers use xmit_lock internally to synchronise with their
transmission routines.  They do so without setting xmit_lock_owner.
This is fine as long as netpoll is not in use.

With netpoll it is possible for deadlocks to occur if xmit_lock_owner
isn't set.  This is because a printk that occurs while xmit_lock is
held and xmit_lock_owner is not set can cause netpoll to attempt to
take xmit_lock recursively.

While it is possible to resolve this by getting netpoll to use
trylock, it is suboptimal because netpoll's sole objective is to
maximise the chance of getting the printk out on the wire.  So
delaying or dropping the message is to be avoided as much as possible.

So the only alternative is to always set xmit_lock_owner.  The
following patch does this by introducing the netif_tx_lock family of
functions that take care of setting/unsetting xmit_lock_owner.

I renamed xmit_lock to _xmit_lock to indicate that it should not be
used directly.  I didn't provide irq versions of the netif_tx_lock
functions since xmit_lock is meant to be a BH-disabling lock.

This is pretty much a straight text substitution except for a small
bug fix in winbond.  It currently uses
netif_stop_queue/spin_unlock_wait to stop transmission.  This is
unsafe as an IRQ can potentially wake up the queue.  So it is safer to
use netif_tx_disable.

The hamradio bits used spin_lock_irq but it is unnecessary as
xmit_lock must never be taken in an IRQ handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 21:30:14 -07:00
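
A toy, single-threaded model of the owner-tracking idea in the
netif_tx_lock commit above.  All names are hypothetical and none of this
is the networking core's code.  It only shows why recording the owner
together with the lock lets a netpoll-like caller detect that the
current CPU already holds it.

```c
#include <assert.h>

#define NO_OWNER (-1)

/* Toy transmit lock that remembers which CPU holds it. */
struct tx_lock {
    int locked;
    int owner;      /* CPU id of the holder, or NO_OWNER */
};

void tx_lock_init(struct tx_lock *l)
{
    l->locked = 0;
    l->owner = NO_OWNER;
}

/* Take the lock and record the owner in one step, as a wrapper would. */
void tx_lock_take(struct tx_lock *l, int cpu)
{
    assert(!l->locked);     /* a real spinlock would spin here */
    l->locked = 1;
    l->owner = cpu;
}

/* Clear the owner before dropping the lock. */
void tx_lock_release(struct tx_lock *l)
{
    l->owner = NO_OWNER;
    l->locked = 0;
}

/* Nonzero if taking the lock on this CPU would recurse. */
int tx_lock_held_by(const struct tx_lock *l, int cpu)
{
    return l->locked && l->owner == cpu;
}
```

If a driver took the lock without setting the owner, tx_lock_held_by()
would report 0 on the holding CPU and the caller would try to take the
lock again, which is the recursion the commit message describes.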
Ishai Rabinovitz 526b4caa0a IB/srp: Factor out common request reset code
Misc cleanups in ib_srp:
1) I think that it is more efficient to move the req entries from req_list
   to free_list in srp_reconnect_target (rather than rebuild the free_list).
   (In any case this code is shorter).
2) This allows us to reuse code in srp_reset_device and srp_reconnect_target
   and call a new function srp_reset_req.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:38 -07:00
Ramachandra K 0c0450db31 IB/srp: Support SRP rev. 10 targets
There has been a change in the format of port identifiers between
revision 10 of the SRP specification and the current revision 16A.

Revision 10 specifies port identifier format as

  lower 8 bytes :  GUID   upper 8 bytes :  Extension

Whereas revision 16A specifies it as

  lower 8 bytes :  Extension  upper 8 bytes :  GUID

There are older targets (e.g. SilverStorm Virtual Fibre Channel
Bridge) which conform to revision 10 of the SRP specification.

The I/O class of revision 10 is 0xFF00 and the I/O class of revision
16A is 0x0100.

For supporting older targets, this patch:

1) Adds a new optional target creation parameter "io_class". Default
   value of io_class is 0x0100 (i.e. revision 16A)
2) Uses the correct port identifier format for targets with IO class
   of 0xFF00 (i.e. conforming to revision 10)

Signed-off-by: Ramachandra K <rkuchimanchi@silverstorm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:38 -07:00
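
The two port identifier layouts in the SRP rev. 10 commit above can be
sketched as below.  build_port_id() is a hypothetical helper, not the
ib_srp code, and reading "lower 8 bytes" as bytes 0..7 of the buffer is
an assumption.  The I/O class values are the ones quoted in the commit
message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SRP_IO_CLASS_REV10  0xFF00
#define SRP_IO_CLASS_REV16A 0x0100

/*
 * Fill a 16-byte port identifier from an 8-byte GUID and an 8-byte
 * extension, ordered according to the target's I/O class.
 * ASSUMPTION: "lower 8 bytes" means id[0..7].
 */
void build_port_id(uint8_t id[16], uint16_t io_class,
                   const uint8_t guid[8], const uint8_t ext[8])
{
    assert(io_class == SRP_IO_CLASS_REV10 ||
           io_class == SRP_IO_CLASS_REV16A);

    if (io_class == SRP_IO_CLASS_REV10) {
        memcpy(id, guid, 8);        /* lower 8 bytes: GUID */
        memcpy(id + 8, ext, 8);     /* upper 8 bytes: extension */
    } else {
        memcpy(id, ext, 8);         /* lower 8 bytes: extension */
        memcpy(id + 8, guid, 8);    /* upper 8 bytes: GUID */
    }
}
```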
Leonid Arsh 508e434123 IPoIB: Handle client reregister events
Handle client reregister events by treating them just like LID or
SM changes -- flush all cached paths and rejoin multicast groups.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:36 -07:00
Jack Morgenstein 37c22a7721 IPoIB: Fix kernel unaligned access on ia64
Fix misaligned access faults on ia64: never cast a misaligned
neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer
instead.  The memcpy was being optimized to use full word accesses
because the compiler thought that union ib_gid is always aligned.

The cast in IPOIB_GID_ARG is safe, since it is fixed to access each
byte separately.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:35 -07:00
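
The pattern behind the ia64 fix above, in a user-space sketch.  union gid
is a stand-in for union ib_gid and gid_from_bytes() is a hypothetical
helper.  The point is only that passing a void * keeps the compiler from
assuming the source has the union's alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for union ib_gid: the 64-bit members give it 8-byte alignment. */
union gid {
    uint8_t  raw[16];
    uint64_t words[2];
};

/*
 * Copy a GID from a buffer at any byte offset.  Because src is a
 * void *, memcpy() cannot assume it is aligned like union gid and
 * must handle a misaligned source.
 */
void gid_from_bytes(union gid *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst->raw, src, sizeof(dst->raw));
}
```

A caller holding a byte buffer ha would write gid_from_bytes(&g, ha + 4)
rather than casting ha + 4 to a union gid pointer and dereferencing it.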
Roland Dreier 31c02e2157 IPoIB: Avoid using stale last_send counter when reaping AHs
The comparisons of priv->tx_tail to ah->last_send in ipoib_free_ah()
and ipoib_post_receive() are slightly unsafe, because priv->tx_lock is
not held and hence a stale value of ah->last_send might be used, which
would lead to freeing an AH before the driver was really done with it.
The simple way to fix this is to remove the optimization of early free
from ipoib_free_ah() and unconditionally queue AHs for reaping, and
then take priv->tx_lock in __ipoib_reap_ah().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:34 -07:00
Roland Dreier 6bfa24fa3e IB/srp: Get rid of "Target has req_lim 0" messages
It's perfectly valid for a connection to an SRP target to have a
request limit of 0, so get rid of the message about it, which can spam
kernel logs even with printk_ratelimit().  Keep a count of such events
in a "zero_req_lim" SCSI host attribute instead, so someone who cares
can look at the statistics.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:33 -07:00
Ishai Rabinovitz b7ac4ab497 IB/srp: Handle DREQ events from CM
Handle IB_CM_DREQ_ERROR and IB_CM_DREQ_RECEIVED events from the CM,
instead of just printing "Unhandled CM event".  In the case of
DREQ_ERROR, just ignore the event -- a TIMEWAIT_EXIT will be generated
also.  For DREQ_RECEIVED, send a DREP in response to shut the
connection down cleanly.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Vu Pham 74b0a15b5e IB/srp: Allow sg_tablesize to be adjusted
Make the sg_tablesize used by SRP adjustable at module load time via a
module parameter.  Calculate the corresponding IU length required to
support this.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:32 -07:00
Vu Pham 52fb2b50c4 IB/srp: Allow cmd_per_lun to be set per target port
Allow userspace to throttle traffic on a given connection to a target
port by adding "max_cmd_per_lun=xyz" to lower the cmd_per_lun value
set for that scsi_host.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Ishai Rabinovitz 0c5b395239 IB/srp: Clean up loop in srp_remove_one()
Interrupts will always be enabled in srp_remove_one(), so
spin_lock_irq() can be used instead of spin_lock_irqsave().
Also, the loop takes target->scsi_host->host_lock, so target->state
can just be set to SRP_TARGET_REMOVED without testing the old value.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:31 -07:00
Matthew Wilcox b3589fd490 IB/srp: Change target_mutex to a spinlock
The SRP driver never sleeps while holding target_mutex, and it's just
used to protect some simple list operations, so hold times will be
short.  So just convert it to a spinlock, which is smaller and faster.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Matthew Wilcox 549c5fc2c8 IB/srp: Get rid of unneeded use of list_for_each_entry_safe()
list_for_each_entry_safe() is used in one place where the list isn't
modified.  So just change it to list_for_each_entry().

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Matthew Wilcox 1962a4a1e4 IB/srp: Use SCAN_WILD_CARD from SCSI headers
SCAN_WILD_CARD is indeed available from <scsi/scsi.h>, which is
already included.  So get rid of private hack.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:30 -07:00
Roland Dreier f5358a172f IB/srp: Use FMRs to map gather/scatter lists
Create an SRP FMR pool on HCAs that support FMRs, and use FMRs to map
gather/scatter lists that have more than one entry into a single
memory region that appears virtually contiguous to the SRP target
(which is the RDMA initiator).

This patch bails out on FMR mapping for SCSI commands where the
gather/scatter list cannot be mapped into a single FMR because there
are sub-page-sized entries in the middle of the list.  An unaligned
start or end of the list is OK.

Based on a patch by Vu Pham <vuhuong@mellanox.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17 20:37:29 -07:00
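
One way to read the FMR mapping rule in the commit above is: every entry
except the first must start on a page boundary, and every entry except
the last must end on one.  The sketch below encodes that reading.  The
struct and function are hypothetical and this is not the ib_srp
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA-mapped segment. */
struct sg_entry {
    uint64_t addr;
    uint32_t len;
};

/*
 * Return 1 if the list could appear virtually contiguous when mapped
 * page by page, 0 otherwise.  page_size must be a power of two.
 */
int sg_fits_single_fmr(const struct sg_entry *sg, size_t n,
                       uint64_t page_size)
{
    uint64_t mask = page_size - 1;
    size_t i;

    assert(page_size != 0 && (page_size & mask) == 0);

    for (i = 0; i < n; i++) {
        /* Only the first entry may start mid-page. */
        if (i > 0 && (sg[i].addr & mask))
            return 0;
        /* Only the last entry may end mid-page. */
        if (i + 1 < n && ((sg[i].addr + sg[i].len) & mask))
            return 0;
    }
    return 1;
}
```

A list with an unaligned start and a short tail passes, while a short
entry in the middle fails, which matches the case the commit bails out
on.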
Eli Cohen 959eb39297 IPoIB: Fix AH leak at interface down
When ipoib_stop() is called it first calls netif_stop_queue() to stop
the kernel from passing more packets to the network driver. However,
the completion handler may call netif_wake_queue() re-enabling packet
transfer.

This might result in leaks (we see AH leaks which we think can be
attributed to this bug) as new packets get posted while the interface
is going down.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-05 09:51:36 -07:00
Ishai Rabinovitz 093beac189 IB/srp: Complete correct SCSI commands on device reset
When flushing out queued commands after a successful device reset,
make sure that SRP completes the right commands, instead of calling
scsi_done on the command passed into the device reset handler over and
over.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 09:20:48 -07:00
Roland Dreier ec2d720849 IB/srp: Get rid of extra scsi_host_put()s if reconnection fails
If a reconnection attempt fails, then SRP does two scsi_host_put()s.
This is a historical relic from an earlier version of the driver that
took a reference on the scsi_host before trying to reconnect, so get
rid of the extra scsi_host_put().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 09:16:03 -07:00
Roland Dreier e65810566f IB/srp: Don't wait for disconnection if sending DREQ fails
Sending a DREQ may fail, for example because the remote target has
already broken the connection.  If so, then SRP should not wait for
the disconnection to complete, because it never will.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-17 09:13:21 -07:00
Roland Dreier 5941d079f2 IPoIB: Free child interfaces properly
When deleting a child interface with a non-default P_Key via
/sys/class/net/ibX/delete_child, the interface must be freed with
free_netdev() (rather than kfree() on the private data).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 22:54:59 -07:00
Roland Dreier d945e1df28 IB/srp: Fix tracking of pending requests during error handling
If a SCSI abort completes, or the command completes successfully, then
the driver must remove the command from its queue of pending
commands.  Similarly, if a device reset succeeds, then all commands
queued for the given device must be removed from the queue.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-05-09 10:50:28 -07:00
Roland Dreier f80887d0b9 IB/srp: Remove request from list when SCSI abort succeeds
If a SCSI abort succeeds, then the aborted request should be
removed from the list of pending requests.  This fixes list corruption
after an abort occurs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-19 11:40:10 -07:00
Roland Dreier f697f74a6b IPoIB: Use spin_lock_irq() instead of spin_lock_irqsave()
We know ipoib_flush_paths() is called from plain process context with
interrupts enabled, since it does wait_for_completion().  So there's
no need to use spin_lock_irqsave() -- spin_lock_irq() is fine.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:59 -07:00
Eli Cohen a30bb96c6f IPoIB: Close race in ipoib_flush_paths()
ib_sa_cancel_query() must be called with priv->lock held since
a completion might arrive and set path->query to NULL.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:59 -07:00
Shirley Ma 0f4852513f IPoIB: Make send and receive queue sizes tunable
Make IPoIB's send and receive queue sizes tunable via module
parameters ("send_queue_size" and "recv_queue_size").  This allows the
queue sizes to be enlarged to fix disastrously bad performance on some
platforms and workloads, without bloating memory usage when large
queues aren't needed.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:58 -07:00
Eli Cohen f2de3b0612 IPoIB: Wait for join to finish before freeing mcast struct
ipoib_mcast_restart_task() might free an mcast object while a join
request is still outstanding, leading to an oops when the query
completes.  Fix this by waiting for query to complete, similar to what
ipoib_stop_thread() is doing.  The wait for mcast completion code is
consolidated in wait_for_mcast_join().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:58 -07:00
Jack Morgenstein bf6a9e31cf IB: simplify static rate encoding
Push translation of static rate to HCA format into low-level drivers,
where it belongs.  For static rate encoding, use encoding of rate
field from IB standard PathRecord, with addition of value 0, for
backwards compatibility with current usage.  The changes are:

 - Add enum ib_rate to midlayer includes.
 - Get rid of static rate translation in IPoIB; just use static rate
   directly from Path and MulticastGroup records.
 - Update mthca driver to translate absolute static rate into the
   format used by hardware.  This also fixes mthca's static rate
   handling for HCAs that are capable of 4X DDR.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-10 09:43:47 -07:00
Michael S. Tsirkin d2e0655ede IPoIB: Consolidate private neighbour data handling
Consolidate IPoIB's private neighbour data handling into
ipoib_neigh_alloc() and ipoib_neigh_free().  This will make it easier
to keep track of the neighbour structures that IPoIB is handling, and
is a nice cleanup of the code:

add/remove: 2/1 grow/shrink: 1/8 up/down: 100/-178 (-78)
function                                     old     new   delta
ipoib_neigh_alloc                              -      61     +61
ipoib_neigh_free                               -      36     +36
ipoib_mcast_join_finish                     1288    1291      +3
path_rec_completion                          575     573      -2
ipoib_mcast_join_task                        664     660      -4
ipoib_neigh_destructor                       101      92      -9
ipoib_neigh_setup_dev                         14       3     -11
ipoib_neigh_setup                             17       -     -17
path_free                                    238     215     -23
ipoib_mcast_free                             329     306     -23
ipoib_mcast_send                             718     684     -34
neigh_add_path                               705     650     -55

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-04 14:46:48 -07:00
Roland Dreier ce1823f032 IB/srp: Fix memory leak in options parsing
Fix memory leak if parsing destination GID fails.

Coverity bug 1042

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-03 09:31:04 -07:00
Roland Dreier f5545d24b8 IPoIB: Always build debugging code unless CONFIG_EMBEDDED=y
Don't allow CONFIG_INFINIBAND_IPOIB_DEBUG to be disabled unless
CONFIG_EMBEDDED is selected.  We want users (and especially distros)
to have this turned on unless they really need to save space, because
by the time we want debugging output, it's usually too late to rebuild
a kernel.  The debugging output can be controlled at runtime via the
debug_level module parameter in sysfs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-04-02 14:39:19 -07:00
Roland Dreier ef12d45619 IPoIB: Fix oops with raw sockets
ipoib_hard_header() needs to handle the case that daddr is NULL.  This
can happen when packets are injected via a raw socket, and IPoIB
shouldn't oops in this case.

Reported by Anton Blanchard <anton@samba.org>

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-29 09:36:46 -08:00
Roland Dreier 3f89f83449 IB/srp: Fix unmapping of fake scatterlist
The recently merged patch to create a fake scatterlist for non-SG SCSI
commands had a bug: the driver ended up doing dma_unmap_sg() on a
scatterlist scmnd->request_buffer rather than the fake scatter list it
created.  Fix this so that the driver unmaps the same thing it maps.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-29 09:36:45 -08:00
Leonid Arsh 7a343d4c46 IPoIB: P_Key change event handling
This patch causes the network interface to respond to P_Key change
events correctly.  As a result, you'll see a child interface in the
"RUNNING" state (netif_carrier_on()) only when the corresponding P_Key
is configured by the SM.  When SM removes a P_Key, the "RUNNING" state
will be disabled for the corresponding network interface.  To
implement this, I added IB_EVENT_PKEY_CHANGE event handling.  To
prevent flushing the device before the device is open by the "delay
open" mechanism, I added an additional device flag called
IPOIB_FLAG_INITIALIZED.

This also prevents the child network interface from trying to join
multicast groups until the PKEY is configured.  We used to get error
messages like:

    ib0.f2f2: couldn't attach QP to multicast group ff12:401b:f2f2:0:0:0:ffff:ffff

in this case.  To fix this, I just check IPOIB_FLAG_OPER_UP flag in
ipoib_set_mcast_list().

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:30 -08:00
Leonid Arsh 4e37b95616 IPoIB: Fix network interface "RUNNING" status
With the current IPoIB driver, the status of network interfaces stays
"RUNNING" even if the link goes down (for example because a cable is
unplugged).  Fix this by flushing the IPoIB interface when the link
goes down.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:29 -08:00
Roland Dreier cf368713a3 IB/srp: Use a fake scatterlist for non-SG SCSI commands
Since the SCSI midlayer is moving towards entirely getting rid of
commands with use_sg == 0, we should treat this case as an exception.
Therefore, change the IB SRP initiator to create a fake scatterlist
for these commands with sg_init_one().  This simplifies the flow of
DMA mapping and unmapping, since SRP can just use dma_map_sg() and
dma_unmap_sg() unconditionally, rather than having to choose between
the dma_{map,unmap}_sg() and dma_{map,unmap}_single() variants.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:26 -08:00
Leonid Arsh 6f633c8d69 IPoIB: Pass correct pointer when flushing child interfaces
ipoib_ib_dev_flush() should get passed cpriv->dev, not &cpriv->dev.
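
A minimal userspace sketch of why this kind of bug compiles silently,
assuming the flush routine takes a void * argument in the style of a
work-queue callback.  The names net_dev, child_priv and dev_flush are
hypothetical and are not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device and the child's private data. */
struct net_dev { int flushed; };
struct child_priv { struct net_dev *dev; };

/*
 * Callback taking void *: the compiler accepts any object pointer here,
 * so passing &cpriv->dev (a struct net_dev **) would not be diagnosed.
 */
static void dev_flush(void *arg)
{
    struct net_dev *dev = arg;

    assert(dev != NULL);
    dev->flushed = 1;
}

/* Correct call: pass the pointer stored in the member, not its address. */
static void flush_child(struct child_priv *cpriv)
{
    dev_flush(cpriv->dev);
}
```

Passing &cpriv->dev instead would make the callback treat the storage of
the pointer itself as a device structure.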

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-24 15:47:25 -08:00
Linus Torvalds 3d1f337b3e Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (235 commits)
  [NETFILTER]: Add H.323 conntrack/NAT helper
  [TG3]: Don't mark tg3_test_registers() as returning const.
  [IPV6]: Cleanups for net/ipv6/addrconf.c (kzalloc, early exit) v2
  [IPV6]: Nearly complete kzalloc cleanup for net/ipv6
  [IPV6]: Cleanup of net/ipv6/reassambly.c
  [BRIDGE]: Remove duplicate const from is_link_local() argument type.
  [DECNET]: net/decnet/dn_route.c: fix inconsequent NULL checking
  [TG3]: make drivers/net/tg3.c:tg3_request_irq() static
  [BRIDGE]: use LLC to send STP
  [LLC]: llc_mac_hdr_init const arguments
  [BRIDGE]: allow show/store of group multicast address
  [BRIDGE]: use llc for receiving STP packets
  [BRIDGE]: stp timer to jiffies cleanup
  [BRIDGE]: forwarding remove unneeded preempt and bh diasables
  [BRIDGE]: netfilter inline cleanup
  [BRIDGE]: netfilter VLAN macro cleanup
  [BRIDGE]: netfilter dont use __constant_htons
  [BRIDGE]: netfilter whitespace
  [BRIDGE]: optimize frame pass up
  [BRIDGE]: use kzalloc
  ...
2006-03-21 09:31:48 -08:00
Arnaldo Carvalho de Melo e35fc38565 [INFINIBAND] ipoib: Remove leftover use of neigh_ops->destructor
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:46:40 -08:00
Michael S. Tsirkin c5ecd62c25 [NET]: Move destructor from neigh->ops to neigh_params
struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use.  The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed.  We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.

The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup.  Two additional
patches in the patch series update ipoib to use this new interface.
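
The difference between a shared ops table and per-device parameters can
be sketched in userspace as follows.  This is an illustration under
assumptions, not the kernel's definitions: neigh_parms_sketch, its
destructor member and neigh_release() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct neighbour;

/* Per-device parameters: each net device owns one of these. */
struct neigh_parms_sketch {
    void (*destructor)(struct neighbour *);
};

struct neighbour {
    struct neigh_parms_sketch *parms; /* belongs to exactly one device */
    int extra_cleaned;                /* stands in for driver-private info */
};

/* Cleanup hook that only the InfiniBand-like device installs. */
static void ib_destructor(struct neighbour *n)
{
    n->extra_cleaned = 1;
}

/* Called when a neighbour is freed: consult the per-device hook. */
static void neigh_release(struct neighbour *n)
{
    assert(n != NULL && n->parms != NULL);
    if (n->parms->destructor)
        n->parms->destructor(n);
}
```

Because the hook is looked up through the per-device structure, setting
it for one device cannot affect neighbours that belong to another.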

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:25:41 -08:00
Roland Dreier bfef73fa78 IPoIB: Get rid of useless test of queue length
In neigh_add_path(), the queue of delayed packets can never be full,
because the queue is always freshly created and cannot be found by any
other code path.  In fact, the test of the queue length is worse than
useless: if somehow the test ever triggered and path_rec_start() also
failed, then dev_kfree_skb_any() will be called twice on the same skb.
Fix this by deleting the useless test.  Pointed out by Michael
S. Tsirkin <mst@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:26 -08:00
Roland Dreier bf17c1c7cc IB/srp: Coverity fix to srp_parse_options()
Fix leak found by Coverity: in the SRP_OPT_DGID case,
srp_parse_options() didn't free the result of match_strdup().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:25 -08:00
Jack Morgenstein 0b3ea0829c IPoIB: Move ipoib_ib_dev_flush() to ipoib workqueue
Move ipoib_ib_dev_flush() to ipoib's workqueue.  This keeps it ordered
with respect to other work scheduled by the ipoib driver.  This fixes
problems with races, for example:
 - ipoib_ib_dev_flush() has started running because of an IB event
 - user does ifconfig ib0 down
 - ipoib_mcast_stop_thread() gets called twice and waits for the same
   completion twice

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:24 -08:00
Roland Dreier 8b9ab02b69 IPoIB: Fix build now that neighbour destructor is in neigh_params
Fix the IPoIB build (which is broken in net-2.6.17 because of my
screw-up, which left out this chunk in ipoib_multicast.c). 
The neighbour destructor is now in neigh_params, so we don't
need to clear it in the ops structure.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:24 -08:00
Roland Dreier 6ecb0c8496 IB/srp: Add SCSI host attributes to show target port
Add SCSI host attributes in sysfs that show the ID extension, IOC
GUID, service ID, P_Key and destination GID for each target port that
the SRP initiator connects to.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:23 -08:00
Michael S. Tsirkin 9acf6a8570 IPoIB: Fix multicast race between canceling and completing
ipoib_mcast_stop_thread currently tests mcast->query and if it is
NULL, does not perform wait_for_completion on the mcast and frees the
mcast object directly.

However, since both operations are done without locking, it is
possible that ipoib_mcast_join_complete is in progress on this mcast
object and has set mcast->query to NULL already.

Solve this by:
- taking priv->lock before we change mcast->query in ipoib_mcast_join_complete,
  and keeping it until we no longer need the mcast object
- taking priv->lock around mcast->query test in ipoib_mcast_stop_thread

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:20 -08:00
Eli Cohen 54d07e2a1e IPoIB: Clean up if posting receives fails
If posting receives in ipoib_ib_dev_open() fails, call
ipoib_ib_dev_stop() to move the device's QP back to the RESET state so
that we can try again later.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:19 -08:00
Eli Cohen 7343b231f2 IPoIB: Close race in setting mcast->ah
ipoib_mcast_send() tests mcast->ah twice.  If this value is changed
between these two points, we leak an skb.  However,
ipoib_mcast_join_finish() sets mcast->ah with no locking, so it could
race against ipoib_mcast_send().

As a solution, take priv->lock around assignment to mcast->ah thus
making sure ipoib_mcast_send() (which also takes priv->lock) is not in
flight.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:18 -08:00
Michael S. Tsirkin 44af79f952 IPoIB: clarify to_ipoib_neigh()
Cosmetic change: make alignment explicit in to_ipoib_neigh.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-20 10:08:16 -08:00
Roland Dreier 1285b3a0b0 IB/srp: Don't send task management commands after target removal
Just fail abort and reset requests that come in after we've already
decided to remove a target.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-03 17:50:16 -08:00
Roland Dreier 20b83382d1 IPoIB: Yet another fix for send-only joins
Even after the last fix, it's still possible for a send-only join to
start before the join for the broadcast group has finished.  This
could cause us to create a multicast group using attributes from the
broadcast group that haven't been initialized yet, so we would use
garbage for the Q_Key, etc.  Fix this by waiting until the broadcast
group's attached flag is set before starting send-only joins.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-02-11 12:22:12 -08:00
Michael S. Tsirkin 7bcb974ef6 IPoIB: Fix another send-only join race
Further, there's an additional issue that I saw in testing:
ipoib_mcast_send may get called when priv->broadcast is NULL (e.g. if
the device was downed and then upped internally because of a port
event).

If this happens and the send-only join request gets completed before
priv->broadcast is set, we get an oops.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-02-07 16:39:26 -08:00
Michael S. Tsirkin 479a079663 IPoIB: Don't start send-only joins while multicast thread is stopped
Fix the following race scenario:
  - Device is up.
  - Port event or set mcast list triggers ipoib_mcast_stop_thread,
    this cancels the query and waits on mcast "done" completion.
  - Completion is called and "done" is set.
  - Meanwhile, ipoib_mcast_send arrives and starts a new query,
    re-initializing "done".

Fix this by adding a "multicast started" bit and checking it before
starting a send-only join.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-02-07 16:37:08 -08:00
Ingo Molnar 8e9e5f4f5e IB/srp: Semaphore to mutex conversion
Convert srp_host->target_mutex from a semaphore to a mutex.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-30 15:21:21 -08:00
Michael S. Tsirkin b36f170b61 IPoIB: Lock accesses to multicast packet queues
Avoid corrupting mcast->pkt_queue by serializing access with
priv->tx_lock.  Also, update dropped packet statistics to count
multicast packets removed from pkt_queue as dropped.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-17 12:19:40 -08:00
Michael S. Tsirkin 47f7a0714b IPoIB: Make sure path is fully initialized before using it
The SA path record query completion can initialize path->pathrec.dlid
before IPoIB's callback runs and initializes path->ah, so we must test
ah rather than dlid.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-17 09:22:05 -08:00
Ingo Molnar 95ed644fd1 IB: convert from semaphores to mutexes
semaphore to mutex conversion by Ingo and Arjan's script.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Sanity-checked on real IB hardware ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-13 14:51:39 -08:00
Eli Cohen 988bd50300 IPoIB: Fix memory leak of multicast group structures
The current handling of multicast groups in IPoIB ends up never
freeing send-only multicast groups.  It turns out the logic was much
more complicated than it needed to be; we can fix this bug and
completely kill ipoib_mcast_dev_down() at the same time.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-12 14:32:20 -08:00
Michael S. Tsirkin 78bfe0b5b6 IPoIB: Take dev->xmit_lock around mc_list accesses
dev->mc_list accesses must be protected by dev->xmit_lock.
Found by Eli Cohen <eli@mellanox.co.il>.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-11 11:47:34 -08:00
Eli Cohen 97460df37e IPoIB: Fix address handle refcounting for multicast groups
Multiple ipoib_neigh structures on mcast->neigh_list may point to the
same ah.  This means that ipoib_mcast_free() can't just make a list of
ah structs to free, since this might end up trying to add the same ah
to the list more than once.  Handle this in ipoib_multicast.c in the
same way as it is handled in ipoib_main.c for struct ipoib_path.
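
A userspace sketch of the reference-counting idea, assuming a plain
integer count with no locking (the kernel provides struct kref for this
pattern).  The types ah and neigh and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct ah { int refcount; };     /* hypothetical address handle */
struct neigh { struct ah *ah; }; /* hypothetical neighbour entry */

/* Take one reference on behalf of a new user of the handle. */
static struct ah *ah_get(struct ah *ah)
{
    ah->refcount++;
    return ah;
}

/* Drop one reference; returns 1 if this was the last one and ah was freed. */
static int ah_put(struct ah *ah)
{
    assert(ah->refcount > 0);
    if (--ah->refcount > 0)
        return 0;
    free(ah);
    return 1;
}

/*
 * Release every neighbour's handle.  A handle shared by several
 * neighbours is freed exactly once, when its last reference is dropped,
 * so no separate "list of handles to free" is needed.
 */
static int neigh_list_release(struct neigh *neighs, size_t n)
{
    int freed = 0;

    for (size_t i = 0; i < n; i++) {
        freed += ah_put(neighs[i].ah);
        neighs[i].ah = NULL;
    }
    return freed;
}
```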

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-10 07:43:02 -08:00
Eli Cohen 70b4c8cdc1 IPoIB: Fix error path in ipoib_mcast_dev_flush()
Don't leak memory on allocation failure for broadcast mcast group.
Also, print a warning to match handling for other mcast groups.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-10 07:42:14 -08:00
Sean Hefty cf311cd49a IB: Add node_guid to struct ib_device
Add a node_guid field to struct ib_device.  It is the responsibility
of the low-level driver to initialize this field before registering a
device with the midlayer.  Convert everyone to looking at this field
instead of calling ib_query_device() when all they want is the node
GUID, and remove the node_guid field from struct ib_device_attr.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-01-10 07:39:34 -08:00
Tim Schmielau de25968cc8 [PATCH] fix more missing includes
Include fixes for 2.6.14-git11.  Should allow removing sched.h from
module.h on i386, x86_64, arm, ia64, ppc, ppc64, and s390.  Probably more
to come since I haven't yet checked the other archs.

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:45 -08:00
Arnaldo Carvalho de Melo 14c850212e [INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h included twice.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:21 -08:00
Roland Dreier 267ee88ed3 IPoIB: fix error handling in ipoib_open
If ipoib_ib_dev_up() fails after ipoib_ib_dev_open() is called, then
ipoib_ib_dev_stop() needs to be called to clean up.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:55:58 -08:00
Michael S. Tsirkin 4f71055a45 IPoIB: protect child list in ipoib_ib_dev_flush
Fix race condition: ipoib_ib_dev_flush() accesses the child list without locking.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:53:30 -08:00
Roland Dreier 2e86541ec8 IPoIB: don't zero members after we allocate with kzalloc
ipoib_mcast_alloc() uses kzalloc(), so there's no need to zero out
members of the mcast struct after it's allocated.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:25:23 -08:00
Michael S. Tsirkin de92248789 IPoIB: reinitialize mcast structs' completions for every query
Make sure mcast->done is initialized to uncompleted value before we
submit a new query, so that it's safe to wait on.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:18:45 -08:00
Roland Dreier 5872a9fc28 IPoIB: always set path->query to NULL when query finishes
Always set path->query to NULL when the SA path record query
completes, rather than only when we don't have an address handle.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-29 10:13:54 -08:00
Roland Dreier 65c7eddaba IPoIB: reinitialize path struct's completion for every query
It's possible that IPoIB will issue multiple SA queries for the same
path struct.  Therefore the struct's completion needs to be
initialized for each query rather than only once when the struct is
allocated, or else we might not wait long enough for later queries to
finish and free the path struct too soon.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-28 21:20:34 -08:00
Roland Dreier 47f2bce902 [IB] srp: don't post receive if no send buf available
Have __srp_get_tx_iu() fail if the target port's request limit will
not allow the initiator to post a send.  This avoids continuing on and
posting a receive, and then failing to post a corresponding send.  If
that happens, then the initiator will end up with an extra receive
posted, and if this happens too much, the receive queue will overflow.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-15 00:19:21 -08:00
Roland Dreier 5f068992a1 [IB] srp: increase max_luns
Increase SRP max_luns to 512 to match the kernel's default, since SRP
storage targets can have lots of LUNs and the SRP initiator itself
doesn't have any particular limit.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-11 14:06:01 -08:00
Linus Torvalds 78b9c0f91c Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband 2005-11-10 13:27:06 -08:00
Roland Dreier 8c608a32e3 [IPoIB] no need to set skb->dev right before freeing skb
For cut-and-paste reasons, the IPoIB driver was setting skb->dev right
before calling dev_kfree_skb_any().  Get rid of this.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-10 10:22:50 -08:00
Roland Dreier 1732b0ef3b [IPoIB] add path record information in debugfs
Add ibX_path files to debugfs that contain information about the IPoIB
path cache.  IPoIB ARP only gives GIDs, which the IPoIB driver must
resolve to real IB paths through the ib_sa module.  For debugging,
when the ARP table looks OK but traffic isn't flowing, it's useful to
be able to see if the resolution from GID to path worked.

Also clean up the formatting of the existing _mcg debugfs files.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-10 10:22:49 -08:00
Olaf Hering 733482e445 [PATCH] changing CONFIG_LOCALVERSION rebuilds too much, for no good reason
This patch removes almost all inclusions of linux/version.h.  The 3
#defines are unused in most of the touched files.

A few drivers use the simple KERNEL_VERSION(a,b,c) macro, which is
unfortunately in linux/version.h.

There are also lots of #ifdefs for long-obsolete kernels; these were not
touched.  In a few places, the linux/version.h include was moved to where
the LINUX_VERSION_CODE was used.

quilt vi `find * -type f -name "*.[ch]"|xargs grep -El '(UTS_RELEASE|LINUX_VERSION_CODE|KERNEL_VERSION|linux/version.h)'|grep -Ev '(/(boot|coda|drm)/|~$)'`

search pattern:
/UTS_RELEASE\|LINUX_VERSION_CODE\|KERNEL_VERSION\|linux\/\(utsname\|version\).h

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:55:57 -08:00
Linus Torvalds 127f2fa31a Merge branch 'srp' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband 2005-11-04 16:32:36 -08:00
Roland Dreier 8ae5a8a24f [IPoIB] don't compile debug code if debugging isn't enabled
Don't build ipoib_mcast_iter_ functions if CONFIG_INFINIBAND_IPOIB_DEBUG
is not enabled -- their only callers will not be built either.

Also move the prototype for ipoib_open() to ipoib.h to fix a sparse warning.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-02 20:51:01 -08:00
Roland Dreier aef9ec39c4 IB: Add SCSI RDMA Protocol (SRP) initiator
Add an InfiniBand SCSI RDMA Protocol (SRP) initiator.  This driver is
used to talk to InfiniBand SRP targets (storage devices).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-02 14:07:13 -08:00
Roland Dreier 21a384897d [IPoIB] remove unneeded initializations to 0
Shrink our source and .text a little by removing a few assignments of
NULL and 0 to memory that is already cleared as part of the allocation.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-02 10:07:59 -08:00
Roland Dreier de6eb66b56 [IB] kzalloc() conversions
Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35
source lines and about 500 bytes of text.
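
A userspace analogue of the conversion, assuming malloc()/memset() and
calloc() stand in for kmalloc()/memset() and kzalloc().  The struct
mcast_entry and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure standing in for a driver object. */
struct mcast_entry {
    int flags;
    void *query;
    unsigned backoff;
};

/* Before: allocate, then clear in a second step. */
static struct mcast_entry *entry_alloc_old(void)
{
    struct mcast_entry *e = malloc(sizeof(*e));

    if (e)
        memset(e, 0, sizeof(*e));
    return e;
}

/* After: one call returns zeroed memory, so no explicit clearing. */
static struct mcast_entry *entry_alloc_new(void)
{
    return calloc(1, sizeof(struct mcast_entry));
}

/* Check that no member needs a separate "= 0" or "= NULL" assignment. */
static int entry_is_clear(const struct mcast_entry *e)
{
    assert(e != NULL);
    return e->flags == 0 && e->query == NULL && e->backoff == 0;
}
```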

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-11-02 07:23:14 -08:00
Roland Dreier 3bc12e75b2 [IPoIB] cleanups: fix comment, remove useless variables
Minor cleanups: fix a misleading comment, and get rid of attr_mask
variables that are only used to hold constants (just use the constants
directly).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-31 07:10:33 -08:00
Roland Dreier a20583a7c2 [IPoIB] use spin_trylock_irqsave()
Use spin_trylock_irqsave() in ipoib_start_xmit() instead of
reinventing it out of local_irq_save(), spin_trylock() and
local_irq_restore().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-29 13:54:40 -07:00
Roland Dreier 1993d683f3 [IPoIB] Drop RX packets when out of memory
Change the way IPoIB handles RX packets when it can't allocate a new
receive skbuff.  If the allocation of a new receive skb fails, we now
drop the packet we just received and repost the original receive skb.
This means that the receive ring always stays full and we don't have
to monkey around with trying to schedule a refill task for later.
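
The policy described above can be sketched in userspace like this.  It
is an illustration under assumptions: rx_slot, rx_complete() and the
injectable allocator are hypothetical, and malloc() stands in for skb
allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_BUF_SIZE 2048 /* hypothetical receive buffer size */

struct rx_slot { void *buf; }; /* one entry of the receive ring */

typedef void *(*alloc_fn)(size_t);

/*
 * Handle a completed receive on one ring slot.  Returns the received
 * buffer to pass up the stack, or NULL if the packet was dropped.
 * In both cases the slot still holds a buffer afterwards, so the ring
 * stays full and no deferred refill is needed.
 */
static void *rx_complete(struct rx_slot *slot, alloc_fn alloc)
{
    void *received = slot->buf;
    void *fresh = alloc(RX_BUF_SIZE);

    assert(received != NULL);
    if (fresh == NULL)
        return NULL;    /* drop: the old buffer stays in the slot */
    slot->buf = fresh;  /* replacement goes into the ring */
    return received;
}

/* Allocator that always fails, to exercise the drop path. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```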

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-28 15:30:34 -07:00
Roland Dreier 4b2d319b53 [IPoIB] Improve ipoib_timeout() output
Use jiffies_to_msecs() to print a human-readable time, so that we
don't have to worry about what HZ is configured to, and
print out a few values to make post-mortem analysis easier.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-18 12:20:06 -07:00
Roland Dreier 5b6810e048 [IPoIB] Rename ipoib_create_qp() -> ipoib_init_qp() and fix error cleanup
ipoib_create_qp() no longer creates IPoIB's QP, so it shouldn't
destroy the QP on failure -- that unwinding happens elsewhere, so the
current code can cause a double free.  While we're at it, the
function's name should match what it actually does, so rename it to
ipoib_init_qp().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-10-17 15:20:29 -07:00
Roland Dreier d70ed6075f [IPoIB] Rename IPoIB's path_lookup() to avoid name clashes
Rename IPoIB driver's path_lookup() to ipoib_path_lookup() to avoid a
clash with the kernel global path_lookup().  We don't hit this with
the current kernel source, but some external patches seem to trigger
this, and it's cleaner to avoid clashing with global names anyway.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
refs/heads/for-linus
2005-09-28 19:56:57 -07:00
Roland Dreier 8d2cae0651 [PATCH] IPoIB: Don't flush workqueue from within workqueue
ipoib_mcast_restart_task() is always called from within the
single-threaded IPoIB workqueue, so flushing the workqueue from within
the function can lead to a recursion overflow.  But since we're
running in a single-threaded workqueue, we're already synchronized
against other items in the workqueue, so just get rid of the flush in
ipoib_mcast_restart_task().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-09-20 10:52:04 -07:00
Hal Rosenstock ce5b65cc96 [PATCH] IPoIB: Fix SA client retransmission strategy
We got a little mixed up with what the backoff member holds in the
IPoIB multicast group structure: sometimes it was used as a number of
seconds, and sometimes it was used as a number of jiffies.  Fix the
code so that backoff is always in seconds.
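
A small sketch of the "one unit everywhere" rule, assuming the value is
kept in seconds and converted to ticks only where a delay is scheduled.
HZ_ASSUMED, MAX_BACKOFF_SECONDS, the doubling policy and both helper
names are assumptions made for illustration.

```c
#include <assert.h>

#define HZ_ASSUMED 250UL        /* assumed ticks per second */
#define MAX_BACKOFF_SECONDS 16u /* assumed upper bound */

/* Backoff is always stored in seconds; double it up to the cap. */
static unsigned next_backoff_seconds(unsigned backoff_s)
{
    assert(backoff_s >= 1 && backoff_s <= MAX_BACKOFF_SECONDS);
    backoff_s *= 2;
    return backoff_s > MAX_BACKOFF_SECONDS ? MAX_BACKOFF_SECONDS : backoff_s;
}

/* The only place where seconds become ticks. */
static unsigned long backoff_delay_ticks(unsigned backoff_s)
{
    return (unsigned long)backoff_s * HZ_ASSUMED;
}
```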

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-09-18 22:02:38 -07:00
Michael S. Tsirkin 51574e0398 [PATCH] IPoIB: fix module removal race
Since ipoib uses queue_delayed_work to run flush task on port state events,
it must flush scheduled work after unregistering the event handler.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-09-18 22:02:37 -07:00
Michael S. Tsirkin 06c56e44f3 [PATCH] IPoIB: fix memory leak
Fix IPoIB memory leak on device removal.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-09-07 09:48:52 -07:00
Roland Dreier a4d61e8480 [PATCH] IB: move include files to include/rdma
Move the InfiniBand headers from drivers/infiniband/include to include/rdma.
This allows InfiniBand-using code to live elsewhere, and lets us remove the
ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:38 -07:00
Michael S. Tsirkin 1ad62a19f1 [PATCH] IPoIB: Fix device removal race
Currently we may have work scheduled in default kernel workqueue when
the device is going down.  The device could get freed before this
workqueue gets serviced.  I am actually seeing this causing system
hangs.

The following patch fixes this by using ipoib_workqueue which gets
flushed when the device is going down.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:38 -07:00
Roland Dreier 4ce059378c [PATCH] IPoIB: Set full membership bit in P_Keys
Always make sure that the full membership bit is set in the P_Keys
that IPoIB uses.  This makes sure that all hosts join the correct
multicast groups so that hosts that are partial partition members
can talk to the rest of the network.
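
A minimal sketch of the bit manipulation, assuming the membership bit is
the top bit (0x8000) of the 16-bit P_Key.  The macro and helper names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* top bit of the 16-bit P_Key */
#define PKEY_BASE_MASK   0x7fffu /* low 15 bits identify the partition */

/* Force the full membership bit on, leaving the partition bits alone. */
static uint16_t pkey_full_member(uint16_t pkey)
{
    return (uint16_t)(pkey | PKEY_FULL_MEMBER);
}

/* Two P_Keys name the same partition if their low 15 bits match. */
static int pkey_same_partition(uint16_t a, uint16_t b)
{
    assert(PKEY_BASE_MASK == (uint16_t)~PKEY_FULL_MEMBER);
    return ((a ^ b) & PKEY_BASE_MASK) == 0;
}
```

With the bit always set, a partial member (for example 0x7fff) and a
full member (0xffff) derive the same value and so agree on which
multicast groups to join.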

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:37 -07:00
Olaf Hering 2aeba9a03b [PATCH] IB: Remove unnecessary includes of <linux/version.h>
Changing CONFIG_LOCALVERSION rebuilds too much, for no apparent reason.
Remove unneeded includes of <linux/version.h>.

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:36 -07:00
Sean Hefty 97f52eb438 [PATCH] IB: sparse endianness cleanup
Fix sparse warnings.  Use __be* where appropriate.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:35 -07:00
Hal Rosenstock 92a6b34bf4 [PATCH] IB: Eliminate redundant NULL checks
IPoIB: Eliminate NULL checks prior to calling kfree

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:35 -07:00
Roland Dreier 2a1d9b7f09 [PATCH] IB: Add copyright notices
Make some lawyers happy and add copyright notices for people who
forgot to include them when they actually touched the code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26 20:37:35 -07:00
Hal Rosenstock 0dca0f7bf8 [PATCH] [IPoIB] Handle sending of unicast RARP responses
RARP replies are another valid case where IPoIB may need to send a
unicast packet with no neighbour structure.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-07-28 13:17:26 -07:00
Roland Dreier 2181858bb8 [IB/ipoib]: Fix unsigned comparisons to handle wraparound
Fix handling of tx_head/tx_tail comparisons to handle wraparound.
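
One common wraparound-safe approach, sketched in userspace, assuming
free-running 32-bit counters that are never reset.  This illustrates the
general technique and is not the patch itself; RING_SIZE, ring_used()
and ring_full() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 64u /* hypothetical number of ring entries */

/*
 * Entries in flight.  Unsigned subtraction is modulo 2^32, so the
 * result is correct even after head has wrapped past zero and tail has
 * not, whereas a direct "head < tail" comparison would give the wrong
 * answer at that point.
 */
static uint32_t ring_used(uint32_t head, uint32_t tail)
{
    return head - tail;
}

static bool ring_full(uint32_t head, uint32_t tail)
{
    uint32_t used = ring_used(head, tail);

    assert(used <= RING_SIZE);
    return used == RING_SIZE;
}
```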

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-07-27 14:41:32 -07:00
Roland Dreier 9adec1a808 [PATCH] IPoIB: convert to debugfs
Convert IPoIB to use debugfs instead of its own custom debugging filesystem.

Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:26:07 -07:00
Roland Dreier e6ded99cbb [PATCH] IPoIB: fix static rate calculation
Correct and simplify calculation of static rate.  We need to round up the
quotient of (local_rate - path_rate) / path_rate.  To round up we add
(path_rate - 1) to the numerator, so the quotient simplifies to (local_rate -
1) / path_rate.

No idea how I came up with the old formula.
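
The simplification can be checked with a short sketch, assuming integer
rates with local_rate >= path_rate > 0.  The function names are
hypothetical.

```c
#include <assert.h>

/* Rounded-up quotient written out step by step. */
static unsigned static_rate_reference(unsigned local_rate, unsigned path_rate)
{
    unsigned diff;

    assert(path_rate > 0 && local_rate >= path_rate);
    diff = local_rate - path_rate;
    /* ceil(diff / path_rate) using the usual "add divisor - 1" trick */
    return (diff + path_rate - 1) / path_rate;
}

/*
 * Simplified form: (local_rate - path_rate) + (path_rate - 1)
 * collapses to local_rate - 1.
 */
static unsigned static_rate_simplified(unsigned local_rate, unsigned path_rate)
{
    assert(path_rate > 0 && local_rate >= path_rate);
    return (local_rate - 1) / path_rate;
}
```

For example, with local_rate = 12 and path_rate = 4 both forms give 2,
and with equal rates both give 0.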

Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:26:06 -07:00
Hal Rosenstock 62241eb497 [PATCH] IPoIB: set skb->mac.raw on receive
Set skb->mac.raw on receive.  This fixes crashes when this is
dereferenced, for example by netfilter or when PF_PACKET is used.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16 15:26:05 -07:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00