Adds more cases to do flogi retry, now also retry
on getting bad response due to either no ELS response
or flogi response payload length not large enough.
In those cases flogi was not retried and that
was leaving lport offline.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Currently timer delay is large and is using msleep to avoid
avoid exchanges collision across lport reset, so instead
do this by initializing exches pool indexes during
reset also.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Its checked after skb freed, so instead have fh_type
cached and then check FC_TYPE_BLS against cached
fh_type value.
This wrong check was causing double exch locking as
reported by Bhanu at
https://lists.open-fcoe.org/pipermail/devel/2011-October/011793.html
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
fix holes and better cache aligned fields.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Re-arrange its fields to avoid padding and have better
cacheline alignments.
Removed not used start_time, end_time and last_pkt_time
fields.
This all reduced this struct size to 448 from 480 and
that also reduced one cacheline on x86_64 beside
eliminating 8 pads. However kept logical fields together.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
In commit 6a716a8, while releasing the DDP context in case frame_send() failed,
the frame may already be freed, so we should store the pointer to fc_fcp_pkt and
release the DDP context using the locally stored fsp instead of getting fsp from
the fr_fsp(fp) on a frame.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Reported-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Call fc_block_scsi_eh() in all fcoe eh to blocks
the scsi_eh thread for blocked rports.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Reviewed-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Current fc_eh_host_reset leaves lport offline
permanently due to FLOGI response getting
handled by LOGO response from last reset as both
had same exchange id.
So fix this by having end to end exches clean-up
using exchange abort along exches reset
done from fc_eh_host_reset. This would avoid
exchanges collision between the sessions across
the reset. In this case implicit login should have
done that but no aborting support for FIP
frames, so just wait till lport->r_a_tov before
restarting next flogi to ensure all exchanges
are good to use again for next session.
Below is the trace of LOGO from older session
coming ahead of FLOGI response with same exche id
0x203:-
617 86.435165 4e.00.0b -> ff.ff.fc FC ELS LOGO 0x203
618 86.435195 4e.00.0b -> b6.02.00 FC ELS LOGO 0x213
619 86.435220 4e.00.0b -> 18.03.00 FC ELS LOGO 0x223
620 86.435244 4e.00.0b -> 18.02.00 FC ELS LOGO 0x233
621 86.435267 4e.00.0b -> 18.01.00 FC ELS LOGO 0x243
622 86.435349 00.00.00 -> ff.ff.fe FC ELS FLOGI 0x203
623 86.435549 ff.ff.fc -> 4e.00.0b FC ELS ACC (LOGO) 0x203
624 86.438721 ff.ff.fe -> 4e.00.0b FC ELS ACC (FLOGI) 0x203
625 86.442059 18.03.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x223
626 86.443683 b6.02.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x213
627 86.447693 18.01.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x243
628 86.453499 18.02.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x233
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Reviewed-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The lport retry timer hits warn on in case
it has become ready in response from fip
login from fcoe_ctlr_flogi_send(), this is
possible but safe code path, therefore
removing this warn on.
Jun 22 03:16:30 10.0.16.6 [488198.316517] host3: Assigned Port ID 180f02
Jun 22 03:16:32 10.0.16.6 [488200.091561] ------------[ cut here ]------------
Jun 22 03:16:32 10.0.16.6 [488200.091586] WARNING: at
drivers/scsi/libfc/fc_lport.c:1355 fc_lport_timeout+0xd9/0xe0 [libfc]()
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
fc_queuecommand() allocates an FCP packet for each SCSI command and sends
it out on the wire. In the process it stores the reference to the FCP packet
in the scsi_cmnd structure.
Now, in case under stress testing the libfc exchange layer runs out of
exchanges the fc_queuecommand() may not be able to send out commands out on
the wire. In such a scenario if there is an error in sending the FCP packet
out the wire; fc_queuecommand() deletes the FCP packet from internal queue,
releases the FCP packet and returns a SCSI_MLQUEUE_HOST_BUSY status to the
scsi-ml. But, the reference to the FCP packet set in the scsi_cmnd is not
removed from the scsi_cmnd in this code path.
This might lead to a crash under stress testing where the scsi_cmnd failed by
fc_queuecommand() comes up to fc_eh_abort() via scsi eh thread. fc_eh_abort()
will get reference to the FCP packet to be aborted from the scsi_cmnd for
further FCP abort related processing and then try to release the FCP packet
that has already been released.
This patch removes the FCP packet reference from the scsi_cmnd before returning
back from fc_queuecommand() in case of an error in sending out the FCP packet.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The variable on stack, namely cdb_op, is not used but removed.
[ Patch reworked by Robert Love due to invalid patch format ]
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
One change is to cleanup typo in comment for fc_fcp_recv(), another corrects
the misleading comment for fc_fcp_abts_resp().
[ Patch reworked by Robert Love due to invalid patch format ]
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Drop the rx frame having xid with wrong cpu info
or received with xid not matching to our xid.
Not dropping such frame is causing panic as
that causes accessing data struct beyond their
bounds.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
If fail to create workqueue, the newly created cache for exchg has to be
released.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Though defined, FC_MAX_ERROR_CNT is not used. It is used now for CRC error in
the path of receiving FCP frame.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (77 commits)
[SCSI] fix crash in scsi_dispatch_cmd()
[SCSI] sr: check_events() ignore GET_EVENT when TUR says otherwise
[SCSI] bnx2i: Fixed kernel panic due to illegal usage of sc->request->cpu
[SCSI] bfa: Update the driver version to 3.0.2.1
[SCSI] bfa: Driver and BSG enhancements.
[SCSI] bfa: Added support to query PHY.
[SCSI] bfa: Added HBA diagnostics support.
[SCSI] bfa: Added support for flash configuration
[SCSI] bfa: Added support to obtain SFP info.
[SCSI] bfa: Added support for CEE info and stats query.
[SCSI] bfa: Extend BSG interface.
[SCSI] bfa: FCS bug fixes.
[SCSI] bfa: DMA memory allocation enhancement.
[SCSI] bfa: Brocade-1860 Fabric Adapter vHBA support.
[SCSI] bfa: Brocade-1860 Fabric Adapter PLL init fixes.
[SCSI] bfa: Added Fabric Assigned Address(FAA) support
[SCSI] bfa: IOC bug fixes.
[SCSI] bfa: Enable ASIC block configuration and query.
[SCSI] bnx2i: Updated copyright and bump version
[SCSI] bnx2i: Modified to skip CNIC registration if iSCSI is not supported
...
Fix up some trivial conflicts in:
- drivers/scsi/bnx2fc/{bnx2fc.h,bnx2fc_fcoe.c}:
Crazy broadcom version number conflicts
- drivers/target/tcm_fc/tfc_cmd.c
Just trivial cleanups done on adjacent lines
The rcu callback fc_rport_free_rcu() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(fc_rport_free_rcu).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: "James E.J. Bottomley" <James.Bottomley@suse.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Post an FCH_EVT_LIPRESET event on lport reset as
as lport reset occurs on FIP cleat virtual link,
this could be due to change in fcoe vlan and this
event will allow user app fcoemon to switch to
new fcoe vlan.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Problem: Linux based SW target (TCM) connected to windows initiator
was unable to satisfy write request of size > 2K.
Fix: Existing linux implememtation of FCoE stack is expecting sequence
number to match w.r.t incoming framme. When DDP is used on target in
response to write request from initiator, SW stack is notified only
when last data frame arrives and only the pakcket header of last data
frame is posted to NetRx queue of storage. When that last packet was
processed in libfc:Exchange layer, implementation was expecting
sequence number to match, but in this case sequence number which is
embedded in FC Header is assigned by windows initaitor, hence due to
sequence number mismatch post-processing which shall result into
sending RSP is not done. Enhanced the code to utilize the sequence
number of incoming last frame and process the packet so that, it will
eventually complete the write request by sending write response (RSP)
GOOD.
Notes/Dependencies: This patch is validated using windows and linux
initiator to make sure, it doesn't break anything.
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Problem: Existing RPORT state machine continues witg FLOGI/PLOGI
process only after it receices beacon from other end. Once claiming
stage is over (either clain notify or clain repose), beacon is sent
and state machine enters into operational mode where it initiates the
rlogin process (FLOGI/PLOGI) to the peer but before this rlogin is
initiated, exitsing implementation checks if it received beacon from
other end, it beacon is not received yet, rlogin process is not
initiated. Other end initiates FLOGI but peer end keeps on rejecting
FLOGI, hence after 3 retries other end deletes associated rport, then
sends a beacon. Once the beacon is received, peer end now initiates
rlogin to the peer end but since associated rport is deleted FLOGI is
neither accepted nor the reject response send out because rport is
deleted. Hence unable to proceed withg FLOGI/PLOGI process and fails
to establish VN2VN connection.
Fix: VN2VN spec is not standard yet but based on exitsing collateral
on T11, it appears that, both end shall send beacon and enter into
'operational mode' without explictly waiting for beacon from other
end. Fix is to allow the RPORT login process as long as respective
RPORT is created (as part of claim notification / claim response) even
though state of RPORT is INIT. Means don't wait for beacon from peer
end, if peer end initiates FLOGI (means peer end exist and
responding).
Notes: This patch is preparing the FCoE stack for target wrt
offload. This is generic patch and harmless even if applied on storage
initiator because 'else if' condition of function 'fcoe_oem_found'
shall evaluate to TRUE only for targets.
Dependencies: None
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Currently, when seq_send() fails in fc_fcp_send_data(),
fc_fcp_retry_cmd() would complete this failed I/O directly and let
scsi-ml retry. However, target side is not notified which may hang the
target. Instead, we should just bail out from from fc_fcp_send_data
and let scsi-ml times it out and aborts this I/O instead.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
In this case fsp was freed before error handler was invoked,
this is fixed by having SRR fsp reference freed by exch
destructor so that fsp will be always held until it exch
is freed.
Also don't reset fsp->recov_seq since this is needed by
SRR error handler to do exch done.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
In cases exch is already timed out then exch layer could
end up calling resp handler again for its response frame
received after timeout, though in this case fc_exch_timeout
handler would have already called resp with FC_EX_TIMEOUT.
This would cause REC response handler to release its
fsp pkt hold twice instead once and possibly similar issues
with other ELS exchanges in this race.
To avoid this race have resp updated under exch lock
in rx path, the resp would get set to NULL in case
of FC_EX_TIMEOUT under the same lock to prevent resp
callback after FC_EX_TIMEOUT.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
In case frame_send() fails, make sure to let the underlying HW release the DDP
context that has already been set up before calling frame_send().
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
When handling incoming request, if the operation code carried by the
received frame is not RSCN, the frame should be freed as in the RSCN
case, or there is memory leakage.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
Added REC_TOV_CONST intent was to have rec tov as e_d_tov + 1s
but currently it is e_d_tov + 1ms since e_d_tov is stored in ms
unit.
Also returned rec tov by get_fsp_rec_tov is in ms and this ms tov
is used as-is with fc_fcp_timer_set expecting jiffies tov.
Fixed this by having get_fsp_rec_tov return rec tov in jiffies
as e_d_tov + 1s and then use jiffies tov w/ fc_fcp_timer_set.
Also some cleanup, no need to cache get_fsp_rec_tov return value
in local rec_tov at various places.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
As ema_list is already initialized by libfc_host_alloc.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The host_lock is still used to protect the can_queue
value in the Scsi_Host, but it doesn't need to be held
and released by each caller. This patch moves the lock
usage into the fc_fcp_can_queue_ramp_up and
fc_fcp_can_queue_ramp_down routines.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
EM anchors list initialization for only master port was not enough to
keep npiv working as described here:-
https://lists.open-fcoe.org/pipermail/devel/2011-January/011063.html
So this patch moves fc_exch_mgr_list_clone to update npiv ports
EMs once EM anchors list initialized.
Also some cleanup, no need to set lport = NULL as that always
get initialized later.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When abort for an exchange timed out it didn't release the reference to
the exchange resulting in a memory leak.
After discussion with the author of the patch (CC) that introduced this
bug it was suggested to revert that patch.
This reverts commit ea3e2e72ee.
Signed-off by: Neerav Parikh <Neerav.Parikh@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When an fcoe interface is being destroyed; in the process the
fcoe driver will try to release all the resources it had allocated
for that interface including rports. But, it seems that it does not
release the reference held for the name server rport in that process
resulting into a memory leak. This patch fixes that memory leak.
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch enables LLD to listen to rport events and perform LLD
specific operations based on the rport event. This patch also stores
sp_features and spp_type in rdata for further reference by LLD.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Problem:
From initaitor machine, when queried role of target (other end of connection),
it is "initiator", hence SCSI-ml doesn't send any LUN Inquiry commands.
Fix:
If there is a registered target for FC_TYPE_FCP, extend lport's params
(capability) to be target as well, By default lport params are
INITIATOR only. Having this fix, caused initiator to send SCSI LUN
inquiry command to target.
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Problem:
In case of exchange responder case, EMA selection was defaulted to the
last EMA from EMA list (lport.ema_list). If exchange ID is selected
from offload pool and not setup DDP, resulting into incorrect
selection of EMA, and eventually dropping the packet because unable to
find exchange.
Fix:
Enhanced the exchange ID selection (depending upon request type and
exchange responder) Made necessary enhancement in EMA selection
algorithm.
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Target modules using lport->tt.seq_assign() get a hold on the
exchange but have no way of releasing it. Add that.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch removes the use of the Scsi_Host's host_lock
within fc_queuecommand. It also removes the DEF_SCSI_QCMD
usage so that libfc has fully moved on to the new
queuecommand interface.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Reviewed-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When sending an outgoing PRLI as an initiator, get the parameters
from registered providers so that they all get a chance to decide
on roles.
The passive provider is called last, and could override the
initiator role.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When an SCST provider is registered, it needs to know what
local ports are available for configuration as targets.
Add a notifier chain that is invoked when any local port
that is added or deleted.
Maintain a global list of local ports and add an
interator function that calls a given function for
every existing local port. This is used when first
loading a provider.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add a method for setting handler for incoming exchange.
For multi-sequence exchanges, this allows the target driver
to add a response handler for handling subsequent sequences,
and exchange manager resets.
The new function is called fc_seq_set_resp().
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Allow FC-4 provider modules to hook into libfc, mostly for targets.
This should allow any FC-4 module to handle PRLI requests and maintain
process-association states.
Each provider registers its ops with libfc and then will be called for
any incoming PRLI for that FC-4 type on any instance. The provider
can decide whether to handle that particular instance using any method
it likes, such as ACLs or other configuration information.
A count is kept of the number of successful PRLIs from the remote port.
Providers are called back with an implicit PRLO when the remote port
is about to be deleted or has been reset.
fc_lport_recv_req() now sends incoming FC-4 requests to FC-4 providers,
and there is a built-in provider always registered for handling
incoming ELS requests.
The call to provider recv() routines uses rcu_read_lock()
so that providers aren't removed during the call. That lock is very
cheap and shouldn't affect any performance on ELS requests.
Providers can rely on the RCU lock to protect a session lookup as well.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Fix sparse warning for non-ANSI function declaration.
Declare workqueue structs as static.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Robert Love <robert.w.love@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
If we goto out, then it tries to call kfree_skb() on an ERR_PTR which
will oops. Just return directly.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch makes it so that we only have one call to
fc_rport_error. This patch does not completely
consolidate return statements, there is still one return
used when not calling fc_rport_error, but alternative
solutions made the code more confusing.
[ Patch modified by Robert Love ]
[ Patch title and commit message edited by Robert Love
to make it more relevant ]
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Here ticks_left is added to record the result of
wait_for_completion_timeout().
[ Patch title and description edited by Robert Love
to make it more descriptive ]
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The fsp's xfer_ddp is used as indication of the exchange id for the DDPed
I/O. We should always initialize it as FC_XID_UNKNOWN for a newly allocated
fsp, otherwise the fsp allocated in fc_fcp, i.e., not from queuecommand like
LUN RESET that is not doing DDP may still think DDP is setup for it since xid
0 is valid and goes on to call fc_fcp_ddp_done() in fc_fcp_resp() from
fc_tm_done(). So, set xfer_ddp as FC_XID_UNKNOWN in fc_fcp_pkt_alloc() now.
Also removes the setting of fsp->lp as it's already done when fsp is allocated.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Conflicts:
MAINTAINERS
arch/arm/mach-omap2/pm24xx.c
drivers/scsi/bfa/bfa_fcpim.c
Needed to update to apply fixes for which the old branch was too
outdated.
The statistics for InputMegabytes and OutputMegabytes are
misnamed. They're accumulating bytes, not megabytes.
The statistic returned via /sys must be in megabytes, however,
which is what the HBA-API wants. The FCP code needs to accumulate
it in bytes and then divide by 1,000,000 (not 2^20) before it
presented via sysfs.
This affects fcoe.ko only, not fnic. The fnic driver
correctly by accumulating bytes and then converts to megabytes.
I checked that libhbalinux is using the /sys file directly without
conversion.
BTW, qla2xxx does divide by 2^20, which I'm not fixing here.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Frame should be freed in fc_tm_done, this is an updated patch on the one
initially submitted by Hillf Danton.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The timeout for the exchange carrying REC itself is 2 * R_A_TOV_els.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Should not continue when the abort itself is being timeout since in that case
the exchange will be deleted and relesased. We still want to call the
associated response handler to let the layer, e.g., fcp, know the exchange
itself is being timed out.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Do not call fc_io_compl() on fsp w/o any scsi_cmnd, e.g., lun reset is built
inside fc_fcp, not from a scsi command from queuecommnd from scsi-ml, so in
in case target is buggy that is invalid flags in the FCP_RSP, as we have seen
in some SAN Blaze target where all bits in flags are 0, we do not want to call
io_compl on this fsp.
[ Comment block added by Robert Love ]
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This is very helpful to match up the corresponding exchange to the actual I/O
described by the fsp, particularly when you do a side-by-side comparison of
the syslog with your trace.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There seems rdata should get put before return.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There seems info should get freed when error encountered.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There seems info should get freed when error encountered.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
We can easily remove the tgt_flags from fc_fcp_pkt struct
and use rpriv->tgt_flags directly where needed.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Use the rport value for rec_tov for timeout values when
sending fcp commands. Currently, defaults are being used
which may or may not match the advertised values.
The default may cause i/o to timeout on networks that
set this value larger then the default value. To make
the timeout more configurable in the non-REC mode we
remove the FC_SCSI_ER_TIMEOUT completely allowing the
scsi-ml to do the timeout. This removes an unneeded
timer and allows the i/o timeout to be configured
using the scsi-ml knobs.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The fcp packet recovery handler fc_fcp_recover() is called
when errors occurr in a fcp session. Currently it is
generically setting the status code to FC_CMD_RECOVERY for
all error types. This results in DID_BUS_BUSY errors
being returned to the scsi-ml.
DID_BUS_BUSY errors indicate "BUS stayed busy through time
out period" according to scsi.h. Many of the error reported
by fc_rcp_recovery() are pkt errors. Here we update
fc_fcp_recovery to use better host byte codes.
With certain FAST FAIL flags set DID_BUS_BUSY and DID_ERROR
will have different behaviors this was causing dm multipath
to fail quickly in some cases where a retry would be a
better action.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There seems accumulation needed.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There is a typo cleaned, which triggers memory leakage.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The error handler grabs the si->scsi_queue_lock, but
in the case where the fsp pointer is NULL it releases
the scsi_host lock. This can lead to a variety of
system hangs depending on which is used first- the
scsi_host lock or the scsi_queue_lock.
This patch simply unlocks the correct lock when fcp
is NULL.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
For allocating new exch from pool, scanning for free slot in exch
array fluctuates when exch pool is close to exhaustion.
The fluctuation is smoothed, and the scan looks to be O(2).
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There seems that ep should get released, or it will no longer get freed.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The define for fc_seq_exch is unnecessary, since it also appears in scsi/libfc.h
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Move the mid-layer's ->queuecommand() invocation from being locked
with the host lock to being unlocked to facilitate speeding up the
critical path for drivers who don't need this lock taken anyway.
The patch below presents a simple SCSI host lock push-down as an
equivalent transformation. No locking or other behavior should change
with this patch. All existing bugs and locking orders are preserved.
Additionally, add one parameter to queuecommand,
struct Scsi_Host *
and remove one parameter from queuecommand,
void (*done)(struct scsi_cmnd *)
Scsi_Host* is a convenient pointer that most host drivers need anyway,
and 'done' is redundant to struct scsi_cmnd->scsi_done.
Minimal code disturbance was attempted with this change. Most drivers
needed only two one-line modifications for their host lock push-down.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When number of NPIV ports created are greater than the xids
allocated per pool -- for eg., creating 255 NPIV ports on a
system with nr_cpu_ids of 32, with each pool containing 128
xids -- and then generating a link event - for eg.,
shutdown/no shutdown -- on the switch port causes the hang
with the following stack trace.
Call Trace:
schedule_timeout+0x19d/0x230
wait_for_common+0xc0/0x170
__cancel_work_timer+0xcf/0x1b0
fc_disc_stop+0x16/0x30 [libfc]
fc_lport_reset_locked+0x47/0x90 [libfc]
fc_lport_enter_reset+0x67/0xe0 [libfc]
fc_lport_disc_callback+0xbc/0xe0 [libfc]
fc_disc_done+0xa8/0xf0 [libfc]
fc_disc_timeout+0x29/0x40 [libfc]
run_workqueue+0xb8/0x140
worker_thread+0x96/0x110
kthread+0x96/0xa0
child_rip+0xa/0x20
Fix is to not cancel the disc_work if discovery is already
stopped, thus allowing lport state machine to restart and try
discovery again.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Acked-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
It is unlikely but in case if it hits then it would cause panic
due to null cmd ptr, so far only one instance seen recently with
ESX though this was introduced long ago with this commit:-
commit c1ecb90a66
Author: Chris Leech <christopher.leech@intel.com>
Date: Thu Dec 10 09:59:26 2009 -0800
[SCSI] libfc: reduce hold time on SCSI host lock
Currently fsp->cmd is set to NULL w/o scsi_queue_lock before
dequeuing from scsi_pkt_queue and that could cause NULL
fsp->cmd in fc_fcp_cleanup_each_cmd for cmd completing
with fsp->cmd = NULL after fc_fcp_cleanup_each_cmd taken
reference. No need to set fsp->cmd to NULL as this is also
protected by fc_fcp_lock_pkt(), for above race the
fc_fcp_lock_pkt() in fc_fcp_cleanup_each_cmd() will fail
as that cmd is already done.
Mike mentioned same issue at
http://www.open-fcoe.org/pipermail/devel/2010-September/010533.html
Similarly moved sc_cmd->SCp.ptr = NULL under scsi_queue_lock so
that scsi abort error handler won't abort on completed cmds.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Sometimes switch in NPV mode rejects flogi request with DID
zero and in that case flogi is not tried again and port
remains offline, so this patch validates DID for non zero
along with only ACC response to allow flogi retry
for RJT with DID=0 also succeed FLOGI in next try.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This is per Mile Christie feedback since in this case IO
could get retried for tape devices and therefore DID_REQUEUE
cannot be used, more details in this thread.
http://marc.info/?l=linux-scsi&m=127970522630136&w=2
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There does not seem to be a reason why libfc adds a 5
second delay to the user requested value for the dev loss
tmo. There also does not seem to be a reason to allow
setting it to 0 (or really close).
This patch removes the extra 5 sec delay, and for 0 it
sets it to 1 like other fc drivers. We should actually
be able to set it to 0 since the queue_delayed_work API
will just call queue_work, but other drivers set it to 1 in
that case.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The rport port state and flags are set under the host lock,
so this patch calls fc_remote_port_chkready with the host lock
held like is also done in the other fc drivers.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Call fc_lport_error to retry upto max retry count when
FLOGI/SCR/NS gets rejected.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Incoming requests shouldn't require a local exchange if we're
just going to reply with one or two frames and don't expect
anything further. Don't allocate exchanges for such requests
until requested by the upper-layer protocol.
The sequence is always NULL for new requests, so remove
that as an argument to request handlers.
Also change the first argument to lport->tt.seq_els_rsp_send
from the sequence pointer to the received frame pointer, to
supply the exchange IDs and destination ID info.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
For incoming ELS and FCP requests, we often don't require an
exchange and sequence, however, sometimes we do. For those cases,
(primarily FCP requests for targets) add a function to set up
the exchange and sequence.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add functions to fill in an FC header given a request header.
These reduces code lines in fc_lport and fc_rport and works
without an exchange/sequence assigned.
fc_fill_reply_hdr() fills a header for a final reply frame.
fc_fill_hdr() which is similar but allows specifying the
f_ctl parameter.
Add defines for F_CTL values FC_FCTL_REQ and FC_FCTL_RESP.
These can be used for most request and response sequences.
v2 of patch adds a line to copy the frame encapsulation
info from the received frame.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
To pave the way for eliminating exchanges from incoming requests,
add simple inline fc_frame_sid() and fc_frame_did() functions
which get the FC_IDs from the frame header. This can be almost
as efficient as getting them from the sequence/exchange.
Move ntohll, htonll, ntoh24 and hton24 to <scsi/fc_frame.h>
since we need them there and that's included by <scsi/libfc.h>
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The LOGO state hasn't been used in a while, except in a brief
transition to DELETE state while holding the rport mutex.
All port LOGO responses have been ignored as well as any timeout
if we don't get a response.
So this patch just removes LOGO state and simplifies the response handler.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When an exchange is received with a FIP encapsulation, we need
to know that the response must be sent via FIP and what the original
ELS opcode was. This becomes important for VN2VN mode, where we may
receive FLOGI or LOGO from several peer VN_ports, and the LS_ACC or
LS_RJT must be sent FIP-encapsulated with the correct sub-type.
Add a field to the struct fc_frame, fr_encaps, to indicate the
encapsulation values. That term is chosen to be neutral and
LLD-agnostic in case non-FCoE/FIP LLDs might find it useful.
The frame fr_encaps is transferred from the ingress frame to the
exchange by fc_exch_recv_req(), and back to the outgoing frame
by fc_seq_send().
This is taking the last byte in the skb->cb array. If needed,
we could combine the info in sof, eof, flags, and encaps
together into one field, but it'd be better to do that if
and when its needed.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The FIP proposal for VN_port to VN_port point-to-multipoint
operation requires a FLOGI be sent to each remote port.
The FLOGI is sent with the assigned S_ID and D_IDs of the
local and remote ports. This and the response get
FIP-encapsulated for Ethernet.
Add FLOGI state to the remote port state machine.
This will be skipped if not in point-to-multipoint mode.
To reduce a little duplication between PLOGI and FLOGI
response handling, added fc_rport_login_complete(), which
handles the parameters for the rdata struct.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
For VN_port to VN_port mode, the transport sets the port_id and
there's no lport FLOGI. This is similar to FC loop mode.
Add a point_to_multipoint flag that indicates the local port is in
point-to-multipoint mode. This skips FLOGI and discovery.
It also skips resetting the port_id on resets other than link down.
Add function fc_lport_set_local_id() that sets the local port_id.
This is called by libfcoe on behalf of the low-level driver
to set the port_id when the link comes up.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
For VN_port to VN_port mode, FIP will do discovery and needs a
way to find its state from the local port or discovery structure.
It seems that any other LLD that implements its own discovery
would also need something like this.
Replace disc->lport with disc->priv, and use container_of to
find the lport. We could use disc->priv for that, but
container_of is smaller and faster.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Add pre-zeroed space after the allocation for fc_rport_priv
for use by the lower-level driver.
This is primarily for VN2VN FIP mode, but could be used in
other ways someday.
The space required is specified in lport->rport_priv_size.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
To allow LLD to do lookups on rports without grabbing a mutex,
make them RCU-safe. The caller of lport->tt.rport_lookup will
have the choice of holding disc_mutex or the rcu_read_lock().
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
In this case, sync IO fails with EIO(5) errors as:-
"Thread:1 System call error:5 - Input/output error (::pwrite() failed)".
This is due to IO time out while libfc doing link down processing
to block all rports and if timed out IO was at last retry
attempt then it fails to user with EIO error followed by
these log messages.
[77848.612169] host2: rport bf0015: Delete port
[77848.612221] host2: rport e10aef: work delete
[77848.612232] host2: rport e10002: work event 3
[77848.612422] sd 2:0:1:1: [sdi] Unhandled error code
[77848.612426] sd 2:0:1:1: [sdi] Result: hostbyte=DID_ERROR
driverbyte=DRIVER_OK
[77848.612431] sd 2:0:1:1: [sdi] CDB: Write(10): 2a 00 00 00 11 20 00 00 20 00
[77848.612445] end_request: I/O error, dev sdi, sector 4384
[77848.612553] sd 2:0:1:2: [sdj] Unhandled error code
To fix these EIO errors, such timed out incomplete IOs needs
to be re-queued without counting retry attempt and this patch
does that using DID_REQUEUE scsi code.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This is exposed by a mpio test using EMC CLARiiON targets when LUN
tresspassing happens, the burst length from the XFER_READY for the
MODE SELECT(10) is 19 bytes, much smaller than FC_MIN_MAX_PAYLOAD as
256 bytes. This patch removes the related two WARN_ON()s.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Remote ports were restarting indefinitely after getting
rejects in PRLI.
Fix by adding a counter of restarts and limiting that with
the port login retry limit as well.
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch somewhat combines two fixes to remote port handing in libfc.
The first problem was that rport work could be queued on a deleted
and freed rport. This is handled by not resetting rdata->event
ton NONE if the rdata is about to be deleted.
However, that fix led to the second problem, described by
Bhanu Gollapudi, as follows:
> Here is the sequence of events. T1 is first LOGO receive thread, T2 is
> fc_rport_work() scheduled by T1 and T3 is second LOGO receive thread and
> T4 is fc_rport_work scheduled by T3.
>
> 1. (T1)Received 1st LOGO in state Ready
> 2. (T1)Delete port & enter to RESTART state.
> 3. (T1)schdule event_work, since event is RPORT_EV_NONE.
> 4. (T1)set event = RPORT_EV_LOGO
> 5. (T1)Enter RESTART state as disc_id is set.
> 6. (T2)remember to PLOGI, and set event = RPORT_EV_NONE
> 6. (T3)Received 2nd LOGO
> 7. (T3)Delete Port & enter to RESTART state.
> 8. (T3)schedule event_work, since event is RPORT_EV_NONE.
> 9. (T3)Enter RESTART state as disc_id is set.
> 9. (T3)set event = RPORT_EV_LOGO
> 10.(T2)work restart, enter PLOGI state and issues PLOGI
> 11.(T4)Since state is not RESTART anymore, restart is not set, and the
> event is not reset to RPORT_EV_NONE. (current event is RPORT_EV_LOGO).
> 12. Now, PLOGI succeeds and fc_rport_enter_ready() will not schedule
> event_work, and hence the rport will never be created, eventually losing
> the target after dev_loss_tmo.
So, the problem here is that we were tracking the desire for
the rport be restarted by state RESTART, which was otherwise
equivalent to DELETE. A contributing factor is that we dropped
the lock between steps 6 and 10 in thread T2, which allows the
state to change, and we didn't completely re-evaluate then.
This is hopefully corrected by the following minor redesign:
Simplify the rport restart logic by making the decision to
restart after deleting the transport rport. That decision
is based on a new STARTED flag that indicates fc_rport_login()
has been called and fc_rport_logoff() has not been called
since then. This replaces the need for the RESTART state.
Only restart if the rdata is still in DELETED state
and only if it still has the STARTED flag set.
Also now, since we clear the event code much later in the
work thread, allow for the possibility that the rport may
have become READY again via incoming PLOGI, and if so,
queue another event to handle that.
In the problem scenario, the second LOGO received will
cause the LOGO event to occur again.
Reported-by: Bhanu Gollapudi <bprakash@broadcom.com>
Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
lport state is enum not bit mask.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Resubmitting after incorporating Joe's review comment.
Unsolicited PRLO request is now handled by sending LS_ACC,
and then relogin to the remote port if an N-port login
session exists for that remote port.
Note that this patch should be applied on top of Joe Eykholt's
"Fix remote port restart problem" patch.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
As per FC-LS Rev 1.62 table 46, response codes are handled as follows:
1. If the Req executed is true, PRLI is accepted.
2. If Req executed is not set, if resp code is 5,
PRLI is not retried and port is logged out.
3. If resp code is anything apart from 1 or 5, PRLI is retired
upto max retry count.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Retry upto max_rport_retry_count when a target responds with
LS_RJT for a PRLI request.
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch creates a port_id member in struct fc_lport.
This allows libfc to just deal with fc_lport instances
instead of calling into the fc_host to get the port_id.
This change helps in only using symbols necessary for
operation from the libfc structures. libfc still needs
to change the fc_host_port_id() if the port_id changes
so the presentation layer (scsi_transport_fc) can provide
the user with the correct value, but libfc shouldn't
rely on the presentation layer for operational values.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>