Michael Reutman reports that an AMD/ATI EHCI host controller on one of
his computers does not stop transferring data when an active bulk QH
is unlinked from the async schedule. Apparently that host controller
fails to implement the IAA mechanism correctly when an active QH is
unlinked. This leads to data corruption, because the controller
continues to update the QH in memory when the driver doesn't expect
it. As a result, the next URB submitted for that QH can hang, because
the link pointers for the TD queue have been messed up. This
misbehavior is observed quite regularly.
To be fair, the EHCI spec (section 4.8.2) says that active QHs should
not be unlinked. It goes on to recommend a procedure that involves
waiting for the QH to go inactive before unlinking it. In the real
world this is impractical, not least because the QH may _never_ go
inactive. (What were they thinking?) Sometimes we have no choice but
to unlink an active QH.
In an attempt to avoid the problems that can ensue, this patch changes
how the driver decides when the unlink is complete. In addition to
waiting through two IAA cycles, in cases where the QH was not known to
be inactive beforehand we now wait until a 2-ms period has elapsed
with the host controller making no change to the QH data structure
(the hw_current and hw_token fields in particular). The intuition
here is that after such a long period, the endpoint must be NAKing and
hopefully the QH has been dropped from the host controller's internal
cache. There's no way to know if this reasoning is really valid --
the spec is no help in this regard -- but at least this approach fixes
Michael's problem.
The test for whether the QH is already known to be inactive involves
the reason for unlinking the QH originally. If it was unlinked
because it had halted, or it stopped in response to a short read, or
it overlaid a dummy TD (a silicon bug), then it certainly is inactive.
If it was unlinked because the TD queue was empty and no TDs have been
added to the queue in the meantime, then it must be inactive. Or if
the hardware status indicates that the QH is currently halted (even if
that wasn't the reason for unlinking it), then it is inactive.
Otherwise, if none of those checks apply, we go through the 2-ms
delay.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Michael Reutman <mreutman@epiqsolutions.com>
Tested-by: Michael Reutman <mreutman@epiqsolutions.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch improves the way ehci-hcd handles the iaa_in_progress flag.
The current code is somewhat careless in this regard:
The flag is meaningless when the root hub isn't running, most
particularly after the root hub has been suspended. But in
start_iaa_cycle(), the driver checks the flag before checking
the root hub's state. They should be checked in the opposite
order.
That routine also sets the flag too early, before it has
definitely committed to starting an IAA cycle.
The flag is turned off in end_unlink_async(). Upcoming
changes will call that routine at other times, not just at the
end of an IAA cycle. The two actions are logically separate
(although related), so we separate out a new routine to be
called in place of end_unlink_async() whenever an IAA cycle
ends: end_iaa_cycle().
iaa_in_progress should be turned off when the root hub is
suspended -- we certainly don't want it still to be set when
the root hub resumes. Therefore the call to
end_unlink_async() in ehci_bus_suspend() should also be
replaced with a call to end_iaa_cycle().
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch replaces the "exception" bitflag in the ehci_qh structure
with a more explicit "unlink_reason" bitmask. This is for use in the
following patch, where we will need to have a good idea of the
reason for unlinking a QH, not just "something exceptional happened".
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Michael Reutman <mreutman@epiqsolutions.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ehci-hcd currently unlinks an interrupt QH when it becomes empty, that
is, after its last URB completes. This works well because in almost
all cases, the completion handler for an interrupt URB resubmits the
URB; therefore the QH doesn't become empty and doesn't get unlinked.
When we start using tasklets for URB completion, this scheme won't work
as well. The resubmission won't occur until the tasklet runs, which
will be some time after the completion is queued with the tasklet.
During that delay, the QH will be empty and so will be unlinked
unnecessarily.
To prevent this problem, this patch adds a 5-ms time delay before empty
interrupt QHs are unlinked. Most often, during that time the interrupt
URB will be resubmitted and thus we can avoid unlinking the QH.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1665) changes the way ehci-hcd's end_unlink_async()
routine works in order to avoid recursive execution and to be more
efficient:
Now when an IAA cycle ends, a new one gets started up right
away (if it is needed) instead of waiting until the
just-unlinked QH has been processed.
The async_iaa list is renamed to async_idle, which better
expresses its new purpose: It is now the list of QHs which are
now completely idle and are waiting to be processed by
end_unlink_async().
A new flag is added to track whether an IAA cycle is in
progress, because the list formerly known as async_iaa no
longer stores the QHs waiting for the IAA to finish.
The decision about how many QHs to process when an IAA cycle
ends is now made at the end of the cycle, when we know the
current state of the hardware, rather than at the beginning.
This means a bunch of logic got moved from start_iaa_cycle()
to end_unlink_async().
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1664) converts ehci-hcd's async_unlink, async_iaa, and
intr_unlink from singly-linked lists to standard doubly-linked
list_heads. Originally it didn't seem necessary to use list_heads,
because items are always added to and removed from these lists in FIFO
order. But now with more list processing going on, it's easier to use
the standard routines than continue with a roll-your-own approach.
I don't know if the code ends up being notably shorter, but the
patterns will be more familiar to any kernel hacker.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1671) fixes up an incorrect resolution of a merge
conflict between Greg KH's usb-linus branch and his usb-next branch.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is to pick up the fixes in that branch, and let Alan fix the merge
error in drivers/usb/host/ehci-timer.c better than I just did (as I know
I messed it up...)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1670) fixes a regression caused by commit
6402c796d3 (USB: EHCI: work around
silicon bug in Intel's EHCI controllers). The workaround goes through
two IAA cycles for each QH being unlinked. During the first cycle,
the QH is not added to the async_iaa list (because it isn't fully gone
from the hardware yet), which means that list will be empty.
Unfortunately, I forgot to update the IAA watchdog timer routine. It
thinks that an empty async_iaa list means the timer expiration was an
error, which isn't true any more. This problem didn't show up during
initial testing because the controllers being tested all had working
IAA interrupts. But not all controllers do, and when the watchdog
timer expires, the empty-list check prevents the second IAA cycle from
starting. As a result, URB unlinks never complete. The check needs
to be removed.
Among the symptoms of the regression are processes stuck in D wait
states and hangs during system shutdown.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Stephen Warren <swarren@wwwdotorg.org>
Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
Reported-by: Andreas Bombe <aeb@debian.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1635) rearranges the control-flow logic in
ehci_iaa_watchdog() slightly to agree better with the comments. It
also changes a verbose-debug message to a regular debug message.
Expiration of the IAA watchdog is an unusual event and can lead to
problems; we need to know about it if it happens during debugging. It
should not be necessary to set a "verbose" compilation option.
No behavioral changes other than the debug message. Lots of apparent
changes to the source text, though, because the indentation level was
decreased.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1657) decreases the timeout used by ehci-hcd for polling
the async and periodic schedule statuses. The timeout is currently
set to 20 ms, which is much too high. Controllers should always
update the schedule status within one or two ms of being told to do
so; if they don't then something is wrong.
Furthermore, bug reports have shown that sometimes controllers
(particularly those made by VIA) don't update the status bit at all,
even when the schedule does change state. When this happens, polling
for 20 ms would cause an unnecessarily long delay.
The delay is reduced to somewhere between 2 and 4 ms, depending on the
slop allowed by the kernel's high-res timers.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1649) reverts commit
55bcdce8a8 (USB: EHCI: remove ASS/PSS
polling timeout). That commit was written under the assumption that
some controllers may take a very long time to turn off their async and
periodic schedules. It now appears that in fact the schedules do get
turned off reasonably quickly, but some controllers occasionally leave
the schedules' status bits turned on and consequently ehci-hcd can't
tell that the schedules are off.
VIA controllers in particular have this problem. ehci-hcd tells the
hardware to turn off the async schedule, the schedule does get turned
off, but the status bit remains on. Since the EHCI spec requires that
the schedules not be re-enabled until the previous disable has taken
effect, with an unlimited timeout the async schedule never gets turned
back on. The resulting symptom is that the system is unable to
communicate with USB devices.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Ronald <ronald645@gmail.com>
Reported-and-tested-by: Paul Hartman <paul.hartman@gmail.com>
Reported-and-tested-by: Dieter Nützel <dieter@nuetzel-hh.de>
Reported-and-tested-by: Jean Delvare <khali@linux-fr.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1647) attempts to work around a problem that seems to
affect some nVidia EHCI controllers. They sometimes take a very long
time to turn off their async or periodic schedules. I don't know if
this is a result of other problems, but in any case it seems wise not
to depend on schedule enables or disables taking effect in any
specific length of time.
The patch removes the existing 20-ms timeout for enabling and
disabling the schedules. The driver will now continue to poll the
schedule state at 1-ms intervals until the controller finally decides
to obey the most recent command issued by the driver. Just in case
this hides a problem, a debugging message will be logged if the
controller takes longer than 20 polls.
I don't know if this will actually fix anything, but it can't hurt.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1606) converts two warning messages in the ehci-hcd
driver to debug messages, and adds a little extra information to each.
The log messages occur when an EHCI controller takes too long (more
than 20 ms) to turn its async or periodic schedule on or off. If this
happens at all, it's liable to happen quite often and there's no point
spamming the system log with these warnings. Furthermore, there's
nothing much we can do about it when the problem happens.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Thomas Voegtle <tv@lio96.de>
Cc: stable <stable@vger.kernel.org> # [3.6]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1586) replaces the kernel timer used by ehci-hcd as an
I/O watchdog with an hrtimer event.
Unlike in the current code, the watchdog event is now always enabled
whenever any isochronous URBs are active. This will prevent bugs
caused by the periodic schedule wrapping around with no completion
interrupts; the watchdog handler is guaranteed to scan the isochronous
transfers at least once during each iteration of the schedule. The
extra overhead will be negligible: one timer interrupt every 100 ms.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1585) fixes a bug in ehci-hcd's scheme for scanning
interrupt QHs.
Currently a single routine takes care of scanning everything on the
periodic schedule. Whenever an interrupt occurs, it scans all
isochronous and interrupt URBs scheduled for frames that have elapsed
since the last scan.
This has two disadvantages. The first is relatively minor: An
interrupt QH is likely to end up getting scanned multiple times,
particularly if the last scan was not fairly recent. (The current
code avoids this by maintaining a periodic_stamp in each interrupt
QH.)
The second is more serious. The periodic schedule wraps around. If
the last scan occurred during frame N, and the next scan occurs when
the schedule has gone through an entire cycle and is back at frame N,
the scanning code won't look at any frames other than N. Consequently
it won't see any QHs that completed during frame N-1 or earlier.
The patch replaces the entire frame-based approach for scanning
interrupt QHs with a new routine using a list-based approach, the same
as for async QHs. This has a slight disadvantage, because it means
that all interrupt QHs have to be scanned every time. But it is more
robust than the current approach.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1583) changes ehci-hcd to use an hrtimer event for
unlinking empty (unused) async QHs instead of using a kernel timer.
The check for empty QHs is moved to a new routine, where it doesn't
require going through an entire scan of both the async and periodic
schedules. And it can unlink multiple QHs at once, unlike the current
code.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1582) changes ehci-hcd's strategy for unlinking async
QHs. Currently the driver never unlinks more than one QH at a time.
This can be inefficient and cause unnecessary delays, since a QH
cannot be reused while it is waiting to be unlinked.
The new strategy unlinks all the waiting QHs at once. In practice the
improvement won't be very big, because it's somewhat uncommon to have
two or more QHs waiting to be unlinked at any time. But it does
happen, and in any case, doing things this way makes more sense IMO.
The change requires the async unlinking code to be refactored
slightly. Now in addition to the routines for starting and ending an
unlink, there are new routines for unlinking a single QH and starting
an IAA cycle. This approach is needed because there are two separate
paths for unlinking async QHs:
When a transfer error occurs or an URB is cancelled, the QH
must be unlinked right away;
When a QH has been idle sufficiently long, it is unlinked
to avoid consuming DMA bandwidth uselessly.
In the first case we want the unlink to proceed as quickly as
possible, whereas in the second case we can afford to batch several
QHs together and unlink them all at once. Hence the division of
labor.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1581) replaces the iaa_watchdog kernel timer used by
ehci-hcd with an hrtimer event, in keeping with the general conversion
to high-res timers.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1579) adds an hrtimer event to handle deallocation of
iTDs and siTDs in ehci-hcd.
Because of the frame-oriented approach used by the EHCI periodic
schedule, the hardware can continue to access the Transfer Descriptor
for isochronous (or split-isochronous) transactions for up to a
millisecond after the transaction completes. The iTD (or siTD) must
not be reused before then.
The strategy currently used involves putting completed iTDs on a list
of cached entries and every so often returning them to the endpoint's
free list. The new strategy reduces overhead by putting completed
iTDs back on the free list immediately, although they are not reused
until it is safe to do so.
When the isochronous endpoint stops (its queue becomes empty), the
iTDs on its free list get moved to a global list, from which they will
be deallocated after a minimum of 2 ms. This delay is what the new
hrtimer event is for.
Overall this may not be a tremendous improvement over the current
code, but to me it seems a lot more clear and logical. In addition,
it removes the need for each iTD to keep a reference to the
ehci_iso_stream it belongs to, since the iTD never needs to be moved
back to the stream's free list from the global list.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1578) adds an hrtimer event to handle the death of an
EHCI controller. When a controller dies, it doesn't necessarily stop
running right away. The new event polls at 1-ms intervals to see when
all activity has safely stopped. This replaces a busy-wait polling
loop in the current code.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1577) adds hrtimer support for unlinking interrupt QHs
in ehci-hcd. The current code relies on a fixed delay of either 2 or
55 us, which is not always adequate and in any case is totally bogus.
Thanks to internal caching, the EHCI hardware may continue to access
an interrupt QH for more than a millisecond after it has been unlinked.
In fact, the EHCI spec doesn't say how long to wait before using an
unlinked interrupt QH. The patch sets the delay to 9 microframes
minimum, which ought to be adequate.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1576) adds hrtimer support for managing ehci-hcd's
async schedule. Just as with the earlier change to the periodic
schedule management, two new hrtimer events take care of everything.
One event polls at 1-ms intervals to see when the Asynchronous
Schedule Status (ASS) flag matches the Asynchronous Schedule Enable
(ASE) value; the schedule's state must not be changed until it does.
The other event delays for 15 ms after the async schedule becomes
empty before turning it off.
The new events replace a busy-wait poll and a kernel timer usage.
They also replace the rather illogical method currently used for
indicating the async schedule should be turned off: attempting to
unlink the dedicated QH at the head of the async list.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1573) adds hrtimer support for managing ehci-hcd's
periodic schedule. There are two issues to deal with.
First, the schedule's state (on or off) must not be changed until the
hardware status has caught up with the current command. This is
handled by an hrtimer event that polls at 1-ms intervals to see when
the Periodic Schedule Status (PSS) flag matches the Periodic Schedule
Enable (PSE) value.
Second, the schedule should not be turned off as soon as it becomes
empty. Turning the schedule on and off takes time, so we want to wait
until the schedule has been empty for a suitable period before turning
it off. This is handled by an hrtimer event that gets set to expire
10 ms after the periodic schedule becomes empty.
The existing code polls (for up to 1125 us and with interrupts
disabled!) to check the status, and doesn't implement a delay before
turning off the schedule. Furthermore, if the polling fails then the
driver decides that the controller has died. This has caused problems
for several people; some controllers can take 10 ms or more to turn
off their periodic schedules.
This patch fixes these issues. It also makes the "broken_periodic"
workaround unnecessary; there is no longer any danger of turning off
the periodic schedule after it has been on for less than 1 ms.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch (as1572) begins the conversion of ehci-hcd over to using
high-resolution timers rather than old-fashioned low-resolution kernel
timers. This reduces overhead caused by timer roundoff on systems
where HZ is smaller than 1000. Also, the new timer framework
introduced here is much more logical and easily extended than the
ad-hoc approach ehci-hcd currently uses for timers.
An hrtimer structure is added to ehci_hcd, along with a bitflag array
and an array of ktime_t values, to keep track of which timing events
are pending and what their expiration times are.
Only the infrastructure for the timing operations is added in this
patch. Later patches will add routines for handling each of the
various timing events the driver needs. In some cases the new hrtimer
handlers will replace the existing handlers for ehci-hcd's kernel
timers; as this happens the old timers will be removed. In other
cases the new timing events will replace busy-wait loops.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>