We've observed this test being significantly flaky on our Mac CI
machines when running the full check-clang suite. It fails because the
wait_for condition isn't met within 3 seconds. We believe this is
because our CI machines are somewhat underpowered and are heavily
loaded while the full suite is running.
I ran some experiments on increasing the timeout. I ran the full
check-clang suite 100 times with each timeout value and recorded how
many flaky failures we encountered in these tests. The results are:
3 second timeout (baseline): 20 failures
10 second timeout: 14 failures
20 second timeout: 4 failures
30 second timeout: 2 failures
40 second timeout: 1 failure
50 second timeout: 0 failures
60 second timeout: 0 failures
I ran another set of 100 tests for the 50 second timeout and observed
one flaky failure. By contrast, I ended up running check-clang 500 times
for the 60 second timeout and didn't observe a single flaky failure.
That's how the 60 second timeout value used in this patch was derived.
While a 60 second timeout might seem high, keep in mind that:
- This is a timeout, not a sleep; the test should require much less time
in the vast majority of instances, especially on more powerful machines.
- The long timeout is most likely to occur when other tests are also
running at the same time, so the latency of the timeout will also be
masked by the latency of the other tests.
See https://reviews.llvm.org/D58418?id=200123#inline-554211 for where
this timeout was originally introduced and where the possibility of
raising it, should it prove insufficient, was discussed.
Reviewed By: plotfi
Differential Revision: https://reviews.llvm.org/D97878
After D88666, which implemented DirectoryWatcher on Windows, we're
seeing test failures on Chromium's Windows bots.
Try raising the timeout in case the test is failing due to high load on
the machine.
This implements the directory watcher on Windows. It does the most
naive thing for simplicity. ReadDirectoryChangesW is used to monitor
the changes. However, in order to support interruption, we must use
overlapped (asynchronous) I/O rather than the blocking, synchronous
mechanism. We create a thread to post each notification to the consumer
so that monitoring can continue. The two threads communicate via a
locked queue.
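To illustrate the shape of that design, here is a minimal sketch (not the
code in this patch; the path, buffer size, and notify filters are made up,
and the hypothetical Terminate event stands in for the real interruption
mechanism):

    #include <windows.h>

    // Sketch of the watcher thread: overlapped I/O lets us wait on the read
    // completion and a shutdown event at the same time, so destruction can
    // interrupt an otherwise-blocking wait.
    void watchLoop(HANDLE Terminate) {
      // FILE_FLAG_BACKUP_SEMANTICS is required to open a directory handle;
      // FILE_FLAG_OVERLAPPED opts in to asynchronous reads.
      HANDLE Dir = CreateFileW(
          L"C:\\watched", FILE_LIST_DIRECTORY,
          FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, nullptr,
          OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS | FILE_FLAG_OVERLAPPED,
          nullptr);
      OVERLAPPED Overlapped = {};
      Overlapped.hEvent = CreateEventW(nullptr, FALSE, FALSE, nullptr);
      alignas(DWORD) char Buffer[16384];
      DWORD Unused = 0;
      for (;;) {
        ReadDirectoryChangesW(Dir, Buffer, sizeof(Buffer),
                              /*bWatchSubtree=*/FALSE,
                              FILE_NOTIFY_CHANGE_FILE_NAME |
                                  FILE_NOTIFY_CHANGE_LAST_WRITE,
                              &Unused, &Overlapped, nullptr);
        HANDLE Handles[2] = {Terminate, Overlapped.hEvent};
        if (WaitForMultipleObjects(2, Handles, FALSE, INFINITE) ==
            WAIT_OBJECT_0) {
          CancelIo(Dir); // Shutdown requested; abandon the pending read.
          break;
        }
        DWORD BytesReturned = 0;
        GetOverlappedResult(Dir, &Overlapped, &BytesReturned, FALSE);
        // Walk the FILE_NOTIFY_INFORMATION records in Buffer and push events
        // onto the locked queue for the notifier thread to deliver.
      }
      CloseHandle(Overlapped.hEvent);
      CloseHandle(Dir);
    }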
Differential Revision: https://reviews.llvm.org/D88666
Reviewed By: Adrian McCarthy
I observed two bugs in the DirectoryWatcher on macOS:
1. We were calling FSEventStreamStop and FSEventStreamInvalidate before
we had called FSEventStreamStart and FSEventStreamSetDispatchQueue, if
the DirectoryWatcher was destroyed before the initial async work was
done. This violates the requirements of the FSEvents API (see the
ordering sketch below).
2. Calls to Receiver could race between the initial work and the
invalidation during destruction.
The second issue is easier to see when using TSan.
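For reference, the lifecycle ordering the FSEvents API requires looks
roughly like this (a sketch with placeholder callback, context, paths, and
queue; not the patch itself):

    #include <CoreServices/CoreServices.h>
    #include <dispatch/dispatch.h>

    void fsEventsLifecycle(FSEventStreamCallback Callback,
                           FSEventStreamContext *Context,
                           CFArrayRef PathsToWatch, dispatch_queue_t Queue) {
      // Setup: create the stream, attach a queue, then start, in that order.
      FSEventStreamRef Stream = FSEventStreamCreate(
          nullptr, Callback, Context, PathsToWatch,
          kFSEventStreamEventIdSinceNow, /*latency=*/0.0,
          kFSEventStreamCreateFlagFileEvents);
      FSEventStreamSetDispatchQueue(Stream, Queue);
      FSEventStreamStart(Stream);

      // Teardown: Stop and Invalidate are only valid once Start and
      // SetDispatchQueue have actually happened.
      FSEventStreamStop(Stream);
      FSEventStreamInvalidate(Stream);
      FSEventStreamRelease(Stream);
    }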
Differential Revision: https://reviews.llvm.org/D74371
rdar://59215667
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet; I'll do that in a follow-up.
I have also replaced all instances of
"auto DW = DirectoryWatcher::create" with
"llvm::Expected<std::unique_ptr<DirectoryWatcher>> DW = DirectoryWatcher::create"
to make it clearer that DirectoryWatcher::create returns an Expected.
I've also allowed logAllUnhandledErrors to consume errors in the cases
where DirectoryWatcher::create produces them.
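Put together, a call site now looks roughly like this (a sketch; Path and
Receiver are placeholder names, and the exact create() parameters may
differ slightly):

    llvm::Expected<std::unique_ptr<DirectoryWatcher>> DW =
        DirectoryWatcher::create(Path, Receiver, /*WaitForInitialSync=*/true);
    if (!DW) {
      // Consume the error so an unchecked Expected doesn't abort at runtime.
      llvm::logAllUnhandledErrors(DW.takeError(), llvm::errs(),
                                  "DirectoryWatcher::create failed: ");
      return;
    }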
Differential Revision: https://reviews.llvm.org/D65829
llvm-svn: 368108
Prior to this patch, Unix-style errno error reporting from the inotify
layer was used by DirectoryWatcher::create to simply return a nullptr on
error. This would generally be OK, except that in LLVM we have much more
robust error reporting through the facilities of llvm::Expected.
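The shape of the fix is roughly the following (a sketch, not the exact
code in the patch; createInotifyFD is a made-up helper name):

    #include "llvm/Support/Error.h"
    #include <cerrno>
    #include <cstring>
    #include <sys/inotify.h>

    static llvm::Expected<int> createInotifyFD() {
      int InotifyFD = inotify_init();
      if (InotifyFD == -1)
        // Surface errno through llvm::Error instead of returning a sentinel.
        return llvm::make_error<llvm::StringError>(
            llvm::Twine("inotify_init failed: ") + strerror(errno),
            llvm::inconvertibleErrorCode());
      return InotifyFD;
    }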
The other critical thing I stumbled across was that the unit tests for
DirectoryWatcher were not failing abruptly when inotify_init() reported
an error, but would continue with the testing and eventually hit a
deadlock in a pathological machine state (i.e. in the unit test, the
nullptr returned by ::create was ignored).
Generally this pathological state never happens on any build bot, so it is
totally understandable that it was overlooked, but on a Linux desktop running
a dubious desktop environment (which I will not name) there is a chance that
said desktop environment could use up enough inotify instances to exceed the
user's limit. These are the conditions that led me to hit the deadlock I am
addressing in this patch with more robust error handling.
With the new llvm::Expected error handling, when your system runs out of
inotify instances for your user, the unit test is forced to either
handle the error or crash and report the issue to the user, instead of
weirdly deadlocking on a condition variable wait.
Differential Revision: https://reviews.llvm.org/D65704
llvm-svn: 367979
This should not affect actual behavior, but should pessimize the threading less
by avoiding the situation where:
* mutex is still locked
* T1 notifies on condition variable
* T2 wakes to check mutex
* T2 sees mutex is still locked
* T2 waits
* T1 unlocks mutex
* T2 tries again, acquires mutex.
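In code, the fix amounts to this shape (a minimal sketch with made-up
names):

    #include <condition_variable>
    #include <mutex>

    std::mutex Mtx;
    std::condition_variable CV;
    bool Done = false;

    void signalDone() {
      {
        std::lock_guard<std::mutex> Lock(Mtx);
        Done = true;
      } // Unlock first...
      CV.notify_one(); // ...then notify, so the waiter wakes into a free mutex.
    }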
Differential Revision: https://reviews.llvm.org/D65708
llvm-svn: 367968
Workaround for FSEvents sometimes sending notifications for events that
happened before the DirectoryWatcher was created.
This caused tests to be flaky on Green Dragon.
llvm-svn: 366138
This reverts commit f561227d13.
- DirectoryWatcher
- Fix the build for platforms that don't have DW implemented.
- Fix the threading dependencies (thanks to compnerd).
llvm-svn: 365954
Asynchronously monitors a specified directory for changes and passes notifications to the provided callback.
Dependency for index-while-building.
Differential Revision: https://reviews.llvm.org/D58418
llvm-svn: 365574