llvm-project/llvm/test
Keno Fischer 578cf7aae7 [ExecutionDepsFix] Improve clearance calculation for loops
Summary:
In revision rL278321, ExecutionDepsFix learned how to pick a better
register for undef register reads, e.g. for instructions such as
`vcvtsi2sdq`. While this revision improved performance on a good number
of our benchmarks, it unfortunately also caused significant regressions
(up to 3x) on others. These regressions turned out to be caused by loops
such as:

PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
      ^                                  |
      +----------------------------------+

In the previous version of the clearance calculation, we would visit
the blocks in order, remembering for each whether there were any
incoming backedges from blocks that we hadn't processed yet and if
so queuing up the block to be re-processed. However, for loop structures
such as the above, this is clearly insufficient, since the block B
does not have any unknown backedges, so we do not see the false
dependency from the previous iteration's Def of xmm registers in B.
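
A minimal sketch of the previous visiting scheme, to make the gap concrete.
It is a toy model in Python, not the pass's code: the `SUCCS` table, the
`ORDER` list and the function names are hypothetical, and the CFG is the
example loop above with the backedge C -> A. A block is queued for one
re-visit only if some predecessor has not been processed when the block is
first reached.

```python
# Toy model of the example CFG: block -> successors (backedge is C -> A).
SUCCS = {
    "PH": ["A"], "A": ["B"], "B": ["C"],
    "C": ["A", "D"], "D": ["EXIT"], "EXIT": [],
}
ORDER = ["PH", "A", "B", "C", "D", "EXIT"]  # assumed visiting order


def predecessors(succs):
    """Invert the successor table into a predecessor table."""
    preds = {block: [] for block in succs}
    for block, block_succs in succs.items():
        for succ in block_succs:
            preds[succ].append(block)
    return preds


def old_visit_order(succs, order):
    """Visit blocks in order; re-queue only blocks with an unprocessed predecessor."""
    preds = predecessors(succs)
    processed = set()
    visits, requeue = [], []
    for block in order:
        # An unprocessed predecessor means an incoming backedge we know nothing about yet.
        if any(p not in processed for p in preds[block]):
            requeue.append(block)
        visits.append(block)
        processed.add(block)
    # Each queued block gets exactly one second visit, marked with a prime.
    return visits + [block + "'" for block in requeue]
```

On this CFG only A is re-queued, so B is never revisited even though the
Def in B reaches B again around the loop. The listing in the message omits
EXIT; the sketch includes it.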

To fix this, we need to consider all blocks that are part of the loop
and reprocess them once the correct clearance values are known. As
an optimization, we also want to avoid reprocessing any later blocks
that are not part of the loop.

In summary, the iteration order is as follows:
Before: PH A B C D A'
Corrected (Naive): PH A B C D A' B' C' D'
Corrected (w/ optimization): PH A B C A' B' C' D

To facilitate this optimization we introduce two new counters for each
basic block. The first counts how many of its predecessors have
completed primary processing. The second counts how many of its
predecessors have completed all processing (we will call such a block
*done*). Now, the criteria to reprocess a block are as follows:
    - All Predecessors have completed primary processing
    - For x the number of predecessors that have completed primary
      processing *at the time of primary processing of this block*,
      the number of predecessors that are done has reached x.
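
The two counters and the reprocessing criteria can be sketched as a small
Python simulation. This is an illustration under stated assumptions, not
the code from the patch: the field names (`incoming_processed`,
`primary_incoming`, `incoming_completed`), the work list, and the final
clean-up loop are all hypothetical, and the CFG is the example loop with
the backedge C -> A.

```python
# Toy model of the example CFG: block -> successors (backedge is C -> A).
SUCCS = {
    "PH": ["A"], "A": ["B"], "B": ["C"],
    "C": ["A", "D"], "D": ["EXIT"], "EXIT": [],
}
RPO = ["PH", "A", "B", "C", "D", "EXIT"]  # reverse postorder of the CFG


def visit_order(succs, rpo):
    """Return the visit sequence; a primed name marks a reprocessing visit."""
    preds = {b: [] for b in succs}
    for b, block_succs in succs.items():
        for s in block_succs:
            preds[s].append(b)
    info = {b: {"primary_completed": False,
                "incoming_processed": 0,   # preds that finished primary processing
                "primary_incoming": 0,     # x: that count, frozen at our own primary visit
                "incoming_completed": 0}   # preds that are done
            for b in succs}

    def is_done(block):
        i = info[block]
        return (i["primary_completed"]
                and i["incoming_completed"] == i["primary_incoming"]
                and i["incoming_processed"] == len(preds[block]))

    visits = []
    for block in rpo:
        info[block]["primary_completed"] = True
        info[block]["primary_incoming"] = info[block]["incoming_processed"]
        primary = True
        work = [block]
        while work:
            active = work.pop()
            visits.append(active if primary else active + "'")
            done = is_done(active)
            for succ in succs[active]:
                if is_done(succ):
                    continue
                if primary:
                    info[succ]["incoming_processed"] += 1
                if done:
                    info[succ]["incoming_completed"] += 1
                if is_done(succ):
                    work.append(succ)  # both criteria now hold: reprocess
            primary = False
    # Assumed clean-up: give any block that never became done one more visit.
    visits += [b + "'" for b in rpo if not is_done(b)]
    return visits
```

On the example loop this yields PH A B C A' B' C' D EXIT: the loop blocks
are reprocessed as soon as C's primary visit completes A's criteria, and D
is visited only once, after C is done.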

The intuition behind this criterion is as follows:
We need to perform primary processing on all predecessors in order to
find out any direct defs in those predecessors. When predecessors are
done, we also know that we have information about indirect defs (e.g.
defs in block B that were inherited through B->C->A->B). We can't wait
for all predecessors to be done, though, since that would cause cyclic
dependencies. However, it is guaranteed that all those predecessors
that are prior to us in reverse postorder will be done before us. Since
we iterate over the basic blocks in reverse postorder, the number x
above is precisely the count of predecessors prior to us in reverse
postorder.
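
As a small check of that last claim, the following Python sketch computes
a reverse postorder for the example CFG and counts, for each block, the
predecessors that come earlier in it. The `SUCCS` table and the function
names are hypothetical and only model the loop drawn above.

```python
# Toy model of the example CFG: block -> successors (backedge is C -> A).
SUCCS = {
    "PH": ["A"], "A": ["B"], "B": ["C"],
    "C": ["A", "D"], "D": ["EXIT"], "EXIT": [],
}


def reverse_postorder(succs, entry):
    """Depth-first search from entry; reverse the order in which blocks finish."""
    seen, post = set(), []

    def dfs(block):
        seen.add(block)
        for succ in succs[block]:
            if succ not in seen:
                dfs(succ)
        post.append(block)

    dfs(entry)
    return post[::-1]


def preds_before_in_rpo(succs, entry):
    """For each block, count the predecessors that precede it in reverse postorder."""
    rpo = reverse_postorder(succs, entry)
    pos = {block: i for i, block in enumerate(rpo)}
    x = {block: 0 for block in rpo}
    for block in rpo:
        for succ in succs[block]:
            if pos[block] < pos[succ]:  # forward edge with respect to the RPO
                x[succ] += 1
    return x
```

Block A has two predecessors (PH and C), but only PH precedes it in
reverse postorder, so its x is 1: A only has to wait for PH to be done,
not for C, which is what breaks the cycle.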

Reviewers: myatsina
Differential Revision: https://reviews.llvm.org/D28759

llvm-svn: 293571
2017-01-30 23:37:03 +00:00
..
Analysis AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent 2017-01-30 17:09:47 +00:00
Assembler [NVPTX] Auto-upgrade some NVPTX intrinsics to LLVM target-generic code. 2017-01-21 01:00:32 +00:00
Bindings [cmake] Canonicalize CMake booleans to 0/1 for lit interop 2017-01-06 21:33:48 +00:00
Bitcode [ThinLTO] Subsume all importing checks into a single flag 2017-01-05 14:32:16 +00:00
BugPoint
CodeGen [ExecutionDepsFix] Improve clearance calculation for loops 2017-01-30 23:37:03 +00:00
DebugInfo stripDebugInfo() should remove DILocation's found in !llvm.loop metadata 2017-01-28 11:22:05 +00:00
Examples
ExecutionEngine Test RuntimeDyld doesn't crash with R_X86_64_NONE (r293388). 2017-01-30 01:28:42 +00:00
Feature Add intrinsics for constrained floating point operations 2017-01-26 23:27:59 +00:00
FileCheck Commit a test for match-full-lines. 2017-01-09 23:11:25 +00:00
Instrumentation [sanitizer-coverage] emit __sanitizer_cov_trace_pc_guard w/o a preceding 'if' by default. Update the docs, also add deprecation notes around other parts of sanitizer coverage 2017-01-24 00:57:31 +00:00
Integer
JitListener [cmake] Canonicalize CMake booleans to 0/1 for lit interop 2017-01-06 21:33:48 +00:00
LTO IPO, LTO: Plumb the summary from the LTO API into the pass manager. 2017-01-20 22:18:52 +00:00
LibDriver LibDriver: Allow resource files to be archive members. 2016-12-15 19:37:46 +00:00
Linker Renumber testcase metadata nodes after r290153. 2016-12-22 00:45:21 +00:00
MC Fix line endings. 2017-01-30 22:04:23 +00:00
Object Change the llvm-objdump(1) behavior with the -macho flag and inappropriate file types. 2017-01-30 20:53:17 +00:00
ObjectYAML Attempt to fix the testcase in r292824 2017-01-23 20:42:17 +00:00
Other [PM] Port LoopLoadElimination to the new pass manager and wire it into 2017-01-27 01:32:26 +00:00
SymbolRewriter
TableGen TableGen: Fix infinite recursion in RegisterBankEmitter 2017-01-30 15:07:01 +00:00
ThinLTO/X86 [LTO] Add test to show up we don't support ThinLTO yet. 2017-01-24 00:59:00 +00:00
Transforms [InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2) for vectors with splat constants 2017-01-30 23:35:52 +00:00
Unit
Verifier Add intrinsics for constrained floating point operations 2017-01-26 23:27:59 +00:00
YAMLParser
tools [WebAssembly] Add wasm support for llvm-readobj 2017-01-30 23:30:52 +00:00
.clang-format
CMakeLists.txt [llvm-config] Print --system-libs only when static linking 2017-01-06 21:33:54 +00:00
TestRunner.sh
lit.cfg [llvm-config] Print --system-libs only when static linking 2017-01-06 21:33:54 +00:00
lit.site.cfg.in [llvm-config] Print --system-libs only when static linking 2017-01-06 21:33:54 +00:00