2021-12-23 04:46:06 +08:00
|
|
|
if (DEFINED LLVM_HAVE_TF_AOT OR DEFINED LLVM_HAVE_TF_API)
|
|
|
|
include(TensorFlowCompile)
|
|
|
|
set(LLVM_RAEVICT_MODEL_PATH_DEFAULT "models/regalloc-eviction")
|
|
|
|
|
|
|
|
# This url points to the most recent model which is known to be compatible with
|
|
|
|
# LLVM. When better models are published, this url should be updated to aid
|
|
|
|
# discoverability.
|
2022-02-09 08:53:36 +08:00
|
|
|
set(LLVM_RAEVICT_MODEL_CURRENT_URL "https://github.com/google/ml-compiler-opt/releases/download/regalloc-evict-v1.0/regalloc-evict-e67430c-v1.0.tar.gz")
|
2021-12-23 04:46:06 +08:00
|
|
|
|
|
|
|
if (DEFINED LLVM_HAVE_TF_AOT)
|
|
|
|
tf_find_and_compile(
|
|
|
|
${LLVM_RAEVICT_MODEL_PATH}
|
|
|
|
${LLVM_RAEVICT_MODEL_CURRENT_URL}
|
|
|
|
${LLVM_RAEVICT_MODEL_PATH_DEFAULT}
|
|
|
|
"../Analysis/models/gen-regalloc-eviction-test-model.py"
|
|
|
|
serve
|
|
|
|
action
|
|
|
|
RegallocEvictModel
|
|
|
|
llvm::RegallocEvictModel
|
|
|
|
)
|
|
|
|
endif()
|
|
|
|
|
|
|
|
if (DEFINED LLVM_HAVE_TF_API)
|
|
|
|
list(APPEND MLLinkDeps ${tensorflow_c_api} ${tensorflow_fx})
|
|
|
|
endif()
|
|
|
|
endif()
|
|
|
|
|
[cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so. Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraries:
1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so. This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.
With this change, llvm_map_components_to_libnames(LIB_NAMES "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.
2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set. This is only true because libraries
defined in lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.
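The change in item 1 can be illustrated with a toy model. This is a minimal Python sketch, not the CMake implementation: the `Library` record, its `component` flag, and both filter functions are hypothetical stand-ins for the LLVM_COMPONENT_LIBS global property and for the old "static means lib/" heuristic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Library:
    name: str
    directory: str   # "lib" or "tools"
    static: bool     # how the library happens to be built
    component: bool  # hypothetical flag: explicitly marked as a component library

def old_filter(libs):
    """Old heuristic: assume every static library comes from lib/."""
    return [l.name for l in libs if l.static]

def new_filter(libs):
    """New rule: keep only libraries explicitly marked as component libraries."""
    return [l.name for l in libs if l.component]

# In a BUILD_SHARED_LIBS=ON style build no library is static,
# so the old heuristic selects nothing at all.
shared_build = [
    Library("LLVMCodeGen", "lib", static=False, component=True),
    Library("LLVMSupport", "lib", static=False, component=True),
    Library("LTO", "tools", static=False, component=False),
]

print(old_filter(shared_build))  # heuristic based on linkage
print(new_filter(shared_build))  # explicit marking
```

Under these assumptions the explicit marking gives the same answer regardless of how the libraries are linked, which is one plausible reading of why the shared-library configuration starts working.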
I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:
- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON
Reviewers: beanz, smeenai, compnerd, phosek
Reviewed By: beanz
Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70179
2019-11-14 13:39:58 +08:00
|
|
|
add_llvm_component_library(LLVMCodeGen
|
2009-10-27 03:32:42 +08:00
|
|
|
AggressiveAntiDepBreaker.cpp
|
2010-12-11 02:36:02 +08:00
|
|
|
AllocationOrder.cpp
|
2010-06-15 12:08:14 +08:00
|
|
|
Analysis.cpp
|
2014-08-22 05:50:01 +08:00
|
|
|
AtomicExpandPass.cpp
|
Switch TargetTransformInfo from an immutable analysis pass that requires
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTransformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least one
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but the
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
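The two tricks described above (delegation chaining and the pointer to the top-most pass) can be sketched as follows. This is an illustrative Python model only: the class names, the `arith_cost` and `vector_cost` queries, and the cost values are all hypothetical and do not correspond to the real TTI interface.

```python
class NoTTI:
    """Bottom of the stack: conservative answers for every query."""
    def __init__(self):
        self.top = self  # top-most pass initialized so far

    def arith_cost(self, op):
        return 1

    def vector_cost(self, op, width):
        # One API implemented in terms of another, asked of the top-most pass
        # so that more precise layers above are taken into account.
        return width * self.top.arith_cost(op)

class DelegatingTTI:
    """Base for layered passes: forward anything not overridden to `prev`."""
    def __init__(self, prev):
        self.prev = prev
        self.top = self
        # Tell every pass below that a new top-most pass exists.
        p = prev
        while p is not None:
            p.top = self
            p = getattr(p, "prev", None)

    def arith_cost(self, op):
        return self.prev.arith_cost(op)

    def vector_cost(self, op, width):
        return self.prev.vector_cost(op, width)

class TargetTTI(DelegatingTTI):
    """A layer that only knows one thing: multiplies are expensive."""
    def arith_cost(self, op):
        return 3 if op == "mul" else self.prev.arith_cost(op)

stack = TargetTTI(NoTTI())
print(stack.vector_cost("mul", 4))
print(stack.vector_cost("add", 4))
```

In this sketch `TargetTTI` overrides a single query, yet the vector cost computed by the bottom pass still picks up the more precise multiply cost through `top`.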
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits; this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
2013-01-07 09:37:14 +08:00
|
|
|
BasicTargetTransformInfo.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
BranchFolding.cpp
|
2016-10-06 23:38:53 +08:00
|
|
|
BranchRelaxation.cpp
|
2018-01-22 18:06:50 +08:00
|
|
|
BreakFalseDeps.cpp
|
2020-08-06 08:11:48 +08:00
|
|
|
BasicBlockSections.cpp
|
2009-12-14 15:43:25 +08:00
|
|
|
CalcSpillWeights.cpp
|
2010-07-07 23:15:27 +08:00
|
|
|
CallingConvLower.cpp
|
Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 21:22:19 +08:00
|
|
|
CFGuardLongjmp.cpp
|
[CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by the linear (non-CFG
aware) nature of the unwind tables.
Unlike the `CFIInstrInserter` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustment and save locations of SVE
registers.
This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):
* there is a single basic block, containing the function prologue
* possibly multiple epilogue blocks, where each epilogue block is
complete and self-contained, i.e. CSR restore instructions (and the
corresponding CFI instructions) are not split across two or more
blocks.
* prologue and epilogue blocks are outside of any loops
Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:
- "has a call frame", if the function has executed the prologue, and
has not executed any epilogue
- "does not have a call frame", if the function has not executed the
prologue, or has executed an epilogue
These properties can be computed for each basic block by a single RPO
traversal.
From the point of view of the unwind tables, the "has/does not have
call frame" state at the beginning of each block is determined by the
state at the end of the previous block, in layout order.
Where these states differ, we insert compensating CFI instructions,
which come in two flavours:
- CFI instructions, which reset the unwind table state to the
initial one. This is done by a target specific hook and is
expected to be trivial to implement, for example it could be:
```
.cfi_def_cfa <sp>, 0
.cfi_same_value <rN>
.cfi_same_value <rN-1>
...
```
where `<rN>` are the callee-saved registers.
- CFI instructions, which reset the unwind table state to the one
created by the function prologue. These are the sequence:
```
.cfi_restore_state
.cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.
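The state computation and the layout-order comparison described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the pass itself: the `block` helper, the dictionary-based CFG, and the two fixup labels are hypothetical, every block is assumed reachable from the entry, and the sketch only reports which kind of compensating CFI a block would need instead of emitting directives.

```python
def block(succs, prologue=False, epilogue=False):
    """Hypothetical block record: successors plus prologue/epilogue flags."""
    return {"succs": succs, "prologue": prologue, "epilogue": epilogue}

def reverse_post_order(blocks, entry):
    seen, post = set(), []
    def visit(name):
        seen.add(name)
        for succ in blocks[name]["succs"]:
            if succ not in seen:
                visit(succ)
        post.append(name)
    visit(entry)
    return post[::-1]

def cfi_fixups(blocks, layout):
    entry = layout[0]
    frame_in = {entry: False}  # no call frame on function entry
    frame_out = {}
    # A single RPO traversal suffices because prologue and epilogue
    # blocks are assumed to be outside of any loops.
    for name in reverse_post_order(blocks, entry):
        b = blocks[name]
        has_frame = frame_in[name]
        if b["prologue"]:
            has_frame = True
        if b["epilogue"]:
            has_frame = False
        frame_out[name] = has_frame
        for succ in b["succs"]:
            frame_in.setdefault(succ, has_frame)
    # The unwind tables are linear: what they describe at the start of a block
    # is whatever was in effect at the end of the previous block in layout order.
    fixups = {}
    table_state = frame_out[entry]
    for name in layout[1:]:
        if frame_in[name] != table_state:
            fixups[name] = ("restore_prologue_state" if frame_in[name]
                            else "reset_to_initial")
        table_state = frame_out[name]
    return fixups

cfg = {
    "entry": block(["pro", "early"]),
    "pro": block(["work"], prologue=True),
    "work": block(["epi"]),
    "early": block([]),  # returns without ever setting up a frame
    "epi": block([], epilogue=True),
}
print(cfi_fixups(cfg, ["entry", "pro", "work", "early", "epi"]))
```

With this layout the frameless `early` block sits between two blocks that have a frame, so both it and the following `epi` block need compensation. Placing `early` directly after `entry` removes the need for any fixup in this model.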
Reviewed By: MaskRay, danielkiss, chill
Differential Revision: https://reviews.llvm.org/D114545
2022-04-11 19:08:26 +08:00
|
|
|
CFIFixup.cpp
|
2018-04-24 18:32:08 +08:00
|
|
|
CFIInstrInserter.cpp
|
2010-10-08 02:41:20 +08:00
|
|
|
CodeGen.cpp
|
2021-09-26 13:55:37 +08:00
|
|
|
CodeGenCommonISel.cpp
|
2020-12-30 08:30:16 +08:00
|
|
|
CodeGenPassBuilder.cpp
|
2014-02-22 08:07:45 +08:00
|
|
|
CodeGenPrepare.cpp
|
2020-03-04 07:47:43 +08:00
|
|
|
CommandFlags.cpp
|
2009-10-27 00:59:04 +08:00
|
|
|
CriticalAntiDepBreaker.cpp
|
2013-01-12 04:05:37 +08:00
|
|
|
DeadMachineInstructionElim.cpp
|
2016-04-28 11:07:16 +08:00
|
|
|
DetectDeadLanes.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
DFAPacketizer.cpp
|
2009-05-23 04:36:31 +08:00
|
|
|
DwarfEHPrepare.cpp
|
2012-07-04 08:09:54 +08:00
|
|
|
EarlyIfConversion.cpp
|
2011-01-05 05:10:05 +08:00
|
|
|
EdgeBundles.cpp
|
2021-02-15 09:23:40 +08:00
|
|
|
EHContGuardCatchret.cpp
|
2018-01-22 18:06:33 +08:00
|
|
|
ExecutionDomainFix.cpp
|
2019-09-10 18:39:09 +08:00
|
|
|
ExpandMemCmp.cpp
|
2011-09-26 00:46:00 +08:00
|
|
|
ExpandPostRAPseudos.cpp
|
2017-05-10 17:42:49 +08:00
|
|
|
ExpandReductions.cpp
|
2021-04-30 19:43:48 +08:00
|
|
|
ExpandVectorPredication.cpp
|
2015-06-16 02:44:08 +08:00
|
|
|
FaultMaps.cpp
|
2017-02-01 01:00:27 +08:00
|
|
|
FEntryInserter.cpp
|
2019-06-19 08:25:39 +08:00
|
|
|
FinalizeISel.cpp
|
2020-04-09 19:40:53 +08:00
|
|
|
FixupStatepointCallerSaved.cpp
|
2015-09-18 04:45:18 +08:00
|
|
|
FuncletLayout.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
GCMetadata.cpp
|
|
|
|
GCMetadataPrinter.cpp
|
2015-01-16 03:29:42 +08:00
|
|
|
GCRootLowering.cpp
|
2014-06-14 06:57:59 +08:00
|
|
|
GlobalMerge.cpp
|
2019-06-07 15:35:30 +08:00
|
|
|
HardwareLoops.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
IfConversion.cpp
|
2015-06-16 02:44:27 +08:00
|
|
|
ImplicitNullChecks.cpp
|
Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre.
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processor's
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 the thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
When manually applying transformations similar to `-mretpoline` to the
Linux kernel, we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.
When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch-, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.
This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
llvm-svn: 323155
2018-01-23 06:05:25 +08:00
|
|
|
IndirectBrExpandPass.cpp
|
2010-06-30 07:58:39 +08:00
|
|
|
InlineSpiller.cpp
|
2011-04-02 14:03:35 +08:00
|
|
|
InterferenceCache.cpp
|
[InterleavedAccess] Add a pass InterleavedAccess to identify interleaved memory accesses and transform into target specific intrinsics.
E.g. An interleaved load (Factor = 2):
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%v0 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <0, 2, 4, 6>
%v1 = shuffle <8 x i32> %wide.vec, <8 x i32> undef, <1, 3, 5, 7>
It can be transformed into a ld2 intrinsic in AArch64 backend or a vld2 intrinsic in ARM backend.
E.g. An interleaved store (Factor = 3):
%i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
store <12 x i32> %i.vec, <12 x i32>* %ptr
It can be transformed into a st3 intrinsic in AArch64 backend or a vst3 intrinsic in ARM backend.
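The shuffle masks in the two examples above can be checked with a small model. This is a Python sketch of the index arithmetic only; `shuffle`, `deinterleave` and `interleave` are hypothetical helpers, not LLVM APIs, and plain lists stand in for vectors.

```python
def shuffle(a, b, mask):
    """Model of a two-operand shuffle: index into the concatenation of a and b."""
    both = list(a) + list(b)
    return [both[i] for i in mask]

def deinterleave(wide, factor):
    """Split a wide vector into `factor` strided sub-vectors."""
    return [wide[i::factor] for i in range(factor)]

def interleave(vectors):
    """Merge equally sized vectors element by element."""
    return [x for group in zip(*vectors) for x in group]

# Interleaved load, Factor = 2: the two masks select the even and odd lanes.
wide = [100, 101, 102, 103, 104, 105, 106, 107]
undef = [None] * 8
v0 = shuffle(wide, undef, [0, 2, 4, 6])
v1 = shuffle(wide, undef, [1, 3, 5, 7])
print(v0, v1)

# Interleaved store, Factor = 3: the mask interleaves three 4-element vectors
# taken from the 16-lane concatenation of the two shuffle operands.
a = list(range(0, 8))
b = list(range(8, 16))
mask = [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
print(shuffle(a, b, mask) == interleave([a[0:4], a[4:8], b[0:4]]))
```

The load masks are exactly the stride-2 selections, and the store mask is the element-wise interleaving of lanes 0-3, 4-7 and 8-11 of the concatenated operands, which is the pattern a factor-3 structured store writes.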
Differential Revision: http://reviews.llvm.org/D10533
llvm-svn: 240751
2015-06-26 10:10:27 +08:00
|
|
|
InterleavedAccessPass.cpp
|
2018-11-19 22:26:10 +08:00
|
|
|
InterleavedLoadCombinePass.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
IntrinsicLowering.cpp
|
2022-02-11 07:10:48 +08:00
|
|
|
JMCInstrumenter.cpp
|
2008-11-20 07:18:57 +08:00
|
|
|
LatencyPriorityQueue.cpp
|
2017-02-15 01:21:09 +08:00
|
|
|
LazyMachineBlockFrequencyInfo.cpp
|
2011-08-11 03:04:06 +08:00
|
|
|
LexicalScopes.cpp
|
2010-11-30 10:17:10 +08:00
|
|
|
LiveDebugVariables.cpp
|
2017-12-13 10:51:04 +08:00
|
|
|
LiveIntervals.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
LiveInterval.cpp
|
2010-10-23 07:09:15 +08:00
|
|
|
LiveIntervalUnion.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
LivePhysRegs.cpp
|
2013-01-12 04:05:37 +08:00
|
|
|
LiveRangeCalc.cpp
|
2020-03-23 10:08:29 +08:00
|
|
|
LiveIntervalCalc.cpp
|
2013-01-12 04:05:37 +08:00
|
|
|
LiveRangeEdit.cpp
|
Add LiveRangeShrink pass to shrink live range within BB.
Summary: The LiveRangeShrink pass moves an instruction right after the definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB.
Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb
Reviewed By: MatzeB, andreadb
Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32563
llvm-svn: 304371
2017-06-01 07:25:25 +08:00
|
|
|
LiveRangeShrink.cpp
|
2012-06-09 10:13:10 +08:00
|
|
|
LiveRegMatrix.cpp
|
2017-01-20 08:16:14 +08:00
|
|
|
LiveRegUnits.cpp
|
2017-12-19 07:19:44 +08:00
|
|
|
LiveStacks.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
LiveVariables.cpp
|
2017-10-13 06:57:28 +08:00
|
|
|
LLVMTargetMachine.cpp
|
2010-08-14 09:55:09 +08:00
|
|
|
LocalStackSlotAllocation.cpp
|
2018-01-22 18:06:50 +08:00
|
|
|
LoopTraversal.cpp
|
2016-07-21 03:09:30 +08:00
|
|
|
LowLevelType.cpp
|
2016-01-14 07:56:37 +08:00
|
|
|
LowerEmuTLS.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
MachineBasicBlock.cpp
|
2011-07-26 03:25:40 +08:00
|
|
|
MachineBlockFrequencyInfo.cpp
|
Implement a block placement pass based on the branch probability and
block frequency analyses. This differs substantially from the existing
block-placement pass in LLVM:
1) It operates on the Machine-IR in the CodeGen layer. This exposes much
more (and more precise) information and opportunities. Also, the
results are more stable due to fewer transforms occurring after the
pass runs.
2) It uses the generalized probability and frequency analyses. These can
model static heuristics, code annotation derived heuristics as well
as eventual profile loading. By basing the optimization on the
analysis interface it can work from any (or a combination) of these
inputs.
3) It uses a more aggressive algorithm, both building chains from the
bottom up to maximize benefit, and using an SCC-based walk to layout
chains of blocks in a profitable ordering without O(N^2) iterations
which the old pass involves.
The pass is currently gated behind a flag, and not enabled by default
because it still needs to grow some important features. Most notably, it
needs to support loop aligning and careful layout of loop structures
much as done by hand currently in CodePlacementOpt. Once it supports
these, and has sufficient testing and quality tuning, it should replace
both of these passes.
Thanks to Nick Lewycky and Richard Smith for help authoring & debugging
this, and to Jakob, Andy, Eric, Jim, and probably a few others I'm
forgetting for reviewing and answering all my questions. Writing
a backend pass is *sooo* much better now than it used to be. =D
llvm-svn: 142641
2011-10-21 14:46:38 +08:00
|
|
|
MachineBlockPlacement.cpp
|
2011-06-17 04:22:37 +08:00
|
|
|
MachineBranchProbabilityInfo.cpp
|
2014-08-04 05:35:39 +08:00
|
|
|
MachineCombiner.cpp
|
2012-01-07 11:02:36 +08:00
|
|
|
MachineCopyPropagation.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
MachineCSE.cpp
|
2020-12-17 12:23:29 +08:00
|
|
|
MachineCheckDebugify.cpp
|
2021-12-10 17:06:43 +08:00
|
|
|
MachineCycleAnalysis.cpp
|
2020-04-04 07:18:45 +08:00
|
|
|
MachineDebugify.cpp
|
2014-07-13 05:59:52 +08:00
|
|
|
MachineDominanceFrontier.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
MachineDominators.cpp
|
2017-04-27 07:36:58 +08:00
|
|
|
MachineFrameInfo.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
MachineFunction.cpp
|
2009-08-01 02:50:22 +08:00
|
|
|
MachineFunctionPass.cpp
|
2010-04-03 07:17:14 +08:00
|
|
|
MachineFunctionPrinterPass.cpp
|
2020-08-06 06:34:31 +08:00
|
|
|
MachineFunctionSplitter.cpp
|
2011-12-14 11:50:53 +08:00
|
|
|
MachineInstrBundle.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
MachineInstr.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
MachineLICM.cpp
|
|
|
|
MachineLoopInfo.cpp
|
[ModuloSchedule] Peel out prologs and epilogs, generate actual code
Summary:
This extends the PeelingModuloScheduleExpander to generate prolog and epilog code,
and correctly stitch uses through the prolog, kernel, epilog DAG.
The key concept in this patch is to ensure that all transforms are *local*; only a
function of a block and its immediate predecessor and successor. By defining the problem in this way
we can inductively rewrite the entire DAG using only local knowledge that is easy to
reason about.
For example, we assume that all prologs and epilogs are near-perfect clones of the
steady-state kernel. This means that if a block has an instruction that is predicated out,
we can redirect all users of that instruction to that equivalent instruction in our
immediate predecessor. As all blocks are clones, every instruction must have an equivalent in
every other block.
Similarly we can make the assumption by construction that if a value defined in a block is used
outside that block, the only possible user is its immediate successors. We maintain this
even for values that are used outside the loop by creating a limited form of LCSSA.
This code isn't small, but it isn't complex.
Enabled a bunch of testing from Hexagon. There are a couple of tests not enabled yet;
I'm about 80% sure there isn't buggy codegen but the tests are checking for patterns
that we don't produce. Those still need a bit more investigation. In the meantime we
(Google) are happy with the code produced by this on our downstream SMS implementation,
and believe it generates correct code.
Subscribers: mgorny, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68205
llvm-svn: 373462
2019-10-02 20:46:44 +08:00
|
|
|
MachineLoopUtils.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
MachineModuleInfo.cpp
|
2009-09-16 18:18:36 +08:00
|
|
|
MachineModuleInfoImpls.cpp
|
2021-05-26 08:20:52 +08:00
|
|
|
MachineModuleSlotTracker.cpp
|
2017-11-29 01:58:43 +08:00
|
|
|
MachineOperand.cpp
|
2017-01-26 07:20:33 +08:00
|
|
|
MachineOptimizationRemarkEmitter.cpp
|
2017-03-07 05:31:18 +08:00
|
|
|
MachineOutliner.cpp
|
2020-08-08 03:57:38 +08:00
|
|
|
MachinePassManager.cpp
|
2016-07-30 00:44:44 +08:00
|
|
|
MachinePipeliner.cpp
|
2013-01-12 04:05:37 +08:00
|
|
|
MachinePostDominators.cpp
|
2014-07-20 02:29:29 +08:00
|
|
|
MachineRegionInfo.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
MachineRegisterInfo.cpp
|
2012-01-13 14:30:30 +08:00
|
|
|
MachineScheduler.cpp
|
2010-01-13 09:02:47 +08:00
|
|
|
MachineSink.cpp
|
2019-10-29 03:35:34 +08:00
|
|
|
MachineSizeOpts.cpp
|
2021-12-10 17:06:43 +08:00
|
|
|
MachineSSAContext.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
MachineSSAUpdater.cpp
|
2020-04-09 01:27:17 +08:00
|
|
|
MachineStripDebug.cpp
|
2012-07-27 02:38:11 +08:00
|
|
|
MachineTraceMetrics.cpp
|
2009-05-16 08:33:53 +08:00
|
|
|
MachineVerifier.cpp
|
2021-05-19 07:08:38 +08:00
|
|
|
MIRFSDiscriminator.cpp
|
2021-08-19 07:59:02 +08:00
|
|
|
MIRSampleProfile.cpp
|
2021-05-01 03:31:55 +08:00
|
|
|
MIRYamlMapping.cpp
|
2021-12-21 12:50:55 +08:00
|
|
|
MLRegallocEvictAdvisor.cpp
|
2019-08-31 02:49:50 +08:00
|
|
|
ModuloSchedule.cpp
|
2020-10-26 16:06:17 +08:00
|
|
|
MultiHazardRecognizer.cpp
|
2016-04-19 13:24:47 +08:00
|
|
|
PatchableFunction.cpp
|
2020-01-28 02:05:54 +08:00
|
|
|
MBFIWrapper.cpp
|
2015-06-16 07:52:35 +08:00
|
|
|
MIRPrinter.cpp
|
2015-05-28 02:02:19 +08:00
|
|
|
MIRPrintingPass.cpp
|
2017-06-19 20:53:31 +08:00
|
|
|
MacroFusion.cpp
|
2019-12-05 18:11:32 +08:00
|
|
|
NonRelocatableStringpool.cpp
|
2010-02-12 09:30:21 +08:00
|
|
|
OptimizePHIs.cpp
|
2015-08-28 07:37:36 +08:00
|
|
|
ParallelCG.cpp
|
2010-08-10 13:16:06 +08:00
|
|
|
PeepholeOptimizer.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
PHIElimination.cpp
|
|
|
|
PHIEliminationUtils.cpp
|
2016-04-22 22:43:50 +08:00
|
|
|
PostRAHazardRecognizer.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
PostRASchedulerList.cpp
|
2016-04-23 05:18:02 +08:00
|
|
|
PreISelIntrinsicLowering.cpp
|
2009-11-04 09:32:06 +08:00
|
|
|
ProcessImplicitDefs.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
PrologEpilogInserter.cpp
|
2020-12-02 13:44:06 +08:00
|
|
|
PseudoProbeInserter.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
PseudoSourceValue.cpp
|
2020-03-18 02:45:11 +08:00
|
|
|
RDFGraph.cpp
|
|
|
|
RDFLiveness.cpp
|
|
|
|
RDFRegisters.cpp
|
2018-01-22 18:06:50 +08:00
|
|
|
ReachingDefAnalysis.cpp
|
2012-01-12 06:28:30 +08:00
|
|
|
RegAllocBase.cpp
|
2010-10-23 07:09:15 +08:00
|
|
|
RegAllocBasic.cpp
|
2021-12-14 14:49:57 +08:00
|
|
|
RegAllocEvictionAdvisor.cpp
|
2010-04-22 02:02:42 +08:00
|
|
|
RegAllocFast.cpp
|
2010-12-08 11:26:16 +08:00
|
|
|
RegAllocGreedy.cpp
|
2008-10-05 05:18:50 +08:00
|
|
|
RegAllocPBQP.cpp
|
[mlgo][regalloc] Add score calculation for training
Add the calculation of a score, which will be used during ML training. The
score qualifies the quality of a regalloc policy, and is independent of
what we train (currently, just eviction), or the regalloc algo itself.
We can then use scores to guide training (which happens offline), by
formulating a reward based on score variation - the goal being lowering
scores (currently, that reward is percentage reduction relative to
Greedy's heuristic)
Currently, we compute the score by factoring different instruction
counts (loads, stores, etc) with the machine basic block frequency,
regardless of the instructions' provenance - i.e. they could be due to
the regalloc policy or be introduced previously. This is different from
RAGreedy::reportStats, which accumulates the effects of the allocator
alone. We explored this alternative but found (at least currently) that
the more naive alternative introduced here produces better policies. We
do intend to consolidate the two, however, as we are actively
investigating improvements to our reward function, and will likely want
to re-explore scoring just the effects of the allocator.
In either case, we want to decouple score calculation from allocation
algorithm, as we currently evaluate it after a few more passes after
allocation (also, because score calculation should be reusable
regardless of allocation algorithm).
We intentionally accumulate counts independently because it facilitates
per-block reporting, which we found useful for debugging - for instance,
we can easily report the counts independently, and then cross-reference
with perf counter measurements.
Differential Revision: https://reviews.llvm.org/D115195
2021-12-07 06:59:19 +08:00
|
|
|
RegAllocScore.cpp
|
2011-06-02 10:19:35 +08:00
|
|
|
RegisterClassInfo.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
RegisterCoalescer.cpp
|
2012-04-25 02:06:49 +08:00
|
|
|
RegisterPressure.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
RegisterScavenging.cpp
|
[RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs
This new MIR pass removes redundant DBG_VALUEs.
After the register allocator is done, more precisely, after
the Virtual Register Rewriter, we end up having duplicated
DBG_VALUEs, since some virtual registers are being rewritten
into the same physical register as some existing DBG_VALUEs.
Each DBG_VALUE should indicate (at least before LiveDebugValues)
a variable's assignment, but it is being clobbered for function
parameters during the SelectionDAG since it generates new DBG_VALUEs
after COPY instructions, even though the parameter has no assignment.
For example, if we had a DBG_VALUE $regX as an entry debug value
representing the parameter, then a COPY followed by a
DBG_VALUE $virt_reg, and virtregrewrite then rewrote $virt_reg
into $regX, we'd end up with a redundant DBG_VALUE.
This breaks the definition of the DBG_VALUE since some analysis passes
might be built on top of that premise, and this patch tries to fix
the MIR with respect to that.
This first patch performs a backward scan, by trying to detect a sequence of
consecutive DBG_VALUEs, and to remove all DBG_VALUEs describing one
variable but the last one:
For example:
(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
...
in this case, we can remove (1).
Combined with the forward scan that will be introduced in the next patch
(from this stack), the statistics show that RemoveRedundantDebugValues
removes 15032 instructions when using gdb-7.11 as a testbed.
Differential Revision: https://reviews.llvm.org/D105279
2021-06-28 20:15:31 +08:00
|
|
|
RemoveRedundantDebugValues.cpp
|
2016-06-01 06:38:06 +08:00
|
|
|
RenameIndependentSubregs.cpp
|
2020-09-04 03:38:52 +08:00
|
|
|
MachineStableHash.cpp
|
2019-09-05 05:29:10 +08:00
|
|
|
MIRVRegNamerUtils.cpp
|
2019-09-06 04:44:33 +08:00
|
|
|
MIRNamerPass.cpp
|
2017-11-03 07:37:32 +08:00
|
|
|
MIRCanonicalizerPass.cpp
|
2016-06-11 00:19:46 +08:00
|
|
|
RegisterUsageInfo.cpp
|
|
|
|
RegUsageInfoCollector.cpp
|
2016-06-11 02:37:21 +08:00
|
|
|
RegUsageInfoPropagate.cpp
|
2021-02-13 01:22:28 +08:00
|
|
|
ReplaceWithVeclib.cpp
|
2016-08-27 08:18:31 +08:00
|
|
|
ResetMachineFunctionPass.cpp
|
2022-02-05 12:29:53 +08:00
|
|
|
RegisterBank.cpp
|
|
|
|
RegisterBankInfo.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
SafeStack.cpp
|
2016-06-30 04:37:43 +08:00
|
|
|
SafeStackLayout.cpp
|
2008-11-20 07:18:57 +08:00
|
|
|
ScheduleDAG.cpp
|
|
|
|
ScheduleDAGInstrs.cpp
|
|
|
|
ScheduleDAGPrinter.cpp
|
2011-01-10 05:31:39 +08:00
|
|
|
ScoreboardHazardRecognizer.cpp
|
2015-01-29 03:28:03 +08:00
|
|
|
ShadowStackGCLowering.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
ShrinkWrap.cpp
|
2009-08-18 02:47:11 +08:00
|
|
|
SjLjEHPrepare.cpp
|
2009-11-04 09:32:06 +08:00
|
|
|
SlotIndexes.cpp
|
2011-01-06 09:21:53 +08:00
|
|
|
SpillPlacement.cpp
|
2010-07-20 23:41:07 +08:00
|
|
|
SplitKit.cpp
|
2013-01-12 04:05:37 +08:00
|
|
|
StackColoring.cpp
|
2013-12-14 14:53:06 +08:00
|
|
|
StackMapLivenessAnalysis.cpp
|
2013-11-01 06:11:56 +08:00
|
|
|
StackMaps.cpp
|
2016-01-28 00:53:42 +08:00
|
|
|
StackProtector.cpp
|
|
|
|
StackSlotColoring.cpp
|
2019-05-24 16:39:43 +08:00
|
|
|
SwiftErrorValueTracking.cpp
|
2019-06-08 08:05:17 +08:00
|
|
|
SwitchLoweringUtils.cpp
|
2009-11-26 08:32:21 +08:00
|
|
|
TailDuplication.cpp
|
2016-04-09 04:35:01 +08:00
|
|
|
TailDuplicator.cpp
|
2011-12-16 06:58:58 +08:00
|
|
|
TargetFrameLoweringImpl.cpp
|
2012-11-28 10:35:09 +08:00
|
|
|
TargetInstrInfo.cpp
|
2013-01-12 04:05:37 +08:00
|
|
|
TargetLoweringBase.cpp
|
2010-02-16 06:55:13 +08:00
|
|
|
TargetLoweringObjectFileImpl.cpp
|
2011-12-16 06:58:58 +08:00
|
|
|
TargetOptionsImpl.cpp
|
2016-05-10 11:21:59 +08:00
|
|
|
TargetPassConfig.cpp
|
2012-11-28 10:35:09 +08:00
|
|
|
TargetRegisterInfo.cpp
|
2012-09-15 04:26:46 +08:00
|
|
|
TargetSchedule.cpp
|
2016-11-23 06:09:03 +08:00
|
|
|
TargetSubtargetInfo.cpp
|
2019-12-03 19:00:32 +08:00
|
|
|
TypePromotion.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
TwoAddressInstructionPass.cpp
|
|
|
|
UnreachableBlockElim.cpp
|
2018-03-30 01:21:10 +08:00
|
|
|
ValueTypes.cpp
|
2021-10-23 06:08:16 +08:00
|
|
|
VLIWMachineScheduler.cpp
|
2008-09-22 09:08:49 +08:00
|
|
|
VirtRegMap.cpp
|
2018-06-01 06:02:34 +08:00
|
|
|
WasmEHPrepare.cpp
|
2015-01-29 08:41:44 +08:00
|
|
|
WinEHPrepare.cpp
|
XRay: Add entry and exit sleds
Summary:
In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.
There are some caveats here:
1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time, to be unpacked by default for platforms where XRay is not available yet.
2) The generated section for X86 is different from what is described in the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.
Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk
Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D19904
llvm-svn: 275367
2016-07-14 12:06:33 +08:00
|
|
|
XRayInstrumentation.cpp
|
2021-12-23 04:46:06 +08:00
|
|
|
${GeneratedMLSources}
|
2015-02-11 11:28:02 +08:00
|
|
|
|
2020-08-22 21:10:16 +08:00
|
|
|
LiveDebugValues/LiveDebugValues.cpp
|
2020-08-22 19:53:49 +08:00
|
|
|
LiveDebugValues/VarLocBasedImpl.cpp
|
[LiveDebugValues] Add instruction-referencing LDV implementation
This patch imports the instruction-referencing implementation of
LiveDebugValues proposed here:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html
The new implementation is unreachable in this patch; it's the next patch
that enables it behind a command line switch. Briefly, rather than
tracking variable locations by just their location as the 'VarLoc'
implementation does, this implementation does it by value:
* Each value defined in a function is numbered, and propagated through
dataflow,
* Each DBG_VALUE reads a machine value number from a machine location,
* Variable _values_ are propagated through dataflow,
* Variable values are translated back into locations, DBG_VALUEs
inserted to specify where those locations are.
The ultimate aim of this is to enable referring to variable values
throughout post-isel code, rather than locations. Later patches will
build on top of this new LiveDebugValues implementation
-- it can't be done with the VarLoc implementation as we don't have
value information, only locations.
Differential Revision: https://reviews.llvm.org/D83047
2020-08-22 23:07:39 +08:00
|
|
|
LiveDebugValues/InstrRefBasedImpl.cpp
|
2020-08-22 19:53:49 +08:00
|
|
|
|
2015-02-11 11:28:02 +08:00
|
|
|
ADDITIONAL_HEADER_DIRS
|
|
|
|
${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen
|
|
|
|
${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen/PBQP
|
2015-08-30 06:34:34 +08:00
|
|
|
|
2021-12-23 04:46:06 +08:00
|
|
|
LINK_LIBS ${LLVM_PTHREAD_LIB} ${MLLinkDeps}
|
2011-02-19 06:06:14 +08:00
|
|
|
|
2016-11-17 12:36:50 +08:00
|
|
|
DEPENDS
|
|
|
|
intrinsics_gen
|
2021-12-23 04:46:06 +08:00
|
|
|
${MLDeps}
|
2020-10-10 00:41:21 +08:00
|
|
|
|
|
|
|
LINK_COMPONENTS
|
|
|
|
Analysis
|
|
|
|
BitReader
|
|
|
|
BitWriter
|
|
|
|
Core
|
|
|
|
MC
|
|
|
|
ProfileData
|
|
|
|
Scalar
|
|
|
|
Support
|
|
|
|
Target
|
|
|
|
TransformUtils
|
2016-11-17 12:36:50 +08:00
|
|
|
)
|
2012-06-24 21:32:01 +08:00
|
|
|
|
2011-02-19 06:06:14 +08:00
|
|
|
add_subdirectory(SelectionDAG)
|
|
|
|
add_subdirectory(AsmPrinter)
|
2015-05-28 02:02:19 +08:00
|
|
|
add_subdirectory(MIRParser)
|
2016-02-12 03:18:27 +08:00
|
|
|
add_subdirectory(GlobalISel)
|