2010-10-08 02:41:20 +08:00
//===-- CodeGen.cpp -------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file implements the common initialization routines for the
// CodeGen library.
//
//===----------------------------------------------------------------------===//

#include "llvm-c/Initialization.h"
2017-06-06 19:49:48 +08:00
#include "llvm/InitializePasses.h"
2014-01-07 19:48:04 +08:00
#include "llvm/PassRegistry.h"
2010-10-08 02:41:20 +08:00

using namespace llvm;

/// initializeCodeGen - Initialize all passes linked into the CodeGen library.
void llvm::initializeCodeGen(PassRegistry &Registry) {
2014-08-22 05:50:01 +08:00
  initializeAtomicExpandPass(Registry);
2012-02-09 05:22:48 +08:00
  initializeBranchFolderPassPass(Registry);
2016-10-06 23:38:53 +08:00
  initializeBranchRelaxationPass(Registry);
2018-04-24 18:32:08 +08:00
  initializeCFIInstrInserterPass(Registry);
2014-02-22 08:07:45 +08:00
  initializeCodeGenPreparePass(Registry);
2010-10-08 02:41:20 +08:00
  initializeDeadMachineInstructionElimPass(Registry);
2016-04-28 11:07:16 +08:00
  initializeDetectDeadLanesPass(Registry);
2015-03-10 06:45:16 +08:00
  initializeDwarfEHPreparePass(Registry);
2012-07-04 08:09:54 +08:00
  initializeEarlyIfConverterPass(Registry);
2018-01-19 14:46:10 +08:00
  initializeEarlyMachineLICMPass(Registry);
2018-01-19 14:08:17 +08:00
  initializeEarlyTailDuplicatePass(Registry);
2015-02-20 10:15:36 +08:00
  initializeExpandISelPseudosPass(Registry);
2017-11-03 20:12:27 +08:00
  initializeExpandMemCmpPassPass(Registry);
2015-03-10 06:45:16 +08:00
  initializeExpandPostRAPass(Registry);
2017-02-01 01:00:27 +08:00
  initializeFEntryInserterPass(Registry);
2017-03-18 13:05:32 +08:00
  initializeFinalizeMachineBundlesPass(Registry);
2015-09-18 04:45:18 +08:00
  initializeFuncletLayoutPass(Registry);
2012-02-09 05:23:13 +08:00
  initializeGCMachineCodeAnalysisPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeGCModuleInfoPass(Registry);
  initializeIfConverterPass(Registry);
2017-03-18 13:05:32 +08:00
  initializeImplicitNullChecksPass(Registry);
Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today. This variant is identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves of Spectre.
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description; please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processor's
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
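As a rough source-level analogy of the indirectbr rewrite described above (not the pass itself), each possible branch target can be given a small integer ID, with dispatch done by a switch over that ID instead of a jump through a stored code address. The names `Target` and `runFrom` are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the blocks an indirectbr could jump to.
enum class Target : int { Entry = 0, Loop = 1, Exit = 2 };

// Dispatch through a switch over small integers rather than through a
// stored code address. With jump-table lowering disabled, a switch like
// this can be lowered to compares and direct conditional branches.
std::string runFrom(Target Start, int Iterations) {
  std::string Trace;
  Target Next = Start;
  for (;;) {
    switch (Next) {
    case Target::Entry:
      Trace += 'E';
      Next = Iterations > 0 ? Target::Loop : Target::Exit;
      break;
    case Target::Loop:
      Trace += 'L';
      Next = --Iterations > 0 ? Target::Loop : Target::Exit;
      break;
    case Target::Exit:
      Trace += 'X';
      return Trace;
    }
  }
}
```

Calling `runFrom(Target::Entry, 2)` visits the entry block, the loop block twice, and then the exit block, without ever branching through a computed address in the source.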
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 the thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
When manually applying transformations similar to `-mretpoline` to the
Linux kernel, we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.
When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch-, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO.
Well-tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.
This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
llvm-svn: 323155
2018-01-23 06:05:25 +08:00
  initializeIndirectBrExpandPassPass(Registry);
2016-05-20 04:08:32 +08:00
  initializeInterleavedAccessPass(Registry);
2017-03-18 13:05:32 +08:00
  initializeLiveDebugValuesPass(Registry);
2010-11-30 10:17:10 +08:00
  initializeLiveDebugVariablesPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeLiveIntervalsPass(Registry);
Add LiveRangeShrink pass to shrink live range within BB.
Summary: The LiveRangeShrink pass moves an instruction to right after its definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live ranges within a BB.
Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb
Reviewed By: MatzeB, andreadb
Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32563
llvm-svn: 304371
2017-06-01 07:25:25 +08:00
  initializeLiveRangeShrinkPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeLiveStacksPass(Registry);
  initializeLiveVariablesPass(Registry);
2012-02-09 05:23:13 +08:00
  initializeLocalStackSlotPassPass(Registry);
2015-03-10 06:45:16 +08:00
  initializeLowerIntrinsicsPass(Registry);
2011-07-26 03:25:40 +08:00
  initializeMachineBlockFrequencyInfoPass(Registry);
Implement a block placement pass based on the branch probability and
block frequency analyses. This differs substantially from the existing
block-placement pass in LLVM:
1) It operates on the Machine-IR in the CodeGen layer. This exposes much
more (and more precise) information and opportunities. Also, the
results are more stable due to fewer transforms occurring after the
pass runs.
2) It uses the generalized probability and frequency analyses. These can
model static heuristics, code annotation derived heuristics as well
as eventual profile loading. By basing the optimization on the
analysis interface it can work from any (or a combination) of these
inputs.
3) It uses a more aggressive algorithm, both building chains from the
bottom up to maximize benefit, and using an SCC-based walk to lay out
chains of blocks in a profitable ordering without O(N^2) iterations
which the old pass involves.
The pass is currently gated behind a flag, and not enabled by default
because it still needs to grow some important features. Most notably, it
needs to support loop aligning and careful layout of loop structures
much as done by hand currently in CodePlacementOpt. Once it supports
these, and has sufficient testing and quality tuning, it should replace
both of these passes.
Thanks to Nick Lewycky and Richard Smith for help authoring & debugging
this, and to Jakob, Andy, Eric, Jim, and probably a few others I'm
forgetting for reviewing and answering all my questions. Writing
a backend pass is *sooo* much better now than it used to be. =D
llvm-svn: 142641
2011-10-21 14:46:38 +08:00
  initializeMachineBlockPlacementPass(Registry);
2011-11-02 15:17:12 +08:00
  initializeMachineBlockPlacementStatsPass(Registry);
2015-02-20 10:15:36 +08:00
  initializeMachineCSEPass(Registry);
2015-03-10 06:45:16 +08:00
  initializeMachineCombinerPass(Registry);
  initializeMachineCopyPropagationPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeMachineDominatorTreePass(Registry);
2015-03-10 06:45:16 +08:00
  initializeMachineFunctionPrinterPassPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeMachineLICMPass(Registry);
  initializeMachineLoopInfoPass(Registry);
  initializeMachineModuleInfoPass(Registry);
2017-02-24 15:42:35 +08:00
  initializeMachineOptimizationRemarkEmitterPassPass(Registry);
2017-03-07 05:31:18 +08:00
  initializeMachineOutlinerPass(Registry);
2016-07-30 00:44:44 +08:00
  initializeMachinePipelinerPass(Registry);
2015-03-10 06:45:16 +08:00
  initializeMachinePostDominatorTreePass(Registry);
2017-02-18 08:41:16 +08:00
  initializeMachineRegionInfoPassPass(Registry);
2012-02-09 05:23:13 +08:00
  initializeMachineSchedulerPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeMachineSinkingPass(Registry);
  initializeMachineVerifierPassPass(Registry);
  initializeOptimizePHIsPass(Registry);
2015-03-10 06:45:16 +08:00
  initializePEIPass(Registry);
2010-10-08 02:41:20 +08:00
  initializePHIEliminationPass(Registry);
2017-03-18 13:05:32 +08:00
  initializePatchableFunctionPass(Registry);
2010-10-08 02:41:20 +08:00
  initializePeepholeOptimizerPass(Registry);
2013-12-29 05:56:51 +08:00
  initializePostMachineSchedulerPass(Registry);
2016-04-22 22:43:50 +08:00
  initializePostRAHazardRecognizerPass(Registry);
[CodeGen] Add a new pass for PostRA sink
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated). This avoids executing
the copy on paths where its result isn't needed. This also exposes
additional opportunities for dead copy elimination and shrink wrapping.
These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 this type
of copy instruction is frequently used to move function parameters (PhyReg)
into virtual registers in the entry block.
For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.
```
%bb.0:
  %wzr = SUBSWri %w1, 1
  %w19 = COPY %w0
  Bcc 11, %bb.2
%bb.1:
  Live Ins: %w19
  BL @fun
  %w0 = ADDWrr %w0, %w19
  RET %w0
%bb.2:
  %w0 = COPY %wzr
  RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.
With this change I observed 12% more shrink-wrapping candidates and 13% more dead copies deleted in spec2000/2006/2017 on AArch64.
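The sinking condition described above can be sketched on a simplified model. This is an illustration only, not the pass's code: `Block` and `findSinkTarget` are hypothetical names, and real machine IR tracks far more state than the two register sets assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified, hypothetical model of a machine basic block.
struct Block {
  std::set<std::string> LiveIns;  // registers live on entry to the block
  std::set<std::string> UsedRegs; // registers read inside the block
};

// Returns the index of the single successor that a COPY defining DstReg
// could be sunk into, or -1 if the COPY has to stay where it is.
int findSinkTarget(const std::string &DstReg, const Block &Cur,
                   const std::vector<Block> &Succs) {
  // The COPY must not be used in the current block.
  if (Cur.UsedRegs.count(DstReg))
    return -1;
  int Target = -1;
  for (std::size_t I = 0; I < Succs.size(); ++I) {
    if (!Succs[I].LiveIns.count(DstReg))
      continue;
    // Live-in to more than one successor would require duplicating it.
    if (Target != -1)
      return -1;
    Target = static_cast<int>(I);
  }
  return Target;
}
```

With the machine IR above modelled as an entry block that reads only `%w0` and `%w1`, a first successor with `%w19` live-in, and a second successor without it, `findSinkTarget("%w19", ...)` selects the first successor.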
Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz
Reviewed By: sebpop
Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D41463
llvm-svn: 328237
2018-03-23 04:06:47 +08:00
  initializePostRAMachineSinkingPass(Registry);
2012-02-09 05:23:13 +08:00
  initializePostRASchedulerPass(Registry);
2016-06-25 04:13:42 +08:00
  initializePreISelIntrinsicLoweringLegacyPassPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeProcessImplicitDefsPass(Registry);
2017-06-03 06:46:26 +08:00
  initializeRABasicPass(Registry);
2017-09-09 08:52:46 +08:00
  initializeRegAllocFastPass(Registry);
2016-11-15 05:50:13 +08:00
  initializeRAGreedyPass(Registry);
2011-06-27 06:34:10 +08:00
  initializeRegisterCoalescerPass(Registry);
2016-06-01 06:38:06 +08:00
  initializeRenameIndependentSubregsPass(Registry);
2017-05-10 08:39:22 +08:00
  initializeSafeStackLegacyPassPass(Registry);
2017-05-15 19:30:54 +08:00
  initializeScalarizeMaskedMemIntrinPass(Registry);
[ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example, and to avoid introducing regressions, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that performs a call in only one branch
of an if:
  define i32 @f(i32 %a, i32 %b) {
    %tmp = alloca i32, align 4
    %tmp2 = icmp slt i32 %a, %b
    br i1 %tmp2, label %true, label %false
  true:
    store i32 %a, i32* %tmp, align 4
    %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
    br label %false
  false:
    %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
    ret i32 %tmp.0
  }
On AArch64 this code generates (with the CFI directives removed for
readability):
  _f: ; @f
  ; BB#0:
    stp x29, x30, [sp, #-16]!
    mov x29, sp
    sub sp, sp, #16 ; =16
    cmp w0, w1
    b.ge LBB0_2
  ; BB#1: ; %true
    stur w0, [x29, #-4]
    sub x1, x29, #4 ; =4
    mov w0, wzr
    bl _doSomething
  LBB0_2: ; %false
    mov sp, x29
    ldp x29, x30, [sp], #16
    ret
With shrink-wrapping we could generate:
  _f: ; @f
  ; BB#0:
    cmp w0, w1
    b.ge LBB0_2
  ; BB#1: ; %true
    stp x29, x30, [sp, #-16]!
    mov x29, sp
    sub sp, sp, #16 ; =16
    stur w0, [x29, #-4]
    sub x1, x29, #4 ; =4
    mov w0, wzr
    bl _doSomething
    add sp, x29, #16 ; =16
    ldp x29, x30, [sp], #16
  LBB0_2: ; %false
    ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that performs the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore points into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need
hacks to properly avoid frequently executed points. Instead, it relies on
dominance and loop properties.
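To make the dominance part of that idea concrete, here is a minimal sketch that picks a candidate save point as the nearest common dominator of every block that needs the frame. It is not the code in ShrinkWrap.cpp: the `IDom` array representation and all function names are assumptions for illustration, and the loop check and the matching restore point (which would use post-dominance) are left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dominator tree: IDom[B] is the immediate dominator of
// block B, and the entry block (index 0) is its own immediate dominator.
int depthOf(const std::vector<int> &IDom, int B) {
  int Depth = 0;
  while (IDom[B] != B) {
    B = IDom[B];
    ++Depth;
  }
  return Depth;
}

// Walk both blocks up the dominator tree until they meet.
int nearestCommonDominator(const std::vector<int> &IDom, int A, int B) {
  int DepthA = depthOf(IDom, A);
  int DepthB = depthOf(IDom, B);
  while (DepthA > DepthB) {
    A = IDom[A];
    --DepthA;
  }
  while (DepthB > DepthA) {
    B = IDom[B];
    --DepthB;
  }
  while (A != B) {
    A = IDom[A];
    B = IDom[B];
  }
  return A;
}

// Candidate save point: a block that dominates every block using the frame.
int findSavePoint(const std::vector<int> &IDom,
                  const std::vector<int> &FrameUsers) {
  assert(!FrameUsers.empty() && "need at least one block using the frame");
  int Save = FrameUsers.front();
  for (int B : FrameUsers)
    Save = nearestCommonDominator(IDom, Save, B);
  return Save;
}
```

For the motivating example, with blocks numbered entry = 0, %true = 1, %false = 2 and only %true needing the frame, the candidate save point is block 1 rather than the entry block.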
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overridden on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX methods to cope with basic blocks that are
not necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save points and restore points; the impacted
components would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore points instead of one
pointer.
- PEI: Should loop over the save points and restore points.
Anyhow, at least for this first iteration, I do not believe it is worthwhile
to support the complex cases. We should revisit that when we have motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507
2015-05-06 01:38:16 +08:00
  initializeShrinkWrapPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeSlotIndexesPass(Registry);
2015-02-20 10:15:36 +08:00
  initializeStackColoringPass(Registry);
2015-03-10 06:45:16 +08:00
  initializeStackMapLivenessPass(Registry);
  initializeStackProtectorPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeStackSlotColoringPass(Registry);
2018-01-19 14:08:17 +08:00
  initializeTailDuplicatePass(Registry);
2012-02-04 10:56:45 +08:00
  initializeTargetPassConfigPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeTwoAddressInstructionPassPass(Registry);
2012-02-09 05:23:13 +08:00
  initializeUnpackMachineBundlesPass(Registry);
2016-07-08 11:32:49 +08:00
  initializeUnreachableBlockElimLegacyPassPass(Registry);
2010-10-08 02:41:20 +08:00
  initializeUnreachableMachineBlockElimPass(Registry);
  initializeVirtRegMapPass(Registry);
2012-06-09 07:44:45 +08:00
  initializeVirtRegRewriterPass(Registry);
2015-03-10 06:45:16 +08:00
  initializeWinEHPreparePass(Registry);
2017-03-18 13:05:32 +08:00
  initializeXRayInstrumentationPass(Registry);
2017-11-03 07:37:32 +08:00
  initializeMIRCanonicalizerPass(Registry);
2010-10-08 02:41:20 +08:00
}

void LLVMInitializeCodeGen(LLVMPassRegistryRef R) {
  initializeCodeGen(*unwrap(R));
}
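
The shape of this file, one library-level function that calls a per-pass initializer for each pass, can be modelled without LLVM. The sketch below is a stand-in, not LLVM's implementation: `MockRegistry`, `initializeFooPass`, `initializeBarPass`, and `initializeMockCodeGen` are hypothetical, and it assumes each per-pass initializer registers its pass at most once per process.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a pass registry; it only records pass names.
class MockRegistry {
  std::set<std::string> Names;
  int Registrations = 0;

public:
  void registerPass(const std::string &Name) {
    Names.insert(Name);
    ++Registrations;
  }
  bool contains(const std::string &Name) const {
    return Names.count(Name) != 0;
  }
  int registrations() const { return Registrations; }
};

// Each per-pass initializer does its work at most once per process
// (assumed here; modelled with a plain function-local flag).
void initializeFooPass(MockRegistry &Registry) {
  static bool Done = false;
  if (Done)
    return;
  Done = true;
  Registry.registerPass("foo");
}

void initializeBarPass(MockRegistry &Registry) {
  static bool Done = false;
  if (Done)
    return;
  Done = true;
  Registry.registerPass("bar");
}

// Library-level entry point: one call per pass, mirroring the layout of
// initializeCodeGen above.
void initializeMockCodeGen(MockRegistry &Registry) {
  initializeBarPass(Registry);
  initializeFooPass(Registry);
}
```

Under that assumption, calling `initializeMockCodeGen` twice on the same registry leaves exactly two registrations, so a client can call the library-level function defensively without tracking whether it already ran.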