Sadly, r324359 caused at least PR36312. There is a patch out for review
but it seems to be taking a bit and we've already had these crashers in
tree for too long. We're hitting this PR in real code now and are
blocked on shipping new compilers as a consequence so I'm reverting us
back to green.
Sorry for the churn due to the stacked changes that I had to revert. =/
llvm-svn: 325420
Travel all chains paths to first non-tokenfactor node can be
exponential work. Add simple redundency check to avoid this.
Fixes PR36264.
llvm-svn: 324491
Instruction Selection
Cleanup cycle/validity checks in ISel (IsLegalToFold,
HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full
search for cycles / dependencies pruning the search when topological
property of NodeId allows.
As part of this propogate the NodeId-based cutoffs to narrow
hasPreprocessorHelper searches.
Reviewers: craig.topper, bogner
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D41293
llvm-svn: 324359
Summary:
In Instruction Selection UpdateChains replaces all matched Nodes'
chain references including interior token factors and deletes them.
This may allow nodes which depend on these interior nodes but are not
part of the set of matched nodes to be left with a dangling dependence.
Avoid this by doing the replacement for matched non-TokenFactor nodes.
Fixes PR36164.
Reviewers: jonpa, RKSimon, bogner
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D42754
llvm-svn: 323977
Summary:
Instruction Selection preserves relative orders of all nodes save
TokenFactors which we treat specially. As a result Node Ids for
TokenFactors may violate the topological ordering and should not be
considered as valid pruning candidates in predecessor search.
Fixes PR35316.
Reviewers: RKSimon, hfinkel
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42701
llvm-svn: 323880
Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it means there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node SelectNodeCommon would reprint the root node before walking the isel table.
It seems better to just print the message before the call to Select so all targets behave the same. And then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search.
There were also some oddities in blank line behavior. Usually due to a \n after a call to SelectionDAGNode::dump which already inserted a new line.
llvm-svn: 323551
Apparently checking the pass structure isn't enough to ensure that we don't fall
back to FastISel, as it's set up as part of the SelectionDAGISel.
llvm-svn: 323369
Ingredients in this patch:
1. Add HANDLE_LIBCALL defs for finite mathlib functions that correspond to LLVM intrinsics.
2. Plumbing to send TargetLibraryInfo down to SelectionDAGLegalize.
3. Relaxed math and library checking in SelectionDAGLegalize::ConvertNodeToLibcall() to choose finite libcalls.
There was a bug about determining the availability of the finite calls that should be fixed with:
rL322010
Not in this patch:
This doesn't resolve the question/bug of clang creating the intrinsic IR in the first place.
There's likely follow-up work needed to support the long double variants better.
There's room for improvement to reduce the code duplication.
Create finite calls that don't originate from a corresponding intrinsic or DAG node?
Differential Revision: https://reviews.llvm.org/D41338
llvm-svn: 322087
When intrinsics are allowed to have mem operands, there
are two ways this can happen. First is an intrinsic
that is marked has having a mem operand, but is not handled
by getTgtMemIntrinsic.
The second way can occur even for intrinsics which do not
have a mem operand. It seems the selector table does
some kind of sorting based on the opcode, and the
mem ref recording can happen in the same scope for
intrinsics that both do and do not have mem refs.
I haven't been able to figure out exactly why this happens
(although it happens even with the matcher optimizations disabled).
I'm not sure if it's worth trying to avoid hitting this for
these nodes since I think it's still reasonable to handle
this in case getTgtMemIntrinic is not implemented.
llvm-svn: 321208
Introduces the AddrFI "addressing mode", which is necessary simply because
it's not possible to write a pattern that directly matches a frameindex.
Ensure callee-saved registers are accessed relative to the stackpointer. This
is necessary as callee-saved register spills are performed before the frame
pointer is set.
Move HexagonDAGToDAGISel::isOrEquivalentToAdd to SelectionDAGISel, so we can
make use of it in the RISC-V backend.
Differential Revision: https://reviews.llvm.org/D39848
llvm-svn: 320353
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
Summary:
1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to
accommodate similar operand appearing in the DAG e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will now look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity
then complex LEAs (having 3 operands) could be factored out e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.
4/ Simplify LEA converts (lea (BASE,1,INDEX,0) --> add (BASE, INDEX) which offers better through put.
PR32755 will be taken care of by this pathc.
Previous patch revisions : r313343 , r314886
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja
Reviewed By: lsaba, RKSimon, jbhateja
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 319543
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
processDbgDeclares assumes pointer size is the same for different addr spaces.
It uses pointer size for addr space 0 for all pointers, which causes assertion
in stripAndAccumulateInBoundsConstantOffsets for amdgcn---amdgiz since
pointer in addr space 5 has different size than in addr space 0.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D40085
llvm-svn: 318370
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.
llvm-svn: 317647
Similar to how llvm::salvagDebugInfo hooks into InstCombine, this adds
a hook that can be invoked before an SDNode that is associated with an
SDDbgValue is erased to capture the effect of the deleted node in a
DIExpression.
The motivating example is an SDDebugValue attached to an ADD operation
that gets folded into a LOAD+OFFSET operation.
rdar://problem/32121503
llvm-svn: 316525
It broke the Chromium / SQLite build; see PR34830.
> Summary:
> 1/ Operand folding during complex pattern matching for LEAs has been
> extended, such that it promotes Scale to accommodate similar operand
> appearing in the DAG.
> e.g.
> T1 = A + B
> T2 = T1 + 10
> T3 = T2 + A
> For above DAG rooted at T3, X86AddressMode will no look like
> Base = B , Index = A , Scale = 2 , Disp = 10
>
> 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
> so that if there is an opportunity then complex LEAs (having 3 operands)
> could be factored out.
> e.g.
> leal 1(%rax,%rcx,1), %rdx
> leal 1(%rax,%rcx,2), %rcx
> will be factored as following
> leal 1(%rax,%rcx,1), %rdx
> leal (%rdx,%rcx) , %edx
>
> 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
> thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314919
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
Reviewed By: lsaba
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314886
Add support for passing SwiftError through a register on the Windows x64
calling convention. This allows the use of swifterror attributes on
parameters which is used by the swift front end for the `Error`
parameter. This partially enables building the swift standard library
for Windows x86_64.
llvm-svn: 313791
This caused PR34629: asserts firing when building Chromium. It also broke some
buildbots building test-suite as reported on the commit thread.
> Summary:
> 1/ Operand folding during complex pattern matching for LEAs has been
> extended, such that it promotes Scale to accommodate similar operand
> appearing in the DAG.
> e.g.
> T1 = A + B
> T2 = T1 + 10
> T3 = T2 + A
> For above DAG rooted at T3, X86AddressMode will no look like
> Base = B , Index = A , Scale = 2 , Disp = 10
>
> 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
> so that if there is an opportunity then complex LEAs (having 3 operands)
> could be factored out.
> e.g.
> leal 1(%rax,%rcx,1), %rdx
> leal 1(%rax,%rcx,2), %rcx
> will be factored as following
> leal 1(%rax,%rcx,1), %rdx
> leal (%rdx,%rcx) , %edx
>
> 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
> thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet
>
> Reviewed By: lsaba
>
> Subscribers: spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 313376
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet
Reviewed By: lsaba
Subscribers: spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 313343
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.
Reviewers: majnemer
Subscribers: eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D36904
llvm-svn: 312569
The NewNodesMustHaveLegalTypes flag is set to false at the beginning of CodeGenAndEmitDAG, and set to true after legalizing types.
But before calling CodeGenAndEmitDAG we build the DAG for the basic block.
So for the first basic block NewNodesMustHaveLegalTypes would be 'false' during the SDAG building, and for all other basic blocks it would be 'true'.
This patch sets the flag to false before SDAG building each basic block.
Differential Revision:
https://reviews.llvm.org/D33435
llvm-svn: 310239
Summary:
We are crashing in LLC at O0 when gc intrinsics are present in the block.
The reason being FastISel performs basic block ISel by modifying GC.relocates
to be the first instruction in the block. This can cause us to visit the GC
relocate before it's corresponding GC.statepoint is visited, which is incorrect.
When we lower the statepoint, we record the base and derived pointers, along
with the gc.relocates. After this we can visit the gc.relocate.
This patch avoids fastISel from incorrectly creating the block with gc.relocate
as the first instruction.
Reviewers: qcolombet, skatkov, qikon, reames
Reviewed by: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34421
llvm-svn: 307084
The code assumed that we process instructions in basic block order. FastISel
processes instructions in reverse basic block order. We need to pre-assign
virtual registers before selecting otherwise we get def-use relationships wrong.
This only affects code with swifterror registers.
rdar://32659327
llvm-svn: 305484
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
Running `llc -verify-dom-info` on the attached testcase results in a
crash in the verifier, due to a stale dominator tree.
i.e.
DominatorTree is not up to date!
Computed:
=============================--------------------------------
Inorder Dominator Tree:
[1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,7}
[2] %lor.lhs.false.i61.i.i.i {1,2}
[2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,6}
[3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5}
Actual:
=============================--------------------------------
Inorder Dominator Tree:
[1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,9}
[2] %lor.lhs.false.i61.i.i.i {1,2}
[2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,8}
[3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5}
[3] %safe_mod_func_int8_t_s_s.exit.i.i.i.lor.lhs.false.i61.i.i.i_crit_edge {6,7}
This is because in `SelectionDAGIsel` we split critical edges without
updating the corresponding dominator for the function (and we claim
in `MachineFunctionPass::getAnalysisUsage()` that the domtree is preserved).
We could either stop preserving the domtree in `getAnalysisUsage`
or tell `splitCriticalEdge()` to update it.
As the second option is easy to implement, that's the one I chose.
Differential Revision: https://reviews.llvm.org/D33800
llvm-svn: 304742
The recursive implementation of findNonImmUse may overflow stack
on extremely long use chains. This patch replaces it with an equivalent
iterative implementation.
Reviewed By: bogner
Differential Revision: https://reviews.llvm.org/D33775
llvm-svn: 304522
Before r247167, the pass manager builder controlled which AA
implementations were used, exporting them all in the AliasAnalysis
analysis group.
Now, AAResultsWrapperPass always uses BasicAA, but still uses other AA
implementations if made available in the pass pipeline.
But regardless, SDAGISel is required at O0, and really doesn't need to
be doing fancy optimizations based on useful AA results.
Don't require AA at CodeGenOpt::None, and only use it otherwise.
This does have a functional impact (and one testcase is pessimized
because we can't reuse a load). But I think that's desirable no matter
what.
Note that this alone doesn't result in less DT computations: TwoAddress
was previously able to reuse the DT we computed for SDAG. That will be
fixed separately.
Differential Revision: https://reviews.llvm.org/D32766
llvm-svn: 302611
Summary:
For inalloca functions, this is a very common code pattern:
%argpack = type <{ i32, i32, i32 }>
define void @f(%argpack* inalloca %args) {
entry:
%a = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 0
%b = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 1
%c = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 2
tail call void @llvm.dbg.declare(metadata i32* %a, ... "a")
tail call void @llvm.dbg.declare(metadata i32* %c, ... "b")
tail call void @llvm.dbg.declare(metadata i32* %b, ... "c")
Even though these GEPs can be simplified to a constant offset from EBP
or RSP, we don't do that at -O0, and each GEP is computed into a
register. Registers used to compute argument addresses are typically
spilled and clobbered very quickly after the initial computation, so
live debug variable tracking loses information very quickly if we use
DBG_VALUE instructions.
This change moves processing of dbg.declare between argument lowering
and basic block isel, so that we can ask if an argument has a frame
index or not. If the argument lives in a register as is the case for
byval arguments on some targets, then we don't put it in the side table
and during ISel we emit DBG_VALUE instructions.
Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32980
llvm-svn: 302483
Adds a new method finalizeLowering to TargetLoweringBase. This is in
preparation for an upcoming commit.
This function is meant for target specific adjustments to
MachineFrameInfo or register reservations.
Move the freezeRegisters() and the hasCopyImplyingStackAdjustment()
handling into the new function to prove the concept. As an added bonus
GlobalISel no longer missed the hasCopyImplyingStackAdjustment()
handling with this.
Differential Revision: https://reviews.llvm.org/D32621
llvm-svn: 301679
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.
This is largely a mechanical transformation from KnownZero to Known.Zero.
Differential Revision: https://reviews.llvm.org/D32569
llvm-svn: 301620
This patch uses various APInt methods to reduce the number of temporary APInts. These were all found while working through converting SelectionDAG's computeKnownBits to also use the KnownBits struct recently added to the ValueTracking version.
llvm-svn: 301618
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.
Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.
SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.
Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).
Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed. For now, use
'isEnabled' as a crude replacement.
This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'. Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.
This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.
Differential Revision: https://reviews.llvm.org/D31405
llvm-svn: 299093
When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.
llvm-svn: 297698
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.
llvm-svn: 296862
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped. This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.
This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.
Supersedes D28388
Fixes PR26328
Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29668
llvm-svn: 296683
When SDAGISel (top-down) selects a tail-call, it skips the remainder
of the block.
If, before that, FastISel (bottom-up) selected some of the (no-op) next
few instructions, we can end up with dead instructions following the
terminator (selected by SDAGISel).
We need to erase them, as we know they aren't necessary (in addition to
being incorrect).
We already do this when FastISel falls back on the tail-call itself.
Also remove the FastISel-emitted code if we fallback on the
instructions between the tail-call and the return.
llvm-svn: 296552
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.
llvm-svn: 296486
To help assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing include source location of the patterns
Enabled through CMake varialbe LLVM_ENABLE_DAGISEL_COV
llvm-svn: 295081
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.
llvm-svn: 294970
The patch comes in 2 parts:
1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.
2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.
Fix for PR30760
Differential Revision: https://reviews.llvm.org/D29568
llvm-svn: 294749
Summary:
With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization.
I'm only dumping if the legalize vector ops changes something since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump happens. So if nothing changed this second dump is unnecessary.
Having said that though, I think we should probably fix legalize vector ops to log what its doing.
Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc
Reviewed By: RKSimon
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D29554
llvm-svn: 294711
Hoist entry block code for arguments and swift error values out of the
basic block instruction selection loop. Lowering arguments once up front
seems much more readable than doing it conditionally inside the loop. It
also makes it clear that argument lowering can update StaticAllocaMap
because no instructions have been selected yet.
Also use range-based for loops where possible.
llvm-svn: 294329
ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.
NodeToMatch can be modified during matching, but code does not handle
this situation.
Differential Revision: https://reviews.llvm.org/D29292
llvm-svn: 294003
Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we
happened to replace a node during UpdateChains, because it would be
left in the list we were iterating over. This nulls out the pointer
when that happens so that we can avoid the issue.
Fixes llvm.org/PR31710
llvm-svn: 293522
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html
For reference:
- Public headers should just declare the dump() method but not use
LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void MyClass::dump() {
// print stuff to dbgs()...
}
#endif
llvm-svn: 293359
This commit introduces a set of experimental intrinsics intended to prevent
optimizations that make assumptions about the rounding mode and floating point
exception behavior. These intrinsics will later be extended to specify
flush-to-zero behavior. More work is also required to model instruction
dependencies in machine code and to generate these instructions from clang
(when required by pragmas and/or command line options that are not currently
supported).
Differential Revision: https://reviews.llvm.org/D27028
llvm-svn: 293226
Recommitting r288293 with some extra fixes for GlobalISel code.
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.
This is a necessary step to have machine module passes work properly.
Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
because the available MachineFunction pointers are const, but the code
wants to call tidyLandingPads() in between
(markFunctionEnd()/endFunction()).
Differential Revision: https://reviews.llvm.org/D27227
llvm-svn: 288405
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.
This is a necessary step to have machine module passes work properly.
Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
because the available MachineFunction pointers are const, but the code
wants to call tidyLandingPads() in between
(markFunctionEnd()/endFunction()).
Differential Revision: https://reviews.llvm.org/D27227
llvm-svn: 288293
The previously used "names" are rather descriptions (they use multiple
words and contain spaces), use short programming language identifier
like strings for the "names" which should be used when exporting to
machine parseable formats.
Also removed a unused TimerGroup from Hexxagon.
Differential Revision: https://reviews.llvm.org/D25583
llvm-svn: 287369
The code used llvm basic block predecessors to decided where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping from pred.
llvm basic blocks and machine basic blocks.
Therefore the current approach does not work as it assumes we can mark
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.
Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.
When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.
This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.
rdar://28300923
llvm-svn: 283617
That commit added a new version of Intrinsic::getName which should only
be called when the intrinsic has no overloaded types. There are several
debugging paths, such as SDNode::dump which are printing the name of the
intrinsic but don't have the overloaded types. These paths should be ok
to just print the name instead of crashing.
The fix here is ultimately to just add a 'None' second argument as that
calls the overload capable getName, which is less efficient, but this is a
debugging path anyway, and not perf critical.
Thanks to Björn Pettersson for pointing out that there were more crashes.
llvm-svn: 279528
When SelectionDAGISel transforms a node representing an inline asm
block, memory constraint information is not preserved. This can cause
constraints to be broken when a memory offset is of the form:
offset + frame index
when the frame is resolved.
By propagating the constraints all the way to the backend, targets can
enforce memory operands of inline assembly to conform to their constraints.
For MIPSR6, some instructions had their offsets reduced to 9 bits from
16 bits such as ll/sc. This becomes problematic when using inline assembly
to perform atomic operations, as an offset can generated that is too big to
encode in the instruction.
Reviewers: dsanders, vkalintris
Differential Review: https://reviews.llvm.org/D21615
llvm-svn: 275786
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr. In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
llvm-svn: 274287
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
Summary:
This patch is adding support for the MSVC buffer security check implementation
The buffer security check is turned on with the '/GS' compiler switch.
* https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
* To be added to clang here: http://reviews.llvm.org/D20347
Some overview of buffer security check feature and implementation:
* https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
* http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
* http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html
For the following example:
```
int example(int offset, int index) {
char buffer[10];
memset(buffer, 0xCC, index);
return buffer[index];
}
```
The MSVC compiler is adding these instructions to perform stack integrity check:
```
push ebp
mov ebp,esp
sub esp,50h
[1] mov eax,dword ptr [__security_cookie (01068024h)]
[2] xor eax,ebp
[3] mov dword ptr [ebp-4],eax
push ebx
push esi
push edi
mov eax,dword ptr [index]
push eax
push 0CCh
lea ecx,[buffer]
push ecx
call _memset (010610B9h)
add esp,0Ch
mov eax,dword ptr [index]
movsx eax,byte ptr buffer[eax]
pop edi
pop esi
pop ebx
[4] mov ecx,dword ptr [ebp-4]
[5] xor ecx,ebp
[6] call @__security_check_cookie@4 (01061276h)
mov esp,ebp
pop ebp
ret
```
The instrumentation above is:
* [1] is loading the global security canary,
* [3] is storing the local computed ([2]) canary to the guard slot,
* [4] is loading the guard slot and ([5]) re-compute the global canary,
* [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling.
Overview of the current stack-protection implementation:
* lib/CodeGen/StackProtector.cpp
* There is a default stack-protection implementation applied on intermediate representation.
* The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
* Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling.
* Guard manipulation and comparison are added directly to the intermediate representation.
* lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
* lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
* There is an implementation that adds instrumentation during instruction selection (for better handling of sibbling calls).
* see long comment above 'class StackProtectorDescriptor' declaration.
* The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
* 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
* The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
* include/llvm/Target/TargetLowering.h
* Contains function to retrieve the default Guard 'Value'; should be overriden by each target to select which implementation is used and provide Guard 'Value'.
* lib/Target/X86/X86ISelLowering.cpp
* Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.
Function-based Instrumentation:
* The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions.
* To support function-based instrumentation, this patch is
* adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
* If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue.
* modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation,
* generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
* if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp).
Modifications
* adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
* adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)
Results
* IR generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```
```
*** Final LLVM Code input to ISel ***
; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
%StackGuardSlot = alloca i8* <<<-- Allocated guard slot
%0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value
call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot)
%index.addr = alloca i32, align 4
%offset.addr = alloca i32, align 4
%buffer = alloca [10 x i8], align 1
store i32 %index, i32* %index.addr, align 4
store i32 %offset, i32* %offset.addr, align 4
%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
%1 = load i32, i32* %index.addr, align 4
call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
%2 = load i32, i32* %index.addr, align 4
%arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
%3 = load i8, i8* %arrayidx, align 1
%conv = sext i8 %3 to i32
%4 = load volatile i8*, i8** %StackGuardSlot <<<-- Loading Guard slot
call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check
ret i32 %conv
}
```
* SelectionDAG generated instrumentation:
```
clang-cl /GS test.cc /O1 /c /FA
```
```
"?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z"
# BB#0: # %entry
pushl %esi
subl $16, %esp
movl ___security_cookie, %eax <<<-- Loading Stack Guard value
movl 28(%esp), %esi
movl %eax, 12(%esp) <<<-- Store to Guard slot
leal 2(%esp), %eax
pushl %esi
pushl $204
pushl %eax
calll _memset
addl $12, %esp
movsbl 2(%esp,%esi), %esi
movl 12(%esp), %ecx <<<-- Loading Guard slot
calll @__security_check_cookie@4 <<<-- Epilogue function-based check
movl %esi, %eax
addl $16, %esp
popl %esi
retl
```
Reviewers: kcc, pcc, eugenis, rnk
Subscribers: majnemer, llvm-commits, hans, thakis, rnk
Differential Revision: http://reviews.llvm.org/D20346
llvm-svn: 272053
My first attempt at this had an overly aggressive assert - chain nodes
will only be removed, but we could hit the assert if a non-chain node
was CSE'd (NodeToMatch, for instance).
This reapplies r271706 by reverting r271713 and fixing an assert.
Original message:
Avoid relying on UB by looking into deleted nodes for a marker value.
Instead, update the list of chain nodes as we go.
llvm-svn: 271733
It's awkward to force callers of SelectNodeTo to figure out whether
the node was morphed or CSE'd. Update uses here instead of requiring
callers to (sometimes) do it.
llvm-svn: 269235
This means SelectCode unconditionally returns nullptr now. I'll follow
up with a change to make that return void as well, but it seems best
to keep that one very mechanical.
This is part of the work to have Select return void instead of an
SDNode *, which is in turn part of llvm.org/pr26808.
llvm-svn: 269136
Relying on the caller to clean up after we've replaced all uses of a
node won't work when we've migrated to the `void Select(...)` API.
llvm-svn: 268774
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.
We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.
Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.
llvm-svn: 268693
This opcode never happens in practice, and yet the logic we have in
place to handle it would be undefined behaviour if we ever executed
it. Remove it rather than trying to refactor code that's never
reached.
llvm-svn: 268692