demanded by the machine verifier.
After shrinking a live-range to its uses, it is possible to create several
smaller live-ranges. When this happens, shrinkToUses returns true and we need to
split the different components into their own live-ranges.
The problem does not reproduce on any in-tree target but Jonas Paulsson
<jonas.paulsson@ericsson.com>, who reported the problem, checked that this patch
fixes the issue.
llvm-svn: 236658
If called twice in the same BB on the same constant, FastISel::fastEmit_ri_ was marking the materialized vreg as killed on each use, instead of only the last use.
Change this to only mark the last use as killed by making earlier uses check if the vreg is already used elsewhere.
llvm-svn: 236650
Don't create names for temporary symbols when using an object streamer.
The names never make it to the output anyway. From the starting point
of r236629, my heap profile says this drops peak memory usage from 1100
MB to 1058 MB for CodeGen of `verify-uselistorder`, a savings of almost
4% on peak memory, and removes `StringMap<bool, BumpPtrAllocator...>`
from the profile entirely.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 236642
It's quite possible to encounter an insertvalue instruction that's more deeply
nested than the value we're looking for, but when that happens we really
mustn't compare beyond the end of the index array.
Since I couldn't see any guarantees about what comparisons std::equal makes, we
probably need to directly check the size beforehand. In practice, I suspect
most std::equal implementations would probably bail early, which would be OK.
But just in case...
rdar://20834485
llvm-svn: 236635
Emit the number of bytes in a `.debug_loc` entry directly. The old code
created temp labels (expensive), emitted the difference between them,
and then emitted one on each side of the relevant bytes.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`
(the optimized version of ld64's `-save-temps` when linking the
`verify-uselistorder` executable in an LTO bootstrap). I've hacked
`MCContext::Allocate()` to just call `malloc()` instead of using the
`BumpPtrAllocator` so that the heap profile is easier to read. As far
as peak memory is concerned, `MCContext::Allocate()` is equivalent to a
leak, since it only gets freed at process teardown.
In my heap profile, this patch drops memory usage of
`DwarfDebug::emitDebugLoc()` from 132.56 MB (11.4%) down to 29.86 MB
(2.7%) at peak memory. Some of that must be noise from `SmallVector`
(or other) allocations -- peak memory only dropped from 1160 MB down to
1100 MB -- but this nevertheless shaves 5% off the top.)
llvm-svn: 236629
Summary:
When computing branch weights in BPI, we used to disallow branches with
weight 0. This is a minor nuisance, because a branch with weight 0 is
different to "don't have information". In the context of
instrumentation, it may mean "never executed", in the context of
sampling, it means "never or seldom executed".
In allowing 0 weight branches, I ran into issues with the switch
expansion code in selection DAG. It is currently hardwired to not handle
branches with weight 0. To maintain the current behaviour, I changed it
to use 1 when it finds 0, but perhaps the algorithm needs changes to
tolerate branches with weight zero.
Reviewers: hansw
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9533
llvm-svn: 236617
Summary: This patch correctly handles undef case of EXTRACT_VECTOR_ELT node where the element index is constant and not less than vector size.
Test Plan:
CodeGen for X86 test included.
Also one incorrect regression test fixed.
Reviewers: qcolombet, chandlerc, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D9250
llvm-svn: 236584
For accessors in the `Statepoint` class, use symbolic constants for
offsets into the argument vector instead of literals. This makes the
code intent clearer and simpler to change.
llvm-svn: 236566
Summary:
We default the value argument to nullptr. The only use of the value is
in diagnosePossiblyInvalidConstraint and that seems to be resilient to
it being nullptr.
Reviewers: atrick, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9479
llvm-svn: 236555
Summary:
The exported class will be used in later change, in
StatepointLowering.cpp. It is still internal to SelectionDAG (not
exported via include/).
Reviewers: reames, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9478
llvm-svn: 236554
Summary:
Currently this does not change anything, but change will be used in a
later change to StatepointLowering.cpp
Reviewers: reames, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9477
llvm-svn: 236553
Note, this is a recommit of r236515 after fixing an error in r236514. The buildbot ran fast enough that it picked up r236514 prior to r236515 and threw an error. r236515 itself ran 'make check' without errors.
Original commit message follows:
A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register.
Reviewed by Matthias Braun and Quentin Colombet.
llvm-svn: 236550
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997
...which split the existing nsw / nuw / exact flags and FMF
into their own struct.
There are 2 structural changes here:
1. The main diff is that we're preparing to extend the optimization
flags to affect more than just binary SDNodes. Eg, IR intrinsics
( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes
that don't even exist in IR such as FMA, FNEG, etc.
2. The other change is that we're actually copying the FP fast-math-flags
from the IR instructions to SDNodes.
Differential Revision: http://reviews.llvm.org/D8900
llvm-svn: 236546
Note, this is a reapplication of r236515 with a fix to not assert on non-register operands, but instead only handle them until the subsequent commit. Original commit message follows.
The code was basically the same here already. Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
Will be used in the next commit to also handle regmasks.
llvm-svn: 236538
This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
the element number from getVectorIdxTy() to PtrTy before doing pointer
arithmetic on it. This is needed on z, where element numbers are i32
but pointers are i64.
Original patch by Richard Sandiford.
llvm-svn: 236530
For little-endian, the function would convert (extract_vector_elt (load X), Y)
to X + Y*sizeof(elt). For big-endian it would instead use
X + sizeof(vec) - Y*sizeof(elt). The big-endian case wasn't right since
vector index order always follows memory/array order, even for big-endian.
(Note that the current handling has to be wrong for Y==0 since it would
access beyond the end of the vector.)
Original patch by Richard Sandiford.
llvm-svn: 236529
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal.
E.g. it would load a v4i8 as an i32 if i32 was legal.
This patch extends that behavior to promoted integers as well as legal ones.
If the integer type for the full vector width is TypePromoteInteger,
the element type is going to be TypePromoteInteger too, and it's still
better to use a single promoting load or truncating store rather than N
individual promoting loads or truncating stores. E.g. if you have a v2i8
on a target where i16 is promoted to i32, it's better to load the v2i8 as
an i16 rather than load both i8s individually.
Original patch by Richard Sandiford.
llvm-svn: 236528
This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515).
This is to get the bots green while i investigate the failures.
llvm-svn: 236517
A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register.
Reviewed by Matthias Braun and Quentin Colombet.
llvm-svn: 236515
The code was basically the same here already. Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
Will be used in the next commit to also handle regmasks.
llvm-svn: 236514
This reverts commit r236360.
This change exposed a bug in WinEHPrepare by opting win32 code into EH
preparation. We already knew that WinEHPrepare has bugs, and is the
status quo for x64, so I don't think that's a reason to hold off on this
change. I disabled exceptions in the sanitizer tests in r236505 and an
earlier revision.
llvm-svn: 236508
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507
When deciding whether a value comes from the aggregate or inserted value of an
insertvalue instruction, we compare the indices against those of the location
we're interested in. One of the lists needs reversing because the input data is
backwards (so that modifications take place at the end of the SmallVector), but
we were reversing both before leading to incorrect results.
Should fix PR23408
llvm-svn: 236457
ScheduleDAGInstrs wasn't setting or clearing the kill flags on instructions inside bundles. This led to code such as this
%R3<def> = t2ANDrr %R0
BUNDLE %ITSTATE<imp-def,dead>, %R0<imp-use,kill>
t2IT 1, 24, %ITSTATE<imp-def>
R6<def,tied6> = t2ORRrr %R0<kill>, ...
being transformed to
BUNDLE %ITSTATE<imp-def,dead>, %R0<imp-use>
t2IT 1, 24, %ITSTATE<imp-def>
R6<def,tied6> = t2ORRrr %R0<kill>, ...
%R3<def> = t2ANDrr %R0<kill>
where the kill flag was removed from the BUNDLE instruction, but not the t2ORRrr inside it. The verifier then thought that
R0 was undefined when read by the AND.
This change make the toggleKillFlags method also check for bundles and toggle flags on bundled instructions.
Setting the kill flag is special cased as we only want to set the kill flag on the last instruction in the bundle.
llvm-svn: 236428
This pass is responsible for constructing the EH registration object
that gets linked into fs:00, which is all it does in this change. In the
future, it will also insert stores to update the EH state number.
I considered keeping this functionality in WinEHPrepare, but it's pretty
separable and X86 specific. It has conceptually very little to do with
the task of WinEHPrepare, which is currently outlining. WinEHPrepare is
also in theory useful on ARM, but this logic is pretty x86 specific.
Reviewers: andrew.w.kaylor, majnemer
Differential Revision: http://reviews.llvm.org/D9422
llvm-svn: 236339
This patch fixes issues with vector constant folding not correctly handling scalar input operands if they require implicit truncation - this was tested with llvm-stress as recommended by Patrik H Hagglund.
The patch ensures that integer input scalars from a build vector are correctly truncated before folding, and that constant integer scalar results are promoted to a legal type before inclusion in the new folded build vector.
I have added another crash test case and also a test for UINT_TO_FP / SINT_TO_FP using an non-truncated scalar input, which was failing before this patch.
Differential Revision: http://reviews.llvm.org/D9282
llvm-svn: 236308
When commuting a thumb instruction in the size reduction pass, thumb
instructions are represented as a bundle and so some operands may be marked
as internal. The internal flag has to move with the operand when commuting.
This test is sensitive to register allocation so can't specifically check that
this error was happening, but so long as it continues to pass with -verify then
hopefully its still ok.
rdar://problem/20752113
llvm-svn: 236282
Revision 220239 exposed a latent bug in method
'TargetInstrInfo::commuteInstruction'. When commuting the operands of a machine
instruction, method 'commuteInstruction' didn't correctly propagate the
'IsUndef' flag to the register operands of the new (commuted) instruction.
Before this patch, the following instruction:
%vreg4<def> = VADDSDrr %vreg14, %vreg5<undef>; FR64:%vreg4,%vreg14,%vreg5
was wrongly converted by method 'commuteInstruction' into:
%vreg4<def> = VADDSDrr %vreg5, %vreg14<undef>; FR64:%vreg4,%vreg5,%vreg14
The correct instruction should have been:
%vreg4<def> = VADDSDrr %vreg5<undef>, %vreg14; FR64:%vreg4,%vreg5,%vreg14
This patch fixes the problem in method 'TargetInstrInfo::commuteInstruction'.
When swapping the operands of a machine instruction, we now make sure that
'IsUndef' flags are correctly set.
Added test case 'pr23103.ll'.
Differential Revision: http://reviews.llvm.org/D9406
llvm-svn: 236258
If you somehow added a MachineOperand to an instruction
that did not have the parent set, the verifier would
crash since it attempts to use the operand's parent.
llvm-svn: 236249
In the test case here, the 'unreachable' BB was removed by BranchFolding because its empty.
It then rewrote the jump from 'entry' to jump to its fallthrough, which was a landing pad.
This results in 'entry' jumping to 2 different landing pads, which fails the machine verifier.
rdar://problem/20750162
llvm-svn: 236248
changes:
Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
Add location to getConstant
Drop comment about the ops being turned into expand
llvm-svn: 236240
At the least it should be guarded by some kind of target hook.
It also introduced catastrophic compile time and code quality
regressions on some out of tree targets (test case still being
reduced/sanitized).
Sanjay agreed with reverting this patch until these issues can be
resolved.
llvm-svn: 236199
This will cause hot nodes to appear closer to the root.
The literature says building the tree like this makes it a near-optimal (in
terms of search time given key frequencies) binary search tree. In LLVM's case,
we can do up to 3 comparisons in each leaf node, so it might be better to opt
for lower tree height in some cases; that's something to look into in the
future.
Differential Revision: http://reviews.llvm.org/D9318
llvm-svn: 236192
32-bit x86 MSVC-style exceptions are functionaly similar to 64-bit, but
they take no arguments. Instead, they implicitly use the value of EBP
passed in by the caller as a pointer to the parent's frame. In LLVM, we
can represent this as llvm.frameaddress(1), and feed that into all of
our calls to llvm.framerecover.
The next steps are:
- Add an alloca to the fs:00 linked list of handlers
- Add something like llvm.sjlj.lsda or generalize it to store in the
alloca
- Move state number calculation to WinEHPrepare, arrange for
FunctionLoweringInfo to call it
- Use the state numbers to insert explicit loads and stores in the IR
llvm-svn: 236172
Many of the callers already have the pointer type anyway, and for the
couple of callers that don't it's pretty easy to call PointerType::get
on the pointee type and address space.
This avoids LLParser from using PointerType::getElementType when parsing
GlobalAliases from IR.
llvm-svn: 236160
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
This is a compromise: with this simple patch, we should always handle a chain of exactly 3
operations optimally, but we're not generating the optimal balanced binary tree for a longer
sequence.
In general, this transform will reduce the dependency chain for a sequence of instructions
using N operands from a worst case N-1 dependent operations to N/2 dependent operations.
The optimal balanced binary tree would reduce the chain to log2(N).
The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
don't need to worry about the increased register pressure required to parallelize longer
sequences. It also seems unlikely that we would ever encounter really long strings of
dependent ops like that in the wild, but I'm not sure how to verify that speculation.
FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math
and this patch.
We can extend this patch to cover other associative operations such as fmul, fmax, fmin,
integer add, integer mul.
This is a partial fix for:
https://llvm.org/bugs/show_bug.cgi?id=17305
and if extended:
https://llvm.org/bugs/show_bug.cgi?id=21768https://llvm.org/bugs/show_bug.cgi?id=23116
The issue also came up in:
http://reviews.llvm.org/D8941
Differential Revision: http://reviews.llvm.org/D9232
llvm-svn: 236031
This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900).
In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment,
we should eventually share this data between the IR passes and the backend.
We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to
use the new struct.
The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a
data member to the subclass. This means we don't have to repeat all of the get/set methods per flag,
but we're potentially adding size to all nodes of this subclassi type.
In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an
SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at
least one free byte to play with due to struct alignment.
Differential Revision: http://reviews.llvm.org/D9325
llvm-svn: 235997
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235989
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235977
I previously thought switch clusters would need to use uint64_t in case
the weights of multiple cases overflowed a 32-bit int. It turns
out that the weights on a terminator instruction are capped to allow for
being added together, so using a uint32_t should be safe.
llvm-svn: 235945
Previously, the code would try to put a fall-through case last,
even if that meant moving a case with much higher branch weight
further down the chain.
Ordering by branch weight is most important, putting a fall-through
block last is secondary.
llvm-svn: 235942
Looking into 23095, my best guess is that the CodeGen library itself isn't getting linked and initialized properly. To make this slightly more obvious to consumers of LLVM, emit a different error message if we can tell that the registry is empty vs you've simply happened to name a collector which hasn't been registered.
llvm-svn: 235824
right scaling.
In the function canFoldInAddressingMode, VT is computed as the type of the
destination/source of a LOAD/STORE operations, instead of the memory type of the
operation.
On targets with a scaling factor on the offset of the LOAD/STORE operations, the
function may return false for actually valid cases. This may then prevent the
selection of profitable pre or post indexed load/store operations, and instead
select pre or post indexed load/store for unprofitable cases.
Patch by Francois de Ferriere <francois.de-ferriere@st.com>!
Differential Revision: http://reviews.llvm.org/D9146
llvm-svn: 235780
This introduces an intrinsic called llvm.eh.exceptioncode. It is lowered
by copying the EAX value live into whatever basic block it is called
from. Obviously, this only works if you insert it late during codegen,
because otherwise mid-level passes might reschedule it.
llvm-svn: 235768
I couldn't provide a testcase as none of the public targets has wide
register classes with alot of subregisters and at the same time an
instruction which "ReMaterializable" and "AsCheapAsAMove" (could
probably be added for R600).
llvm-svn: 235668
We were asserting on code like this:
extern "C" unsigned long _exception_code();
void might_crash(unsigned long);
void foo() {
__try {
might_crash(0);
} __except(1) {
might_crash(_exception_code());
}
}
Gtest and many other libraries get the exception code from the __except
block. What's supposed to happen here is that EAX is live into the
__except block, and it contains the exception code. Eventually we'll
represent that as a use of the landingpad ehptr value, but for now we
can replace it with undef.
llvm-svn: 235649
remove copies that are useful after breaking some hardware dependencies.
In other words, handle this kind of situations conservatively by assuming reg2
is redefined by the undef flag.
reg1 = copy reg2
= inst reg2<undef>
reg2 = copy reg1
Copy propagation used to remove the last copy.
This is incorrect because the undef flag on reg2 in inst, allows next
passes to put whatever trashed value in reg2 that may help.
In practice we end up with this code:
reg1 = copy reg2
reg2 = 0
= inst reg2<undef>
reg2 = copy reg1
This fixes PR21743.
llvm-svn: 235647
Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.
llvm-svn: 235608
Patch to remove extra bitcasts from shuffles, this is often a legacy of XformToShuffleWithZero being used to combine bitmaskings (of float vectors bitcast to integer vectors) into shuffles: bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1)
Differential Revision: http://reviews.llvm.org/D9097
llvm-svn: 235578
This is a re-commit of r235101, which also fixes the problems with the previous patch:
- Switches with only a default case and non-fallthrough were handled incorrectly
- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.
> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649
llvm-svn: 235560
This removes the -sehprepare flag and makes __C_specific_handler
functions always to use WinEHPrepare.
This was tested by building all of chromium_builder_tests and running a
few tests that use SEH, but if something breaks, we can revert this.
llvm-svn: 235557
In particular, this handles SSA values that are live *out* of a handler.
The existing code only handles values that are live *in* to a handler.
It also handles phi nodes in the block where normal control should
resume after the end of a catch handler. When EH return points have phi
nodes, we need to split the return edge. It is impossible for phi
elimination to emit copies in the previous block if that block gets
outlined. The indirectbr that we leave in the function is only notional,
and is eliminated from the MachineFunction CFG early on.
Reviewers: majnemer, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D9158
llvm-svn: 235545
This turned up after r235333, but was a pre-existing bug. The optimization
which transforms select(c, load, load) into a load of a select of the addresses
does not handle indexed loads (pre/post inc/dec). However, it did not check for
them either, leading to a crash if it tried to transform one of them.
llvm-svn: 235497
X86 backend.
The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.
llvm-svn: 235483
We should also teach the inliner to collapse framerecover of
frameaddress of the current frame down to an alloca, but that can happen
later.
llvm-svn: 235459
Remove the `DIArray` and `DITypeArray` typedefs, preferring the
underlying types (`DebugNodeArray` and `MDTypeRefArray`, respectively).
llvm-svn: 235413
Remove early returns for when `getVariable()` is null, and just assert
that it never happens. The Verifier already confirms that there's a
valid variable on these intrinsics, so we should assume the debug info
isn't broken. I also updated a check for a `!dbg` attachment, which the
Verifier similarly guarantees.
llvm-svn: 235400
Keep the old SEH fan-in lowering on by default for now, since projects
rely on it. This will make it easy to test this change with a simple
flag flip.
llvm-svn: 235399
Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation.
Test case spotted by Patrik Hagglund
llvm-svn: 235371
Summary:
This fixes http://llvm.org/bugs/show_bug.cgi?id=16439.
This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type.
Patch by Keno Fischer!
Test Plan: regression test from http://reviews.llvm.org/D7752
Reviewers: chfast, resistor
Reviewed By: chfast, resistor
Subscribers: sanjoy, resistor, chfast, llvm-commits
Differential Revision: http://reviews.llvm.org/D4978
llvm-svn: 235370
Delete subclasses of (the already deleted) `DIType` in favour of
directly using pointers from the `Metadata` hierarchy.
While `DICompositeType` wraps `MDCompositeTypeBase` and `DIDerivedType`
wraps `MDDerivedTypeBase`, most uses of each really meant the more
specific `MDCompositeType` and `MDDerivedType`.
llvm-svn: 235351
The version of `constructTypeDIE()` for `MDSubroutineType` is unrelated
to (and has different callers than) the `MDCompositeType`. Split the
two in half.
This simplifies an upcoming patch to delete `DICompositeType`. There
shouldn't be any real functionality change here. `createTypeDIE()` is
`cast<>`'ing where it didn't need to before, but that function in turn
is only called for true `MDCompositeType`s.
llvm-svn: 235349
Update comment style in `DwarfUnit`.
- Drop duplicated comments at definition, and update the comments at
the declaration where the definition comments looked newer or more
complete.
- Drop the `functionName -` prefix.
- Add `\brief` in a few places.
- Remove a few comments entirely that weren't adding value (just
turned the function name and arguments into a sentence).
llvm-svn: 235345
This is the last major parent class, so I'll probably start deleting
classes in batches now. Looks like many of the references to the DI*
hierarchy were updated organically along the way.
llvm-svn: 235331
Replace uses of `DIScope` with `MDScope*`. There was one spot where
I've left an `MDScope*` uninitialized (where `DIScope` would have been
default-initialized to `nullptr`) -- this is intentional, since the
if/else that follows should unconditional assign it to a value.
llvm-svn: 235327
When an inline asm call has an output register marked as early-clobber, but
that same register is also an input operand, what should we do? GCC accepts
this, and is documented to accept this for read/write operands saying,
"Furthermore, if the earlyclobber operand is also a read/write operand, then
that operand is written only after it's used." For write-only operands, the
situation seems less clear, but I have at least one existing codebase that
assumes this will work, in part because it has syscall macros like this:
({ \
register uint64_t r0 __asm__ ("r0") = (__NR_ ## name); \
register uint64_t r3 __asm__ ("r3") = ((uint64_t) (arg0)); \
register uint64_t r4 __asm__ ("r4") = ((uint64_t) (arg1)); \
register uint64_t r5 __asm__ ("r5") = ((uint64_t) (arg2)); \
__asm__ __volatile__ \
("sc" \
: "=&r"(r0),"=&r"(r3),"=&r"(r4),"=&r"(r5) \
: "0"(r0), "1"(r3), "2"(r4), "3"(r5) \
: "r6","r7","r8","r9","r10","r11","r12","cr0","memory"); \
r3; \
})
Furthermore, with register aliases and subregister relationships that only the
backend knows about, rejecting this in the frontend seems like a difficult
proposition (if we wanted to do so). However, keeping the early-clobber flag on
the INLINEASM MI does not work for us, because it will cause the register's
live interval to end to soon (so it will not appear defined to be used as an
input).
Fortunately, fixing this does not seem hard: When forming the INLINEASM MI,
check to see if any of the early-clobber outputs are also inputs, and if so,
remove the early-clobber flag.
llvm-svn: 235283
Instead of merging everything together, look at the users of
GlobalVariables, and try to group them by function, to create
sets of globals used "together".
Using that information, a less-aggressive alternative is to keep merging
everything together *except* globals that are only ever used alone, that
is, those for which it's clearly non-profitable to merge with others.
In my testing, grouping by Function is too aggressive, but grouping by
BasicBlock is too conservative. Anything in-between isn't trivially
available, so stick with Function grouping for now.
cl::opts are added for testing; both enabled by default.
A few of the testcases aren't testing the merging proper, but just
various edge cases when merging does occur. Update them to use the
previous grouping behavior. Also, one of the tests is unrelated to
GlobalMerge; change it accordingly.
While there, switch to r234666' flags rather than the brutal -O3.
Differential Revision: http://reviews.llvm.org/D8070
llvm-svn: 235249
Stop using `DIDescriptor` and its subclasses in the `DebugInfoFinder`
API, as well as the rest of the API hanging around in `DebugInfo.h`.
llvm-svn: 235240
This commit removes `DebugLocList` and replaces it with
`DebugLocStream`.
- `DebugLocEntry` no longer contains its byte/comment streams.
- The `DebugLocEntry` list for a variable/inlined-at pair is allocated
on the stack, and released right after `DebugLocEntry::finalize()`
(possible because of the refactoring in r231023). Now, only one
list is in memory at a time now.
- There's a single unified stream for the `.debug_loc` section that
persists, stored in the new `DebugLocStream` data structure.
The last point is important: this collapses the nested `SmallVector<>`s
from `DebugLocList` into unified streams. We previously had something
like the following:
vec<tuple<Label, CU,
vec<tuple<BeginSym, EndSym,
vec<Value>,
vec<char>,
vec<string>>>>>
A `SmallVector` can avoid allocations, but is statically fairly large
for a vector: three pointers plus the size of the small storage, which
is the number of elements in small mode times the element size).
Nesting these is expensive, since an inner vector's size contributes to
the element size of an outer one. (Nesting any vector is expensive...)
In the old data structure, the outer vector's *element* size was 632B,
excluding allocation costs for when the middle and inner vectors
exceeded their small sizes. 312B of this was for the "three" pointers
in the vector-tree beneath it. If you assume 1M functions with an
average of 10 variable/inlined-at pairs each (in an LTO scenario),
that's almost 6GB (besides inner allocations), with almost 3GB for the
"three" pointers.
This came up in a heap profile a little while ago of a `clang -flto -g`
bootstrap, with `DwarfDebug::collectVariableInfo()` using something like
10-15% of the total memory.
With this commit, we have:
tuple<vec<tuple<Label, CU, Offset>>,
vec<tuple<BeginSym, EndSym, Offset, Offset>>,
vec<char>,
vec<string>>
The offsets are used to create `ArrayRef` slices of adjacent
`SmallVector`s. This reduces the number of vectors to four (unrelated
to the number of variable/inlined-at pairs), and caps the number of
allocations at the same number.
Besides saving memory and limiting allocations, this is NFC.
I don't know my way around this code very well yet, but I wonder if we
could go further: why stream to a side-table, instead of directly to the
output stream?
llvm-svn: 235229
CatchHigh may be smaller than TryHigh if we reuse an outlined catch
handler for two different invokes with different EH states. We have no
evidence which shows that CatchHigh must be greater than TryHigh or
TryLow. We can revisit this if we turn out to be wrong.
llvm-svn: 235223
Summary:
- Handle TypePromoteFloat in switch statements
- Move an expression into an assert to avoid unused variable in
non-assert builds.
Reviewers: srhines, ab
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9086
llvm-svn: 235220
Summary:
This patch adds legalization support to operate on FP16 as a load/store type
and do operations on it as floats.
Tests for ARM are added to test/CodeGen/ARM/fp16-promote.ll
Reviewers: srhines, t.p.northover
Differential Revision: http://reviews.llvm.org/D8755
llvm-svn: 235215
Catch blocks which are empty may be in the same state as their try
blocks. It is not meaningful to give the catch block its own state
number in this case because it can't do anything exceptional.
llvm-svn: 235212
Stop storing the `MDLocalVariable` in the `DebugLocEntry::Value`s. We
generate the list of `DebugLocEntry`s separately for each
variable/inlined-at pair, so the variable never actually changes here.
This is effectively NFC (aside from saving some memory and CPU time).
llvm-svn: 235202
We can calculate the variable type up front before calling
`DebugLocEntry::finalize()`. In fact, since we only care about the type
if it's an `MDBasicType`, don't even bother resolving it using the type
identifier map.
llvm-svn: 235201
This is a followon to r233681 - I'd misunderstood the semantics of FTRUNC,
and had confused it with (FP_ROUND ..., 0).
Thanks for Ahmed Bougacha for his post-commit review!
llvm-svn: 235191
This now emits simple, unoptimized xdata tables for __C_specific_handler
based on the handlers listed in @llvm.eh.actions calls produced by
WinEHPrepare.
This adds support for running __finally blocks when exceptions are
thrown, and removes the old landingpad fan-in codepath.
I ran some manual execution tests on small basic test cases with and
without optimization, as well as on Chrome base_unittests, which uses a
small amount of SEH. I'm sure there are bugs, and we may need to
revert.
llvm-svn: 235154
r235050 dropped the inlined-at field from `MDLocalVariable`, deferring
to the `!dbg` attachments. Fix `UserValue` to take the `!dbg` into
account when differentiating between variables.
llvm-svn: 235140
This is a major rewrite of the SelectionDAG switch lowering. The previous code
would lower switches as a binary tre, discovering clusters of cases
suitable for lowering by jump tables or bit tests as it went along. To increase
the likelihood of finding jump tables, the binary tree pivot was selected to
maximize case density on both sides of the pivot.
By not selecting the pivot in the middle, the binary trees would not always
be balanced, leading to performance problems in the generated code.
This patch rewrites the lowering to search for clusters of cases
suitable for jump tables or bit tests first, and then builds the binary
tree around those clusters. This way, the binary tree will always be balanced.
This has the added benefit of decoupling the different aspects of the lowering:
tree building and jump table or bit tests finding are now easier to tweak
separately.
For example, this will enable us to balance the tree based on profile info
in the future.
The algorithm for finding jump tables is O(n^2), whereas the previous algorithm
was O(n log n) for common cases, and quadratic only in the worst-case. This
doesn't seem to be major problem in practice, e.g. compiling a file consisting
of a 10k-case switch was only 30% slower, and such large switches should be rare
in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
does turn out to be a problem, we could limit the search space of the algorithm.
This commit also disables all optimizations during switch lowering in -O0.
Differential Revision: http://reviews.llvm.org/D8649
llvm-svn: 235101
Fix for test case found by James Molloy - TRUNCATE of constant build vectors can be more simply achieved by simply replacing with a new build vector node with the truncated value type - no need to touch the scalar operands at all.
llvm-svn: 235079
The only type that isn't an integer, isn't floating point, and isn't
a vector; ladies and gentlemen, the gift that keeps on giving: x86_mmx!
Fixes PR23246.
Original message (reverted in r235062):
[CodeGen] Combine concat_vectors of scalars into build_vector.
Combine something like:
(v8i8 concat_vectors (v2i8 bitcast (i16)) x4)
into:
(v8i8 (bitcast (v4i16 BUILD_VECTOR (i16) x4)))
If any of the scalars are floating point, use that throughout.
Differential Revision: http://reviews.llvm.org/D8948
llvm-svn: 235072
Delete `DIRef<>`, and replace the remaining uses of it with
`TypedDebugNodeRef<>`. To minimize code churn, I've added typedefs from
`MDTypeRef` to `DITypeRef` (etc.).
llvm-svn: 235071
PR23080 is almost finished. With this commit, there's no consequential
API in `DIDescriptor` and its subclasses. What's left?
- Default-constructed to `nullptr`.
- Handy `const_cast<>` (constructed from `const`, but accessors are
non-`const`).
I think the safe way to catch those is to delete the classes and fix
compile errors. That'll be my next step, after I delete the `DITypeRef`
(etc.) wrapper around `MDTypeRef`.
llvm-svn: 235069
Continuing PR23080, gut `DIType` and its various subclasses, leaving
behind thin wrappers around the pointer types in the new debug info
hierarchy.
llvm-svn: 235064
The way we split SEH catch-all blocks can leave some dead EH values
behind at -O0. Try to remove them, and if we fail, replace them all with
undef.
Fixes a crash when removing the old unreachable landingpad which is
still used by extractvalue instructions in the catch-all block.
llvm-svn: 235061
Remove the accessors of `DIDerivedType` that downcast to
`MDDerivedType`, shifting the `cast<MDDerivedType>` into the callers.
Also remove `DIType::isValid()`, which is really just a check against
`nullptr` at this point.
llvm-svn: 235059
Remove 'inlinedAt:' from MDLocalVariable. Besides saving some memory
(variables with it seem to be single largest `Metadata` contributer to
memory usage right now in -g -flto builds), this stops optimization and
backend passes from having to change local variables.
The 'inlinedAt:' field was used by the backend in two ways:
1. To tell the backend whether and into what a variable was inlined.
2. To create a unique id for each inlined variable.
Instead, rely on the 'inlinedAt:' field of the intrinsic's `!dbg`
attachment, and change the DWARF backend to use a typedef called
`InlinedVariable` which is `std::pair<MDLocalVariable*, MDLocation*>`.
This `DebugLoc` is already passed reliably through the backend (as
verified by r234021).
This commit removes the check from r234021, but I added a new check
(that will survive) in r235048, and changed the `DIBuilder` API in
r235041 to require a `!dbg` attachment whose 'scope:` is in the same
`MDSubprogram` as the variable's.
If this breaks your out-of-tree testcases, perhaps the script I used
(mdlocalvariable-drop-inlinedat.sh) will help; I'll attach it to PR22778
in a moment.
llvm-svn: 235050
This avoids emitting code for unreachable landingpad blocks that contain
calls to llvm.eh.actions and indirectbr.
It's also a first step towards unifying the SEH and WinEH lowering
codepaths. I'm keeping the old fan-in lowering of SEH around until the
preparation version works well enough that we can switch over without
breaking existing users.
llvm-svn: 235037
This causes badness for GDB which expects to find a definition in any
compile_unit that has an entry for the variable in its pubnames.
llvm-svn: 234915
TargetRegisterInfo::getRegPressureLimit has a note that it is an old
model that relies on manually entered classes. Using the newer model of
register pressure sets seems more appropriate. We might eventually even
switch to lib/CodeGen/RegisterPressure.cpp, but we should probably do
incremental changes here.
Using the newer model also makes it easier to take regmasks into account
which is necessary to fix llvm.org/PR23143. I am currently also
preparing a patch for that, but would like to do this switch
independently.
Review: http://reviews.llvm.org/D8986
llvm-svn: 234880
Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses. Note
that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`.
llvm-svn: 234850
Gut all the non-pointer API from the variable wrappers, except an
implicit conversion from `DIGlobalVariable` to `DIDescriptor`. Note
that if you're updating out-of-tree code, `DIVariable` wraps
`MDLocalVariable` (`MDVariable` is a common base class shared with
`MDGlobalVariable`).
llvm-svn: 234840
This is along the same lines as r234832, but for `DILocation`. Clean
out all accessors from `DILocation`. Any callers should be using
`MDLocation` directly (e.g., via `operator->()`).
llvm-svn: 234835
Completely gut `DIExpression`, turning it into a simple wrapper around
`MDExpression *`. There are two bits of magic left:
- It's constructed from `const MDExpression*` but convertible to
`MDExpression*`.
- It's default-constructed to `nullptr`.
Otherwise, it should behave quite like a raw pointer. Once I've done
the same to the rest of the `DIDescriptor` subclasses, I'll come back to
delete them entirely (and update call sites as necessary to deal with
the missing magic).
llvm-svn: 234832
There's only one user of the various `DIObjCProperty::is*Property()`
accessors -- `DwarfUnit::constructTypeDIE()` -- and it's just using the
reverse logic to reconstruct the bitfield. Drop this API and simplify
the only caller.
llvm-svn: 234818
Combine something like:
(v8i8 concat_vectors (v2i8 bitcast (i16)) x4)
into:
(v8i8 (bitcast (v4i16 BUILD_VECTOR (i16) x4)))
If any of the scalars are floating point, use that throughout.
Differential Revision: http://reviews.llvm.org/D8948
llvm-svn: 234809
Instead of calling the somewhat confusingly-named
`DIVariable::isInlinedFnArgument()`, do the check directly here.
There's possibly a small functionality change here: instead of
`dyn_cast<>`'ing `DV->getScope()` to `MDSubprogram`, I'm looking up the
scope chain for the actual subprogram. I suspect that this is a no-op
for function arguments so in practise there isn't a real difference.
I've also added a `FIXME` to check the `inlinedAt:` chain instead, since
I wonder if that would be more reliable than the
`MDSubprogram::describes()` function.
Since this was the only user of `DIVariable::isInlinedFnArgument()`,
delete it.
llvm-svn: 234799
`DIGlobalVariable::getGlobal()` isn't really helpful, it just does a
`dyn_cast_or_null<>`. Simplify its only user by doing the cast directly
and delete the code.
llvm-svn: 234796
This reverts commit r234717, reapplying r234698 (in spirit).
As described in r234717, the original `Verifier` check had a
use-after-free. Instead of storing pointers to "interesting" debug info
intrinsics whose bit piece expressions should be verified once we have
typerefs, do a second traversal. I've added a testcase to catch the
`llc` crasher.
Original commit message:
Verifier: Check for incompatible bit piece expressions
Convert an assertion into a `Verifier` check. Bit piece expressions
must fit inside the variable, and mustn't be the entire variable.
Catching this in the verifier will help us find bugs sooner, and makes
`DIVariable::getSizeInBits()` dead code.
llvm-svn: 234776
Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)"
Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO"
Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB"
Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll
on hexagon, nvptx, and r600. Revert while I investigate.
llvm-svn: 234768
In case of different types used for the condition of the selects the
select(select) -> select(and) normalisation cannot be performed.
See also: http://reviews.llvm.org/D7622
llvm-svn: 234763
Fill in the TODO in CodeGenPrepare::OptimizeCallInst so that global
variables that are passed to memory intrinsics are aligned in the same
way that allocas are.
Differential Revision: http://reviews.llvm.org/D8421
llvm-svn: 234735
This reverts commit r234698.
This caused a use-after-free: `QueuedBitPieceExpressions` holds onto
references to `DbgInfoIntrinsic`s and references them past where they're
deleted (this is because the verifier is run as a function pass, and
then `verifyTypeRefs()` is called during `doFinalization()`).
I'll include a reduced crasher for `llc` when I recommit the check.
llvm-svn: 234717
Convert an assertion into a `Verifier` check. Bit piece expressions
must fit inside the variable, and mustn't be the entire variable.
Catching this in the verifier will help us find bugs sooner, and makes
`DIVariable::getSizeInBits()` dead code.
llvm-svn: 234698
The patch is generated using clang-tidy misc-use-override check.
This command was used:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
-checks='-*,misc-use-override' -header-filter='llvm|clang' \
-j=32 -fix -format
http://reviews.llvm.org/D8925
llvm-svn: 234679
Currently, there's a single flag, checked by the pass itself.
It can't force-enable the pass (and is on by default), because it
might not even have been created, as that's the targets decision.
Instead, have separate explicit flags, so that the decision is
consistently made in the target.
Keep the flag as a last-resort "force-disable GlobalMerge" for now,
for backwards compatibility.
llvm-svn: 234666
This allows winehprepare to build sensible llvm.eh.actions calls for SEH
finally blocks. The pattern matching in this change is brittle and
should be replaced with something more robust soon. In the meantime,
this will let us write the code that produces __C_specific_handler xdata
tables, which we need regardless of how we decide to get finally blocks
through EH preparation.
llvm-svn: 234663
r234638 chained another transform below which was tripping over the
deleted instruction. Use after free found by asan in many regression
tests.
llvm-svn: 234654
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep. Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.
Depends on D8888.
Reviewers: majnemer, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8889
llvm-svn: 234638
WinEH currently turns invokes into calls. Long term, we will reconsider
this, but for now, make sure we remap the operands and clone the
successors of the new terminator.
llvm-svn: 234608
The IPToState table must be emitted after we have generated labels for
all functions in the table. Don't rely on the order of the list of
globals. Instead, utilize WinEHFuncInfo to tell us how many catch
handlers we expect to outline. Once we know we've visited all the catch
handlers, emit the cppxdata.
llvm-svn: 234566
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.
Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.
f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.
While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.
Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755
llvm-svn: 234550
formatted_raw_ostream is a wrapper over another stream to add column and line
number tracking.
It is used only for asm printing.
This patch moves the its creation down to where we know we are printing
assembly. This has the following advantages:
* Simpler lifetime management: std::unique_ptr
* We don't compute column and line number of object files :-)
llvm-svn: 234535
We already do:
concat_vectors(scalar, undef) -> scalar_to_vector(scalar)
When the scalar is legal.
When it's not, but is a truncated legal scalar, we can also do:
concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
Which is equivalent, since the upper lanes are undef anyway.
While there, teach the combine to look at more than 2 operands.
Differential Revision: http://reviews.llvm.org/D8883
llvm-svn: 234530
Revert "Add classof implementations to the raw_ostream classes."
Revert "Use the cast machinery to remove dummy uses of formatted_raw_ostream."
The underlying issue can be fixed without classof.
llvm-svn: 234495
The bug manifests when there are two loads and two stores chained as follows in
a DAG,
(ld v3f32) -> (st f32) -> (ld v3f32) -> (st f32)
and the stores' values are extracted from the preceding vector loads.
MergeConsecutiveStores would replace the first store in the chain with the
merged vector store, which would create a cycle between the merged store node
and the last load node that appears in the chain.
This commits fixes the bug by replacing the last store in the chain instead.
rdar://problem/20275084
Differential Revision: http://reviews.llvm.org/D8849
llvm-svn: 234430
If two livesegments from different subranges happened to have the same
definition they could possibly end up as two adjacent segments in the
main liverange with the same value number which is not allowed. Detect
such cases and fix them in the 2nd pass of computeFromMainRange() if
necessary.
No testcase as there is only an out-of-tree target where I can sensibly
come up with one.
llvm-svn: 234382
This reverts commit r234295 (and r234294 and r234292 before it). I
removed the implicit conversion to `MDTuple*` r234326, so there's no
longer an ambiguity in `operator[]()`.
I think MSVC should accept the original code now...
llvm-svn: 234335
There were four almost identical implementations of calculating/updating
the register pressure for a certain MachineInstr. Cleanup to have a
single implementation (well, controlled with two bool flags until this
is cleaned up more).
No functional changes intended.
Tested by verify that there are no binary changes in the entire llvm
test-suite. A new test was added separately in r234309 as it revealed a
pre-existing error in the register pressure calculation.
llvm-svn: 234325
Replace all uses of `DITypedArray<>` with `MDTupleTypedArrayWrapper<>`
and `MDTypeRefArray`. The APIs are completely different, but the
provided functionality is the same: treat an `MDTuple` as if it's an
array of a particular element type.
To simplify this patch a bit, I've temporarily typedef'ed
`DebugNodeArray` to `DIArray` and `MDTypeRefArray` to `DITypeArray`.
I've also temporarily conditionalized the accessors to check for null --
eventually these should be changed to asserts and the callers should
check for null themselves.
There's a tiny accompanying patch to clang.
llvm-svn: 234290
Remove special iterators from `DIExpression` in favour of same in
`MDExpression`. There should be no functionality change here.
Note that the APIs are slightly different: `getArg(unsigned)` counts
from 0, not 1, in the `MDExpression` version of the iterator.
llvm-svn: 234285
Fast isel used to zero extends immediates to 64 bits. This normally goes
unnoticed because the value is truncated to 32 bits for output.
Two cases were it is noticed:
* We fail to use smaller encodings.
* If the original constant was smaller than i32.
In the tests using i1 constants, codegen would change to use -1, which is fine
(and matches what regular isel does) since only the lowest bit is then used.
Instead, this patch then changes the ir to use i8 constants, which looks more
like what clang produces.
llvm-svn: 234249
Remove `DIDescriptor::Verify()` and the `Verify()`s from subclasses.
They had already been gutted, and just did an `isa<>` check.
In a couple of cases I've temporarily dropped the check entirely, but
subsequent commits are going to disallow conversions to the
`DIDescriptor`s directly from `MDNode`, so the checks will come back in
another form soon enough.
llvm-svn: 234201
The uselist isn't enough to infer anything about the lifetime of such
allocas. If we want to re-add this optimization, we will need to
leverage lifetime markers to do it.
Fixes PR23122.
llvm-svn: 234196
This allows the compiler/assembly programmer to switch back to a
section. This in turn fixes the bootstrap failure on powerpc (tested
on gcc110) without changing the ppc codegen at all.
I will try to cleanup the various getELFSection overloads in a followup patch.
Just using a default argument now would lead to ambiguities.
llvm-svn: 234099
This add support for catching an exception such that an exception object
available to the catch handler will be initialized by the runtime.
llvm-svn: 234062
We don't need to represent UnwindHelp in IR. Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.
llvm-svn: 234061
As a follow-up to r234021, assert that a debug info intrinsic variable's
`MDLocalVariable::getInlinedAt()` always matches the
`MDLocation::getInlinedAt()` of its `!dbg` attachment.
The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), but I'll let these assertions bake for a while
first.
If you have an out-of-tree backend that just broke, you're probably
attaching the wrong `DebugLoc` to a `DBG_VALUE` instruction. The one
you want is the location that was attached to the corresponding
`@llvm.dbg.declare` or `@llvm.dbg.value` call that you started with.
llvm-svn: 234038
This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.
Differential Revision: http://reviews.llvm.org/D8516
llvm-svn: 234004
This makes it possible to use the same representation of llvm.eh.actions
in outlined handlers as we use in the parent function because i32's are
just constants that can be copied freely between functions.
I had to add a sentinel alloca to the list of child allocas so that we
don't try to sink the catch object into the handler. Normally, one would
use nullptr for this kind of thing, but TinyPtrVector doesn't support
null elements. More than that, it's elements have to have a suitable
alignment. Therefore, I settled on this for my sentinel:
AllocaInst *getCatchObjectSentinel() {
return static_cast<AllocaInst *>(nullptr) + 1;
}
llvm-svn: 233947
Require the pointee type to be passed explicitly and assert that it is
correct. For now it's possible to pass nullptr here (and I've done so in
a few places in this patch) but eventually that will be disallowed once
all clients have been updated or removed. It'll be a long road to get
all the way there... but if you have the cahnce to update your callers
to pass the type explicitly without depending on a pointer's element
type, that would be a good thing to do soon and a necessary thing to do
eventually.
llvm-svn: 233938
I'm playing with supporting custom stack map formats with statepoints. While
doing so, I noticed that the existing implementation didn't indicate inherently
unsized frames. This change essentially just ports the functionality that already
exists for the default StackMaps section to custom stackmaps.
llvm-svn: 233891
This lets us catch exceptions in simple cases.
N.B. Things that do not work include (but are not limited to):
- Throwing from within a catch handler.
- Catching an object with a named catch parameter.
- 'CatchHigh' is fictitious, we aren't sure of its purpose.
- We aren't entirely efficient with regards to the number of EH states
that we generate.
- IP-to-State tables are sensitive to the order of emission.
llvm-svn: 233767
The existing code in getMemsetValue only handled integer-preferred types when
the fill value was not a constant. Make this more robust in two ways:
1. If the preferred type is a floating-point value, do the mul-splat trick on
the corresponding integer type and then bitcast.
2. If the preferred type is a vector, do the mul-splat trick on one vector
element, and then build a vector out of them.
Fixes PR22754 (although, we should also turn off use of vector types at -O0).
llvm-svn: 233749
Specify an allocation order with a register class. This is used by register
allocators with a greedy heuristic. This is usefull as it is sometimes
beneficial to color more constrained classes first.
Differential Revision: http://reviews.llvm.org/D8626
llvm-svn: 233743
When allocating live intervals in linear order and all of them are local
to a single basic block you get an optimal coloring. This is also true
if you reverse the order, but it is not true if you sort live ranges
beginnings in reverse order, change to sort live range endings in
reverse order. Take the following live ranges for example:
|---| |--------|
|----------| |-------|
They get colored suboptimally with 3 registers if you sort the live range
starting points in reverse order (but optimally with live range begins in order,
or live range ends in reverse order).
Apparently the previous strategy was intentional because of allocation
time considerations. I am having a hard time replicating these effects,
while I see substantial improvements in allocation quality with this
change.
No testcase as none of the (in tree) targets use reverse order mode.
Differential Revision: http://reviews.llvm.org/D8625
llvm-svn: 233742
it more liberally.
SplitVecOp_TRUNCATE has logic for recursively splitting oversize vectors
that need more than one round of splitting to become legal. There are many
other ISD nodes that could benefit from this logic, so factor it out and
use it for FP_TO_UINT,FP_TO_SINT,SINT_TO_FP,UINT_TO_FP and FTRUNC.
llvm-svn: 233681
Two things here:
1. I read `getScope()` and `getContext()` backwards in r233640. There
was no need for `getScopeOfScope()`. Obviously not enough test
coverage here (as I said in that commit, I'm going to come back to
that), but anyway I'm reverting to the behaviour before r233640.
2. The callers that use `DILexicalBlockFile::getContext()` don't seem
to care about the difference. Just have it redirect to `getScope()`
so I can't get confused again.
llvm-svn: 233650
The only user of `DebugLoc::getFromDILexicalBlock()` was creating a new
`MDLocation` as convenient API for passing an `MDScope`. Stop doing
that, and remove the API. If in the future we actually *want* to create
new DebugLocs, calling `MDLexicalBlock::get()` makes more sense.
llvm-svn: 233643
Pervasively use the types provided by the debug info hierarchy rather
than `MDNode` in `LexicalScopes`.
I noticed (again, I guess, based on comments in the implementation?)
that `DILexicalBlockFile::getScope()` returns something different from
`DILexicalBlockFile::getContext()`. I created a local helper for
getting the same logic from `MDLexicalBlockFile` called
`getScopeOfScope()`. I still don't really understand it, but I've added
some FIXMEs and I'll come back to it (I suspect the way we encode these
objects isn't really ideal).
Note that my previous commit r233610 accidentally changed behaviour in
`findLexicalScope()` -- it transitioned from a call to
`DILexicalBlockFile::getScope()` to `MDLexicalBlockFile::getScope()`
(sounds right, doesn't it?) -- so I've fixed that as a drive-by. No
tests failed with my error, so it looks like we're missing some coverage
here... when I come back to understand the logic, I'll see if I can add
some.
Other than the fix to `findLexicalScope()`, no functionality change.
llvm-svn: 233640
Generate tables in the .xdata section representing what actions to take
when an exception is thrown. This currently fills in state for
cleanups, catch handlers are still unfinished.
llvm-svn: 233636
There's no benefit to using `DebugLoc` here. Moreover, this will let a
follow-up commit work with `MDScope` directly instead of `DebugLoc`.
llvm-svn: 233610
Don't use `DebugLoc::getFnDebugLoc()`, which creates new `MDLocation`s,
in the backend. We just want to grab the subprogram here anyway.
llvm-svn: 233601
DAGCombiner::ReassociateOps was correctly testing for an constant integer scalar but failed to correctly test for constant integer vectors (it was testing for any constant vector).
llvm-svn: 233482
The original f32->f64 promotion logic was refactored into roughly the
currently shape in r37781. However, starting with r132263, the
legalizer has been split into different kinds, and the previous
"Promote" (which did the right thing) was search-and-replace'd into
"PromoteInteger". The divide gradually deepened, with type legalization
("PromoteInteger") being separated from ops legalization
("Promote", which still works for floating point ops).
Fast-forward to today: there's no in-tree target with legal f64 but
illegal f32 (rather: no tests were harmed in the making of this patch).
With such a target, i.e., if you trick the legalizer into going through
the PromoteInteger path for FP, you get the expected brokenness.
For instance, there's no PromoteIntRes_FADD (the name itself sounds
wrong), so we'll just hit some assert in the PromoteInteger path.
Don't pretend we can promote f32 to f64. Instead, always soften.
llvm-svn: 233464
Tailcalls are only OK with forwarded sret pointers. With explicit sret,
one approximation is to check that the pointer isn't an Instruction, as
in that case it might point into some local memory (alloca). That's not
OK with tailcalls.
Explicit sret counterpart to r233409.
Differential Revison: http://reviews.llvm.org/D8510
llvm-svn: 233410
Tailcalls are only OK with forwarded sret pointers. With sret demotion,
they're not, as we'd have a pointer into a soon-to-be-dead stack frame.
Differential Revison: http://reviews.llvm.org/D8510
llvm-svn: 233409
nodes.
When a node is terminal it is pushed at the end of the list of the copies to
coalesce instead of being completely ignored. In effect, this reduces its
priority over non-terminal nodes.
Because of that, we do not miss the rematerialization opportunities, nor the
copies that can be merged with more complex, than the terminal rule,
interference checks.
Related to PR22768.
llvm-svn: 233395
"Fix the MachineScheduler's logic for updating ready times for in-order.
Now the scheduler updates a node's ready time as soon as it is
scheduled, before releasing dependent nodes."
This fix was only made in one variant of the ScheduleDAGMI driver.
Francois de Ferriere reported the issue in the other bit of code where
it was also needed.
I never got around to coming up with a test case, but it's an
obvious fix that shouldn't be delayed any longer.
I'll try to refactor this code a little better.
I did verify performance on a wide variety of targets and saw no
negative impact with this fix.
llvm-svn: 233366
This was discussed a while back and I left it optional for migration. Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory.
llvm-svn: 233357
This patch adds support for explicitly provided spill slots in the GC arguments of a gc.statepoint. This is somewhat analogous to gcroot, but leverages the STATEPOINT MI node and StackMap infrastructure. The motivation for this is:
1) The stack spilling code for gc.statepoints hasn't advanced as fast as I'd like. One major option is to give up on doing spilling in the backend and do it at the IR level instead. We'd give up the ability to have gc values in registers, but that's a minor cost in practice. We are not neccessarily moving in that direction, but having the ability to prototype such a thing cheaply is interesting.
2) I want to port the gcroot lowering to use the statepoint infastructure. Given the metadata printers for gcroot expect a fixed set of stack roots, it's easiest to just reuse the explicit stack slots and pass them directly to the underlying statepoint.
I'm holding off on the documentation for the new feature until I'm reasonable sure this is going to stick around.
llvm-svn: 233356
We don't have any logic to emit those tables yet, so the SDAG lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
llvm-svn: 233354
It can happen (by line CurSU->isPending = true; // This SU is not in
AvailableQueue right now.) that a SUnit is mark as available but is
not in the AvailableQueue. For SUnit being selected for scheduling
both conditions must be met.
This patch mainly defensively protects from invalid removing a node
from a queue. Sometimes nodes are marked isAvailable but are not in
the queue because they have been defered due to some hazard.
Patch by Pawel Bylica!
llvm-svn: 233351
We used to mark a bunch of libm nodes as Expand for f16. There are no
libcalls we can use for those, so we eventually just hit an unhelpful
llvm_unreachable in ExpandFPLibCall.
Instead, just ignore them altogether. If nothing else changes, we'll
then get the more descriptive and pleasant "Cannot select" fatal error.
There's an argument to be made for consistency, but f16 is already
special in all the good ways, and as long as there's no f16 support in
the ops expander (this patch), as well as the Soften/Expand float
legalizers (which, when hit, will currently segfault), I think there's
no point in even pretending we can legalize any of this.
This shouldn't affect anything that's not already broken.
llvm-svn: 233328
those are in the same basic block.
The previous approach was the topological order of the basic block.
By default this rule is disabled.
Related to PR22768.
llvm-svn: 233241
This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.
It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.
Differential Revision: http://reviews.llvm.org/D8593
llvm-svn: 233224
If liveranges induced by an IMPLICIT_DEF get completely covered by a
proper liverange the IMPLICIT_DEF instructions and its corresponding
definitions have to be removed from the live ranges. This has to happen
in the subregister live ranges as well (I didn't see this case earlier
because in most programs only some subregisters are covered and the
IMPLCIT_DEF won't get removed).
No testcase, I spent hours trying to create one for one of the public
targets, but ultimately failed because I couldn't manage to properly
control the placement of COPY and IMPLICIT_DEF instructions from an .ll
file.
llvm-svn: 233217
We don't have any logic to emit those tables yet, so the sdag lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.
llvm-svn: 233209
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.
llvm-svn: 233153
We can't use TargetFrameLowering::getFrameIndexOffset directly, because
Win64 really wants the offset from the stack pointer at the end of the
prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(),
which is a pretty close approximiation of that. It fails to handle cases
with interestingly large stack alignments, which is pretty uncommon on
Win64 and is TODO.
llvm-svn: 233137
The changes to InstCombine do seem a bit silly - it doesn't make
anything obviously better to have the caller access the pointers element
type (the thing I'm trying to remove) than the GEP itself, but it's a
helpful migration step. This will allow me to more obviously lock down
GEP (& Load, etc) API usage, then fix all the code that accesses pointer
element types except the places that need to be removed (most of the
InstCombines) anyway - at which point I'll need to just remove all that
code because it won't be meaningful anymore (there will be no pointer
types, so no bitcasts to combine)
llvm-svn: 233126
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):
%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0
The vector constant folding was always using sitofp:
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
This patch fixes this so that the correct opcode is used for sitofp and uitofp.
%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>
Differential Revision: http://reviews.llvm.org/D8560
llvm-svn: 233033
There is now a canonical symbol at the end of a section that different
passes can request.
This also allows us to assert that we don't switch back to a section whose
end symbol has already been printed.
llvm-svn: 233026
Fixing sign extension in makeLibCall for MIPS64. In MIPS64 architecture all
32 bit arguments (int, unsigned int, float 32 (soft float)) must be sign
extended. This fixes test "MultiSource/Applications/oggenc/".
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D7791
llvm-svn: 232943
Because the operands of a vector SETCC node can be of a different type from the
result (and often are), it can happen that even if we'd prefer to widen the
result type of the SETCC, the operands have been split instead. In this case,
the SETCC result also must be split. This mirrors what is done in
WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not
widened the following calls to GetWidenedVector will assert (which is what was
happening in the test case).
llvm-svn: 232935
As preparation for removing the getSubtargetImpl() call from
TargetMachine go ahead and flip the switch on caching the function
dependent subtarget and remove the bare getSubtargetImpl call
from the X86 port. As part of this add a few tests that show we
can generate code and assemble on X86 based on features/cpu on
the Function.
llvm-svn: 232879
thumb-ness similar to the rest of the Module level asm printing
infrastructure as debug info finalization happens after the function
may be missing.
llvm-svn: 232875
If we couldn't analyze its terminator (i.e., it's an indirectbr, or some
other weirdness), we can't safely re-if-convert a predicated block,
because we can't tell whether the predicated terminator can
fallthrough (it does).
Currently, we would completely ignore the fallthrough successor. In
the added testcase, this means we used to generate:
...
@ %entry:
cmp r5, #21
ittt ne
@ %cc1f:
cmpne r7, #42
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %cc1t:
...
Whereas the successor of %cc1f was originally %bb1.
With the fix, we get the correct:
...
@ %entry:
cmp r5, #21
itt eq
@ %cc1t:
streq.w r5, [r11]
moveq pc, r0
@ %cc1f:
cmp r7, #42
itt ne
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %bb1:
...
rdar://20192768
Differential Revision: http://reviews.llvm.org/D8509
llvm-svn: 232872
The code this patch removes was there to make sure the text sections went
before the dwarf sections. That is necessary because MachO uses offsets
relative to the start of the file, so adding a section can change relaxations.
The dwarf sections were being printed at the start just to produce symbols
pointing at the start of those sections.
The underlying issue was fixed in r231898. The dwarf sections are now printed
when they are about to be used, which is after we printed the text sections.
To make sure we don't regress, the patch makes the MachO streamer assert
if CodeGen puts anything unexpected after the DWARF sections.
llvm-svn: 232842
Check return of `getDISubprogram()` before using it. A WIP patch makes
`DIDescriptor` accessors more strict (and would crash on this).
llvm-svn: 232838
`DL` might be null, so check for that before using accessors. A WIP
patch to make `DIDescriptors` more strict fails otherwise.
As a bonus, I think the logic is easier to follow now (despite the extra
nesting depth).
llvm-svn: 232836
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
Differential Revision: http://reviews.llvm.org/D8419
llvm-svn: 232825
numbers before emission.
This removes a dependency on being able to access TRI at the module
level and is similar to the DwarfExpression handling. I've modified
the debug support into print/dump routines that'll do the same dumping
but is now callable anywhere and if TRI isn't available will go ahead
and just print out raw register numbers.
llvm-svn: 232821
nullptr so that users get an earlier dereferencing error and
so that we can use it to conditionalize access to MachineFunction
specific data.
llvm-svn: 232820
With the option -outline-optional-branches, LLVM will place optional
branches out of line (more details on r231230).
With this patch, this is not done for short optional branches. A short
optional branch is a branch containing a single block with an
instruction count below a certain threshold (defaulting to 3). Still
everything is guarded under -outline-optional-branches).
Outlining a short branch can't significantly improve code locality. It
can however decrease performance because of the additional jmp and in
cases where the optional branch is hot. This fixes a compile time
regression I have observed in a benchmark.
Review: http://reviews.llvm.org/D8108
llvm-svn: 232802
This is very related to the bug fixed in r174431. The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores. When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.
I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.
llvm-svn: 232780
This enables us to remove calls to the subtarget from the TargetMachine
and with a small hack for backends that require global subtarget
information for module level code generation, e.g. mips abi flags, as
mentioned in a fixme in the code.
llvm-svn: 232776
This switches the sense of the i32 values and updates the test cases.
We can also use CHECK-SAME to clean up some tests, and reduce the visual
noise from bitcasts.
llvm-svn: 232774
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`. This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.
Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).
llvm-svn: 232772