The frame object which points to the dynamically allocated area will not be
needed after changes are made to cease reserving call frames.
llvm-svn: 161076
Assuming infinite issue width, compute the earliest each instruction in
the trace can issue, when considering the latency of data dependencies.
The issue cycle is record as a 'depth' from the beginning of the trace.
This is half the computation required to find the length of the critical
path through the trace. Heights are next.
llvm-svn: 161074
arguments to the stack in MipsISelLowering::LowerCall, use stack pointer and
integer offset operands rather than frame object operands.
llvm-svn: 161068
single-precision load and store.
Also avoid selecting LUXC1 and SUXC1 instructions during isel. It is incorrect
to map unaligned floating point load/store nodes to these instructions.
llvm-svn: 161063
One motivating example is to sink an instruction from a basic block which has
two successors: one outside the loop, the other inside the loop. We should try
to sink the instruction outside the loop.
rdar://11980766
llvm-svn: 161062
for this class. These tests exercise most of the basic properties, but
the API for TinyPtrVector is very strange currently. My plan is to start
fleshing out the API to match that of SmallVector, but I wanted a test
for what is there first.
Sadly, it doesn't look reasonable to just re-use the SmallVector tests,
as this container can only ever store pointers, and much of the
SmallVector testing is to get construction and destruction right.
Just to get this basic test working, I had to add value_type to the
interface.
While here I found a subtle bug in the combination of 'erase', 'begin',
and 'end'. Both 'begin' and 'end' wanted to use a null pointer to
indicate the "end" iterator of an empty vector, regardless of whether
there is actually a vector allocated or the pointer union is null.
Everything else was fine with this except for erase. If you erase the
last element of a vector after it has held more than one element, we
return the end iterator of the underlying SmallVector which need not be
a null pointer. Instead, simply use the pointer, and poniter + size()
begin/end definitions in the tiny case, and delegate to the inner vector
whenever it is present.
llvm-svn: 161024
We are extending live ranges, so kill flags are not accurate. They
aren't needed until they are recomputed after RA anyway.
<rdar://problem/11950722>
llvm-svn: 161023
We branch to the successor with higher edge weight first.
Convert from
je LBB4_8 --> to outer loop
jmp LBB4_14 --> to inner loop
to
jne LBB4_14
jmp LBB4_8
PR12750
rdar: 11393714
llvm-svn: 161018
CallInst for intrinsics. This allows users of the InstVisitor that would
like to special case certain very common intrinsics to do so naturally
in keeping with the type hierarchy's utility classes.
llvm-svn: 161006
This lets traces include the final iteration of a nested loop above the
center block, and the first iteration of a nested loop below the center
block.
We still don't allow traces to contain backedges, and traces are
truncated where they would leave a loop, as seen from the center block.
llvm-svn: 161003
Empty macro arguments at the end of the list should be as-if not specified at
all, but those in the middle of the list need to be kept so as not to screw
up the positional numbering. E.g.:
.macro foo
foo_-bash___:
nop
.endm
foo 1, 2, 3, 4
foo 1, , 3, 4
Should create two labels, "foo_1_2_3_4" and "foo_1__3_4".
rdar://11948769
llvm-svn: 161002
test more than a single instantiation of SmallVector.
Add testing for 0, 1, 2, and 4 element sized "small" buffers. These
appear to be essentially untested in the unit tests until now.
Fix several tests to be robust in the face of a '0' small buffer. As
a consequence of this size buffer, the growth patterns are actually
observable in the test -- yes this means that many tests never caused
a grow to occur before. For some tests I've merely added a reserve call
to normalize behavior. For others, the growth is actually interesting,
and so I captured the fact that growth would occur and adjusted the
assertions to not assume how rapidly growth occured.
Also update the specialization for a '0' small buffer length to have all
the same interface points as the normal small vector.
llvm-svn: 161001
When computing a trace, all the candidates for pred/succ must have been
visited. Filter out back-edges first, though. The PO traversal ignores
them.
Thanks to Andy for spotting this in review.
llvm-svn: 160995
where the other_half of the movt and movw relocation entries needs to get set
and only with the 16 bits of the other half.
rdar://10038370
llvm-svn: 160978
This is a cleaned up version of the isFree() function in
MachineTraceMetrics.cpp.
Transient instructions are very unlikely to produce any code in the
final output. Either because they get eliminated by RegisterCoalescing,
or because they are pseudo-instructions like labels and debug values.
llvm-svn: 160977
A->isPredecessor(B) is the same as B->isSuccessor(A), but it can
tolerate a B that is null or dangling. This shouldn't happen normally,
but it it useful for verification code.
llvm-svn: 160968
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
rdar://10554090 and rdar://11873276
llvm-svn: 160919
It is possible that an instruction can use and update EFLAGS.
When checking the safety, we should check the usage of EFLAGS first before
declaring it is safe to optimize due to the update.
llvm-svn: 160912
A value number is a PHI def if and only if it begins at a block
boundary. This can be derived from the def slot, a separate flag is not
necessary.
llvm-svn: 160893
This option replaces the existing live interval computation with one
based on LiveRangeCalc.cpp. The new algorithm does not depend on
LiveVariables, and it can be run at any time, before or after leaving
SSA form.
llvm-svn: 160892
This can happen as long as the instruction is not reachable. Instcombine does generate these unreachable malformed selects when doing RAUW
llvm-svn: 160874
The rationale here is that it's hard to write loops containing vector erases and
it only shows up if the vector contains non-trivial objects leading to crashes
when forming them out of garbage memory.
llvm-svn: 160854
These tables were indexed by [register][subreg index] which made them,
very large and sparse.
Replace them with lists of sub-register indexes that match the existing
lists of sub-registers. MCRI::getSubReg() becomes a very short linear
search, like getSubRegIndex() already was.
llvm-svn: 160843
Now that the weird X86 sub_ss and sub_sd sub-register indexes are gone,
there is no longer a need for the CompositeIndices construct in .td
files. Sub-register index composition can be specified on the
SubRegIndex itself using the ComposedOf field.
Also enforce unique names for sub-registers in TableGen. The same
sub-register cannot be available with multiple sub-register indexes.
llvm-svn: 160842
The (COPY_TO_REGCLASS GR32:$src, VR128) pattern looks odd, but
copyPhysReg does the right thing with it. (The old pattern would
eventually produce the same cross-class copy).
llvm-svn: 160830
The SUBREG_TO_REG instruction has magic semantics asserting that the
source value was defined by an instruction that cleared the high half of
the register. Those semantics are never actually exploited for xmm
registers.
llvm-svn: 160818
These idempotent sub-register indices don't do anything --- They simply
map XMM registers to themselves. They no longer affect register classes
either since the SubRegClasses field has been removed from Target.td.
This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns
with COPY_TO_REGCLASS patterns which simply become COPY instructions.
The number of IMPLICIT_DEF instructions before register allocation is
reduced, and that is the cause of the test case changes.
llvm-svn: 160816
This is still a work in progress.
Out-of-order CPUs usually execute instructions from multiple basic
blocks simultaneously, so it is necessary to look at longer traces when
estimating the performance effects of code transformations.
The MachineTraceMetrics analysis will pick a typical trace through a
given basic block and provide performance metrics for the trace. Metrics
will include:
- Instruction count through the trace.
- Issue count per functional unit.
- Critical path length, and per-instruction 'slack'.
These metrics can be used to determine the performance limiting factor
when executing the trace, and how it will be affected by a code
transformation.
Initially, this will be used by the early if-conversion pass.
llvm-svn: 160796
hopefully make it more visible. Adjust the web-docs to have a link to
this file rather than the list itself. I described code owners as also
being gatekeepers for their part of the code, which I think is true but
isn't in the code owner explanation on the web page.
llvm-svn: 160776
It is redundant; RegisterCoalescer will do the remat if it can't eliminate
the copy. Collected instruction counts before and after this. A few extra
instructions are generated due to spilling but it is normal to see these kinds
of changes with almost any small codegen change, according to Jakob.
This also fixed rdar://11830760 where xor is expected instead of movi0.
llvm-svn: 160749
When a live range splits into multiple connected components, we would
arbitrarily assign <undef> uses to component 0. This is wrong when the
use is tied to a def that gets assigned to a different component:
%vreg69<def> = ADD8ri %vreg68<undef>, 1
The use and def must get the same virtual register.
Fix this by assigning <undef> uses to the same component as the value
defined by the instruction, if any:
%vreg69<def> = ADD8ri %vreg69<undef>, 1
This fixes PR13402. The PR has a test case which I am not including
because it is unlikely to keep exposing this behavior in the future.
llvm-svn: 160739
of an array element (rather than at the beginning of the element) and extended
into the next element, then the load from the second element was being handled
wrong due to incorrect updating of the notion of which byte to load next. This
fixes PR13442. Thanks to Chris Smowton for reporting the problem, analyzing it
and providing a fix.
llvm-svn: 160711
The long branch pass (fixed in r160601) no longer uses the global base register
to compute addresses of branch destinations, so it is not necessary to reserve
a slot on the stack.
llvm-svn: 160703
struct s {
double x1;
float x2;
};
__attribute__((regparm(3))) struct s f(int a, int b, int c);
void g(void) {
f(41, 42, 43);
}
We need to be able to represent passing the address of s to f (sret) in a
register (inreg). Turns out that all that is needed is to not mark them as
mutually incompatible.
llvm-svn: 160695
if Condition Is Met instuctions that was not correctly determining the target
instruction.
So for a jne rel32 instruction:
% cat x.s
.byte 0x0f, 0x85, 0x09, 0x00, 0x00, 0x00
% as x.s
it was incorrectly deterining the target:
% otool -q -tv a.out
a.out:
(__TEXT,__text) section
0000000000000000 jne 0xd
and with the fix it gets this correct as:
% otool -q -tv a.out
a.out:
(__TEXT,__text) section
0000000000000000 jne 0xf
rdar://11505997
llvm-svn: 160694
are targeting an ELF platform. Only fold gs-relative (and fs-relative) loads
if it is actually sensible to do so for the target platform.
This fixes PR13438.
llvm-svn: 160687
might be deliberate "one time" leaks, so that leak checkers can find them.
This is a reapply of r160602 with the fix that this time I'm committing the
code I thought I was committing last time; the I->eraseFromParent() goes
*after* the break out of the loop.
llvm-svn: 160664
r160529 that was subsequently reverted. The fix was to not call
GV->eraseFromParent() right before the caller does the same. The existing
testcases already caught this bug if run under valgrind.
llvm-svn: 160602
This pass no longer requires that the global pointer value be saved to the
stack or register since it uses bal instruction to compute branch distance.
llvm-svn: 160601
LiveRangeEdit::foldAsLoad() can eliminate a register by folding a load
into its only use. Only do that when the load is safe to move, and it
won't extend any live ranges.
This fixes PR13414.
llvm-svn: 160575
CI's name, and then used the StringRef pointing at its old name. I'm
fixing it by storing the name in a std::string, and hoisting the
renaming logic to happen always. This is nicer anyways as it will allow
the upgraded IR to have the same names as the input IR in more cases.
Another bug found by AddressSanitizer. Woot.
llvm-svn: 160572
PHIElimination splits critical edges when it predicts it can resolve
interference and eliminate copies. It doesn't split the edge if the
interference wouldn't be resolved anyway because the phi-use register is
live in the critical edge anyway.
Teach PHIElimination to split loop exiting edges with interference, even
if it wouldn't resolve the interference. This removes the necessary
copies from the loop, which is still an improvement from injecting the
copies into the loop.
The test case demonstrates the improvement. Before:
LBB0_1:
cmpb $0, (%rdx)
leaq 1(%rdx), %rdx
movl %esi, %eax
je LBB0_1
After:
LBB0_1:
cmpb $0, (%rdx)
leaq 1(%rdx), %rdx
je LBB0_1
movl %esi, %eax
llvm-svn: 160571
GetBestDestForJumpOnUndef() assumes there is at least 1 successor, which isn't
true if the block ends in an indirect branch with no successors. Fix this by
bailing out earlier in this case.
llvm-svn: 160546
This fixes a bunch of make check failures of the form:
Unknown Architecture Version.
UNREACHABLE executed at ../lib/Target/Hexagon/HexagonSubtarget.cpp:60!
llvm-svn: 160518
It is optimal at least up to 7 bits (I've tested all such cases)
This change to truncate() allows a little simplification to the multiplication code,
and it also makes multiplication optimal :)
llvm-svn: 160512
Updated OptimizeCompare in peephole to remove redundant cmp against zero.
We only remove Compare if CF and OF are not used.
rdar://11855129
llvm-svn: 160454
when run on an Intel Atom processor. The failures have arisen due
to changes elsewhere in the trunk over the past 8 weeks or so.
These failures were not detected by the Atom buildbot because the
CPU on the Atom buildbot was not being detected as an Atom CPU.
The fix for this problem is in Host.cpp and X86Subtarget.cpp, but
shall remain commented out until the current set of Atom test failures
are fixed.
Patch by Andy Zhang and Tyler Nowicki!
llvm-svn: 160451
LiveIntervals due to the two-addr pass generating bogus MI code.
The crux of the issue was a loop nesting problem. The intent of the code
which attempts to transform instructions before converting them to
two-addr form is to defer and reprocess any transformed instructions as
the second processing is likely to have more opportunities to coalesce
copies, etc. Unfortunately, there was one section of processing that was
not deferred -- the INSERT_SUBREG rewriting. Due to quirks of how this
rewriting proceeded, not only did it occur early, it removed the bits of
information needed for the deferred processing to correctly generate the
necessary two address form (specifically inserting a copy), but didn't
trigger any immediate assertions and produced what appeared to be
already valid two-address from code. Thus, the assertion only fired much
later in the pipeline.
The fix is to hoist the transformation logic up layer to where it can
more firmly defer all further processing, and to teach the normal
processing to handle an edge case previously handled as part of the
transformation logic. This edge case (already matched tied register
operands) needs to *not* defer any steps.
As has been brought up repeatedly in the process: wow does this code
need refactoring. I *may* squeeze in some time to at least bring sanity
to this loop... but wow... =]
Thanks to Jakob for helpful hints on the way here, and the review.
llvm-svn: 160443
load source operand is used by multiple nodes. The v2i64 broadcast was emulated
by shuffling the two lower i32 elements to the upper two.
We had a bug in the immediate used for the broadcast.
Replacing 0 to 0x44.
0x44 means [01|00|01|00] which corresponds to the correct lane.
Patch by Michael Kuperstein.
llvm-svn: 160430
Print the high order register of a double word register operand.
In 32 bit mode, a 64 bit double word integer will be represented
by 2 32 bit registers. This modifier causes the high order register
to be used in the asm expression. It is useful if you are using
doubles in assembler and continue to control register to variable
relationships.
This patch also fixes a related bug in a previous patch:
case 'D': // Second part of a double word register operand
case 'L': // Low order register of a double word register operand
case 'M': // High order register of a double word register operand
I got 'D' and 'M' confused. The second part of a double word operand
will only match 'M' for one of the endianesses. I had 'L' and 'D'
be the opposite twins when 'L' and 'M' are.
llvm-svn: 160429
Fixes PR13371: indvars pass incorrectly substitutes 'undef' values.
I do not like this fix. It's needed until/unless the meaning of undef
changes. It attempts to be complete according to the IR spec, but I
don't have much confidence in the implementation given the difficulty
testing undefined behavior. Worse, this invalidates some of my
hard-fought work on indvars and LSR to optimize pointer induction
variables. It results benchmark regressions, which I'll track
internally. On x86_64 no LTO I see:
-3% huffbench
-3% 400.perlbench
-8% fhourstones
My only suggestion for recovering is to change the meaning of
undef. If we could trust an arbitrary instruction to produce a some
real value that can be manipulated (e.g. incremented) according to
non-undef rules, then this case could be easily handled with SCEV.
llvm-svn: 160421
intrinsics. The second instruction(s) to be handled are the vector versions
of count set bits (ctpop).
The changes here are to clang so that it generates a target independent
vector ctpop when it sees an ARM dependent vector bits set count. The changes
in llvm are to match the target independent vector ctpop and in
VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM
dependent vector pop counts with target-independent ctpops. There are also
changes to an existing test case in llvm for ARM vector count instructions and
to a test for the bitcode upgrade.
<rdar://problem/11892519>
There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>
llvm-svn: 160410
A standalone pattern defined in a multiclass expansion should handle
null_frag references just like patterns on instructions. Follow-up to
r160333.
llvm-svn: 160384
Make it possible to prune individual graph edges from a post-order
traversal by specializing the po_iterator_storage template. Previously,
it was only possible to prune full graph nodes. Edge pruning makes it
possible to remove loop back-edges, for example.
Also replace the existing DFSetTraits customization hook with a
po_iterator_storage method for observing the post-order. DFSetTraits was
only used by LoopIterator.h which now provides a po_iterator_storage
specialization.
Thanks to Sean and Chandler for reviewing.
llvm-svn: 160366
To fetch a subprogram name we should not only inspect the DIE for this subprogram, but optionally inspect
its specification, or its abstract origin (even if there is no inlining), or even specification of an abstract origin.
Reviewed by Benjamin Kramer.
llvm-svn: 160365
large immediates. Add dag combine logic to recover in case the large
immediates doesn't fit in cmp immediate operand field.
int foo(unsigned long l) {
return (l>> 47) == 1;
}
we produce
%shr.mask = and i64 %l, -140737488355328
%cmp = icmp eq i64 %shr.mask, 140737488355328
%conv = zext i1 %cmp to i32
ret i32 %conv
which codegens to
movq $0xffff800000000000,%rax
andq %rdi,%rax
movq $0x0000800000000000,%rcx
cmpq %rcx,%rax
sete %al
movzbl %al,%eax
ret
TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that's immediates which are not a signed 32-bit immediate.
Based on a patch by Eli Friedman.
PR10328
rdar://9758774
llvm-svn: 160346
This places limits on CollectSubexprs to constrains the number of
reassociation possibilities. It limits the recursion depth and skips
over chains of nested recurrences outside the current loop.
Fixes PR13361. Although underlying SCEV behavior is still potentially bad.
llvm-svn: 160340
Define a 'null_frag' SDPatternOperator node, which if referenced in an
instruction Pattern, results in the pattern being collapsed to be as-if
'[]' had been specified instead. This allows supporting a multiclass
definition where some instaniations have ISel patterns associated and
others do not.
For example,
multiclass myMulti<RegisterClass rc, SDPatternOperator OpNode = null_frag> {
def _x : myI<(outs rc:), (ins rc:), []>;
def _r : myI<(outs rc:), (ins rc:), [(set rc:, (OpNode rc:))]>;
}
defm foo : myMulti<GRa, not>;
defm bar : myMulti<GRb>;
llvm-svn: 160333
uint32_t hi(uint64_t res)
{
uint_32t hi = res >> 32;
return !hi;
}
llvm IR looks like this:
define i32 @hi(i64 %res) nounwind uwtable ssp {
entry:
%lnot = icmp ult i64 %res, 4294967296
%lnot.ext = zext i1 %lnot to i32
ret i32 %lnot.ext
}
The optimizer has optimize away the right shift and truncate but the resulting
constant is too large to fit in the 32-bit immediate field. The resulting x86
code is worse as a result:
movabsq $4294967296, %rax ## imm = 0x100000000
cmpq %rax, %rdi
sbbl %eax, %eax
andl $1, %eax
This patch teaches the x86 lowering code to handle ult against a large immediate
with trailing zeros. It will issue a right shift and a truncate followed by
a comparison against a shifted immediate.
shrq $32, %rdi
testl %edi, %edi
sete %al
movzbl %al, %eax
It also handles a ugt comparison against a large immediate with trailing bits
set. i.e. X > 0x0ffffffff -> (X >> 32) >= 1
rdar://11866926
llvm-svn: 160312
The first variant accepts immediate number as the second argument.
The second variant accepts register operand as the second argument.
llvm-svn: 160307
In the added testcase the constant 55 was behind an AssertZext of type i1, and ComputeDemandedBits
reported that some of the bits were both known to be one and known to be zero.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 160305
Mips shift instructions DSLL, DSRL and DSRA are transformed into
DSLL32, DSRL32 and DSRA32 respectively if the shift amount is between
32 and 63
Here is a description of DSLL:
Purpose: Doubleword Shift Left Logical Plus 32
To execute a left-shift of a doubleword by a fixed amount--32 to 63 bits
Description: GPR[rd] <- GPR[rt] << (sa+32)
The 64-bit doubleword contents of GPR rt are shifted left, inserting
zeros into the emptied bits; the result is placed in
GPR rd. The bit-shift amount in the range 0 to 31 is specified by sa.
This patch implements the direct object output of these instructions.
llvm-svn: 160277
1. FileCheck-ize epilogue.ll and allow another asm instruction to restore %rsp.
2. Remove check in widen_arith-3.ll that was hitting instruction in epilogue instead of
vector add.
llvm-svn: 160274
undef virtual register. The problem is that ProcessImplicitDefs removes the
definition of the register and marks all uses as undef. If we lose the undef
marker then we get a register which has no def, is not marked as undef. The
live interval analysis does not collect information for these virtual
registers and we crash in later passes.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 160260
It turns out that ASan relied on the at-the-end block insertion order to
(purely by happenstance) disable some LLVM optimizations, which in turn
start firing when the ordering is made more "normal". These
optimizations in turn merge many of the instrumentation reporting calls
which breaks the return address based error reporting in ASan.
We're looking at several different options for fixing this.
llvm-svn: 160256
This is particularly useful to the backend code generators which try to
process things in the incoming function order.
Also, cleanup some uses of IRBuilder to be a bit simpler and more clear.
llvm-svn: 160254
functionality test.
In general, unless the functionality is substantially separated, we
should lump more basic testing into this file. The test running
infrastructure likes having a few test files with more comprehensive
testing within them.
llvm-svn: 160253
Added a basic unit test for this with CreateCondBr. I didn't go all the
way and test the switch side as the boilerplate for setting up the
switch IRBuilder unit tests is a lot more. Fortunately, the two share
all the interesting code paths.
llvm-svn: 160251
It is intended to fix PR11468.
Old prologue and epilogue looked like this:
push %rbp
mov %rsp, %rbp
and $alignment, %rsp
push %r14
push %r15
...
pop %r15
pop %r14
mov %rbp, %rsp
pop %rbp
The problem was to reference the locations of callee-saved registers in exception handling:
locations of callee-saved had to be re-calculated regarding the stack alignment operation. It would
take some effort to implement this in LLVM, as currently MachineLocation can only have the form
"Register + Offset". Funciton prologue and epilogue are now changed to:
push %rbp
mov %rsp, %rbp
push %14
push %15
and $alignment, %rsp
...
lea -$size_of_saved_registers(%rbp), %rsp
pop %r15
pop %r14
pop %rbp
Reviewed by Chad Rosier.
llvm-svn: 160248
the move of *Builder classes into the Core library.
No uses of this builder in Clang or DragonEgg I could find.
If there is a desire to have an IR-building-support library that
contains all of these builders, that can be easily added, but currently
it seems likely that these add no real overhead to VMCore.
llvm-svn: 160243
IRBuilder, DIBuilder, etc.
This is the proper layering as MDBuilder can't be used (or implemented)
without the Core Metadata representation.
Patches to Clang and Dragonegg coming up.
llvm-svn: 160237
Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot.
PR12782.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 160230
Add a micro-optimization to getNode of CONCAT_VECTORS when both operands are undefs.
Can't find a testcase for this because VECTOR_SHUFFLE already handles undef operands, but Duncan suggested that we add this.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 160229
The notable fix is to look at any dependencies attached to the kill
instruction (or other instructions between MI nad the kill) where the
dependencies are specific to the register in question.
The old code implicitly handled this by rejecting the transform if *any*
other uses were found within the block, but after the start point. The
new code directly finds the kill, and has to re-use the existing
dependency scan to check for non-kill uses.
This was caught by self-host, but I found the bug via inspection and use
of absurd assert scaffolding to compute the kills in two ways and
compare them. So I have no useful testcase for this other than
"bootstrap". I'd work harder to reduce a test case if this particular
code were likely to live for a long time.
Thanks to Benjamin Kramer for reviewing the fix itself.
llvm-svn: 160228
Catch uses of undefined physregs that haven't been added to basic block
live-in lists. Run the verifier to pinpoint the problem.
Also run the verifier when a virtual register use is not jointly
dominated by defs.
llvm-svn: 160207
All SCEV expressions used by LSR formulae must be safe to
expand. i.e. they may not contain UDiv unless we can prove nonzero
denominator.
Fixes PR11356: LSR hoists UDiv.
llvm-svn: 160205
This allows SCEVExpander to run on the IV expressions.
This codifies an assumption made by LSR to complete the fix for
PR11356, but I haven't been able to generate a separate unit test for
this part. I'm adding it as an extra safety check.
llvm-svn: 160204
intrinsics with target-indepdent intrinsics. The first instruction(s) to be
handled are the vector versions of count leading zeros (ctlz).
The changes here are to clang so that it generates a target independent
vector ctlz when it sees an ARM dependent vector ctlz. The changes in llvm
are to match the target independent vector ctlz and in VMCore/AutoUpgrade.cpp
to update any existing bc files containing ARM dependent vector ctlzs with
target-independent ctlzs. There are also changes to an existing test case in
llvm for ARM vector count instructions and a new test for the bitcode upgrade.
<rdar://problem/11831778>
There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>
llvm-svn: 160200
removes the largest scaling problem in the test cases from PR13225 when
ASan is switched to insert basic blocks in the natural CFG order.
It may also solve some scaling problems for more normal code with large
numbers of basic blocks and variables.
llvm-svn: 160194
Call instructions are no longer required to be variadic, and
variable_ops should only be used for instructions that encode a variable
number of arguments, like the ARM stm/ldm instructions.
llvm-svn: 160189
Function argument registers are added to the call SDNode, but
InstrEmitter now knows how to make those operands implicit, and the call
instruction doesn't have to be variadic.
Explicit register operands should only be those that are encoded in the
instruction, implicit register operands are for extra dependencies like
call argument and return values.
llvm-svn: 160188
is used in cases where global symbols are
directly represented in the GOT and we use an
offset into the global offset table.
This patch adds direct object support for R_MIPS_GOT_DISP.
llvm-svn: 160183
When dumping the DAG for a fatal 'Cannot select' back-end error, also
provide the name of the function the construct is in. Useful when dealing
with large testcases, as the next step is to llvm-extract the function
in question to get a small(er) testcase.
llvm-svn: 160152
Make sure the tblgen'erated asm matcher correctly returns numoperands+1
as the ErrorInfo when the problem was that there weren't enough operands
specified.
rdar://9142751
llvm-svn: 160144
the input vector, it can be bigger (this is helpful for powerpc where <2 x i16>
is a legal vector type but i16 isn't a legal type, IIRC). However this wasn't
being taken into account by ExpandRes_EXTRACT_VECTOR_ELT, causing PR13220.
Lightly tweaked version of a patch by Michael Liao.
llvm-svn: 160116
%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, -1
%and = and i64 %sub, %shr
ret i64 %and
to:
%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, 2305843009213693951
%and = and i64 %sub, %shr
ret i64 %and
The demanded bit optimization is actually a pessimization because add -1 would
be codegen'ed as a sub 1. Teach the demanded constant shrinking optimization
to check for negated constant to make sure it is actually reducing the width
of the constant.
rdar://11793464
llvm-svn: 160101
def Pat<...>;
Results in 'record name is not a string!' diagnostic. Not the best,
but the lack of location information moves it from not very helpful
into completely useless. We're in the Record class when throwing the
error, so just add the location info directly.
llvm-svn: 160098
It is safe if CPSR is killed or re-defined.
When we are done with the basic block, check whether CPSR is live-out.
Do not optimize away cmp if CPSR is live-out.
llvm-svn: 160090
When WriteFragmentData() case FT_align called
Asm.getBackend().writeNopData() is called, nothing
is done since Mips implementation of writeNopData just
returned "true".
For some reason this has not caused problems in 32 bit
mode, but in 64 bit mode it caused an assert when processing
multiple function units.
The test case included will assert without this patch. It
runs twice with different flags to prevent false positives
due to changes in code generation over time.
llvm-svn: 160084
file buffer is null-terminated.
If the file is smaller than we thought, mmap will not allow dereferencing
past the pages that are enough to cover the actual file size,
even though we asked for a larger address range.
rdar://11612916
llvm-svn: 160075
r1025 = s/zext r1024, 4
r1026 = extract_subreg r1025, 4
to a copy:
r1026 = copy r1024
This is correct. However it uses TII->isCoalescableExtInstr() which can return
true for instructions which essentially does a sext_in_reg so this can end up
with an illegal copy where the source and destination register classes do not
match. Add a check to avoid it. Sorry, no test case possible at this time.
rdar://11849816
llvm-svn: 160059
Low order register of a double word register operand. Operands
are defined by the name of the variable they are marked with in
the inline assembler code. This is a way to specify that the
operand just refers to the low order register for that variable.
It is the opposite of modifier 'D' which specifies the high order
register.
Example:
main()
{
long long ll_input = 0x1111222233334444LL;
long long ll_val = 3;
int i_result = 0;
__asm__ __volatile__(
"or %0, %L1, %2"
: "=r" (i_result)
: "r" (ll_input), "r" (ll_val));
}
Which results in:
lui $2, %hi(_gp_disp)
addiu $2, $2, %lo(_gp_disp)
addiu $sp, $sp, -8
addu $2, $2, $25
sw $2, 0($sp)
lui $2, 13107
ori $3, $2, 17476 <-- Low 32 bits of ll_input
lui $2, 4369
ori $4, $2, 8738 <-- High 32 bits of ll_input
addiu $5, $zero, 3 <-- Low 32 bits of ll_val
addiu $2, $zero, 0 <-- High 32 bits of ll_val
#APP
or $3, $4, $5 <-- or i_result, high 32 ll_input, low 32 of ll_val
#NO_APP
addiu $sp, $sp, 8
jr $ra
If not direction is done for the long long for 32 bit variables results
in using the low 32 bits as ll_val shows.
There is an existing bug if 'L' or 'D' is used for the destination register
for 32 bit long longs in that the target value will be updated incorrectly
for the non-specified part unless explicitly set within the inline asm code.
llvm-svn: 160028
generalizing its implementation sufficiently to support this value
number scenario as well.
This cuts out another significant performance hit in large functions
(over 10k basic blocks, etc), especially those with "natural" CFG
structures.
llvm-svn: 160026
This ordering allows nested if-conversion without using a work list, and
it makes it possible to update the dominator tree on the fly as well.
Any erased basic blocks will always be dominated by the current
post-order position, so the domtree can be pruned without invalidating
the iterator.
llvm-svn: 160025
X86MachineFunctionInfo as this is currently only used by X86. If this ever
becomes an issue on another arch (e.g., ARM) then we can hoist it back out.
llvm-svn: 160009
X86. Basically, this is a reapplication of r158087 with a few fixes.
Specifically, (1) the stack pointer is restored from the base pointer before
popping callee-saved registers and (2) in obscure cases (see comments in patch)
we must cache the value of the original stack adjustment in the prologue and
apply it in the epilogue.
rdar://11496434
llvm-svn: 160002
back of it.
I don't have anything even remotely close to a test case for this. It
only broke two build bots, both of them doing bootstrap builds, one of
them a dragonegg bootstrap. It doesn't break for me when I bootstrap
either. It doesn't reproduce every time or on many machines during the
bootstrap. Many thanks to Duncan Sands who got the exact command (and
stage of the bootstrap) which failed on the dragonegg bootstrap and
managed to get it to trigger under valgrind with debug symbols. The fix
was then found by inspection.
llvm-svn: 159993
multiple scalars and insert them into a vector. Next, we shuffle the elements
into the correct places, as before.
Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the
migration of bitcasts happened too late in the SelectionDAG process.
llvm-svn: 159991
quadratic behavior when performing pathological merges. Fixes the core
element of PR12652.
There is only one user of addRangeFrom left: join. I'm hoping to
refactor further in a future patch and have join use this merge
operation as well.
llvm-svn: 159982
of the trick merge routines. This adds a layer of testing that was
necessary when implementing more efficient (and complex) merge logic for
this datastructure.
No functionality changed here.
llvm-svn: 159981
Some NEON instructions want to match against normal SDNodes for some
operand types and Intrinsics for others. For example, CTLZ. To enable this,
switch from explicitly requiring Intrinsic on the class templates to using
SDPatternOperator instead.
llvm-svn: 159974
TableGen has support for using an intrinics name directly in a DAG,
but this breaks down when referring to just a node, as that's
handled initializer list stuff entirely via subclassing in the
parser. That is, using an instrinsic like "(int_my_intrinsic ...)"
works fine. Using it standalone for parameterizing the operator
in such a DAG does not.
Fixing this is simple enough, as we simply declare Intrinsic
as deriving from SDPatternOperator, which is the class name
intended for exactly this purpose in TargetSelectionDAG.td.
When the intrinsic is actually used in the DAG pattern, it will
be recognized and expanded to an intrinsic_wo_chain (et. al.)
just like when it's used directly.
Incoming ARM NEON cleanup based on this and a bit of functionality
improvement after that.
llvm-svn: 159973
getCondFromSETOpc, getCondFromCMovOpc, getSETFromCond, getCMovFromCond
No functional change intended.
If we want to update the condition code of CMOV|SET|Jcc, we first analyze the
opcode to get the condition code, then update the condition code, finally
synthesize the new opcode form the new condition code.
llvm-svn: 159955
Access mips register classes via MCRegisterInfo's functions instead of via the
TargetRegisterClasses defined in MipsGenRegisterInfo.inc.
llvm-svn: 159953
This patch removes ~70 lines in InstCombineLoadStoreAlloca.cpp and makes both functions a bit more aggressive than before :)
In theory, we can be more aggressive when removing an alloca than a malloc, because an alloca pointer should never escape, but we are not taking advantage of this anyway
llvm-svn: 159952
subtarget CPU descriptions and support new features of
MachineScheduler.
MachineModel has three categories of data:
1) Basic properties for coarse grained instruction cost model.
2) Scheduler Read/Write resources for simple per-opcode and operand cost model (TBD).
3) Instruction itineraties for detailed per-cycle reservation tables.
These will all live side-by-side. Any subtarget can use any
combination of them. Instruction itineraries will not change in the
near term. In the long run, I expect them to only be relevant for
in-order VLIW machines that have complex contraints and require a
precise scheduling/bundling model. Once itineraries are only actively
used by VLIW-ish targets, they could be replaced by something more
appropriate for those targets.
This tablegen backend rewrite sets things up for introducing
MachineModel type #2: per opcode/operand cost model.
llvm-svn: 159891
It is safe if EFLAGS is killed or re-defined.
When we are done with the basic block, check whether EFLAGS is live-out.
Do not optimize away cmp if EFLAGS is live-out.
llvm-svn: 159888
This means we can do cheap DSE for heap memory.
Nothing is done if the pointer excapes or has a load.
The churn in the tests is mostly due to objectsize, since we want to make sure we
don't delete the malloc call before evaluating the objectsize (otherwise it becomes -1/0)
llvm-svn: 159876
For each Cmp, we check whether there is an earlier Sub which make Cmp
redundant. We handle the case where SUB operates on the same source operands as
Cmp, including the case where the two source operands are swapped.
llvm-svn: 159838
DwarfDebug class could generate the same (inlined) DIVariable twice:
1) when trying to find abstract debug variable for a concrete inlined instance.
2) when explicitly collecting info for variables that were optimized out.
This change makes sure that this duplication won't happen and makes
Clang pass "gdb.opt/inline-locals" test from gdb testsuite.
Reviewed by Eric Christopher.
llvm-svn: 159811
Print the second half of a double word operand.
The include list was cleaned up a bit as well.
Also the test case was modified to test for both
big and little patterns.
llvm-svn: 159787