Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.
As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.
Fixes PR13265.
llvm-svn: 161409
If the result of a common subexpression is used at all uses of the candidate
expression, CSE should not increase the live range of the common subexpression.
rdar://11393714 and rdar://11819721
llvm-svn: 161396
initialize fields of the class that it used.
The result was nonsense code.
Before:
0000000000000000 <foo>:
0: 00441100 0x441100
4: 03e00008 jr ra
8: 00000000 nop
After:
0000000000000000 <foo>:
0: 00041000 sll v0,a0,0x0
4: 03e00008 jr ra
8: 00000000 nop
llvm-svn: 161377
were using a class defined for 32 bit instructions and
thus the instruction was for addiu instead of daddiu.
This was corrected by adding the instruction opcode as a
field in the base class to be filled in by the defs.
llvm-svn: 161359
These 2 relocations gain access to the
highest and the second highest 16 bits
of a 64 bit object.
R_MIPS_HIGHER %higher(A+S)
The %higher(x) function is [ (((long long) x + 0x80008000LL) >> 32) & 0xffff ].
R_MIPS_HIGHEST %highest(A+S)
The %highest(x) function is [ (((long long) x + 0x800080008000LL) >> 48) & 0xffff ].
llvm-svn: 161348
The MFTB instruction itself is being phased out, and its functionality
is provided by MFSPR. According to the ISA docs, using MFSPR works on all known
chips except for the 601 (which did not have a timebase register anyway)
and the POWER3.
Thanks to Adhemerval Zanella for pointing this out!
llvm-svn: 161346
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.
llvm-svn: 161302
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.
llvm-svn: 161282
I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp
in TargetLibraryInfo, so that it would use custom code for memcmp calls even
with -fno-builtin. I also had to add a new -disable-simplify-libcalls option
to llc so that I could write a test for this.
llvm-svn: 161262
Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
llvm-svn: 161232
Add more comments and use early returns to reduce nesting in isLoadFoldable.
Also disable folding for V_SET0 to avoid introducing a const pool entry and
a const pool load.
rdar://10554090 and rdar://11873276
llvm-svn: 161207
yaml2obj takes a textual description of an object file in YAML format
and outputs the binary equivalent. This greatly simplifies writing
tests that take binary object files as input.
llvm-svn: 161205
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
This patch is a rework of r160919 and was tested on clang self-host on my local
machine.
rdar://10554090 and rdar://11873276
llvm-svn: 161152
MipsSEFrameLowering.
Implement MipsSEFrameLowering::hasReservedCallFrame. Call frames will not be
reserved if there is a call with a large call frame or there are variable sized
objects on the stack.
llvm-svn: 161090
The frame object which points to the dynamically allocated area will not be
needed after changes are made to cease reserving call frames.
llvm-svn: 161076
arguments to the stack in MipsISelLowering::LowerCall, use stack pointer and
integer offset operands rather than frame object operands.
llvm-svn: 161068
single-precision load and store.
Also avoid selecting LUXC1 and SUXC1 instructions during isel. It is incorrect
to map unaligned floating point load/store nodes to these instructions.
llvm-svn: 161063
One motivating example is to sink an instruction from a basic block which has
two successors: one outside the loop, the other inside the loop. We should try
to sink the instruction outside the loop.
rdar://11980766
llvm-svn: 161062
We are extending live ranges, so kill flags are not accurate. They
aren't needed until they are recomputed after RA anyway.
<rdar://problem/11950722>
llvm-svn: 161023
We branch to the successor with higher edge weight first.
Convert from
je LBB4_8 --> to outer loop
jmp LBB4_14 --> to inner loop
to
jne LBB4_14
jmp LBB4_8
PR12750
rdar: 11393714
llvm-svn: 161018
Empty macro arguments at the end of the list should be as-if not specified at
all, but those in the middle of the list need to be kept so as not to screw
up the positional numbering. E.g.:
.macro foo
foo_-bash___:
nop
.endm
foo 1, 2, 3, 4
foo 1, , 3, 4
Should create two labels, "foo_1_2_3_4" and "foo_1__3_4".
rdar://11948769
llvm-svn: 161002
where the other_half of the movt and movw relocation entries needs to get set
and only with the 16 bits of the other half.
rdar://10038370
llvm-svn: 160978
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
rdar://10554090 and rdar://11873276
llvm-svn: 160919
It is possible that an instruction can use and update EFLAGS.
When checking the safety, we should check the usage of EFLAGS first before
declaring it is safe to optimize due to the update.
llvm-svn: 160912
This can happen as long as the instruction is not reachable. Instcombine does generate these unreachable malformed selects when doing RAUW
llvm-svn: 160874
These idempotent sub-register indices don't do anything --- They simply
map XMM registers to themselves. They no longer affect register classes
either since the SubRegClasses field has been removed from Target.td.
This patch replaces XMM->XMM EXTRACT_SUBREG and INSERT_SUBREG patterns
with COPY_TO_REGCLASS patterns which simply become COPY instructions.
The number of IMPLICIT_DEF instructions before register allocation is
reduced, and that is the cause of the test case changes.
llvm-svn: 160816
It is redundant; RegisterCoalescer will do the remat if it can't eliminate
the copy. Collected instruction counts before and after this. A few extra
instructions are generated due to spilling but it is normal to see these kinds
of changes with almost any small codegen change, according to Jakob.
This also fixed rdar://11830760 where xor is expected instead of movi0.
llvm-svn: 160749
of an array element (rather than at the beginning of the element) and extended
into the next element, then the load from the second element was being handled
wrong due to incorrect updating of the notion of which byte to load next. This
fixes PR13442. Thanks to Chris Smowton for reporting the problem, analyzing it
and providing a fix.
llvm-svn: 160711
The long branch pass (fixed in r160601) no longer uses the global base register
to compute addresses of branch destinations, so it is not necessary to reserve
a slot on the stack.
llvm-svn: 160703
struct s {
double x1;
float x2;
};
__attribute__((regparm(3))) struct s f(int a, int b, int c);
void g(void) {
f(41, 42, 43);
}
We need to be able to represent passing the address of s to f (sret) in a
register (inreg). Turns out that all that is needed is to not mark them as
mutually incompatible.
llvm-svn: 160695
are targeting an ELF platform. Only fold gs-relative (and fs-relative) loads
if it is actually sensible to do so for the target platform.
This fixes PR13438.
llvm-svn: 160687
might be deliberate "one time" leaks, so that leak checkers can find them.
This is a reapply of r160602 with the fix that this time I'm committing the
code I thought I was committing last time; the I->eraseFromParent() goes
*after* the break out of the loop.
llvm-svn: 160664
r160529 that was subsequently reverted. The fix was to not call
GV->eraseFromParent() right before the caller does the same. The existing
testcases already caught this bug if run under valgrind.
llvm-svn: 160602
This pass no longer requires that the global pointer value be saved to the
stack or register since it uses bal instruction to compute branch distance.
llvm-svn: 160601
LiveRangeEdit::foldAsLoad() can eliminate a register by folding a load
into its only use. Only do that when the load is safe to move, and it
won't extend any live ranges.
This fixes PR13414.
llvm-svn: 160575
PHIElimination splits critical edges when it predicts it can resolve
interference and eliminate copies. It doesn't split the edge if the
interference wouldn't be resolved anyway because the phi-use register is
live in the critical edge anyway.
Teach PHIElimination to split loop exiting edges with interference, even
if it wouldn't resolve the interference. This removes the necessary
copies from the loop, which is still an improvement from injecting the
copies into the loop.
The test case demonstrates the improvement. Before:
LBB0_1:
cmpb $0, (%rdx)
leaq 1(%rdx), %rdx
movl %esi, %eax
je LBB0_1
After:
LBB0_1:
cmpb $0, (%rdx)
leaq 1(%rdx), %rdx
je LBB0_1
movl %esi, %eax
llvm-svn: 160571
GetBestDestForJumpOnUndef() assumes there is at least 1 successor, which isn't
true if the block ends in an indirect branch with no successors. Fix this by
bailing out earlier in this case.
llvm-svn: 160546
Updated OptimizeCompare in peephole to remove redundant cmp against zero.
We only remove Compare if CF and OF are not used.
rdar://11855129
llvm-svn: 160454
when run on an Intel Atom processor. The failures have arisen due
to changes elsewhere in the trunk over the past 8 weeks or so.
These failures were not detected by the Atom buildbot because the
CPU on the Atom buildbot was not being detected as an Atom CPU.
The fix for this problem is in Host.cpp and X86Subtarget.cpp, but
shall remain commented out until the current set of Atom test failures
are fixed.
Patch by Andy Zhang and Tyler Nowicki!
llvm-svn: 160451
LiveIntervals due to the two-addr pass generating bogus MI code.
The crux of the issue was a loop nesting problem. The intent of the code
which attempts to transform instructions before converting them to
two-addr form is to defer and reprocess any transformed instructions as
the second processing is likely to have more opportunities to coalesce
copies, etc. Unfortunately, there was one section of processing that was
not deferred -- the INSERT_SUBREG rewriting. Due to quirks of how this
rewriting proceeded, not only did it occur early, it removed the bits of
information needed for the deferred processing to correctly generate the
necessary two address form (specifically inserting a copy), but didn't
trigger any immediate assertions and produced what appeared to be
already valid two-address from code. Thus, the assertion only fired much
later in the pipeline.
The fix is to hoist the transformation logic up layer to where it can
more firmly defer all further processing, and to teach the normal
processing to handle an edge case previously handled as part of the
transformation logic. This edge case (already matched tied register
operands) needs to *not* defer any steps.
As has been brought up repeatedly in the process: wow does this code
need refactoring. I *may* squeeze in some time to at least bring sanity
to this loop... but wow... =]
Thanks to Jakob for helpful hints on the way here, and the review.
llvm-svn: 160443
Print the high order register of a double word register operand.
In 32 bit mode, a 64 bit double word integer will be represented
by 2 32 bit registers. This modifier causes the high order register
to be used in the asm expression. It is useful if you are using
doubles in assembler and continue to control register to variable
relationships.
This patch also fixes a related bug in a previous patch:
case 'D': // Second part of a double word register operand
case 'L': // Low order register of a double word register operand
case 'M': // High order register of a double word register operand
I got 'D' and 'M' confused. The second part of a double word operand
will only match 'M' for one of the endianesses. I had 'L' and 'D'
be the opposite twins when 'L' and 'M' are.
llvm-svn: 160429
Fixes PR13371: indvars pass incorrectly substitutes 'undef' values.
I do not like this fix. It's needed until/unless the meaning of undef
changes. It attempts to be complete according to the IR spec, but I
don't have much confidence in the implementation given the difficulty
testing undefined behavior. Worse, this invalidates some of my
hard-fought work on indvars and LSR to optimize pointer induction
variables. It results benchmark regressions, which I'll track
internally. On x86_64 no LTO I see:
-3% huffbench
-3% 400.perlbench
-8% fhourstones
My only suggestion for recovering is to change the meaning of
undef. If we could trust an arbitrary instruction to produce a some
real value that can be manipulated (e.g. incremented) according to
non-undef rules, then this case could be easily handled with SCEV.
llvm-svn: 160421
intrinsics. The second instruction(s) to be handled are the vector versions
of count set bits (ctpop).
The changes here are to clang so that it generates a target independent
vector ctpop when it sees an ARM dependent vector bits set count. The changes
in llvm are to match the target independent vector ctpop and in
VMCore/AutoUpgrade.cpp to update any existing bc files containing ARM
dependent vector pop counts with target-independent ctpops. There are also
changes to an existing test case in llvm for ARM vector count instructions and
to a test for the bitcode upgrade.
<rdar://problem/11892519>
There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>
llvm-svn: 160410
To fetch a subprogram name we should not only inspect the DIE for this subprogram, but optionally inspect
its specification, or its abstract origin (even if there is no inlining), or even specification of an abstract origin.
Reviewed by Benjamin Kramer.
llvm-svn: 160365
large immediates. Add dag combine logic to recover in case the large
immediates doesn't fit in cmp immediate operand field.
int foo(unsigned long l) {
return (l>> 47) == 1;
}
we produce
%shr.mask = and i64 %l, -140737488355328
%cmp = icmp eq i64 %shr.mask, 140737488355328
%conv = zext i1 %cmp to i32
ret i32 %conv
which codegens to
movq $0xffff800000000000,%rax
andq %rdi,%rax
movq $0x0000800000000000,%rcx
cmpq %rcx,%rax
sete %al
movzbl %al,%eax
ret
TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that's immediates which are not a signed 32-bit immediate.
Based on a patch by Eli Friedman.
PR10328
rdar://9758774
llvm-svn: 160346
uint32_t hi(uint64_t res)
{
uint_32t hi = res >> 32;
return !hi;
}
llvm IR looks like this:
define i32 @hi(i64 %res) nounwind uwtable ssp {
entry:
%lnot = icmp ult i64 %res, 4294967296
%lnot.ext = zext i1 %lnot to i32
ret i32 %lnot.ext
}
The optimizer has optimize away the right shift and truncate but the resulting
constant is too large to fit in the 32-bit immediate field. The resulting x86
code is worse as a result:
movabsq $4294967296, %rax ## imm = 0x100000000
cmpq %rax, %rdi
sbbl %eax, %eax
andl $1, %eax
This patch teaches the x86 lowering code to handle ult against a large immediate
with trailing zeros. It will issue a right shift and a truncate followed by
a comparison against a shifted immediate.
shrq $32, %rdi
testl %edi, %edi
sete %al
movzbl %al, %eax
It also handles a ugt comparison against a large immediate with trailing bits
set. i.e. X > 0x0ffffffff -> (X >> 32) >= 1
rdar://11866926
llvm-svn: 160312
In the added testcase the constant 55 was behind an AssertZext of type i1, and ComputeDemandedBits
reported that some of the bits were both known to be one and known to be zero.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 160305
Mips shift instructions DSLL, DSRL and DSRA are transformed into
DSLL32, DSRL32 and DSRA32 respectively if the shift amount is between
32 and 63
Here is a description of DSLL:
Purpose: Doubleword Shift Left Logical Plus 32
To execute a left-shift of a doubleword by a fixed amount--32 to 63 bits
Description: GPR[rd] <- GPR[rt] << (sa+32)
The 64-bit doubleword contents of GPR rt are shifted left, inserting
zeros into the emptied bits; the result is placed in
GPR rd. The bit-shift amount in the range 0 to 31 is specified by sa.
This patch implements the direct object output of these instructions.
llvm-svn: 160277
1. FileCheck-ize epilogue.ll and allow another asm instruction to restore %rsp.
2. Remove check in widen_arith-3.ll that was hitting instruction in epilogue instead of
vector add.
llvm-svn: 160274
undef virtual register. The problem is that ProcessImplicitDefs removes the
definition of the register and marks all uses as undef. If we lose the undef
marker then we get a register which has no def, is not marked as undef. The
live interval analysis does not collect information for these virtual
registers and we crash in later passes.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 160260
It turns out that ASan relied on the at-the-end block insertion order to
(purely by happenstance) disable some LLVM optimizations, which in turn
start firing when the ordering is made more "normal". These
optimizations in turn merge many of the instrumentation reporting calls
which breaks the return address based error reporting in ASan.
We're looking at several different options for fixing this.
llvm-svn: 160256
This is particularly useful to the backend code generators which try to
process things in the incoming function order.
Also, cleanup some uses of IRBuilder to be a bit simpler and more clear.
llvm-svn: 160254
functionality test.
In general, unless the functionality is substantially separated, we
should lump more basic testing into this file. The test running
infrastructure likes having a few test files with more comprehensive
testing within them.
llvm-svn: 160253
It is intended to fix PR11468.
Old prologue and epilogue looked like this:
push %rbp
mov %rsp, %rbp
and $alignment, %rsp
push %r14
push %r15
...
pop %r15
pop %r14
mov %rbp, %rsp
pop %rbp
The problem was to reference the locations of callee-saved registers in exception handling:
locations of callee-saved had to be re-calculated regarding the stack alignment operation. It would
take some effort to implement this in LLVM, as currently MachineLocation can only have the form
"Register + Offset". Funciton prologue and epilogue are now changed to:
push %rbp
mov %rsp, %rbp
push %14
push %15
and $alignment, %rsp
...
lea -$size_of_saved_registers(%rbp), %rsp
pop %r15
pop %r14
pop %rbp
Reviewed by Chad Rosier.
llvm-svn: 160248
Allow the folding of vbroadcastRR to vbroadcastRM, where the memory operand is a spill slot.
PR12782.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
llvm-svn: 160230
All SCEV expressions used by LSR formulae must be safe to
expand. i.e. they may not contain UDiv unless we can prove nonzero
denominator.
Fixes PR11356: LSR hoists UDiv.
llvm-svn: 160205
intrinsics with target-indepdent intrinsics. The first instruction(s) to be
handled are the vector versions of count leading zeros (ctlz).
The changes here are to clang so that it generates a target independent
vector ctlz when it sees an ARM dependent vector ctlz. The changes in llvm
are to match the target independent vector ctlz and in VMCore/AutoUpgrade.cpp
to update any existing bc files containing ARM dependent vector ctlzs with
target-independent ctlzs. There are also changes to an existing test case in
llvm for ARM vector count instructions and a new test for the bitcode upgrade.
<rdar://problem/11831778>
There is deliberately no test for the change to clang, as so far as I know, no
consensus has been reached regarding how to test neon instructions in clang;
q.v. <rdar://problem/8762292>
llvm-svn: 160200
is used in cases where global symbols are
directly represented in the GOT and we use an
offset into the global offset table.
This patch adds direct object support for R_MIPS_GOT_DISP.
llvm-svn: 160183
the input vector, it can be bigger (this is helpful for powerpc where <2 x i16>
is a legal vector type but i16 isn't a legal type, IIRC). However this wasn't
being taken into account by ExpandRes_EXTRACT_VECTOR_ELT, causing PR13220.
Lightly tweaked version of a patch by Michael Liao.
llvm-svn: 160116
%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, -1
%and = and i64 %sub, %shr
ret i64 %and
to:
%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, 2305843009213693951
%and = and i64 %sub, %shr
ret i64 %and
The demanded bit optimization is actually a pessimization because add -1 would
be codegen'ed as a sub 1. Teach the demanded constant shrinking optimization
to check for negated constant to make sure it is actually reducing the width
of the constant.
rdar://11793464
llvm-svn: 160101
It is safe if CPSR is killed or re-defined.
When we are done with the basic block, check whether CPSR is live-out.
Do not optimize away cmp if CPSR is live-out.
llvm-svn: 160090
Low order register of a double word register operand. Operands
are defined by the name of the variable they are marked with in
the inline assembler code. This is a way to specify that the
operand just refers to the low order register for that variable.
It is the opposite of modifier 'D' which specifies the high order
register.
Example:
main()
{
long long ll_input = 0x1111222233334444LL;
long long ll_val = 3;
int i_result = 0;
__asm__ __volatile__(
"or %0, %L1, %2"
: "=r" (i_result)
: "r" (ll_input), "r" (ll_val));
}
Which results in:
lui $2, %hi(_gp_disp)
addiu $2, $2, %lo(_gp_disp)
addiu $sp, $sp, -8
addu $2, $2, $25
sw $2, 0($sp)
lui $2, 13107
ori $3, $2, 17476 <-- Low 32 bits of ll_input
lui $2, 4369
ori $4, $2, 8738 <-- High 32 bits of ll_input
addiu $5, $zero, 3 <-- Low 32 bits of ll_val
addiu $2, $zero, 0 <-- High 32 bits of ll_val
#APP
or $3, $4, $5 <-- or i_result, high 32 ll_input, low 32 of ll_val
#NO_APP
addiu $sp, $sp, 8
jr $ra
If not direction is done for the long long for 32 bit variables results
in using the low 32 bits as ll_val shows.
There is an existing bug if 'L' or 'D' is used for the destination register
for 32 bit long longs in that the target value will be updated incorrectly
for the non-specified part unless explicitly set within the inline asm code.
llvm-svn: 160028
X86. Basically, this is a reapplication of r158087 with a few fixes.
Specifically, (1) the stack pointer is restored from the base pointer before
popping callee-saved registers and (2) in obscure cases (see comments in patch)
we must cache the value of the original stack adjustment in the prologue and
apply it in the epilogue.
rdar://11496434
llvm-svn: 160002
multiple scalars and insert them into a vector. Next, we shuffle the elements
into the correct places, as before.
Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the
migration of bitcasts happened too late in the SelectionDAG process.
llvm-svn: 159991
getCondFromSETOpc, getCondFromCMovOpc, getSETFromCond, getCMovFromCond
No functional change intended.
If we want to update the condition code of CMOV|SET|Jcc, we first analyze the
opcode to get the condition code, then update the condition code, finally
synthesize the new opcode form the new condition code.
llvm-svn: 159955
Access mips register classes via MCRegisterInfo's functions instead of via the
TargetRegisterClasses defined in MipsGenRegisterInfo.inc.
llvm-svn: 159953
This patch removes ~70 lines in InstCombineLoadStoreAlloca.cpp and makes both functions a bit more aggressive than before :)
In theory, we can be more aggressive when removing an alloca than a malloc, because an alloca pointer should never escape, but we are not taking advantage of this anyway
llvm-svn: 159952
It is safe if EFLAGS is killed or re-defined.
When we are done with the basic block, check whether EFLAGS is live-out.
Do not optimize away cmp if EFLAGS is live-out.
llvm-svn: 159888
This means we can do cheap DSE for heap memory.
Nothing is done if the pointer excapes or has a load.
The churn in the tests is mostly due to objectsize, since we want to make sure we
don't delete the malloc call before evaluating the objectsize (otherwise it becomes -1/0)
llvm-svn: 159876
For each Cmp, we check whether there is an earlier Sub which make Cmp
redundant. We handle the case where SUB operates on the same source operands as
Cmp, including the case where the two source operands are swapped.
llvm-svn: 159838
DwarfDebug class could generate the same (inlined) DIVariable twice:
1) when trying to find abstract debug variable for a concrete inlined instance.
2) when explicitly collecting info for variables that were optimized out.
This change makes sure that this duplication won't happen and makes
Clang pass "gdb.opt/inline-locals" test from gdb testsuite.
Reviewed by Eric Christopher.
llvm-svn: 159811
Print the second half of a double word operand.
The include list was cleaned up a bit as well.
Also the test case was modified to test for both
big and little patterns.
llvm-svn: 159787
The CopyToReg nodes that set up the argument registers before a call
must be glued to the call instruction. Otherwise, the scheduler may emit
the physreg copies long before the call, causing long live ranges for
the fixed registers.
Besides disabling good register allocation, that can also expose
problems when EmitInstrWithCustomInserter() splits a basic block during
the live range of a physreg.
llvm-svn: 159721
Implement the TII hooks needed by EarlyIfConversion to create cmov
instructions and estimate their latency.
Early if-conversion is still not enabled by default.
llvm-svn: 159695
- execute_external should be;
- Not on Win32.
- Using bash.
In reverse, "execute_internal" shoud be (Win32 && !bash).
- lit.getBashPath() behaves differently before and after tweaking $PATH.
I will add a few explanations there later.
llvm-svn: 159641
inlineasm-cnstrnt-bad-r-1.ll is NOT supposed to fail, so it was removed. This resulted in the removal of a negative test (inlineasm-cnstrnt-bad-r-1.ll)
llvm-svn: 159625
inlineasm-cnstrnt-bad-r-1.ll is NOT supposed to fail, so it was removed. This resulted in the removal of a negative test (inlineasm-cnstrnt-bad-r-1.ll)
llvm-svn: 159610
in the abstraction for lit test suites so that the various other layers
of abstraction pick up the same behavioral fix, and so that we still get
a complete list of dependencies for the 'check-all' target.
This should fix the follow-on issues of the same nature with various
other build targets, including Clang targets. Sorry for the churn, and
again thanks to Matt for testing and breaking this more thoroughly.
llvm-svn: 159593
No functionality changed here, except that the CMake installed by
default on Ubuntu Lucid should actually work with the makefile
generators now.
Thanks to Matt for the report and head-desking required to figure out
why it was failing.
llvm-svn: 159588
This is still a work in progress but I believe it is currently good enough
to fix PR13122 "Need unit test driver for codegen IR passes". For example,
you can run llc with -stop-after=loop-reduce to have it dump out the IR after
running LSR. Serializing machine-level IR is not yet supported but we have
some patches in progress for that.
The plan is to serialize the IR to a YAML file, containing separate sections
for the LLVM IR, machine-level IR, and whatever other info is needed. Chad
suggested that we stash the stop-after pass in the YAML file and use that
instead of the start-after option to figure out where to restart the
compilation. I think that's a great idea, but since it's not implemented yet
I put the -start-after option into this patch for testing purposes.
llvm-svn: 159570
another mechanical change accomplished though the power of terrible Perl
scripts.
I have manually switched some "s to 's to make escaping simpler.
While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.
Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/
llvm-svn: 159547
versions of Bash. In addition, I can back out the change to the lit
built-in shell test runner to support this.
This should fix the majority of fallout on Darwin, but I suspect there
will be a few straggling issues.
llvm-svn: 159544
and multi-line nature of this test. I don't really feel like bugging
this kind of edge-case, so just put it on one line and use single
quotes. With this, every test *really* passes with the built-in shell
test runner.
llvm-svn: 159530
through my perl nets.
With this, the test suite passes even if I force it to run with the
built-in shell test logic, except for a test which REQUIREs shell.
llvm-svn: 159529
This was done through the aid of a terrible Perl creation. I will not
paste any of the horrors here. Suffice to say, it require multiple
staged rounds of replacements, state carried between, and a few
nested-construct-parsing hacks that I'm not proud of. It happens, by
luck, to be able to deal with all the TCL-quoting patterns in evidence
in the LLVM test suite.
If anyone is maintaining large out-of-tree test trees, feel free to poke
me and I'll send you the steps I used to convert things, as well as
answer any painful questions etc. IRC works best for this type of thing
I find.
Once converted, switch the LLVM lit config to use ShTests the same as
Clang. In addition to being able to delete large amounts of Python code
from 'lit', this will also simplify the entire test suite and some of
lit's architecture.
Finally, the test suite runs 33% faster on Linux now. ;]
For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s
llvm-svn: 159525
a pipeline, and then a positive assertion via grep, use two RUN lines
instead.
Supporting these complex ideas of 'success' and 'failure' across
multiple stages of a pipeline is brittle in the shell world, and would
block switching to ShTest format; it only worked due to contrivances
introduced by the TclTest format.
Writing this as two separate RUN lines seems clearer in any event.
This is another step toward completely removing TclTests from lit.
llvm-svn: 159524
use FileCheck.
Aside from removing a dependence on TCL-style quoting, this also makes
the tests ... significantly more robust. =] It would be really, *really*
great of the maintainer(s) of the CellSPU backend went through and
systematically rewrite these tests to use FileCheck. There are a lot
more that have nearly this bad of abuses.
Another step along the path to a TclTest-free testsuite.
llvm-svn: 159523
just provide and reference separate input files from an Inputs
subdirectory. This pattern works very well in the Clang tree and is
easier to understand in my opinion. It also has fewer limitations and
will remove one particularly annoying use of TCL-style {} quoting from
the testsuite.
Also teach the LLVM lit configuration to avoid recursing into 'Inputs'
subdirectories. This wasn't required for the previous 'Inputs'
subdirectories used due to fortuitous suffix patterns.
This is the first step to completely removing support for TCL-style tests.
llvm-svn: 159520
1) DIContext is now able to return function name for a given instruction address (besides file/line info).
2) llvm-dwarfdump accepts flag --functions that prints the function name (if address is specified by --address flag).
3) test case that checks the basic functionality of llvm-dwarfdump added
llvm-svn: 159512
implicit_def, the other instruction can be anything, including instructions
that define multiple values. Be careful about that and don't assume what operand
0 is.
Fixes pr13249.
llvm-svn: 159509
re-used. Also, build in direct support for accumulating a set of lit
parameters, arguments, and testsuites to run as part of a 'check-all'
rule. This sinks 'check-all' from a Clang-specific construct to
a generic construct of the project.
llvm-svn: 159482
Before this patch in pic 32 bit code we would add the global base register
and not load from that address. This is a really old bug, but before the
introduction of the tls attributes we would never select initial exec for
pic code.
llvm-svn: 159409
Corrected type for index of llvm.x86.avx2.gather.d.pd.256
from 256-bit to 128-bit.
Corrected types for src|dst|mask of llvm.x86.avx2.gather.q.ps.256
from 256-bit to 128-bit.
Support the following intrinsics:
llvm.x86.avx2.gather.d.q, llvm.x86.avx2.gather.q.q
llvm.x86.avx2.gather.d.q.256, llvm.x86.avx2.gather.q.q.256
llvm.x86.avx2.gather.d.d, llvm.x86.avx2.gather.q.d
llvm.x86.avx2.gather.d.d.256, llvm.x86.avx2.gather.q.d.256
llvm-svn: 159402
The original algorithm only used recursive pair fusion of equal-length
types. This is now extended to allow pairing of any types that share
the same underlying scalar type. Because we would still generally
prefer the 2^n-length types, those are formed first. Then a second
set of iterations form the non-2^n-length types.
Also, a call to SimplifyInstructionsInBlock has been added after each
pairing iteration. This takes care of DCE (and a few other things)
that make the following iterations execute somewhat faster. For the
same reason, some of the simple shuffle-combination cases are now
handled internally.
There is some additional refactoring work to be done, but I've had
many requests for this feature, so additional refactoring will come
soon in future commits (as will additional test cases).
llvm-svn: 159330
This is another vestige of the DejaGNU roots. There were FIXMEs in the
lit setup to add a 'lit.site.cfg', which has been around for quite some
time now, so I've properly switched the handling of the 4 things
actually used in site.exp to go through lit.site.cfg now. No more
parsing of the .exp file, one fewer configure-style generated file,
etc., etc.
llvm-svn: 159313
removing '-lit' qualifiers from make rules. I've left a legacy
'check-local-lit' rule in case build scripts have this encoded
somewhere.
llvm-svn: 159311
It takes advantage of r159299 which introduces relocation support for N64.
elf-dump needed to be upgraded to support N64 relocations as well.
This passes make check.
Jack
llvm-svn: 159302
It takes advantage of r159299 which introduces relocation support for N64.
elf-dump needed to be upgraded to support N64 relocations as well.
This passes make check.
Jack
llvm-svn: 159301
Original commit message:
If a constant or a function has linkonce_odr linkage and unnamed_addr, mark it
hidden. Being linkonce_odr guarantees that it is available in every dso that
needs it. Being a constant/function with unnamed_addr guarantees that the
copies don't have to be merged.
llvm-svn: 159272
before the expression root. Any existing operators that are changed to use one
of them needs to be moved between it and the expression root, and recursively
for the operators using that one. When I rewrote RewriteExprTree I accidentally
inverted the logic, resulting in the compacting going down from operators to
operands rather than up from operands to the operators using them, oops. Fix
this, resolving PR12963.
llvm-svn: 159265
'check-llvm'.
Don't worry! 'check' still works! =] To rationalize the names of targets
used to run tests, the vague plan is the following:
make check-llvm # run LLVM reg/unit tests (currently 'check')
make check-clang # run Clang reg/unit tests (currently 'clang-test')
make check-rt # run CompilerRT reg/unit tests
make check-asan # run ASan reg/unit tests (subset of -rt)
make check-tsan # run TSan reg/unit tests (subset of -rt)
make check-all # run as much of the above as is available
The last one respects what projects are checked out and built for
a given tree. Personally, I would like to eventually make 'check' be an
alias for 'check-all'. For now however, it is an alias for 'check-llvm',
and thus no behavior has changed.
While this patch and my plan only really apply to CMake, I think it
might be good to similarly rationalize the naming scheme for the Make
builds.
llvm-svn: 159258
// C - zext(bool) -> bool ? C - 1 : C
if (ZExtInst *ZI = dyn_cast<ZExtInst>(Op1))
if (ZI->getSrcTy()->isIntegerTy(1))
return SelectInst::Create(ZI->getOperand(0), SubOne(C), C);
This ends up forming sext i1 instructions that codegen to terrible code. e.g.
int blah(_Bool x, _Bool y) {
return (x - y) + 1;
}
=>
movzbl %dil, %eax
movzbl %sil, %ecx
shll $31, %ecx
sarl $31, %ecx
leal 1(%rax,%rcx), %eax
ret
Without the rule, llvm now generates:
movzbl %sil, %ecx
movzbl %dil, %eax
incl %eax
subl %ecx, %eax
ret
It also helps with ARM (and pretty much any target that doesn't have a sext i1 :-).
The transformation was done as part of Eli's r75531. He has given the ok to
remove it.
rdar://11748024
llvm-svn: 159230