to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
switch(x) {
case 1: return f1();
case 2: return f2();
case 3: return f3();
case 4: return f4();
case 5: return f5();
case 6: return f6();
}
}
=>
LBB0_2: ## %sw.bb
callq _f1
popq %rbp
ret
LBB0_3: ## %sw.bb1
callq _f2
popq %rbp
ret
LBB0_4: ## %sw.bb3
callq _f3
popq %rbp
ret
This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:
sw.bb7: ; preds = %entry
%call8 = tail call i32 @f5() nounwind
br label %return
sw.bb9: ; preds = %entry
%call10 = tail call i32 @f6() nounwind
br label %return
return:
%retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
ret i32 %retval.0
This allows codegen to generate better code like this:
LBB0_2: ## %sw.bb
jmp _f1 ## TAILCALL
LBB0_3: ## %sw.bb1
jmp _f2 ## TAILCALL
LBB0_4: ## %sw.bb3
jmp _f3 ## TAILCALL
rdar://9147433
llvm-svn: 127953
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.
llvm-svn: 127951
The relevant instruction table entries were changed sometime ago to no longer take
<Rt2> as an operand. Modify ARMDisassemblerCore.cpp to accomodate the change and
add a test case.
llvm-svn: 127935
Proof-of-concept code that code-gens a module to an in-memory MachO object.
This will be hooked up to a run-time dynamic linker library (see: llvm-rtdyld
for similarly conceptual work for that part) which will take the compiled
object and link it together with the rest of the system, providing back to the
JIT a table of available symbols which will be used to respond to the
getPointerTo*() queries.
llvm-svn: 127916
The llvm.dbg.value intrinsic refers to SSA values, not virtual registers, so we
should be able to extend the range of a value by tracking that value through
register copies. This greatly improves the debug value tracking for function
arguments that for some reason are copied to a second virtual register at the
end of the entry block.
We only extend the debug value range where its register is killed. All original
llvm.dbg.value locations are still respected.
Copies from physical registers are ignored. That should not be a problem since
the entry block already adds DBG_VALUE instructions for the virtual registers
holding the function arguments.
llvm-svn: 127912
- Emit mad instead of mad.rn for shader model 1.0
- Emit explicit mov.u32 instructions for reading global variables
- (most PTX instructions cannot take global variable immediates)
llvm-svn: 127895
For example, on 32-bit architecture, don't promote all uses of the IV
to 64-bits just because one use is a 64-bit cast.
Alternate implementation of the patch by Arnaud de Grandmaison.
llvm-svn: 127884
On MSVCRT and compatible, output of %e is incompatible to Posix by default. Number of exponent digits should be at least 2. "%+03d"
FIXME: Implement our formatter in future!
llvm-svn: 127872
makes valgrind stop complaining about uninitialized variables being read when it
accesses a bitfield (category) that shares its bits with these variables.
llvm-svn: 127871
Stack slot real estate is virtually free compared to registers, so it is
advantageous to spill earlier even though the same value is now kept in both a
register and a stack slot.
Also eliminate redundant spills by extending the stack slot live range
underneath reloaded registers.
This can trigger a dead code elimination, removing copies and even reloads that
were only feeding spills.
llvm-svn: 127868
comparisons on x86. Essentially, the way this works is that SUB+SBB sets
the relevant flags the same way a double-width CMP would.
This is a substantial improvement over the generic lowering in LLVM. The output
is also shorter than the gcc-generated output; I haven't done any detailed
benchmarking, though.
llvm-svn: 127852
SCEV may generate expressions composed of multiple pointers, which can
lead to invalid GEP expansion. Until we can teach SCEV to follow strict
pointer rules, make sure no bad GEPs creep into IR.
Fixes rdar://problem/9038671.
llvm-svn: 127839
o A8.6.195 STR (register) -- Encoding T1
o A8.6.193 STR (immediate, Thumb) -- Encoding T1
It has been changed so that now they use different addressing modes
and thus different MC representation (Operand Infos). Modify the
disassembler to reflect the change, and add relevant tests.
llvm-svn: 127833
I have convinced myself that it can only happen when a phi value dies. When it
happens, allocate new virtual registers for the components.
llvm-svn: 127827
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.
This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.
llvm-svn: 127766
report_fatal_error() invokes exit(). We know report_fatal_error() might not write messages to stderr when any errors were detected on FD == 2.
llvm-svn: 127726
chose is having a non-memcpy/memset use and being larger than any native integer
type. Originally I chose having an access of a size smaller than the total size
of the alloca, but this caused some minor issues on the spirit benchmark where
SRoA runs again after some inlining.
This fixes <rdar://problem/8613163>.
llvm-svn: 127718
1. The ARM Darwin *r9 call instructions were pseudo-ized recently.
Modify the ARMDisassemblerCore.cpp file to accomodate the change.
2. The disassembler was unnecessarily adding 8 to the sign-extended imm24:
imm32 = SignExtend(imm24:'00', 32); // A8.6.23 BL, BLX (immediate)
// Encoding A1
It has no business doing such. Removed the offending logic.
Add test cases to arm-tests.txt.
llvm-svn: 127707
After live range splitting, an original value may be available in multiple
registers. Tracing back through the registers containing the same value, find
the best place to insert a spill, determine if the value has already been
spilled, or discover a reaching def that may be rematerialized.
This is only the analysis part. The information is not used for anything yet.
llvm-svn: 127698
memory builtins as equivalent to malloc/free.
This is different from any attribute we have. For example, you can delete the
allocators when their result is unused, but you can't collapse two calls to the
same function, even if no global/memory state has changed in between. The
noalias return states that the result does not alias any other pointer, but
instcombine optimizes malloc() as though the result is non-null for the purpose
of eliminating unused pointers.
llvm-svn: 127673
v2 = bitcast v1
...
v3 = bitcast v2
...
= v3
=>
v2 = bitcast v1
...
= v1
if v1 and v3 are of in the same register class.
bitcast between i32 and fp (and others) are often not nops since they
are in different register classes. These bitcast instructions are often
left because they are in different basic blocks and cannot be
eliminated by dag combine.
rdar://9104514
llvm-svn: 127668
instruction set. This code adds support for the VEX prefix
and for the YMM registers accessible on AVX-enabled
architectures. Instruction table support that enables AVX
instructions for the disassembler is in an upcoming patch.
llvm-svn: 127644
and then go kablooie. The problem was that it was tracking the PHI nodes anew
each time into this function. But it didn't need to. And because the recursion
didn't know that a PHINode was visited before, it would go ahead and call
itself.
There is a testcase, but unfortunately it's too big to add. This problem will go
away with the EH rewrite.
<rdar://problem/8856298>
llvm-svn: 127640