Since mem2reg isn't run at -O0, we get a ton of reloads from the stack,
for example, before, this code:
int foo(int x, int y, int z) {
return x+y+z;
}
used to compile into:
_foo: ## @foo
subq $12, %rsp
movl %edi, 8(%rsp)
movl %esi, 4(%rsp)
movl %edx, (%rsp)
movl 8(%rsp), %edx
movl 4(%rsp), %esi
addl %edx, %esi
movl (%rsp), %edx
addl %esi, %edx
movl %edx, %eax
addq $12, %rsp
ret
Now we produce:
_foo: ## @foo
subq $12, %rsp
movl %edi, 8(%rsp)
movl %esi, 4(%rsp)
movl %edx, (%rsp)
movl 8(%rsp), %edx
addl 4(%rsp), %edx ## Folded load
addl (%rsp), %edx ## Folded load
movl %edx, %eax
addq $12, %rsp
ret
Fewer instructions and less register use = faster compiles.
llvm-svn: 113102
location is being re-stored to the memory location. We would get
a dangling pointer from the SSAUpdate data structure and miss a
use. This fixes PR8068
llvm-svn: 113042
checking each standalone condition and decide whether emit target
specific nodes or remove the condition if it's already matched before.
llvm-svn: 113031
invertible. ScalarEvolution's folding routines don't always succeed
in canonicalizing equal expressions to a single canonical form, and
this can cause these asserts to fail, even though there's no actual
correctness problem. This fixes PR8066.
llvm-svn: 113021
"Use target specific nodes instead of relying in unpckl and
unpckh pattern fragments during isel time. Also place a
depth limit in getShuffleScalarElt.
llvm-svn: 113020
overload UserInInstr. Explicitly check Allocatable. The early exit in the
condition will mean the performance impact of the extra test should be
minimal.
llvm-svn: 113016
solve the root problem, but it corrects the bug in the code I added to
support legalizing in the case where the non-extended type is also legal.
llvm-svn: 112997
"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."
r112986 fixed a latent bug exposed by the above.
llvm-svn: 112989
slot.
Teach it to also check for early clobbered aliases, and early clobber operands
following the current operand.
This fixes the miscompilation in PR8044 where EC registers eax and ecx were
being used for inputs.
llvm-svn: 112988
alignment should be performed. Otherwise dynamic realignment may trigger
when the register allocator has already used the frame pointer as a general
purpose register. That is, we need to make sure that the list of reserved
registers doesn't change after register allocation.
llvm-svn: 112986
instructions prior to regalloc. Since it's getting a little close to
the 2.8 branch deadline, I'll have to leave the rest of the instructions
handled by the NEONPreAllocPass for now, but I didn't want to leave half
of the VLD instructions converted and the other half not.
llvm-svn: 112983
Original commit message:
Use the SSAUpdator to turn calls to eh.exception that are not in a
landing pad into uses of registers rather than loads from a stack
slot. Doesn't touch the 'orrible hack code - Bill needs to persuade
me harder :)
llvm-svn: 112952
The cmake (+ MSVC) build is broken if you don't select your native
target.
e.g. 'cmake -D LLVM_TARGETS_TO_BUILD="MyNonNativeTarget" .'
This is because cmake currently sets the LLVM_NATIVE_* definitions
regardless of whether the native target is selected (causing build
errors).
Patch by Mike Gist!
llvm-svn: 112946
vabd intrinsic and add and/or zext operations. In the case of vaba, this
also avoids the need for a DAG combine pattern to combine vabd with add.
Update tests. Auto-upgrade the old intrinsics.
llvm-svn: 112941
- Teach getShuffleScalarElt how to handle more target
specific nodes, so the DAGCombine can make use of it.
- Add another hack to avoid the node update problem
during legalization. More description on the comments
llvm-svn: 112934
Remove #uses comments from functions: they we're padded out to column 50
and were potentially confusing for externally visible functions.
going further, remove the "<i8**> [#uses=3]" comments entirely. They
add a lot of noise, confuse people about what the IR is, and don't add
any particular value. When the types are long it makes it really really
hard to read IR.
If someone is interested in this sort of thing, the right way to do this
is to implement an AsmAnnotationWriter that produces the same output, and
add a flag to llvm-dis (only) to produce this output.
llvm-svn: 112899
and were potentially confusing for externally visible functions.
going further, remove the "<i8**> [#uses=3]" comments entirely. They
add a lot of noise, confuse people about what the IR is, and don't add
any particular value. When the types are long it makes it really really
hard to read IR.
If someone is interested in this sort of thing, the right way to do this
is to implement an AsmAnnotationWriter that produces the same output, and
add a flag to llvm-dis (only) to produce this output.
llvm-svn: 112894
I wasn't able to convince myself that all GetMainExecutable
implementations always return absolute paths; this prevents
unexpected behavior in case they ever don't.
llvm-svn: 112888
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs.
rdar://7352504
rdar://8374540
rdar://8355680
llvm-svn: 112883
- Add patterns to match the following MMX builtins:
* __builtin_ia32_vec_init_v8qi
* __builtin_ia32_vec_init_v4hi
* __builtin_ia32_vec_init_v2si
* __builtin_ia32_vec_ext_v2si
These builtins do not correspond to a single MMX instruction. They will have
to be lowered -- most likely in the back-end.
llvm-svn: 112881
there are clearly no stores between the load and the store. This fixes
this miscompile reported as PR7833.
This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.
llvm-svn: 112861
I'm sure it is harmless. Original commit message:
If PrototypeValue is erased in the middle of using the SSAUpdator
then the SSAUpdator may access freed memory. Instead, simply pass
in the type and name explicitly, which is all that was used anyway.
llvm-svn: 112810
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests. Add auto-upgrade support for the old intrinsics.
llvm-svn: 112773
moment, as there's a testcase that uses it and expects it
to be subject to optimizations; we won't be doing that.
Some adjustments based on feedback from Bill.
llvm-svn: 112754
check more strict, breaking some cases not checked in the
testsuite, but also exposes some foldings not done before,
as this example:
movaps (%rdi), %xmm0
movaps (%rax), %xmm1
movaps %xmm0, %xmm2
movss %xmm1, %xmm2
shufps $36, %xmm2, %xmm0
now is generated as:
movaps (%rdi), %xmm0
movaps %xmm0, %xmm1
movlps (%rax), %xmm1
shufps $36, %xmm1, %xmm0
llvm-svn: 112753
of a base class.
This makes it possible to unregister the file from FilesToRemove when
the file is done. Also, this eliminates the need for
formatted_tool_output_file.
llvm-svn: 112706
landing pad into uses of registers rather than loads from a stack
slot. Doesn't touch the 'orrible hack code - Bill needs to persuade
me harder :)
llvm-svn: 112702
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing mmx stuff without inserting proper emms calls.
In the short term, force off MMX datatypes. In the long term,
the X86 backend should not select generic vector types to MMX
registers. This is being worked on, but won't be done in time
for 2.8. rdar://8380055
llvm-svn: 112696
available in normal llvm operators. We aren't going to
use those for MMX any more because it's unsafe for the
optimizers to synthesize new MMX instructions.
llvm-svn: 112685
and output the dwarf line number tables. This takes the current loc info after
an instruction is assembled and saves the needed info into an object that has
vector and for each section. These objects will be used for the final patch to
build and emit the encoded dwarf line number tables. Again for now this is only
in the Mach-O streamer but at some point will move to a more generic place.
llvm-svn: 112668
int x(int t) {
if (t & 256)
return -26;
return 0;
}
We generate this:
tst.w r0, #256
mvn r0, #25
it eq
moveq r0, #0
while gcc generates this:
ands r0, r0, #256
it ne
mvnne r0, #25
bx lr
Scandalous really!
During ISel time, we can look for this particular pattern. One where we have a
"MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND
instruction to 0. Something like this (greatly simplified):
%r0 = ISD::AND ...
ARMISD::CMPZ %r0, 0 @ sets [CPSR]
%r0 = ARMISD::MOVCC 0, -26 @ reads [CPSR]
All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when it's zero. The zero value will all ready be in the %r0 register and we only
need to change it if the AND wasn't zero. Easy!
llvm-svn: 112664
Reserved registers are unpredictable, and are treated as always live by machine
DCE.
Allocatable registers are never reserved, and can be used for virtual registers.
Unreserved, unallocatable registers can not be used for virtual registers, but
otherwise behave like a normal allocatable register. Most targets only have
the flag register in this set.
llvm-svn: 112649
I have not been able to find a way to test each in isolation, for a few reasons:
1) The ability to look-through non-i1 BinaryOperator's requires the ability to look through non-constant
ICmps in order for it to ever trigger.
2) The ability to do LVI-powered PHI value determination only matters in cases that ProcessBranchOnPHI
can't handle. Since it already handles all the cases without other instructions in the def-use chain
between the PHI and the branch, it requires the ability to look through ICmps and/or BinaryOperators
as well.
llvm-svn: 112611
1. Allocate them in the entry block of the function to enable function-wide
re-use. The instructions to create them should be re-materializable, so
there shouldn't be additional cost compared to creating them local
to the basic blocks where they are used.
2. Collect all of the frame index references for the function and sort them
by the local offset referenced. Iterate over the sorted list to
allocate the virtual base registers. This enables creation of base
registers optimized for positive-offset access of frame references.
(Note: This may be appropriate to later be a target hook to do the
sorting in a target appropriate manner. For now it's done here for
simplicity.)
llvm-svn: 112609
any more. I plan to reimplement alloca promotion using SSAUpdater later.
It looks like Bill's URoR logic really always needs domtree, so the pass
now always asks for domtree info.
llvm-svn: 112597
two are weak, we make them thunks to a new strong function) so don't iterate
through the function list as we're modifying it.
Also add back the outermost loop which got removed during the cleanups.
llvm-svn: 112595
This actually exposed an infinite recursion bug in ComputeValueKnownInPredecessors which theoretically already existed (in JumpThreading's
handling of and/or of i1's), but never manifested before. This patch adds a tracking set to prevent this case.
llvm-svn: 112589
getMagicNumber was treating the _binary_ data it read in as a
null terminated string. This resulted in the std::string
calculating the length, and causing an assert in other code that
assumed that the length it passed was the same as the length of
the string it would get back.
llvm-svn: 112586