x86-32: 32-bit calls were named "call" not "calll". 64-bit calls were correctly
named "callq", so this only impacted x86-32.
This fixes rdar://8456370 - llvm-mc rejects 'calll'
This also exposes that mingw/64 is generating a 32-bit call instead of a 64-bit call,
I will file a bugzilla.
llvm-svn: 114534
(sbbl x, x) sets the registers to 0 or ~0. Combined with two's complement arithmetic, we can fold
the intermediate AND and the ADD into a single SUB.
This fixes <rdar://problem/8449754>.
llvm-svn: 114460
CombinerAA cannot assume that different FrameIndex's never alias, but can instead use
MachineFrameInfo to get the actual offsets of these slots and check for actual aliasing.
This fixes CodeGen/X86/2010-02-19-TailCallRetAddrBug.ll and CodeGen/X86/tailcallstack64.ll
when CombinerAA is enabled, modulo a different register allocation sequence.
llvm-svn: 114348
between the high and low registers for prologue/epilogue code. This was
a Darwin-only thing that wasn't providing a realistic benefit anymore.
Combining the save areas simplifies the compiler code and results in better
ARM/Thumb2 codegen.
For example, previously we would generate code like:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
stmdb sp!, {r8, r10, r11}
With this change, we combine the register saves and generate:
push {r4, r5, r6, r7, r8, r10, r11, lr}
add r7, sp, #12
rdar://8445635
llvm-svn: 114340
NO path to the destination containing side effects, not that SOME path contains no side effects.
In practice, this only manifests with CombinerAA enabled, because otherwise the chain has little
to no branching, so "any" is effectively equivalent to "all".
llvm-svn: 114268
value should be in GPRs when it's going to be used as a scalar, and we use
VMOVRRD to make that happen, but if the value is converted back to a vector
we need to fold to a simple bit_convert. Radar 8407927.
llvm-svn: 114233
1) Do forward copy propagation. This makes it easier to estimate the cost of the
instruction being sunk.
2) Break critical edges on demand, including cases where the value is used by
PHI nodes.
Critical edge splitting is not yet enabled by default.
llvm-svn: 114227
legacy asm printer uses instructions of the form, "mov r0, r0, lsl #3", while
the MC-instruction printer uses the form "lsl r0, r0, #3". The latter mnemonic
is correct and preferred according the ARM documentation (A8.6.98). The former
are pseudo-instructions for the latter.
llvm-svn: 114221
expression to include the modifier.
- Gross, but this a corner case we don't expect to see often in practice, but
it is worth accepting.
- Also improves diagnostics on invalid modifiers.
llvm-svn: 114154
walking the asm arguments once and stashing their Values. This is
wrong because the same memory location can be in the list twice, and
if the first one has a sunkaddr substituted, the stashed value for the
second one will be wrong (use-after-free). PR 8154.
llvm-svn: 114104
a Constant into a ConstantRange. Handle this conservatively for now, rather than asserting. The testcase is
more complex that I would like, but the manifestation of the problem is sensitive to iteration orders and the state of the
LVI cache, and I have not been able to reproduce it with manually constructed or simplified cases.
Fixes PR8162.
llvm-svn: 114103
This cleans up after the mess r108567 left in the CellSPU backend.
ORCvt-instruction were used to reinterpret registers, and the ORs were then
removed by isMoveInstr(). This patch now removes 350 instrucions of format:
or $3, $3, $3
(from the 52 testcases in CodeGen/CellSPU). One case of a nonexistant or is
checked for.
Some moves of the form 'ori $., $., 0' and 'ai $., $., 0' still remain.
llvm-svn: 114074
The ELF implementation now creates text, data and bss to match the gnu as
behavior.
The text streamer still has the old MachO specific behavior since
the testsuite checks that it will error when a directive is given
before a setting the current section for example.
A nice benefit is that -n is not required anymore when producing
ELF files.
llvm-svn: 114027
encountered while building llvm-gcc for arm. This is probably the same issue
that the ppc buildbot hit. llvm::prior works on a MachineBasicBlock::iterator,
not a plain MachineInstr.
llvm-svn: 113983
backing out following to get it back to green,
so I can investigate in peace:
svn merge -c -113840 llvm/test/CodeGen/ARM/arm-and-tst-peephole.ll
svn merge -c -113876 -c -113839 llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
llvm-svn: 113980
"The register specified for a dregpair is the corresponding Q register, so to
get the pair, we need to look up the sub-regs based on the qreg. Create a
lookup function since we don't have access to TargetRegisterInfo here to
be able to use getSubReg(ARM::dsub_[01])."
Additionaly, fix the NEON VLD1* and VST1* instruction patterns not to use
the dregpair modifier for the 2xdreg versions. Explicitly specifying the two
registers as operands is more correct and more consistent with the other
instruction patterns. This enables further cleanup of special case code in the
disassembler as a nice side-effect.
llvm-svn: 113903
to expose greater opportunities for store narrowing in codegen. This patch fixes a potential
infinite loop in instcombine caused by one of the introduced transforms being overly aggressive.
llvm-svn: 113763
This can result in increased opportunities for store narrowing in code generation. Update a number of
tests for this change. This fixes <rdar://problem/8285027>.
Additionally, because this inverts the order of ors and ands, some patterns for optimizing or-of-and-of-or
no longer fire in instances where they did originally. Add a simple transform which recaptures most of these
opportunities: if we have an or-of-constant-or and have failed to fold away the inner or, commute the order
of the two ors, to give the non-constant or a chance for simplification instead.
llvm-svn: 113679
to use AddrMode4, there was a count of the registers stored in one of the
operands. I changed that to just count the operands but forgot to adjust for
the size of D registers. This was noticed by Evan as a performance problem
but it is a potential correctness bug as well, since it is possible that this
could merge a base update with a non-matching immediate.
llvm-svn: 113576
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.
llvm-svn: 113570
- Output format and some of the code stolen from macho-dump.
- Somewhat incomplete and probably buggy.
- Comes with a very basic test.
llvm-svn: 113488
unrolling threshold to the optimize-for-size threshold. Basically, for loops containing calls, unrolling
can still be profitable as long as the loop is REALLY small.
llvm-svn: 113439
turning (fptrunc (sqrt (fpext x))) -> (sqrtf x) is great, but we have
to delete the original sqrt as well. Not doing so causes us to do
two sqrt's when building with -fmath-errno (the default on linux).
llvm-svn: 113260
always be disambiguated as sldtw. sldtw and sldtq with
a mem operands have the same effect, but sldtw is more
compact. Force it to sldtw, resolving rdar://8017530
llvm-svn: 113186
in the duplicated block instead of duplicating them.
Duplicating them into the end of the loop and the preheader
means that we got a phi node in the header of the loop,
which prevented LICM from hoisting them. GVN would
usually come around later and merge the duplicated
instructions so we'd get reasonable output... except that
anything dependent on the shoulda-been-hoisted value can't
be hoisted. In PR5319 (which this fixes), a memory value
didn't get promoted.
llvm-svn: 113134
Since mem2reg isn't run at -O0, we get a ton of reloads from the stack,
for example, before, this code:
int foo(int x, int y, int z) {
return x+y+z;
}
used to compile into:
_foo: ## @foo
subq $12, %rsp
movl %edi, 8(%rsp)
movl %esi, 4(%rsp)
movl %edx, (%rsp)
movl 8(%rsp), %edx
movl 4(%rsp), %esi
addl %edx, %esi
movl (%rsp), %edx
addl %esi, %edx
movl %edx, %eax
addq $12, %rsp
ret
Now we produce:
_foo: ## @foo
subq $12, %rsp
movl %edi, 8(%rsp)
movl %esi, 4(%rsp)
movl %edx, (%rsp)
movl 8(%rsp), %edx
addl 4(%rsp), %edx ## Folded load
addl (%rsp), %edx ## Folded load
movl %edx, %eax
addq $12, %rsp
ret
Fewer instructions and less register use = faster compiles.
llvm-svn: 113102
location is being re-stored to the memory location. We would get
a dangling pointer from the SSAUpdate data structure and miss a
use. This fixes PR8068
llvm-svn: 113042
"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."
r112986 fixed a latent bug exposed by the above.
llvm-svn: 112989
vabd intrinsic and add and/or zext operations. In the case of vaba, this
also avoids the need for a DAG combine pattern to combine vabd with add.
Update tests. Auto-upgrade the old intrinsics.
llvm-svn: 112941
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs.
rdar://7352504
rdar://8374540
rdar://8355680
llvm-svn: 112883
there are clearly no stores between the load and the store. This fixes
this miscompile reported as PR7833.
This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.
llvm-svn: 112861
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests. Add auto-upgrade support for the old intrinsics.
llvm-svn: 112773
check more strict, breaking some cases not checked in the
testsuite, but also exposes some foldings not done before,
as this example:
movaps (%rdi), %xmm0
movaps (%rax), %xmm1
movaps %xmm0, %xmm2
movss %xmm1, %xmm2
shufps $36, %xmm2, %xmm0
now is generated as:
movaps (%rdi), %xmm0
movaps %xmm0, %xmm1
movlps (%rax), %xmm1
shufps $36, %xmm1, %xmm0
llvm-svn: 112753
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing mmx stuff without inserting proper emms calls.
In the short term, force off MMX datatypes. In the long term,
the X86 backend should not select generic vector types to MMX
registers. This is being worked on, but won't be done in time
for 2.8. rdar://8380055
llvm-svn: 112696
int x(int t) {
if (t & 256)
return -26;
return 0;
}
We generate this:
tst.w r0, #256
mvn r0, #25
it eq
moveq r0, #0
while gcc generates this:
ands r0, r0, #256
it ne
mvnne r0, #25
bx lr
Scandalous really!
During ISel time, we can look for this particular pattern. One where we have a
"MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND
instruction to 0. Something like this (greatly simplified):
%r0 = ISD::AND ...
ARMISD::CMPZ %r0, 0 @ sets [CPSR]
%r0 = ARMISD::MOVCC 0, -26 @ reads [CPSR]
All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when it's zero. The zero value will all ready be in the %r0 register and we only
need to change it if the AND wasn't zero. Easy!
llvm-svn: 112664
I have not been able to find a way to test each in isolation, for a few reasons:
1) The ability to look-through non-i1 BinaryOperator's requires the ability to look through non-constant
ICmps in order for it to ever trigger.
2) The ability to do LVI-powered PHI value determination only matters in cases that ProcessBranchOnPHI
can't handle. Since it already handles all the cases without other instructions in the def-use chain
between the PHI and the branch, it requires the ability to look through ICmps and/or BinaryOperators
as well.
llvm-svn: 112611
This actually exposed an infinite recursion bug in ComputeValueKnownInPredecessors which theoretically already existed (in JumpThreading's
handling of and/or of i1's), but never manifested before. This patch adds a tracking set to prevent this case.
llvm-svn: 112589
1) nuke ConstDataCoalSection, which is dead.
2) revise my previous patch for rdar://8018335,
which was completely wrong. Specifically, it doesn't
make sense to mark __TEXT,__const_coal as PURE_INSTRUCTIONS,
because it is for readonly data. templates (it turns out)
go to const_coal_nt. The real fix for rdar://8018335 was
to give ConstTextCoalSection a section kind of ReadOnly
instead of Text.
llvm-svn: 112496
when the top elements of a vector are undefined. This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined. For example, on:
_Complex float f32(_Complex float A, _Complex float B) {
return A+B;
}
We used to produce (with SSE2, SSE4.1+ uses insertps):
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $16, %xmm2, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm0
addss %xmm1, %xmm0
pshufd $16, %xmm0, %xmm1
movdqa %xmm2, %xmm0
unpcklps %xmm1, %xmm0
ret
We now produce:
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm3
addss %xmm1, %xmm3
movaps %xmm2, %xmm0
unpcklps %xmm3, %xmm0
ret
This implements rdar://8368414
llvm-svn: 112378
all the other LDM/STM instructions. This fixes asm printer crashes when
compiling with -O0. I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.
Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier. Much of the backend
was not aware of these special cases. The crashes occured when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode. I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON. Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.
llvm-svn: 112322
A = shl x, 42
...
B = lshr ..., 38
which can be transformed into:
A = shl x, 4
...
iff we can prove that the would-be-shifted-in bits
are already zero. This eliminates two shifts in the testcase
and allows eliminate of the whole i128 chain in the real example.
llvm-svn: 112314
framework, which is good at ripping through bitfield
operations. This generalize a bunch of the existing
xforms that instcombine does, such as
(x << c) >> c -> and
to handle intermediate logical nodes. This is useful for
ripping up the "promote to large integer" code produced by
SRoA.
llvm-svn: 112304