cost modeling for if-conversion. Now if only we had a way to estimate the misprediction probability.
Adjsut CodeGen/ARM/ifcvt10.ll. The pipeline on Cortex-A8 is long enough that it is still profitable
to predicate an ldm, but the shorter pipeline on Cortex-A9 makes it unprofitable.
llvm-svn: 114995
Rather than having arbitrary cutoffs, actually try to cost model the conversion.
For now, the constants are tuned to more or less match our existing behavior, but these will be
changed to reflect realistic values as this work proceeds.
llvm-svn: 114973
llc now recognizes the "intent" to support MC/obj emission for ARM, but
given that they are all stubs, it asserts on --filetype=obj --march=arm
Patch by Jason Kim.
llvm-svn: 114856
(yet) recognize the 'trap' mnemonic, so we use .short/.long to emit the
opcode directly. On Darwin, however, we do want the mnemonic for more
readable assembly code and better disassembly.
Adjust the .td file to use the 'trap' mnemonic and handle using the binutils
workaround in the assembly printer. Also tweak the formatting of the opcode
values to make them consistent between the MC printer and the old printer.
llvm-svn: 114679
new VariantKind to the MCSymbolExpr seems like overkill, but I'm not sure
there's a more straightforward way to get the printing difference captured.
(i.e., x86 uses @PLT, ARM uses (PLT)).
llvm-svn: 114613
CombineTo to avoid putting the result on the worklist. I don't think it makes
much difference for now, but it might help someday as we add more DAG
combine optimizations.
llvm-svn: 114595
of those. Refactor to share code for handling BUILD_VECTOR(VMOVRRD).
I don't have a testcase that exercises this, but it seems like an obvious
good thing to do.
llvm-svn: 114589
passed the root of the match, even though only a few patterns
actually needed this (one in X86, several in ARM [which should
be refactored anyway], and some in CellSPU that I don't feel
like detangling). Instead of requiring all ComplexPatterns to
take the dead root, have targets opt into getting the root by
putting SDNPWantRoot on the ComplexPattern.
llvm-svn: 114471
I am unable to write a test for this case, help is solicited, though...
What I did is to tickle the code in the debugger and verify that we do the right thing.
llvm-svn: 114430
into OptimizeCompareInstr.
This necessitates the passing of CmpValue around,
so widen the virtual functions to accomodate.
No functionality changes.
llvm-svn: 114428
between the high and low registers for prologue/epilogue code. This was
a Darwin-only thing that wasn't providing a realistic benefit anymore.
Combining the save areas simplifies the compiler code and results in better
ARM/Thumb2 codegen.
For example, previously we would generate code like:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
stmdb sp!, {r8, r10, r11}
With this change, we combine the register saves and generate:
push {r4, r5, r6, r7, r8, r10, r11, lr}
add r7, sp, #12
rdar://8445635
llvm-svn: 114340
value should be in GPRs when it's going to be used as a scalar, and we use
VMOVRRD to make that happen, but if the value is converted back to a vector
we need to fold to a simple bit_convert. Radar 8407927.
llvm-svn: 114233
functions in ARMBaseInfo.h so it can be used in the MC library as well.
For anything bigger than this, we may want a means to have a small support
library for shared helper functions like this. Cross that bridge when we
come to it.
llvm-svn: 114016
VFP instructions use it for loading some constants, so implement that
handling.
Not thrilled with adding a member to MCOperand, but not sure there's much of
a better option that's not pretty fragile (like putting a double in the
union instead and just assuming that's good enough). Suggestions welcome...
llvm-svn: 113996
Recognize VLD1q64Pseudo as a stack slot load.
Reject these if they are loading or storing a subregister. The API (and
VirtRegRewriter) doesn't know how to deal with that.
llvm-svn: 113985
encountered while building llvm-gcc for arm. This is probably the same issue
that the ppc buildbot hit. llvm::prior works on a MachineBasicBlock::iterator,
not a plain MachineInstr.
llvm-svn: 113983
backing out following to get it back to green,
so I can investigate in peace:
svn merge -c -113840 llvm/test/CodeGen/ARM/arm-and-tst-peephole.ll
svn merge -c -113876 -c -113839 llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
llvm-svn: 113980
"The register specified for a dregpair is the corresponding Q register, so to
get the pair, we need to look up the sub-regs based on the qreg. Create a
lookup function since we don't have access to TargetRegisterInfo here to
be able to use getSubReg(ARM::dsub_[01])."
Additionaly, fix the NEON VLD1* and VST1* instruction patterns not to use
the dregpair modifier for the 2xdreg versions. Explicitly specifying the two
registers as operands is more correct and more consistent with the other
instruction patterns. This enables further cleanup of special case code in the
disassembler as a nice side-effect.
llvm-svn: 113903
get the pair, we need to look up the sub-regs based on the qreg. Create a
lookup function since we don't have access to TargetRegisterInfo here to
be able to use getSubReg(ARM::dsub_[01]).
llvm-svn: 113875
an argument, so that we can distinguish instructions with the same register
classes but different numbers of registers (e.g., vld3 and vld4). Fix some
of the non-pseudo NEON ld/st instruction itineraries to reflect the number
of registers loaded or stored, not just the opcode name.
llvm-svn: 113854
by morphing the 'and' to its recording form 'andS'.
This is basically a test commit into this area, to
see whether the bots like me. Several generalizations
can be applied and various avenues of code simplification
are open. I'll introduce those as I go.
I am aware of stylistic input from Bill Wendling, about
where put the analysis complexity, but I am positive
that we can move things around easily and will find a
satisfactory solution.
llvm-svn: 113839
the end of the line on a parser error, allowing skipping to happen
for syntactic errors but not for semantic errors. Before we would
miss emitting a diagnostic about the second line, because we skipped
it due to the semantic error on the first line:
foo %eax
bar %al
This fixes rdar://8414033 - llvm-mc ignores lines after an invalid instruction mnemonic errors
llvm-svn: 113688
iterator when an optimization took place. This allows us to do more insane
things with the code than just remove an instruction or two.
llvm-svn: 113640
to use AddrMode4, there was a count of the registers stored in one of the
operands. I changed that to just count the operands but forgot to adjust for
the size of D registers. This was noticed by Evan as a performance problem
but it is a potential correctness bug as well, since it is possible that this
could merge a base update with a non-matching immediate.
llvm-svn: 113576
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.
llvm-svn: 113570
instruction in the class would be decoded to. Or zero if the number of
uOPs must be determined dynamically.
This will be used to determine the cost-effectiveness of predicating a
micro-coded instruction.
llvm-svn: 113513
operand from the pseudo instruction to the new instruction as an implicit use.
This will preserve any other flags (e.g., kill) on the operand.
llvm-svn: 113456
For VLD3/VLD4 with double-spaced registers, add the implicit use of the
super register for both the instruction loading the even registers and the
instruction loading the odd registers.
llvm-svn: 113452
register must be one of the destination registers for the load. Otherwise,
the tLDM instruction will write-back to the base register, which isn't what's
desired (otherwise, we'd have a t2LDM_UPD instead).
rdar://8394087
llvm-svn: 113297
of a mneumonic, report operand errors with better location
info. For example, we now report:
t.s:6:14: error: invalid operand for instruction
cwtl $1
^
but we fail for common cases like:
t.s:11:4: error: invalid operand for instruction
addl $1, $1
^
because we don't know if this is supposed to be the reg/imm or imm/reg
form.
llvm-svn: 113178
"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."
r112986 fixed a latent bug exposed by the above.
llvm-svn: 112989
alignment should be performed. Otherwise dynamic realignment may trigger
when the register allocator has already used the frame pointer as a general
purpose register. That is, we need to make sure that the list of reserved
registers doesn't change after register allocation.
llvm-svn: 112986
instructions prior to regalloc. Since it's getting a little close to
the 2.8 branch deadline, I'll have to leave the rest of the instructions
handled by the NEONPreAllocPass for now, but I didn't want to leave half
of the VLD instructions converted and the other half not.
llvm-svn: 112983
vabd intrinsic and add and/or zext operations. In the case of vaba, this
also avoids the need for a DAG combine pattern to combine vabd with add.
Update tests. Auto-upgrade the old intrinsics.
llvm-svn: 112941
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs.
rdar://7352504
rdar://8374540
rdar://8355680
llvm-svn: 112883
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests. Add auto-upgrade support for the old intrinsics.
llvm-svn: 112773
int x(int t) {
if (t & 256)
return -26;
return 0;
}
We generate this:
tst.w r0, #256
mvn r0, #25
it eq
moveq r0, #0
while gcc generates this:
ands r0, r0, #256
it ne
mvnne r0, #25
bx lr
Scandalous really!
During ISel time, we can look for this particular pattern. One where we have a
"MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND
instruction to 0. Something like this (greatly simplified):
%r0 = ISD::AND ...
ARMISD::CMPZ %r0, 0 @ sets [CPSR]
%r0 = ARMISD::MOVCC 0, -26 @ reads [CPSR]
All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when it's zero. The zero value will all ready be in the %r0 register and we only
need to change it if the AND wasn't zero. Easy!
llvm-svn: 112664
kill flag.
This could cause duplicate kill flags when the same register was used twice in a
continuous sequence of STRs.
There is no small test case. <rdar://problem/8218046>
llvm-svn: 112534
operand is killed, add it to the expanded instruction as an implicit kill
operand instead of marking the individual subregs with kill flags. This
should work better in general and also handles the case for VST3 where one
of the subregs was not referenced in the expanded instruction and so was
not marked killed.
llvm-svn: 112494
optional modified register (instead of reg0). Along with r112461 it will make
sure that the optional define of CPSR is marked as "def" and will thus mark the
instructions using these classes (t2ANDS*) as setting the 's' flag.
llvm-svn: 112462
the special values that for ARM would be used with IB or DA modes. Fall
through and consider materializing a new base address is it would be
profitable.
llvm-svn: 112329
all the other LDM/STM instructions. This fixes asm printer crashes when
compiling with -O0. I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.
Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier. Much of the backend
was not aware of these special cases. The crashes occured when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode. I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON. Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.
llvm-svn: 112322
still having a significant effect. It shouldn't be now that the pre-RA
virtual base reg stuff is in. Assuming that's valididated by the nightly
testers, we can simplify a lot of the PEI frame index code.
llvm-svn: 112220
comparison with 0. These two pieces of code should give identical results:
rsbs r1, r1, 0
cmp r0, r1
mov r0, #0
it ls
mov r0, #1
and:
cmn r0, r1
mov r0, #0
it ls
mov r0, #1
However, the CMN gives the *opposite* result when r1 is 0. This is because the
carry flag is set in the CMP case but not in the CMN case. In short, the CMP
instruction doesn't perform a truncate of the (logical) NOT of 0 plus the value
of r0 and the carry bit (because the "carry bit" parameter to AddWithCarry is
defined as 1 in this case, the carry flag will always be set when r0 >= 0). The
CMN instruction doesn't perform a NOT of 0 so there is never a "carry" when this
AddWithCarry is performed (because the "carry bit" parameter to AddWithCarry is
defined as 0).
The AddWithCarry in the CMP case seems to be relying upon the identity:
~x + 1 = -x
However when x is 0 and unsigned, this doesn't hold:
x = 0
~x = 0xFFFF FFFF
~x + 1 = 0x1 0000 0000
(-x = 0) != (0x1 0000 0000 = ~x + 1)
Therefore, we should disable *all* versions of CMN, especially when comparing
against zero, until we can limit when the CMN instruction is used (when we know
that the RHS is not 0) or when we have a hardware fix for this.
(See the ARM docs for the "AddWithCarry" pseudo-code.)
This is related to <rdar://problem/7569620>.
llvm-svn: 112176
with the VST4 instructions. Until after register allocation, we want to
represent sets of adjacent registers by a single super-register. These
VST4 pseudo instructions have a single QQ or QQQQ source register operand.
They get expanded to the real VST4 instructions with 4 separate D register
operands. Once this conversion is complete, we'll be able to remove the
NEONPreAllocPass and avoid some fragile and hacky code elsewhere.
llvm-svn: 112108
comparison that would overflow.
- The other under/overflow cases can't actually happen because the immediates
which would trigger them are legal (so we don't enter this code), but
adjusted the style to make it clear the transform is always valid.
llvm-svn: 112053
Intended to help ease reproducing problems by increasing base register usage
after heuristics for only using the when needed are in place.
llvm-svn: 111930
frame index reference to an object in the local block is seen, check if
it's near enough to any previously allocaated base register to re-use.
rdar://8277890
llvm-svn: 111443
Nothing fancy, just ask the target if any currently available base reg
is in range for the instruction under consideration and use the first one
that is. Placeholder ARM implementation simply returns false for now.
ongoing saga of rdar://8277890
llvm-svn: 111374
the local block. Resolve references to those indices to a new base register.
For simplification and testing purposes, a new virtual base register is
allocated for each frame index being resolved. The result is truly horrible,
but correct, code that's good for exercising the new code paths.
Next up is adding thumb1 support, which should be very simple. Following that
will be adding base register re-use and implementing a reasonable ARM
heuristic for when a virtual base register should be generated at all.
llvm-svn: 111315
whether to allocate a virtual frame base register to resolve the frame
index reference in it. Implement a simple version for ARM to aid debugging.
In LocalStackSlotAllocation, scan the function for frame index references
to local frame indices and ask the target whether to allocate virtual
frame base registers for any it encounters. Purely infrastructural for
debug output. Next step is to actually allocate base registers, then add
intelligent re-use of them.
rdar://8277890
llvm-svn: 111262