difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
additional pipeline stall. So it's frequently better to single codegen
vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
vmla + vmla is very bad. But this isn't ideal either:
vmul
vadd
vmla
Instead, we want to expand the second vmla:
vmla
vmul
vadd
Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
faster.
Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
vmla / vmls will trigger one of the special hazards.
Work in progress, only A+B are enabled.
llvm-svn: 120960
instructions have to distinguish between lists of single- and double-precision
registers in order for the ASM matcher to do a proper job. In all other
respects, a list of single- or double-precision registers are the same as a list
of GPR registers.
llvm-svn: 119460
'db', 'ib', 'da') instead of having that mode as a separate field in the
instruction. It's more convenient for the asm parser and much more readable for
humans.
<rdar://problem/8654088>
llvm-svn: 119310
vldr.64 d1, [r0, #-32]
The problem was with how the addressing mode 5 encodes the offsets. This change
makes sure that the way offsets are handled in addressing mode 5 is consistent
throughout the MC code. It involves re-refactoring the "getAddrModeImmOpValue"
method into an "Imm12" and "addressing mode 5" version. But not to worry! The
majority of the duplicated code has been unified.
llvm-svn: 118144
with immediates up to 16-bits in size. The same logic is applied to other LDR
encodings, e.g. VLDR, but which use a different immediate bit width (8-bits in
VLDR's case). Removing the "12" allows it to be more generic.
llvm-svn: 118094
Instead of silently ignoring these instructions, emit a hard error and
force the target author to either refactor the target or mark the
instruction 'isCodeGenOnly'.
Mark a few instructions in ARM and MBlaze as isCodeGenOnly the are
doing this.
llvm-svn: 117858
here. The f32 in FCONSTS is handled as a double instead of a float in the
code. So the encoding of the immediate into the instruction isn't exactly in
line with the documentation in that regard. But given that we know it's handled
as a double, it doesn't cause any harm.
llvm-svn: 116471
all the other LDM/STM instructions. This fixes asm printer crashes when
compiling with -O0. I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.
Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier. Much of the backend
was not aware of these special cases. The crashes occured when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode. I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON. Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.
llvm-svn: 112322
Add support for using the FPSCR in conjunction with the vcvtr instruction, for controlling fp to int rounding.
Add support for the FLT_ROUNDS_ node now that the FPSCR is exposed.
llvm-svn: 110152
--- Reverse-merging r98889 into '.':
U lib/Target/ARM/ARMInstrNEON.td
U lib/Target/ARM/ARMISelLowering.h
U lib/Target/ARM/ARMInstrInfo.td
U lib/Target/ARM/ARMInstrVFP.td
U lib/Target/ARM/ARMISelLowering.cpp
U lib/Target/ARM/ARMInstrFormats.td
llvm-svn: 99010
This is for the disassembly work.
There are cases where this is not possible, for example, A8.6.53 LDM Encoding T1.
In such case, we'll use an adhoc approach to deduce the Opcode programmatically.
llvm-svn: 98679
writebacks to the address register. This gets rid of the hack that the
first register on the list was the magic writeback register operand. There
was an implicit constraint that if that operand was not reg0 it had to match
the base register operand. The post-RA scheduler's antidependency breaker
did not understand that constraint and sometimes changed one without the
other. This also fixes Radar 7495976 and should help the verifier work
better for ARM code.
There are now new ld/st instructions explicit writeback operands and explicit
constraints that tie those registers together.
llvm-svn: 98409
example, this:
(set DPR:$dst, (fsub (fneg (fmul DPR:$a, DPR:$b)), DPR:$dstin))
is ambiguous because DPR contains both f64 and v2f32. tblgen
currently accidentally picks f64 because it's first in the
regclass.
llvm-svn: 97955
bit (Inst{22}) and the M bit (Inst{5}) should be left unspecified. For binary
format instructions, Inst{6} and Inst{4} need to specified for proper decodings.
llvm-svn: 94855