- This fixed a number of bugs in if-converter, tail merging, and post-allocation
scheduler. If-converter now runs branch folding / tail merging first to
maximize if-conversion opportunities.
- Also changed the t2IT instruction slightly. It now defines the ITSTATE
register which is read by instructions in the IT block.
- Added Thumb2 specific hazard recognizer to ensure the scheduler doesn't
change the instruction ordering in the IT block (since IT mask has been
finalized). It also ensures no other instructions can be scheduled between
instructions in the IT block.
This is not yet enabled.
llvm-svn: 106344
call must not be callee-saved; following x86, add a new
regclass to represent this. Also fixes a couple of bugs.
Still disabled by default; Thumb doesn't work yet.
llvm-svn: 106053
through to the generic version. The generic functions use STR/LDR, but T2
needs the t2STR/t2LDR instead so we get the addressing mode correct.
llvm-svn: 99678
immediate instructions cannot set the condition codes, so they do not have
the extra cc_out operand. We hit an assertion during tail duplication
because the instruction being duplicated had more operands that expected.
llvm-svn: 98001
except it doesn't care if the definitions' virtual registers differ. This is
used by machine LICM and other MI passes to perform CSE.
- Teach Thumb2InstrInfo::isIdentical() to check two t2LDRpci_pic are identical.
Since pc relative constantpool entries are always different, this requires it
it check if the values can actually the same.
llvm-svn: 86328
load of a GV from constantpool and then add pc. It allows the code sequence to
be rematerializable so it would be hoisted by machine licm.
- Add a late pass to break these pseudo instructions into a number of real
instructions. Also move the code in Thumb2 IT pass that breaks up t2MOVi32imm
to this pass. This is done before post regalloc scheduling to allow the
scheduler to proper schedule these instructions. It also allow them to be
if-converted and shrunk by later passes.
llvm-svn: 86304
VLDM/VSTM instructions, and without this check, the code assumes that an
offset is allowed, as it would be with VLDR/VSTR. The asm printer,
however, silently drops the offset, producing incorrect code. Since the
address register in this case is either the stack or frame pointer, the
spill location ends up conflicting with some other stack slot or with
outgoing arguments on the stack.
llvm-svn: 81879
This patch takes pain to ensure all the PEI lowering code does the right thing when lowering frame indices, insert code to manipulate stack pointers, etc. It's also custom lowering dynamic stack alloc into pseudo instructions so we can insert the right instructions at scheduling time.
This fixes PR4659 and PR4682.
llvm-svn: 78361
the only real caller (GetFunctionSizeInBytes) uses it.
The custom ARM implementation of this is basically reimplementing
an assembler poorly for negligible gain. It should be removed
IMNSHO, but I'll leave that to ARMish folks to decide.
llvm-svn: 77877
- This change also makes it possible to switch between ARM / Thumb on a
per-function basis.
- Fixed thumb2 routine which expand reg + arbitrary immediate. It was using
using ARM so_imm logic.
- Use movw and movt to do reg + imm when profitable.
- Other code clean ups and minor optimizations.
llvm-svn: 77300
This also fixes potential problems in ARMBaseInstrInfo routines not recognizing thumb1 instructions when 32-bit and 16-bit instructions mix.
llvm-svn: 77218
Before:
adr r12, #LJTI3_0_0
ldr pc, [r12, +r0, lsl #2]
LJTI3_0_0:
.long LBB3_24
.long LBB3_30
.long LBB3_31
.long LBB3_32
After:
adr r12, #LJTI3_0_0
add pc, r12, +r0, lsl #2
LJTI3_0_0:
b.w LBB3_24
b.w LBB3_30
b.w LBB3_31
b.w LBB3_32
This has several advantages.
1. This will make it easier to optimize this to a TBB / TBH instruction +
(smaller) table.
2. This eliminate the need for ugly asm printer hack to force the address
into thumb addresses (bit 0 is one).
3. Same codegen for pic and non-pic.
4. This eliminate the need to align the table so constantpool island pass
won't have to over-estimate the size.
Based on my calculation, the later is probably slightly faster as well since
ldr pc with shifter address is very slow. That is, it should be a win as long
as the HW implementation can do a reasonable job of branch predict the second
branch.
llvm-svn: 77024