The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded
at MachineInstr emission time had grown to be extremely large and
involved, to account for the subtly different code needed for the
various flavours (8/16/32/64 bit, cmpxchg/add/minmax).
Moving this transformation into the IR clears up the code
substantially, and makes future optimisations much easier:
1. an atomicrmw followed by using the *new* value can be more
efficient. As an IR pass, simple CSE could handle this
efficiently.
2. Making use of cmpxchg success/failure orderings only has to be done
in one (simpler) place.
3. The common "cmpxchg; did we store?" idiom can be exposed to
optimisation.
I intend to gradually improve this situation within the ARM backend
and make sure there are no hidden issues before moving the code out
into CodeGen to be shared with (at least ARM64/AArch64, though I think
PPC & Mips could benefit too).
llvm-svn: 205525
We've already got versions without the barriers, so this just adds IR-level
support for generating the new v8 ones.
rdar://problem/16227836
llvm-svn: 204813
ATOMIC_STORE operations always get here as a lowered ATOMIC_SWAP, so there's no
need for any code to handle them specially.
There should be no functionality change so no tests.
llvm-svn: 203567
With constant-sharing, litpool loads consume 4 + N*2 bytes of code, but
movw/movt pairs consume 8*N. This means litpools are better than movw/movt even
with just one use. Other materialisation strategies can still be better though,
so the logic is a little odd.
llvm-svn: 199891
These are used by MachO only at the moment, and (much like the existing
MOVW/MOVT set) work around the fact that the labels used in the actual
instructions often contain PC-dependent components, which means that repeatedly
materialising the same global can't be CSEed.
With small modifications, it could be adapted to how ELF finds the address of
_GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there.
llvm-svn: 196090
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.
Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.
This should fix PR15840.
llvm-svn: 191165
Back in the mists of time (2008), it seems TableGen couldn't handle the
patterns necessary to match ARM's CMOV node that we convert select operations
to, so we wrote a lot of fairly hairy C++ to do it for us.
TableGen can deal with it now: there were a few minor differences to CodeGen
(see tests), but nothing obviously worse that I could see, so we should
probably address anything that *does* come up in a localised manner.
llvm-svn: 188995
When patching inlineasm nodes to use GPRPair for 64-bit values, we
were dropping the information that two operands were tied, which
effectively broke the live-interval of vregs affected.
llvm-svn: 188643
In the SelectionDAG immediate operands to inline asm are constructed as
two separate operands. The first is a constant of value InlineAsm::Kind_Imm
and the second is a constant with the value of the immediate.
In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we
should skip over the next operand too.
llvm-svn: 185688
This patch assigns paired GPRs for inline asm with
64-bit data on ARM. It's enabled for both ARM and Thumb to support modifiers
like %H, %Q, %R.
llvm-svn: 185169
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
((x & 0xff00) >> 8) << 2
to
(x >> 6) & 0x3fc
This is general goodness since it folds a left shift into the mask. However,
the trailing zeros in the mask prevents the ARM backend from using the bit
extraction instructions. And worse since the mask materialization may require
an addition instruction. This comes up fairly frequently when the result of
the bit twiddling is used as memory address. e.g.
= ptr[(x & 0xFF0000) >> 16]
We want to generate:
ubfx r3, r1, #16, #8
ldr.w r3, [r0, r3, lsl #2]
vs.
mov.w r9, #1020
and.w r2, r9, r1, lsr #14
ldr r2, [r0, r2]
Add a late ARM specific isel optimization to
ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift to the
'base + offset' address computation; change the mask to one which doesn't have
trailing zeros and enable the use of ubfx.
Note the optimization has to be done late since it's target specific and we
don't want to change the DAG normalization. It's also fairly restrictive
as shifter operands are not always free. It's only done for lsh 1 / 2. It's
known to be free on some cpus and they are most common for address
computation.
This is a slight win for blowfish, rijndael, etc.
rdar://12870177
llvm-svn: 170581
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
This patch replaces the hard coded GPR pair [R0, R1] of
Intrinsic:arm_ldrexd and [R2, R3] of Intrinsic:arm_strexd with
even/odd GPRPair reg class.
Similar to the lowering of atomic_64 operation.
llvm-svn: 168207
* wrap code blocks in \code ... \endcode;
* refer to parameter names in paragraphs correctly (\arg is not what most
people want -- it starts a new paragraph);
* use \param instead of \arg to document parameters in order to be consistent
with the rest of the codebase.
llvm-svn: 163902
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.
Bug 12213
Patch by Yin Ma!
llvm-svn: 163136
When predicating this instruction:
Rd = ADD Rn, Rm
We need an extra operand to represent the value given to Rd when the
predicate is false:
Rd = ADDCC Rfalse, Rn, Rm, pred
The Rd and Rfalse operands are different registers while in SSA form.
Rfalse is tied to Rd to make sure they get the same register during
register allocation.
Previously, Rd and Rn were tied, but that is not required.
Compare to MOVCC:
Rd = MOVCC Rfalse, Rtrue, pred
llvm-svn: 161955
This patch corrects the definition of umlal/smlal instructions and adds support
for matching them to the ARM dag combiner.
Bug 12213
Patch by Yin Ma!
llvm-svn: 161581
While there is an encoding for it in VUZP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.
rdar://11222366
llvm-svn: 154511
While there is an encoding for it in VZIP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.
rdar://11221911
llvm-svn: 154505
With the new composite physical registers to represent arbitrary pairs
of DPR registers, we don't need the pseudo-registers anymore. Get rid of
a bunch of them that use DPR register pairs and just use the real
instructions directly instead.
llvm-svn: 152045
value is zero. Instead of a cmov + op, issue an conditional op instead. e.g.
cmp r9, r4
mov r4, #0
moveq r4, #1
orr lr, lr, r4
should be:
cmp r9, r4
orreq lr, lr, #1
That is, optimize (or x, (cmov 0, y, cond)) to (or.cond x, y). Similarly extend
this to xor as well as (and x, (cmov -1, y, cond)) => (and.cond x, y).
It's possible to extend this to ADD and SUB but I don't think they are common.
rdar://8659097
llvm-svn: 151224
Refactor the instructions into fixed writeback and register-stride
writeback variants to simplify the offset operand (no more optional
register operand using reg0). This is a simpler representation and allows
the assembly parser to more easily handle these instructions.
Add tests for the instruction variants now supported.
llvm-svn: 146278
Split am6offset into fixed and register offset variants so the instruction
encodings are explicit rather than relying an a magic reg0 marker.
Needed to being able to parse these.
llvm-svn: 142853
merging an lsl #2 that has multiple uses on A9. This shift is free, so there is
no problem merging it in multiple places. Other unprofitable shifts will not be
merged.
llvm-svn: 141247
Math is hard, and isScaledConstantInRange() always returned false for
negative constants. It was doing unsigned division of negative numbers
before casting back to signed.
llvm-svn: 140425
Add the predicate operand to the instructions. Update the back end
accordingly where the instructions are used. Restrict the SP operands
to actually only be SP, as otherwise these break assembly parsing for the
normal instruction variants.
llvm-svn: 138445
Refactor STR[B] pre and post indexed instructions to use addressing modes for
memory operands, which is necessary for assembly parsing and is more consistent
with the rest of the memory instruction definitions. Make some incremental
progress on refactoring away the mega-operand addrmode2 along the way, which
is nice.
llvm-svn: 136978
Encode the width operand as it encodes in the instruction, which simplifies
the disassembler and the encoder, by using the imm1_32 operand def. Add a
diagnostic for the context-sensitive constraint that the width must be in
the range [1,32-lsb].
llvm-svn: 136264