I'd forgotten that "Requires" blocks override rather than add to the
constraints, so my pseudo-instruction was being selected in Thumb mode leading
to nonsense instructions.
rdar://problem/14817358
llvm-svn: 189096
This field specifies registers that are preserved across function calls,
but that should not be included in the generates SaveList array.
This can be used ot generate regmasks for architectures that save
registers through other means, like SPARC's register windows.
llvm-svn: 189084
The instruction to convert between floating point and fixed point representations
takes an immediate operand for the number of fractional bits of the fixed point
value. ARMARM specifies that when that number of bits is zero, the assembler
should encode floating point/integer conversion instructions.
This patch adds the necessary instruction aliases to achieve this behaviour.
llvm-svn: 189009
Back in the mists of time (2008), it seems TableGen couldn't handle the
patterns necessary to match ARM's CMOV node that we convert select operations
to, so we wrote a lot of fairly hairy C++ to do it for us.
TableGen can deal with it now: there were a few minor differences to CodeGen
(see tests), but nothing obviously worse that I could see, so we should
probably address anything that *does* come up in a localised manner.
llvm-svn: 188995
def imm0_63 : Operand<i32>, ImmLeaf<i32, [{ return Imm >= 0 && Imm < 63;}]>{
As it seems Imm <63 should be Imm <= 63. ImmLeaf is used in pattern match, but there is already a function check the shift amount range, so just remove ImmLeaf. Also add a test to check 63.
llvm-svn: 188911
According to the ARM specification, "mov" is a valid mnemonic for all Thumb2 MOV encodings.
To achieve this, the patch adds one instruction alias with a special range condition to avoid collision with the Thumb1 MOV.
llvm-svn: 188901
The initial port used MLG(R) for i64 UMUL_LOHI but left the other three
combinations as not-legal-or-custom. Although 32x32->{32,32}
multiplications exist, they're not as quick as doing a normal 64-bit
multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI
would be useful. There's also no direct instruction for i64 SMUL_LOHI,
so it needs to be implemented in terms of UMUL_LOHI.
However, not defining these patterns means that we don't convert
division by a constant into multiplication, so this patch fills
in the other cases. The new i64 SMUL_LOHI sequence is simpler
than the one that we used previously for 64x64->128 multiplication,
so int-mul-08.ll now tests the full sequence.
llvm-svn: 188898
I accidentally changed the encoding of the MSA registers to zero instead of 0
to 31. This change restores the encoding the registers had prior to r188893.
This didn't show up in the existing tests because direct-object emission isn't
implemented yet for MSA.
llvm-svn: 188896
point registers. We will need this register class later when we add
definitions for instructions mfhc1 and mthc1. Also, remove sub-register indices
sub_fpeven and sub_fpodd and use sub_lo and sub_hi instead.
llvm-svn: 188842
load/store instructions defined. Previously, we were defining load/store
instructions for each pointer size (32 and 64-bit), but now we need just one
definition.
llvm-svn: 188830
functions be compiled as mips32, without having to add attributes. This
is useful in certain situations where you don't want to have to edit the
function attributes in the source. For now it's only an option used for
the compiler developers when debugging the mips16 port.
llvm-svn: 188826
Update testcase to be more careful about checking register
values. While regexes are general goodness for these sorts of
testcases, in this example, the registers are constrained by
the calling convention, so we can and should check their
explicit values.
rdar://14779513
llvm-svn: 188819
SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry. I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.
This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA. This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.
We should also try to optimize null checks so that they test the CC result
of the SRST directly. The select between null and the SRST GPR result could
then usually be deleted as dead.
llvm-svn: 188779
Previously we used a const-pool load for virtually all 64-bit floating values.
Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov"
instructions of one stripe or another.
llvm-svn: 188773
(Patch committed on behalf of Mark Minich, whose log entry follows.)
This is a continuation of the refactorings performed in svn rev 188573
(see that rev's comments for more detail).
This is my stage 2 refactoring: I combined the emitPrologue() &
emitEpilogue() PPC32 & PPC64 code into a single flow, simplifying a
lot of the code since in essence the PPC32 & PPC64 code generation
logic is the same, only the instruction forms are different (in most
cases). This simplification is necessary because my functional changes
(yet to come) add significant complexity, and without the
simplification of my stage 2 refactoring, the overall complexity of
both emitPrologue() & emitEpilogue() would have become almost
intractable for most mortal programmers (like me).
This submission was intended to be a pure refactoring (no functional
changes whatsoever). However, in the process of combining the PPC32 &
PPC64 flows, I spotted a difference that I believe is a bug (see svn
rev 186478 line 863, or svn rev 188573 line 888): This line appears to
be restoring the BP with the original FP content, not the original BP
content. When I merged the 32-bit and 64-bit code, I used the
corresponding code from the 64-bit flow, which I believe uses the
correct offset (BPOffset) for this operation.
llvm-svn: 188741
This adds a llvm.copysign intrinsic; We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.
In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.
In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-values FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).
llvm-svn: 188728
copysign/copysignf never become function calls (because the SDAG expansion code
does not lower to the corresponding function call, but rather directly
implements the associated logic), but copysignl almost always is lowered into a
call to the requested libm functon (and, thus, might clobber CTR).
llvm-svn: 188727