In the default PowerPC assembler syntax, registers are specified simply
by number, so they cannot be distinguished from immediate values (without
looking at the opcode). This means that the default operand matching logic
for the asm parser does not work, and we need to specify custom matchers.
Since those can only be specified with RegisterOperand classes and not
directly on the RegisterClass, all instructions patterns used by the asm
parser need to use a RegisterOperand (instead of a RegisterClass) for
all their register operands.
This patch adds one RegisterOperand for each RegisterClass, using the
same name as the class, just in lower case, and updates all instruction
patterns to use RegisterOperand instead of RegisterClass operands.
llvm-svn: 180611
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong sub-opcodes).
Tests will be added together with the asm parser.
llvm-svn: 180608
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong sub-opcodes). Note that apparently
the compiler currently never generates pre-inc instructions
for floating point types for some reason ...
Tests will be added together with the asm parser.
llvm-svn: 180607
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong operand name in rldimi, wrong form
and sub-opcode for rldcl).
Tests will be added together with the asm parser.
llvm-svn: 180606
When testing the asm parser, I ran into an error when using a conditional
branch to an external symbol (this doesn't occur in compiler-generated
code) due to missing support in PPCELFObjectWriter::getRelocTypeInner.
llvm-svn: 180605
This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents. The included change
fixes the PowerPC tests, and was OK'd by Hal.
llvm-svn: 180129
The getSwappedPredicate function can be used in other places (such as in
improvements to the PPCCTRLoops pass). Instead of trapping it as a static
function in PPCInstrInfo, move it into PPCPredicates with other
predicate-related things.
No functionality change intended.
llvm-svn: 179926
When matching a compare with a subtract where the arguments of the compare are
swapped w.r.t. the arguments of the subtract, we need to negate the predicates
(or CR bit indices) of the users. This, however, is not the same as inverting
the predicate (negating LT -> GT, but inverting LT -> GE, for example). The ARM
backend seems to do this correctly, but when I adapted the code for the PPC
backend, I introduced an error in this logic.
Comparison optimization is now enabled again by default.
llvm-svn: 179899
Many PPC instructions have a so-called 'record form' which stores to a specific
condition register the result of comparing the result of the instruction with
zero (always as a signed comparison). For integer operations on PPC64, this is
always a 64-bit comparison.
This implementation is derived from the implementation in the ARM backend;
there are some differences because PPC condition registers are allocatable
virtual registers (although the record forms always use a specific one), and we
look for a matching subtraction instruction after the compare (but before the
first use) in addition to before it.
llvm-svn: 179802
A couple of recently introduced conditional branch patterns
also need to be marked as isCodeGenOnly since they cannot
be handled by the asm parser.
No change in generated code.
llvm-svn: 179690
Now that the CR spilling issues have been resolved, we can remove the
unmodeled-side-effect attributes from the comparison instructions (and also
mark them as isCompare). By allowing these, by default, to have unmodeled side
effects, we were hiding problems with CR spilling; but everything seems much
happier now.
llvm-svn: 179502
This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition
registers, the spill location is specified relative to the stack pointer (SP +
8). However, this is not relative to the SP after the new stack frame is
established, but instead relative to the caller's stack pointer (it is stored
into the linkage area of the parent's stack frame).
So, like with the link register, we don't directly spill the CRs with other
callee-saved registers, but just mark them to be spilled during prologue
generation.
In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32).
llvm-svn: 179500
Leaving MFCR has having unmodeled side effects is not enough to prevent
unwanted instruction reordering post-RA. We could probably apply a stronger
barrier attribute, but there is a better way: Add all (not just the first) CR
to be spilled as live-in to the entry block, and add all CRs to the MFCR
instruction as implicitly killed.
Unfortunately, I don't have a small test case.
llvm-svn: 179465
For functions that need to spill CRs, and have dynamic stack allocations, the
value of the SP during the restore is not what it was during the save, and so
we need to use the FP in these cases (as for all of the other spills and
restores, but the CR restore has a special code path because its reserved slot,
like the link register, is specified directly relative to the adjusted SP).
llvm-svn: 179457
TableGen will not combine nested list 'let' bindings into a single list, and
instead uses only the inner scope. As a result, several instruction definitions
were missing implicit register defs that were in outer scopes. This de-nests
these scopes and makes all instructions have only one let binding which sets
implicit register definitions.
llvm-svn: 179392
This is prep. work for the implementation of optimizeCompare. Many PPC
instructions have 'record' forms (in almost all cases, this means that the RC
bit is set) that cause the result of the instruction to be compared with zero,
and the result of that comparison saved in a predefined condition register. In
order to add the record forms of the instructions without too much
copy-and-paste, the relevant functions have been refactored into multiclasses
which define both the record and normal forms.
Also, two TableGen-generated mapping functions have been added which allow
querying the instruction code for the record form given the normal form (and
vice versa).
No functionality change intended.
llvm-svn: 179356
Because of how predication in implemented on PPC (only for branches), I think
that this is the right thing to do. No functionality change intended.
llvm-svn: 179252
I've not seen this happen in practice, and probably can't until we start
allowing decrement-counter-based conditional branches to be double predicated,
but just in case, don't allow predication of a diamond in which both sides have
ctr-defining branches. Even though the branching behavior of these can be
predicated, the counter-decrementing behavior cannot be.
llvm-svn: 179199
This adds in-principle support for if-converting the bctr[l] instructions.
These instructions are used for indirect branching. It seems, however, that the
current if converter will never actually predicate these. To do so, it would
need the ability to hoist a few setup insts. out of the conditionally-executed
block. For example, code like this:
void foo(int a, int (*bar)()) { if (a != 0) bar(); }
becomes:
...
beq 0, .LBB0_2
std 2, 40(1)
mr 12, 4
ld 3, 0(4)
ld 11, 16(4)
ld 2, 8(4)
mtctr 3
bctrl
ld 2, 40(1)
.LBB0_2:
...
and it would be safe to do all of this unconditionally with a predicated
beqctrl instruction.
llvm-svn: 179156
This enables us to form predicated branches (which are the same conditional
branches we had before) and also a larger set of predicated returns (including
instructions like bdnzlr which is a conditional return and loop-counter
decrement all in one).
At the moment, if conversion does not capture all possible opportunities. A
simple example is provided in early-ret2.ll, where if conversion forms one
predicated return, and then the PPCEarlyReturn pass picks up the other one. So,
at least for now, we'll keep both mechanisms.
llvm-svn: 179134
Some general cleanup and only scan the end of a BB for branches (once we're
done with the terminators and debug values, then there should not be any other
branches). These address post-commit review suggestions by Bill Schmidt.
No functionality change intended.
llvm-svn: 179112
On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.
llvm-svn: 179105
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time when we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.
llvm-svn: 179026
First, we should not cheat: fsel-based lowering of select_cc is a
finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes
this clear, as does a note in our own README).
This also adds fsel-based lowering of EQ and NE condition codes. As it turned
out, fsel generation was covered by a grand total of zero regression test
cases. I've added some test cases to cover the existing behavior (which is now
finite-math only), as well as the new EQ cases.
llvm-svn: 179000
There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).
llvm-svn: 178961
On cores for which we know the misprediction penalty, and we have
the isel instruction, we can profitably perform early if conversion.
This enables us to replace some small branch sequences with selects
and avoid the potential stalls from mispredicting the branches.
Enabling this feature required implementing canInsertSelect and
insertSelect in PPCInstrInfo; isel code in PPCISelLowering was
refactored to use these functions as well.
llvm-svn: 178926
The manual states that there is a minimum of 13 cycles from when the
mispredicted branch is issued to when the correct branch target is
issued.
llvm-svn: 178925
On certain architectures we can support efficient vectorized version of
instructions if the operand value is uniform (splat) or a constant scalar.
An example of this is a vector shift on x86.
We can efficiently support
for (i = 0 ; i < ; i += 4)
w[0:3] = v[0:3] << <2, 2, 2, 2>
but not
for (i = 0; i < ; i += 4)
w[0:3] = v[0:3] << x[0:3]
This patch adds a parameter to getArithmeticInstrCost to further qualify operand
values as uniform or uniform constant.
Targets can then choose to return a different cost for instructions with such
operand values.
A follow-up commit will test this feature on x86.
radar://13576547
llvm-svn: 178807
BCL is normally a conditional branch-and-link instruction, but has
an unconditional form (which is used in the SjLj code, for example).
To make clear that this BCL instruction definition is specifically
the special unconditional form (which does not meaningfully take
a condition-register input), rename it to BCLalways.
No functionality change intended.
llvm-svn: 178803
The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.
llvm-svn: 178801