Previously we modelled VPR128 and VPR64 as essentially identical
register-classes containing V0-V31 (which had Q0-Q31 as "sub_alias"
sub-registers). This model is starting to cause significant problems
for code generation, particularly writing EXTRACT/INSERT_SUBREG
patterns for converting between the two.
The change here switches to classifying VPR64 & VPR128 as
RegisterOperands, which are essentially aliases for RegisterClasses
with different parsing and printing behaviour. This fits almost
exactly with their real status (VPR128 == FPR128 printed strangely,
VPR64 == FPR64 printed strangely).
llvm-svn: 190665
When a structure is passed by value, and that structure contains a vector
member, according to the PPC ABI, the structure will receive enhanced alignment
(so that the vector within the structure will always be aligned).
This should resolve PR16641.
llvm-svn: 190636
In fast-math mode sqrt(x) is calculated using the fast expansion of the
reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal
sqrt expansions use the associated estimate instructions along with some Newton
iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN,
which is not correct. Now we explicitly return a result of zero if the input is
zero.
llvm-svn: 190624
Add basic assembly/disassembly support for the first Intel SHA
instruction 'sha1rnds4'. Also includes feature flag, and test cases.
Support for the remaining instructions will follow in a separate patch.
llvm-svn: 190611
Use the new instruction deprecation feature to mark mftb (now replaced with
mfspr) and dst (along with the other Altivec cache control instructions) as
deprecated when targeting cores supporting at least ISA v2.03.
llvm-svn: 190605
The 'Deprecated' class allows you to specify a SubtargetFeature that the
instruction is deprecated on.
The 'ComplexDeprecationPredicate' class allows you to define a custom
predicate that is called to check for deprecation.
For example:
ComplexDeprecationPredicate<"MCR">
would mean you would have to define the following function:
bool getMCRDeprecationInfo(MCInst &MI, MCSubtargetInfo &STI,
std::string &Info)
Which returns 'false' for not deprecated, and 'true' for deprecated
and store the warning message in 'Info'.
The MCTargetAsmParser constructor was chaned to take an extra argument of
the MCInstrInfo class, so out-of-tree targets will need to be changed.
llvm-svn: 190598
Aggressive anti-dependency breaking is enabled by default for all PPC cores.
This provides a general speedup on the P7 and other platforms (among other
factors, the instruction group formation for the non-embedded PPC cores is done
during post-RA scheduling). In order to do this safely, the incompatibility
between uses of the MFOCRF instruction and anti-dependency breaking are
resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed
FIXME, the problem was that MFOCRF's output is sensitive to the identify of the
source register, and always paired with a shift to undo this effect. Because
anti-dependency breaking is unaware of this hidden dependency of the shift
amount on the source register of the MFOCRF instruction, changing that register
must be inhibited.
Two test cases were adjusted: The SjLj test was made more insensitive to
register choices and scheduling; the saveCR test disabled anti-dependency
breaking because part of what it is testing is proper register reuse.
llvm-svn: 190587
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.
The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.
The maximum number of input SGPRs is bumped to 17.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575
This fixes some regressions in the piglit local memory store tests
introduced by recent commits which made the scheduler aware of the trans
slot.
It's not possible to test this using lit, because there is no way to
determine from the assembly dumps whether or not an instruction is in
the trans slot.
Even if this were possible, the test would be highly sensitive to
changes in the scheduler and might generate confusing false negatives.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 190574
As Andy pointed out to me a long time ago, there are no structural hazards in
the later pipeline stages of the A2, and so modeling them is useless. Also,
modeling the top pre-dispatch stages is deceiving because, when multiple
hardware threads are active, those resources are shared among the threads. The
bypass definitions were mostly wrong, and so those have been removed. The
resulting itinerary is much simpler, and more accurate.
llvm-svn: 190562
The PowerPC A2 core greatly benefits from aggressive concatenation unrolling;
use the new getUnrollingPreferences to enable this by default when targeting
the PPC A2 core.
llvm-svn: 190549
We were figuring out whether to use tPICADD or PICADD, then just using
tPICADD unconditionally anyway. Oops.
A testcase from someone familiar enough with ELF to produce one would
be appreciated. The existing PIC testcase correctly verifies the .s
generated, but that doesn't catch this bug, which only showed up in
direct-to-object mode.
http://llvm.org/bugs/show_bug.cgi?id=17180
llvm-svn: 190417
The main complication here is that TM and TMY (the memory forms) set
CC differently from the register forms. When the tested bits contain
some 0s and some 1s, the register forms set CC to 1 or 2 based on the
value the uppermost bit. The memory forms instead set CC to 1
regardless of the uppermost bit.
Until now, I've tried to make it so that a branch never tests for an
impossible CC value. E.g. NR only sets CC to 0 or 1, so branches on the
result will only test for 0 or 1. Originally I'd tried to do the same
thing for TM and TMY by using custom matching code in ISelDAGToDAG.
That ended up being very ugly though, and would have meant duplicating
some of the chain checks that the common isel code does.
I've therefore gone for the simpler alternative of adding an extra
operand to the TM DAG opcode to say whether a memory form would be OK.
This means that the inverse of a "TM;JE" is "TM;JNE" rather than the
more precise "TM;JNLE", just like the inverse of "TMLL;JE" is "TMLL;JNE".
I suppose that's arguably less confusing though...
llvm-svn: 190400
The work on this project was left in an unfinished and inconsistent state.
Hopefully someone will eventually get a chance to implement this feature, but
in the meantime, it is better to put things back the way the were. I have
left support in the bitcode reader to handle the case-range bitcode format,
so that we do not lose bitcode compatibility with the llvm 3.3 release.
This reverts the following commits: 155464, 156374, 156377, 156613, 156704,
156757, 156804 156808, 156985, 157046, 157112, 157183, 157315, 157384, 157575,
157576, 157586, 157612, 157810, 157814, 157815, 157880, 157881, 157882, 157884,
157887, 157901, 158979, 157987, 157989, 158986, 158997, 159076, 159101, 159100,
159200, 159201, 159207, 159527, 159532, 159540, 159583, 159618, 159658, 159659,
159660, 159661, 159703, 159704, 160076, 167356, 172025, 186736
llvm-svn: 190328
stores, make sure the load or store that accesses the higher half does not have
an alignment that is larger than the offset from the original address.
llvm-svn: 190318
Fix XCoreLowerThreadLocal trying to initialise globals
which have no initializer.
Add handling of const expressions containing thread local variables.
These need to be replaced with instructions, as the thread ID is
used to access the thread local variable.
llvm-svn: 190300
This sidesteps a bug in PrescheduleNodesWithMultipleUses() which
does not check if callResources will be affected by the transformation.
llvm-svn: 190299
We used to generate the compact unwind encoding from the machine
instructions. However, this had the problem that if the user used `-save-temps'
or compiled their hand-written `.s' file (with CFI directives), we wouldn't
generate the compact unwind encoding.
Move the algorithm that generates the compact unwind encoding into the
MCAsmBackend. This way we can generate the encoding whether the code is from a
`.ll' or `.s' file.
<rdar://problem/13623355>
llvm-svn: 190290
precision loads and stores as well as reg+imm double precision loads and stores.
Previously, expansion of loads and stores was done after register allocation,
but now it takes place during legalization. As a result, users will see double
precision stores and loads being emitted to spill and restore 64-bit FP registers.
llvm-svn: 190235
The architecture has many comparison instructions, including some that
extend one of the operands. The signed comparison instructions use sign
extensions and the unsigned comparison instructions use zero extensions.
In cases where we had a free choice between signed or unsigned comparisons,
we were trying to decide at lowering time which would best fit the available
instructions, taking things like extension type into account. The code
to do that was getting increasingly hairy and was also making some bad
decisions. E.g. when comparing the result of two LLCs, it is better to use
CR rather than CLR, since CR can be fused with a branch while CLR can't.
This patch removes the lowering code and instead adds an operand to
integer comparisons to say whether signed comparison is required,
whether unsigned comparison is required, or whether either is OK.
We can then leave the choice of instruction up to the normal isel code.
llvm-svn: 190138
If the DAG already has only legal types, then the second round of DAG combines
is skipped. In this case VSELECT+SETCC patterns that match a more efficient
instruction (e.g. min/max) are never recognized.
This fix allows VSELECT+SETCC combines if the types are already legal before DAG
type legalization.
Reviewer: Nadav
llvm-svn: 190105
expression uses an assembler temporary symbol from an assignment. In this case
the symbol does not have a fragment so the use of getFragment() would be NULL
and caused a crash. In the case of an assembler temporary symbol we want to use
the AliasedSymbol (if any) which will create a local relocation entry, but if
it is not an assembler temporary symbol then let it use that symbol with an
external relocation entry.
rdar://9356266
llvm-svn: 190096
This pass was segfaulting when it ran into a non-intrinsic function
call. Function calls are not supported, so now instead of segfaulting,
we will get an assertion failure with a nice error message.
I'm not sure how to test this using lit.
llvm-svn: 190076
These were pretty straightforward instructions, with some assembly support
required for HLT.
The ARM assembler is keen to split the instruction mnemonic into a
(non-existent) 'H' instruction with the LT condition code. An exception for
HLT is needed.
HLT follows the same rules as BKPT when in IT blocks, so the special BKPT
hadling code has been adapted to handle HLT also.
Regression tests added including diagnostic tests for out of range immediates
and illegal condition codes, as well as negative tests for pre-ARMv8.
llvm-svn: 190053
Solution is not sufficient to prevent 'mov pc, lr' being emitted for jump table code.
Test case doesn't trigger the added functionality.
llvm-svn: 190047
This improves code generation for jump tables by avoiding the emission of "mov pc, lr" which could fool the processor into believing this is a return from a function causing mispredicts. The code generation logic for jump tables uses ADR to materialize the address of the jump target.
Patch by Daniel Stewart!
llvm-svn: 190043
In sparc, setjmp stores only the registers %fp, %sp, %i7 and %o7. longjmp restores
the stack, and the callee-saved registers (all local/in registers: %i0-%i7, %l0-%l7)
using the stored %fp and register windows. However, this does not guarantee that the longjmp
will restore the registers, as they were when the setjmp was called. This is because these
registers may be clobbered after returning from setjmp, but before calling longjmp.
This patch prevents the registers %i0-%i5, %l0-l7 to live across the setjmp call using the register mask.
llvm-svn: 190033
These instructions, such as vmul.f32, require the second source operand to
be in D0-D15 rather than the full D0-D31. When optimizing, make sure to
account for that by constraining the register class of a replacement virtual
register to be compatible with the virtual register(s) it's replacing.
I've been unsuccessful in creating a non-fragile regression test. This issue
was detected by the LLVM nightly test suite running on an A15 (Bullet).
PR17093: http://llvm.org/bugs/show_bug.cgi?id=17093
llvm-svn: 189972
Iterator of std::vector may be implemented as a raw pointer. In
this case begin iterators are rvalues and cannot be incremented.
For example, this is the case with STDCXX implementation of vector.
Patch by Konstantin Tokarev <annulen@yandex.ru>.
llvm-svn: 189911
Previously, the clang crash handling code would kick in and give a crash
report for these, even though they're not that sort of error.
rdar://14882264
llvm-svn: 189878
This reverts commit r189648.
Fixes for the previously failing clang-side arm_neon_intrinsics test
cases will be checked in separately.
llvm-svn: 189841
For now this just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.
llvm-svn: 189819
What we really want is to enable Swift by default for *v7s triples (and there already seems to be some logic which attempts to do that). In that case the iOS version doesn't matter.
llvm-svn: 189763
don't exist in libc. This is really not the right way to solve this problem;
but it's not clear to me at this time exactly what is the right way.
If we create stubs here, they will cause link errors because these functions
do not exist in libc.
llvm-svn: 189727
Here are a few miscellaneous things to tidy up the PPC64 fast-isel
implementation. I corrected a couple of commentary lapses, and added
documentation of future opportunities. I also implemented
TargetMaterializeAlloca, which I somehow forgot when I split up the
original huge patch.
Finally, I decided to delete SelectCmp. I hadn't previously hooked it
in to TargetSelectInstruction(), and when I did I realized it wasn't
serving any useful purpose. This is only useful for compares that
don't feed a branch in the same block, and to handle that we would
have to have logic to interpret i1 as a condition register. This
could probably be done, but would require Unseemly Hackery, and
honestly does not seem worth the hassle.
This ends the current patch series.
llvm-svn: 189715
This is the last substantive patch I'm planning for fast-isel in the
near future, adding fast selection of integer truncates. There are
certainly more things that can be improved (many of which are called
out in FIXMEs), but for now we are catching most of the important
cases.
I'll document some of the remaining work in a cleanup patch shortly.
llvm-svn: 189706
This patch adds fast-isel support for calls (but not intrinsic calls
or varargs calls). It also removes a badly-formed assert. There are
some new tests just for calls, and also for folding loads into
arguments on calls to avoid extra extends.
llvm-svn: 189701
has hard float, when you compile the mips32 code you have to make sure
that it knows to compile any mips32 routines as hard float. I need to clean
up the way mips16 hard float is specified but I need to first think through
all the details. Mips16 always has a form of soft float, the difference being
whether the underlying hardware has floating point. So it's not really
necessary to pass the -soft-float to llvm since soft-float is always true
for mips16 by virtue of the fact that it will not register floating point
registers. By using this fact, I can simplify the way this is all handled.
llvm-svn: 189690
Yet another chunk of fast-isel code. This one handles various
conversions involving floating-point. (It also includes some
miscellaneous handling throughout the back end for LWA_32 and LWAX_32
that should have been part of the load-store patch.)
llvm-svn: 189677
Created SUPressureDiffs array to hold the per node PDiff computed during DAG building.
Added a getUpwardPressureDelta API that will soon replace the old
one. Compute PressureDelta here from the precomputed PressureDiffs.
Updating for liveness will come next.
llvm-svn: 189640
This is the next big chunk of fast-isel code. The primary purpose is
to implement selection of loads and stores, but there is a lot of
drag-along to support this. The common code to analyze addresses for
both loads and stores is substantial. It's also necessary to add the
materialization code for global values.
Related to load-store processing is the code to fold loads into
integer extends, since otherwise we generate lots of redundant
instructions. We also need to add some overrides to some FastEmit
routines to ensure we don't assign GPR 0 to a virtual register when
this would change the meaning of an instruction.
I added handling selection of a few binary arithmetic instructions, to
enable committing some test cases I wrote a while back.
Finally, ap couple of miscellaneous changes:
* I cleaned up some poor style from a previous patch in
PPCISelLowering.cpp, pointed out by David Blaikie.
* I enlarged the Addr.Offset field to avoid sign problems with 32-bit
offsets.
llvm-svn: 189636
In addition to recognizing when the multiply's second argument is
coming from an explicit VDUPLANE, also look for a plain scalar
f32 reference and reference it via the corresponding vector
lane.
rdar://14870054
llvm-svn: 189619
There are several optional (off-by-default) features in CodeGen that can make
use of alias analysis. These features are important for generating code for
some kinds of cores (for example the (in-order) PPC A2 core). This adds a
useAA() function to TargetSubtargetInfo to allow these features to be enabled
by default on a per-subtarget basis.
Here is the first use of this function: To control the default of the
-enable-aa-sched-mi feature.
llvm-svn: 189563
Fix a few things in one swoop.
# Add some negative tests.
# Fix some formatting issues.
# Add some missing IsThumb / ARMv8
# Fix some outs / ins mistakes.
llvm-svn: 189490
The usual default of "dmb ish" (inner-shareable) isn't even a valid instruction
on v6M or v7M (well, it does the same thing but software is strongly
discouraged from using it) so we should emit a full-system barrier there.
llvm-svn: 189483