PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to
calculate the base cost of the memory access, and then adjust on top of that.
There is no functionality change from this modification, but it will become
important so that PPCTTI can take advantage of scalarization information for which
BasicTTI::getMemoryOpCost will account in the near future.
llvm-svn: 205476
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.
llvm-svn: 205453
If we have two unique values for a v2i64 build vector, this will always result
in two vector loads if we expand using shuffles. Only one is necessary.
llvm-svn: 205231
sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
converting two and from v2i64. The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step. This is because
of the big Endian nature of the system, we need the i32 portion in the high
word of the i64 elements.
For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.
llvm-svn: 205146
v2i64 is a legal type under VSX, however we don't have native vector
comparisons. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.
llvm-svn: 205106
The vector divide and sqrt instructions have high latencies, and the scalar
comparisons are like all of the others. On the P7, permutations take an extra
cycle over purely-simple vector ops.
llvm-svn: 205096
I started trying to fix a small issue, but this code has seen a small fix too
many.
The old code was fairly convoluted. Some of the issues it had:
* It failed to check if a symbol difference was in the some section when
converting a relocation to pcrel.
* It failed to check if the relocation was already pcrel.
* The pcrel value computation was wrong in some cases (relocation-pc.s)
* It was missing quiet a few cases where it should not convert symbol
relocations to section relocations, leaving the backends to patch it up.
* It would not propagate the fact that it had changed a relocation to pcrel,
requiring a quiet nasty work around in ARM.
* It was missing comments.
llvm-svn: 205076
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
through we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.
llvm-svn: 205075
v2[fi]64 values need to be explicitly passed in VSX registers. This is because
the code in TRI that finds the minimal register class given a register and a
value type will assert if given an Altivec register and a non-Altivec type.
llvm-svn: 205041
As explained in r204976, because of how the allocation of VSX registers
interacts with the call-lowering code, we sometimes end up generating self VSX
copies. Specifically, things like this:
%VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)
This adds a small cleanup pass to remove these prior to post-RA scheduling.
llvm-svn: 204980
Because of how the allocation of VSX registers interacts with the call-lowering
code, we sometimes end up generating self VSX copies. Specifically, things like
this:
%VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)
The problem is that ExpandPostRAPseudos always assumes that *some* instruction
has been inserted, and adds implicit defs to it. This is a problem if no copy
was inserted because it can cause subtle problems during post-RA scheduling.
These self copies will have to be removed some other way.
llvm-svn: 204976
First, v2f64 vector extract had not been declared legal (and so the existing
patterns were not being used). Second, the patterns for that, and for
scalar_to_vector, should really be a regclass copy, not a subregister
operation, because the VSX registers directly hold both the vector and scalar data.
llvm-svn: 204971
These operations need to be expanded during legalization so that isel does not
crash. In theory, we might be able to custom lower some of these. That,
however, would need to be follow-up work.
llvm-svn: 204963
This adds back r204781.
Original message:
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
llvm-svn: 204934
These patterns are dead (because v4f32 stores are currently promoted to v4i32
and stored using Altivec instructions), and also are likely not correct
(because they'd store the vector elements in the opposite order from that
assumed by the rest of the Altivec code).
llvm-svn: 204839
These instructions have access to the complete VSX register file. In addition,
they "swap" the order of the elements so that element 0 (the scalar part) comes
first in memory and element 1 follows at a higher address.
llvm-svn: 204838
v2i64 needs to be a legal VSX type because it is the SetCC result type from
v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations.
This fixes the lowering for v2f64 VSELECT.
llvm-svn: 204828
With VSX there is a real vector select instruction, and so we should use it.
Note that VSELECT will still scalarize for v2f64 because the corresponding
SetCC result type (v2i64) is not currently a legal type.
llvm-svn: 204801
This reverts commit r204781.
I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.
llvm-svn: 204784
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
llvm-svn: 204781
The VSX instruction set has two types of FMA instructions: A-type (where the
addend is taken from the output register) and M-type (where one of the product
operands is taken from the output register). This adds a small pass that runs
just after MI scheduling (and, thus, just before register allocation) that
mutates A-type instructions (that are created during isel) into M-type
instructions when:
1. This will eliminate an otherwise-necessary copy of the addend
2. One of the product operands is killed by the instruction
The "right" moment to make this decision is in between scheduling and register
allocation, because only there do we know whether or not one of the product
operands is killed by any particular instruction. Unfortunately, this also
makes the implementation somewhat complicated, because the MIs are not in SSA
form and we need to preserve the LiveIntervals analysis.
As a simple example, if we have:
%vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
%RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
...
%vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19,
%RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
...
We can eliminate the copy by changing from the A-type to the
M-type instruction. This means:
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
%RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
is replaced by:
%vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9,
%RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9
and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
llvm-svn: 204768
Although the first two operands are the ones that can be swapped, the tied
input operand is listed before them, so we need to adjust for that.
I have a test case for this, but it goes along with an upcoming commit (so it
will come soon).
llvm-svn: 204748
TableGen will create a lookup table for the A-type FMA instructions providing
their corresponding M-form opcodes. This will be used by upcoming commits.
llvm-svn: 204746
As a first step towards real little-endian code generation, this patch
changes the PowerPC MC layer to actually generate little-endian object
files. This involves passing the little-endian flag through the various
layers, including down to createELFObjectWriter so we actually get basic
little-endian ELF objects, emitting instructions in little-endian order,
and handling fixups and relocations as appropriate for little-endian.
The bulk of the patch is to update most test cases in test/MC/PowerPC
to verify both big- and little-endian encodings. (The only test cases
*not* updated are those that create actual big-endian ABI code, like
the TLS tests.)
Note that while the object files are now little-endian, the generated
code itself is not yet updated, in particular, it still does not adhere
to the ELFv2 ABI.
llvm-svn: 204634
[PPC64LE] ELFv2 ABI updates for the .opd section
The PPC64 Little Endian (PPC64LE) target supports the ELFv2 ABI, and as
such, does not have a ".opd" section. This is keyed off a _CALL_ELF=2
macro check.
The CALL_ELF check is not clearly documented at this time. The basis
for usage in this patch is from the gcc thread here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01144.html
> Adding comment from Uli:
Looks good to me. I think the old-style JIT doesn't really work
anyway for 64-bit, but at least with this patch LLVM will compile
and link again on a ppc64le host ...
llvm-svn: 204614
I'm under the impression that we used to infer the isCommutable flag from the
instruction-associated pattern. Regardless, we don't seem to do this (at least
by default) any more. I've gone through all of our instruction definitions, and
marked as commutative all of those that should be trivial to commute (by
exchanging the first two operands). There has been special code for the RL*
instructions, and that's not changed.
Before this change, we had the following commutative instructions:
RLDIMI
RLDIMIo
RLWIMI
RLWIMI8
RLWIMI8o
RLWIMIo
XSADDDP
XSMULDP
XVADDDP
XVADDSP
XVMULDP
XVMULSP
After:
ADD4
ADD4o
ADD8
ADD8o
ADDC
ADDC8
ADDC8o
ADDCo
ADDE
ADDE8
ADDE8o
ADDEo
AND
AND8
AND8o
ANDo
CRAND
CREQV
CRNAND
CRNOR
CROR
CRXOR
EQV
EQV8
EQV8o
EQVo
FADD
FADDS
FADDSo
FADDo
FMADD
FMADDS
FMADDSo
FMADDo
FMSUB
FMSUBS
FMSUBSo
FMSUBo
FMUL
FMULS
FMULSo
FMULo
FNMADD
FNMADDS
FNMADDSo
FNMADDo
FNMSUB
FNMSUBS
FNMSUBSo
FNMSUBo
MULHD
MULHDU
MULHDUo
MULHDo
MULHW
MULHWU
MULHWUo
MULHWo
MULLD
MULLDo
MULLW
MULLWo
NAND
NAND8
NAND8o
NANDo
NOR
NOR8
NOR8o
NORo
OR
OR8
OR8o
ORo
RLDIMI
RLDIMIo
RLWIMI
RLWIMI8
RLWIMI8o
RLWIMIo
VADDCUW
VADDFP
VADDSBS
VADDSHS
VADDSWS
VADDUBM
VADDUBS
VADDUHM
VADDUHS
VADDUWM
VADDUWS
VAND
VAVGSB
VAVGSH
VAVGSW
VAVGUB
VAVGUH
VAVGUW
VMADDFP
VMAXFP
VMAXSB
VMAXSH
VMAXSW
VMAXUB
VMAXUH
VMAXUW
VMHADDSHS
VMHRADDSHS
VMINFP
VMINSB
VMINSH
VMINSW
VMINUB
VMINUH
VMINUW
VMLADDUHM
VMULESB
VMULESH
VMULEUB
VMULEUH
VMULOSB
VMULOSH
VMULOUB
VMULOUH
VNMSUBFP
VOR
VXOR
XOR
XOR8
XOR8o
XORo
XSADDDP
XSMADDADP
XSMAXDP
XSMINDP
XSMSUBADP
XSMULDP
XSNMADDADP
XSNMSUBADP
XVADDDP
XVADDSP
XVMADDADP
XVMADDASP
XVMAXDP
XVMAXSP
XVMINDP
XVMINSP
XVMSUBADP
XVMSUBASP
XVMULDP
XVMULSP
XVNMADDADP
XVNMADDASP
XVNMSUBADP
XVNMSUBASP
XXLAND
XXLNOR
XXLOR
XXLXOR
This is a by-inspection change, and I'm not sure how to write a reliable test
case. I would like advice on this, however.
llvm-svn: 204609
I've done some experimentation with this, and it looks like using the
lower-latency (but lower throughput) copy instruction is essentially always the
right thing to do.
My assumption is that, in order to be relatively sure that the higher-latency
copy will increase throughput, we'd want to have it unlikely to be in-flight
with its use. On the P7, the global completion table (GCT) can hold a maximum
of 120 instructions, shared among all active threads (up to 4), giving 30
instructions per thread. So specifically, I'd require at least that many
instructions between the copy and the use before the high-latency variant is
used.
Trying this, however, over the entire test suite resulted in zero cases where
the high-latency form would be preferable. This may be a consequence of the
fact that the scheduler views copies as free, and so they tend to end up close
to their uses. For this experiment I created a function:
unsigned chooseVSXCopy(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,
unsigned DestReg, unsigned SrcReg,
unsigned StartDist = 1,
unsigned Depth = 3) const;
with an implementation like:
if (!Depth)
return PPC::XXLOR;
const unsigned MaxDist = 30;
unsigned Dist = StartDist;
for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) {
if (J->isTransient() && !J->isCopy())
continue;
if (J->isCall() || J->isReturn() || J->readsRegister(DestReg, TRI))
return PPC::XXLOR;
++Dist;
}
// We've exceeded the required distance for the high-latency form, use it.
if (Dist > MaxDist)
return PPC::XVCPSGNDP;
// If this is only an exit block, use the low-latency form.
if (MBB.succ_empty())
return PPC::XXLOR;
// We've reached the end of the block, check the successor blocks (up to some
// depth), and use the high-latency form if that is okay with all successors.
for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) {
if (chooseVSXCopy(**J, (*J)->begin(), DestReg, SrcReg,
Dist, --Depth) == PPC::XXLOR)
return PPC::XXLOR;
}
// All of our successor blocks seem okay with the high-latency variant, so
// we'll use it.
return PPC::XVCPSGNDP;
and then changed the copy opcode selection from:
Opc = PPC::XXLOR;
to:
Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg);
In conclusion, I'm removing the FIXME from the comment, because I believe that
there is, at least absent other examples, nothing to fix.
llvm-svn: 204591
When VSX is available, these instructions should be used in preference to the
older variants that only have access to the scalar floating-point registers.
llvm-svn: 204559
Given
bar = foo + 4
.long bar
MC would eat the 4. GNU as includes it in the relocation. The rule seems to be
that a variable that defines a symbol is used in the relocation and one that
does not define a symbol is evaluated and the result included in the relocation.
Fixing this unfortunately required some other changes:
* Since the variable is now evaluated, it would prevent the ELF writer from
noticing the weakref marker the elf streamer uses. This patch then replaces
that with a VariantKind in MCSymbolRefExpr.
* Using VariantKind then requires us to look past other VariantKind to see
.weakref bar,foo
call bar@PLT
doing this also fixes
zed = foo +2
call zed@PLT
so that is a good thing.
* Looking past VariantKind means that the relocation selection has to use
the fixup instead of the target.
This is a reboot of the previous fixes for MC. I will watch the sanitizer
buildbot and wait for a build before adding back the previous fixes.
llvm-svn: 204294
When converting a signed 32-bit integer to double-precision floating point on
hardware without a lfiwax instruction, we have to instead use a lfd followed
by fcfid. We were erroneously offsetting the address by 4 bytes in
preparation for either a lfiwax or lfiwzx when generating the lfd. This fixes
that silly error.
This was not caught in the test suite since the conversion tests were run with
-mcpu=pwr7, which implies availability of lfiwax. I've added another test
case for older hardware that checks the code we expect in the absence of
lfiwax and other flavors of fcfid. There are fewer tests in this test case
because we punt to DAG selection in more cases on older hardware. (We must
generate complex fiddly sequences in those cases, and there is marginal
benefit in duplicating that logic in fast-isel.)
llvm-svn: 204155
Commit r181723 introduced code to avoid placing initialized variables
needing relocations into the .rodata section, which avoid copy relocs
that do not work as expected on ppc64 function references.
The same treatment is also needed for *named* .rodata.XXX sections.
This patch changes PPC64LinuxTargetObjectFile::SelectSectionForGlobal
to modify "Kind" *before* calling the default SelectSectionForGlobal
routine, instead of first calling the default routine and then just
checking for the (main) .rodata section afterwards.
llvm-svn: 203921
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&. At this point they almost behave like normal iterators!
Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.
llvm-svn: 203865
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.
The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).
Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that. The assembler and disassembler
are fully implemented and tested. However:
- CodeGen support causes miscompiles; test-suite runtime failures:
MultiSource/Benchmarks/FreeBench/distray/distray
MultiSource/Benchmarks/McCat/08-main/main
MultiSource/Benchmarks/Olden/voronoi/voronoi
MultiSource/Benchmarks/mafft/pairlocalalign
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
SingleSource/Benchmarks/CoyoteBench/almabench
SingleSource/Benchmarks/Misc/matmul_f64_4x4
- The lowering currently falls back to using Altivec instructions far more
than it should. Worse, there are some things that are scalarized through the
stack that shouldn't be.
- A lot of unnecessary copies make it past the optimizers, and this needs to
be fixed.
- Many more regression tests are needed.
Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.
llvm-svn: 203768
There are currently two schemes for mapping instruction operands to
instruction-format variables for generating the instruction encoders and
decoders for the assembler and disassembler respectively: a) to map by name and
b) to map by position.
In the long run, we'd like to remove the position-based scheme and use only
name-based mapping. Unfortunately, the name-based scheme currently cannot deal
with complex operands (those with suboperands), and so we currently must use
the position-based scheme for those. On the other hand, the position-based
scheme cannot deal with (register) variables that are split into multiple
ranges. An upcoming commit to the PowerPC backend (adding VSX support) will
require this capability. While we could teach the position-based scheme to
handle that, since we'd like to move away from the position-based mapping
generally, it seems silly to teach it new tricks now. What makes more sense is
to allow for partial transitioning: use the name-based mapping when possible,
and only use the position-based scheme when necessary.
Now the problem is that mixing the two sensibly was not possible: the
position-based mapping would map based on position, but would not skip those
variables that were mapped by name. Instead, the two sets of assignments would
overlap. However, I cannot currently change the current behavior, because there
are some backends that rely on it [I think mistakenly, but I'll send a message
to llvmdev about that]. So I've added a new TableGen bit variable:
noNamedPositionallyEncodedOperands, that can be used to cause the
position-based mapping to skip variables mapped by name.
llvm-svn: 203767
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.
This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.
llvm-svn: 203437
The integrated assembler now works for ppc. Since this was the last use of the
bg/p predicate and Hal says that it is now dead, drop the predicate too.
llvm-svn: 203269
Summary:
llvm/MC/MCSectionMachO.h and llvm/Support/MachO.h both had the same
definitions for the section flags. Instead, grab the definitions out of
support.
No functionality change.
Reviewers: grosbach, Bigcheese, rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2998
llvm-svn: 203211
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.
The net result is that we don't create temporary labels that are never used.
llvm-svn: 203204
The global base register cannot be r0 because it might end up as the first
argument to addi or addis. Fixes PR18316.
I don't have a small stable test case.
llvm-svn: 203054
When copying an i1 value into a GPR for a vaarg call, we need to explicitly
zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be
generated).
llvm-svn: 203041
On cores without fpcvt support, we cannot promote int_to_fp i1 operations,
because there is nothing to promote them to. The most straightforward
implementation of this uses a select to choose between the two possible
resulting floating-point values (and that's what is done here).
llvm-svn: 203015
Move the test for this class into the IR unittests as well.
This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.
llvm-svn: 202821
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.
Another step of modularizing the support library.
llvm-svn: 202815
Now that the PowerPC backend can track individual CR bits as first-class
registers, we should also have a way of allocating them for inline asm
statements. Because these registers are only one bit, if an output variable is
implicitly cast to a larger integer size, we'll get an any_extend to that
larger type (this is part of the existing target-independent logic). As a
result, regardless of the size of the output type, only the first bit is
meaningful.
The constraint identifier "wc" has been chosen for this purpose. Although gcc
does not currently support allocating individual CR bits, this identifier
choice has been coordinated with the gcc PowerPC team, and will be marked as
reserved for this purpose in the gcc constraints.md file.
llvm-svn: 202657
This generalizes the code to eliminate extra truncs/exts around i1 bit
operations to also do the same on PPC64 for i32 bit operations. This eliminates
a fairly prevalent code wart:
int foo(int a) {
return a == 5 ? 7 : 8;
}
On PPC64, because of the extension implied by the ABI, this would generate:
cmplwi 0, 3, 5
li 12, 8
li 4, 7
isel 3, 4, 12, 2
rldicl 3, 3, 0, 32
blr
where the 'rldicl 3, 3, 0, 32', the extension, is completely unnecessary. At
least for the single-BB case (which is all that the DAG combine mechanism can
handle), this unnecessary extension is no longer generated.
llvm-svn: 202600
The PPC isel instruction can fold 0 into the first operand (thus eliminating
the need to materialize a zero-containing register when the 'true' result of
the isel is 0). When the isel is fed by a bit register operation that we can
invert, do so as part of the bit-register-operation peephole routine.
llvm-svn: 202469
The CR bit tracking code broke PPC/Darwin; trying to get it working again...
(the darwin11 builder, which defaults to the darwin ABI when running PPC tests,
asserted when running test/CodeGen/PowerPC/inverted-bool-compares.ll)
llvm-svn: 202459
This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as separate
registers; individual bit operations (and, or, xor, etc.) are supported.
Tracking booleans in CR bits has several advantages:
- Reduction in register pressure (because we no longer need GPRs to store
boolean values).
- Logical operations on booleans can be handled more efficiently; we used to
have to move all results from comparisons into GPRs, perform promoted
logical operations in GPRs, and then move the result back into condition
register bits to be used by conditional branches. This can be very
inefficient, because the throughput of these CR <-> GPR moves have high
latency and low throughput (especially when other associated instructions
are accounted for).
- On the POWER7 and similar cores, we can increase total throughput by using
the CR bits. CR bit operations have a dedicated functional unit.
Most of this is more-or-less mechanical: Adjustments were needed in the
calling-convention code, support was added for spilling/restoring individual
condition-register bits, and conditional branch instruction definitions taking
specific CR bits were added (plus patterns and code for generating bit-level
operations).
This is enabled by default when running at -O2 and higher. For -O0 and -O1,
where the ability to debug is more important, this feature is disabled by
default. Individual CR bits do not have assigned DWARF register numbers,
and storing values in CR bits makes them invisible to the debugger.
It is critical, however, that we don't move i1 values that have been promoted
to larger values (such as those passed as function arguments) into bit
registers only to quickly turn around and move the values back into GPRs (such
as happens when values are returned by functions). A pair of target-specific
DAG combines are added to remove the trunc/extends in:
trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)
and:
zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)
In short, we only want to use CR bits where some of the i1 values come from
comparisons or are used by conditional branches or selects. To put it another
way, if we can do the entire i1 computation in GPRs, then we probably should
(on the POWER7, the GPR-operation throughput is higher, and for all cores, the
CR <-> GPR moves are expensive).
POWER7 test-suite performance results (from 10 runs in each configuration):
SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup
SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown
llvm-svn: 202451
We need to abort the formation of counter-register-based loops where there are
128-bit integer operations that might become function calls.
llvm-svn: 202192
TargetLoweringBase is implemented in CodeGen, so before this patch we had
a dependency fom Target to CodeGen. This would show up as a link failure of
llvm-stress when building with -DBUILD_SHARED_LIBS=ON.
This fixes pr18900.
llvm-svn: 201711
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.
They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.
llvm-svn: 201700
The IR
@foo = private constant i32 42
is valid, but before this patch we would produce an invalid MachO from it. It
was invalid because it would use an L label in a section where the liker needs
the labels in order to atomize it.
One way of fixing it would be to just reject this IR in the backend, but that
would not be very front end friendly.
What this patch does is use an 'l' prefix in sections that we know the linker
requires symbols for atomizing them. This allows frontends to just use
private and not worry about which sections they go to or how the linker handles
them.
One small issue with this strategy is that now a symbol name depends on the
section, which is not available before codegen. This is not a problem in
practice. The reason is that it only happens with private linkage, which will
be ignored by the non codegen users (llvm-nm and llvm-ar).
llvm-svn: 201608
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.
Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
(fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
(should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
(should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
to be enabled regardless of default setting or -no-integrated-as.
(should fix SystemZ buildbots)
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201333
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201237
As part of the cleanup done to enable the disassembler, the PPC instructions
now have a valid Size description field. This can now be used to replace some
custom logic in a few places to compute instruction sizes.
Patch by David Wiberg!
llvm-svn: 200623
The subtarget info is explicitly passed to the EncodeInstruction
method and we should use that subtarget info to influence any
encoding decisions.
llvm-svn: 200350
GPRC_NOR0 is not a subclass of GPRC (because it also contains the ZERO pseudo
register). As a result, we also need to check for it in the spilling code.
llvm-svn: 200288
code to see if we're emitting a function into a non-default
text section. This is still a less-than-ideal solution, but more
contained than r199871 to determine whether or not we're emitting
code into an array of comdat sections.
llvm-svn: 200269
This has a few advantages:
* Only targets that use a MCTargetStreamer have to worry about it.
* There is never a MCTargetStreamer without a MCStreamer, so we can use a
reference.
* A MCTargetStreamer can talk to the MCStreamer in its constructor.
llvm-svn: 200129
e.g. linkonce, to TargetMachine and set it when we've done so
for ELF targets currently. This involved making TargetMachine
non-const in a TLOF use and propagating that change around - I'm
open to other ideas.
This will be used in a future commit to handle emitting debug
information with ranges.
llvm-svn: 199871
My understanding (from reading just the llvm code) is that
* most ppc cpus have a "sync n" instruction and an msync alias that is "sync 0".
* "book e" cpus instead have a msync instruction and not the more
general "sync n"
This patch reflects that in the .td files, allowing a single codepath for
asm ond obj streamer and incidentelly fixes a crash when EmitRawText was
called on a obj streamer.
llvm-svn: 199832
For PPC64 SVR (and Darwin), the stores that take byval aggregate parameters
from registers into the stack frame had MachinePointerInfo objects with
incorrect offsets. These offsets are relative to the object itself, not to the
stack frame base.
This fixes self hosting on PPC64 when compiling with -enable-aa-sched-mi.
llvm-svn: 199763
This will allow it to be called from target independent parts of the main
streamer that don't know if there is a registered target streamer or not. This
in turn will allow targets to perform extra actions at specified points in the
interface: add extra flags for some labels, extra work during finalization, etc.
llvm-svn: 199174
can be used by both the new pass manager and the old.
This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.
The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.
Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.
llvm-svn: 199104
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.
Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.
But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.
llvm-svn: 199082
The target specific parser should return `false' if the target AsmParser handles
the directive, and `true' if the generic parser should handle the directive.
Many of the target specific directive handlers would `return Error' which does
not follow these semantics. This change simply changes the target specific
routines to conform to the semantis of the ParseDirective correctly.
Conformance to the semantics improves diagnostics emitted for the invalid
directives. X86 is taken as a sample to ensure that multiple diagnostics are
not presented for a single error.
llvm-svn: 199068
operand into the Value interface just like the core print method is.
That gives a more conistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.
This removes the 'Writer.h' header which contained only a single function
declaration.
llvm-svn: 198836
are part of the core IR library in order to support dumping and other
basic functionality.
Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left their -- printing has been
in the core IR library for quite some time.
Update all of the #includes to match.
All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.
llvm-svn: 198688
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.
Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.
llvm-svn: 198685
This moves the check up into the parent class so that all targets can use it
without having to copy (and keep in sync) the same error message.
llvm-svn: 198579
__builtin_returnaddress requires that the value passed into is be a constant.
However, at -O0 even a constant expression may not be converted to a constant.
Emit an error message intead of crashing.
llvm-svn: 198531
Before this patch any program that wanted to know the final symbol name of a
GlobalValue had to link with Target.
This patch implements a compromise solution where the mangler uses DataLayout.
This way, any tool that already links with Target (llc, clang) gets the exact
behavior as before and new IR files can be mangled without linking with Target.
With this patch the mangler is constructed with just a DataLayout and DataLayout
is extended to include the information the Mangler needs.
llvm-svn: 198438
CR logicals (crand, crxor, etc.) on the P7 need to be in the first slot of each
dispatch group. The old itinerary entry was just wrong (but has not mattered
because we don't generate these instructions).
This will matter when, in an upcoming commit, we start generating these
instructions.
llvm-svn: 198359
Several of the 64-bit fixed-point instructions with immediate operands were
using the 32-bit (i32) operand nodes instead of the corresponding 64-bit (i64)
operand definitions (u16imm instead of u16imm64, for example).
This error has had no effect so far, but would have caused type-checking
violations with an upcoming change.
llvm-svn: 198356
The tests for the disassembler were adapted from the encoder tests, and for the
most part, the output from the disassembler matches that encoder-test inputs.
There are some places where more-informative mnemonics could be produced
(notably for the branch instructions), and those cases are noted in the tests
with FIXMEs.
Future work includes:
- Generating more-informative mnemonics when possible (this may also be done
in the printer).
- Remove the dependence on positional "numbered" operand-to-variable mapping
(for both encoding and decoding).
- Internally using 64-bit instruction variants in 64-bit mode (if this turns
out to matter).
llvm-svn: 197693
This patch adds -f64:32:64 to 32 bit ppc darwin since a f64 inside a
structure are only 32 bit aligned.
The patch also drop -f128:64:128 from all ppc darwin, since f128 is
128 bit aligned.
llvm-svn: 197574
The instruction definitions in the PPC backend have a number of variants
defined for the same instruction to represent differences between 64-bit and
32-bit semantics. In order to generate a disassembler for the PPC backend, we
need to mark all but one of these as CodeGen only.
No functionality change intended; this is prep work for PPC disassembly
support.
llvm-svn: 197535
Without this, MachineCSE is powerless to handle redundant operations with truncated source operands.
This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled:
%vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
%vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
%vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>
Test case: cse-add-with-overflow.ll.
This exposed an existing bug in
PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case:
PowerPC/crash.ll.
llvm-svn: 197465
This is a base implementation of the powerpc-apple-darwin asm parser dialect.
* Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions.
* Enables parsing of {r,f,v}XX as register identifiers.
* Enables parsing of lo16() hi16() and ha16() as modifiers.
The changes to the test case are from David Fang (fangism).
llvm-svn: 197324
Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.
significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):
speedups:
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
-0.55171% +/- 0.333168%
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
-17.5576% +/- 14.598%
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
-29.5708% +/- 7.09058%
MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
-34.9471% +/- 11.4391%
SingleSource/Benchmarks/BenchmarkGame/puzzle
-25.1347% +/- 11.0104%
SingleSource/Benchmarks/Misc/flops-8
-17.7297% +/- 9.79061%
SingleSource/Benchmarks/Shootout-C++/ary3
-35.5018% +/- 23.9458%
SingleSource/Regression/C/uint64_to_float
-56.3165% +/- 25.4234%
SingleSource/UnitTests/Vectorizer/gcc-loops
-18.5309% +/- 6.8496%
regressions:
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
18.351% +/- 12.156%
SingleSource/Benchmarks/Shootout-C++/methcall
27.3086% +/- 14.4733%
llvm-svn: 197099
For one predicate to subsume another, they must both check the same condition
register. Failure to check this prerequisite was causing miscompiles.
Fixes PR18003.
llvm-svn: 197089
getSymbolWithGlobalValueBase use is to create a name of a new symbol based
on the name of an existing GV. Assert that and then remove the last call
to pass true to isImplicitlyPrivate.
This gives the mangler API a 1:1 mapping from GV to names, which is what we
need to drop the mangler dependency on the target (and use an extended
datalayout instead).
llvm-svn: 196472