Commit Graph

98 Commits

Author SHA1 Message Date
Kit Barton 7a1a9e01ad This reverts commit r265505.
Revert "[Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance".
This patch has caused a functional regression in SPEC2k6 namd, and a performance regression in mesa-pipe.

llvm-svn: 267927
2016-04-28 20:00:42 +00:00
Nemanja Ivanovic 87bcae366d [PowerPC] Basic support for P9 byte comparison and count trailing zero insns
This patch corresponds to review:
http://reviews.llvm.org/D17850

This patch implements the following instructions:
cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd.

llvm-svn: 266228
2016-04-13 18:51:18 +00:00
Chuang-Yu Cheng 024a623c55 [Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance
This patch implement the following instructions:
- addpcis subpcis
- maddhd maddhdu maddld
- modsw moduw modsd modud
- darn
- extswsli extswsli.
- setb
- dtstsfi dtstsfiq

Total 15 instructions

Reviewers: nemanjai hfinkel tjablin amehsan kbarton

http://reviews.llvm.org/D17885

llvm-svn: 265505
2016-04-06 01:47:02 +00:00
Chuang-Yu Cheng eaf4b3d75c [Power9] Implement copy-paste, msgsync, slb, and stop instructions
This patch implements the following BookII and Book III instructions:
- copy copy_first cp_abort paste paste. paste_last
- msgsync
- slbieg slbsync
- stop

Total 10 instructions

Reviewers: nemanjai hfinkel tjablin amehsan kbarton
llvm-svn: 265504
2016-04-06 01:46:45 +00:00
Nemanja Ivanovic a621a7f9c3 [PowerPC] Basic support for P9 atomic loads and stores
This patch corresponds to review:
http://reviews.llvm.org/D18032

This patch provides asm implementation for the following instructions:
lwat, ldat, stwat, stdat, ldmx, mcrxrx

llvm-svn: 265022
2016-03-31 15:26:37 +00:00
Chuang-Yu Cheng d5eb774eb6 [Power9] Implement new altivec instructions: bcd* series
This patch implements the following altivec instructions:

- Decimal Convert From/to National/Zoned/Signed-QWord:
    bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq.

- Decimal Copy-Sign/Set-Sign:
    bcdcpsgn. bcdsetsgn.

- Decimal Shift/Unsigned-Shift/Shift-and-Round:
    bcds. bcdus. bcdsr.

- Decimal (Unsigned) Truncate:
    bcdtrunc. bcdutrunc.

Total 13 instructions

Thanks Amehsan's advice! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D17838

llvm-svn: 264568
2016-03-28 09:04:23 +00:00
Chuang-Yu Cheng 80722719eb [Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat
This change implements the following vsx instructions:

- Scalar Insert/Extract
    xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp

- Vector Insert/Extract
    xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
    xxextractuw xxinsertw

- Scalar/Vector Test Data Class
    xststdcdp xststdcsp xststdcqp
    xvtstdcdp xvtstdcsp

- Maximum/Minimum
    xsmaxcdp xsmaxjdp
    xsmincdp xsminjdp

- Vector Byte-Reverse/Permute/Splat
    xxbrd xxbrh xxbrq xxbrw
    xxperm xxpermr
    xxspltib

30 instructions

Thanks Nemanja for invaluable discussion! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16842

llvm-svn: 264567
2016-03-28 08:34:28 +00:00
Chuang-Yu Cheng 065969ec8e [Power9] Implement new altivec instructions: permute, count zero, extend sign, negate, parity, shift/rotate, mul10
This change implements the following vector operations:
- vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw
- vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d
- vnegd vnegw
- vprtybd vprtybq vprtybw
- vbpermd vpermr
- vrlwnm vrlwmi vrldnm vrldmi vslv vsrv
- vmul10cuq vmul10uq vmul10ecuq vmul10euq

28 instructions

Thanks Nemanja, Kit for invaluable hints and discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

Phabricator: http://reviews.llvm.org/D15887
llvm-svn: 264504
2016-03-26 05:46:11 +00:00
Kit Barton ba532dc816 [Power9] Implement new vsx instructions: load, store instructions for vector and scalar
We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to
implement this new patch.

This patch implements the following vsx instructions:

Vector load/store:
lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx
stxv stxvb16x stxvh8x stxvl stxvll stxvx
Scalar load/store:
lxsd lxssp lxsibzx lxsihzx
stxsd stxssp stxsibx stxsihx
21 instructions

Phabricator: http://reviews.llvm.org/D16919
llvm-svn: 262906
2016-03-08 03:49:13 +00:00
Kit Barton 93612ec5f2 Power9] Implement new vsx instructions: compare and conversion
This change implements the following vsx instructions:

Quad/Double-Precision Compare:
xscmpoqp xscmpuqp
xscmpexpdp xscmpexpqp
xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp
xvcmpnedp(.) xvcmpnesp(.)
Quad-Precision Floating-Point Conversion
xscvqpdp(o) xscvdpqp
xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp
xscvdphp xscvhpdp xvcvhpsp xvcvsphp
xsrqpi xsrqpix xsrqpxp
28 instructions

Phabricator: http://reviews.llvm.org/D16709
llvm-svn: 262068
2016-02-26 21:11:55 +00:00
Bill Schmidt e26236eed9 [PPC64] Add support for clrbhrb, mfbhrbe, rfebb.
This patch adds support for the ISA 2.07 additions involving the
branch history rolling buffer and event-based branching.  These will
not be used by typical applications, so built-in support is not
required.  They will only be available via inline assembly.

Assembly/disassembly tests are included in the patch.

llvm-svn: 238032
2015-05-22 16:44:10 +00:00
Hal Finkel fefcfffe68 [PowerPC] Add asm/disasm support for dcbt with hint
Add assembler/disassembler support for dcbt/dcbtst (and aliases) with the hint
field specified (non-zero). Unforunately, the syntax for this instruction is
special in that it differs for server vs. embedded cores:
   dcbt ra, rb, th [server]
   dcbt th, ra, rb [embedded]
where th can be omitted when it is 0. dcbtst is the same. Thus we need to play
games in the parser and the printer to flip the operands around on the embedded
cores. We'll use the server syntax as the default (binutils currently uses the
embedded form by default, but IBM is changing that).

We also stop marking dcbtst as having unmodeled side effects (this is not
necessary, it is just a hint like dcbt -- noticed by inspection, so no separate
test case).

llvm-svn: 235657
2015-04-23 22:47:57 +00:00
Nemanja Ivanovic c38b5311cb Add direct moves to/from VSR and exploit them for FP/INT conversions
This patch corresponds to review:
http://reviews.llvm.org/D8928

It adds direct move instructions to/from VSX registers to GPR's. These are
exploited for FP <-> INT conversions.

llvm-svn: 234682
2015-04-11 10:40:42 +00:00
Kit Barton 535e69de34 Add Hardware Transactional Memory (HTM) Support
This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07
(POWER8). The intrinsic support is based on GCC one [1], but currently only the
'PowerPC HTM Low Level Built-in Function' are implemented.

The HTM instructions follows the RC ones and the transaction initiation result
is set on RC0 (with exception of tcheck). Currently approach is to create a
register copy from CR0 to GPR and comapring. Although this is suboptimal, since
the branch could be taken directly by comparing the CR0 value, it generates code
correctly on both test and branch and just return value. A possible future
optimization could be elimitate the MFCR instruction to branch directly.

The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on
powerpc64 and powerpc64le.

This is send along a clang patch to enabled the builtins and option switch.

[1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html

Phabricator Review: http://reviews.llvm.org/D8247

llvm-svn: 233204
2015-03-25 19:36:23 +00:00
Nemanja Ivanovic e8effe1edb Add LLVM support for PPC cryptography builtins
Review: http://reviews.llvm.org/D7955

llvm-svn: 231285
2015-03-04 20:44:33 +00:00
Hal Finkel c93a9a2cb4 [PowerPC] Add support for the QPX vector instruction set
This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially  { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).

I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).

The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.

llvm-svn: 230413
2015-02-25 01:06:45 +00:00
Hal Finkel 64202167c5 [PowerPC] Add assembler support for mcrfs and friends
Fill out our support for the floating-point status and control register
instructions (mcrfs and friends). As it turns out, these are necessary for
compiling src/test/harness_fp.h in TBB for PowerPC.

Thanks to Raf Schietekat for reporting the issue!

llvm-svn: 226070
2015-01-15 01:00:53 +00:00
Hal Finkel fc096c98f3 [PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64
On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl
instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction
sequence is interpreted by the unwinding code in libgcc. To make sure these
occur as a pair, as with other pairings interpreted by the linker, fuse the two
instructions into one instruction (for code generation only).

In the future, we might wish to do this by emitting CFI directives instead,
but this solution is simpler, and mirrors what GCC does. Additional discussion
on this point is contained in the PR.

Fixes PR22015.

llvm-svn: 224788
2014-12-23 22:29:40 +00:00
Hal Finkel 5901676581 [PowerPC] Add the 'attn' instruction
The attn instruction is not part of the Power ISA, but is documented in the A2
user manual, and is accepted by the GNU assembler for the A2 and the POWER4+.
Reported as part of PR21650.

llvm-svn: 222712
2014-11-25 00:30:11 +00:00
Hal Finkel 584a70c820 [PowerPC] Add support for dcbtst and icbt (prefetch)
Adds code generation support for dcbtst (data cache prefetch for write) and
icbt (instruction cache prefetch for read - Book E cores only).

We still end up with a 'cannot select' error for the non-supported prefetch
intrinsic forms. This will be fixed in a later commit.

Fixes PR20692.

llvm-svn: 216339
2014-08-23 23:21:04 +00:00
Joerg Sonnenberger 6c3e38522a tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs.
llvm-svn: 214784
2014-08-04 21:28:22 +00:00
Joerg Sonnenberger 99ab590ac9 Don't use additional arguments for dss and friends to satisfy DSS_Form,
when let can do the same thing. Keep the 64bit variants as codegen-only.
While they have a different register class, the encoding is the same for
32bit and 64bit mode. Having both present would otherwise confuse the
disassembler.

llvm-svn: 214636
2014-08-02 15:09:41 +00:00
Joerg Sonnenberger c5fe19d062 Refactor TLBIVAX and add tlbsx.
llvm-svn: 214354
2014-07-30 22:51:15 +00:00
Joerg Sonnenberger 99ef10f915 Recognize BookE's mbar instruction.
llvm-svn: 214244
2014-07-29 23:16:31 +00:00
Joerg Sonnenberger 9e9623ca64 Support move to/from segment register.
llvm-svn: 214234
2014-07-29 22:21:57 +00:00
Ulrich Weigand ad0cb91ed9 [PowerPC] Simplify and improve loading into TOC register
During an indirect function call sequence on the 64-bit SVR4 ABI,
generate code must load and then restore the TOC register.

This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved.  Instead, the are two
special instruction patterns:

 let RST = 2, DS = 2 in
 def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                     "ld 2, 8($reg)", IIC_LdStLD,
                     [(PPCload_toc i64:$reg)]>, isPPC64;
 
 let RST = 2, DS = 10, RA = 1 in
 def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                     "ld 2, 40(1)", IIC_LdStLD,
                     [(PPCtoc_restore)]>, isPPC64;

Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations.  The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).

This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source.  This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).

llvm-svn: 211193
2014-06-18 17:52:49 +00:00
Hal Finkel 27774d9274 [PowerPC] Initial support for the VSX instruction set
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.

The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).

Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that.  The assembler and disassembler
are fully implemented and tested. However:

 - CodeGen support causes miscompiles; test-suite runtime failures:
      MultiSource/Benchmarks/FreeBench/distray/distray
      MultiSource/Benchmarks/McCat/08-main/main
      MultiSource/Benchmarks/Olden/voronoi/voronoi
      MultiSource/Benchmarks/mafft/pairlocalalign
      MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
      SingleSource/Benchmarks/CoyoteBench/almabench
      SingleSource/Benchmarks/Misc/matmul_f64_4x4

 - The lowering currently falls back to using Altivec instructions far more
   than it should. Worse, there are some things that are scalarized through the
   stack that shouldn't be.

 - A lot of unnecessary copies make it past the optimizers, and this needs to
   be fixed.

 - Many more regression tests are needed.

Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.

llvm-svn: 203768
2014-03-13 07:58:58 +00:00
Hal Finkel 940ab934d4 Add CR-bit tracking to the PowerPC backend for i1 values
This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as separate
registers; individual bit operations (and, or, xor, etc.) are supported.
Tracking booleans in CR bits has several advantages:

 - Reduction in register pressure (because we no longer need GPRs to store
   boolean values).

 - Logical operations on booleans can be handled more efficiently; we used to
   have to move all results from comparisons into GPRs, perform promoted
   logical operations in GPRs, and then move the result back into condition
   register bits to be used by conditional branches. This can be very
   inefficient, because the throughput of these CR <-> GPR moves have high
   latency and low throughput (especially when other associated instructions
   are accounted for).

 - On the POWER7 and similar cores, we can increase total throughput by using
   the CR bits. CR bit operations have a dedicated functional unit.

Most of this is more-or-less mechanical: Adjustments were needed in the
calling-convention code, support was added for spilling/restoring individual
condition-register bits, and conditional branch instruction definitions taking
specific CR bits were added (plus patterns and code for generating bit-level
operations).

This is enabled by default when running at -O2 and higher. For -O0 and -O1,
where the ability to debug is more important, this feature is disabled by
default. Individual CR bits do not have assigned DWARF register numbers,
and storing values in CR bits makes them invisible to the debugger.

It is critical, however, that we don't move i1 values that have been promoted
to larger values (such as those passed as function arguments) into bit
registers only to quickly turn around and move the values back into GPRs (such
as happens when values are returned by functions). A pair of target-specific
DAG combines are added to remove the trunc/extends in:
  trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)
and:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)
In short, we only want to use CR bits where some of the i1 values come from
comparisons or are used by conditional branches or selects. To put it another
way, if we can do the entire i1 computation in GPRs, then we probably should
(on the POWER7, the GPR-operation throughput is higher, and for all cores, the
CR <-> GPR moves are expensive).

POWER7 test-suite performance results (from 10 runs in each configuration):

SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup

SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown

llvm-svn: 202451
2014-02-28 00:27:01 +00:00
Hal Finkel 2345347eb9 Add a disassembler to the PowerPC backend
The tests for the disassembler were adapted from the encoder tests, and for the
most part, the output from the disassembler matches that encoder-test inputs.
There are some places where more-informative mnemonics could be produced
(notably for the branch instructions), and those cases are noted in the tests
with FIXMEs.

Future work includes:

 - Generating more-informative mnemonics when possible (this may also be done
   in the printer).

 - Remove the dependence on positional "numbered" operand-to-variable mapping
   (for both encoding and decoding).

 - Internally using 64-bit instruction variants in 64-bit mode (if this turns
   out to matter).

llvm-svn: 197693
2013-12-19 16:13:01 +00:00
Hal Finkel ceb1f12d9a Improve instruction scheduling for the PPC POWER7
Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.

significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):

speedups:

MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.55171% +/- 0.333168%

MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
	-17.5576% +/- 14.598%

MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	-29.5708% +/- 7.09058%

MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
	-34.9471% +/- 11.4391%

SingleSource/Benchmarks/BenchmarkGame/puzzle
	-25.1347% +/- 11.0104%

SingleSource/Benchmarks/Misc/flops-8
	-17.7297% +/- 9.79061%

SingleSource/Benchmarks/Shootout-C++/ary3
	-35.5018% +/- 23.9458%

SingleSource/Regression/C/uint64_to_float
	-56.3165% +/- 25.4234%

SingleSource/UnitTests/Vectorizer/gcc-loops
	-18.5309% +/- 6.8496%

regressions:

MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
	18.351% +/- 12.156%

SingleSource/Benchmarks/Shootout-C++/methcall
	27.3086% +/- 14.4733%

llvm-svn: 197099
2013-12-12 00:19:11 +00:00
Hal Finkel 3e5a360ba3 Add IIC_ prefix to PPC instruction-class names
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.

Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.

No functionality change intended.

llvm-svn: 195890
2013-11-27 23:26:09 +00:00
Roman Divacky 62cb63543b Implement asm support for a few PowerPC bookIII that are needed for assembling
FreeBSD kernel.

llvm-svn: 190618
2013-09-12 17:50:54 +00:00
Ulrich Weigand 98fcc7b6bc [PowerPC] Support "eieio" instruction
This adds support for the "eieio" instruction to
the asm parser.

llvm-svn: 185349
2013-07-01 17:06:26 +00:00
Ulrich Weigand 797f1a3f5b [PowerPC] Add variants of "sync" instruction
This adds support for the "sync $L" instruction with operand,
and provides aliases for "lwsync" and "ptesync".

llvm-svn: 185344
2013-07-01 16:37:52 +00:00
Ulrich Weigand 824b7d8dfd [PowerPC] Support generic conditional branches in asm parser
This adds instruction patterns to cover the generic forms of
the conditional branch instructions.  This allows the assembler
to support the generic mnemonics.

The compiler will still generate the various specific forms
of the instruction that were already supported.

llvm-svn: 184722
2013-06-24 11:55:21 +00:00
Bill Schmidt a87a7e2620 Implement the PowerPC system call (sc) instruction.
Instruction added at request of Roman Divacky.  Tested via asm-parser.

llvm-svn: 181821
2013-05-14 19:35:45 +00:00
Ulrich Weigand 300b6875fb [PowerPC] Add some Book II instructions to AsmParser
This patch adds a couple of Book II instructions (isync, icbi) to the
PowerPC assembler parser.  These are needed when bootstrapping clang
with the integrated assembler forced on, because they are used in
inline asm statements in the code base.

The test case adds the full list of Book II storage control instructions,
including associated extended mnemonics.  Again, those that are not yet
supported as marked as FIXME.

llvm-svn: 181052
2013-05-03 19:51:09 +00:00
Ulrich Weigand fa451ba1b9 PowerPC: Fix encoding of rldimi and rldcl instructions
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong operand name in rldimi, wrong form
and sub-opcode for rldcl).

Tests will be added together with the asm parser.

llvm-svn: 180606
2013-04-26 15:39:12 +00:00
Hal Finkel 2277196f64 Add a comment about the PPC Interpretation64Bit bit
llvm-svn: 179391
2013-04-12 18:17:38 +00:00
Hal Finkel 654d43b41a Add PPC instruction record forms and associated query functions
This is prep. work for the implementation of optimizeCompare. Many PPC
instructions have 'record' forms (in almost all cases, this means that the RC
bit is set) that cause the result of the instruction to be compared with zero,
and the result of that comparison saved in a predefined condition register. In
order to add the record forms of the instructions without too much
copy-and-paste, the relevant functions have been refactored into multiclasses
which define both the record and normal forms.

Also, two TableGen-generated mapping functions have been added which allow
querying the instruction code for the record form given the normal form (and
vice versa).

No functionality change intended.

llvm-svn: 179356
2013-04-12 02:18:09 +00:00
Hal Finkel b5aa7e54d9 Generate PPC early conditional returns
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time when we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.

llvm-svn: 179026
2013-04-08 16:24:03 +00:00
Ulrich Weigand bbfb0c55c8 PowerPC: Mark patterns as isCodeGenOnly.
There remain a number of patterns that cannot (and should not)
be handled by the asm parser, in particular all the Pseudo patterns.

This commit marks those patterns as isCodeGenOnly.

No change in generated code.

llvm-svn: 178008
2013-03-26 10:57:16 +00:00
Ulrich Weigand 874fc628df PowerPC: Simplify FADD in round-to-zero mode.
As part of the the sequence generated to implement long double -> int
conversions, we need to perform an FADD in round-to-zero mode.  This is
problematical since the FPSCR is not at all modeled at the SelectionDAG
level, and thus there is a risk of getting floating point instructions
generated out of sequence with the instructions to modify FPSCR.

The current code handles this by somewhat "special" patterns that in part
have dummy operands, and/or duplicate existing instructions, making them
awkward to handle in the asm parser.

This commit changes this by leaving the "FADD in round-to-zero mode"
as an atomic operation on the SelectionDAG level, and only split it up into
real instructions at the MI level (via custom inserter).  Since at *this*
level the FPSCR *is* modeled (via the "RM" hard register), much of the
"special" stuff can just go away, and the resulting patterns can be used by
the asm parser.

No significant change in generated code expected.

llvm-svn: 178006
2013-03-26 10:56:22 +00:00
Ulrich Weigand 4749b1ecd8 PowerPC: Use CCBITRC operand for ISEL patterns.
This commit changes the ISEL patterns to use a CCBITRC operand
instead of a "pred" operand.  This matches the actual instruction
text more directly, and simplifies use of ISEL with the asm parser.
In addition, this change allows some simplification of handling
the "pred" operand, as this is now only used by BCC.

No change in generated code.

llvm-svn: 178003
2013-03-26 10:54:54 +00:00
Hal Finkel 756810fe36 Implement builtin_{setjmp/longjmp} on PPC
This implements SJLJ lowering on PPC, making the Clang functions
__builtin_{setjmp/longjmp} functional on PPC platforms. The implementation
strategy is similar to that on X86, with the exception that a branch-and-link
variant is used to get the right jump address. Credit goes to Bill Schmidt for
suggesting the use of the unconditional bcl form (instead of the regular bl
instruction) to limit return-address-cache pollution.

Benchmarking the speed at -O3 of:

static jmp_buf env_sigill;

void foo() {
                __builtin_longjmp(env_sigill,1);
}

main() {
	...

        for (int i = 0; i < c; ++i) {
                if (__builtin_setjmp(env_sigill)) {
                        goto done;
                } else {
                        foo();
                }

done:;
        }

	...
}

vs. the same code using the libc setjmp/longjmp functions on a P7 shows that
this builtin implementation is ~4x faster with Altivec enabled and ~7.25x
faster with Altivec disabled. This comparison is somewhat unfair because the
libc version must also save/restore the VSX registers which we don't yet
support.

llvm-svn: 177666
2013-03-21 21:37:52 +00:00
Ulrich Weigand a82389b3d0 Fix wrong PowerPC instruction encodings due to
operand field name mismatches in:
 - AForm_3  (fmul, fmuls)
 - XFXForm_5 (mtcrf)
 - XFLForm (mtfsf)

llvm-svn: 167862
2012-11-13 19:19:46 +00:00
Ulrich Weigand 0117718580 Fix instruction encoding for "bd(n)z" on PowerPC,
by using a new instruction format BForm_1.

llvm-svn: 167861
2012-11-13 19:15:52 +00:00
Ulrich Weigand 84ee76acfe Fix instruction encoding for "isel" on PowerPC,
using a new instruction format AForm_4.

llvm-svn: 167860
2012-11-13 19:14:19 +00:00
Hal Finkel 96c2d4d945 Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.

llvm-svn: 158204
2012-06-08 15:38:21 +00:00
Hal Finkel 51861b4855 Fix dynamic linking on PPC64.
Dynamic linking on PPC64 has had problems since we had to move the top-down
hazard-detection logic post-ra. For dynamic linking to work there needs to be
a nop placed after every call. It turns out that it is really hard to guarantee
that nothing will be placed in between the call (bl) and the nop during post-ra
scheduling. Previous attempts at fixing this by placing logic inside the
hazard detector only partially worked.

This is now fixed in a different way: call+nop codegen-only instructions. As far
as CodeGen is concerned the pair is now a single instruction and cannot be split.
This solution works much better than previous attempts.

The scoreboard hazard detector is also renamed to be more generic, there is currently
no cpu-specific logic in it.

llvm-svn: 153816
2012-03-31 14:45:15 +00:00