Commit Graph

5408 Commits

Author SHA1 Message Date
Matthias Braun c6613879ce LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC
Change the type in a couple of lists and sets that only store physical
registers from unsigned to MCPhysRegs. The later is only 16bits and
saves us a bit of memory.

llvm-svn: 346254
2018-11-06 19:00:11 +00:00
Craig Topper 0b5f8169b0 [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.

llvm-svn: 346180
2018-11-05 23:26:13 +00:00
Zaara Syeda 7509880b54 [Power9] Add support for stxvw4x.be and stxvd2x.be intrinsics
On Power9, we don't have patterns to select the following intrinsics:
llvm.ppc.vsx.stxvw4x.be
llvm.ppc.vsx.stxvd2x.be

This patch adds support for these.

Differential Revision: https://reviews.llvm.org/D53581

llvm-svn: 346148
2018-11-05 17:31:26 +00:00
Reid Kleckner 4dc0b1ac60 Fix clang -Wimplicit-fallthrough warnings across llvm, NFC
This patch should not introduce any behavior changes. It consists of
mostly one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
   of only 'break'.

We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
   doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
   the outer case.

I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.

Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu

Differential Revision: https://reviews.llvm.org/D53950

llvm-svn: 345882
2018-11-01 19:54:45 +00:00
Li Jia He 03170a904f [PowerPC] Support constraint 'wi' in asm
From the gcc manual, we can see that the specific limit of wi inline asm is “FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS”. The link is https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Machine-Constraints.html#Machine-Constraints. We should accept this constraint.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D53265

llvm-svn: 345810
2018-11-01 02:35:17 +00:00
Dorit Nuzman 34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Lei Huang de20843f6f [PowerPC] Improve BUILD_VECTOR of 4 i32s
Currently, for this node:
  vector int test(int a, int b, int c, int d) {
    return (vector int) { a, b, c, d };
  }

we get this on Power9:
  mtvsrdd 34, 5, 3
  mtvsrdd 35, 6, 4
  vmrgow 2, 3, 2

and this on Power8:
  mtvsrwz 0, 3
  mtvsrwz 1, 5
  mtvsrwz 2, 4
  mtvsrwz 3, 6
  xxmrghd 34, 1, 0
  xxmrghd 35, 3, 2
  vmrgow 2, 3, 2

This can be improved to this on LE Power9:
  rldimi 3, 4, 32, 0
  rldimi 5, 6, 32, 0
  mtvsrdd 34, 5, 3

and this on LE Power8
  rldimi 3, 4, 32, 0
  rldimi 5, 6, 32, 0
  mtvsrd 34, 3
  mtvsrd 35, 5
  xxpermdi 34, 35, 34, 0

This patch updates the TD pattern to generate the optimized sequence for both
Power8 and Power9 on LE and BE.

Differential Revision: https://reviews.llvm.org/D53494

llvm-svn: 345414
2018-10-26 18:09:36 +00:00
Li Jia He f6fb752fe8 [PowerPC] Fix some missed optimization opportunities in combineSetCC
For both operands are bool, short, int, long, long long, add the following optimization.
1. 0-x == y --> x+y ==0
2. 0-x != y --> x+y != 0

Review: nemanjai
Differential Revision: https://reviews.llvm.org/D53360

llvm-svn: 345366
2018-10-26 06:48:53 +00:00
Nemanja Ivanovic 6a74bfba20 [PowerPC] Keep vector int to fp conversions in vector domain
At present a v2i16 -> v2f64 convert is implemented by extracts to scalar,
scalar converts, and merge back into a vector. Use vector converts instead,
with the int data permuted into the proper position and extended if necessary.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D53346

llvm-svn: 345361
2018-10-26 03:19:13 +00:00
Stefan Pintilie 927e8bf316 [Power9] Add __float128 support in the backend for bitcast to a i128
Add support to allow bit-casting from f128 to i128 and then
extracting 64 bits from the result.

Differential Revision: https://reviews.llvm.org/D49507

llvm-svn: 345053
2018-10-23 17:11:36 +00:00
Nemanja Ivanovic 674581afbb [PowerPC][NFC] Fix bugs in r+r to r+i conversion
The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of
the corresponding X-Forms since they only target the Altivec registers.
Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only
load into the upper 32 VSX registers. Similarly with the remaining affected
instructions.

There is currently no way that I can see to trigger the bug, but as we add other
ways of exploiting these instructions, there may very well be instances that do.

This is an NFC patch in practical terms since the changes it introduces can not
be triggered without an MIR test.

Differential revision: https://reviews.llvm.org/D53323

llvm-svn: 344894
2018-10-22 11:22:59 +00:00
Chandler Carruth edb12a838a [TI removal] Make variables declared as `TerminatorInst` and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Hiroshi Inoue 9552dd187a [PowerPC] avoid masking already-zero bits in BitPermutationSelector
The current BitPermutationSelector generates a code to build a value by tracking two types of bits: ConstZero and Variable.
ConstZero means a bit we need to mask off and Variable is a bit we copy from an input value.

This patch add third type of bits VariableKnownToBeZero caused by AssertZext node or zero-extending load node.
VariableKnownToBeZero means a bit comes from an input value, but it is known to be already zero. So we do not need to mask them.
VariableKnownToBeZero enhances flexibility to group bits, since we can avoid redundant masking for these bits.

This patch also renames "HasZero" to "NeedMask" since now we may skip masking even when we have zeros (of type VariableKnownToBeZero).

Differential Revision: https://reviews.llvm.org/D48025

llvm-svn: 344347
2018-10-12 14:02:20 +00:00
QingShan Zhang bc1586352e [PowerPC] Fix the assert of ISD::SIGN_EXTEND_INREG when type is v2i16 and v2i8
For ISD::SIGN_EXTEND_INREG operation of v2i16 and v2i8 types will cause assert because they are registered as custom operation. 
So that the type legalization phase will enter the custom hook, which do not handle ISD::SIGN_EXTEND_INREG operation and fall throw into unreachable assert.

Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52449

llvm-svn: 344109
2018-10-10 02:33:48 +00:00
Nemanja Ivanovic 87873d04c3 [PowerPC] Implement hasBitPreservingFPLogic for types that can be supported
This is the PPC-specific non-controversial part of
https://reviews.llvm.org/D44548 that simply enables this combine for PPC
since PPC has these instructions.
This commit will allow the target-independent portion to be truly target
independent.

llvm-svn: 344077
2018-10-09 20:35:15 +00:00
Nemanja Ivanovic 4c0b110e3e [PowerPC] Remove self-copies in pre-emit peephole
There are occasionally instances where AADB rewrites registers in such a way
that a reg-reg copy becomes a self-copy. Such an instruction is obviously
redundant and can be removed. This patch does precisely that.

Note that this will not remove various nop's that we insert (which are
themselves just self-copies). The reason those are left alone is that all of
them have their own opcodes (that just encode to a self-copy).

What prompted this patch is the fact that these self-copies sometimes end up
using registers that make the instruction a priority-setting nop, thereby
having a significant effect on performance.

Differential revision: https://reviews.llvm.org/D52432

llvm-svn: 344036
2018-10-09 10:54:04 +00:00
Jonas Paulsson faad1b3056 [TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints()
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.

NFC.

Review: Simon Pilgrim
https://reviews.llvm.org/D52316

llvm-svn: 343851
2018-10-05 14:23:11 +00:00
Stefan Pintilie 5d32a86f44 [PowerPC] Folding XForm to DForm loads requires alignment for some DForm loads.
Going from XForm Load to DSForm Load requires that the immediate be 4 byte
aligned.
If we are not aligned we must leave the load as LDX (XForm).
This bug is causing a compile-time failure in the benchmark h264ref.

Differential Revision: https://reviews.llvm.org/D51988

llvm-svn: 343525
2018-10-01 20:16:27 +00:00
Nemanja Ivanovic a59096759d [PowerPC] [NFC] Refactor code for printing register operands
We have an unfortunate situation in our back end where we have to keep pairs of
functions synchronized. Needless to say that this is not an ideal situation as
it is very difficult to enforce. Even without bugs, it's annoying to have to do
the same thing in two places.

This patch just refactors the code so that the two pairs of those functions that
pertain to printing register operands are unified:
  - stripRegisterPrefix() - this just removes the letter prefixes from registers
    for the InstrPrinter and AsmPrinter. This patch provides this as a static
    member of PPCRegisterInfo
  - Handling of PPCII::UseVSXReg - there are 3 places where we do something
    special for instructions with that flag set. Each of those places does its
    own checking of this flag and implements code customization. Any changes to
    how we print/encode VSX/VMX registers require modifying all 3 places. This
    patch unifies this into a static function in PPCInstrInfo that returns the
    register number adjusted as needed.

Differential revision: https://reviews.llvm.org/D52467

llvm-svn: 343195
2018-09-27 11:49:47 +00:00
Fangrui Song 0cac726a00 llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

llvm-svn: 343163
2018-09-27 02:13:45 +00:00
Hiroshi Inoue 20982f0995 [PowerPC] optimize conditional branch on CRSET/CRUNSET
This patch adds a check to optimize conditional branch (BC and BCn) based on a constant set by CRSET or CRUNSET.
Other optimizers, such as block placement, may generate such code and hence
I do this at the very end of the optimization in pre-emit peephole pass.

A conditional branch based on a constant is eliminated or converted into unconditional branch. 
Also CRSET/CRUNSET is eliminated if the condition code register is not used
by instruction other than the branch to be optimized.

Differential Revision: https://reviews.llvm.org/D52345

llvm-svn: 343100
2018-09-26 12:32:45 +00:00
Stefan Pintilie b5305771fb [Power9] [LLVM] Add __float128 exponent GET and SET builtins
Added

__builtin_vsx_scalar_extract_expq
__builtin_vsx_scalar_insert_exp_qp

Builtins should behave the same way as in GCC.

Differential Revision: https://reviews.llvm.org/D48185

llvm-svn: 342910
2018-09-24 18:14:13 +00:00
Zaara Syeda edefda48d2 [PowerPC] Support operand modifier 'x' in inline asm
gcc uses operand modifier 'x' in inline asm for VSX registers.
Without this modifier, instructions which use VSX numbering for their
operands are printed as VMX registers. This patch adds support for the
operand modifier 'x'.

Differential Revision: https://reviews.llvm.org/D52244

llvm-svn: 342882
2018-09-24 14:01:16 +00:00
QingShan Zhang cae9425a3c [PowerPC] Fix the assert of combineBVOfConsecutiveLoads when element num is 1
Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive.
But the special condition is that the element number is 1, such as <1 x i128>. So just early exit to fix the assert.

Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52072

llvm-svn: 342611
2018-09-20 03:09:15 +00:00
Matthias Braun 726e12cf0c ScheduleDAG: Cleanup dumping code; NFC
- Instead of having both `SUnit::dump(ScheduleDAG*)` and
  `ScheduleDAG::dumpNode(ScheduleDAG*)`, just keep the latter around.
- Add `ScheduleDAG::dump()` and avoid code duplication in several
  places. Implement it for different ScheduleDAG variants.
- Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()`
  functions. They were only ever used for debug dumping and putting the
  function into ScheduleDAG is consistent with the `dumpNode()` change.

llvm-svn: 342520
2018-09-19 00:23:35 +00:00
Nemanja Ivanovic 87c31a6113 [PowerPC] Do not emit record-form rotates when record-form andi/andis suffices
This is a follow-up to the previous patch that eliminated some of the rotates.
With this addition, we will also emit the record-form andis.

This patch increases the number of record-form rotates we eliminate by
more than 70%.

Differential revision: https://reviews.llvm.org/D44897

llvm-svn: 342478
2018-09-18 13:43:16 +00:00
Nemanja Ivanovic 6a39d32e66 [PowerPC] Optimize compares fed by ANDISo
Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions.
When optimizing compares, we handle the former in order to eliminate the compare
instruction but not the latter. This patch just adds the latter to the set of
instructions we optimize.
The reason these instructions need to be handled separately is that they are not
part of the RecFormRel map (since they don't have a non-record-form). The
missing "and-immediate-shifted" is just an oversight in the initial
implementation.

Differential revision: https://reviews.llvm.org/D51353

llvm-svn: 342472
2018-09-18 13:21:58 +00:00
QingShan Zhang f1b0b47b2d [PowerPC] Add Itineraries of IIC_IntMulHD for P7/P8
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, 
because we can still get same latency due to default values.

With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.

This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, 
then causing different scheduling or suboptimal scheduling further.

Patch By: jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D52040

llvm-svn: 342441
2018-09-18 02:05:18 +00:00
Strahinja Petrovic 488fd4e625 [PowerPC] Fix label address calculation for ppc64
This patch fixes calculating address of label for non-pic ppc64.

Differential Revision: https://reviews.llvm.org/D50965

llvm-svn: 342368
2018-09-17 11:03:40 +00:00
Lion Yang c68f78d5d8 [PowerPC] Fix the calling convention for i1 arguments on PPC32
Summary:
Integer types smaller than i32 must be extended to i32 by default.
The feature "crbits" introduced at r202451 handles i1 as a special case,
but it did not extend properly.
The caller was, therefore, passing i1 stack arguments by writing 0/1 to
the first byte of the 4-byte stack object and callee was
reading the first byte for the value.

"crbits" is enabled if the optimization level is greater than 1,
which is very common in "release builds".
Such discrepancies with ABI specification also introduces
potential incompatibility with programs or libraries
built with other compilers e.g. GCC.

Fixes PR38661

Reviewers: hfinkel, cuviper

Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D51108

llvm-svn: 342288
2018-09-14 21:26:05 +00:00
Josh Stone aca532f14d Test commit: remove trailing whitespace
llvm-svn: 341966
2018-09-11 17:28:43 +00:00
Benjamin Kramer 27c769d28a [Target] Untangle disassemblers
Disassemblers cannot depend on main target headers. The same is true for
MCTargetDesc, but there's a lot more cleanup needed for that.

llvm-svn: 341822
2018-09-10 12:53:46 +00:00
QingShan Zhang abbb894ff5 [PowerPC] Combine ADD to ADDZE
On the ppc64le platform, if ir has the following form,

define i64 @addze1(i64 %x, i64 %z) local_unnamed_addr #0 {
entry:
  %cmp = icmp ne i64 %z, CONSTANT      (-32767 <= CONSTANT <= 32768)
  %conv1 = zext i1 %cmp to i64
  %add = add nsw i64 %conv1, %x
  ret i64 %add
}
we can optimize it to the form below.

                                when C == 0
                            --> addze X, (addic Z, -1))
                           /
add X, (zext(setne Z, C))--
                           \    when -32768 <= -C <= 32767 && C != 0
                            --> addze X, (addic (addi Z, -C), -1)

Patch By: HLJ2009 (Li Jia He)
Differential Revision: https://reviews.llvm.org/D51403
Reviewed By: Nemanjai 

llvm-svn: 341634
2018-09-07 07:56:05 +00:00
QingShan Zhang c2b6c547dc [PowerPC] Add Itineraries of IIC_IntRotateDI for P7/P8
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, 
because we can still get same latency due to default values.

With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.

This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, 
then causing different scheduling or suboptimal scheduling further.

Patch by jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D51506

llvm-svn: 341293
2018-09-03 03:14:29 +00:00
Kit Barton 7c80f98b69 [PPC] Remove Darwin support from POWER backend.
This patch issues an error message if Darwin ABI is attempted with the PPC
backend. It also cleans up existing test cases, either converting the test to
use an alternative triple or removing the test if the coverage is no longer
needed.

Updated Tests
-------------
The majority of test cases were updated to use a different triple that does not
include the Darwin ABI. Many tests were also updated to use FileCheck, in place
of grep.

Deleted Tests
-------------
llvm/test/tools/dsymutil/PowerPC/sibling.test was originally added to test
specific functionality of dsymutil using an object file created with an old
version of llvm-gcc for a Powerbook G4. After a discussion with @JDevlieghere he
suggested removing the test.

llvm/test/CodeGen/PowerPC/combine_loads_from_build_pair.ll was converted from a
PPC test to a SystemZ test, as the behavior is also reproducible there.

All other tests that were deleted were specific to the darwin/ppc ABI and no
longer necessary.

Phabricator Review: https://reviews.llvm.org/D50988

llvm-svn: 340795
2018-08-28 01:18:29 +00:00
Sean Fertile a2f095f1a3 [PowerPC][MC] Support expressions in getMemRIX16Encoding.
Loosens an assert in getMemRIX16Encoding that restricts DQ-form instructions to
using an immediate, so that we can assemble instructions like lxv/stxv where the
offset is an expression.

Differential Revision: https://reviews.llvm.org/D51122

llvm-svn: 340761
2018-08-27 17:37:43 +00:00
Nico Weber e75fd1b184 fix comment typo
llvm-svn: 340744
2018-08-27 14:25:22 +00:00
Nemanja Ivanovic 5d06f17b8a [PowerPC] Revert commit r339779
This commit has caused failures in some internal benchmarks. Temporarily
reverting this patch until the issue can be diagnosed and fixed.

llvm-svn: 340740
2018-08-27 13:20:42 +00:00
Nemanja Ivanovic f2588a28a8 [PowerPC] Recommit r340016 after fixing the reported issue
The internal benchmark failure reported by Google was due to a missing
check for the result type for the sign-extend and shift DAG. This commit
adds the check and re-commits the patch.

llvm-svn: 340734
2018-08-27 11:20:27 +00:00
Stefan Pintilie f384606799 [PowerPC] Emit xscpsgndp instead of xxlor when copying floating point scalar registers for P9
This patch will address using the xscpsgndp instruction to copy floating point
scalar registers instead of the xxlor (specifically XXLORf) instruction that is
currently used. Additionally, this patch of utilizing xscpsgndp will apply to
P9, while pre-P9 will still use xxlor.

Patch by amyk

Differential Revision: https://reviews.llvm.org/D50004

llvm-svn: 340643
2018-08-24 20:00:24 +00:00
Eric Christopher 3dc594c1e6 Temporarily Revert "[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction" due to it causing a compiler crash on valid.
This reverts commit r340016, testcase forthcoming.

llvm-svn: 340315
2018-08-21 18:35:08 +00:00
QingShan Zhang f8f9af7ba5 [PowerPC] Add a peephole post RA to transform the inst that fed by add
If the arch is P8, we will select XFLOAD to load the floating point, and then, expand it to vsx and non-vsx X-form instruction post RA. This patch is trying to convert the X-form to D-form if it meets the requirement that one operand of the x-form inst is the special Zero register, and another operand fed by add inst. i.e.
y = add imm, reg
LFDX. 0, y
-->
LFD imm(reg)

Reviewers: Nemanjai
Differential Revision: https://reviews.llvm.org/D49007

llvm-svn: 340149
2018-08-20 02:52:55 +00:00
Stefan Pintilie 39869ccf51 [PowerPC] Generate lxsd instead of the ld->mtvsrd sequence for vector loads
This patch addresses:

- Implementation within PPCISelLowering.cpp to check if we should use direct
load into vector instructions (such as lxsd/lfd ) when the scalar_to_vector
function is used; which will allow us to catch as many cases of the
scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into
lxsd.

- Test cases to exhibit the behaviour of emitting lxsd/lfd.

Patch by amyk

Differential revision: https://reviews.llvm.org/D49698

llvm-svn: 340037
2018-08-17 15:15:26 +00:00
Nemanja Ivanovic 39751276b0 [PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction
Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli
extend sign and shift immediate instruction.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D49879

llvm-svn: 340016
2018-08-17 12:35:44 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Nemanja Ivanovic 5b9a4f8ee5 [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
To make ISD::VSELECT available(legal) so long as there are altivec instruction,
otherwise it's default behavior is expanding.
Use xxsel to match vselect if vsx is open, or use vsel.

In order to do not write many patterns in td file, promote (for vector it's
bitcast) all other type into v4i32 and only pattern match vselect of v4i32 into
vsel or xxsel.

Patch by wuzish
Differential revision: https://reviews.llvm.org/D49531

llvm-svn: 339779
2018-08-15 15:30:36 +00:00
Nemanja Ivanovic 8b4bd09e22 [PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types
When trying to combine a DAG that builds a vector out of sign-extensions of
vector extracts, the code assumes legal input types. Due to that, we have to
disable this combine prior to legalization.
In some cases, the DAG will look slightly different after legalization so
account for that in the matching code.

This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087

Differential Revision: https://reviews.llvm.org/D49080

llvm-svn: 339769
2018-08-15 12:58:13 +00:00