Commit Graph

5533 Commits

Author SHA1 Message Date
Lei Huang 6270ab6ce4 [Power9]Legalize and emit code for round & convert quad-precision values
Legalize and emit code for round & convert float128 to double precision and
single precision.

Differential Revision: https://reviews.llvm.org/D46997

llvm-svn: 336299
2018-07-04 21:59:16 +00:00
Stefan Pintilie cb4f0c5c07 [PowerPC] Replace the Post RA List Scheduler with the Machine Scheduler
We want to run the Machine Scheduler instead of the List Scheduler after RA.
  Checked with a performance run on a Power 9 machine with SPEC 2006 and while
  some benchmarks improved and others degraded the geomean was slightly improved
  with the Machine Scheduler.

  Differential Revision: https://reviews.llvm.org/D45265

llvm-svn: 336295
2018-07-04 18:54:25 +00:00
QingShan Zhang 3b2aa2b4b4 [PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's multiple for i64 pre-inc load/store
For the below case, pre-inc prep think it's a good candidate to use pre-inc for the bucket, but 64bit integer load/store update (pre-inc) instruction on Power requires the displacement field should be DS-form (4's multiple). Since it can't satisfy the constraint, we have to do some fix ups later. As below, the original load/stores could be well-form, it makes things worse.

unsigned long long result = 0;
unsigned long long foo(char *p, unsigned long long n) {
  for (unsigned long long i = 0; i < n; i++) {
    unsigned long long x1 = *(unsigned long long *)(p - 50000 + i);
    unsigned long long x2 = *(unsigned long long *)(p - 61024 + i);
    unsigned long long x3 = *(unsigned long long *)(p - 62048 + i);
    unsigned long long x4 = *(unsigned long long *)(p - 64096 + i);
    result *= x1 * x2 * x3 * x4;
  }
  return result;
}

Patch by jedilyn(Kewen Lin).

Differential Revision: https://reviews.llvm.org/D48813 
--This line, and  those below, will be ignored--

M    lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
A    test/CodeGen/PowerPC/preincprep-i64-check.ll

llvm-svn: 336074
2018-07-02 05:46:09 +00:00
Lei Huang 5d109ee3d4 [PowerPC] Fix incorrectly encoded wait instruction
Encoding for the wait instruction was wrong. Fix according to ISA 3.0.

Differential Revision: https://reviews.llvm.org/D48550

llvm-svn: 335514
2018-06-25 19:28:27 +00:00
Strahinja Petrovic bb2b00bb80 [PowerPC] Fix label address calculation for ppc32
This patch fixes calculating address of label on ppc32 (for -fPIC).

Differential Revision: https://reviews.llvm.org/D46582

llvm-svn: 335043
2018-06-19 13:07:40 +00:00
QingShan Zhang 9f0fe9a3f8 If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction when we are loading a floating,
and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction.

Differential Revision: https://reviews.llvm.org/D47568
Reviewed By: Nemanjai

llvm-svn: 335024
2018-06-19 06:54:51 +00:00
Sean Fertile cac28aeb3f [PowerPC] Add support for high and higha symbol modifiers on tls modifers.
Enables using the high and high-adjusted symbol modifiers on thread local
storage modifers in powerpc assembly. Needed to be able to support 64 bit
thread-pointer and dynamic-thread-pointer access sequences.

Differential Revision: https://reviews.llvm.org/D47754

llvm-svn: 334856
2018-06-15 19:47:16 +00:00
Sean Fertile 80b8f82f17 [PPC64] Support "symbol@high" and "symbol@higha" symbol modifers.
Add support for the "@high" and "@higha" symbol modifiers in powerpc64 assembly.
The modifiers represent accessing the segment consiting of bits 16-31 of a
64-bit address/offset.

Differential Revision: https://reviews.llvm.org/D47729

llvm-svn: 334855
2018-06-15 19:47:11 +00:00
Hiroshi Inoue 0f7f59f073 [PowerPC] fix trivial typos in comment, NFC
llvm-svn: 334583
2018-06-13 08:54:13 +00:00
Hiroshi Inoue 9bffc94cf0 [PowerPC] avoid verification failure due to PowerPC VSX Swap Removal pass
This patch fixes a failure in lnt tests with -verify-machineinstrs option.
When VSX Swap Removal pass swaps two register operands, it did not maintain kill flags associated with operands. This patch swaps flags as well as register number to avoid inconsistent kill flags information.

llvm-svn: 334579
2018-06-13 08:25:14 +00:00
Hiroshi Inoue 863fb7a4b8 [NFC] fix formatting
llvm-svn: 334263
2018-06-08 04:00:54 +00:00
Hiroshi Inoue 01ef4c2c64 [PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector
BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation.
However, enforcing 32-bit instruction sometimes results in redundant generated code.
For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF).

uint64_t func(uint64_t a, uint64_t b) {
	return (a & 0xFFFFFFFF) | (b << 32) ;
}

To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group.

Differential Revision: https://reviews.llvm.org/D47867

llvm-svn: 334195
2018-06-07 13:21:14 +00:00
Hiroshi Inoue b557846083 [PowerPC] fix trivial typos in comment, NFC
llvm-svn: 334191
2018-06-07 12:49:12 +00:00
Peter Smith 57f661bd7d [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928

llvm-svn: 334078
2018-06-06 09:40:06 +00:00
Hiroshi Inoue 955655f558 [PowerPC] reduce rotate in BitPermutationSelector
BitPermutationSelector builds the output value by repeating rotate-and-mask instructions with input registers.
Here, we may avoid one rotate instruction if we start building from an input register that does not require rotation.

For example of the test case bitfieldinsert.ll, it first rotates left r4 by 8 bits and then inserts some bits from r5 without rotation.
This can be executed by one rlwimi instruction, which rotates r4 by 8 bits and inserts its bits into r5.

This patch adds a check for rotation amounts in the comparator used in sorting to process the input without rotation first.

Differential Revision: https://reviews.llvm.org/D47765

llvm-svn: 334011
2018-06-05 11:58:01 +00:00
David Blaikie 31b98d2e99 Move Analysis/Utils/Local.h back to Transforms
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)

llvm-svn: 333954
2018-06-04 21:23:21 +00:00
Hiroshi Inoue 9796b47df1 [NFC] Zero initialize local variables
This patch makes local variables zero initialized to avoid broken values in debug output.

llvm-svn: 333754
2018-06-01 14:23:15 +00:00
Amaury Sechet 8467411dad Set ADDE/ADDC/SUBE/SUBC to expand by default
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.

Target that uses these opcodes are changed in order to ensure their behavior doesn't change.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D47422

llvm-svn: 333748
2018-06-01 13:21:33 +00:00
Lei Huang 716103f1cd [PowerPC] Fix the incorrect iterator inside peephole
Instruction selection can insert nodes into the underlying list after the root
node so iterating will thereby miss it. We should NOT assume that, the root node
is the last element in the DAG nodelist.

Patch by: steven.zhang (Qing Shan Zhang)

Differential Revision: https://reviews.llvm.org/D47437

llvm-svn: 333415
2018-05-29 13:38:56 +00:00
Lei Huang 651be44913 [Power9]Legalize and emit code for HW/Byte vector extract and convert to QP
Implemente patterns to extract HWord and Byte vector elements and convert to
quad-precision.

Differential Revision: https://reviews.llvm.org/D46774

llvm-svn: 333377
2018-05-28 16:43:29 +00:00
Zaara Syeda 6f3df02fdc [PowerPC] Set isAsmParserOnly=1 for X-form TLS loads/stores
The X-form TLS load/store instructions added for optimizing the initial-exec
sequence in https://reviews.llvm.org/rL327635 fail to assemble. llvm-mc fails
with the error: invalid operand for instruction. This patch adds these
instructions into a block with isAsmParserOnly, similar to how ADD8TLS_ is
currently handled.

Differential Revision: https://reviews.llvm.org/D47382

llvm-svn: 333374
2018-05-28 15:27:58 +00:00
Lei Huang f4ec67822f [PowerPC] Remove the match pattern in the definition of LXSDX/STXSDX
The match pattern in the definition of LXSDX is xoaddr, so the Pseudo
instruction XFLOADf64 never gets selected. XFLOADf64 expands to LXSDX/LFDX post
RA based on the register pressure. To avoid ambiguity, we need to remove the
select pattern for LXSDX, same as what was done for LXSD. STXSDX also have
the same issue.

Patch by Qing Shan Zhang (steven.zhang).

Differential Revision: https://reviews.llvm.org/D47178

llvm-svn: 333150
2018-05-24 03:20:28 +00:00
Lei Huang 8b0da65bfb [Power9]Legalize and emit code for W vector extract and convert to QP
Implemente patterns to extract [Un]signed Word vector element and convert to
quad-precision.

Differential Revision: https://reviews.llvm.org/D46536

llvm-svn: 333115
2018-05-23 19:31:54 +00:00
Lei Huang 8990168a45 [Power9]Legalize and emit code for DW vector extract and convert to QP
Implemente patterns to extract [Un]signed DWord vector element and convert to
quad-precision.

Differential Revision: https://reviews.llvm.org/D46333

llvm-svn: 333112
2018-05-23 18:36:51 +00:00
Peter Collingbourne dcd7d6c331 MC: Separate creating a generic object writer from creating a target object writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47045

llvm-svn: 332868
2018-05-21 19:20:29 +00:00
Peter Collingbourne 571a3301ae MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47035

llvm-svn: 332857
2018-05-21 17:57:19 +00:00
Peter Collingbourne e3f652973e Support: Simplify endian stream interface. NFCI.
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47032

llvm-svn: 332757
2018-05-18 19:46:24 +00:00
Zaara Syeda 421a5960d2 [NFC] [Power] Fix instruction format for xsrqpi
xsrqpi is currently using Z23Form_1.
The instruction format is xsrqpi R,VRT,VRB,RMC.
Rathar than bits 11-15 being used for FRA, it should have
bits 11-14 reserved and bit 15 for R. This patch adds a new
class Z23Form_4 to fix the instruction format.

Differential Revision: https://reviews.llvm.org/D46761

llvm-svn: 332253
2018-05-14 15:45:15 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Vedant Kumar e0b5f86b30 [STLExtras] Add distance() for ranges, pred_size(), and succ_size()
This commit adds a wrapper for std::distance() which works with ranges.
As it would be a common case to write `distance(predecessors(BB))`, this
also introduces `pred_size()` and `succ_size()` helpers to make that
easier to write.

Differential Revision: https://reviews.llvm.org/D46668

llvm-svn: 332057
2018-05-10 23:01:54 +00:00
Shiva Chen 801bf7ebbe [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.

This patch has no new test case. I have run regression test and there is
no difference in regression test.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

llvm-svn: 331844
2018-05-09 02:42:00 +00:00
Lei Huang e41e3d3237 [Power9]Legalize and emit code for truncate and convert QP to HW and Byte
Legalize and emit code for truncate and convert float128 to (un)signed short
and (un)signed char.

Differential Revision: https://reviews.llvm.org/D46194

llvm-svn: 331797
2018-05-08 18:52:06 +00:00
Lei Huang 6364288dba [Power9]Legalize and emit code for truncate and convert Quad-Precision to Word
Legalize and emit code for:

  * xscvqpswz : VSX Scalar truncate & Convert Quad-Precision to Signed Word
  * xscvqpuwz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Word

Differential Revision: https://reviews.llvm.org/D45635

llvm-svn: 331790
2018-05-08 18:34:00 +00:00
Lei Huang c517e95bc6 [Power9]Legalize and emit code for truncate and convert QP to DW
Legalize and emit code for:

  * xscvqpsdz : VSX Scalar truncate & Convert Quad-Precision to Signed Dword
  * xscvqpudz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Dword

Differential Revision: https://reviews.llvm.org/D45553

llvm-svn: 331787
2018-05-08 18:23:31 +00:00
Lei Huang c29229a644 [PowerPC] Unify handling for conversion of FP_TO_INT feeding a store
Existing DAG combine only handles conversions for FP_TO_SINT:
"{f32, f64} x { i32, i16 }"

This patch simplifies the code to handle:
"{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }"

Differential Revision: https://reviews.llvm.org/D46102

llvm-svn: 331778
2018-05-08 17:36:40 +00:00
Nemanja Ivanovic 61ffbf21cd Commit r331416 breaks the big-endian PPC bot. On the big endian build, we
actually encounter constants wider than 64-bits. Add the guard to prevent
tripping the assert.

llvm-svn: 331420
2018-05-03 01:04:13 +00:00
Nemanja Ivanovic 01e2e79abf [PowerPC] Implement isMaskAndCmp0FoldingBeneficial
Sinking the and closer to a compare against zero is beneficial on PPC as it
allows us to emit record-form instructions. In the future, we may expand this
to a larger set of operations that feed compares against zero since PPC has
lots of record-form instructions.

Differential revision: https://reviews.llvm.org/D46060

llvm-svn: 331416
2018-05-02 23:55:23 +00:00
Nemanja Ivanovic 2139e99e47 [PowerPC] No CTR loop if the candidate exiting block is in a different loop
The CTR loops pass will insert the decrementing branch instruction in an exiting
block for the loop being transformed. However if that block is part of another
loop as well (whether a nested loop or with irreducible CFG), it is not valid
to use that exiting block. In fact, if the loop hass irreducible CFG, we don't
bother analyzing it and we just bail on the transformation. In practice, this
doesn't lead to a noticeable reduction in the number of loops transformed by
this pass.

Fixes https://bugs.llvm.org/show_bug.cgi?id=37229

Differential Revision: https://reviews.llvm.org/D46162

llvm-svn: 331410
2018-05-02 22:56:04 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Nico Weber 432a38838d IWYU for llvm-config.h in llvm, additions.
See r331124 for how I made a list of files missing the include.
I then ran this Python script:

    for f in open('filelist.txt'):
        f = f.strip()
        fl = open(f).readlines()

        found = False
        for i in xrange(len(fl)):
            p = '#include "llvm/'
            if not fl[i].startswith(p):
                continue
            if fl[i][len(p):] > 'Config':
                fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
                found = True
                break
        if not found:
            print 'not found', f
        else:
            open(f, 'w').write(''.join(fl))

and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.

No intended behavior change.

llvm-svn: 331184
2018-04-30 14:59:11 +00:00
Nico Weber 5d53aed419 Consistently sort add_subdirectory calls in lib/Target/*/CMakeLists.txt
llvm-svn: 330584
2018-04-23 12:49:34 +00:00
Hiroshi Inoue 33486787cb [PowerPC] fix incorrect vectorization of abs() on POWER9
Vectorized loops with abs() returns incorrect results on POWER9. This patch fixes it.
For example the following code returns negative result if input values are negative though it sums up the absolute value of the inputs.

int vpx_satd_c(const int16_t *coeff, int length) {
  int satd = 0;
  for (int i = 0; i < length; ++i) satd += abs(coeff[i]);
  return satd;
}

This problem causes test failures for libvpx.
For vector absolute and vector absolute difference on POWER9, LLVM generates VABSDUW (Vector Absolute Difference Unsigned Word) instruction or variants.
Since these instructions are for unsigned integers, we need adjustment for signed integers.
For abs(sub(a, b)), we generate VABSDUW(a+0x80000000, b+0x80000000). Otherwise, abs(sub(-1, 0)) returns 0xFFFFFFFF(=-1) instead of 1. For abs(a), we generate VABSDUW(a+0x80000000, 0x80000000).

Differential Revision: https://reviews.llvm.org/D45522

llvm-svn: 330497
2018-04-21 09:32:17 +00:00
Lei Huang 192c6ccf6d [Power9]Legalize and emit code for converting Unsigned HWord/Char to Quad-Precision
Legalize and emit code for converting unsigned HWord/Char to QP:

xscvsdqp
xscvudqp

Only covering patterns for unsigned forms cause we don't have part-word
sign-extending integer loads into VSX registers.

Differential Revision: https://reviews.llvm.org/D45494

llvm-svn: 330278
2018-04-18 17:41:46 +00:00
Lei Huang 198e678576 [Power9]Legalize and emit code for converting (Un)Signed Word to Quad-Precision
Legalize and emit code for converting (Un)Signed Word to quad-precision via:

xscvsdqp
xscvudqp

Differential Revision: https://reviews.llvm.org/D45389

llvm-svn: 330273
2018-04-18 16:34:22 +00:00
Lei Huang 42ab1d3d03 [NFC] Move verificaiton check for f128 conversion into LowerINT_TO_FP()
Move veriication check for legal conversions to f128 into LowerINT_TO_FP()
and fix some indentations to match other sections of the code for readability.

llvm-svn: 330138
2018-04-16 17:30:24 +00:00
Stefan Pintilie 118b8675c5 [Power9] Add the TLS store instructions to the Power 9 model
The Power 9 scheduler model should now include the TLS instructions.
We can now, once again, mark the model as complete.
From now on, if instructions are added to Power 9 but are not
added to the model the build should produce an error. Hopefully
that will alert the developer who is adding new instructions
that they should also be added to the scheulder model.

llvm-svn: 330060
2018-04-13 19:49:58 +00:00
Lei Huang 10367eb422 [Power9]Legalize and emit code for converting (Un)Signed DWord to Quad-Precision
Legalize and emit code for:

  * xscvsdqp
  * xscvudqp

Differential Revision: https://reviews.llvm.org/D45230

llvm-svn: 329931
2018-04-12 18:00:14 +00:00
Nemanja Ivanovic c564dc060a [PowerPC] Fix condition for 64-bit rotate when replacing r+r instr with r+i
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).

Patch by Josh Stone.

llvm-svn: 329852
2018-04-11 21:25:44 +00:00
Mandeep Singh Grang 327fd5e47c [PowerPC] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.

Reviewers: hfinkel, RKSimon

Reviewed By: RKSimon

Subscribers: nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D44870

llvm-svn: 329535
2018-04-08 16:45:04 +00:00
Hiroshi Inoue a2eefb6d9a [PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset
VSX D-form load/store instructions of POWER9 require the offset be a multiple of 16 and a helper`isOffsetMultipleOf` is used to check this.
So far, the helper handles FrameIndex + offset case, but not handling FrameIndex without offset case. Due to this, we are missing opportunities to exploit D-form instructions when accessing an object or array allocated on stack.
For example, x-form store (stxvx) is used for int a[4] = {0}; instead of d-form store (stxv). For larger arrays, D-form instruction is not used when accessing the first 16-byte. Using D-form instructions reduces register pressure as well as instructions.

Differential Revision: https://reviews.llvm.org/D45079

llvm-svn: 329377
2018-04-06 05:41:16 +00:00
Hiroshi Inoue bbf98aea83 [PowerPC] fix assertion failure due to missing instruction in P9InstrResources.td
This patch adds L(W|H|B)ZXTLS_32 instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td.

llvm-svn: 329299
2018-04-05 15:27:06 +00:00
Simon Pilgrim 1d793b8ac5 [SchedModel] Complete models shouldn't match against itineraries when they don't use them (PR35639)
For schedule models that don't use itineraries, checkCompleteness still checks that an instruction has a matching itinerary instead of skipping and going straight to matching the InstRWs. That doesn't seem to match what happens in TargetSchedule.cpp

This patch causes problems for a number of models that had been incorrectly flagged as complete.

Differential Revision: https://reviews.llvm.org/D43235

llvm-svn: 329280
2018-04-05 13:11:36 +00:00
Lei Huang 09fda63af0 [Power9]Legalize and emit code for quad-precision fma instructions
Legalize and emit code for the following quad-precision fma:

  * xsmaddqp
  * xsnmaddqp
  * xsmsubqp
  * xsnmsubqp

Differential Revision: https://reviews.llvm.org/D44843

llvm-svn: 329206
2018-04-04 16:43:50 +00:00
Nico Weber 1cbd096914 Sort targetgen calls in lib/Target/*/CMakeLists.
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.

Also remove some stale-looking comments from the Nios2 target cmakefile.

No intended behavior change.

llvm-svn: 329181
2018-04-04 12:37:44 +00:00
Hiroshi Inoue 08a1775f28 [PowerPC] reorder entries in P9InstrResources.td in alphabetical order; NFC
Reorder entries added in my previous commit (rL328969) to keep alphabetical order.

llvm-svn: 329064
2018-04-03 12:49:42 +00:00
Hiroshi Inoue 6d48493817 [PowerPC] fix assertion failure due to missing instruction in P9InstrResources.td
This patch adds L(D|W|H|B)XTLS instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td.

llvm-svn: 328969
2018-04-02 12:18:21 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
David Blaikie 8ad9a97310 Plumb useAA through TargetTransformInfo to remove Transforms->CodeGen header dependency
Thanks to echristo for the pointers on direction.

llvm-svn: 328737
2018-03-28 22:28:50 +00:00
David Blaikie a373d18eb7 Transforms: Introduce Transforms/Utils.h rather than spreading the declarations amongst Scalar.h and IPO.h
Fixes layering - Transforms/Utils shouldn't depend on including a Scalar
or IPO header, because Scalar and IPO depend on Utils.

llvm-svn: 328717
2018-03-28 17:44:36 +00:00
Sterling Augustine 33dc01861a Initialize variable added in r328617.
llvm-svn: 328667
2018-03-27 21:11:57 +00:00
Stefan Pintilie 659f040351 [Power9] Fix the resource list for the COPY instruction.
The COPY instruction was listed as a 4 cycle instruction.
It is now listed correctly as a 2 cycle ALU instruction.

llvm-svn: 328647
2018-03-27 17:51:53 +00:00
Strahinja Petrovic 06cf6a6490 [PowerPC] Secure PLT support
This patch supports secure PLT mode for PowerPC 32 architecture.

Differential Revision: https://reviews.llvm.org/D42112

llvm-svn: 328617
2018-03-27 11:23:53 +00:00
Lei Huang be0afb0870 [Power9]Legalize and emit code for quad-precision convert from double-precision
Legalize and emit code for quad-precision floating point operation xscvdpqp
and add option to guard the quad precision operation support.

Differential Revision: https://reviews.llvm.org/D44746

llvm-svn: 328558
2018-03-26 17:46:25 +00:00
Stefan Pintilie 26d4f923c4 [PowerPC] Infrastructure work. Implement getting the opcode for a spill in one place.
A new function getOpcodeForSpill should now be the only place to get
the opcode for a given spilled register.

Differential Revision: https://reviews.llvm.org/D43086

llvm-svn: 328556
2018-03-26 17:39:18 +00:00
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
David Blaikie 6054e650ff Move TargetLoweringObjectFile from CodeGen to Target to fix layering
It's implemented in Target & include from other Target headers, so the
header should be in Target.

llvm-svn: 328392
2018-03-23 23:58:19 +00:00
Zaara Syeda 6535993625 Re-commit: [MachineLICM] Add functions to MachineLICM to hoist invariant stores
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.

Differential Revision: https://reviews.llvm.org/D40196

llvm-svn: 328326
2018-03-23 15:28:15 +00:00
David Blaikie 2be3922807 Fix a couple of layering violations in Transforms
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.

Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.

Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.

llvm-svn: 328165
2018-03-21 22:34:23 +00:00
Craig Topper c2dbd677bd [PowerPC][LegalizeFloatTypes] Move the PPC hacks for (i32 fp_to_sint/fp_to_uint (ppcf128 X)) out of LegalizeFloatTypes and into PPC specific code
I'm not entirely sure these hacks are still needed. If you remove the hacks completely, the name of the library call that gets generated doesn't match the grep the test previously had. So the test wasn't really checking anything.

If the hack is still needed it belongs in PPC specific code. I believe the FP_TO_SINT code here is the only place in the tree where a FP_ROUND_INREG node is created today. And I don't think its even being used correctly because the legalization returned a BUILD_PAIR with the same value twice. That doesn't seem right to me. By moving the code entirely to PPC we can avoid creating the FP_ROUND_INREG at all.

I replaced the grep in the existing test with full checks generated by hacking update_llc_test_check.py to support ppc32 just long enough to generate it.

Differential Revision: https://reviews.llvm.org/D44061

llvm-svn: 328017
2018-03-20 18:49:28 +00:00
Lei Huang ecfede94a7 [Power9]Legalize and emit code for quad-precision copySign/abs/nabs/neg/sqrt
Legalize and emit code for quad-precision floating point operations:

  * xscpsgnqp
  * xsabsqp
  * xsnabsqp
  * xsnegqp
  * xssqrtqp

Differential Revision: https://reviews.llvm.org/D44530

llvm-svn: 327889
2018-03-19 19:22:52 +00:00
Lei Huang 6d1596a98c [PowerPC][Power9]Legalize and emit code for quad-precision add/div/mul/sub
Legalize and emit code for quad-precision floating point operations:

  * xsaddqp
  * xssubqp
  * xsdivqp
  * xsmulqp

Differential Revision: https://reviews.llvm.org/D44506

llvm-svn: 327878
2018-03-19 18:52:20 +00:00
Nemanja Ivanovic d9d5bd3067 [PowerPC] Make AddrSpaceCast noop
PowerPC targets do not use address spaces. As a result, we can get selection
failures with address space casts. This patch makes those casts noops.

Patch by Valentin Churavy.

Differential revision: https://reviews.llvm.org/D43781

llvm-svn: 327877
2018-03-19 18:50:02 +00:00
Zaara Syeda 01f414baaa Revert [MachineLICM] This reverts commit rL327856
Failing build bots. Revert the commit now.

llvm-svn: 327864
2018-03-19 16:19:44 +00:00
Zaara Syeda ff05e2b0e6 [MachineLICM] Add functions to MachineLICM to hoist invariant stores
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.

Differential Revision: https://reviews.llvm.org/D40196

llvm-svn: 327856
2018-03-19 14:52:25 +00:00
Nicolai Haehnle 18f1998a00 TableGen: Explicitly test some cases of self-references and !cast errors
Summary:
These are cases of self-references that exist today in practice. Let's
add tests for them to avoid regressions.

The self-references in PPCInstrInfo.td can be expressed in a simpler
way. Allowing this type of self-reference while at the same time
consistently doing late-resolve even for self-references is problematic
because there are references to fields that aren't in any class. Since
there's no need for this type of self-reference anyway, let's just
remove it.

Change-Id: I914e0b3e1ae7adae33855fac409b536879bc3f62

Reviewers: arsenm, craig.topper, tra, MartinO

Subscribers: nemanjai, wdng, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D44474

llvm-svn: 327848
2018-03-19 14:14:10 +00:00
Guozhi Wei 9c916584ba [PPC] Avoid non-simple MVT in STBRX optimization
PR35402 triggered this case. It bswap and stores a 48bit value, current STBRX optimization transforms it into STBRX. Unfortunately 48bit is not a simple MVT, there is no PPC instruction to support it, and it can't be automatically expanded by llvm, so caused a crash.

This patch detects the non-simple MVT and returns early.

Differential Revision: https://reviews.llvm.org/D44500

llvm-svn: 327651
2018-03-15 17:49:12 +00:00
Zaara Syeda 1110c4d336 [PowerPC] Optimize TLS initial-exec sequence to use X-Form loads/stores
This patch adds new load/store instructions for integer scalar types
which can be used for X-Form when fed by add with an @tls relocation.

Differential Revision: https://reviews.llvm.org/D43315

llvm-svn: 327635
2018-03-15 15:34:41 +00:00
Lei Huang 1f8da3ae19 [PowerPC][NFC] formatting-only fix
llvm-svn: 327599
2018-03-15 03:06:44 +00:00
Zaara Syeda df28fb6ac2 test commit: fix formatting of a comment
This is  a simple change to do the test commit.

llvm-svn: 327412
2018-03-13 15:49:05 +00:00
Lei Huang cd4f385795 [PowerPC][NFC] Explicitly state types on FP SDAG patterns in anticipation of adding the f128 type
llvm-svn: 327319
2018-03-12 19:26:18 +00:00
Stefan Pintilie 735817aa1a [Power9] Code Cleaup and adding Comments for Power 9 Scheduler
Did some code cleanup up removing ItinRW that are not needed and resource types
that are no longer used.

Also added more comments to the td files related to the Power 9 sheduler model.

llvm-svn: 327174
2018-03-09 21:08:35 +00:00
Stefan Pintilie ef7c4976bb Revert "[PowerPC] LSR tunings for PowerPC"
Revert the rest of the LST tune commit.
It seems that the LSR tune commit breaks internal tests.
Reverting the commit.

llvm-svn: 327143
2018-03-09 16:08:55 +00:00
Stefan Pintilie 235fb927b0 [Power9] Add more missing instructions to the Power 9 scheduler
With this patch we should be able to mark the Power 9 model as complete.

llvm-svn: 327021
2018-03-08 16:24:33 +00:00
Stefan Pintilie f8438e8e59 [PowerPC] LSR tunings for PowerPC
The purpose of this patch is to have LSR generate better code on Power.
This is done by overriding isLSRCostLess.

Differential Revision: https://reviews.llvm.org/D40855

llvm-svn: 326906
2018-03-07 16:53:09 +00:00
Craig Topper 80d3bb3b4b [TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC
The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have.

There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run.

A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does.

llvm-svn: 326832
2018-03-06 19:44:52 +00:00
Nemanja Ivanovic 6cc31ca814 [PowerPC] Do not emit record-form rotates when record-form andi suffices
Up until Power9, the performance profile for rlwinm., rldicl. and andi. looked
more or less equivalent. However with Power9, the rotates are still 2-way
cracked whereas the and-immediate is not.

This patch just ensures that we don't emit record-form rotates when an andi.
is adequate.

As first pointed out by Carrot in https://bugs.llvm.org/show_bug.cgi?id=30833
(this patch is a fix for that PR).

Differential Revision: https://reviews.llvm.org/D43977

llvm-svn: 326736
2018-03-05 19:27:16 +00:00
Stefan Pintilie d45db612c6 [Power9] Add more missing instructions to the Power 9 scheduler
Adding more instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

llvm-svn: 326701
2018-03-05 14:34:59 +00:00
Stefan Pintilie b5a9440a80 [Power9] Add missing instructions to the Power 9 scheduler
Adding more instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

llvm-svn: 326578
2018-03-02 14:41:38 +00:00
Stefan Pintilie e894e0ff6f [Power9] Add missing instructions to the Power 9 scheduler
Adding more instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

Differential Revision: https://reviews.llvm.org/D43899

llvm-svn: 326447
2018-03-01 16:16:08 +00:00
Chih-Hung Hsieh 9f9e4681ac [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00
Nemanja Ivanovic bcc82c9a78 [PowerPC] Disable shrink-wrapping when getting PC address through the LR
The instruction sequence used to get the address of the PC into a GPR requires
that we clobber the link register. Doing so without having first saved it in
the prologue leaves the function unable to return. Currently, this sequence is
emitted into the entry block. To ensure the prologue is inserted before this
sequence, disable shrink-wrapping.

This fixes PR33547.

Differential Revision: https://reviews.llvm.org/D43677

llvm-svn: 325972
2018-02-23 23:08:34 +00:00
Stefan Pintilie 626b651016 [Power9] Add missing instructions to the Power 9 scheduler
This is the first in a series of patches that will define more
instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.

Differential Revision: https://reviews.llvm.org/D43635

llvm-svn: 325956
2018-02-23 20:37:10 +00:00
Geoff Berry f8bf2ec0a8 [MachineOperand][Target] MachineOperand::isRenamable semantics changes
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers.  This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.

Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).

Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.

Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.

Clear the IsRenamable bit when changing an operand's register value.

Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.

Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.

Reviewers: qcolombet, MatzeB

Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D43042

llvm-svn: 325931
2018-02-23 18:25:08 +00:00
Stefan Pintilie 15e6b10ee0 [PowerPC] Code cleanup. Remove instructions that were withdrawn from Power 9.
The following set of instructions was originally planned to be added for Power 9
and so code was added to support them. However, a decision was made later on to
withdraw support for these instructions in the hardware.
xscmpnedp
xvcmpnesp
xvcmpnedp
This patch removes support for the instructions that were not added.

Differential Revision: https://reviews.llvm.org/D43641

llvm-svn: 325918
2018-02-23 15:55:16 +00:00
Nemanja Ivanovic e54a9ee8ac [PowerPC] Do not produce invalid CTR loop with an FRem
An FRem instruction inside a loop should prevent the loop from being converted
into a CTR loop since this is not an operation that is legal on any PPC
subtarget. This will always be a call to a library function which means the
loop will be invalid if this instruction is in the body.

Fixes PR36292.

llvm-svn: 325739
2018-02-22 03:02:41 +00:00
Lei Huang dfd41552f4 [PowerPC] Reduce stack frame for fastcc functions by only allocating parameter save area when needed
Current implementation always allocates the parameter save area conservatively
for fastcc functions. There is no reason to allocate the parameter save area if
all the parameters can be passed via registers.

Differential Revision: https://reviews.llvm.org/D42602

llvm-svn: 325581
2018-02-20 15:09:45 +00:00
Nemanja Ivanovic 6cf41b028d [PowerPC] Fix transform in table gen file causing UB
Running a bootstrap build with UBSan produces a number of instances where
we have signed integer overflow due to this transform. Change the type to
long to prevent this UB on 64-bit build machines.

llvm-svn: 325347
2018-02-16 14:49:01 +00:00
Francis Visoiu Mistrih fb7b14f70d [CodeGen] Unify the syntax of MBB liveins in MIR and -debug output
Instead of:

Live Ins: %r0 %r1

print:

liveins: %r0, %r1
llvm-svn: 324694
2018-02-09 01:14:44 +00:00
Hiroshi Inoue ad48d2fe61 [PowerPC] fix up in rL324229, NFC
This patch fixes up my previous commit (add initialization of local variables).

llvm-svn: 324336
2018-02-06 11:34:16 +00:00
Hiroshi Inoue c5ab1ab797 [PowerPC] Check hot loop exit edge in PPCCTRLoops
PPCCTRLoops transform loops using mtctr/bdnz instructions if loop trip count is known and big enough to compensate for the cost of mtctr.
But if there is a loop exit edge which is known to be frequently taken (by builtin_expect or by PGO), we should not transform the loop to avoid the cost of mtctr instruction. Here is an example of a loop with hot exit edge:

for (unsigned i = 0; i < TripCount; i++) {
  // do something
  if (__builtin_expect(check(), 1))
    break;
  // do something
}

Differential Revision: https://reviews.llvm.org/D42637

llvm-svn: 324229
2018-02-05 12:25:29 +00:00
David Blaikie d8a6f93aae Remove non-modular header containing static utility functions
The one place that uses these functions isn't particularly
long/complicated, so it's easier to just have these inline at that
location than trying to split it out into a true header. (in part also
because of the use of the DEBUG macros, which make this not really a
standalone header even if the static functions were made inline instead)

llvm-svn: 324044
2018-02-02 00:33:50 +00:00
Nemanja Ivanovic 77e34f15c9 [PowerPC] Tell VSX swap removal that scalar conversions are lane-sensitive
This is a rather non-controversial change. We were missing these instructions
from the list of instructions that are lane-sensitive. These two put the result
into lane 0 (BE) or 3 (LE) regardless of the input. This patch fixes PR36068.

llvm-svn: 324005
2018-02-01 21:09:04 +00:00
Jonas Paulsson e6a8329e9f [PowerPC] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Nemanja Ivanovic
llvm-svn: 323858
2018-01-31 09:26:51 +00:00
Zaara Syeda 1f59ae311b Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This recommits r322721 reverted due to sanitizer memory leak build bot failures.

Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 323778
2018-01-30 16:17:22 +00:00
Hiroshi Inoue c8e9245816 [NFC] fix trivial typos in comments and documents
"to to" -> "to"

llvm-svn: 323628
2018-01-29 05:17:03 +00:00
Tim Shen 7abe9887b0 [PPC] Avoid incorrect fp-i128-fp lowering.
Summary:
Fix an issue that's similar to what D41411 fixed:
  float(__int128(float_var)) shouldn't be optimized to xscvdpsxds +
  xscvsxdsp, as they mean (float)(int64_t)float_var.

Reviewers: jtony, hfinkel, echristo

Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D42400

llvm-svn: 323270
2018-01-23 22:06:57 +00:00
Zaara Syeda c9dc7b451b Revert [PowerPC] This reverts commit rL322721
Failing build bots. Revert the commit now.

llvm-svn: 322748
2018-01-17 20:00:15 +00:00
Zaara Syeda 8e951fd2f6 [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 322721
2018-01-17 18:22:55 +00:00
Guozhi Wei e6fb4e1f8a [PPC] Add a new register XER aliased to CARRY
When "xer" is specified as clobbered register in inline assembler, clang can accept it, but llvm simply ignore it when lowered to machine instructions. It may cause problems later in scheduler.

This patch adds a new register XER aliased to CARRY, and adds it to register class CARRYRC. Now PPCTargetLowering::getRegForInlineAsmConstraint can return correct register number for inline asm constraint "{xer}", and scheduler behave correctly.

Differential Revision: https://reviews.llvm.org/D41967

llvm-svn: 322591
2018-01-16 19:28:50 +00:00
Benjamin Kramer 309124e0b1 [PowerPC] Don't miscompile rotate+mask into an ANDIo if it can't recreate the immediate
I'm not even sure if this transform is ever worth it, but this at least
stops the bleeding.

llvm-svn: 322373
2018-01-12 15:03:24 +00:00
Nemanja Ivanovic ebb23078e9 [PowerPC] Zero-extend the compare operand for ATOMIC_CMP_SWAP
Part of the fix for https://bugs.llvm.org/show_bug.cgi?id=35812.
This patch ensures that the compare operand for the atomic compare and swap
is properly zero-extended to 32 bits if applicable.
A follow-up commit will fix the extension for the SETCC node generated when
expanding an ATOMIC_CMP_SWAP_WITH_SUCCESS. That will complete the bug fix.

Differential Revision: https://reviews.llvm.org/D41856

llvm-svn: 322372
2018-01-12 14:58:41 +00:00
Stefan Pintilie 70bfe66111 Revert "[PowerPC] Manually schedule the prologue and epilogue"
This reverts commit r322124 since some tests were broken by that patch.
Will recommmit once the patch is fixed.

llvm-svn: 322369
2018-01-12 13:12:49 +00:00
Stefan Pintilie 1712700842 [PowerPC] Manually schedule the prologue and epilogue
This patch makes the following changes to the schedule of instructions in the
prologue and epilogue.

The stack pointer update is moved down in the prologue so that the callee saves
do not have to wait for the update to happen.
Saving the lr is moved down in the prologue to hide the latency of the mflr.
The stack pointer is moved up in the epilogue so that restoring of the lr can
happen sooner.
The mtlr is moved up in the epilogue so that it is away form the blr at the end
of the epilogue. The latency of the mtlr can now be hidden by the loads of the
callee saved registers.

This commit is almost identical to this one: r322036 except that two warnings
that broke build bots have been fixed.

The revision number is D41737 as before.

llvm-svn: 322124
2018-01-09 21:57:49 +00:00
Sean Fertile 33a17762bb [PowerPC] Can not assume an intrinsic argument is a simple type.
The CTRLoop pass performs checks on the argument of certain libcalls/intrinsics,
and assumes the arguments must be of a simple type. This isn't always the case
though. For example if we unroll and vectorize a loop we may end up with vectors
larger then the largest legal type, along with intrinsics that operate on those
wider types. This happened in the ffmpeg build, where we unrolled a loop and
ended up with a sqrt intrinsic that operated on V16f64, triggering an assertion.

Differential Revision: https://reviews.llvm.org/D41758

llvm-svn: 322055
2018-01-09 03:03:41 +00:00
Stefan Pintilie 7e10987b12 Revert "[PowerPC] Manually schedule the prologue and epilogue"
[PowerPC] This reverts commit r322036.

Failing build bots. Revert the commit now.

llvm-svn: 322051
2018-01-09 01:06:21 +00:00
Stefan Pintilie 55bfdd040a [PowerPC] Manually schedule the prologue and epilogue
This patch makes the following changes to the schedule of instructions in the
prologue and epilogue.

The stack pointer update is moved down in the prologue so that the callee saves
do not have to wait for the update to happen.
Saving the lr is moved down in the prologue to hide the latency of the mflr.
The stack pointer is moved up in the epilogue so that restoring of the lr can
happen sooner.
The mtlr is moved up in the epilogue so that it is away form the blr at the end
of the epilogue. The latency of the mtlr can now be hidden by the loads of the
callee saved registers.

Differential Revision: https://reviews.llvm.org/D41737

llvm-svn: 322036
2018-01-08 22:23:10 +00:00
Craig Topper d461aefe5f [PowerPC] Add an ISD::TRUNCATE to the legalization for ppc_is_decremented_ctr_nonzero
Summary:
I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work.

There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed.

With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: nemanjai, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D41654

llvm-svn: 321959
2018-01-07 07:51:36 +00:00
Alex Bradbury b22f751fa7 Thread MCSubtargetInfo through Target::createMCAsmBackend
Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend. 
D20830 threaded an MCSubtargetInfo reference through 
MCAsmBackend::relaxInstruction, but this isn't the only function that would 
benefit from access. This patch removes the Triple and CPUString arguments 
from createMCAsmBackend and replaces them with MCSubtargetInfo.

This patch just changes the interface without making any intentional 
functional changes. Once in, several cleanups are possible:
* Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
* Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
* Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
* Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)

This change initially exposed PR35686, which has since been resolved in r321026.

Differential Revision: https://reviews.llvm.org/D41349

llvm-svn: 321692
2018-01-03 08:53:05 +00:00
Hiroshi Inoue ca3cdd7f27 [PowerPC] fix a bug in TCO eligibility check
If the callee and caller use different calling convensions, we cannot apply TCO if the callee requires arguments on stack; e.g. C calling convention and Fast CC use the same registers for parameter passing, but the stack offset is not necessarily same.

This patch also recommit r319218 "[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions." by @sfertile since the problem reported in r320106 should be fixed.

Differential Revision: https://reviews.llvm.org/D40893

llvm-svn: 321579
2017-12-30 08:09:04 +00:00
Nemanja Ivanovic 4e1f5e0734 [PowerPC] Fix for PR35688 - handle out-of-range values for r+r to r+i conversion
Revision 320791 introduced a pass that transforms reg+reg instructions to
reg+imm if they're fed by "load immediate". However, it didn't
handle out-of-range shifts correctly as reported in PR35688.
This patch fixes that and therefore the PR.

Furthermore, there was undefined behaviour in the patch where the RHS of an
initialization expression was 32 bits and constant `1` was shifted left 32
bits. This was fixed by ensuring the RHS is 64 bits just like the LHS.

Differential Revision: https://reviews.llvm.org/D41369

llvm-svn: 321551
2017-12-29 12:22:27 +00:00
Sanjoy Das 26d11ca4b0 (Re-landing) Expose a TargetMachine::getTargetTransformInfo function
Re-land r321234.  It had to be reverted because it broke the shared
library build.  The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target.  As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).

Original commit message:

This makes the TargetMachine interface a bit simpler.  We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.

See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.

Reviewers: echristo, MatzeB, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D41464

llvm-svn: 321375
2017-12-22 18:21:59 +00:00
Tony Jiang eba757e45c [PowerPC] Fix parest build failure in SPEC2017.
The build failure was caused by an assertion in pre-legalization DAGCombine:

Combining: t6: ppcf128 = uint_to_fp t5
... into: t20: f32 = PPCISD::FCFIDUS t19

which is clearly wrong since ppcf128 are definitely different type with f32 and
we cannot change the node value type when do DAGCombine. The fix is don't
handle ppc_fp128 or i1 conversions in PPCTargetLowering::combineFPToIntToFP and
leave it to downstream to legalize it and expand it to small legal types.

Differential Revision: https://reviews.llvm.org/D41411

llvm-svn: 321276
2017-12-21 15:42:50 +00:00
Sanjoy Das 747d1114d6 Revert "Expose a TargetMachine::getTargetTransformInfo function"
This reverts commit r321234.  It breaks the -DBUILD_SHARED_LIBS=ON build.

llvm-svn: 321243
2017-12-21 02:34:39 +00:00
Sanjoy Das 0c3de350b4 Expose a TargetMachine::getTargetTransformInfo function
Summary:
This makes the TargetMachine interface a bit simpler.  We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.

See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.

Reviewers: echristo, MatzeB, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D41464

llvm-svn: 321234
2017-12-21 01:06:58 +00:00
Stefan Pintilie 4241821848 [PowerPC] Added an assert to make sure that the MBBI iterator is valid.
The function createTailCallBranchInstr assumes that the iterator MBBI is valid.
However, only one use of MBBI is guarded in the function.
Fix this by adding an assert.

Differential Revision: https://reviews.llvm.org/D41358

llvm-svn: 321205
2017-12-20 19:07:44 +00:00
Hiroshi Inoue 11e571e0c6 [PowerPC] fix a bug in redundant compare elimination
This patch fixes a bug in the redundant compare elimination reported in https://reviews.llvm.org/rL320786 and re-enables the optimization.

The redundant compare elimination assumes that we can replace signed comparison with unsigned comparison for the equality check. But due to the difference in the sign extension behavior we cannot change the opcode if the comparison is against an immediate and the most significant bit of the immediate is one.

Differential Revision: https://reviews.llvm.org/D41385

llvm-svn: 321147
2017-12-20 05:18:19 +00:00
Benjamin Kramer efc7c88ea8 [PPC] Also disable the pre-emit version of reg+reg to reg+imm transformation.
This has the same issue as the early pass disabled in r321010.

llvm-svn: 321013
2017-12-18 19:21:56 +00:00
Benjamin Kramer f4cc67acb6 [PPC] Disable reg+reg to reg+imm transformation.
It creates invalid instructions. PR35688.

llvm-svn: 321010
2017-12-18 18:56:57 +00:00
Hal Finkel e86a8b79b5 [PowerPC, AsmParser] Enable the mnemonic spell corrector
r307148 added an assembly mnemonic spelling correction support and enabled it
on ARM. This enables that support on PowerPC as well.

Patch by Dmitry Venikov, thanks!

Differential Revision: https://reviews.llvm.org/D40552

llvm-svn: 320911
2017-12-16 02:42:18 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Nemanja Ivanovic 6ab32dea12 Fix the second build bot break introduced by r320791.
llvm-svn: 320811
2017-12-15 14:17:45 +00:00
Nemanja Ivanovic 1794cdc481 Fix code causing fallthrough warnings in the PPC back end.
llvm-svn: 320806
2017-12-15 11:47:48 +00:00
Nemanja Ivanovic 74ecf59cc0 Fix the build bot break introduced by r320791.
llvm-svn: 320798
2017-12-15 09:51:34 +00:00
Nemanja Ivanovic 6995e5dae7 [PowerPC] Convert r+r instructions to r+i (pre and post RA)
This patch adds the necessary infrastructure to convert instructions that
take two register operands to those that take a register and immediate if
the necessary operand is produced by a load-immediate. Furthermore, it uses
this infrastructure to perform such conversions twice - first at MachineSSA
and then pre-emit.

There are a number of reasons we may end up with opportunities for this
transformation, including but not limited to:
- X-Form instructions chosen since the exact offset isn't available at ISEL time
- Atomic instructions with constant operands (we will add patterns for this
  in the future)
- Tail duplication may duplicate code where one block contains this redundancy
- When emitting compare-free code in PPCDAGToDAGISel, we don't handle constant
  comparands specially

Furthermore, this patch moves the initialization of PPCMIPeepholePass so that
it can be used for MIR tests.

llvm-svn: 320791
2017-12-15 07:27:53 +00:00
Nemanja Ivanovic 0d47d32caa Disabling r312514 as it causes miscompiles that show up on bootstrap
The compare elimination peephole introduced in https://reviews.llvm.org/rL312514
causes a miscompile in AMDGPUInstrInfo.cpp which in turn causes some AMDGPU
test case failures in stage2 bootstrap testing. This miscompile didn't cause any
test case failures until https://reviews.llvm.org/rL320614, so it appeared as if
that patch caused these failures.
Disabling this transformation for now to bring the build bots back to green and
the author of the patch will investigate the miscompile.

llvm-svn: 320786
2017-12-15 01:38:03 +00:00
Matt Arsenault 7d7adf4f2e TLI: Allow using PSV for intrinsic mem operands
llvm-svn: 320756
2017-12-14 22:34:10 +00:00
Matt Arsenault 1117133687 DAG: Expose all MMO flags in getTgtMemIntrinsic
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.

On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.

llvm-svn: 320746
2017-12-14 21:39:51 +00:00
Francis Visoiu Mistrih 5df3bbf3e6 [CodeGen] Print global addresses as @foo in both MIR and debug output
Work towards the unification of MIR and debug output by printing
`@foo` instead of `<ga:@foo>`.

Also print target flags in the MIR format since most of them are used on
global address operands.

Only debug syntax is affected.

llvm-svn: 320682
2017-12-14 10:03:09 +00:00
Nemanja Ivanovic 6af7524063 Fix link failure on one build bot introduced by r320584.
llvm-svn: 320589
2017-12-13 15:28:01 +00:00
Nemanja Ivanovic 6f590bf8bb [PowerPC] MachineSSA pass to reduce the number of CR-logical operations
The initial implementation of an MI SSA pass to reduce cr-logical operations.
Currently, the only operations handled by the pass are binary operations where
both CR-inputs come from the same block and the single use is a conditional
branch (also in the same block).

Committing this off by default to allow for a period of field testing. Will
enable it by default in a follow-up patch soon.

Differential Revision: https://reviews.llvm.org/D30431

llvm-svn: 320584
2017-12-13 14:47:35 +00:00
Craig Topper ac59db2efe [Targets] Don't automatically include the scheduler class enum from *GenInstrInfo.inc with GET_INSTRINFO_ENUM. Make targets request is separately.
Most of the targets don't need the scheduler class enum.

I have an X86 scheduler model change that causes some names in the enum to become about 18000 characters long. This is because using instregex in scheduler models causes the scheduler class to get named with every instruction that matches the regex concatenated together. MSVC has a limit of 4096 characters for an identifier name. Rather than trying to come up with way to reduce the name length, I'm just going to sidestep the problem by not including the enum in X86.

llvm-svn: 320552
2017-12-13 07:26:17 +00:00
Matthias Braun f842297d50 Rename LiveIntervalAnalysis.h to LiveIntervals.h
Headers/Implementation files should be named after the class they
declare/define.

Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in
favor of `class LiveIntarvals;`

llvm-svn: 320546
2017-12-13 02:51:04 +00:00
Nemanja Ivanovic 6479c72fcd [PowerPC] Add branch flag on asm parser-only branch instructions
This flag was missing but it wasn't an issue as nothing depended on it
for these asm parser-only instructions. Now that LLDB support is slowly
landing, it is important to get this right.
Committing on behalf of Leonardo Bianconi.

Differential revision: https://reviews.llvm.org/D40846

llvm-svn: 320475
2017-12-12 12:33:09 +00:00
Nemanja Ivanovic b0783cccb7 [PowerPC] Follow-up to r318436 to get the missed CSE opportunities
The last of the three patches that https://reviews.llvm.org/D40348 was
broken up into.
Canonicalize the materialization of constants so that they are more likely
to be CSE'd regardless of the bit-width of the use. If a constant can be
materialized using PPC::LI, materialize it the same way always.
For example:
  li 4, -1
  li 4, 255
  li 4, 65535
are equivalent if the uses only use the low byte. Canonicalize it to the
first form.

Differential Revision: https://reviews.llvm.org/D40348

llvm-svn: 320473
2017-12-12 12:09:34 +00:00
Tony Jiang 3b49dc548f [PowerPC] Partially enable the ISEL expansion pass.
The pass to expand ISEL instructions into if-then-else sequences in patch D23630
is currently disabled. This patch partially enable it by always removing the
unnecessary ISELs (all registers used by the ISELs are the same one) and folding
the ISELs which have the same input registers into unconditional copies.

Differential Revision: https://reviews.llvm.org/D40497

llvm-svn: 320414
2017-12-11 20:42:37 +00:00
Nemanja Ivanovic 50d37a1129 [PowerPC] Sign-extend negative constant stores
Second part of https://reviews.llvm.org/D40348.
Revision r318436 has extended all constants feeding a store to 64 bits
to allow for CSE on the SDAG. However, negative constants were zero extended
which made the constant being loaded appear to be a positive value larger than
16 bits. This resulted in long sequences to materialize such constants
rather than simply a "load immediate". This patch just sign-extends those
updated constants so that they remain 16-bit signed immediates if they started
out that way.

llvm-svn: 320368
2017-12-11 14:35:48 +00:00
Tim Northover cf4701bb89 PowerPC: support external pid instructions in MC layer.
This adds assembly & disassembly support for the e500mc "external pid"
instructions.

See https://reviews.llvm.org/D39249.

Patch by vit9696 <vit9696@avp.su>

llvm-svn: 320287
2017-12-10 08:43:19 +00:00
Eric Christopher a469acac03 Temporarily revert "[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions."
It is causing sanitizer failures on llvm tests in a bootstrapped compiler. No bot link since it's currently down, but following up to get the bot up.

This reverts commit r319218.

llvm-svn: 320106
2017-12-07 22:26:19 +00:00
Francis Visoiu Mistrih a8a83d150f [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.
Work towards the unification of MIR and debug output by refactoring the
interfaces.

For MachineOperand::print, keep a simple version that can be easily called
from `dump()`, and a more complex one which will be called from both the
MIRPrinter and MachineInstr::print.

Add extra checks inside MachineOperand for detached operands (operands
with getParent() == nullptr).

https://reviews.llvm.org/D40836

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'

llvm-svn: 320022
2017-12-07 10:40:31 +00:00
Francis Visoiu Mistrih 25528d6de7 [CodeGen] Unify MBB reference format in both MIR and debug output
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.

The MIR printer prints the IR name of a MBB only for block definitions.

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix

Differential Revision: https://reviews.llvm.org/D40422

llvm-svn: 319665
2017-12-04 17:18:51 +00:00
Nemanja Ivanovic 4364513cb2 Follow-up to r319434 to turn the pass on by default
Now that the patch has gone through the buildbot cycle,
turn it on by default.

llvm-svn: 319535
2017-12-01 12:02:59 +00:00
Nemanja Ivanovic db7e77047c [PowerPC] Recommit r314244 with refactoring and off by default
This re-commits everything that was pulled in r314244. The transformation
is off by default (patch to enable it to follow). The code is refactored
to have a single entry-point and provide fine-grained control over patterns
that it selects. This patch also fixes the bugs in the original code.

Everything that failed with the original patch has been re-tested with this
patch (with the transformation turned on). So the patch to turn this on is
soon to follow.

Differential Revision: https://reviews.llvm.org/D38575

llvm-svn: 319434
2017-11-30 13:39:10 +00:00
Francis Visoiu Mistrih 93ef145862 [CodeGen] Print "%vreg0" as "%0" in both MIR and debug output
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).

Basically:

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed

Differential Revision: https://reviews.llvm.org/D40420

llvm-svn: 319427
2017-11-30 12:12:19 +00:00
Joerg Sonnenberger 4b1acff9b3 First step towards more human-friendly PPC assembler output:
- add -ppc-reg-with-percent-prefix option to use %r3 etc as register
  names
- split off logic for Darwinish verbose conditional codes into a helper
  function
- be explicit about Darwin vs AIX vs GNUish assembler flavors

Based on the patch from Alexandre Yukio Yamashita

Differential Revision: https://reviews.llvm.org/D39016

llvm-svn: 319381
2017-11-29 23:05:56 +00:00
Sean Fertile aab3ef76d9 [PowerPC] Relax the checking on AND/AND8 in isSignOrZeroExtended.
Separate the handling of AND/AND8 out from PHI/OR/ISEL checking. The reasoning
is the others need all their operands to be sign/zero extended for their output
to also be sign/zero extended. This is true for AND and sign-extension, but for
zero-extension we only need at least one of the input operands to be zero
extended for the result to also be zero extended.

Differential Revision: https://reviews.llvm.org/D39078

llvm-svn: 319289
2017-11-29 04:09:29 +00:00
Sean Fertile e200016ea9 [PowerPC] Allow tail calls of fastcc functions from C CallingConv functions.
Allow fastcc callees to be tail-called from ccc callers.

Differential Revision: https://reviews.llvm.org/D40355

llvm-svn: 319218
2017-11-28 20:25:58 +00:00
Francis Visoiu Mistrih 9d7bb0cb40 [CodeGen] Print register names in lowercase in both MIR and debug output
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.

* Only debug printing is affected. It now follows MIR.

Differential Revision: https://reviews.llvm.org/D40417

llvm-svn: 319187
2017-11-28 17:15:09 +00:00
Zaara Syeda f94d58d908 [PowerPC] Remove redundant TOC saves
This patch adds a peep hole optimization to remove any redundant toc save
instructions added as part of the call sequence for indirect calls. It removes
any toc saves within a function that are dominated by another toc save.

Differential Revision: https://reviews.llvm.org/D39736

llvm-svn: 319087
2017-11-27 20:26:36 +00:00
Zaara Syeda 48cb3c1557 [Power9] Improvements to vector extract with variable index exploitation
This patch extends on to rL307174 to not use the power9 vector extract with
variable index instructions when extracting word element 1. For such cases,
the existing selection of MFVSRWZ provides a better sequence.

Differential Revision: https://reviews.llvm.org/D38287

llvm-svn: 319049
2017-11-27 17:11:03 +00:00
Tony Jiang 438bf4a66b [PPC] Heuristic to choose between a X-Form VSX ld/st vs a X-Form FP ld/st.
The VSX versions have the advantage of a full 64-register target whereas the FP
ones have the advantage of lower latency and higher throughput. So what we’re
after is using the faster instructions in low register pressure situations and
using the larger register file in high register pressure situations.

The heuristic chooses between the following 7 pairs of instructions.
PPC::LXSSPX vs PPC::LFSX
PPC::LXSDX vs PPC::LFDX
PPC::STXSSPX vs PPC::STFSX
PPC::STXSDX vs PPC::STFDX
PPC::LXSIWAX vs PPC::LFIWAX
PPC::LXSIWZX vs PPC::LFIWZX
PPC::STXSIWX vs PPC::STFIWX

Differential Revision: https://reviews.llvm.org/D38486

llvm-svn: 318651
2017-11-20 14:38:30 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Guozhi Wei 433e8d3e04 [PPC] Change i32 constant in store instruction to i64
This patch changes all i32 constant in store instruction to i64 with truncation, to increase the chance that the referenced constant can be shared with other i64 constant.

Differential Revision: https://reviews.llvm.org/D39352

llvm-svn: 318436
2017-11-16 18:27:34 +00:00
Daniel Sanders 725584e26d Add backend name to Target to enable runtime info to be fed back into TableGen
Summary:
Make it possible to feed runtime information back to tablegen to enable
profile-guided tablegen-eration, detection of untested tablegen definitions, etc.

Being a cross-compiler by nature, LLVM will potentially collect data for multiple
architectures (e.g. when running 'ninja check'). We therefore need a way for
TableGen to figure out what data applies to the backend it is generating at the
time. This patch achieves that by including the name of the 'def X : Target ...'
for the backend in the TargetRegistry.

Reviewers: qcolombet

Reviewed By: qcolombet

Subscribers: jholewinski, arsenm, jyknight, aditya_nandakumar, sdardis, nemanjai, ab, nhaehnle, t.p.northover, javed.absar, qcolombet, llvm-commits, fedor.sergeev

Differential Revision: https://reviews.llvm.org/D39742

llvm-svn: 318352
2017-11-15 23:55:44 +00:00
Sean Fertile 0f0837e84e [PowerPC] Implement mayBeEmittedAsTailCall for PPC
Implements TargetLowering callback 'mayBeEmittedAsTailCall' that enables
CodeGenPrepare to duplicate returns when they might enable a tail-call.

Differential Revision: https://reviews.llvm.org/D39777

llvm-svn: 318321
2017-11-15 18:58:27 +00:00
Sean Fertile 7b056b3048 [PowerPC] Split out the tailcall calling convention checks. NFC.
Move the calling convention checks for tail-call eligibility for the 64-bit
SysV ABI into a separate function. This is so that it can be shared with
'mayBeEmittedAsTailCall' in a subsequent change.

llvm-svn: 318305
2017-11-15 16:53:41 +00:00
Hiroshi Inoue 72a1f98a67 [PowerPC] fix up in redundant compare elimination
This patch fixes a potential problem in my previous commit (https://reviews.llvm.org/rL312514) by introducing an additional check.

llvm-svn: 318266
2017-11-15 04:23:26 +00:00
David Blaikie 3f833edc7c Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.

llvm-svn: 317647
2017-11-08 01:01:31 +00:00
Graham Yiu 5cd044e8c8 Use new vector insert half-word and byte instructions when we see insertelement on '8 x i16' and '16 x i8' types. Also extended existing lit testcase to cover these cases.
Differential Revision: https://reviews.llvm.org/D34630

llvm-svn: 317613
2017-11-07 20:55:43 +00:00
Graham Yiu 52a52a6cab Fix buildbot breakages from r317503. Add parentheses to assignment when using result as a condition.
llvm-svn: 317508
2017-11-06 21:04:19 +00:00
Graham Yiu 030621bbcb Adds code to PPC ISEL lowering to recognize byte inserts from vector_shuffles, and use P9 shift and vector insert byte instructions instead of vperm. Extends tests from vector insert half-word.
Differential Revision: https://reviews.llvm.org/D34497

llvm-svn: 317503
2017-11-06 20:18:30 +00:00
Guozhi Wei e3b8d9a312 [PPC] Use xxbrd to speed up bswap64
Power doesn't have bswap instructions, so llvm generates following code sequence for bswap64.

  rotldi   5, 3, 16
  rotldi   4, 3, 8
  rotldi   9, 3, 24
  rotldi   10, 3, 32
  rotldi   11, 3, 48
  rotldi   12, 3, 56
  rldimi 4, 5, 8, 48
  rldimi 4, 9, 16, 40
  rldimi 4, 10, 24, 32
  rldimi 4, 11, 40, 16
  rldimi 4, 12, 48, 8
  rldimi 4, 3, 56, 0

But Power9 has vector bswap instructions, they can also be used to speed up scalar bswap intrinsic. With this patch, bswap64 can be translated to:

  mtvsrdd 34, 3, 3
  xxbrd 34, 34
  mfvsrld 3, 34

Differential Revision: https://reviews.llvm.org/D39510

llvm-svn: 317499
2017-11-06 19:09:38 +00:00
David Blaikie 1be62f0327 Move TargetFrameLowering.h to CodeGen where it's implemented
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.

This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.

llvm-svn: 317379
2017-11-03 22:32:11 +00:00
Graham Yiu 671526148c Adds code to PPC ISEL lowering to recognize half-word inserts from vector_shuffles, and use P9 shift and vector insert instructions instead of vperm.
Differential Revision: https://reviews.llvm.org/D34160

llvm-svn: 317111
2017-11-01 18:06:56 +00:00
Stefan Pintilie 6262fd4b0a Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat"
Revert r316478.
A test case has failed.
Will recommit this change once we find and fix the failure.

This reverts commit 7c330fabaedaba3d02c58bc3cc1198896c895f34.

llvm-svn: 316952
2017-10-30 19:55:38 +00:00
Fangrui Song 2696db90d1 [PPC CodeGen] Fix the bitreverse.i64 intrinsic.
Summary: The two 32-bit words were swapped. Update a test omitted in reverted r316270.

Reviewers: jtony, aaron.ballman

Subscribers: nemanjai, kbarton

Differential Revision: https://reviews.llvm.org/D39163

llvm-svn: 316916
2017-10-30 16:03:44 +00:00
Clement Courbet b2c3eb8cf1 [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
- Targets that want to support memcmp expansions now return the list of
   supported load sizes.
 - Expansion codegen does not assume that all power-of-two load sizes
   smaller than the max load size are valid. For examples, this is not the
   case for x86(32bit)+sse2.

Fixes PR34887.

llvm-svn: 316905
2017-10-30 14:19:33 +00:00
Hiroshi Inoue b72b1fb0de [PowerPC] Use record-form instruction for Less-or-Equal -1 and Greater-or-Equal 1
Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0.
This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities.

Differential Revision: https://reviews.llvm.org/D38941

llvm-svn: 316647
2017-10-26 09:01:51 +00:00
Stefan Pintilie 8f0c783095 [PowerPC] Try to simplify a Swap if it feeds a Splat
If we have the situation where a Swap feeds a Splat we can sometimes change the
  index on the Splat and then remove the Swap instruction.

Fixed the test case that was failing and recommit after pulling the original
  commit.

  Original revision is here: https://reviews.llvm.org/D39009

llvm-svn: 316478
2017-10-24 17:44:27 +00:00
Saleem Abdulrasool fb490a0bcc PowerPC: support the separator character in the IAS
PowerPC uses ; as a comment leader and the @ as a separator character.
Support this properly.

llvm-svn: 316454
2017-10-24 16:19:56 +00:00
Stefan Pintilie 52bbd587ac Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat"
Revert commit r316366.
Previous commit causes p8-scalar_vector_conversions.ll to fail.

This reverts commit 990e764ad8a2eec206ce5dda6aefab059ccd4e92.

llvm-svn: 316371
2017-10-23 20:22:23 +00:00
Stefan Pintilie feafa1d7f0 [PowerPC] Try to simplify a Swap if it feeds a Splat
If we have the situation where a Swap feeds a Splat we can sometimes change the
index on the Splat and then remove the Swap instruction.

Differential Revision: https://reviews.llvm.org/D39009

llvm-svn: 316366
2017-10-23 19:33:31 +00:00
Aaron Ballman fc02869c96 Reverting r316270 due to failing build bots.
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/12899
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7951

llvm-svn: 316276
2017-10-21 20:38:15 +00:00
Fangrui Song c7b749bd06 [PPC CodeGen] Fix the bitreverse.i64 intrinsic.
Summary: The two 32-bit words were swapped.

Subscribers: nemanjai, kbarton

Differential Revision: https://reviews.llvm.org/D38705

llvm-svn: 316270
2017-10-21 16:59:40 +00:00
Nemanja Ivanovic 0026c06e11 Disabling the transformation introduced in r315888
The commit at https://reviews.llvm.org/rL315888 is causing some failures
with internal testing. Disabling this code until we can resolve the issues.

llvm-svn: 316199
2017-10-20 00:36:46 +00:00
Graham Yiu 488782efa3 The cost of splitting a large vector instruction is not being taken into account by the getUserCost function. This was leading to some loops being over unrolled. The cost of a vector instruction is now being multiplied by the cost of the type legalization. This will return a more accurate cost.
Committing on behalf on Brad Nemanich (brad.nemanich@ibm.com)

Differential Revision: https://reviews.llvm.org/D38961

llvm-svn: 316174
2017-10-19 18:16:31 +00:00
Hiroshi Inoue 5388e66d3a [PowerPC] Use helper functions to check sign-/zero-extended value
Helper functions to identify sign- and zero-extending machine instruction is introduced in rL315888.
This patch makes PPCInstrInfo::optimizeCompareInstr use the helper functions. It simplifies the code and also makes possible more optimizations since the helper can do more analysis than the original check code; I observed about 5000 more compare instructions are eliminated while building LLVM.

Also, this patch fixes a bug in helpers on ANDIo instruction handling due to the order of checks. This bug causes a failure in an existing test case for optimizeCompareInstr.

Differential Revision: https://reviews.llvm.org/D38988

llvm-svn: 316071
2017-10-18 10:31:19 +00:00
Krzysztof Parzyszek 72518eaa6f Add iterator range MachineRegisterInfo::liveins(), adopt users, NFC
llvm-svn: 315927
2017-10-16 19:08:41 +00:00
Hiroshi Inoue a7eb78b47f [PowerPC] fix up in sign-/zero-extension elimination
This patch fixes a potential problem in my previous commit (https://reviews.llvm.org/rL315888) by adding a null check.

llvm-svn: 315900
2017-10-16 12:11:15 +00:00
Hiroshi Inoue e3a3e3c9e9 [PowerPC] Eliminate sign- and zero-extensions if already sign- or zero-extended
This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass.
If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated.
One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI. 
For example of the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated.

void int_func(int);
void ii_test(int a) {
    if (a & 1) return int_func(a);
}

Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling the LLVM+CLANG.

Differential Revision: https://reviews.llvm.org/D31319

llvm-svn: 315888
2017-10-16 04:12:57 +00:00
Aaron Ballman 615eb47035 Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people.
Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1

llvm-svn: 315854
2017-10-15 14:32:27 +00:00
Matt Arsenault f2db97d8fa DAG: Add opcode and source type to isFPExtFree
This is only currently used for mad/fma transforms.
This is the only case where it should be used for AMDGPU,
so add an opcode to be sure.

llvm-svn: 315740
2017-10-13 19:55:45 +00:00
Matthias Braun bb8507e63c Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine"
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.

This reverts commit r315633.

llvm-svn: 315637
2017-10-12 22:57:28 +00:00
Matthias Braun 3a9c114b24 TargetMachine: Merge TargetMachine and LLVMTargetMachine
Merge LLVMTargetMachine into TargetMachine.

- There is no in-tree target anymore that just implements TargetMachine
  but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
  case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
  interface.

Differential Revision: https://reviews.llvm.org/D38489

llvm-svn: 315633
2017-10-12 22:28:54 +00:00
Lei Huang 0724fea2da [PowerPC] Add profitablilty check for conversion to mtctr loops
Add profitability checks for modifying counted loops to use the mtctr instruction.

The latency of mtctr is only justified if there are more than 4 comparisons that
will be removed as a result.  Usually counted loops are formed relatively early
and before unrolling, so most low trip count loops often don't survive.  However
we want to ensure that if they do, we do not mistakenly update them to mtctr loops.

Use CodeMetrics to ensure we are only doing this for small loops with small trip counts.

Differential Revision: https://reviews.llvm.org/D38212

llvm-svn: 315592
2017-10-12 16:43:33 +00:00
Don Hinton 3e0199f7eb [dump] Remove NDEBUG from test to enable dump methods [NFC]
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.

Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.

Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.

Differential Revision: https://reviews.llvm.org/D38406

llvm-svn: 315590
2017-10-12 16:16:06 +00:00
Lei Huang 263dc4ef3a [PowerPC] Utilize DQ-Form instructions for spill/restore and fix FrameIndex elimination to only use `lis/addi` if necessary.
Currently we produce a bunch of unnecessary code when emitting the
prologue/epilogue for spills/restores.  Namely, if the load from stack
slot/store to stack slot instruction is an X-Form instruction, we will
always produce an LIS/ORI sequence for the stack offset.

Furthermore, we have not exploited the P9 vector D-Form loads/stores for this
purpose.

This patch address both issues.

Specifying the D-Form load as the instruction to use for stack spills/reloads
should be safe because:

1. The stack should be aligned according to the ABI
2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will
   check for the offset being a multiple of 16 and will convert it to an
   X-Form instruction if it isn't.

Differential Revision : https://reviews.llvm.org/D38758

llvm-svn: 315500
2017-10-11 20:20:58 +00:00
Oliver Stannard 4191b9eaea [Asm] Add debug tracing in table-generated assembly matcher
This adds debug tracing to the table-generated assembly instruction matcher,
enabled by the -debug-only=asm-matcher option.

The changes in the target AsmParsers are to add an MCInstrInfo reference under
a consistent name, so that we can use it from table-generated code. This was
already being used this way for targets that use deprecation warnings, but 5
targets did not have it, and Hexagon had it under a different name to the other
backends.

llvm-svn: 315445
2017-10-11 09:17:43 +00:00
Lang Hames 3a67075a3a [MC] Add a missing <memory> include left out of r315327.
llvm-svn: 315331
2017-10-10 16:58:26 +00:00
Lang Hames 60fbc7cc38 [MC] Thread unique_ptr<MCObjectWriter> through the create.*ObjectWriter
functions.

This makes the ownership of the resulting MCObjectWriter clear, and allows us
to remove one instance of MCObjectStreamer's bizarre "holding ownership via
someone else's reference" trick.

llvm-svn: 315327
2017-10-10 16:28:07 +00:00
Stefan Pintilie cc330daa5b [PowerPC] Add missing record form instructions to the P9 Scheduling Model
A number of record form instructions were missing from the P9 scheduling
model. Added those instructions and marked the P9 model as complete.

Differential Revision: https://reviews.llvm.org/D38560

llvm-svn: 315313
2017-10-10 13:45:35 +00:00
Nemanja Ivanovic 7bf866eb10 Fix for PR34888.
The issue is that we assume operand zero of the input to the add instruction
is a register. In this case, the input comes from inline assembly and
operand zero is not a register thereby causing a crash.
The code will bail anyway if the input instruction doesn't have the right
opcode. So do that check first and let short-circuiting prevent the crash.

llvm-svn: 315285
2017-10-10 08:46:10 +00:00
Lang Hames dcb312bdb9 [MC] Plumb unique_ptr<MCELFObjectTargetWriter> through createELFObjectWriter to
ELFObjectWriter's constructor.

Fixes the same ownership issue for ELF that r315245 did for MachO:
ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.

llvm-svn: 315254
2017-10-09 23:53:15 +00:00
Lang Hames 9b206a7d60 [MC] Plumb unique_ptr<MCMachObjectTargetWriter> through createMachObjectWriter
to MCObjectWriter's constructor.

MCObjectWriter takes ownership of its MCMachObjectTargetWriter argument -- this
patch plumbs that ownership relationship through the constructor (which
previously took raw MCMachObjectTargetWriter*) and the createMachObjectWriter
function.

llvm-svn: 315245
2017-10-09 22:38:13 +00:00
Benjamin Kramer 16610028ea Remove unused variables. No functionality change.
llvm-svn: 315185
2017-10-08 19:11:02 +00:00
Stefan Pintilie e1d7547237 [PowerPC] Revert P9 scheduling model to incomplete
Partially revert a previous change from commit: https://llvm.org/svn/llvm-project/llvm/trunk@314026
The previous change caused regressions on Power 9.

llvm-svn: 314835
2017-10-03 20:27:30 +00:00
Hiroshi Inoue 224661d94b [trivial] fix format, NFC
llvm-svn: 314769
2017-10-03 07:28:58 +00:00
Hiroshi Inoue dcedd66b00 [PowerPC] support ZERO_EXTEND in tryBitPermutation
This patch add a support of ISD::ZERO_EXTEND in PPCDAGToDAGISel::tryBitPermutation to increase the opportunity to use rotate-and-mask by reordering ZEXT and ANDI.
Since tryBitPermutation stops analyzing nodes if it hits a ZEXT node while traversing SDNodes, we want to avoid ZEXT between two nodes that can be folded into a rotate-and-mask instruction.

For example, we allow these nodes

      t9: i32 = add t7, Constant:i32<1>
    t11: i32 = and t9, Constant:i32<255>
  t12: i64 = zero_extend t11
t14: i64 = shl t12, Constant:i64<2>

to be folded into a rotate-and-mask instruction.
Such case often happens in array accesses with logical AND operation in the index, e.g. array[i & 0xFF];

Differential Revision: https://reviews.llvm.org/D37514

llvm-svn: 314655
2017-10-02 09:24:00 +00:00
Hiroshi Inoue 79c0bec06e [PowerPC] eliminate partially redundant compare instruction
This is a follow-on of D37211.
D37211 eliminates a compare instruction if two conditional branches can be made based on the one compare instruction, e.g.
if (a == 0) { ... }
else if (a < 0) { ... }

This patch extends this optimization to support partially redundant cases, which often happen in while loops.
For example, one compare instruction is moved from the loop body into the preheader by this optimization in the following example.
do {
  if (a == 0) dummy1();
  a = func(a);
} while (a > 0);

Differential Revision: https://reviews.llvm.org/D38236

llvm-svn: 314390
2017-09-28 08:38:19 +00:00
Hiroshi Inoue ed1ffa49a4 [PowerPC] eliminate unconditional branch to the next instruction
This patch makes analyzeBranch eliminate unconditional branch to the next instruction.
After basic blocks are re-organized by optimizers, such as machine block placement, a BB may end with an unconditional branch to the next (fallthrough) BB. This patch removes such redundant branch instruction.

Differential Revision: https://reviews.llvm.org/D37730

llvm-svn: 314297
2017-09-27 10:33:02 +00:00
Nemanja Ivanovic e22ebeab1a [PowerPC] Reverting sequence of patches for elimination of comparison instructions
In the past while, I've committed a number of patches in the PowerPC back end
aimed at eliminating comparison instructions. However, this causes some failures
in proprietary source and these issues are not observed in SPEC or any open
source packages I've been able to run.
As a result, I'm pulling the entire series and will refactor it to:
- Have a single entry point for easy control
- Have fine-grained control over which patterns we transform

A side-effect of this is that test cases for these patches (and modified by
them) are XFAIL-ed. This is a temporary measure as it is counter-productive
to remove/modify these test cases and then have to modify them again when
the refactored patch is recommitted.
The failure will be investigated in parallel to the refactoring effort and
the recommit will either have a fix for it or will leave this transformation
off by default until the problem is resolved.

llvm-svn: 314244
2017-09-26 20:42:47 +00:00
Nemanja Ivanovic f7bc9ce378 [PowerPC] Eliminate compares - add i64 sext/zext handling for SETLT/SETGT
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.

llvm-svn: 314106
2017-09-25 14:05:46 +00:00
Clement Courbet 2807c0a442 [CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion.
Summary:
Right now there are two functions with the same name, one does the work
and the other one returns true if expansion is needed. Rename
TargetTransformInfo::expandMemCmp to make it more consistent with other
members of TargetTransformInfo.

Remove the unused Instruction* parameter.

Differential Revision: https://reviews.llvm.org/D38165

llvm-svn: 314096
2017-09-25 06:35:16 +00:00
Nemanja Ivanovic f894ce35d0 [PowerPC] Eliminate compares - add i64 sext/zext handling for SETLE/SETGE
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.

llvm-svn: 314073
2017-09-24 05:48:11 +00:00
Nemanja Ivanovic 35db4f956a [PowerPC] Eliminate compares - add i32 sext/zext handling for SETULT/SETUGT
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.

llvm-svn: 314062
2017-09-23 12:53:03 +00:00
Nemanja Ivanovic c4980799ab [PowerPC] Eliminate compares - add i32 sext/zext handling for SETULE/SETUGE
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.

llvm-svn: 314060
2017-09-23 09:50:12 +00:00
Nemanja Ivanovic 41c4a109d8 [PowerPC] Eliminate compares - add i32 sext/zext handling for SETLT/SETGT
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.

llvm-svn: 314055
2017-09-23 04:41:34 +00:00
Stefan Pintilie 590eb2755d [PowerPC] Mark P9 scheduling model complete
This patch just adds the missing information to the P9 scheduling model to allow
the model to be marked as complete.

The model has been verified against P9 documentation. The model was verified
with utils/schedcover.py.

Differential Revision: https://reviews.llvm.org/D35695

llvm-svn: 314026
2017-09-22 20:17:25 +00:00
Tim Shen cee7536188 [XRay] support conditional return on PPC.
Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.

Reviewers: dberris, echristo

Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D38102

llvm-svn: 314005
2017-09-22 18:30:02 +00:00
Nemanja Ivanovic cea42b7fff Remove the default clause from a fully-covering switch
to appease bots that use a compiler that warns about this
and use -Werror.

llvm-svn: 313980
2017-09-22 12:26:00 +00:00
Nemanja Ivanovic d6f93f5143 Recommit r310809 with a fix for the spill problem
This patch re-commits the patch that was pulled out due to a
problem it caused, but with a fix for the problem. The fix
was reviewed separately by Eric Christopher and Hal Finkel.

Differential Revision: https://reviews.llvm.org/D38054

llvm-svn: 313978
2017-09-22 11:50:25 +00:00
Zaara Syeda fcd9697d72 [Power9] Spill gprs to vector registers rather than stack
This patch updates register allocation to enable spilling gprs to
volatile vector registers rather than the stack. It can be enabled
 for Power9 with option -ppc-enable-gpr-to-vsr-spills.

Differential Revision: https://reviews.llvm.org/D34815

llvm-svn: 313886
2017-09-21 16:12:33 +00:00
Tony Jiang 2d9c5f3b8b [PowerPC Peephole] Constants into a join add, use ADDI over LI/ADD.
Two blocks prior to the join each perform an li and the the join block has an
add using the initialized register. Optimize each predecessor block to instead
use addi and delete the li's and add.

Differential Revision: https://reviews.llvm.org/D36734

llvm-svn: 313639
2017-09-19 16:14:37 +00:00
Tony Jiang 425071eff3 [Power9] Add missing Power9 instructions.
The following 8 instructions are implemented in this patch.
addpcis(subpcis, lnia), darn, maddhd, maddhdu, maddld, setb

llvm-svn: 313636
2017-09-19 15:22:36 +00:00
Stefan Pintilie dff606ec3e [Power9] Add missing instructions: extswsli, popcntb
Added the following P9 instructions: extswsli, extswsli., popcntb

Differential Revision: https://reviews.llvm.org/D37342

llvm-svn: 313147
2017-09-13 14:05:27 +00:00
Lei Huang 34e6621724 Update branch coalescing to be a PowerPC specific pass
Implementing this pass as a PowerPC specific pass.  Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.

Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce.

Differential Revision : https: // reviews.llvm.org/D32776

llvm-svn: 313061
2017-09-12 18:39:11 +00:00
Kyle Butt 8c0314c3ed PPC: Don't select lxv/stxv for insufficiently aligned stack slots.
The lxv/stxv instructions require an offset that is 0 % 16. Previously we were
selecting lxv/stxv for loads and stores to the stack where the offset from the
slot was a multiple of 16, but the stack slot was not 16 or more byte aligned.
When the frame gets lowered these transform to r(1|31) + slot + offset.
If slot is not aligned, slot + offset may not be 0 % 16.
Now we require 16 byte or more alignment for select lxv/stxv to stack slots.

Includes a testcase that shows both sufficiently and insufficiently aligned
stack slots.

llvm-svn: 312843
2017-09-09 00:37:56 +00:00
Dean Michael Berris 711dec260f [XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests  which we add in this change.

What happened was the following:

- For tail exits, we used to unconditionally prepend the returns/exits
  with a pseudo-instruction that gets lowered to the instrumentation
  sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
  suddenly also emit the tail exit pseudo-instruction, since the check
  for whether a return instruction was also a call instruction meant it
  was a tail exit instruction.
- None of the tests caught the regression either due to non-existent
  tests, or the tests being disabled/removed for continuous breakage.

This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.

Reviewers: echristo, timshen

Subscribers: nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D37570

llvm-svn: 312772
2017-09-08 01:47:56 +00:00
Hal Finkel 112a6bac72 [PowerPC] Don't use xscvdpspn on the P7
xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a
regression introduced in r288152.

llvm-svn: 312612
2017-09-06 03:08:26 +00:00
Tony Jiang 61ef1c540c [PPC][NFC] Renaming things with 'xxinsert' moniker to 'vecinsert' to make it more general.
Commit on behalf of Graham Yiu (gyiu@ca.ibm.com)

llvm-svn: 312547
2017-09-05 18:08:02 +00:00
Hiroshi Inoue 614453b797 [PowerPC] eliminate redundant compare instruction
If multiple conditional branches are executed based on the same comparison, we can execute multiple conditional branches based on the result of one comparison on PPC. For example,

if (a == 0) { ... }
else if (a < 0) { ... }

can be executed by one compare and two conditional branches instead of two pairs of a compare and a conditional branch.

This patch identifies a code sequence of the two pairs of a compare and a conditional branch and merge the compares if possible.
To maximize the opportunity, we do canonicalization of code sequence before merging compares.
For the above example, the input for this pass looks like:

cmplwi r3, 0
beq    0, .LBB0_3
cmpwi  r3, -1
bgt    0, .LBB0_4

So, before merging two compares, we canonicalize it as

cmpwi  r3, 0       ; cmplwi and cmpwi yield same result for beq
beq    0, .LBB0_3
cmpwi  r3, 0       ; greather than -1 means greater or equal to 0
bge    0, .LBB0_4

The generated code should be

cmpwi  r3, 0
beq    0, .LBB0_3
bge    0, .LBB0_4

Differential Revision: https://reviews.llvm.org/D37211

llvm-svn: 312514
2017-09-05 04:15:17 +00:00
Eric Christopher e42ac21499 Temporarily revert "Update branch coalescing to be a PowerPC specific pass"
From comments and code review it wasn't intended to be enabled by default yet.

This reverts commit r311588.

llvm-svn: 312214
2017-08-31 05:56:16 +00:00
Stefan Pintilie c35e4de388 [Power9] Add new instructions for floating point status and control registers.
Added the following P9 instructions: mffsce, mffscdrn, mffscdrni, mffscrn,
  mffscrni, mffsl

Differential Revision: https://reviews.llvm.org/D37167

llvm-svn: 311903
2017-08-28 18:46:01 +00:00
NAKAMURA Takumi a1e97a77f5 Untabify.
llvm-svn: 311875
2017-08-28 06:47:47 +00:00
Sanjay Patel e404cbff66 [DAG] convert vector select-of-constants to logic/math
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming in
to the backend and transform it to better DAG ops.

Steps in this patch:

  1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
     enable this transform. Other targets will probably want that anyway to distinguish scalars
     from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.

  2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
     vector ext is often free.

Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.

Differential Revision: https://reviews.llvm.org/D36840

llvm-svn: 311731
2017-08-24 23:24:43 +00:00
Lei Huang 0cb591fc4c Update branch coalescing to be a PowerPC specific pass
Implementing this pass as a PowerPC specific pass.  Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.

Differential Revision : https: // reviews.llvm.org/D32776

llvm-svn: 311588
2017-08-23 19:25:04 +00:00
Hiroshi Inoue cc555bd0ac [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
- recommitting after fixing a test failure on MacOS

On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).

This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.

e.g. (x | 0xFFFFFFFF) should be

	ori 3, 3, 65535
	oris 3, 3, 65535

but LLVM generates without this patch

	li 4, 0
	oris 4, 4, 65535
	ori 4, 4, 65535
	or 3, 3, 4

Differential Revision: https://reviews.llvm.org/D34757

llvm-svn: 311538
2017-08-23 08:55:18 +00:00
Hiroshi Inoue dbb285ca51 Revert rL311526: [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
This reverts commit rL311526 due to failures in some buildbot.

llvm-svn: 311530
2017-08-23 06:38:05 +00:00
Hiroshi Inoue c4449df1b0 [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).

This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.

e.g. (x | 0xFFFFFFFF) should be

	ori 3, 3, 65535
	oris 3, 3, 65535

but LLVM generates without this patch

	li 4, 0
	oris 4, 4, 65535
	ori 4, 4, 65535
	or 3, 3, 4

Differential Revision: https://reviews.llvm.org/D34757

llvm-svn: 311526
2017-08-23 05:15:15 +00:00
Sean Fertile 00393cce3a [PPC] Refine checks for emiting TOC restore nop and tail-call eligibility.
For the medium and large code models we only need to check if a call crosses
dso-boundaries when considering tail-call elgibility.

Differential Revision: https://reviews.llvm.org/D34245

llvm-svn: 311353
2017-08-21 17:35:32 +00:00
Stefan Pintilie 9495f33e45 [PowerPC] Check if the pre-increment PHI Node already exists
Preparations to use the per-increment are sometimes done in the target
independent pass Loop Strength Reduction. We try to detect them in the PowerPC
specific pass so that they are not done twice and so that we do not add PHIs
that are not required.

Differential Revision: https://reviews.llvm.org/D36736

llvm-svn: 311332
2017-08-21 13:36:18 +00:00
Lei Huang 451ef4adcd [PowerPC] Add codegen for VSX word extract convert to FP
Add codegen for VSX word extract conversion from signed/unsigned to single/double
precision.

For UINT_TO_FP:
Extract word unsigned and convert to float was implemented in https://reviews.llvm.org/D20239.
Here we will add the missing extract integer and conversion to double. This
utilizes the new P9 instruction xxextractuw to extracting an integer element
when the result will be converted to double thereby saving 2 direct moves
(VSR <-> GPR).

For SINT_TO_FP:
We will implement the following sequence which will also reduce the number of
instructions by saving 2 direct moves.

v4i32->f32:
        xxspltw
        xvcvsxwsp
        xscvspdpn

v4i32->f64:
        xxspltw
        xvcvsxwdp

Differential Revision: https://reviews.llvm.org/D35859

llvm-svn: 310866
2017-08-14 18:09:29 +00:00
Chandler Carruth fe6b509f83 [PowerPC] Revert r310346 (and followups r310356 & r310424) which
introduce a miscompile bug.

There appears to be a bug where the generated code to extract the sign
bit doesn't work correctly for 32-bit inputs. I've replied to the
original commit pointing out the problem. I think I see by inspection
(and reading the manual for PPC) how to fix this, but I can't be 100%
confident and I also don't know what the best way to test this is.
Currently it seems nearly impossible to get the backend to hit this code
path, but the patch autohr is likely in a better position to craft such
test cases than I am, and based on where the bug is it should be easily
done.

Original commit message for r310346:
"""
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE

Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it
adds the handling for the special case where RHS == 0.

Differential Revision: https://reviews.llvm.org/D34048
"""

llvm-svn: 310809
2017-08-14 03:41:00 +00:00
Krzysztof Parzyszek bea30c6286 Add "Restored" flag to CalleeSavedInfo
The liveness-tracking code assumes that the registers that were saved
in the function's prolog are live outside of the function. Specifically,
that registers that were saved are also live-on-exit from the function.
This isn't always the case as illustrated by the LR register on ARM.

Differential Revision: https://reviews.llvm.org/D36160

llvm-svn: 310619
2017-08-10 16:17:32 +00:00
Nemanja Ivanovic 3f98fb4a3f My commit r310346 introduced some valid warnings. This cleans them up.
llvm-svn: 310424
2017-08-08 22:17:31 +00:00
Nemanja Ivanovic 979dcb6f09 [PowerPC] Don't crash on larger splats achieved through 1-byte splats
We've implemented a 1-byte splat using XXSPLTISB on P9. However, LLVM will
produce a 1-byte splat even for wider element BUILD_VECTOR nodes. This patch
prevents crashing in that situation.

Differential Revision: https://reviews.llvm.org/D35650

llvm-svn: 310358
2017-08-08 13:52:45 +00:00
Nemanja Ivanovic bed7136eee Appease compilers that have the -Wcovered-switch-default switch.
llvm-svn: 310356
2017-08-08 12:41:56 +00:00
Nemanja Ivanovic 809fbfa6a1 [PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE
Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it adds
the handling for the special case where RHS == 0.

Differential Revision: https://reviews.llvm.org/D34048

llvm-svn: 310346
2017-08-08 11:20:44 +00:00
Rafael Espindola 2783469566 Fix the ppc jit tests.
llvm-svn: 309921
2017-08-03 04:52:45 +00:00
Rafael Espindola 79e238afee Delete Default and JITDefault code models
IMHO it is an antipattern to have a enum value that is Default.

At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.

This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.

llvm-svn: 309911
2017-08-03 02:16:21 +00:00