Commit Graph

60174 Commits

Author SHA1 Message Date
Jay Foad ad3ec08955 [AMDGPU] One more use of the new export target names. NFC. 2020-11-13 09:44:09 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_component_library` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Craig Topper 114f044640 [X86] Use EVT::getIntegerVT instead of MVT::getIntegerVT where the type can be i2 or i4.
This was a mistake introduced in D91294. I'm not sure how to
exercise this with the existing code, but I hit it while trying
some follow up experiments.
2020-11-12 21:48:45 -08:00
Craig Topper a4124e455e [X86] When storing v1i1/v2i1/v4i1 to memory, make sure we store zeros in the rest of the byte
We can't store garbage in the unused bits. It is possible that something like a zextload from i1/i2/i4 is created to read the memory. Those zextloads would be legalized assuming the extra bits are 0.

I'm not sure that the code in lowerStore is executed for the v1i1/v2i1/v4i1 case. It looks like the DAG combine in combineStore may have converted them to v8i1 first. And I think we're missing some cases to avoid going to the stack in the first place. But I don't have time to investigate those things at the moment so I wanted to focus on the correctness issue.

Should fix PR48147.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D91294
2020-11-12 21:28:18 -08:00
Stanislav Mekhanoshin 5ab1702129 [AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
2020-11-12 15:23:37 -08:00
Jessica Paquette d0ba6c4002 [AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants
Select the following:

- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINV t, zreg, cc

(IR example: https://godbolt.org/z/YfPna9)

These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.

Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.

```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
          (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```

So we have to manually select these for now.

This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.

Differential Revision: https://reviews.llvm.org/D90701
2020-11-12 14:44:01 -08:00
Kazushi (Jam) Marukawa 410626c9b5 [VE] Support vld intrinsics
Add intrinsics for vector load instructions.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91332
2020-11-13 07:34:42 +09:00
Stanislav Mekhanoshin cf6565f6d0 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad 6881a82e8c [AMDGPU] Fix scheduling of exp pos4
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290
2020-11-12 19:57:14 +00:00
Jay Foad d7d6ac5624 [AMDGPU] Define and use names for export targets. NFC.
Differential Revision: https://reviews.llvm.org/D91289
2020-11-12 19:57:14 +00:00
Craig Topper 4cdf1d2110 [MSP430] Remove unused MVT::Glue output from MSP430ISD::SELECT_CC nodes.
Follow up from a similar patch on RISCV 637f19c36b

Nothing reads this Glue value that I could see. The SDNode def in
the td file does not have the SDNPOutGlue flag so I don't think
this glue would get properly propagated to MachineSDNodes if it
was used.
2020-11-12 10:34:01 -08:00
Craig Topper 0add5f9122 [RISCV] Don't include CodeGen layer files in MC layer
-Use MCRegister instead of Register in MC layer.
-Move some enums from RISCVInstrInfo.h to RISCVBaseInfo.h to be with other TSFlags bits.

Differential Revision: https://reviews.llvm.org/D91114
2020-11-12 07:45:38 -08:00
Craig Topper 9ca02d6fe1 [RISCV] Add an ANDI to shift amount of FSL/FSR instructions
The fshl and fshr intrinsics are defined to modulo their shift amount by the bitwidth of one of their inputs. The FSR/FSL instructions read one extra bit from the shift amount. If that bit is set the inputs are swapped. In order to preserve the semantics of the llvm intrinsics we need to make sure that the extra bit isn't set. DAG combine or instcombine may have removed any mask that was originally present.

We could be smarter here and try to use computeKnownBits to check if the bit is known zero, but wanted to start with correctness.

Differential Revision: https://reviews.llvm.org/D90905
2020-11-12 07:33:40 -08:00
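As an aside, a minimal LLVM IR sketch of the modulo semantics the commit above relies on (illustrative only; the function name is not from the patch):
```
; llvm.fshl reduces its shift amount modulo the bit width, so a shift of 37
; on i32 behaves like a shift of 5. FSL/FSR read one extra bit of the shift
; amount, which is why the lowering must AND that bit away.
declare i32 @llvm.fshl.i32(i32, i32, i32)

define i32 @funnel_by_37(i32 %hi, i32 %lo) {
  %r = call i32 @llvm.fshl.i32(i32 %hi, i32 %lo, i32 37) ; same as a shift of 5
  ret i32 %r
}
```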
David Green 11dee2eae2 [ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP
Of course there was something missing, in this case a check that the def
of the count register we are adding to a t2DoLoopStartTP would dominate
the insertion point.

In the future, when we remove some of these COPY's in between, the
t2DoLoopStartTP will always become the last instruction in the block,
preventing this from happening. In the meantime we need to check they
are created in a sensible order.

Differential Revision: https://reviews.llvm.org/D91287
2020-11-12 13:47:46 +00:00
Kazushi (Jam) Marukawa a72d384249 [VE] Change the default type of v64 register class
Change the default type of v64 register class from v512i32 to v256f64.
Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91301
2020-11-12 19:07:07 +09:00
David Sherwood 3225fcf11e [SVE] Deal with SVE tuple call arguments correctly when running out of registers
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.

I've fixed this by ensuring that:

1. When we don't have enough registers to allocate the whole block
   we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
   tuple part argument and reinvoke the autogenerated calling
   convention handler. Doing this prevents the code from entering
   an infinite recursion and, in combination with 1), ensures we
   switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
   then deallocate any SVE registers we marked as allocated in 1).
   We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
   LowerFormalArguments functions to detect the start of a tuple,
   which involves allocating a single stack object and doing the
   correct numbers of legal loads and stores.

Differential Revision: https://reviews.llvm.org/D90219
2020-11-12 08:41:50 +00:00
Amara Emerson ad376657c1 [AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB. 2020-11-11 22:46:53 -08:00
Baptiste Saleil 37c4ac8545 [PowerPC] Accumulator/Unprimed Accumulator register copy, spill and restore
This patch adds support for accumulator/unprimed accumulator
register copy, spill and restore for MMA.

Authored By: Baptiste Saleil

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D90616
2020-11-11 16:23:45 -06:00
Jessica Paquette 7a70a2f04d [AArch64][GlobalISel] Mark G_FCONSTANT as legal when there is full fp16 support
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.

Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.

(All other 16-bit G_FCONSTANTS are not yet selected.)

Differential Revision: https://reviews.llvm.org/D89164
2020-11-11 13:25:11 -08:00
Craig Topper 637f19c36b [RISCV] Remove traces of Glue from RISCVISD::SELECT_CC
We were creating RISCVISD::SELECT_CC nodes with Glue output that was never being used, and the tablegen SDNode had the SDNPInGlue flag instead of the SDNPOutGlue flag.

Since we don't seem to need the Glue, just get rid of it from both places.

Differential Revision: https://reviews.llvm.org/D91199
2020-11-11 09:30:48 -08:00
Jessica Paquette c42053f79b [AArch64][GlobalISel] Select arith extended add/sub in manual selection code
The manual selection code for add/sub was not checking if it was possible to
fold in shifts + extends (the *rx opcode variants).

As a result, we could never select things like

```
cmp x1, w0, uxtw #2
```

because we don't import any patterns for compares.

This adds support for the arithmetic shifted register forms and updates tests
for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`.

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91207
2020-11-11 09:26:03 -08:00
Jessica Paquette f0580c73bb [AArch64][GlobalISel] Select negative arithmetic immediates in manual selector
Previously, we only handled negative arithmetic immediates in the imported
selector code.

Since we don't import code for, say, compares, we were missing opportunities
for things like

```
%cst:gpr(s64) = G_CONSTANT i64 -10
%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst
->
%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv
%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv
```

Instead, we would have to materialize the constant and emit a SUBS.

This adds support for selection like above for SUB, SUBS, ADD, and ADDS.

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91108
2020-11-11 09:20:05 -08:00
Jay Foad f23c4c6f8a [AMDGPU] Separate out real exp instructions by subtarget. NFC.
Differential Revision: https://reviews.llvm.org/D91247
2020-11-11 17:13:40 +00:00
Jay Foad 2b33ea6935 [AMDGPU] Split exp instructions out into their own tablegen file. NFC.
Differential Revision: https://reviews.llvm.org/D91246
2020-11-11 17:13:40 +00:00
Jay Foad f94fd1c8ca [AMDGPU] Make use of SIInstrInfo::isEXP. NFC. 2020-11-11 17:01:20 +00:00
Jay Foad 830ed64ccd Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access""
This reverts commit 8b08fa0103.

The underlying problems were fixed by D90607.
2020-11-11 14:40:14 +00:00
Caroline Concatto 37f4ccb275 [AArch64]Add memory op cost model for SVE
This patch adds/fixes the memory op cost model for SVE with fixed-width
vectors.

Differential Revision: https://reviews.llvm.org/D90950
2020-11-11 12:49:19 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
2020-11-11 12:15:54 +00:00
Kerry McLaughlin 170947a5de [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00
Kerry McLaughlin ffbbfc76ca [SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).

This is the first in a series of patches which add support for scalable masked scatters.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90939
2020-11-11 10:58:24 +00:00
Sam Parker 898a81dfc5 [NFC][ARM] Replace lambda with any_of 2020-11-11 10:02:55 +00:00
Amara Emerson 2262393090 [AArch64][GlobalISel] Port some AArch64 target specific MUL combines from SDAG.
These do things like turn a multiply by a pow-2+1 into a shift and an add,
which is a common pattern that pops up, and is universally better than expensive
madd instructions with a constant.

I've added check lines to an existing codegen test since the code being ported
is almost identical, however the mul by negative pow2 constant tests don't generate
the same code because we're missing some generic G_MUL combines still.

Differential Revision: https://reviews.llvm.org/D91125
2020-11-10 22:21:13 -08:00
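An illustrative example of the multiply shape this targets (sketch, not taken from the patch):
```
; x * 9 = x * (8 + 1) = (x << 3) + x, so no constant needs to be
; materialized for an expensive madd.
define i64 @mul_by_9(i64 %x) {
  %m = mul i64 %x, 9
  ret i64 %m
}
```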
Gaurav Jain 3726b14428 [NFC] Use [MC]Register for x86 target
Differential Revision: https://reviews.llvm.org/D91161
2020-11-10 15:49:39 -08:00
Kazushi (Jam) Marukawa dd6f607ea8 [VE] Implement FoldImmediate
Implement FoldImmediate for integer arithmetic operations only.
Add regression tests also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91150
2020-11-11 08:08:32 +09:00
Pirama Arumuga Nainar 8262e94a6d [ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt.
Previously we used setRegClass to rgpr, which may expand the register
domain if the result was already in a constrained class (tcgpr in the
above PR).

Differential Revision: https://reviews.llvm.org/D91192
2020-11-10 13:38:11 -08:00
Stanislav Mekhanoshin 544ef42e40 [AMDGPU] Set default op_sel_hi on accvgpr read/write
These are opsel opcodes with op_sel actually being ignored.
As such, op_sel_hi needs to be set to the default of 1 even though
these bits are ignored. This is a compatibility change.

Differential Revision: https://reviews.llvm.org/D91202
2020-11-10 13:07:29 -08:00
Benjamin Kramer 92c61a045f [ARM] Silence unused variable warning in Release builds. NFC. 2020-11-10 20:35:28 +01:00
Craig Topper 70b481e8db [RISCV] Add missing copyright header to RISCVBaseInfo.cpp. NFC 2020-11-10 11:33:08 -08:00
David Green 08d1c2d470 [ARM] Introduce t2DoLoopStartTP
This introduces a new pseudo instruction, almost identical to a
t2DoLoopStart but taking 2 parameters - the original loop iteration
count needed for a low overhead loop, plus the VCTP element count needed
for a DLSTP instruction setting up a tail predicated loop. The idea is
that the instruction holds both values and the backend
ARMLowOverheadLoops pass can pick between the two, depending on whether
it creates a tail predicated loop or falls back to a low overhead loop.

To do that there needs to be something that converts a t2DoLoopStart to
a t2DoLoopStartTP, for which this patch repurposes the
MVEVPTOptimisationsPass as a "tail predication and vpt optimisation"
pass. The extra operand for the t2DoLoopStartTP is chosen based on the
operands of VCTP's in the loop, and the instruction is moved as late in
the block as possible to attempt to increase the likelihood of making
tail predicated loops.

Differential Revision: https://reviews.llvm.org/D90591
2020-11-10 18:08:12 +00:00
Jay Foad bb8d1437a6 [AMDGPU] Simplify multiclass EXP_m. NFC. 2020-11-10 17:28:36 +00:00
David Green dbe1bf63aa [ARM] Cleanup for ARMLowOverheadLoops. NFC 2020-11-10 17:28:07 +00:00
David Green c7e275388e [ARM] Don't aggressively unroll vector remainder loops
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they get llvm.loop.isvectorized to make recognizing them
simpler.

We have wanted to do this for a while but hit issues with low overhead
loops being reverted due to difficult register allocation. With recent
changes, that seems to be less of an issue now.

Differential Revision: https://reviews.llvm.org/D90055
2020-11-10 17:01:31 +00:00
David Green 73a6cd4b6b [ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can
help make it more likely to become t2DLS lr, lr. This makes it easier to
move if needed (as the input is the same as the output), or potentially
remove entirely.

The hint is added after others (from COPY's etc) which still take
precedence. A place was needed to add the hint; this currently uses the
post-isel custom inserter.

Differential Revision: https://reviews.llvm.org/D89883
2020-11-10 16:28:57 +00:00
David Green b2ac9681a7 [ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.

This is a fairly simple change in itself, but leads to a number of other
required alterations.

 - The hardware loop pass, if UsePhi is set, now generates loops of the
   form:
       %start = llvm.start.loop.iterations(%N)
     loop:
       %p = phi [%start], [%dec]
       %dec = llvm.loop.decrement.reg(%p, 1)
       %c = icmp ne %dec, 0
       br %c, loop, exit
 - For this a new llvm.start.loop.iterations intrinsic was added, identical
   to llvm.set.loop.iterations but produces a value as seen above, gluing
   the loop together more through def-use chains.
 - This new intrinsic conceptually produces the same output as its input,
   which is taught to SCEV so that the checks in MVETailPredication are not
   affected.
 - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
   been left mostly as before. We should now more reliably be able to tell
   that the t2DoLoopStart is correct without having to prove it, but
   t2WhileLoopStart and tail-predicated loops will remain the same.
 - And all the tests have been updated. There are a lot of them!

This patch on its own might cause more trouble than it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.

Differential Revision: https://reviews.llvm.org/D89881
2020-11-10 15:57:58 +00:00
Kazushi (Jam) Marukawa 543b30db06 [VE][NFC] Change cast to dyn_cast
We used cast where we should have used dyn_cast, so change it this time.
The old code caused problems when implementing the brind instruction and
compiling OpenMP with the new compiler.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91151
2020-11-10 21:49:16 +09:00
Pablo Barrio 642b21beba [AArch64] Enable RAS 1.1 system registers in all AArch64
Some use cases (e.g. kernel devs) have strict requirements to only enable
features available with -march=armv8-a, e.g. no armv8.1-a. Enabling RAS 1.1 in
all AArch64 means they can consider supporting it.

Bear in mind that the first versions of the Armv8 architecture still do not
support RAS 1.1. This patch only lets devs write code with the user-friendly
register mnemonic instead of the ugly generic S<op0>_<op1>_<Cn>_<Cm>_<op2>.
They still need to place runtime checks to make sure that the CPU they run on
supports RAS 1.1.

Differential Revision: https://reviews.llvm.org/D90594
2020-11-10 12:13:33 +00:00
Kazushi (Jam) Marukawa c84b2c49be [VE] Support inline assembly with vector registers
Support inline assembly with vector registers.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91146
2020-11-10 20:55:38 +09:00
Mirko Brkusanin a75d6178b8 [GlobalISel] Add combine for (x | mask) -> x when (x | mask) == x
If we have a mask, and a value x, where (x | mask) == x, we can drop the OR
and just use x.

Differential Revision: https://reviews.llvm.org/D90952
2020-11-10 11:32:13 +01:00
Mirko Brkusanin fb36ab0a42 [GlobalISel] Expand combine for (x & mask) -> x when (x & mask) == x
We can use KnownBitsAnalysis to cover cases when the mask is not trivial. It
can also help with cases when the mask is not constant but can still be folded
into one. Since 'and' is commutative, we should treat both operands as possible
replacements.

Differential Revision: https://reviews.llvm.org/D90674
2020-11-10 11:32:13 +01:00
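A small IR sketch of a non-trivial case known-bits analysis can prove redundant (illustrative only, not from the patch):
```
; The first AND leaves only the low 8 bits of %y, so the second mask of
; 0xFFFF cannot clear any set bit: (x & mask) == x and the AND is dropped.
define i32 @redundant_and(i32 %y) {
  %x = and i32 %y, 255
  %masked = and i32 %x, 65535
  ret i32 %masked
}
```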
Kazushi (Jam) Marukawa b65ef65b22 [VE] Support inline assembly
Support inline assembly with scalar registers.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91119
2020-11-10 18:56:22 +09:00
Jay Foad 0ad4d04002 [AMDGPU] Remove an unused return value. NFC.
Differential Revision: https://reviews.llvm.org/D91063
2020-11-10 09:15:14 +00:00
Esme-Yi 6e0ad5bc8c [PowerPC] Add an ISEL pattern for Mul with Imm.
Summary: This patch tries to do the following transformation if the multiplier doesn't fit in an int16:
			(mul X, c1 << c2) -> (rldicr (mulli X, c1) c2)

Reviewed By: jsji, steven.zhang

Differential Revision: https://reviews.llvm.org/D87384
2020-11-10 06:52:39 +00:00
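A hypothetical multiplier matching that form (sketch; the value is illustrative):
```
; 196608 = 3 << 16 does not fit in mulli's signed 16-bit immediate, but it
; can be rewritten as (mulli X, 3) followed by a left shift by 16 via rldicr.
define i64 @mul_by_196608(i64 %x) {
  %m = mul i64 %x, 196608
  ret i64 %m
}
```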
Carl Ritson fde8351743 [AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90451
2020-11-10 12:16:31 +09:00
Eric Astor d657f7cd30 [ms] [llvm-ml] Support MASM's relational operators (EQ, LT, etc.)
Support the named relational operators (EQ, LT, etc.).

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89733
2020-11-09 14:01:36 -05:00
Francesco Petrogalli 9f61931e07 [llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests.
For example, if the sign extension is only used for TBZ, and the value is used elsewhere with a zero extension, this can eliminate a sign extension.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D90606
2020-11-09 18:27:48 +00:00
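A sketch of the pattern in question (illustrative IR, not from the patch): the sign extension feeds only a sign-bit test, so the branch can test the sign bit of the narrow value directly.
```
define i32 @sign_bit_test(i8 %v) {
  ; The compare only looks at the sign bit of %v, so TB(N)Z can test
  ; bit 7 of %v and the sext is not needed for the branch.
  %ext = sext i8 %v to i32
  %isneg = icmp slt i32 %ext, 0
  br i1 %isneg, label %neg, label %nonneg
neg:
  ret i32 -1
nonneg:
  ret i32 1
}
```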
David Green c8cd7e2bbf [ARM] Remove MI variable aliasing. NFC
This was accidentally using the same name for two different variables in
the same line. Whilst it seems to work for some compilers, others have
trouble and it is probably not a fantastic idea.
2020-11-09 18:18:43 +00:00
Craig Topper 5d3fd3df94 [RISCV] Make ctlz/cttz cheap to speculatively execute so CodeGenPrepare won't insert a zero check.
Add additional isel patterns for ctzw/clzw instructions.

Differential Revision: https://reviews.llvm.org/D91040
2020-11-09 10:13:45 -08:00
Craig Topper a59076006b [RISCV] Add isel patterns for using PACK for zext.h and zext.w.
Differential Revision: https://reviews.llvm.org/D91024
2020-11-09 10:13:45 -08:00
Craig Topper 4265cbaa34 [RISCV] Make SIGN_EXTEND_INREG from i8/i16 legal when Zbb extension is enabled.
This produces better code for sign extends to i64 on the RV32 target.

Differential Revision: https://reviews.llvm.org/D91023
2020-11-09 10:13:45 -08:00
Craig Topper c0dd22e44a [RISCV] Add isel patterns to match sbset/sbclr/sbinv/sbext even if the shift amount isn't masked.
This uses the shiftop PatFrags to handle the masked shift amount
and unmasked shift amount cases. That also checks XLen as part
of the masked amount check so we don't need separate RV32 and RV64
patterns.

Differential Revision: https://reviews.llvm.org/D91016
2020-11-09 09:55:26 -08:00
Mircea Trofin 2ac3a7d0c4 [NFC] Use [MC]Register
Differential Revision: https://reviews.llvm.org/D90795
2020-11-09 08:37:14 -08:00
jasonliu 42d2109380 [XCOFF] Enable explicit sections on AIX
Implement a mechanism to allow explicit sections to be generated on AIX.

Reviewed By: DiggerLin

Differential Revision: https://reviews.llvm.org/D88615
2020-11-09 16:27:38 +00:00
Stanislav Mekhanoshin d5a465866e [AMDGPU] Omit buffer resource with flat scratch.
Differential Revision: https://reviews.llvm.org/D90979
2020-11-09 08:05:20 -08:00
Paul C. Anagnostopoulos 91d2e5c81a [TableGen] Add the !filter bang operator.
Add a test. Update the Programmer's Reference.

Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D91008
2020-11-09 10:56:55 -05:00
Sebastian Neubauer a022b1ccd8 [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00
Momchil Velikov 937ab6a785 [ARM][MachineOutliner] Emit more CFI instructions
This patch makes the outliner emit CFI instructions in a few more
places:

  * after LR is restored, but before the return in an outlined
  function

  * around save/restore of LR to/from a register at calls to outlined
  functions

  * around save/restore of LR to/from the stack at calls to outlined
  functions

The latter two only when the function does NOT spill LR. If the
function spills LR, then outliner generated saves/restores around
calls are not considered interesting for unwinding the frame.

Differential Revision: https://reviews.llvm.org/D89483
2020-11-09 15:26:18 +00:00
Sam Tebbs 40a3f7e48d [ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT
There were cases where a VCMP and a VPST were merged even if the VCMP
didn't have the same defs of its operands as the VPST. This is fixed by
adding RDA checks for the defs. This however gave rise to cases where
the new VPST created would precede the un-merged VCMP and so would fail
a predicate mask assertion since the VCMP wasn't predicated. This was
solved by converting the VCMP to a VPT instead of inserting the new
VPST.

Differential Revision: https://reviews.llvm.org/D90461
2020-11-09 15:03:48 +00:00
Jay Foad 55ea017759 [AMDGPU] Remove unused DisableDecoder machinery. NFC.
This has been unused since D24738.
2020-11-09 13:53:27 +00:00
David Green a0a9e1c798 [ARM] Remove kill flags between VCMP and insertion point
When we fold a VCMP into a VPST instruction any kill flags between the
old VCMP position and the new insertion point need to be removed, in
order to keep the verifier happy.

Differential Revision: https://reviews.llvm.org/D90964
2020-11-09 13:17:53 +00:00
Lucas Prates c2c2cc1360 [ARM][AArch64] Adding Neoverse V1 CPU support
Add support for the Neoverse V1 CPU to the ARM and AArch64 backends.

This is based on patches from Mark Murray and Victor Campos.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D90765
2020-11-09 13:15:40 +00:00
Craig Topper f40925aa8b [X86] Improve lowering of fptoui
Invert the select condition when masking in the sign bit of a fptoui operation. Also, rather than lowering the sign mask to select/xor and expecting the select to get cleaned up later, directly lower to shift/xor.

Patch by Layton Kifer!

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D90658
2020-11-07 23:50:03 -08:00
Craig Topper 19313ed580 [RISCV] Remove assertsexti32 from a couple B extension isel patterns that don't demand the sign-extended bits. 2020-11-07 22:43:16 -08:00
Carl Ritson 8e8a54c7e9 [AMDGPU] SIWholeQuadMode fix mode insertion when SCC always defined
Fix a crash when SCC is defined until end of block and mode change
must be inserted in SCC live region.

Reviewed By: mceier

Differential Revision: https://reviews.llvm.org/D90997
2020-11-08 11:14:57 +09:00
Craig Topper c72358b77f [RISCV] Use (not X) in instead of (xor X, -1) in isel patterns to improve readability. NFC 2020-11-07 11:50:52 -08:00
Elvina Yakubova 93b99728b1 [AArch64] Add pipeline model for HiSilicon's TSV110
This patch adds the scheduling and cost model for TSV110.

Reviewed by: SjoerdMeijer, bryanpkc

Differential Revision: https://reviews.llvm.org/D89972
2020-11-07 01:23:00 +03:00
Eric Astor 5afb360808 [ms] [llvm-ml] Allow arbitrary strings as integer constants
MASM interprets strings in expression contexts as integers expressed in big-endian base-256, treating each character as its ASCII representation.

This completely eliminates the need to special-case single-character strings.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D90788
2020-11-06 17:15:49 -05:00
Jay Foad d61f2cfb9f [AMDGPU] Simplify exp target parsing
Treat any identifier as a potential exp target and diagnose them all the
same way as "invalid exp target"s.

Differential Revision: https://reviews.llvm.org/D90947
2020-11-06 16:09:34 +00:00
Paul C. Anagnostopoulos eed768b700 [NVPTX] [TableGen] Use new features of TableGen to simplify and clarify.
Differential Revision: https://reviews.llvm.org/D90861
2020-11-06 09:20:19 -05:00
Simon Moll 7914e4f0fa [VE] Add v(m)regs to preserve_all reg mask
V(m)regs were defined before CSR_preserve_all was; add them now.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90912
2020-11-06 15:16:11 +01:00
Simon Moll adc69743d2 [VE][NFC] Refactor to support more than one calling conv
Prepare for supporting  different calling conventions by factoring out
things into CC-dependent selection functions (getParamCC, getReturnCC).

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90911
2020-11-06 14:25:25 +01:00
Kazushi (Jam) Marukawa 43df29e206 [VE] Optimize address calculation
Optimize address calculations using LEA/LEASL instructions.
Update comments in VEISelLowering.cpp also.  Update an
existing regression test optimized by this modification.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90878
2020-11-06 19:46:59 +09:00
Simon Moll d3b33a7810 [VE][TTI] don't advertise vregs/vops
Claim to not have any vector support to dissuade SLP, LV and friends
from generating SIMD IR for the VE target.  We will take this back once
vector isel is stable.

Reviewed By: kaz7, fhahn

Differential Revision: https://reviews.llvm.org/D90462
2020-11-06 11:12:10 +01:00
Craig Topper 741b04b0b7 [RISCV] Only enable GPR<->FPR32 bitconvert isel patterns on RV32. NFCI
Bitconvert requires the bitwidth to match on both sides. On RV64
the GPR size is i64, so a bitconvert between f32 and a GPR isn't possible. The
node should never be generated so the pattern won't ever match, but
moving the patterns under IsRV32 makes it more obviously impossible.
It also moves it to a similar location to the patterns for the
custom nodes we use for RV64.
2020-11-05 16:15:25 -08:00
Konstantin Pyzhov 41e74e400d [AMDGPU] Corrected declaration of VOPC instructions with SDWA addressing mode.
Removed "implicit def VCC" from declarations of AMDGPU VOPC instructions since they do not implicitly write to VCC in SDWA mode.

Differential Revision: https://reviews.llvm.org/D89168
2020-11-05 11:15:50 -05:00
Michael Liao 23c6d1501d [amdgpu] Add `llvm.amdgcn.endpgm` support.
- `llvm.amdgcn.endpgm` is added to enable "abort" support.

Differential Revision: https://reviews.llvm.org/D90809
2020-11-05 19:06:50 -05:00
Yuriy Chernyshov 99e64623ec Do not construct std::string from nullptr
While I am trying to forbid such usages systematically in https://reviews.llvm.org/D79427 / P2166R0 for the C++ standard, this PR fixes this (definitely incorrect) usage in llvm.

This code is unreachable, so it could not cause any harm.

Reviewed By: nikic, dblaikie

Differential Revision: https://reviews.llvm.org/D87697
2020-11-05 15:23:26 -08:00
Craig Topper defe11866a [RISCV] Add isel patterns for fnmadd/fnmsub with an fneg on the second operand instead of the first.
The multiply part of FMA is commutable, but TargetSelectionDAG.td
doesn't have it marked as commutable so tablegen won't automatically
create the additional patterns.

So manually add commuted patterns.
2020-11-05 14:00:25 -08:00
Kazushi (Jam) Marukawa f0e585d585 [VE] Add isReMaterializable and isAsCheapAsAMove flags
Add isReMaterializable and isAsCheapAsAMove flags to integer instructions
that are cheap.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90833
2020-11-06 06:09:10 +09:00
Sanjay Patel 264a6df353 [ARM] remove cost-kind predicate for cmp/sel costs
This is the cmp/sel sibling to D90692.
Again, the reasoning is: the throughput cost is number of instructions/uops,
so size/blended costs are identical except in special cases (for example,
fdiv or other known-expensive machine instructions or things like MVE that
may require cracking into >1 uops).

We need to check for a valid (non-null) condition type parameter because
SimplifyCFG may pass nullptr for that (and so we will crash multiple
regression tests without that check). I'm not sure if passing nullptr makes
sense, but other code in the cost model does appear to check if that param
is set or not.

Differential Revision: https://reviews.llvm.org/D90781
2020-11-05 14:52:25 -05:00
Amara Emerson f347d78cca [AArch64][GlobalISel] Add AArch64::G_DUPLANE[X] opcodes for lane duplicates.
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane indexed FMLAs (patch adding a test
for that will be committed later).

The pattern matching code has been simply moved to postlegalize lowering.

Differential Revision: https://reviews.llvm.org/D90820
2020-11-05 11:18:11 -08:00
Craig Topper ce5f4f22e9 [RISCV] Use the 'si' lib call for (double (fp_to_sint/uint i32 X)) when F extension is enabled.
D80526 added custom lowering to pick the si lib call on RV64, but this custom handling is only enabled when the F and D extension are both disabled. This prevents the si library call from being used for double when F is enabled but D is not.

This patch changes the behavior so we always enable the Custom hook on RV64 and decide in ReplaceNodeResults if we should emit a libcall based on whether the FP type should be softened or not.

Differential Revision: https://reviews.llvm.org/D90817
2020-11-05 10:46:45 -08:00
Stanislav Mekhanoshin f738aee0bb [AMDGPU] Add default 1 glc operand to rtn atomics
This change adds a real glc operand to the return atomic
instead of just the string " glc" in the middle of the asm
string.

Improves asm parser diagnostics.

Differential Revision: https://reviews.llvm.org/D90730
2020-11-05 10:41:59 -08:00
Craig Topper ce1270fc7e [RISCV] Remove shadow register list passed to AllocateReg when allocating FP registers for calling convention
The _F and _D registers are already sub/super registers. When one gets allocated all its aliases are already marked as allocated. We don't need to explicitly shadow it too.

I believe the shadow list is for calling conventions like 64-bit Windows on X86, where we have rules like this:

CCIfType<[i32], CCAssignToRegWithShadow<[ECX , EDX , R8D , R9D ],
                                         [XMM0, XMM1, XMM2, XMM3]>>

For that calling convention the argument number determines which register is used regardless of how many scalars or vectors came before it.

Removing this removes a question I had in D90738.

Differential Revision: https://reviews.llvm.org/D90801
2020-11-05 09:49:42 -08:00
Craig Topper c623584b6f [RISCV] Add isel patterns for fshl with immediate to select FSRI/FSRIW
There is no FSLI instruction, but we can emulate it using FSRI by swapping operands and subtracting the immediate from the bitwidth.

Differential Revision: https://reviews.llvm.org/D90826
2020-11-05 09:37:43 -08:00
Sander de Smalen d57bba7cf8 [SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both fixed- and scalable-sized offsets.

The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.

This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90018
2020-11-05 11:02:18 +00:00
Fangrui Song 96b0b9a5e3 [X86] Enable shrink-wrapping for no-frame-pointer non-nounwind functions on platforms not using compact unwind
The current compact unwind scheme does not work when the prologue is not at the
start (the instructions before the prologue cannot be described).  (Technically
this is fixable, but it requires multiple compact unwind descriptors for one
function.)

rL255175 chose to not perform shrink-wrapping for no-frame-pointer functions not
marked as nounwind to work around PR25614. This is overly limited, as platforms
not supporting compact unwind (all non-Darwin) do not need the workaround.
This patch restricts the limitation to compact unwind platforms.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D89930
2020-11-04 16:51:48 -08:00
Arthur Eubanks ab0ddbc38a Reland [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 13:11:40 -08:00
Arthur Eubanks 9173b5a99d Revert "[NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback"
This reverts commit 7a83aa0520.

Causing buildbot failures.
2020-11-04 12:57:32 -08:00
Arthur Eubanks 7a83aa0520 [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 12:53:30 -08:00
Eric Astor 07c4f1d10b [ms] [llvm-ml] Lex MASM strings, including escaping
Allow single-quoted strings and double-quoted character values, as well as doubled-quote escaping.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89731
2020-11-04 15:28:43 -05:00
Cameron McInally c126eb7529 [SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.

Differential Revision: https://reviews.llvm.org/D90644
2020-11-04 14:20:31 -06:00
Mircea Trofin 5dc47541f9 [NFC] Use Register/MCRegister
Differential Revision: https://reviews.llvm.org/D90724
2020-11-04 12:20:17 -08:00
Craig Topper cc3bf27077 [RISCV] Remove assertsexti32 from fslw/fsrw isel patterns.
The operations in these patterns shouldn't be affected by sign bits, and the
pattern starts from a sign_extend_inreg so we aren't expecting sign bits to
be passed through either.

Differential Revision: https://reviews.llvm.org/D90739
2020-11-04 11:37:58 -08:00
Craig Topper d47300f503 [RISCV] Correct the operand order for fshl/fshr to fsl/fsr instructions.
fsl/fsr take their shift amount in $rs2 or an immediate. The
sources are $rs1 and $rs3.

fshl/fshr ISD opcodes both concatenate operand 0 in the high bits and
operand 1 in the lower bits. fshl returns the high bits after
shifting and fshr returns the low bits. So a shift amount of 0
returns operand 0 for fshl and operand 1 for fshr.

fsl/fsr concatenate their operands in different orders such that
$rs1 will be returned for a shift amount of 0. So $rs1 needs to
come from operand 0 of fshl and operand 1 of fshr.

Differential Revision: https://reviews.llvm.org/D90735
2020-11-04 11:13:25 -08:00
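For reference, a small IR illustration of the shift-amount-zero behaviour described above (function names are illustrative):
```
declare i32 @llvm.fshl.i32(i32, i32, i32)
declare i32 @llvm.fshr.i32(i32, i32, i32)

define i32 @fshl_zero(i32 %a, i32 %b) {
  ; a zero shift makes fshl return the high-half operand, i.e. %a
  %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 0)
  ret i32 %r
}

define i32 @fshr_zero(i32 %a, i32 %b) {
  ; a zero shift makes fshr return the low-half operand, i.e. %b
  %r = call i32 @llvm.fshr.i32(i32 %a, i32 %b, i32 0)
  ret i32 %r
}
```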
Craig Topper 0122a4ea66 [RISCV] Remove assertsexti32 from inputs to riscv_sllw/srlw nodes in B extension isel patterns.
riscv_sllw/srlw only read the lower 32 bits of the first operand and the
lower 5 bits of the second operand. Whether the upper
32 bits of the input are sign bits or not doesn't matter.

Also use ineg and not to shorten the patterns.

Differential Revision: https://reviews.llvm.org/D90668
2020-11-04 10:35:05 -08:00
Craig Topper 857563eaf0 [RISCV] Check all 64-bits of the mask in SelectRORIW.
We need to ensure the upper 32 bits of the mask are zero so that the srl
shifts zeroes into the lower 32 bits.

Differential Revision: https://reviews.llvm.org/D90585
2020-11-04 10:15:30 -08:00
Christopher Tetreault 900ec97bbe [UBSan] Cannot negate smallest negative signed integer
Silence an Undefined Behavior Sanitizer warning:
runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D90710
2020-11-04 10:07:52 -08:00
Craig Topper 3701e33a22 [RISCV] Remove custom isel for (srl (shl val, 32), imm). Use pattern instead. NFCI
We don't need custom matching, we just need a predicate to check that
the immediate is greater than 32. We can use the existing ImmSub32
to adjust the immediate.

I've also used the new predicate in the other location that used
ImmSub32. I tried to create a test case where we would break without
the greater than 32 check on that pattern, but DAG combine defeated me.
Still seemed safer to have it.

Differential Revision: https://reviews.llvm.org/D90546
2020-11-04 09:59:14 -08:00
Joe Nash 58adab34c4 [AMDGPU] Resolve pseudo registers at encoding uses
Pseudo-registers allow different register encodings
between gpu generations. Make sure we resolve the
pseudo regs to real regs whenever we get their
hardware encoding.
Using the correct encodings revealed a register
bank conflict and an unnecessary write dependency.
Tests have been updated to match.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90721

Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca
2020-11-04 12:52:32 -05:00
Sebastian Neubauer 31a0b2834f [AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.

Previously, this hit assert(!NodePtr->isKnownSentinel()) in ilist_iterator
for divergent calls in branches.

Differential Revision: https://reviews.llvm.org/D90596
2020-11-04 18:43:19 +01:00
Paul C. Anagnostopoulos d56cd4291e [TableGen] Add !interleave operator to concatenate a list of values with delimiters
Add a test. Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D90469
2020-11-04 09:23:54 -05:00
Simon Moll 351c10cc72 [VE] Add +vpu attribute
`+vpu` controls whether VEISelLowering adds any vregs.  This defaults to
`-vpu` to have scalar code generation out of the box.  We bring up
vector isel under the `+vpu` flag. Once vector isel is stable we switch
to `+vpu` and advertise vregs and vops in TTI.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90465
2020-11-04 12:42:00 +01:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Sebastian Neubauer 1124bf4ab7 [AMDGPU] Set rsrc1 flags for graphics shaders
Before they were only set for compute kernels and compute shaders but
not for other shaders.

Differential Revision: https://reviews.llvm.org/D89399
2020-11-04 12:25:41 +01:00
Sebastian Neubauer 76313288cd [AMDGPU] Fix ieee mode default value
Previously, the default value for ieee mode was
- on for compute kernels and compute shaders,
- off for all shaders except compute shaders.

This commit changes the default to be
- on for compute kernels,
- off for shaders.

This aligns the default value with the settings that are actually in
use.  To my knowledge, all users of shader calling conventions (mesa and
llpc) disable the ieee mode by default.

Differential Revision: https://reviews.llvm.org/D89388
2020-11-04 12:25:38 +01:00
David Green eb611930b6 [ARM] Remove unused variable. NFC 2020-11-04 09:00:03 +00:00
Sander de Smalen 73b6cb67dc [NFCI] Replace AArch64StackOffset by StackOffset.
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D88983
2020-11-04 08:49:00 +00:00
Amara Emerson 393b55380a [AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD.
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.

Differential Revision: https://reviews.llvm.org/D90699
2020-11-03 17:25:14 -08:00
Julien Jorge 0fca651711 [WebAssembly] Don't fold frame offset for global addresses
When machine instructions are in the form of
```
%0 = CONST_I32 @str
%1 = ADD_I32 %stack.0, %0
%2 = LOAD 0, 0, %1
```

In the `ADD_I32` instruction, it is possible to fold it if `%0` is a
`CONST_I32` from an immediate number. But in this case it is a global
address, so we shouldn't do that. But we haven't checked if the operand
of `ADD` is an immediate so far. This fixes the problem. (The case
applies the same for `ADD_I64` and `CONST_I64` instructions.)

Fixes https://bugs.llvm.org/show_bug.cgi?id=47944.

Patch by Julien Jorge (jjorge@quarkslab.com)

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D90577
2020-11-03 14:56:25 -08:00
Sanjay Patel c40126e740 [ARM] remove cost-kind predicate for most math op costs
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop).

Differential Revision: https://reviews.llvm.org/D90692
2020-11-03 17:23:46 -05:00
Jordan Rupprecht 980bf1d5d1 [NFC] Inline wasm assertion-only variable 2020-11-03 13:06:59 -08:00
Andy Wingo 107c3a12d6 [WebAssembly] Implement ref.null
This patch adds a new "heap type" operand kind to the WebAssembly MC
layer, used by ref.null. Currently the possible values are "extern" and
"func"; when typed function references come, though, this operand may be
a type index.

Note that the "heap type" production is still known as "refedtype" in
the draft proposal; changing its name in the spec is
ongoing (https://github.com/WebAssembly/reference-types/issues/123).

The register form of ref.null is still untested.

Differential Revision: https://reviews.llvm.org/D90608
2020-11-03 10:46:23 -08:00
Craig Topper 00eff96e1d [RISCV] Add missing patterns for rotr with immediate for Zbb/Zbp extensions.
DAGCombine doesn't canonicalize rotl/rotr with immediate so we
need patterns for both.

Remove the custom matcher for rotl to RORI and just use a SDNodeXForm
to convert the immediate instead. Doing this gives priority to the
rev32/rev16 versions of grevi over rori since an explicit immediate
is more precise than any immediate. I also added rotr patterns for
rev32/rev16. And removed the (or (shl), (shr)) patterns that should be
combined to rotl by DAG combine.

There is at least one other grev pattern that probably needs a
another rotr pattern, but we need more test coverage first.

Differential Revision: https://reviews.llvm.org/D90575
2020-11-03 10:04:52 -08:00
Esme-Yi 5053eab890 Revert "[PowerPC] Extend folding RLWINM + RLWINM to post-RA."
This reverts commit 119ab2181e.
2020-11-03 16:34:02 +00:00
Tim Renouf 89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Jay Foad 040c50278c [AMDGPU] Fix ds_read2/write2 with unaligned offsets
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.

Differential Revision: https://reviews.llvm.org/D90607
2020-11-03 15:16:10 +00:00
Jameson Nash a0ad066ce4 make the AsmPrinterHandler array public
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed; this just cleans up the code
so that it can be exposed publicly.

Replaces https://reviews.llvm.org/D74158

Differential Revision: https://reviews.llvm.org/D89613
2020-11-03 10:02:09 -05:00
Sanjay Patel 9af561ec99 [x86] update cost table comments for maxnum; NFC
Follow-up suggested in D90613.
2020-11-03 08:09:59 -05:00
David Green bd32386410 [ARM] Remove unused variable. NFC 2020-11-03 12:58:10 +00:00
David Green e474499402 [ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops
If an instruction will be lowered to a call, there is no advantage to
using a low overhead loop, as the LR register will need to be spilled and
reloaded around the call and the low overhead loop will end up being
reverted. This teaches our hardware loop lowering that these memory
intrinsics will be calls in certain situations.

Differential Revision: https://reviews.llvm.org/D90439
2020-11-03 11:53:09 +00:00
Nicholas Guy 54d8627852 [AArch64] Redundant masks in downcast long multiply
Adds patterns to catch masks preceding a long multiply and generate a
single umull/smull instruction instead.

Differential revision: https://reviews.llvm.org/D89956
2020-11-03 10:12:28 +00:00
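An illustrative IR shape for the masks-before-a-long-multiply pattern (sketch, not from the patch):
```
; Both operands are masked down to their low 32 bits before a 64-bit
; multiply, so this is a 32x32->64 unsigned multiply: a single umull.
define i64 @masked_long_mul(i64 %a, i64 %b) {
  %la = and i64 %a, 4294967295  ; 0xFFFFFFFF
  %lb = and i64 %b, 4294967295
  %m = mul i64 %la, %lb
  ret i64 %m
}
```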
Petar Avramovic 0031418dce AMDGPU/GlobalISel: Use same builder/observer in post-legalizer-combiner
Change the match/apply functions into methods of a new target-specific
combiner helper class. Use a reference to the MachineIRBuilder from the
helper instead of constructing a new MachineIRBuilder each time a new
instruction needs to be made. This allows correct tracking of newly created
instructions.

Differential Revision: https://reviews.llvm.org/D90623
2020-11-03 09:24:50 +01:00
Esme-Yi 119ab2181e [PowerPC] Extend folding RLWINM + RLWINM to post-RA.
Summary: This patch depends on D89846. We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINMs will be generated after RA, for example rGc4690b007743. If an RLWINM generated after RA is followed by another RLWINM, we expect to perform the optimization after RA, too.

Reviewed By: shchenz, steven.zhang

Differential Revision: https://reviews.llvm.org/D89855
2020-11-03 07:44:11 +00:00
Craig Topper 46e91f6701 [RISCV] Remove isel patterns for fshl/fshr with same inputs. NFC
These were being selected to ROL/ROR, but DAG combine should
canonicalize fshl/fshr with same inputs to rotl/rotr which we
also have patterns for.
2020-11-02 23:12:18 -08:00
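For context, a funnel shift with both inputs equal is exactly a rotate (illustrative sketch):
```
declare i32 @llvm.fshl.i32(i32, i32, i32)

define i32 @rotl_by_5(i32 %x) {
  ; fshl with the same value in both halves is a rotate-left by 5, which
  ; DAG combine canonicalizes to rotl before instruction selection.
  %r = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 5)
  ret i32 %r
}
```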
Esme-Yi b969dfe26f [NFC][PowerPC] Move the folding RLWINMs from ppc-mi-peephole to PPCInstrInfo.
Summary: We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINMs will be generated after RA, for example D88274. If an RLWINM generated after RA is followed by another RLWINM, we expect to perform the optimization after RA, too.
This is an NFC patch to move the folding patterns to PPCInstrInfo; follow-up work will call it in pre-emit-peephole and expand the patterns to handle more cases.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D89846
2020-11-03 06:28:56 +00:00
Jessica Clarke 7601a21738 [RISCV] Only return DestSourcePair from isCopyInstrImpl for registers
ADDI often has a frameindex in operand 1, but consumers of this
interface, such as MachineSink, tend to call getReg() on the Destination
and Source operands, leading to the following crash when building
FreeBSD after this implementation was added in 8cf6778d30:

```
clang: llvm/include/llvm/CodeGen/MachineOperand.h:359: llvm::Register llvm::MachineOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
 #0 0x00007f4286f9b4d0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) llvm/lib/Support/Unix/Signals.inc:563:0
 #1 0x00007f4286f9b587 PrintStackTraceSignalHandler(void*) llvm/lib/Support/Unix/Signals.inc:630:0
 #2 0x00007f4286f9926b llvm::sys::RunSignalHandlers() llvm/lib/Support/Signals.cpp:71:0
 #3 0x00007f4286f9ae52 SignalHandler(int) llvm/lib/Support/Unix/Signals.inc:405:0
 #4 0x00007f428646ffd0 (/lib/x86_64-linux-gnu/libc.so.6+0x3efd0)
 #5 0x00007f428646ff47 raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f42864718b1 abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f428646142a __assert_fail_base /build/glibc-2ORdQG/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f42864614a2 (/lib/x86_64-linux-gnu/libc.so.6+0x304a2)
 #9 0x00007f428d4078e2 llvm::MachineOperand::getReg() const llvm/include/llvm/CodeGen/MachineOperand.h:359:0
#10 0x00007f428d8260e7 attemptDebugCopyProp(llvm::MachineInstr&, llvm::MachineInstr&) llvm/lib/CodeGen/MachineSink.cpp:862:0
#11 0x00007f428d826442 performSink(llvm::MachineInstr&, llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::SmallVectorImpl<llvm::MachineInstr*>&) llvm/lib/CodeGen/MachineSink.cpp:918:0
#12 0x00007f428d826e27 (anonymous namespace)::MachineSinking::SinkInstruction(llvm::MachineInstr&, bool&, std::map<llvm::MachineBasicBlock*, llvm::SmallVector<llvm::MachineBasicBlock*, 4u>, std::less<llvm::MachineBasicBlock*>, std::allocator<std::pair<llvm::MachineBasicBlock* const, llvm::SmallVector<llvm::MachineBasicBlock*, 4u> > > >&) llvm/lib/CodeGen/MachineSink.cpp:1073:0
#13 0x00007f428d824a2c (anonymous namespace)::MachineSinking::ProcessBlock(llvm::MachineBasicBlock&) llvm/lib/CodeGen/MachineSink.cpp:410:0
#14 0x00007f428d824513 (anonymous namespace)::MachineSinking::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/CodeGen/MachineSink.cpp:340:0
```

Thus, check that operand 1 is also a register in the condition.

Reviewed By: arichardson, luismarques

Differential Revision: https://reviews.llvm.org/D89090
2020-11-03 03:55:47 +00:00
Qiu Chaofan d14e51806b [PowerPC] Skip IEEE 128-bit FP type in FastISel
Vector types, quadword integers and f128 currently cannot be handled in
FastISel. We did not skip the f128 type when lowering arguments, which
caused a crash. This patch fixes it.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D90206
2020-11-03 11:17:11 +08:00
Qiu Chaofan 3204ffeade [PowerPC] [NFC] Rename VCMPo to VCMP_rec
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D90581
2020-11-03 11:10:59 +08:00
Fangrui Song ca01a6b3ac [PowerPC] Parse and ignore .machine ppc64
In the wild, kexec-tools purgatory/arch/ppc64/v2wrap.S and hvcall.S
use this directive.
2020-11-02 16:49:57 -08:00
Krzysztof Parzyszek b26a2755dc [Hexagon] Move isTypeForHVX from Hexagon TTI to HexagonSubtarget, NFC
It's useful outside of Hexagon TTI, and with how TTI is implemented,
it is not accessible outside of TTI.
2020-11-02 14:00:45 -06:00
Stanislav Mekhanoshin c9d6fe6f7d [AMDGPU] Improve FLAT scratch detection
We were using too broad a check for isFLATScratch(), which also
includes FLAT global.

Differential Revision: https://reviews.llvm.org/D90505
2020-11-02 11:37:33 -08:00
Craig Topper 9ac2910093 [RISCV] Make SelectRORIW handle the commutability of OR.
The SHL and SRL could be in opposite order so account for that.

Differential Revision: https://reviews.llvm.org/D90586
2020-11-02 09:32:54 -08:00
Sanjay Patel 35fa3c474f [x86] add AVX2 cost model entries for maxnum of 256-bit vectors
As noticed in D90554, the AVX2 costs for 256-bit vectors did not include
FMAXNUM entries,
so we fell back to AVX1 which assumes those ops will be split into
128-bit halves or something close to that.

Differential Revision: https://reviews.llvm.org/D90613
2020-11-02 12:20:17 -05:00
Craig Topper 7142ec3aaf [RISCV] When matching RORIW, make sure the same input is given to both shifts.
The code is looking for (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1))).
We need to ensure X and Y are the same.

Differential Revision: https://reviews.llvm.org/D90580
2020-11-02 09:12:40 -08:00
Momchil Velikov 7360d6d921 [ARM][MachineOutliner] Do not overestimate LR liveness in return block
The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers
callee-saved registers to be alive at the point after the return
instruction in a block. In the ARM backend, the `LR` register is
classified as callee-saved, which is not really correct (from an ARM
eABI or just common sense point of view).  These two conditions cause
the `MachineOutliner` to overestimate the liveness of `LR`, which
results in unnecessary saves/restores of `LR` around calls to outlined
sequences.  It also causes the `MachineVerifer` to crash in some
cases, because the save instruction reads a dead `LR`, for example
when the following program:

int h(int, int);

int f(int a, int b, int c, int d) {
  a = h(a + 1, b - 1);
  b = b + c;
  return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}

int g(int a, int b, int c, int d) {
  a = h(a - 1, b + 1);
  b = b + c;
  return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}

is compiled with `-target arm-eabi -march=armv7-m -Oz`.

This patch computes the liveness of `LR` in return blocks only, while
taking into account the few ARM instructions, which read `LR`, but
nevertheless the register is not mentioned (explicitly or implicitly)
in the instruction operands.

Differential Revision: https://reviews.llvm.org/D89189
2020-11-02 16:47:22 +00:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Matt Arsenault 86b8f6919b AMDGPU: Reorder checks 2020-11-02 10:21:48 -05:00
Evgeny Leviant cc96a82291 [TableGen][SchedModels] Fix read/write variant substitution
This patch fixes a case where a sched class has write and read variants
belonging to different processor models.

Differential revision: https://reviews.llvm.org/D89777
2020-11-02 17:39:04 +03:00
Jay Foad 0892d2a311 Revert "Fix ds_read2/write2 unaligned offsets"
This reverts commit 2e7e898c8f.

It was committed by mistake.
2020-11-02 14:01:33 +00:00