Commit Graph

29257 Commits

Author SHA1 Message Date
Mikhail Maltsev 08da01b496 [ARM] Add FP16 vector insert/extract patterns
This change adds two FP16 extraction and two insertion patterns
(one per possible vector length).
Extractions are handled by copying a Q/D register into one of VFP2
class registers, where single FP32 sub-registers can be accessed. Then
the extraction of even lanes are simple sub-register extractions
(because we don't care about the top parts of registers for FP16
operations). Odd lanes need an additional VMOVX instruction.

Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
  works with odd lanes)
* The patterns for odd lanes will have a form of a DAG (not a tree),
  and will not be implementable in pure tablegen

Because of this insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).

Without these patterns the ARM backend would sometimes fail during
instruction selection.

This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1
  instruction
* an FP16 load and insertion into a single VLD1 instruction

Differential Revision: https://reviews.llvm.org/D62651

llvm-svn: 362482
2019-06-04 09:39:55 +00:00
QingShan Zhang 11de0e71b0 [DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.

Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

>
*((i32)p) = BSWAP(val);

Differential Revision: https://reviews.llvm.org/D61843

llvm-svn: 362472
2019-06-04 08:53:53 +00:00
QingShan Zhang 72667b4e48 [NFC] Update the test to check the endianness after the CodeGenPrepare instead of checking the assembly instructions.
llvm-svn: 362471
2019-06-04 08:45:07 +00:00
Chen Zheng a050b25544 [PowerPC] add testcases for reordering LSR and PPCCTRLoops - NFC
llvm-svn: 362468
2019-06-04 06:48:14 +00:00
Roman Lebedev b3650868f6 [NFC][X86] Fixup FileCheck prefixes - drop duplicates
llvm-svn: 362460
2019-06-03 23:00:51 +00:00
Craig Topper ac062bbad8 [X86] Add test cases for 32 and 64 bit versions of PR42118. NFC
llvm-svn: 362457
2019-06-03 22:34:15 +00:00
Roman Lebedev 6dc8ce323e [NFC][Codegen] Add tests for hoisting and-by-const from "logical shift", when then eq-comparing with 0
This was initially reported as: https://reviews.llvm.org/D62818

https://rise4fun.com/Alive/oPH

llvm-svn: 362455
2019-06-03 22:30:18 +00:00
Craig Topper 099f4a9fa8 Revert r362451 "foo" and r362452 "[X86] Add test cases for 32 and 64 bit versions of PR42118. NFC"
I failed to squash these properly

llvm-svn: 362453
2019-06-03 22:14:54 +00:00
Craig Topper 17728e7c15 [X86] Add test cases for 32 and 64 bit versions of PR42118. NFC
llvm-svn: 362452
2019-06-03 22:11:40 +00:00
Craig Topper 27a546610c foo
llvm-svn: 362451
2019-06-03 22:11:30 +00:00
Michael Berg 6ff978ee05 Propagate fmf for setcc in SDAG for select folds
llvm-svn: 362448
2019-06-03 21:53:26 +00:00
Matt Arsenault 0ceda9fb5c AMDGPU: Disable stack realignment for kernels
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.

TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.

Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.

llvm-svn: 362447
2019-06-03 21:33:22 +00:00
Jessica Paquette 7500c97ce4 [AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp
Instead of emitting all of the test stuff for a compare when it's only used by
a select, instead, just emit the compare + select. The select will use the
value of NZCV correctly, so we don't need to emit all of the test instructions
etc.

For now, only support fp selects which use G_FCMP. Also only support condition
codes which will only require one select to represent.

Also add a test.

Differential Revision: https://reviews.llvm.org/D62695

llvm-svn: 362446
2019-06-03 20:47:20 +00:00
Craig Topper dcf865f0ca [X86] Fix the pattern for merge masked vcvtps2pd.
r362199 fixed it for zero masking, but not zero masking. The load
folding in the peephole pass hid the bug. This patch turns off
the peephole pass on the relevant test to ensure coverage.

llvm-svn: 362440
2019-06-03 19:29:14 +00:00
Michael Berg 0b7f98da65 Propagate fmf for setcc/select folds
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.

Reviewers: qcolombet, spatel

Reviewed By: qcolombet

Subscribers: nemanjai, jsji

Differential Revision: https://reviews.llvm.org/D62552

llvm-svn: 362439
2019-06-03 19:12:15 +00:00
Nemanja Ivanovic bad43d8f49 [PowerPC] Look through copies for compare elimination
We currently miss the opportunities for optmizing comparisons in the peephole
optimizer if the input is the result of a COPY since we look for record-form
versions of the producing instruction.

This patch simply lets the optimization peek through copies.

Differential revision: https://reviews.llvm.org/D59633

llvm-svn: 362438
2019-06-03 19:09:15 +00:00
Simon Pilgrim 985f2f48bd [WebAssembly] Remove fptosi(undef) and fptoui(undef) from reduced test case.
Pre-commit for D62811 - which adds DAG fpto[us]i(undef) --> undef constant fold

llvm-svn: 362414
2019-06-03 16:21:58 +00:00
Simon Pilgrim cb7e4e8193 [SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205)
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction

Differential Revision: https://reviews.llvm.org/D62807

llvm-svn: 362397
2019-06-03 13:02:07 +00:00
Simon Pilgrim 74467814f2 [SystemZ] Remove sitofp(undef) from reduced test case.
Pre-commit for D62807 - which adds DAG [us]itofp(undef) --> 0 constant fold

llvm-svn: 362396
2019-06-03 12:58:36 +00:00
Diogo N. Sampaio df92f84110 [ARM][FIX] Ran out of registers due tail recursion
Summary:
- pr42062
When compiling for MinSize,
ARMTargetLowering::LowerCall decides to indirect
multiple calls to a same function. However,
it disconsiders the limitation that thumb1
indirect calls require the callee to be in a
register from r0 to r3 (llvm limiation).
If all those registers are used by arguments, the
compiler dies with "error: run out of registers
during register allocation".
This patch tells the function
IsEligibleForTailCallOptimization if we intend to
perform indirect calls, as to avoid tail call
optimization.

Reviewers: dmgreen, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62683

llvm-svn: 362366
2019-06-03 08:58:05 +00:00
Sam Parker a0bd6f8a1a [AArch64] Check for simple type in FPToUInt
DAGCombiner was hitting a SimpleType assertion when trying to combine
a v3f32 before type legalization.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916

Differential Revision: https://reviews.llvm.org/D62734

llvm-svn: 362365
2019-06-03 08:49:17 +00:00
Roman Lebedev bcd542881d [NFC][X86] extract-{low,}bits.ll: one more pattern c with truncation
llvm-svn: 362364
2019-06-03 08:44:09 +00:00
Jim Lin 20b14dacbb [AVR] Fix incorrect source regclass of LDWRdPtr
Summary:
LDWRdPtr would be expanded to ld+ldd. ldd only accepts the pointer register is Y or Z.
So the register class of pointer of LDWRdPtr should be PTRDISPREGS instead of PTRREGS.

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: dylanmckay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62300

llvm-svn: 362351
2019-06-03 02:31:07 +00:00
Florian Hahn e71963c850 Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.

Reviewers: niravd, spatel, craig.topper, rupprecht

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D62633

llvm-svn: 362350
2019-06-03 01:30:19 +00:00
Craig Topper 50b35caf30 [DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask.
Similar to what was done for masked load and gather.

llvm-svn: 362342
2019-06-02 22:52:38 +00:00
Craig Topper 5f79d74946 [X86] Add test cases for masked store and masked scatter with an all zeroes mask. Fix bug in ScalarizeMaskedMemIntrin
Need to cast only to Constant instead of ConstantVector to allow
ConstantAggregateZero.

llvm-svn: 362341
2019-06-02 22:52:34 +00:00
Craig Topper a7bc31ebc6 [DAGCombiner] Replace masked loads with a zero mask with the passthru value
Similar to what was recently done for gathers in r362015.

llvm-svn: 362337
2019-06-02 18:58:46 +00:00
Roman Lebedev 420f5df1c3 [NFC][X86] extract-{low,}bits.ll: one more pattern a with truncation
llvm-svn: 362330
2019-06-02 17:11:21 +00:00
Simon Pilgrim 71a39bcf68 [X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921)
Let's us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.)

llvm-svn: 362327
2019-06-02 15:47:49 +00:00
Simon Pilgrim b0dc262ffb [X86] Add AVX2 'fast-variable-shuffle' PHADD tests (PR39921)
Haswell etc. will combine shuffles to a extract_subvector(permd(x)) before isHorizontalBinOp can match it.

llvm-svn: 362326
2019-06-02 15:33:28 +00:00
Roman Lebedev 2065ddfd79 [NFC][X86] extract-lowbits.ll: add one more pattern a with truncation
We are also free to interpret this as 'BZHI'/'BEXTR'.
https://rise4fun.com/Alive/dD6

llvm-svn: 362325
2019-06-02 15:07:49 +00:00
Simon Pilgrim ffb4d2bff7 [DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020)
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.

PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe.

Differential Revision: https://reviews.llvm.org/D62783

llvm-svn: 362323
2019-06-02 11:56:39 +00:00
Roman Lebedev 0bfa9359b0 [NFC][X86] extract-lowbits.ll: add patterns with truncation too
If we look past truncations of X too eagerly (D62786), we may
end up with 64-bit 'BEXTR', even though 32-bit-one would suffice.

llvm-svn: 362319
2019-06-02 08:05:24 +00:00
Craig Topper fe699c32a2 [X86] Simplify the CHECK lines in vector-reduce-and/or/xor-widen.ll in similar way to r362308.
Forgot to do the widen forms when I was doing the others.

llvm-svn: 362310
2019-06-02 00:43:02 +00:00
Craig Topper 396a915c26 [X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative.
llvm-svn: 362309
2019-06-02 00:42:58 +00:00
Craig Topper 4721fad972 [X86] Simplify the CHECK lines in vector-reduce-and/or/xor.
The AVX512BW and AVX512VL checks were never used. And AVX512 is the same
as AVX on all tests that weren't already split for AVX1 and AVX2.

llvm-svn: 362308
2019-06-02 00:07:52 +00:00
Craig Topper eeaecc63e9 [X86] Add avx512 command lines and test cases to machine-combiner.ll
llvm-svn: 362307
2019-06-02 00:07:48 +00:00
Simon Pilgrim cd1878d0f9 [AMDGPU] Regenerate SDIV tests for an upcoming patch
llvm-svn: 362303
2019-06-01 18:27:06 +00:00
Simon Pilgrim 0d4a040510 [X86][AVX] Add tests for CONCAT(MOVDDUP(x),MOVDDUP(y))
llvm-svn: 362300
2019-06-01 14:05:46 +00:00
Simon Atanasyan 25694e0084 [mips] Extend range of register indexes accepted by cfcmsa/ctcmsa
The `cfcmsa` and `ctcmsa` instructions accept index of MSA control
register. The MIPS64 SIMD Architecture define eight MSA control
registers. But register index for `cfcmsa` and `ctcmsa` instructions
might be any number in 0..31 range. If the index is greater then 7,
`cfcmsa` writes zero to the destination registers and `ctcmsa` does
nothing [1].

[1] MIPS Architecture for Programmers Volume IV-j:
    The MIPS64 SIMD Architecture Module
https://www.mips.com/?do-download=the-mips64-simd-architecture-module

Differential Revision: https://reviews.llvm.org/D62597

llvm-svn: 362299
2019-06-01 13:55:18 +00:00
Dylan McKay 45eb4c7e55 [AVR] Disable register coalescing to the PTRDISPREGS class
If we would allow register coalescing on PTRDISPREGS class then register
allocator can lock Z register to some virtual register. Larger instructions
requiring a memory acces then fail during the register allocation phase since
there is no available register to hold a pointer if Y register was already
taken for a stack frame. This patch prevents it by keeping Z register
spillable. It does it by not allowing coalescer to lock it.

Original discussion on https://github.com/avr-rust/rust/issues/128.

llvm-svn: 362298
2019-06-01 12:38:56 +00:00
Roman Lebedev 1aaa23c0fc [NFC][Codegen] shift-amount-mod.ll: drop innermost operation
I have initially added it in for test to display both
whether the binop w/ constant is sinked or hoisted.
But as it can be seen from the 'sub (sub C, %x), %y'
test, that actually conceals the issues it is supposed to test.

At least two more patterns are unhandled:
* 'add (sub C, %x), %y' - D62266
* 'sub (sub C, %x), %y'

llvm-svn: 362295
2019-06-01 11:08:29 +00:00
Craig Topper c288a19bb7 [X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading folding tables.
llvm-svn: 362288
2019-06-01 06:20:59 +00:00
Matt Arsenault 302eedcbfa AMDGPU: Fix not adding ImplicitBufferPtr as a live-in
Fixes missing test from r293000.

llvm-svn: 362275
2019-05-31 22:47:36 +00:00
Puyan Lotfi 3ea6b24f41 [MIR-Canon] Don't do vreg skip for independent instructions if there are none.
We don't want to create vregs if there is nothing to use them for. That causes
verifier errors.

Differential Revision: https://reviews.llvm.org/D62740

llvm-svn: 362247
2019-05-31 17:34:25 +00:00
Kevin P. Neal ac79007205 Revert revert of r362112 with minor SystemZ test file corrections.
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes

This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	https://reviews.llvm.org/D62546

llvm-svn: 362241
2019-05-31 16:32:12 +00:00
Guozhi Wei c3a24e93d5 [PPC] Correctly adjust branch probability in PPCReduceCRLogicals
In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%.

    condc = conda || condb
    br condc, label %target, label %fallthrough

It can be transformed to following,

    br conda, label %target, label %newbb
  newbb:
    br condb, label %target, label %fallthrough

Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%.

This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged.

Differential Revision: https://reviews.llvm.org/D62430

llvm-svn: 362237
2019-05-31 16:11:17 +00:00
Simon Pilgrim db6a1d4f24 [AMDGPU] Regenerate add/sub shrink constant tests for an upcoming patch
llvm-svn: 362230
2019-05-31 15:06:51 +00:00
Simon Pilgrim 27d6ea9698 [AMDGPU] Regenerate CTLZ tests for an upcoming patch
llvm-svn: 362229
2019-05-31 15:06:14 +00:00
Petar Avramovic f317debdb8 [MIPS GlobalISel] Add detailed tests for lower call
Test different operand types of callee and their behavior whether
relocation model is pic or not.
Possible operand types are:
Register (function pointer),
External symbol (used for libcalls e.g. __udivdi3 or memcpy),
Global address.

Global address has different handling depending on relocation model
and linkage type. Register and external symbol do not.

Differential Revision: https://reviews.llvm.org/D62590

llvm-svn: 362212
2019-05-31 08:40:08 +00:00
Petar Avramovic efcd3c0009 [MIPS GlobalISel] Handle position independent code
Handle position independent code for MIPS32.
When callee is global address, lower call will emit callee
as G_GLOBAL_VALUE and add target flag if needed.
Support $gp in getRegBankFromRegClass().
Select G_GLOBAL_VALUE, specially handle case when
there are target flags attached by lowerCall.

Differential Revision: https://reviews.llvm.org/D62589

llvm-svn: 362210
2019-05-31 08:27:06 +00:00
Roman Lebedev 7c1ac8269a [NFC][Codegen] Add/sub constant-folding: add scalar tests too
Just for completeness.

llvm-svn: 362208
2019-05-31 08:23:48 +00:00
Petar Avramovic f4a6dd28b6 [MIPS GlobalISel] Lower call for callee that is register
Lower call for callee that is register for MIPS32.
Register should contain callee function address.

Differential Revision: https://reviews.llvm.org/D62585

llvm-svn: 362204
2019-05-31 08:06:17 +00:00
Craig Topper 31d00d80a2 [X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64.
These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits.
Similar to PR42079.

Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the
load pattern used in the instructions.

This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe
we should use VZMOVL for widened loads even when we don't need the upper bits
as zeroes?

llvm-svn: 362203
2019-05-31 07:38:26 +00:00
Craig Topper cded573710 [X86] Add test cases for failure to use 128-bit masked vcvtdq2pd when load starts as v2i32.
llvm-svn: 362202
2019-05-31 07:38:22 +00:00
Craig Topper 67d43e0744 [X86] Add test cases for a volatile load shrinking bug involving cvtdq2pd. NFC
Similar to PR42079

llvm-svn: 362201
2019-05-31 07:38:18 +00:00
Craig Topper cb0ad5accb [X86] Copy a test case from avx512-cvt.ll to avx512-cvt-widen.ll. NFC
llvm-svn: 362200
2019-05-31 07:38:14 +00:00
Craig Topper b79cc5f802 [X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead.
DAG combine will usually fold fpextend+load to an fp extload anyway. So the
256 and 512 patterns were probably unnecessary. The 128 bit pattern was special
in that it looked for a v4f32 load, but then used it in an instruction that
only loads 64-bits. This is bad if the load happens to be volatile. We could
probably make the patterns volatile aware, but that's more work for something
that's probably rare. The peephole pass might kick in and save us anyway. We
might also be able to fix this with some additional DAG combines.

This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be
used. Previously we looked for the unlikely vselect+fpextend+load.

llvm-svn: 362199
2019-05-31 06:21:53 +00:00
Craig Topper 73b07284df [X86] Add test to show missed opportunity to use masked vcvtps2pd for vselect+extload.
llvm-svn: 362198
2019-05-31 06:21:49 +00:00
Craig Topper 8cb076ec6e [X86] Add test case for PR42079. NFC
llvm-svn: 362197
2019-05-31 06:21:45 +00:00
Puyan Lotfi 0d63cef180 [MIR-Canon] Skip the first N vreg names lazily.
This consolidates the vreg skip code into one function (SkipVRegs()).
SkipVRegs() now knows if it should skip as if it is the first initialization or
subsequent skips.

The first skip is also done the first time createVirtualRegister is called by
the cursor instead of by the cursor's constructor. This prevents verifier
errors on machine functions that have no vregs (where the verifier will
complain that there are vregs when the function uses none).

Differential Revision: https://reviews.llvm.org/D62717

llvm-svn: 362195
2019-05-31 06:02:38 +00:00
Craig Topper 23066033a1 [X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions.
This makes the 5 address operands come first. And the data operand comes last.

This matches the operand order the instruction is created with. It's also the
expected order in X86MCInstLower. So everything appeared to work, but the
operands didn't match their declared type.

Fixes a -verify-machineinstrs failure.

Also remove the isel patterns from these instructions since they should only
be used for stack spills and reloads. I'm not even sure what types the patterns
were looking for to match.

llvm-svn: 362193
2019-05-31 05:20:27 +00:00
Puyan Lotfi 2a901401fe [MIR-Canon] Hardening propagateLocalCopies.
This is am almost NFC, it does the following:
- If there is no register class for a COPY's src or dst, bail.
- Fixes uses iterator invalidation bug.

Differential Revision: https://reviews.llvm.org/D62713

llvm-svn: 362191
2019-05-31 04:49:58 +00:00
Pengfei Wang 2e67d0c842 [X86] Add VP2INTERSECT instructions
Support Intel AVX512 VP2INTERSECT instructions in llvm

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D62366

llvm-svn: 362188
2019-05-31 02:50:41 +00:00
Roman Lebedev 31f1939848 [NFC][ARM] Add a test that potentially causes endless combine loop with D62266
llvm-svn: 362159
2019-05-30 21:41:21 +00:00
Puyan Lotfi daaecf98c9 [MIR-Canon] Fixing case where MachineFunction is empty.
In cases where the machine function is empty: bail on the RPO traversal.

Differential Revision: https://reviews.llvm.org/D62617

llvm-svn: 362158
2019-05-30 21:37:25 +00:00
Roman Lebedev a4e3b50e26 [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 362146
2019-05-30 20:37:49 +00:00
Roman Lebedev 57aa36ff91 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 362145
2019-05-30 20:37:39 +00:00
Roman Lebedev 63b4741534 [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 362144
2019-05-30 20:37:29 +00:00
Roman Lebedev 05ad5fd213 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 362143
2019-05-30 20:37:18 +00:00
Roman Lebedev 1d9ec7a81b [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 362142
2019-05-30 20:36:54 +00:00
Matt Arsenault e0a4da8c0a AMDGPU/GlobalISel: Add wave scratch offset argument
Avoids crashing in PEI in a future change.

llvm-svn: 362136
2019-05-30 19:33:18 +00:00
Roman Lebedev 7eb8b5b5dd [DAGCombine] ((c1-A)-c2) -> ((c1-c2)-A) constant-fold
Summary: https://rise4fun.com/Alive/B0A

Reviewers: t.p.northover, RKSimon, spatel, craig.topper

Reviewed By: RKSimon

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62691

llvm-svn: 362135
2019-05-30 19:27:51 +00:00
Roman Lebedev 691b5e2ecc [DAGCombine] (A-C1)-C2 -> A-(C1+C2) constant-fold
Summary: https://rise4fun.com/Alive/Mb1M

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62689

llvm-svn: 362134
2019-05-30 19:27:42 +00:00
Roman Lebedev 0a3dbbcdfb [DAGCombine] (A+C1)-C2 -> A+(C1-C2) constant-fold
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62664

llvm-svn: 362133
2019-05-30 19:27:32 +00:00
Roman Lebedev cc9a9cf237 [DAGCombine] ((A-c1)+c2) -> (A+(c2-c1)) constant-fold
Summary:
This was the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, spatel, craig.topper, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62662

llvm-svn: 362131
2019-05-30 19:27:19 +00:00
Tim Northover b7141207a4 Reapply: IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.

If present, the type must match the pointee type of the argument.

The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.

Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.

llvm-svn: 362128
2019-05-30 18:48:23 +00:00
Tim Renouf 7fecdf36cc [AMDGPU] Added target-specific attribute amdgpu-max-memory-clause
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.

That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.

Differential Revision: https://reviews.llvm.org/D62572

Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127
2019-05-30 18:46:34 +00:00
Puyan Lotfi 0f4446b270 [MIR-Canon] Add support for rewriting VRegs that are typed but don't have an RC.
There were crashes (addrspace-memoperands.mir was only one of them) in MIR that
had operands that came from before register classes were set. With these
operands, creating a replacement vreg (for MIR-Canon's renaming) needs to use
the vreg type rather than the RegisterClass which is not present.

Differential Revision: https://reviews.llvm.org/D62543

llvm-svn: 362122
2019-05-30 18:06:28 +00:00
Kevin P. Neal d3db7b40b0 Revert r362112, it broke the bots with the message "Unsupported vector argument or return type"
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362117
2019-05-30 17:10:21 +00:00
Roman Lebedev 2ae4b33181 [NFC][Codegen] Potential add/sub constant folding: fixup non-splat tests
llvm-svn: 362114
2019-05-30 16:50:43 +00:00
Kevin P. Neal 2e1807678d [FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362112
2019-05-30 16:44:47 +00:00
Roman Lebedev 700fdb1070 [NFC][Codegen] Add better test coverage for potential add/sub constant folding
This adds hopefully-full test coverage for all the possible permutations:
First op is one of:
* x + c1
* x - c1
* c1 - x

Second op is one of:
* + c2
* - c2
* c2 -

And thus 3*3=9 patterns.
Some of them show missed constant-folds.

Without previous patch (the revert), these tests were causing endless
dagcombine loop. I really should have thought about this first :S

llvm-svn: 362110
2019-05-30 16:07:19 +00:00
Roman Lebedev 019d270e43 [DAGCombine] Revert of recommit of "binop-with-const hoisting" patches
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.

Some cases may highlight missing constant-folds.

Reverts r361871,r361872,r361873,r361874.

llvm-svn: 362109
2019-05-30 16:07:11 +00:00
Roman Lebedev 8f220a5d2c [NFC][Codegen] Add add+sub/sub+add constant-fold tests for from D62257
add+sub/sub+add when second operands are constants should be folded
into a single add, just like with add+add.

llvm-svn: 362093
2019-05-30 13:02:11 +00:00
Sjoerd Meijer 930dee2c0b [ARM] add target arch definitions for 8.1-M and MVE
This adds:
- LLVM subtarget features to make all the new instructions conditional on,
- CPU and FPU names for use on clang's command line, with default FPUs set
  so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right
  FPU features,
- architecture extension names "mve" and "mve.fp",
- ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE
  (a new actual tag).

Patch mostly by Simon Tatham.

Differential Revision: https://reviews.llvm.org/D60698

llvm-svn: 362090
2019-05-30 12:57:04 +00:00
Simon Pilgrim 32aac1727a [X86][SSE] Improve bool vector extload (PR26091)
We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it.

Differential Revision: https://reviews.llvm.org/D62449

llvm-svn: 362081
2019-05-30 10:25:20 +00:00
Pengfei Wang 1f67d94279 [X86] Add ENQCMD instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Patch by Tianqing Wang (tianqing)

Differential Revision: https://reviews.llvm.org/D62281

llvm-svn: 362053
2019-05-30 03:59:16 +00:00
Tim Northover 71ee3d0237 Revert "IR: add optional type to 'byval' function parameters"
The IRLinker doesn't delve into the new byval attribute when mapping types, and
this breaks LTO.

llvm-svn: 362029
2019-05-29 20:46:38 +00:00
Roman Lebedev 68908c9017 UpdateTestChecks: Lanai triple support
Summary:
The assembly structure most resembles the SPARC pattern:
```
        .globl  f6                      ! -- Begin function f6
        .p2align        2
        .type   f6,@function
f6:                                     ! @f6
        .cfi_startproc
! %bb.0:
        st      %fp, [--%sp]
<...>
        ld      -8[%fp], %fp
.Lfunc_end0:
        .size   f6, .Lfunc_end0-f6
        .cfi_endproc
                                        ! -- End function
```
Test being affected by upcoming patch, so regenerate it.

Reviewers: RKSimon, jpienaar

Reviewed By: RKSimon

Subscribers: jyknight, fedor.sergeev, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62545

llvm-svn: 362019
2019-05-29 20:03:00 +00:00
Benjamin Kramer 107f8d9873 [DAGCombiner] Replace gathers with a zero mask with the passthru value
These can be created by the legalizer when splitting a larger gather.

See https://llvm.org/PR42055 for a motivating example.

Differential Revision: https://reviews.llvm.org/D62613

llvm-svn: 362015
2019-05-29 19:24:19 +00:00
Tim Northover 6e07f16fae IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.

If present, the type must match the pointee type of the argument.

Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.

llvm-svn: 362012
2019-05-29 19:12:48 +00:00
Aakanksha Patil d5443f8c21 AMDGPU: Return address lowering
The patch computes the return address for the current function.

Differential revision: https://reviews.llvm.org/D59666

llvm-svn: 362001
2019-05-29 18:20:11 +00:00
Craig Topper e3a76fa1e2 [X86] Fix machineverifier error on avx512f-256-set0.mir
Previously the pass ran the entire llc pipeline which caused the IR to be recodegened.

This commit restricts it to just running the postrapseudos pass and checking the results of that instead of the final assembly.

llvm-svn: 361991
2019-05-29 17:02:27 +00:00
Kevin P. Neal 308b7139b1 Partial revert of revert of r361827: Add constrained intrinsic tests for powerpc64le.
The powerpc64-"nonle" tests are removed. They fail because of a bug that
Drew is currently working on that affects multiple targets.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Hal Finkel, Kevin P. Neal
Approved by:	Hal Finkel
Differential Revision:	http://reviews.llvm.org/D62388

llvm-svn: 361985
2019-05-29 16:29:31 +00:00
Simon Atanasyan 909c8c2b0d [mips] Use reg-exp in tests to tolerate register indexes changing. NFC
llvm-svn: 361966
2019-05-29 14:59:07 +00:00
Matt Arsenault 9ffd8b5a6f AMDGPU/GlobalISel: Remove unnecesssary REQUIREs
This has been a mandatory part of the build for a while.

llvm-svn: 361956
2019-05-29 13:14:35 +00:00
Peter Collingbourne 31fda09b2d Add IR support, ELF section and user documentation for partitioning feature.
The partitioning feature was proposed here:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html

This is mostly just documentation. The feature itself will be contributed
in subsequent patches.

Differential Revision: https://reviews.llvm.org/D60242

llvm-svn: 361923
2019-05-29 03:29:01 +00:00
Sanjay Patel 19f703e0d7 [AArch64] auto-generate complete test checks; NFC
llvm-svn: 361908
2019-05-29 01:37:44 +00:00
Sanjay Patel 860736cc3c [AArch64] auto-generate complete test checks; NFC
llvm-svn: 361906
2019-05-29 01:35:10 +00:00
Quentin Colombet a6f57ad2c9 [RegUsageInfoCollector] Don't mark as saved registers that don't have subregister lanes
To determine the list of clobbered registers, the RegUsageInfoCollector pass
uses the list of callee saved registers provided by the target and then augments
it with the list of registers which have all their subregisters saved. It then
basically does the difference between all the registers and the saved registers
to come up with what is clobbered (plus it checks that the register is defined
within that functions).

The patch fixes a bug where when register does not have any subregister lane,
hence when checking if any of its subregister are not saved, we would find none
and think the register is saved as well.

That's obviously wrong.

The code was actually kind of checking for something like that with the
CoveredBySubRegs bit. What this bit says is that a register is completely
covered by its subregisters.
We required that this bit was set, to check that a register was saved by its
subregister lanes, since without this bit, we potentially would miss to check
some part of the register.

However, this bit is used de facto on registers that don't have any
subregisters (e.g., on ARM) and the code was not prepared for that.

This patch fixes this by checking that a register has subregisters before
declaring it saved when none of its lanes are modified.

llvm-svn: 361901
2019-05-28 23:43:12 +00:00
Jessica Paquette b73ea75b38 [AArch64][GlobalISel] Select FCMPSri/FCMPDri when comparing against 0.0
Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and
factor out opcode selection for G_FCMP into its own function.

Add a test to show that we don't do this with other immediates.

Differential Revision: https://reviews.llvm.org/D62539

llvm-svn: 361888
2019-05-28 22:52:49 +00:00
Heejin Ahn 5514658591 [WebAssembly] Support for atomic fences
Summary:
This adds support for translation of LLVM IR fence instruction. We
convert a singlethread fence to a pseudo compiler barrier which becomes
0 instructions in final binary, and a thread fence to an idempotent
atomicrmw instruction to a memory address.

Reviewers: dschuff, jfb, sunfish, tlively

Subscribers: sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D50277

llvm-svn: 361884
2019-05-28 22:09:12 +00:00
Konstantin Zhuravlyov fe23ed2c68 AMDGPU: Temporary drop s_mul_hi_i/u32 patterns
It introduces performance regressions in several applications.

This has already been submitted downstream.

llvm-svn: 361879
2019-05-28 21:18:34 +00:00
Adhemerval Zanella 34d8daae53 [AArch64] Handle ISD::LRINT and ISD::LLRINT
This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus
fcvtzs. It currently only handles the scalar version.

Reviewed By: SjoerdMeijer, mstorsjo

Differential Revision: https://reviews.llvm.org/D62018

llvm-svn: 361877
2019-05-28 21:04:29 +00:00
Adhemerval Zanella 6d7bf5e8df [CodeGen] Add lrint/llrint builtins
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics.  The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.

The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch.  Current semantic is just route it to libm
symbol.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D62017

llvm-svn: 361875
2019-05-28 20:47:44 +00:00
Roman Lebedev dfc34f0211 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361874
2019-05-28 20:40:10 +00:00
Roman Lebedev d485c6bc9f [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361873
2019-05-28 20:40:03 +00:00
Roman Lebedev 96c9986199 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361872
2019-05-28 20:39:55 +00:00
Roman Lebedev 2feb7e56e2 [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361871
2019-05-28 20:39:39 +00:00
Michael Liao 5fc1dfa784 [AMDGPU] Correct the handling of inlineasm output registers.
Summary:
- There's a regression due to the cross-block RC assignment. Use the
  proper way to derive the output register RC in inline asm.

Reviewers: rampitec, alex-t

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62537

llvm-svn: 361868
2019-05-28 19:37:09 +00:00
Roman Lebedev 272d70c366 Revert DAGCombine "hoist binop with const" folds
Appear to introduce test-suite compile-time hang.

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825

This reverts r361852,r361853,r361854,r361855,r361856

llvm-svn: 361865
2019-05-28 19:04:21 +00:00
Roman Lebedev caeec8501e [NFC][MIPS] Autogenerater madd-msub.ll test
Being affected by upcoming patch

llvm-svn: 361860
2019-05-28 18:31:36 +00:00
Roman Lebedev 7669665432 [DAGCombine] (x - C) - y -> (x - y) - C fold
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361856
2019-05-28 17:54:21 +00:00
Roman Lebedev 8c9b3e4e4a [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361855
2019-05-28 17:54:13 +00:00
Roman Lebedev 6a24c9b9ab [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 361854
2019-05-28 17:54:04 +00:00
Roman Lebedev 1499f65ac1 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361853
2019-05-28 17:53:54 +00:00
Roman Lebedev 19f51ec04a [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361852
2019-05-28 17:53:43 +00:00
Sanjay Patel f7980e727f Revert "[x86] split 256-bit store of concatenated vectors"
This reverts commit d5a8637072.

Most likely suspect for this bot failure:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684

llvm-svn: 361850
2019-05-28 17:37:58 +00:00
Matt Arsenault 24e80b8d04 AMDGPU: Don't enable all lanes with non-CSR VGPR spills
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all laness and immediately restoring exec. This is the usual
situation in leaf functions.

llvm-svn: 361848
2019-05-28 16:46:02 +00:00
Michael Liao 7166843f1e [AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register.
Summary:
- Don't treat the use of a scalar register as `vreg_1` an VGPR usage.
  Otherwise, that promotes that scalar register into vector one, which
  breaks the assumption that scalar register holds the lane mask.
- The issue is triggered in a complicated case, where if the uses of
  that (lane mask) scalar register is legalized firstly before its
  definition, e.g., due to the mismatch block placement and its
  topological order or loop. In that cases, the legalization of PHI
  introduces the use of that scalar register as `vreg_1`.

Reviewers: rampitec, nhaehnle, arsenm, alex-t

Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62492

llvm-svn: 361847
2019-05-28 16:29:39 +00:00
Simon Tatham 760df47b77 [ARM] Replace fp-only-sp and d16 with fp64 and d32.
Those two subtarget features were awkward because their semantics are
reversed: each one indicates the _lack_ of support for something in
the architecture, rather than the presence. As a consequence, you
don't get the behavior you want if you combine two sets of feature
bits.

Each SubtargetFeature for an FP architecture version now comes in four
versions, one for each combination of those options. So you can still
say (for example) '+vfp2' in a feature string and it will mean what
it's always meant, but there's a new string '+vfp2d16sp' meaning the
version without those extra options.

A lot of this change is just mechanically replacing positive checks
for the old features with negative checks for the new ones. But one
more interesting change is that I've rearranged getFPUFeatures() so
that the main FPU feature is appended to the output list *before*
rather than after the features derived from the Restriction field, so
that -fp64 and -d32 can override defaults added by the main feature.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60691

llvm-svn: 361845
2019-05-28 16:13:20 +00:00
David Greene 561fcc0d63 [X86-64] Fix 256-bit SET0 lowering for non-VLX targets
If we don't have VLX then 256-bit SET0 should be lowered
to VPXOR with ZMM registers.  This restores functionality
accidentally removed by r309926.

Differential Revision: https://reviews.llvm.org/D62415

llvm-svn: 361843
2019-05-28 15:37:01 +00:00
Kevin P. Neal 71f8f745b4 Revert 361827. It broke the bots.
llvm-svn: 361831
2019-05-28 14:37:45 +00:00
Kevin P. Neal 6d458fa866 Add constrained intrinsic tests for powerpc64 and powerpc64le.
Submitted by:	Drew Wock
Reviewed by:	Hal Finkel
Approved by:	Hal Finkel
Differential Revision:	https://reviews.llvm.org/D62388

llvm-svn: 361827
2019-05-28 14:17:48 +00:00
Sanjay Patel d5a8637072 [x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3

But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.

We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is the reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.

Differential Revision: https://reviews.llvm.org/D62498

llvm-svn: 361822
2019-05-28 13:54:17 +00:00
Matt Arsenault d3ed418ad3 MIR: Fix printer crashing on dead CSR frame indexes
llvm-svn: 361819
2019-05-28 13:08:31 +00:00
Sjoerd Meijer c0f43bee37 Follow up of r361810: test case fix attempt for Windows builder
llvm-svn: 361817
2019-05-28 13:04:47 +00:00
Sanjay Patel 6bf4ca9d2e [x86] fix 256-bit vector store splitting to honor 'volatile'
Forking this out of the discussion in D62498
(and assuming that will be committed later, so adding the helper function here).
The LangRef says:
"the backend should never split or merge target-legal volatile load/store instructions."

Differential Revision: https://reviews.llvm.org/D62506

llvm-svn: 361815
2019-05-28 12:58:07 +00:00
Benjamin Kramer 57e267a2e9 [X86] Custom lower CONCAT_VECTORS of v2i1
The generic legalizer cannot handle this. Add an assert instead of
silently miscompiling vectors with elements smaller than 8 bits.

llvm-svn: 361814
2019-05-28 12:52:57 +00:00
Hans Wennborg d936e40575 Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"
This was reverted in r360086 as it was supected of causing mysterious test
failures internally. However, it was never concluded that this patch was the
root cause.

> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936

llvm-svn: 361811
2019-05-28 12:19:38 +00:00
Sjoerd Meijer 4df2baadd2 [ARM] Use CHECK-NEXT in CodeGen/ARM/O3-pipeline.ll. NFC.
Use CHECK-NEXT, like in other pipeline tests, so that we actually
notice when the pipeline is changed.

llvm-svn: 361810
2019-05-28 12:06:26 +00:00
Sanjay Patel 165663aeeb [x86] add test to show volatile store splitting; NFC
From the LangRef:
"the backend should never split or merge target-legal
volatile load/store instructions."

See also:
D62498

llvm-svn: 361785
2019-05-27 23:56:41 +00:00
Matt Arsenault ca84c4be4b RegAllocFast: Set MayLiveAcrossBlocks when allocating uses
Setting mayLiveOut based only on use instructions after allocating the
def block did not work if the use block was allocated before the def
block, since the virtual register uses were already removed.

Fixes bug 41973.

llvm-svn: 361781
2019-05-27 20:37:31 +00:00
Michael Liao 9c70c574b4 [SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`.
Summary:
- The current implementation simplifies the case where the source of
  `copyto` is `implicit-def`ed. However, it only works when that
  `implicit-def` is single-used since it detects that from
  `implicit-def` and cannot determine which destination vreg should be
  used if there are multiple uses.
- This patch changes that detection when `copyto` is being emitted. If
  that `copyto`'s source is defined from `implicit-def`, it simplifies
  it. Hence, it works even that `implicit-def` is multi-used.
- Except it simplifies the internal IR, it won't improve the quality of
  code generation. However, it helps to detect 'implicit-def` in a
  straight-forward manner in some passes, such as `si-i1-copies`. A test
  case is added.

Reviewers: sunfish, nhaehnle

Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62342

llvm-svn: 361777
2019-05-27 18:26:29 +00:00
Diana Picus c675215f67 [ARM GlobalISel] Un-XFAIL some tests. NFC
It turns out we support big endian now (probably since r332449, but I
haven't bisected to confirm).

llvm-svn: 361756
2019-05-27 10:32:34 +00:00
David L. Jones 0ff41b8a5a Revert r361356: "[MIR] Add simple PRE pass to MachineCSE"
This is problematic on buildbots, as discussed here: https://reviews.llvm.org/rL361356

It seems like the plan already was to revert, but that hasn't happened yet.

llvm-svn: 361746
2019-05-27 06:00:00 +00:00
Yonghong Song e698958ad8 [BPF] generate R_BPF_NONE relocation for BTF DataSec variables
The variables in BTF DataSec type encode in-section offset.
R_BPF_NONE should be generated instead of R_BPF_64_32.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D62460

llvm-svn: 361742
2019-05-26 21:26:06 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Simon Pilgrim a044410f37 [X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG
Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel

llvm-svn: 361734
2019-05-26 16:00:35 +00:00
David Green 0dbafe191e [ARM] Select fp16 fma
This adds a pattern for fma, similar to the float and double patterns.

Differential Revision: https://reviews.llvm.org/D62330

llvm-svn: 361719
2019-05-26 11:34:30 +00:00
David Green 21542cd6f4 [ARM] Select a number of fp16 rounding functions
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.

Differential Revision: https://reviews.llvm.org/D62326

llvm-svn: 361718
2019-05-26 11:13:00 +00:00
David Green c9f4b7d201 [ARM] Promote various fp16 math intrinsics
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.

Differential Revision: https://reviews.llvm.org/D62325

llvm-svn: 361717
2019-05-26 10:59:21 +00:00
Simon Pilgrim 58a8541dcc [X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector
We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well.

There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well.

llvm-svn: 361716
2019-05-26 10:54:23 +00:00
David Green 2881325b17 [ARM] Select fp16 fabs
This adds a pattern for the fabs intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62324

llvm-svn: 361715
2019-05-26 10:51:58 +00:00
David Green aeade651f3 [ARM] Select fp16 fsqrt
This adds a pattern for the sqrt intrinsic, the same as float and double.

Differential Revision: https://reviews.llvm.org/D62322

llvm-svn: 361714
2019-05-26 10:42:24 +00:00
David Green caf8a11b65 [ARM] Promote fp16 frem
Promote fp16 frem operations on ARM to floats so they call fmodf.

Differential Revision: https://reviews.llvm.org/D62321

llvm-svn: 361713
2019-05-26 10:30:22 +00:00
David Green 1c1e2ca022 [ARM] Add some base fullfp16 tests. NFC
llvm-svn: 361712
2019-05-26 10:06:40 +00:00
Simon Pilgrim 40fa52b174 [X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C)
Commonly occurs in sign-extension cases

llvm-svn: 361706
2019-05-25 18:02:17 +00:00
Nikita Popov d87eceda0e [X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax
If we have a known non-nan operand, place it in the second operand
of fmin/fmax that is returned if either operand is nan.

Differential Revision: https://reviews.llvm.org/D62448

llvm-svn: 361704
2019-05-25 16:44:29 +00:00
Simon Pilgrim 34d5a74b03 [X86][SSE] vector-sext - cleanup prefix lists
Add X32-SSE common prefix to merge some checks

llvm-svn: 361702
2019-05-25 16:33:17 +00:00
Sanjay Patel 3f0905e46f [SelectionDAG] define binops as a superset of commutative binops
The test diffs show improved vector narrowing for integer min/max opcodes because
those were all absent from the list. I'm not sure if we can expose functional diffs
for all of the moved/added opcodes though.

It seems like we are missing an AVX512 opportunity to use 256-bit ops in place of
512-bit ops on some tests/targets, but I think that can be a follow-up.

Preliminary steps to make sure the callers are not misusing these queries:
rL361268
rL361547

Differential Revision: https://reviews.llvm.org/D62191

llvm-svn: 361701
2019-05-25 15:28:55 +00:00
Nikita Popov c9de92ee76 [X86] Add tests for min/maxnum with const operand; NFC
llvm-svn: 361700
2019-05-25 15:06:54 +00:00
David Bolvansky 2149811854 [NFC] Make tests more robust for new optimizations
llvm-svn: 361697
2019-05-25 14:10:20 +00:00
Sanjay Patel 91131b6500 [SelectionDAG] soften assertion when legalizing narrow vector FP ops
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.

llvm-svn: 361696
2019-05-25 13:48:07 +00:00
Craig Topper 46e5052b8e [X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly.
INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags.

This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg.

One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input.

Differential Revision: https://reviews.llvm.org/D61472

llvm-svn: 361691
2019-05-25 06:17:47 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Jessica Paquette 97d668d70f [GlobalISel][AArch64] Make FP constraint checks consider possible use/def banks
In a few places in getInstrMapping, we check if use/def instructions for the
instruction we're mapping have floating point constraints.

We can improve this check and reduce the number of copies in GISel-compiled code
if we make a couple observations:

- For a def instruction, it only matters if the def instruction must always
  output a value stored on a FPR

- For a use instruction, it only matters if the use instruction must always
  only take in values stored in FPRs

This adds two new functions:

- onlyUsesFP
- onlyDefinesFP

Then we can use those when we're checking the uses/defs instead.

Without this patch, the load, unmerge, store, and select in the added test
would have unnecessary copies.

Differential Revision: https://reviews.llvm.org/D62426

llvm-svn: 361679
2019-05-24 23:08:45 +00:00
Jason Liu 8e1d921bb3 Implement call lowering without parameters on AIX
Summary:dd
This patch implements call lowering for calls without parameters
on AIX as initial support.

Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma

Differential Revision: https://reviews.llvm.org/D61948

llvm-svn: 361669
2019-05-24 20:54:35 +00:00
Jessica Paquette 56503865ed [GlobalISel][AArch64] Improve register bank mappings for G_SELECT
The fcsel and csel instructions differ in only the register banks they work on.

So, they're entirely interchangeable otherwise.

With this in mind, this does two things:

- Teach AArch64RegisterBankInfo to consider the inputs to G_SELECT as well as
  the outputs.
- Teach it to choose the best register bank mapping based off the constraints
  of the inputs and outputs.

The "best" in this case means the one that requires the smallest number of
copies to properly emit a fcsel/csel.

For example, if the inputs are all already going to be on FPRs, we should
emit a fcsel, even if the output is a GPR. This costs one copy to produce the
result, but saves us from copying the inputs into GPRs.

Also update the regbank-select.mir to check that we end up with the right
select instruction.

Differential Revision: https://reviews.llvm.org/D62267

llvm-svn: 361665
2019-05-24 19:35:25 +00:00
Matt Arsenault 3d59e388ca AMDGPU: Activate all lanes when spilling CSR VGPR for SGPR spills
If some lanes weren't active on entry to the function, this could
clobber their VGPR values.

llvm-svn: 361655
2019-05-24 18:18:51 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Stefan Pintilie 522307fa40 [PowerPC] Remove CRBits Copy Of Unset/set CBit
For the situation, where we generate the following code:

       crxor 8, 8, 8
       < Some instructions>
.LBB0_1:
       < Some instructions>
       cror 1, 8, 8

cror (COPY of CRbit) depends on the result of the crxor instruction.
CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use
crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on
any previous instruction.

This patch will optimize it to:

        < Some instructions>
.LBB0_1:
        < Some instructions>
        cror 1, 1, 1

Patch By: Victor Huang (NeHuang)

Differential Revision: https://reviews.llvm.org/D62044

llvm-svn: 361632
2019-05-24 12:05:37 +00:00
Simon Pilgrim 95b8d9bbf8 [SelectionDAG] computeKnownBits - support constant pool values from target
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.

computeKnownBits then uses this function to improve codegen, notably vector code after legalization.

A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.

This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.

Differential Revision: https://reviews.llvm.org/D61887

llvm-svn: 361620
2019-05-24 10:03:11 +00:00
Tim Northover 3b2157aeed GlobalISel: support swifterror attribute on AArch64.
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.

llvm-svn: 361608
2019-05-24 08:40:13 +00:00
Simon Atanasyan c1b482f2a5 [mips] Always check that `shift and add` optimization is efficient.
The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function
to check that breaking down constant multiplications into a series
of shifts, adds, and subs is efficient. Unfortunately, this function
does not check maximum number of steps on all paths of the algorithm.
This patch fixes this bug.

Fix for PR41929.

Differential Revision: https://reviews.llvm.org/D62166

llvm-svn: 361606
2019-05-24 08:39:40 +00:00
QingShan Zhang 449bfdd1b0 [Power9] Add a specific heuristic to schedule the addi before the load
When we are scheduling the load and addi, if all other heuristic didn't take effect,
 we will try to schedule the addi before the load, to hide the latency, and avoid the
 true dependency added by RA. And this only take effects for Power9.

Differential Revision: https://reviews.llvm.org/D61930

llvm-svn: 361600
2019-05-24 05:30:09 +00:00
Craig Topper af0add6c39 [X86] Add test case that was supposed to go with r360102.
Found in my working area. Guess I forgot 'git add' before committing.

llvm-svn: 361599
2019-05-24 04:46:56 +00:00
Reid Kleckner b7a78c7dff [AArch64] Preserve X8 for thunks ending in variadic musttail calls
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.

Fixes PR41997

Reviewers: mgrang, efriedma

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62344

llvm-svn: 361585
2019-05-24 01:27:20 +00:00
Serge Pavlov ed595e8627 [AArch64] Add nvcast patterns for v2f32 -> v1f64
Summary: Constant stores of f32 values can create such NvCast nodes.

Reviewers: t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62285

llvm-svn: 361584
2019-05-24 01:20:34 +00:00
Thomas Lively 55229f6b10 [WebAssembly] Expand more SIMD float ops
Summary: These were previously causing ISel failures.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62354

llvm-svn: 361577
2019-05-24 00:15:04 +00:00
Roman Lebedev f81ebfb045 UpdateTestChecks: ppc32 triple support
Summary:
Appears identical to powerpc64{,le}.
Regenerate test that is being affected by upcoming patch.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: nemanjai, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62339

llvm-svn: 361543
2019-05-23 19:54:41 +00:00
Matt Arsenault 5c714cbdd8 AMDGPU: Correct maximum possible private allocation size
We were assuming a much larger possible per-wave visible stack
allocation than is possible:

faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)

Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.

Remove the corresponding subtarget feature and option that made
this configurable.

llvm-svn: 361541
2019-05-23 19:38:14 +00:00
Robert Lougher 170dfeb2ff Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info"
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969

The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.

This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D61680

llvm-svn: 361527
2019-05-23 18:15:12 +00:00
Thomas Lively e18b5c6237 [WebAssembly] Implement ReplaceNodeResults to fix a SIMD crash
Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61037

llvm-svn: 361526
2019-05-23 18:09:26 +00:00
Roman Lebedev 702a152e6a [NFC][PPC] Autogenerate vec_add_sub_quadword.ll test
Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch.

llvm-svn: 361525
2019-05-23 18:08:26 +00:00
Roman Lebedev c8364ef567 [NFC][PPC] Autogenerate vec_add_sub_doubleword.ll test
Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch.

llvm-svn: 361524
2019-05-23 18:08:21 +00:00
Roman Lebedev a8a470c45b [NFC][Mips] Autogenerate msa/i5-s.ll test
Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch.

llvm-svn: 361523
2019-05-23 18:08:17 +00:00
Roman Lebedev 06688fe715 [NFC][Mips] Autogenerate msa/arithmetic.ll test
Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch.

llvm-svn: 361522
2019-05-23 18:08:13 +00:00
Matt Arsenault 0f3ba44b57 AMDGPU/GlobalISel: Legality for integer min/max
llvm-svn: 361519
2019-05-23 17:58:48 +00:00
Thomas Lively eafe8ef6f2 [WebAssembly] Add multivalue and tail-call target features
Summary:
These features will both be implemented soon, so I thought I would
save time by adding the boilerplate for both of them at the same time.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62047

llvm-svn: 361516
2019-05-23 17:26:47 +00:00
Shoaib Meenai 87226a7202 [AsmPrinter] Treat a narrowing PtrToInt like Trunc
When printing assembly for PtrToInt, AsmPrinter::lowerConstant
incorrectly assumed that if PtrToInt was not converting to an
int with exactly the same number of bits, it must be widening
to a larger int. But this isn't necessarily true; PtrToInt can
also shrink the size, which is useful when you want to produce
a known 32-bit pointer on a 64-bit platform (on x86_64 ELF
this yields a R_X86_64_32 relocation).

The old behavior of falling through to the widening case for a
narrowing PtrToInt yields bogus assembly code like this, which
fails to assemble because the no-op bit and it accidentally
creates is not a valid relocation:

```
        .long   a&-1
```

The fix is to treat a narrowing PtrToInt exactly the same as
it already treats Trunc: just emit the expression and let
the assembler deal with truncating it in the appropriate way.

Patch by Mat Hostetter <mjh@fb.com>.

Differential Revision: https://reviews.llvm.org/D61325

llvm-svn: 361508
2019-05-23 16:29:09 +00:00
Simon Pilgrim 46806749ac [X86] Regenerate LZCNT tests on x86/x32/x64 targets
llvm-svn: 361495
2019-05-23 13:30:10 +00:00
Alex Bradbury 5dabe03b41 [RISCV][NFC] Add nounwind attribute to functions missing it in test/CodeGen/RISCV
r360897 was incomplete, must have applied an old/wip patch. This is in preparation for emitting CFI directives.

llvm-svn: 361493
2019-05-23 12:43:13 +00:00
Simon Pilgrim 46165b2409 [AMDGPU] Regenerate vector sub tests
llvm-svn: 361485
2019-05-23 11:27:28 +00:00
Roman Lebedev 32d976bac1 [NFC][X86] Fix check prefixes and autogenerate fold-pcmpeqd-2.ll test
Being affected by (sub %x, c) -> (add %x, (sub 0, c))
patch in an uncertain way.

llvm-svn: 361483
2019-05-23 10:55:13 +00:00
Sam Parker 617cdc5a6d [ARM][CGP] Clear SafeWrap before each search
The previous patch added a member set to store instructions that we
could allow to wrap. But this wasn't cleared between searches meaning
that they could get promoted, incorrectly, during the promotion of a
separate valid chain.

Differential Revision: https://reviews.llvm.org/D62254

llvm-svn: 361462
2019-05-23 07:46:39 +00:00
Thomas Lively 1a3cbe720c [WebAssembly] Implement __builtin_return_address for emscripten
Summary:
In this patch, `ISD::RETURNADDR` is lowered on the emscripten target
to the new Emscripten runtime function `emscripten_return_address`, which
implements the functionality.

Patch by Guanzhong Chen

Reviewers: tlively, aheejin

Reviewed By: tlively

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62210

llvm-svn: 361454
2019-05-23 01:24:01 +00:00
Fangrui Song 86c9ca48c3 [X86] Support -fno-plt __tls_get_addr calls
In general dynamic/local dynamic TLS models, with -fno-plt,

* x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`
  Note, on x86, if we can get rid of %ebx as the PIC register,
  it may be better to use a register not preserved across function calls.
* x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`

Reorganize the code by separating 32-bit and 64-bit.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D62106

llvm-svn: 361453
2019-05-23 01:05:13 +00:00
Craig Topper 93f38e1f1a [X86] Explcitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary.
We effectively had a second set of isel patterns that tried to use a
regular store instruction and an extract_subreg instruction. Or a masked move
and an extract_subreg. These patterns were intended to override the
matching of VEXTRACT instructions by taking advantage of the priority
of the explicit immediate 0 for the index.

This patch instaed just disables the immediate 0 matchin the VEXTRACT
patterns. This each of the component pieces of the larger patterns will
match by themselves.

This found a bug of sorts were we didn't use 128-bit store for 512->128
extract on KNL. Its unclear what the right thing here should be.
Using the vextract avoids constraining the register allocator to use
xmm0-15. But it always results in a longer encoding if the register
allocator ends up choosing xmm0-15 anyway.

llvm-svn: 361431
2019-05-22 21:00:18 +00:00
Craig Topper 9816d55776 [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil.  The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions.  For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.

We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.

Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.

I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.

llvm-svn: 361425
2019-05-22 20:04:55 +00:00
Alexey Lapshin 53726588f6 [DebugInfo][AArch64] Recognise target specific instruction as mov instr
This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714.
Specifically, Simple Register Coalescing creates following conversion :

 undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24;

It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis
is not able to create debug location record for that instruction. So the problem
is in that debug info for argc variable is incorrect. The fix is
to write custom isCopyInstrImpl() which would recognize the ORRWrs instr.

llvm-svn: 361417
2019-05-22 18:48:58 +00:00
Roman Lebedev 5e1ce15c5d [NFC][X86][AArch64] Add tests for missing (x - y) + -1 -> not(y) + x fold
https://rise4fun.com/Alive/OaY

llvm-svn: 361409
2019-05-22 16:58:26 +00:00
Matt Arsenault ca64ef2043 MC: Allow getMaxInstLength to depend on the subtarget
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.

For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.

llvm-svn: 361405
2019-05-22 16:28:41 +00:00
Kees Cook c2187c20a4 [TargetLowering] Extend bool args to inline-asm according to getBooleanType
Summary:
This extends Krzysztof Parzyszek's X86-specific solution
(https://reviews.llvm.org/D60208) to the generic code pointed out by
James Y Knight.

Reviewers: kparzysz, craig.topper, nickdesaulniers

Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60224

llvm-svn: 361404
2019-05-22 16:16:15 +00:00
Sanjay Patel 5a4f7cf2ff [IR] allow fast-math-flags on select of FP values
This is a minimal start to correcting a problem most directly discussed in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086

We have been hacking around a limitation for FP select patterns by using the
fast-math-flags on the condition of the select rather than the select itself.
This patch just allows FMF to appear with the 'select' opcode. No changes are
needed to "FPMathOperator" because it already includes select-of-FP because
that definition is based on the (return) value type.

Once we have this ability, we can start correcting and adding IR transforms
to use the FMF on a 'select' instruction. The instcombine and vectorizer test
diffs only show that the IRBuilder change is behaving as expected by applying
an FMF guard value to 'select'.

For reference:
rL241901 - allowed FMF with fcmp
rL255555 - allowed FMF with FP calls

Differential Revision: https://reviews.llvm.org/D61917

llvm-svn: 361401
2019-05-22 15:50:46 +00:00
Roman Lebedev 1f63d7fef9 [NFC][ARM] addsubcarry-promotion.ll: whoops - replace '.' with '-' in check-prefix
Does not affect update_llc_test_checks, or the actual output,
but is not accepted by the actual FileCheck.

Sorry, i should have noticed this before committing,
not the very next second after..

llvm-svn: 361398
2019-05-22 15:42:33 +00:00
Roman Lebedev 1b45bdf5ba [NFC][ARM] Autogenerate addsubcarry-promotion.ll test
Being affected by upcoming patch

llvm-svn: 361397
2019-05-22 15:34:51 +00:00
Roman Lebedev 6a53135698 [NFC][X86] Autogenerate negative-offset.ll test
Being affected by upcoming patch

llvm-svn: 361396
2019-05-22 15:34:43 +00:00
Roman Lebedev 406421b332 [NFC][X86][AArch64] Rewrite sink-addsub-of-const.ll tests to have full permutation coverage
Somehow missed some patterns initially..
While there, add comments.

llvm-svn: 361390
2019-05-22 14:42:41 +00:00
Roman Lebedev 7c72ca012d UpdateTestChecks: sparc march handling
Summary:
Another target that prefers to use `-march` in tests
```
llvm/test/CodeGen/SPARC$ grep -ri mtriple | wc -l
25
llvm/test/CodeGen/SPARC$ grep -ri march | wc -l
165
```

This test is being affected by a further patch,
so regenerate it to better visualize the changes

Reviewers: RKSimon, dcederman, gberry

Reviewed By: RKSimon

Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62242

llvm-svn: 361381
2019-05-22 13:04:34 +00:00
Roman Lebedev 4bf35671b5 [NFC][SystemZ] Autogenerate alloca-03.ll test to make test changes more visible
The check lines are being affected by an upcoming patch,
regenerate the checklines to visualize the changes better.

llvm-svn: 361380
2019-05-22 13:04:24 +00:00
Anton Afanasyev df00c6a54f [MIR] Add simple PRE pass to MachineCSE
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.

The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.

First step: https://reviews.llvm.org/D54839

Fixes llvm.org/PR38917

llvm-svn: 361356
2019-05-22 07:41:34 +00:00
Fangrui Song 1c61471ab1 [PPC64] Parse -elfv1 -elfv2 when specified on target triple
Summary:
For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2].

The following results are expected:

ELFv1 when using:
-target powerpc64-unknown-freebsd12.0
-target powerpc64-unknown-freebsd12.0 -mabi=elfv1
-target powerpc64-unknown-freebsd12.0-elfv1

ELFv2 when using:
-target powerpc64-unknown-freebsd12.0 -mabi=elfv2
-target powerpc64-unknown-freebsd12.0-elfv2

[1] https://wiki.freebsd.org/powerpc/llvm-elfv2
[2] https://clang.llvm.org/docs/CrossCompilation.html

Patch by Alfredo Dal'Ava Júnior!

Differential Revision: https://reviews.llvm.org/D61950

llvm-svn: 361355
2019-05-22 07:29:59 +00:00
Nikita Popov 15df05152d [X86] Don't compare i128 through vector if construction not cheap (PR41971)
Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the
combineVectorSizedSetCCEquality() transform more conservative by
checking that the bitcast to the vector type will be cheap/free
for both operands. I'm considering it cheap if it's a constant,
a load or already a vector. I've dropped the explicit check for
f128 because it should fall out naturally (in the cases where
it'd be detrimental).

Differential Revision: https://reviews.llvm.org/D62220

llvm-svn: 361352
2019-05-22 06:47:06 +00:00
Chen Zheng 9970665f60 [PowerPC] [ISEL] select x-form instruction for unaligned offset
Differential Revision: https://reviews.llvm.org/D62173

llvm-svn: 361346
2019-05-22 02:57:31 +00:00
Pengfei Wang 6a0d432e9e [X86] [CET] Deal with return-twice function such as vfork, setjmp when
CET-IBT enabled

Return-twice functions will indirectly jump after the caller's position.
So when CET-IBT is enable, we should make sure these is endbr*
instructions follow these Return-twice function caller. Like GCC does.

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D61881

llvm-svn: 361342
2019-05-22 00:50:21 +00:00
Matt Arsenault 2cba91b8db AMDGPU: Assume calls read exec
llvm-svn: 361333
2019-05-21 23:23:16 +00:00
Matt Arsenault eea81c20fe AMDGPU: Add some tests for inlineasm behavior
llvm-svn: 361332
2019-05-21 23:23:12 +00:00
Matt Arsenault dd1ffa00a5 AMDGPU: Assume call pseudos are convergent
There should probably be nonconvergent versions, but my guess is it
doesn't matter in practice.

llvm-svn: 361331
2019-05-21 23:23:10 +00:00
Matt Arsenault 60ba03e210 AMDGPU: Fix not marking new gfx10 SGPRs as CSRs
llvm-svn: 361330
2019-05-21 23:23:05 +00:00
Dan Gohman a49496fb2a [WebAssembly] Add the signature for the new llround builtin function
r360889 added new llround builtin functions. This patch adds their
signatures for the WebAssembly backend.

It also adds wasm32 support to utils/update_llc_test_checks.py, since
that's the script other targets are using for their testcases for this
feature.

Differential Revision: https://reviews.llvm.org/D62207

llvm-svn: 361327
2019-05-21 23:06:34 +00:00
Roman Lebedev 675307b1f1 [NFC][AMDGPU] Autogenerate llvm.amdgcn.s.barrier.ll test
llvm-svn: 361320
2019-05-21 21:49:14 +00:00
Roman Lebedev 21e8ec8d4f [NFC][X86] Autogenerate ragreedy-hoist-spill.ll test
llvm-svn: 361319
2019-05-21 21:49:10 +00:00
Roman Lebedev 079d8b425f [NFC][Thumb2] Autogenerate thumb2-ldr_pre.ll test
llvm-svn: 361318
2019-05-21 21:49:05 +00:00
Nikita Popov d34d96770e [X86] Add large integer comparison tests for PR41971; NFC
In these cases we would prefer a direct comparison over going through
a vector type.

llvm-svn: 361315
2019-05-21 21:27:08 +00:00
Yi-Hong Lyu 00e85f7535 Move csr-save-restore-order.ll to the right place
llvm-svn: 361306
2019-05-21 20:28:31 +00:00
Roman Lebedev a7e88f8570 [NFC][X86][AArch64] Add tests for sinking of add/sub by constant through add/sub
Looks we can transform all 8 variants of the pattern:
https://rise4fun.com/Alive/auH

This comes up as an issue on the path towards
https://bugs.llvm.org/show_bug.cgi?id=41952

llvm-svn: 361303
2019-05-21 20:14:54 +00:00
Stanislav Mekhanoshin 44d17ca02e Fix register coalescer failure to prune value
Register coalescer fails for the test in the patch with the assertion in
JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join
live intervals for two adjacent instructions and erase the copy:

    %2:vreg_256 = COPY %1
    %3:vreg_256 = COPY killed %1

The LI needs to be adjusted to kill subrange for the erased instruction
and extend the subrange of the original def. That was done for the main
interval only but not for the subrange. As a result subrange had a VNI
pointing to the erased slot resulting in the above failure.

Differential Revision: https://reviews.llvm.org/D62162

llvm-svn: 361293
2019-05-21 19:32:41 +00:00
Leonard Chan 0bada7ce6c [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D55720

llvm-svn: 361289
2019-05-21 19:17:19 +00:00
Simon Pilgrim 4b82e50315 [X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support
Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692

llvm-svn: 361270
2019-05-21 15:20:24 +00:00
Roman Lebedev d8db224ecb [NFC][X86][AArch64] Shift amount masking: tests that show that 'neg' doesn't last
Meaning if we were to produce 'neg' in dagcombine, we will get an
endless cycle; some inverse transform would need to be guarded somehow.

Also, the 'and (sub 0, x), 31' variant is sticky,
doesn't get optimized in any way.

https://bugs.llvm.org/show_bug.cgi?id=41952

llvm-svn: 361254
2019-05-21 13:04:56 +00:00
Simon Pilgrim bc03bee66b [X86][SSE] Add shuffle tests for 'splat3' patterns.
Test codegen from shuffles for { dst[0] = dst[1] = dst[2] = *src++; dst += 3 } 'splatting' memcpy patterns generated by loop-vectorizer.

llvm-svn: 361243
2019-05-21 11:42:28 +00:00
Roman Lebedev 2aee73f591 [NFC][X86][AArch64] Add some more tests for shift amount masking
The negation creation should be more eager:
https://bugs.llvm.org/show_bug.cgi?id=41952

llvm-svn: 361241
2019-05-21 11:14:01 +00:00
Florian Hahn 4a8835c655 [AArch64] Skip mask checks for masks with an odd number of elements.
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.

Fixes PR41951

Reviewers: t.p.northover, dmgreen, samparker

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D60690

llvm-svn: 361235
2019-05-21 10:05:26 +00:00
Sam Parker 3141bbd52d [ARM][CGP] Skip nuw in PrepareConstants
PrepareConstants step converts add/sub with 'negative' immediates to
sub/add with a 'positive' imm to make promotion more simple. nuw
already states that the add shouldn't cause an unsigned wrap, so
it shouldn't need any tweaking. Plus, we also don't allow a sub with
a 'negative' immediate to be safe wrap, so this functionality has
been removed. The PrepareConstants step now just handles the add
instructions that we've determined would be safe if they wrap around
zero.

Differential Revision: https://reviews.llvm.org/D62057

llvm-svn: 361227
2019-05-21 07:56:47 +00:00
Dylan McKay e967308da4 Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess
Summary:
The endianess used in the calling convention does not always match the
endianess of the target on all architectures, namely AVR.

When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianess that function arguments must be laid
out in.

This approach was recommended by Eli Friedman.

Originally reported in https://github.com/avr-rust/rust/issues/129.

Patch by Carl Peto.

Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma

Reviewed By: efriedma

Subscribers: JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62003

llvm-svn: 361222
2019-05-21 06:38:02 +00:00
QingShan Zhang 690fa1b51b [NFC][PowerPC] Add a test to verify if the scheduler schedule the addi before the load.
llvm-svn: 361221
2019-05-21 06:32:31 +00:00
Nikita Popov e44691bf9f Move thumbv7k test from AArch64 to ARM
As pointed out by charukcs on rL361166, this test uses an ARM triple.

llvm-svn: 361220
2019-05-21 06:24:36 +00:00
Chen Zheng e64bcada5f [PowerPC] test cases for selecting x-form instruction for unaligned offset - NFC
llvm-svn: 361219
2019-05-21 05:06:09 +00:00
Matt Arsenault 6dd08e335f AMDGPU: Force skip branches over calls
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.

Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.

llvm-svn: 361202
2019-05-20 22:04:42 +00:00
Martin Storsjo 4ed18e5ef5 [AArch64] Handle lowering lround on windows, where long is 32 bit
Differential Revision: https://reviews.llvm.org/D62108

llvm-svn: 361192
2019-05-20 19:53:28 +00:00
Craig Topper e97e52757c [X86] Add test case for r361177.
That commit makes sure we flush PendingExports in SelectDAGBuilder
before we create INLINEASM_BR. Unfortunatley, I haven't yet found
a CodeGen failure without that change.

This commit uses the debug output from SelectionDAG to at least
ensure we build the DAG correctly.

llvm-svn: 361179
2019-05-20 17:37:52 +00:00
Nikita Popov 9060b6df97 [SDAG] Vector op legalization for overflow ops
Fixes issue reported by aemerson on D57348. Vector op legalization
support is added for uaddo, usubo, saddo and ssubo (umulo and smulo
were already supported). As usual, by extracting TargetLowering methods
and calling them from vector op legalization.

Vector op legalization doesn't really deal with multiple result nodes,
so I'm explicitly performing a recursive legalization call on the
result value that is not being legalized.

There are some existing test changes because expansion happens
earlier, so we don't get a DAG combiner run in between anymore.

Differential Revision: https://reviews.llvm.org/D61692

llvm-svn: 361166
2019-05-20 16:09:22 +00:00
Matt Arsenault 7c8ec18964 RegAlloc: Fix verifier error with undef identity copies
The code did not match the example in the comment, and was checking
the undef flag on the copy dest instead of source. The existing tests
were only hitting the > 2 operands case.

llvm-svn: 361156
2019-05-20 14:09:36 +00:00
Sander de Smalen f83cccf917 Match types of accumulator and result for llvm.experimental.vector.reduce.fadd/fmul
The scalar start/accumulator value of the fadd- and fmul reduction
should match the result type of the reduction, as well as the vector
element-type of the input vector. Although this was not explicitly
specified in the LangRef, it was taken for granted in code implementing
the reductions. The patch also fixes the LangRef by adding this
constraint.

Reviewed By: aemerson, nikic

Differential Revision: https://reviews.llvm.org/D60260

llvm-svn: 361133
2019-05-20 09:54:06 +00:00
Carl Ritson 34e95ce259 [AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values
Summary:
Avoid introducing hazard mitigation when lgkmcnt is reduced to 0.
Clarify code comments to explain assumptions made for this hazard
mitigation.  Expand and correct test cases to cover variants of
s_waitcnt.

Reviewers: nhaehnle, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62058

llvm-svn: 361124
2019-05-20 07:20:12 +00:00
Roman Lebedev 1a5d623ded [NFC][AArch64] Autogenerate fcopysign.ll test
llvm-svn: 361106
2019-05-18 20:24:40 +00:00
Roman Lebedev 13ac317e4c [NFC][AArch64] Autogenerate bitfield-insert.ll, selectcc-to-shiftand.ll tests
Investigating bit-extract (ubfx) pattern with shifted mask.

llvm-svn: 361105
2019-05-18 17:42:06 +00:00
Roman Lebedev d1be3c446e [NFC][AArch64] Add some ubfx tests with immediates
Shows the regression in D62100

llvm-svn: 361102
2019-05-18 13:49:44 +00:00
Roman Lebedev 98092f37d0 UpdateTestChecks: fix AMDGPU handling
Summary:
Was looking into supporting `(srl (shl x, c1), c2)` with c1 != c2 in dagcombiner,
this test changes, but makes `update_llc_test_checks.py` unhappy.

**Many** AMDGPU tests specify `-march`, not `-mtriple`, which results in `update_llc_test_checks.py`
defaulting to x86 asm function detection heuristics, which don't work here.
I propose to fix this by adding an infrastructure to map from `-march` to `-mtriple`,
in the UpdateTestChecks tooling.

Reviewers: RKSimon, MaskRay, arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62099

llvm-svn: 361101
2019-05-18 13:00:03 +00:00
Roman Lebedev 822b9c971b UpdateTestChecks: arm64-eabi handlind
Summary:
Was looking into supporting `(srl (shl x, c1), c2)` with c1 != c2 in dagcombiner,
this test changes, but makes `update_llc_test_checks.py` unhappy

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62097

llvm-svn: 361100
2019-05-18 12:59:56 +00:00
Matt Arsenault 2f29220d6d AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP
llvm-svn: 361082
2019-05-17 23:05:18 +00:00
Matt Arsenault 02b5ca8cd1 GlobalISel: Implement lower for S64->S32 [SU]ITOFP
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.

This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.

llvm-svn: 361081
2019-05-17 23:05:13 +00:00
Matt Arsenault f3cedf4823 GlobalISel: Define integer min/max instructions
Doesn't attempt to emit them for anything yet, but some legalizations
I want to port use them.

llvm-svn: 361061
2019-05-17 18:36:31 +00:00
Simon Pilgrim 065431c82b [X86][SSE] Fold movmsk(not(x)) -> not(movmsk)
Helps to improve folding of comparisons with movmsk results.

llvm-svn: 361056
2019-05-17 17:56:25 +00:00
Simon Pilgrim 2c2f8e74b9 [X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp.
Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns. 

llvm-svn: 361052
2019-05-17 17:25:55 +00:00
Roman Lebedev 64c756b991 [DAGCombiner] visitShiftByConstant(): drop bogus signbit check
Summary:
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
   https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
   https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
   https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
   https://rise4fun.com/Alive/ml6

So, why is it there then?

This changes the testcase that was touched by @spatel in rL347478,
but i'm not sure that test tests anything particular?

Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin

Reviewed By: spatel

Subscribers: javed.absar, llvm-commits, spatel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61918

llvm-svn: 361044
2019-05-17 15:52:58 +00:00
Simon Pilgrim 62c7032c18 [X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold.
Prep work for the removal of the remaining x86 CTTZ vector lowering.

llvm-svn: 361035
2019-05-17 14:04:56 +00:00
Matt Arsenault a510b570c2 AMDGPU/GlobalISel: Legalize G_FCEIL
llvm-svn: 361028
2019-05-17 12:20:05 +00:00