Commit Graph

24469 Commits

Author SHA1 Message Date
Simon Pilgrim be9a206883 [X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classes
BtVer2 - Fixes schedules for (V)CVTPS2PD instructions

A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first

llvm-svn: 332376
2018-05-15 17:36:49 +00:00
Geoff Berry 32d07d59f5 [AArch64] Fix mir test case liveins info.
The test case added in r332265 had incomplete livein information which
was caught by the EXPENSIVE_CHECKS bot.  Fix the livein information and
add -verify-machineinstrs to the test case.

llvm-svn: 332367
2018-05-15 16:27:34 +00:00
Krzysztof Parzyszek 8c389bd368 [Hexagon] Remove unused flag from subtarget and (non)corresponding test
llvm-svn: 332365
2018-05-15 16:13:52 +00:00
Simon Dardis f40eb03ce9 [mips] Mark select instructions correctly
Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D46702

llvm-svn: 332364
2018-05-15 16:05:04 +00:00
Sanjay Patel 8652c53d29 [DAG] propagate FMF for all FPMathOperators
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. 
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It 
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.

It should provide a superset of the functionality from D46563 - the extra tests show propagation and 
codegen diffs for fcmp, vecreduce, and FP libcalls.

The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last 
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, 
so that shouldn't make any difference.

Differential Revision: https://reviews.llvm.org/D46854

llvm-svn: 332358
2018-05-15 14:16:24 +00:00
Simon Pilgrim 891ebcdbaa [X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency

Allows us to remove the WriteCvtF2FSt conversion store class

llvm-svn: 332357
2018-05-15 14:12:32 +00:00
Artur Gainullin 243a3d56d8 [X86] Improve unsigned saturation downconvert detection.
Summary:
New unsigned saturation downconvert patterns detection was implemented in
X86 Codegen:

(truncate (smin (smax (x, C1), C2)) to dest_type),
where C1 >= 0 and C2 is unsigned max of destination type.

(truncate (smax (smin (x, C2), C1)) to dest_type)
where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2.
These two patterns are equivalent to:

(truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type)

Reviewers: RKSimon

Subscribers: llvm-commits, a.elovikov

Differential Revision: https://reviews.llvm.org/D45315

llvm-svn: 332336
2018-05-15 10:24:12 +00:00
Craig Topper fadf8b8dec [X86] Add fast isel tests for some of the avx512 truncate intrinsics to match current clang codegen.
llvm-svn: 332326
2018-05-15 04:26:27 +00:00
Sanjay Patel 165587b424 [AArch64] enhance test to show FMF loss; NFC
llvm-svn: 332301
2018-05-14 21:53:21 +00:00
Martin Storsjo ace7ae935f [ARM] Back up R4 and LR if calling the stack probe function
Differential Revision: https://reviews.llvm.org/D46777

llvm-svn: 332298
2018-05-14 21:32:52 +00:00
Sanjay Patel 4c8a67a229 [PowerPC] add more tests for FMF propagation; NFC
llvm-svn: 332295
2018-05-14 21:17:49 +00:00
Krzysztof Parzyszek 771f2422d0 [Hexagon] Add a target feature for memop generation
llvm-svn: 332285
2018-05-14 20:09:07 +00:00
Sid Manning d9f2873511 Hexagon: Put relocations after instructions not packets.
Change relocation output so that relocation information follows
individual instructions rather than clustering them at the end
of packets.

This change required shifting block of code but the actual change
is in HexagonPrettyPrinter's PrintInst.

Differential Revision: https://reviews.llvm.org/D46728

llvm-svn: 332283
2018-05-14 19:46:08 +00:00
Craig Topper 53ceb4805f [X86] Remove and autoupgrade avx512.vbroadcast.ss/avx512.vbroadcast.sd intrinsics.
llvm-svn: 332271
2018-05-14 18:21:22 +00:00
Simon Pilgrim 228d24a2d6 [X86][BtVer2] Fix MMX/YMM integer vector nt store schedules
MMX was missing and YMM was tagged as a fp nt store

llvm-svn: 332269
2018-05-14 18:07:28 +00:00
Geoff Berry 64a2ea41ea [BranchFolding] Allow hoisting to block with a single conditional branch.
Summary:
The BranchFolding pass is currently missing opportunities to hoist
common code if the hoisted-to block contains a single conditional branch
that has register uses.  This occurs somewhat frequently on AArch64 with
CBZ/TBZ opcodes.

This change also eliminates some code differences when debug info is
present since the presence of e.g. DBG_VALUE instructions in the
hoisted-to block can enable hoisting that wouldn't have occurred without
them.

Reviewers: MatzeB, rnk, kparzysz, twoh, aprantl, javed.absar

Subscribers: kristof.beyls, JDevlieghere, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46324

llvm-svn: 332265
2018-05-14 17:31:18 +00:00
Krzysztof Parzyszek 329c3e9a5f [Hexagon] Avoid predicate copies to integer registers from store-locked
llvm-svn: 332260
2018-05-14 16:41:40 +00:00
Evandro Menezes 14fa2e4fa5 [AArch64] Improve single vector lane stores
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.

Differential revision: https://reviews.llvm.org/D46655

llvm-svn: 332251
2018-05-14 15:26:35 +00:00
Craig Topper f633f3eb67 [X86] Add fast isel test cases for the clang output for 512-bit cvtps2pd related intrinsics.
llvm-svn: 332214
2018-05-14 05:09:41 +00:00
Craig Topper 0e71c6d5ca [X86] Remove and autoupgrade the cvtusi2sd intrinsic. Use uitofp+insertelement instead.
llvm-svn: 332206
2018-05-14 00:06:49 +00:00
Craig Topper 97e74b05ef [X86] Add patterns for combining movss+uint_to_fp into the intrinsic instructions under AVX512.
This matches what we do for sint_to_fp.

llvm-svn: 332205
2018-05-13 23:24:21 +00:00
Craig Topper 12067185d4 [X86] Add fast-isel test cases for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss.
llvm-svn: 332204
2018-05-13 23:24:19 +00:00
Craig Topper 85906cf041 [X86] Remove and autoupgrade masked vpermd/vpermps intrinsics.
llvm-svn: 332198
2018-05-13 18:03:59 +00:00
Dimitry Andric a39c409619 Follow-up to rL332176 by adding a test case for PR37264.
Noticed by Simon Pilgrim.

llvm-svn: 332197
2018-05-13 14:32:23 +00:00
Matt Arsenault dfb88dfe30 AMDGPU: Make undef legal for v2i16/v2f16
This is apparently necessary to stop undef from being
turned into a build_vector of 0s.

llvm-svn: 332195
2018-05-13 10:04:38 +00:00
Puyan Lotfi 380a6f55ff [NFC] MIR-Canon: switching to a stable string sorting of instructions.
llvm-svn: 332191
2018-05-13 06:07:20 +00:00
Craig Topper 38b713d4a7 [X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions.
llvm-svn: 332189
2018-05-13 01:54:33 +00:00
Craig Topper 28b85caea8 [X86] Remove some unused CHECK lines from tests.
llvm-svn: 332188
2018-05-13 00:58:23 +00:00
Craig Topper df3a9cedff [X86] Remove an autoupgrade legacy cvtss2sd intrinsics.
llvm-svn: 332187
2018-05-13 00:29:40 +00:00
Craig Topper 38ad7ddabc [X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time.
llvm-svn: 332186
2018-05-12 23:14:39 +00:00
Craig Topper a288f241cd [X86] Remove some unused masked conversion intrinsics that can be replaced with an older intrinsic and a select.
This is what clang already uses.

llvm-svn: 332170
2018-05-12 02:34:28 +00:00
Stanislav Mekhanoshin 7012c246c1 [AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler
We cannot query this attribute from a subtarget given a machine function.
At this point attribute itself is already unavailable and can only be
obtained through MFI.

Differential Revision: https://reviews.llvm.org/D46781

llvm-svn: 332166
2018-05-12 01:41:56 +00:00
Tom Stellard 655fdd3f82 AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46153

llvm-svn: 332154
2018-05-11 23:12:49 +00:00
Changpeng Fang f094885a9e AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction.
Summary:
  We have no logic to promote alloca to vector for an AddrSpaceCast instruction.

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D45993

llvm-svn: 332147
2018-05-11 22:17:57 +00:00
Craig Topper a17d627abb [X86] Remove and autoupgrade a bunch of FMA instrinsics that are no longer used by clang.
llvm-svn: 332146
2018-05-11 21:59:34 +00:00
Yaxun Liu deba150c27 [AMDGPU] Fix compilation failure when IR contains comdat
Remove a useless SwitchSection which also causes compilation failure
when IR contains comdat.

The SwitchSection is useless because the current section is already
correct text section for the function therefore no need to switch.

It causes compilation failure for comdat because functions with comdat
has specific text section, not the default .text section.

Since HIP uses comdat, this bug caused failures for HIP.

Differential Revision: https://reviews.llvm.org/D46770

llvm-svn: 332137
2018-05-11 20:40:14 +00:00
Vedant Kumar 99d5c072f0 [DAGCombiner] Set the right SDLoc on extended SETCC uses (7/N)
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).

Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.

ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.

This was suggested by Adrian in https://reviews.llvm.org/D45995.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46216

llvm-svn: 332119
2018-05-11 18:40:10 +00:00
Vedant Kumar fd340a4047 [DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N)
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.

The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46158

llvm-svn: 332118
2018-05-11 18:40:08 +00:00
Simon Pilgrim 661ae7778d [X86][BtVer2] Model ymm move as double pumped instructions
We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions

llvm-svn: 332109
2018-05-11 17:38:36 +00:00
Alex Bradbury bca0c3cdb6 [RISCV] Support .option rvc and norvc assembler directives
These directives allow the 'C' (compressed) extension to be enabled/disabled 
within a single file.

Differential Revision: https://reviews.llvm.org/D45864
Patch by Kito Cheng

llvm-svn: 332107
2018-05-11 17:30:28 +00:00
Geoff Berry 60460268c0 [AArch64] Fix performPostLD1Combine to check for constant lane index.
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction.  The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:

- an assert when lowering the LD1LANE ISD node since it assumes an
  constant operand

- an assert in isel if the lane index value depends on the
  post-incremented base register

Both of these issues are avoided by simply checking that the lane index
is a constant.

Fixes bug 35822.

Reviewers: t.p.northover, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46591

llvm-svn: 332103
2018-05-11 16:25:06 +00:00
Sanjoy Das 82105e2a7d Use iteration instead of recursion in CFIInserter
Summary: This recursive step can overflow the stack.

Reviewers: djokov, petarj

Subscribers: mcrosier, jlebar, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D46671

llvm-svn: 332101
2018-05-11 15:54:46 +00:00
Simon Pilgrim 22dd72b995 [X86] Split WriteF/WriteVec Move/Load/Store scheduler classes by vector width
Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions

llvm-svn: 332094
2018-05-11 14:30:54 +00:00
Tom Stellard dcc95e9385 AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45883

llvm-svn: 332082
2018-05-11 05:44:16 +00:00
Craig Topper 1ee19ae126 [X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958.
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.

So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.

We may be able to drop some of the old patterns, but I leave that for a future patch.

llvm-svn: 332049
2018-05-10 21:49:16 +00:00
Tom Stellard 1e0edad4bb AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16>
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45881

llvm-svn: 332042
2018-05-10 21:20:10 +00:00
Tom Stellard 1dc90204bf AMDGPU/GlobalISel: Enable TableGen'd instruction selector
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45994

llvm-svn: 332039
2018-05-10 20:53:06 +00:00
Sam Clegg a5908009cd [WebAsembly] Update default triple in test files to wasm32-unknown-unkown.
Summary: The final -wasm component has been the default for some time now.

Subscribers: jfb, dschuff, jgravelle-google, eraman, aheejin, JDevlieghere, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D46342

llvm-svn: 332007
2018-05-10 17:49:11 +00:00
Simon Pilgrim 38ac0e9c6b [X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes
Split off XMM classes from the default (MMX) classes.

llvm-svn: 331999
2018-05-10 17:06:09 +00:00
Sanjay Patel b4e7893ba8 [x86] fix fmaxnum/fminnum with nnan
With nnan, there's no need for the masked merge / blend
sequence (that probably costs much more than the min/max
instruction).

Somewhere between clang 5.0 and 6.0, we started producing
these intrinsics for fmax()/fmin() in C source instead of
libcalls or fcmp/select. The backend wasn't prepared for
that, so we regressed perf in those cases.

Note: it's possible that other targets have similar problems
as seen here. 

Noticed while investigating PR37403 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=37403

The IR FMF propagation cases still don't work. There's
a proposal that might fix those cases in D46563.

llvm-svn: 331992
2018-05-10 15:40:49 +00:00