Commit Graph

50431 Commits

Author SHA1 Message Date
Matt Arsenault 299302fbe7 AMDGPU/GlobalISel: InstrMapping for G_UNMERGE_VALUES
llvm-svn: 350588
2019-01-08 00:46:19 +00:00
Craig Topper 486313b5f7 Recommit r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics."
The MSVC limit we hit on AutoUpgrade.cpp has been worked around for now.

llvm-svn: 350567
2019-01-07 21:00:32 +00:00
Craig Topper fad1589f39 Revert r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics."
The AutoUpgrade.cpp if/else cascade hit an MSVC limit again.

llvm-svn: 350562
2019-01-07 19:39:05 +00:00
Craig Topper 826f44b550 [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24.
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.

Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.

This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.

Differential Revision: https://reviews.llvm.org/D56087

llvm-svn: 350560
2019-01-07 19:30:43 +00:00
Craig Topper 9c4f7e9147 [X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics.
Differential Revision: https://reviews.llvm.org/D56377

llvm-svn: 350554
2019-01-07 19:10:12 +00:00
Diogo N. Sampaio f192cdb5c9 [ARM] ComputeKnownBits to handle extract vectors
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.

Differential revision: https://reviews.llvm.org/D56098

llvm-svn: 350553
2019-01-07 19:01:47 +00:00
Rhys Perry f77e2e8406 AMDGPU: test for uniformity of branch instruction, not its condition
Summary:
If a divergent branch instruction is marked as divergent by propagation
rule 2 in DivergencePropagator::exploreSyncDependency() and its condition
is uniform, that branch would incorrectly be assumed to be uniform.

Reviewers: arsenm, tstellar

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D56331

llvm-svn: 350532
2019-01-07 15:52:28 +00:00
Matt Arsenault 7a27c1886f AMDGPU: Remove v16i8 from register classes
llvm-svn: 350518
2019-01-07 13:31:55 +00:00
Matt Arsenault 369acb8470 AMDGPU: Remove VS/SV mappings from select
These would violate the constant bus restriction

llvm-svn: 350517
2019-01-07 13:21:36 +00:00
Craig Topper 6ffeeb705f [X86] Add support for matching vector funnel shift to AVX512VBMI2 instructions.
Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56361

llvm-svn: 350498
2019-01-06 18:10:18 +00:00
Sanjay Patel 8f12e8f3f6 [x86] explicitly set cost of integer add/sub
There are no test changes here in the existing cost model
regression tests because integer add/sub have a default
legal cost of 1 already. This would break, however, if
we custom lower those ops because the default cost model
assumes that custom-lowered ops are more expensive.

This is similar to the change in rL350403. See discussion
in D56011 for more details. When we enhance that patch to
handle integer ops, we need this cost model change to avoid
unintended diffs here from the custom lowering.

llvm-svn: 350496
2019-01-06 16:21:42 +00:00
Craig Topper 1187991bcf [X86][AsmParser] Don't allow X86::DX in CheckBaseRegAndIndexRegAndScale.
This was here because out and in instructions allow '(%dx)' even though its not a memory reference. To handle this we build a special operand for the DX register reference before we get to the call to CheckBaseRegAndIndexRegAndScale. So we no longer need this special case.

llvm-svn: 350483
2019-01-05 23:30:28 +00:00
Craig Topper d0ba531a0c [X86] Use two pmovmskbs in combineBitcastvxi1 for (i64 (bitcast (v64i1 (truncate (v64i8)))) on KNL.
llvm-svn: 350481
2019-01-05 22:42:58 +00:00
Craig Topper 46f8b4a11e [X86] Allow combinevxi1Bitcast to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8.
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.

llvm-svn: 350480
2019-01-05 21:40:07 +00:00
Craig Topper 3f48dbf72e [X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width-256 is in effect and BWI is not available.
llvm-svn: 350473
2019-01-05 18:48:11 +00:00
Craig Topper 45ec002e25 [X86] Require second operand of X86vshiftuniform to be an integer. NFC
We don't need to require the first operand to be an integer because we already said it was the same type as the result which we also constrained to an integer.

llvm-svn: 350455
2019-01-05 01:40:29 +00:00
Nikita Popov c35b4a37ba [X86] Fix warning; NFC
llvm-svn: 350437
2019-01-04 21:41:35 +00:00
Vyacheslav Zakharin 0a6f86c54b Update the pr_datasz of .note.gnu.property section.
Patch by Xiang Zhang.

Differential Revision: https://reviews.llvm.org/D56080

llvm-svn: 350436
2019-01-04 21:25:01 +00:00
Evandro Menezes 9f53bea536 [AArch64] Adjust the cost model for Exynos M3
Improve the modeling of ASIMD loads and stores.

llvm-svn: 350434
2019-01-04 21:02:25 +00:00
Sanjay Patel 6153565511 [x86] lower extracted fadd/fsub to horizontal vector math; 2nd try
The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.

Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.

We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.

We can extend this to integer ops in a follow-up patch.

Differential Revision: https://reviews.llvm.org/D56011

llvm-svn: 350421
2019-01-04 17:48:13 +00:00
Nirav Dave 1468d6e1c5 Undo r350355 "[X86] Remove terrible DX Register parsing hack in parse operand. NFCI."
Add missing test case and update comments.

llvm-svn: 350406
2019-01-04 17:11:15 +00:00
Simon Pilgrim c2054144ee [CostModel][X86] Fix SSE1 FADD/FSUB costs
Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4

Add the other costs so that we're not relying on the default "is legal/custom" cost logic.

llvm-svn: 350403
2019-01-04 16:55:57 +00:00
Simon Pilgrim 9f4dea8c06 [X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine
Repeat of the generic SimplifyDemandedBits shift combine

llvm-svn: 350399
2019-01-04 15:43:43 +00:00
Richard Trieu e1fef949ae [WebAssembly] Split the checking from the sorting logic.
Move the check for -1 and identical values outside the vector sorting code.
Compare functions need to be able to compare identical elements to be
conforming.

llvm-svn: 350379
2019-01-04 06:49:24 +00:00
Craig Topper 6265a15f2e [X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used.
Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.

Differential Revision: https://reviews.llvm.org/D56246

llvm-svn: 350374
2019-01-04 00:10:58 +00:00
Sanjay Patel 26ce9c38a7 revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math
There are non-codegen tests that need to be updated with this code change.

llvm-svn: 350373
2019-01-04 00:02:02 +00:00
Sanjay Patel ef4afca2ad [x86] lower extracted fadd/fsub to horizontal vector math
This would show up if we fix horizontal reductions to narrow as they go along, 
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.

We need to do this late to not interfere with other pattern matching of larger 
horizontal sequences.

We can extend this to integer ops in a follow-up patch.

Differential Revision: https://reviews.llvm.org/D56011

llvm-svn: 350369
2019-01-03 23:16:19 +00:00
Heejin Ahn 777d01c756 [WebAssembly] Optimize Irreducible Control Flow
Summary:
Irreducible control flow is not that rare, e.g. it happens in malloc and
3 other places in the libc portions linked in to a hello world program.
This patch improves how we handle that code: it emits a br_table to
dispatch to only the minimal necessary number of blocks. This reduces
the size of malloc by 33%, and makes it comparable in size to asm2wasm's
malloc output.

Added some tests, and verified this passes the emscripten-wasm tests run
on the waterfall (binaryen2, wasmobj2, other).

Reviewers: aheejin, sunfish

Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits

Differential Revision: https://reviews.llvm.org/D55467

Patch by Alon Zakai (kripken)

llvm-svn: 350367
2019-01-03 23:10:11 +00:00
Wouter van Oortmerssen 820c6263d9 [WebAssembly] Fixed disassembler not knowing about new brlist operand
Summary:
The previously introduced new operand type for br_table didn't have
a disassembler implementation, causing an assert.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56227

llvm-svn: 350366
2019-01-03 23:01:30 +00:00
Wouter van Oortmerssen 9843295608 [WebAssembly] Made InstPrinter more robust
Summary:
Instead of asserting on certain kinds of malformed instructions, it
now still print, but instead adds an annotation indicating the
problem, and/or indicates invalid_type etc.

We're using the InstPrinter from many contexts that can't always
guarantee values are within range (e.g. the disassembler), where having
output is more valueable than asserting.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56223

llvm-svn: 350365
2019-01-03 22:59:59 +00:00
Nirav Dave 8de916d1a4 [X86] Remove terrible DX Register parsing hack in parse operand. NFCI.
Fold hack special casing of (%dx) operand parsing into the related
hack for out*/in* instruction parsing.

llvm-svn: 350355
2019-01-03 21:46:30 +00:00
Sanjay Patel 9633d76a40 [DAGCombiner][x86] scalarize binop followed by extractelement
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:

// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)

We want to have this in the DAG too because as we can see in some of the test diffs (reductions), 
the pattern may not be visible in IR.

Given that this is already an IR canonicalization, any backend that would prefer a vector op over 
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.

Differential Revision: https://reviews.llvm.org/D55722

llvm-svn: 350354
2019-01-03 21:31:16 +00:00
Alexander Timofeev 993e2798fd [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Detailed description: SIFoldOperands::foldInstOperand iterates over the
operand uses calling the function that changes def-use iteratorson the
way. As a result loop exits immediately when def-use iterator is
changed. Hence, the operand is folded to the very first use instruction
only. This makes VGPR live along the whole basic block and increases
register pressure significantly. The performance drop observed in SHOC
DeviceMemory test is caused by this bug.

Proposed fix: collect uses to separate container for further processing
in another loop.

Testing: make check-llvm
SHOC performance test.

Reviewers: rampitec, ronlieb

Differential Revision: https://reviews.llvm.org/D56161

llvm-svn: 350350
2019-01-03 19:55:32 +00:00
Evandro Menezes 0f67746c92 [AArch64] Add new scheduling predicates
Add new scheduling predicates to identify the ASIMD loads and stores using the post indexed addressing mode.

llvm-svn: 350332
2019-01-03 17:28:09 +00:00
Alex Bradbury 2ba76be882 [RISCV][MC] Accept %lo and %pcrel_lo on operands to li
This matches GNU assembler behaviour.

llvm-svn: 350321
2019-01-03 14:41:41 +00:00
Diogo N. Sampaio 8786a946d8 [ARM] Add command-line option for SB
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.

This patch also renames FeatureSpecRestrict to FeatureSB.

Reviewed By: olista01, LukeCheeseman

Differential Revision: https://reviews.llvm.org/D55990

llvm-svn: 350299
2019-01-03 12:09:12 +00:00
Simon Pilgrim d824f99a6c [X86] Add ADD/SUB SSAT/USAT vector costs (PR40123)
Costs for real SSE2 instructions

llvm-svn: 350295
2019-01-03 11:38:42 +00:00
Piotr Sobczak 3abef8f9ea [AMDGPU] Change section name with metadata access
Summary:
The commit rL348922 introduced a means to set Metadata
section kind for a global variable, if its explicit section
name was prefixed with ".AMDGPU.metadata.".

This patch changes that prefix to ".AMDGPU.comment.",
as "metadata" in the section name might lead to
ambiguity with metadata used by AMD PAL runtime.

Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668

Reviewers: kzhuravl

Reviewed By: kzhuravl

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D56197

llvm-svn: 350292
2019-01-03 11:22:58 +00:00
QingShan Zhang f24ec7bdd0 [Power9] Enable the Out-of-Order scheduling model for P9 hw
When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI Scheduler algorithm, we still use the in-order scheduling model
as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer
the op. So, only when all the available instructions issued, the pending instruction
could be scheduled. That is not true for our P9 hw in fact.

This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is
picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017.

With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows:

x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%

And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved. 

Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810

llvm-svn: 350285
2019-01-03 05:04:18 +00:00
Robert Widmann 7882b283cd [LLVM-C] Expand LLVMRelocMode
Summary: Add read[only|write] PIC relocation models to the C API and teach the TargetMachine API about it.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56187

llvm-svn: 350279
2019-01-03 00:33:44 +00:00
Craig Topper df5304d8de [X86] Add load folding support to the custom isel we do for X86ISD::UMUL/SMUL.
The peephole pass isn't always able to fold the load because it can't commute the implicit usage of AL/AX/EAX/RAX.

llvm-svn: 350272
2019-01-02 23:24:08 +00:00
Wouter van Oortmerssen ad72f68501 [WebAssembly] made assembler parse block_type
Summary:
This was previously ignored and an incorrect value generated.

Also fixed Disassembler's handling of block_type.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56092

llvm-svn: 350270
2019-01-02 23:23:51 +00:00
Craig Topper 9d4860ec4e [X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.

This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.

I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.

This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.

Differential Revision: https://reviews.llvm.org/D55975

llvm-svn: 350245
2019-01-02 19:01:05 +00:00
Wei Mi ecc89b76cb [PowerPC] Remove SeenUse check when optimizing conditional branch in
PPCPreEmitPeephole pass.

PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.

When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.

The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.

Differential Revision: https://reviews.llvm.org/D56041

llvm-svn: 350223
2019-01-02 17:07:23 +00:00
Simon Pilgrim d8125726d5 [X86] Support SHLD/SHRD masked shift-counts (PR34641)
Peek through shift modulo masks while matching double shift patterns.

I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now.

Differential Revision: https://reviews.llvm.org/D56199

llvm-svn: 350222
2019-01-02 17:05:37 +00:00
Piotr Sobczak 378131bae0 [AMDGPU] Handle OR as operand of raw load/store
Summary:
Use isBaseWithConstantOffset() which handles OR as an operand
to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store.

Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a

Reviewers: nhaehnle, mareko, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55999

llvm-svn: 350208
2019-01-02 09:47:41 +00:00
Craig Topper f7cc7e3201 [X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL.
All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.

llvm-svn: 350206
2019-01-02 06:40:11 +00:00
Craig Topper d4db122483 [X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO.
These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code.

Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll

llvm-svn: 350205
2019-01-02 05:46:03 +00:00
Craig Topper 00b390a000 [X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code.
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.

This is inspired by the helper functions used by ARM and AArch64 for the same purpose.

The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.

llvm-svn: 350198
2019-01-01 19:34:11 +00:00
Sanjay Patel 738a863648 [x86] move/rename helper for horizontal op codegen; NFC
Preliminary commit as suggested in D56011.

llvm-svn: 350193
2019-01-01 16:08:36 +00:00
Craig Topper bb0873cf46 [X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode.
Differential Revision: https://reviews.llvm.org/D56169

llvm-svn: 350178
2018-12-31 19:09:27 +00:00
Simon Pilgrim f2b9d10477 Keep tablegen commands in alphabetical order. NFCI.
Mentioned on D56167.

llvm-svn: 350176
2018-12-31 14:51:53 +00:00
Martin Storsjo 74d93f9b24 [AArch64] Accept "sve" as arch feature in assembler
Differential Revision: https://reviews.llvm.org/D56128

llvm-svn: 350174
2018-12-31 10:22:04 +00:00
Martin Storsjo 2018777836 [AArch64] Implement the .arch_extension directive
Differential Revision: https://reviews.llvm.org/D56131

llvm-svn: 350169
2018-12-30 21:06:32 +00:00
Kang Zhang 9d78c60bf4 [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will 
be reserved later, but the verifier doesn't know that).

So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know 
X2 is a reserved register.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D56148

llvm-svn: 350165
2018-12-30 15:13:51 +00:00
Craig Topper a32e353afa [X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1.
This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.

Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it.

llvm-svn: 350159
2018-12-30 03:05:07 +00:00
Craig Topper f237ce159e [X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from 16i16/v32i8 to v4i64 when v4i64 needs splitting.
This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.

We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.

llvm-svn: 350158
2018-12-30 02:30:34 +00:00
Nemanja Ivanovic 0dad994a10 [PowerPC][NFC] Macro for register set defs for the Asm Parser
We have some unfortunate code in the back end that defines a bunch of register
sets for the Asm Parser. Every time another class is needed in the parser, we
have to add another one of those definitions with explicit lists of registers.
This NFC patch simply provides macros to use to condense that code a little bit.

Differential revision: https://reviews.llvm.org/D54433

llvm-svn: 350156
2018-12-29 16:13:11 +00:00
Nemanja Ivanovic 0f7715afe1 [PowerPC] Complete the custom legalization of vector int to fp conversion
A recent patch has added custom legalization of vector conversions of
v2i16 -> v2f64. This just rounds it out for other types where the input vector
has an illegal (narrower) type than the result vector. Specifically, this will
handle the following conversions:

v2i8 -> v2f64
v4i8 -> v4f32
v4i16 -> v4f32

Differential revision: https://reviews.llvm.org/D54663

llvm-svn: 350155
2018-12-29 13:40:48 +00:00
Nemanja Ivanovic 3c7ac649ec [PowerPC] Fix CR Bit spill pseudo expansion
The current CRBIT spill pseudo-op expansion creates a KILL instruction
that kills the CRBIT and defines the enclosing CR field. However, this
paints a false picture to the register allocator that all bits in the CR
field are killed so copies of other bits out of the field become dead and
removable.
This changes the expansion to preserve the KILL flag on the CRBIT as an
implicit use and to treat the CR field as an undef input.

Thanks to Hal Finkel for the review and Uli Weigand for implementation input.

Differential revision: https://reviews.llvm.org/D55996

llvm-svn: 350153
2018-12-29 11:43:54 +00:00
Simon Atanasyan a6424e7c4e [mips] Show an error on attempt to use 64-bit PC-relative relocation
The following code requests 64-bit PC-relative relocations unsupported
by MIPS ABI. Now it triggers an assertion. It's better to show an error
message.
```
foo:
  .quad bar - foo
```

llvm-svn: 350152
2018-12-29 10:10:02 +00:00
Simon Atanasyan b243d8d42a [mips] Show a regular error message on attempt to use one byte relocation
llvm-svn: 350151
2018-12-29 10:09:55 +00:00
Heejin Ahn 4d98dfb67d [WebAssembly] Fix comments in ExplicitLocals (NFC)
llvm-svn: 350144
2018-12-29 02:42:04 +00:00
Craig Topper 0a6cec6f9f [X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization.
This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.

llvm-svn: 350141
2018-12-29 01:17:11 +00:00
Craig Topper f814d28eb3 [X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply.
Previously we emitted a multiply and some masking that was supposed to matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.

Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll

Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs.

I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization

llvm-svn: 350134
2018-12-28 19:19:39 +00:00
Diogo N. Sampaio 9123f82cc4 [AArch64] Add command-line option for SB
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.

This patch also moves to FeatureSB the old FeatureSpecRestrict.

Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman	

Differential Revision: https://reviews.llvm.org/D55921

llvm-svn: 350126
2018-12-28 17:14:58 +00:00
Hiroshi Inoue 1ea98f040e [PowerPC] handle ISD:TRUNCATE in BitPermutationSelector
This is the last one in a series of patches to support better code generation for bitfield insert.
BitPermutationSelector already support ISD::ZERO_EXTEND but not TRUNCATE.
This patch adds support for ISD:TRUNCATE in BitPermutationSelector.

For example of this test case, 
struct s64b {
  int a:4;
  int b:16;
  int c:24;
};
void bitfieldinsert64b(struct s64b *p, unsigned char v) {
  p->b = v;
}

the selection DAG loos like:

t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64
       t18: i32 = and t14, Constant:i32<-1048561>
            t4: i64,ch = CopyFromReg t0, Register:i64 %1
          t22: i64 = AssertZext t4, ValueType:ch:i8
        t23: i32 = truncate t22
      t16: i32 = shl nuw nsw t23, Constant:i32<4>
    t19: i32 = or t18, t16
  t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64

By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18.
So the generated sequences with and without this patch are

without this patch
	rlwinm 5, 5, 0, 28, 11 # corresponding to t18
	rlwimi 5, 4, 4, 20, 27
with this patch
	rlwimi 5, 4, 4, 12, 27

Differential Revision: https://reviews.llvm.org/D49076

llvm-svn: 350118
2018-12-28 08:00:39 +00:00
QingShan Zhang f2d9df61c7 [PowerPC] Remove the implicit use of the register if it is replaced by Imm
If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have.

Differential Revision: https://reviews.llvm.org/D56078

llvm-svn: 350115
2018-12-28 03:38:09 +00:00
Zi Xuan Wu 5187444345 [NFC] clang-format functions related to r350113
llvm-svn: 350114
2018-12-28 02:45:17 +00:00
Zi Xuan Wu a02a3feecf [PowerPC] Fix assert from machine verify pass that atomic pseudo expanding causes mismatched register class
For atomic value operand which less than 4 bytes need to be masked. 
And the related operation to calculate the newvalue can be done in 32 bit gprc. 
So just use gprc for mask and value calculation.

Differential Revision: https://reviews.llvm.org/D56077

llvm-svn: 350113
2018-12-28 02:12:55 +00:00
Chen Zheng 5ede950df9 [PowerPC] fix register class after converting X-FORM instruction to D-FORM instruction
Differential Revision: https://reviews.llvm.org/D55806

llvm-svn: 350111
2018-12-28 01:02:35 +00:00
Craig Topper 787ad92bf6 [X86] Remove check that avoids creating PMULDQ with illegal types. Rely on SplitOpsAndApply to legalize it.
Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2.

This seems to give some improvements in our ability to use SimplifyDemandedBits.

llvm-svn: 350084
2018-12-27 03:37:04 +00:00
Craig Topper a8f07e51f9 [X86] Factor the core code out of LowerSETCC into a helper that can create CMP/BT/PTEST/KORTEST etc. without making an X86ISD::SETCC node. NFCI
Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself.

Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC which creates an X86ISD::SETCC node we need to look through.

llvm-svn: 350082
2018-12-27 01:50:40 +00:00
Craig Topper 4f1ef9fc0f [X86] Merge getBitTestCondition into LowerAndToBT. Don't create X86ISD::SETCC node in the merged function. NFCI
Only one of the 3 callers of LowerAndToBT need the SETCC node. Two of them have to look through it to find the operands they really need. Instead create it after the one call that needs it.

LowerAndToBT now returns both the BT node and the X86 specific condition code separately.

llvm-svn: 350081
2018-12-27 01:50:38 +00:00
Wouter van Oortmerssen f227621036 [WebAssembly] Added basic support for if/else/end_if in MC layer.
Summary:
These instructions are currently unused in our backend, but for
completeness it is good to support them, so they can be used with
the assembler in hand-written code.

Tests are very basic, signature support missing much like other blocks.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55973

llvm-svn: 350079
2018-12-26 22:55:26 +00:00
Wouter van Oortmerssen 29c6ce5879 [WebAssembly] Make assembler check for proper nesting of control flow.
Summary:
It does so using a simple nesting stack, and gives clear errors upon
violation. This is unique to wasm, since most CPUs do not have
any nested constructs.

Had to add an end of file check to the general assembler for this.

Note: if/else/end instructions are not currently supported in our
tablegen defs, so these tests will be enabled in a follow-up.
They already pass the nesting check.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55797

llvm-svn: 350078
2018-12-26 22:46:18 +00:00
Heejin Ahn ce1d50f9d7 [WebAssembly] Delete an unnecessary line in RegStackify
`OneUseInst` is set outside of the loop before and `OneUse` does not
change throughout the loop, so this line is not necessary.

llvm-svn: 350076
2018-12-26 22:33:35 +00:00
Heejin Ahn 99d3946398 [WebAssembly] Fix typos in comments in RegStackify (NFC)
llvm-svn: 350075
2018-12-26 22:27:46 +00:00
Justin Lebar 49fac56ea3 [NVPTX] Allow libcalls that are defined in the current module.
The patch adds a possibility to make library calls on NVPTX.

An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.

Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.

Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.

Patch by Denys Zariaiev!

Differential Revision: https://reviews.llvm.org/D34708

llvm-svn: 350069
2018-12-26 19:12:31 +00:00
Petar Avramovic 09dff33349 [MIPS GlobalISel] Select G_SELECT
Add widen scalar for type index 1 (i1 condition) for G_SELECT.
Select G_SELECT for pointer, s32(integer) and smaller low level
types on MIPS32.

Differential Revision: https://reviews.llvm.org/D56001

llvm-svn: 350063
2018-12-25 14:42:30 +00:00
Kang Zhang d501a1e596 [PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue
Summary:
This patch is to fix the bug imported by rL341634.
In above submit , the the return type of ISD::ADDE is 
14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64), 
but in fact, the second return type of ISD::ADDE should be 
MVT::Glue not MVT::i64.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D55977

llvm-svn: 350061
2018-12-25 03:29:51 +00:00
Craig Topper 0229da8f07 [X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ.
This is an alternative to what I attempted in D56057.

GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.

I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.

Based on a patch that Simon Pilgrim sent me in email.

Fixes PR40142.

llvm-svn: 350059
2018-12-24 19:40:20 +00:00
George Burgess IV 7e12875c89 [LoopIdioms] More LocationSize::precise annotations; NFC
Both of these places reference memset-like loops. Memset is precise.

Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350044
2018-12-24 05:55:50 +00:00
Craig Topper 0adc3fe9e7 [X86] Remove unused variables left after r350041. NFC
llvm-svn: 350043
2018-12-24 05:45:45 +00:00
Craig Topper d8217b23ff [X86] Move the optimization that turns 'CMP (AND+IMM64), 0' into SRL/SHL+TEST to X86ISelDAGToDAG.
This cleans more code out of EmitTest.

llvm-svn: 350041
2018-12-24 05:27:13 +00:00
Craig Topper e8c50fc6af [X86] Remove the ANDN check from EmitTest.
Remove the TESTmr isel patterns and add another postprocessing combine for TESTrr+ANDrm->TESTmr. We already have a postprocessing combine for TESTrr+ANDrr->TESTrr. With this we can give ANDN a chance to match first. And clean it up during post processing if we ended up with just a regular AND.

This is another step towards my plan to gut EmitTest and do more flag handling during isel matching or by using optimizeCompare.

llvm-svn: 350038
2018-12-24 01:10:13 +00:00
Craig Topper 006bac6880 [X86] Return false from hasAndNotCompare if the comparision value is a constant.
We won't end up using an ANDN instruction in this case so we should generate the same code we do for pre-BMI targets.

llvm-svn: 350018
2018-12-23 05:52:55 +00:00
Craig Topper 3cc92a28ce [X86] Fix an old FIXME about folding the zero constant into the OR instruction we use for sequentially consistent fence in 32-bit mode without SSE2.
llvm-svn: 350013
2018-12-23 01:54:43 +00:00
Sanjay Patel 52c02d70e2 [x86] add load fold patterns for movddup with vzext_load
The missed load folding noticed in D55898 is visible independent of that change 
either with an adjusted IR pattern to start or with AVX2/AVX512 (where the build 
vector becomes a broadcast first; movddup is not produced until we get into isel 
via tablegen patterns).

Differential Revision: https://reviews.llvm.org/D55936

llvm-svn: 350005
2018-12-22 16:59:02 +00:00
Craig Topper 1f02ac3451 [X86] FixupLEAs, reduce number of calls to getOperand and use X86::AddrBaseReg/AddrIndexReg, etc. instead of hardcoded constants.
Makes the code a little more readable.

llvm-svn: 349983
2018-12-22 01:34:47 +00:00
Justin Lebar 7f41fe3a58 [NVPTX] Reduce stack size in NVPTXAsmPrinter::doInitialization().
NVPTXAsmPrinter::doInitialization() was creating an NVPTXSubtarget on
the stack.  This object is huge, about 80kb.  Also it's slow to create.
And it's all redundant; we have one in NVPTXTargetMachine anyway!

llvm-svn: 349982
2018-12-22 01:30:37 +00:00
Mircea Trofin 499a66ecc0 Silence warning in assert introduced in rL349973.
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56030

llvm-svn: 349975
2018-12-21 23:02:10 +00:00
Mircea Trofin b53eeb6f4c [llvm] API for encoding/decoding DWARF discriminators.
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)

The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.

Reviewers: davidxl, danielcdh, wmi, dblaikie

Reviewed By: dblaikie

Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D55681

llvm-svn: 349973
2018-12-21 22:48:50 +00:00
Craig Topper e58cd9cbc6 [X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops.
This fixes the patterns that have or/and as a root. 'and' is handled differently since thy usually have a CMP wrapped around them.

I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know.

Differential Revision: https://reviews.llvm.org/D55813

llvm-svn: 349962
2018-12-21 21:42:43 +00:00
Craig Topper 62ec024d3b [X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used.
The BEXTR instruction documents the SF bit as undefined.

The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed.

Fixes PR40060

Differential Revision: https://reviews.llvm.org/D55807

llvm-svn: 349956
2018-12-21 21:16:26 +00:00
Changpeng Fang 6f539294b5 AMDGPU: Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing.
Summary:
  Don't peel of the offset if the resulting base could possibly be negative in Indirect addressing.
This is because the M0 field is of unsigned.

This patch achieves the similar goal as https://reviews.llvm.org/D55241, but keeps the optimization
if the base is known unsigned.

Reviewers:
  arsemn

Differential Revision:
  https://reviews.llvm.org/D55568

llvm-svn: 349951
2018-12-21 20:57:34 +00:00
Sanjay Patel 80187b8a17 [x86] add movddup specialization for build vector lowering (PR37502)
This is admittedly a narrow fix for the problem:
https://bugs.llvm.org/show_bug.cgi?id=37502
...but as the XOP restriction shows, it's a maze to get this right. 
In the motivating example, note that we have movddup before SSE4.1 and 
again with AVX2. That's because insertps isn't available pre-SSE41 and 
vbroadcast is (more generally) available with AVX2 (and the splat is 
reduced to movddup via isel pattern).

Differential Revision: https://reviews.llvm.org/D55898

llvm-svn: 349937
2018-12-21 18:48:32 +00:00
Florian Hahn 8c9f865e3d [ARM] Set Defs = [CPSR] for COPY_STRUCT_BYVAL, as it clobbers CPSR.
Fixes PR35023.

Reviewers: MatzeB, t.p.northover, sunfish, qcolombet, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D55909

llvm-svn: 349935
2018-12-21 18:07:10 +00:00
Jessica Paquette 453ab1db5b [GlobalISel][AArch64] Add support for widening G_FCEIL
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.

This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.

llvm-svn: 349927
2018-12-21 17:05:26 +00:00
Evandro Menezes 96c11eceb2 [AArch64] Refactor Exynos predicate (NFC)
Change order of conditions in predicate.

llvm-svn: 349918
2018-12-21 15:51:34 +00:00
Simon Pilgrim aa53fc15b7 [XCore] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349915
2018-12-21 15:35:32 +00:00
Simon Pilgrim 7787a02b23 [Sparc] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349914
2018-12-21 15:32:36 +00:00
Simon Pilgrim 3c157d3fa3 [AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349912
2018-12-21 15:29:47 +00:00
Simon Pilgrim ca8bca2ad3 [WebAssembly] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349911
2018-12-21 15:25:37 +00:00
Simon Pilgrim d800ee4861 [ARM] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349909
2018-12-21 15:15:38 +00:00
Simon Pilgrim 148957f336 [AArch64] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349908
2018-12-21 15:05:10 +00:00
Simon Pilgrim 2482c51e99 [SystemZ] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349906
2018-12-21 14:50:54 +00:00
Simon Pilgrim d43bdc715c [Lanai] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349905
2018-12-21 14:48:35 +00:00
Simon Pilgrim af1ab22a76 [PPC] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349903
2018-12-21 14:32:39 +00:00
Simon Pilgrim 57733507fe [X86] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the old KnownBits output paramater version.

llvm-svn: 349902
2018-12-21 14:25:14 +00:00
Luke Cheeseman 41a9e53500 [Dwarf/AArch64] Return address signing B key dwarf support
- When signing return addresses with -msign-return-address=<scope>{+<key>},
  either the A key instructions or the B key instructions can be used. To
  correctly authenticate the return address, the unwinder/debugger must know
  which key was used to sign the return address.
- When and exception is thrown or a break point reached, it may be necessary to
  unwind the stack. To accomplish this, the unwinder/debugger must be able to
  first authenticate an the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
  inclusion of a 'B' character. Functions that are signed using the B key
  variant of the instructions should have and FDE whose associated CIE has a 'B'
  in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
  high level language into assembly and then, as a second step, into an object
  file. To achieve this, I have introduced a new assembly directive
  '.cfi_b_key_frame ', that tells the assembler the current frame uses return
  address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
  augmentation string.

Differential Revision: https://reviews.llvm.org/D51798

llvm-svn: 349895
2018-12-21 10:45:08 +00:00
Simon Pilgrim 5d403f6bf8 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.

Clang counterpart: https://reviews.llvm.org/D55890

Differential Revision: https://reviews.llvm.org/D55894

llvm-svn: 349892
2018-12-21 09:04:14 +00:00
Thomas Lively b6dac89c87 [WebAssembly] Fix invalid machine instrs in -O0, verify in tests
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55956

llvm-svn: 349889
2018-12-21 06:58:15 +00:00
Matt Arsenault 3eae3c4590 AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote
llvm-svn: 349882
2018-12-21 03:20:54 +00:00
Matt Arsenault f4c21c575a AMDGPU/GlobalISel: RegBankSelect for some fp ops
llvm-svn: 349880
2018-12-21 03:14:45 +00:00
Matt Arsenault bee2ad7185 AMDGPU/GlobalISel: Redo legality for build_vector
It seems better to avoid using the callback if possible since
there are coverage assertions which are disabled if this is used.

Also fix missing tests. Only test the legal cases since it seems
legalization for build_vector is quite lacking.

llvm-svn: 349878
2018-12-21 03:03:11 +00:00
Craig Topper 54f1a7be13 [X86] Refactor hasNoCarryFlagUses and hasNoSignFlagUses in X86ISelDAGToDAG.cpp to tranlate opcode to condition code using the helpers in X86InstrInfo.cpp.
This shortens the switches in X86ISelDAGToDAG.cpp to only need to check condition code instead of a list of opcodes.

This also fixes a bug where the memory forms of SETcc were missing from hasNoCarryFlagUses.

llvm-svn: 349868
2018-12-21 01:14:25 +00:00
Craig Topper e0cff10289 [X86] Add memory forms of some SETCC instructions to hasNoCarryFlagUses.
Found while working on another patch

llvm-svn: 349867
2018-12-21 01:14:23 +00:00
Eli Friedman b1bbd5dca3 [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630

llvm-svn: 349857
2018-12-20 23:39:54 +00:00
Jessica Paquette a6b9c68a85 [GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode
If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end
up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.

Add it to the switch, and add a test verifying that this happens.

llvm-svn: 349822
2018-12-20 21:14:15 +00:00
Eli Friedman 48397102d0 [MC] [AArch64] Correctly resolve ":abs_g1:3" etc.
We have to treat constructs like this as if they were "symbolic", to use
the correct codepath to resolve them.  This mostly only affects movz
etc. because the other uses of classifySymbolRef conservatively treat
everything that isn't a constant as if it were a symbol.

Differential Revision: https://reviews.llvm.org/D55906

llvm-svn: 349800
2018-12-20 19:46:14 +00:00
Eli Friedman 4648209e16 [MC] [AArch64] Support resolving fixups for abs_g0 etc.
This requires a bit more code than other fixups, to distingush between
abs_g0/abs_g1/etc.  Actually, I think some of the other fixups are
missing some checks, but I won't try to address that here.

I haven't seen any real-world code that uses a construct like this, but
it clearly should work, and we're considering using it in the
implementation of localescape/localrecover on Windows (see
https://reviews.llvm.org/D53540). I've verified that binutils produces
the same code as llvm-mc for the testcase.

This currently doesn't include support for the *_s variants (that
requires a bit more work to set the opcode).

Differential Revision: https://reviews.llvm.org/D55896

llvm-svn: 349799
2018-12-20 19:38:07 +00:00
Simon Pilgrim 2a25360ae3 [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm)
This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.

Clang counterpart: https://reviews.llvm.org/D55937

Differential Revision: https://reviews.llvm.org/D55938

llvm-svn: 349795
2018-12-20 19:01:07 +00:00
Yonghong Song 821c93d556 [BPF] Disable relocation for .BTF.ext section
Build llvm with assertion on, and then build bcc against this llvm.
Run any bcc tool with debug=8 (turning on -g for clang compilation),
you will get the following assertion errors,
  /home/yhs/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:888:
  void llvm::RuntimeDyldELF::resolveBPFRelocation(const llvm::SectionEntry&, uint64_t,
    uint64_t, uint32_t, int64_t): Assertion `Value <= (4294967295U)' failed.

The .BTF.ext ELF section uses Fixup's to get the instruction
offsets. The data width of the Fixup is 4 bytes since we only need
the insn offset within the section.

This caused the above error though since R_BPF_64_32 expects
4-byte value and the Runtime Dyld tried to resolve the actual
insn address which is 8 bytes.

Actually the offset within the section is all what we need.
Therefore, there is no need to perform any kind of relocation
for .BTF.ext section and such relocation will actually cause
incorrect result.

This patch changed BPFELFObjectWriter::getRelocType() such that
for Fixup Kind FK_Data_4, if the relocation Target is a temporary
symbol, let us skip the relocation (ELF::R_BPF_NONE).

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 349778
2018-12-20 17:40:23 +00:00
Krzysztof Parzyszek 30c42e2ab6 [Hexagon] Add patterns for funnel shifts
llvm-svn: 349770
2018-12-20 16:39:20 +00:00
Alex Bradbury eb3a64a4da [RISCV] Properly evaluate fixup_riscv_pcrel_lo12
This is a update to D43157 to correctly handle fixup_riscv_pcrel_lo12.

Notable changes:

Rebased onto trunk
Handle and test S-type
Test case pcrel-hilo.s is merged into relocations.s

D43157 description:
VK_RISCV_PCREL_LO has to be handled specially. The MCExpr inside is
actually the location of an auipc instruction with a VK_RISCV_PCREL_HI fixup
pointing to the real target.

Differential Revision: https://reviews.llvm.org/D54029
Patch by Chih-Mao Chen and Michael Spencer.

llvm-svn: 349764
2018-12-20 14:52:15 +00:00
Simon Pilgrim 09c081176a [X86][AVX512] Don't custom lower v16i8 rotations.
As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI.

llvm-svn: 349763
2018-12-20 14:38:35 +00:00
Ulrich Weigand 380bece7af [SystemZ] "Generic" vector assembler instructions shoud clobber CC
There are several vector instructions which may or may not set the
condition code register, depending on the value of an argument.

For codegen, we use two versions of the instruction, one that sets
CC and one that doesn't, which hard-code appropriate values of that
argument.  But we also have a "generic" version of the instruction
that is used for the assembler/disassembler.  These generic versions
should always be considered to clobber CC just to be safe.

llvm-svn: 349761
2018-12-20 14:24:17 +00:00
Ulrich Weigand 44d37ae38c [SystemZ] Make better use of VLLEZ
This patch fixes two deficiencies in current code that recognizes
the VLLEZ idiom:

- For the floating-point versions, we have ISel patterns that match
  on a bitconvert as the top node.  In more complex cases, that
  bitconvert may already have been merged into something else.
  Fix the patterns to match the inner nodes instead.

- For the 64-bit integer versions, depending on the surrounding code,
  we may get either a DAG tree based on JOIN_DWORDS or one based on
  INSERT_VECTOR_ELT.  Use a PatFrags to simply match both variants.

llvm-svn: 349749
2018-12-20 13:05:03 +00:00
Ulrich Weigand 8bb46b0f01 [SystemZ] Make better use of VGEF/VGEG
Current code in SystemZDAGToDAGISel::tryGather refuses to perform
any transformation if the Load SDNode has more than one use.  This
(erronously) counts uses of the chain result, which prevents the
optimization in many cases unnecessarily.  Fixed by this patch.

llvm-svn: 349748
2018-12-20 13:01:20 +00:00
Clement Courbet 36a3480385 Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Update PPC ir following GEP->bitcat to bitcat->GEP->bitcat change.

llvm-svn: 349747
2018-12-20 13:01:04 +00:00
Ulrich Weigand f43b510015 [SystemZ] Make better use of VLDEB
We already have special code (DAG combine support for FP_ROUND)
to recognize cases where we an use a vector version of VLEDB to
perform two floating-point truncates in parallel, but equivalent
support for VLEDB (vector floating-point extends) has been
missing so far.  This patch adds corresponding DAG combine
support for FP_EXTEND.

llvm-svn: 349746
2018-12-20 12:59:05 +00:00
Clement Courbet e22cf4d7cb Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Forgot to update PowerPC tests for the GEP->bitcast change.

llvm-svn: 349733
2018-12-20 09:58:33 +00:00
Clement Courbet 1bb6e1b0f2 [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.

Reviewers: spatel, gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55263

llvm-svn: 349731
2018-12-20 09:13:47 +00:00
Kang Zhang ca8db48974 [PowerPC] Implement the isSelectSupported() target hook
Summary:
PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC
does not have vector CR selects, PowerPC does not support scalar condition 
selects on vectors.
In addition to implementing this hook, isSelectSupported() should return false
when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects
are converted into branch sequences.

Reviewed By: steven.zhang,  hfinkel

Differential Revision: https://reviews.llvm.org/D55754

llvm-svn: 349727
2018-12-20 06:19:59 +00:00
Thomas Lively feb18fe927 [WebAssembly] Emit a splat for v128 IMPLICIT_DEF
Summary:
This is a code size savings and is also important to get runnable code
while engines do not support v128.const.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55910

llvm-svn: 349724
2018-12-20 04:20:32 +00:00
Amara Emerson 321bfb210a Fix build errors introduced by r349712 on aarch64 bots.
llvm-svn: 349723
2018-12-20 03:27:42 +00:00
Thomas Lively 8dbf29af95 [WebAssembly] Gate unimplemented SIMD ops on flag
Summary:
Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and
i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag,
since these ops are not implemented yet in V8.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55904

llvm-svn: 349720
2018-12-20 02:10:22 +00:00
Matt Arsenault 4339883710 AMDGPU: Make i1/i64/v2i32 and/or/xor legal
The 64-bit types do depend on the register bank,
but that's another issue to deal with later.

llvm-svn: 349716
2018-12-20 01:35:49 +00:00
Matt Arsenault 8cc98bee8a AMDGPU/GlobalISel: Fix ValueMapping tables for i1
This was incorrectly selecting SGPR for any i1 values,
e.g. G_TRUNC to i1 from a VGPR was still an SGPR.

llvm-svn: 349715
2018-12-20 01:33:43 +00:00
Craig Topper 9ca2f5605e [X86] Disable custom widening of signed/unsigned add/sub saturation intrinsics under -x86-experimental-vector-widening-legalization.
Generic legalization should take care of this.

llvm-svn: 349714
2018-12-20 01:32:06 +00:00
Amara Emerson 8cb186ce17 [AArch64][GlobalISel] Implement selection og G_MERGE of two s32s into s64.
This code pattern is an unfortunate side effect of the way some types get split
at call lowering. Ideally we'd either not generate it at all or combine it away
in the legalizer artifact combiner.

Until then, add selection support anyway which is a significant proportion of
our current fallbacks on CTMark.

rdar://46491420

llvm-svn: 349712
2018-12-20 01:11:04 +00:00
Matt Arsenault dff33c38e1 AMDGPU/GlobalISel: RegBankSelect for fp conversions
llvm-svn: 349709
2018-12-20 00:37:02 +00:00
Matt Arsenault 36d4092173 AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg
llvm-svn: 349708
2018-12-20 00:33:49 +00:00
Craig Topper 217b3b20d8 [X86] Remove TLI variable from ReplaceNodeResults. NFC
We're already in X86TargetLowering which is a derived class of TargetLowering. We can just call methods directly.

llvm-svn: 349695
2018-12-19 23:13:03 +00:00
Rhys Perry 3931ad38b9 AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcasts
Reviewers: arsenm, tstellar

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55058

llvm-svn: 349694
2018-12-19 22:53:33 +00:00
Evandro Menezes 374ccf6768 [AArch64] Improve Exynos predicates
Expand the predicate `ExynosResetPred` to include all forms of immediate
moves.

llvm-svn: 349686
2018-12-19 22:24:36 +00:00
Evandro Menezes ff827d737a [AArch64] Use canonical copy idiom
Use only the canonical form of the alias for register transfers in the
`IsCopyIdiomPred` predicate.

llvm-svn: 349685
2018-12-19 22:24:31 +00:00
Craig Topper d16da2b479 [X86] Remove a bunch of 'else' after returns in reduceVMULWidth. NFC
This reduces indentation and makes it obvious this function always returns something.

llvm-svn: 349671
2018-12-19 19:39:34 +00:00
Jessica Paquette 3560e93dc1 [GlobalISel][AArch64] Add support for @llvm.ceil
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.

It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.

llvm-svn: 349664
2018-12-19 19:01:36 +00:00