Commit Graph

49465 Commits

Author SHA1 Message Date
Thomas Lively 5ea17d450e [WebAssembly] Implement vector sext_inreg and tests with comparisons
Summary: Depends on D53251.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53252

llvm-svn: 344826
2018-10-20 01:35:23 +00:00
Thomas Lively 55735d522d [WebAssembly] Custom lower i64x2 constant shifts to avoid wrap
Summary: Depends on D53057.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53251

llvm-svn: 344825
2018-10-20 01:31:18 +00:00
Changpeng Fang f95f763ea5 AMDGPU: Add support pattern for SUB of one bit
Summary:
  Add selection patterns to support one bit Sub.

Reviewers:
  rampitec, arsenm

Differential Revision:
  https://reviews.llvm.org/D52946

llvm-svn: 344815
2018-10-19 21:09:21 +00:00
Craig Topper 5ed1099962 [X86] Remove some left over code from when MVT:i1 was a legal type for AVX512.
llvm-svn: 344813
2018-10-19 20:44:33 +00:00
Craig Topper 5c81c68385 [X86] In PostprocessISelDAG, start from allnodes_end, not the root.
There is no guarantee the root is at the end if isel created any nodes without morphing them. This includes the nodes created by manual isel from C++ code in X86ISelDAGToDAG.

This is similar to r333415 from PowerPC which is where I originally stole the peephole loop from.

I don't have a test case, but without this a future patch doesn't work which is how I found it.

llvm-svn: 344808
2018-10-19 19:24:42 +00:00
Thomas Lively 11a332d08d [WebAssembly] Handle undefined lane indices in SIMD patterns
Summary:
Undefined indices in shuffles can be used when not all lanes of the
output vector will be used. This happens for example in the expansion
of vector reduce operations. Regardless, undefs are legal as lane
indices in IR and should be supported.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53057

llvm-svn: 344803
2018-10-19 19:08:06 +00:00
Krzysztof Parzyszek 6bfc6577f2 [Hexagon] Remove support for V4
llvm-svn: 344791
2018-10-19 17:31:11 +00:00
Fangrui Song 2e83b2e9ee Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC
llvm-svn: 344774
2018-10-19 06:12:02 +00:00
Eli Friedman b09c778715 Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP")
Still causing failures on the polly-aosp buildbot; I'll follow up
with a reduced testcase.

llvm-svn: 344752
2018-10-18 19:34:30 +00:00
Kristina Brooks 312fcc116b [X86] Support for the mno-tls-direct-seg-refs flag
Allows to disable direct TLS segment access (%fs or %gs). GCC supports
a similar flag, it can be useful in some circumstances, e.g. when a thread
context block needs to be updated directly from user space. More info
and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145

There is another revision for clang as well.
Related: D53102

All X86 CodeGen tests appear to pass:
```
[46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen
Testing Time: 23.17s
  Expected Passes    : 3801
  Expected Failures  : 15
  Unsupported Tests  : 8021
```

Reviewed by: Craig Topper.

Patch by nruslan (Ruslan Nikolaev).

Differential Revision: https://reviews.llvm.org/D53103

llvm-svn: 344723
2018-10-18 03:14:37 +00:00
Nicolai Haehnle 4821937d2e AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI
Summary:
To workaround a hardware issue in the (base + offset) calculation
when base is negative. The impact on code quality should be limited
since SILoadStoreOptimizer still runs afterwards and is able to
combine loads/stores based on known sign information.

This fixes visible corruption in Hitman on SI (easily reproducible
by running benchmark mode).

Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923

Reviewers: arsenm, mareko

Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53160

llvm-svn: 344698
2018-10-17 15:37:48 +00:00
Nicolai Haehnle c4a2ff0950 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 344696
2018-10-17 15:37:30 +00:00
Sam Parker 2ef3c0dad6 [ARM] bottom-top mul support in ARMParallelDSP
Previously reverted in rL343082.

Original commit message:

On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 344693
2018-10-17 13:02:48 +00:00
Nicolai Haehnle e9b134aa31 AMDGPU: Remove dead TableGen code
Summary: Change-Id: Ic1f2c1d0cf9e90a0baa9fc6bacd0d3c386069fb0

Reviewers: tpr

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53318

Change-Id: Ib4d143c898801e5cf6cb9999a495d62c91ae77fb
llvm-svn: 344691
2018-10-17 12:14:26 +00:00
Petar Jovanovic 8a08412533 [MIPS GlobalISel] Legalize constants
Legalize s1, s8, s16 and s64 G_CONSTANT for MIPS32.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D53077

llvm-svn: 344684
2018-10-17 10:30:03 +00:00
Sjoerd Meijer 64cfb74a61 [ARM] Do not fuse VADD and VMUL, continued (2/2)
This is patch 2/2, following up on D53314, and is the functional change
to prevent fusing mul + add sequences into VFMAs.

Differential revision: https://reviews.llvm.org/D53315

llvm-svn: 344683
2018-10-17 10:05:44 +00:00
Sjoerd Meijer 1a213a42d6 [ARM] Follow up of rL344671, attempt to pacify a buildbot
It was rightfully complaining about an unpretty logical expression.

llvm-svn: 344677
2018-10-17 07:51:24 +00:00
Sjoerd Meijer ff3ab33ec8 [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33.  This is a serie of 2
patches, that is trying to achieve the same for VFMA.  The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:

 --------------------------------------------------------
 | Opt   |  < rL342874   |  >= rL342874   |             |
 |------------------------------------------------------|
 |-O3    |     vmla      |      vmul      |     vmul    |
 |       |               |      vadd      |     vadd    |
 |------------------------------------------------------|
 |-Ofast |     vfma      |      vfma      |     vmul    |
 |       |               |                |     vadd    |
 |------------------------------------------------------|
 |-Oz    |     vmla      |      vmla      |     vmla    |
 --------------------------------------------------------

This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2.  This also fixes a typo in the regression test added in rL342874.

Differential revision: https://reviews.llvm.org/D53314

llvm-svn: 344671
2018-10-17 07:26:35 +00:00
Craig Topper e0a992918b [X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST.
Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag.

This recovers a case lost by r344270.

Differential Revision: https://reviews.llvm.org/D53310

llvm-svn: 344649
2018-10-16 22:29:36 +00:00
Krasimir Georgiev 547d824da6 Revert "[WebAssembly] LSDA info generation"
This reverts commit r344575.
Newly introduced test eh-lsda.ll.test fails with use-after-free under
ASAN build.

llvm-svn: 344639
2018-10-16 18:50:09 +00:00
Evandro Menezes c98decf864 [PATCH] [NFC][AArch64] Fix refactoring of macro fusion
Fix compiler error.

llvm-svn: 344632
2018-10-16 17:41:45 +00:00
Evandro Menezes 46eadcff9c [NFC][ARM] Refactor macro fusion
Simplify code for wildcards.

llvm-svn: 344625
2018-10-16 17:19:51 +00:00
Evandro Menezes de655c6d3a [NFC][AArch64] Refactor macro fusion
Simplify API of checking functions.

llvm-svn: 344624
2018-10-16 17:19:28 +00:00
Simon Pilgrim 7d27cfdcb2 [X86] Fix Skylake ReadAfterLd for PADDrm etc.
Missed in rL343868 as due to their custom InstrRW.

llvm-svn: 344600
2018-10-16 09:50:16 +00:00
Aleksandar Beserminji a5949439ca [mips][micromips] Fix how values in .gcc_except_table are calculated
When a landing pad is calculated in a program that is compiled
for micromips, it will point to an even address. Such an error will
cause a segmentation fault, as the instructions in micromips are
aligned on odd addresses. This patch sets the last bit of the offset
where a landing pad is, to 1, which will effectively be
an odd address and point to the instruction exactly.

Differential Revision: https://reviews.llvm.org/D52985

llvm-svn: 344591
2018-10-16 08:27:28 +00:00
Heejin Ahn 0981eaab47 [WebAssembly] LSDA info generation
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.

In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)

This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info MachineFunction in
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer

Reviewers: dschuff, sbc100, rnk

Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52748

llvm-svn: 344575
2018-10-16 00:09:12 +00:00
Craig Topper e70c560b6d [X86] Remove some isel patterns that shouldn't be possible.
These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast.

llvm-svn: 344573
2018-10-15 23:34:58 +00:00
Craig Topper 2909a3d9d0 [X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions.
llvm-svn: 344563
2018-10-15 21:51:32 +00:00
Simon Pilgrim 095a7fe635 [AARCH64] Improve vector popcnt lowering with ADDLP
AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors.

This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258.

Differential Revision: https://reviews.llvm.org/D53259

llvm-svn: 344554
2018-10-15 21:15:58 +00:00
Konstantin Zhuravlyov 94dfcc2eb2 AMDGPU: Generate .amdgcn_target for object code v3
Differential Revision: https://reviews.llvm.org/D53221

llvm-svn: 344552
2018-10-15 20:37:47 +00:00
Aleksandar Beserminji 81eb440772 [mips][micromips] Fix overlaping FDEs error
When compiling static executable for micromips, CFI symbols
are incorrectly labeled as MICROMIPS, which cause
".eh_frame_hdr refers to overlapping FDEs." error.

This patch does not label CFI symbols as MICROMIPS, and FDEs do not
overlap anymore. This patch also exposes another bug, which is fixed
here: https://reviews.llvm.org/D52985

Differential Revision: https://reviews.llvm.org/D52987

llvm-svn: 344516
2018-10-15 14:39:12 +00:00
Aleksandar Beserminji 585f55bb8b [mips][micromips] Revert "Fix overlaping FDEs error"
This reverts r344511.

llvm-svn: 344515
2018-10-15 14:36:48 +00:00
Simon Pilgrim 5abb607ebe [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281)
As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case......

Differential Revision: https://reviews.llvm.org/D53257

llvm-svn: 344512
2018-10-15 13:20:41 +00:00
Aleksandar Beserminji 10ec5c8c28 [mips][micromips] Fix overlaping FDEs error
When compiling static executable for micromips, CFI symbols
are incorrectly labeled as MICROMIPS, which cause
".eh_frame_hdr refers to overlapping FDEs." error.

This patch does not label CFI symbols as MICROMIPS, and FDEs do not
overlap anymore. This patch also exposes another bug, which is fixed
here: https://reviews.llvm.org/D52985

Differential Revision: https://reviews.llvm.org/D52987

llvm-svn: 344511
2018-10-15 12:59:17 +00:00
Chandler Carruth edb12a838a [TI removal] Make variables declared as `TerminatorInst` and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Craig Topper 06aea1720a [X86] Move promotion of vector and/or/xor from legalization to DAG combine
Summary:
I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64, but the other nodes hadn't legalized yet, I didn't look too hard at fixing that.

This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better.

In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53107

llvm-svn: 344487
2018-10-15 01:51:58 +00:00
Craig Topper 671779456a [X86] Add 128 MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction.
We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector.

llvm-svn: 344486
2018-10-15 01:51:53 +00:00
Simon Pilgrim 861cd0ba44 [X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 shuffle lowering
Extends D53148 from v4f64 now that we have test coverage for v16i16/v32i8 shuffles.

llvm-svn: 344481
2018-10-14 17:34:20 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Craig Topper 20fa085d74 [X86] Fix bad indentation. NFC
llvm-svn: 344471
2018-10-14 04:01:40 +00:00
Craig Topper ec4b75f47a [X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing.
Summary: This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53173

llvm-svn: 344470
2018-10-14 03:36:27 +00:00
Benjamin Kramer c55e997556 Move some helpers from the global namespace into anonymous ones.
llvm-svn: 344468
2018-10-13 22:18:22 +00:00
Thomas Lively ffde98de21 [WebAssembly][NFC] Fix signed/unsigned comparison warning
llvm-svn: 344459
2018-10-13 16:58:03 +00:00
Simon Pilgrim c5d7c6e5f6 [X86][SSE] Remove most of vector CTTZ custom lowering and use LegalizeDAG instead.
There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM.

I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen.

llvm-svn: 344457
2018-10-13 16:11:15 +00:00
Simon Pilgrim 1c2051ead7 [X86][SSE] Begin removing vector CTTZ custom lowering and use LegalizeDAG instead.
Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering. 

llvm-svn: 344453
2018-10-13 15:16:55 +00:00
Simon Pilgrim 1c6d320351 [X86][SSE] combineIncDecVector - use isConstantSplat
Use isConstantSplat instead of ISD::isConstantSplatVector to let us us peek through to illegal types (in this case for i686 targets to recognise i64 constants)

llvm-svn: 344452
2018-10-13 14:45:44 +00:00
Simon Pilgrim a03379527a [X86] Pull out target constant splat helper function. NFCI.
The code in LowerScalarImmediateShift is just a more powerful version of ISD::isConstantSplatVector.

llvm-svn: 344451
2018-10-13 14:28:40 +00:00
Simon Pilgrim 10434cbae1 Pull out repeated getOperand(). NFCI.
llvm-svn: 344450
2018-10-13 13:33:32 +00:00