Commit Graph

144065 Commits

Author SHA1 Message Date
Simon Pilgrim 4eab18f6b8 [X86][SSE] Detect unary PBLEND shuffles.
These can appear during shuffle combining.

llvm-svn: 293628
2017-01-31 13:58:01 +00:00
Simon Pilgrim c29eab52e8 [X86][SSE] Add support for combining PINSRW into a target shuffle.
Also add the ability to recognise PINSR(Vex, 0, Idx).

Targets shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat.

The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch.

llvm-svn: 293627
2017-01-31 13:51:10 +00:00
Nemanja Ivanovic 2f2a6ab991 [PowerPC][Altivec] Add vmr extended mnemonic
Just adds the vmr (Vector Move Register) mnemonic for the VOR instruction in
the PPC back end.

Committing on behalf of brunoalr (Bruno Rosa).

Differential Revision: https://reviews.llvm.org/D29133

llvm-svn: 293626
2017-01-31 13:43:11 +00:00
Florian Hahn 5364cf3b56 [LoopUnroll] Use addClonedBlockToLoopInfo to clone the top level loop (NFC)
Summary:
rL293124 added the necessary infrastructure to properly add the cloned
top level loop to LoopInfo, which means we do not have to do it manually
in CloneLoopBlocks.

@mkuper sorry for not pointing this out during my review of D29156, I just
realized that today.


Reviewers: mzolotukhin, chandlerc, mkuper

Reviewed By: mkuper

Subscribers: llvm-commits, mkuper

Differential Revision: https://reviews.llvm.org/D29173

llvm-svn: 293615
2017-01-31 11:13:44 +00:00
Simon Dardis 12850eeac5 [mips] Addition of the immediate cases for the instructions [d]div, [d]divu
Related to http://reviews.llvm.org/D15772

Depends on http://reviews.llvm.org/D16888

Adds support for immediate operand for [D]DIV[U] instructions.

Patch By: Srdjan Obucina

Reviewers: zoran.jovanovic, vkalintiris, dsanders, obucina

Differential Revision: https://reviews.llvm.org/D16889

llvm-svn: 293614
2017-01-31 10:49:24 +00:00
Craig Topper 2cfa2071bd [AVX-512] Don't both looking into the AVX512DQ execution domain fixing tables if AVX512DQ isn't supported since we can't do any conversion anyway.
llvm-svn: 293608
2017-01-31 06:49:55 +00:00
Craig Topper 797e32dd98 [X86] Add AVX and SSE2 version of MOVSDmr to execution domain fixing table. AVX-512 already did this for the EVEX version.
llvm-svn: 293607
2017-01-31 06:49:53 +00:00
Craig Topper 779e4c5bb4 [AVX-512] Fix copy and paste bug in execution domain fixing tables so that we can convert 256-bit movnt instructions.
llvm-svn: 293606
2017-01-31 06:49:50 +00:00
Justin Lebar 1c9692a46f [NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate.
Summary:

This lets us lower to sqrt.approx and rsqrt.approx under more
circumstances.

* Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32,
  when fast-math is enabled.  Previously, we only would emit it for
  calls to @llvm.nvvm.sqrt.f.  (With this patch we no longer emit
  sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to
  simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.)

* Now we emit the ftz version of rsqrt.approx when ftz is enabled.
  Previously, we only emitted rsqrt.approx when ftz was disabled.

Reviewers: hfinkel

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D28508

llvm-svn: 293605
2017-01-31 05:58:22 +00:00
Craig Topper 06e038c6de [X86] Update the broadcast fallback patterns to use shuffle instructions from the appropriate execution domain.
llvm-svn: 293603
2017-01-31 05:18:29 +00:00
Craig Topper 88b0a47312 [X86] Add test cases for AVX1 broadcast fallback patterns when load can't be folded.
Also add test cases that do an insertelement to all elements for the 8 element vector tests.

llvm-svn: 293602
2017-01-31 05:18:27 +00:00
Craig Topper e9e84c8284 [AVX-512] Fix the ExeDomain for VMOVDDUP, VMOVSLDUP, and VMOVSHDUP.
llvm-svn: 293601
2017-01-31 05:18:24 +00:00
Matt Arsenault f84e5d9a27 AMDGPU: Generalize matching of v_med3_f32
I think this is safe as long as no inputs are known to ever
be nans.

Also add an intrinsic for fmed3 to be able to handle all safe
math cases.

llvm-svn: 293598
2017-01-31 03:07:46 +00:00
Matt Arsenault 973c4aebad InferAddressSpaces: Rename constant
llvm-svn: 293594
2017-01-31 02:17:41 +00:00
Matt Arsenault 72f259b8eb InferAddressSpaces: Handle icmp
llvm-svn: 293593
2017-01-31 02:17:32 +00:00
Craig Topper d064cc93b2 [X86] Remove patterns for X86VPermilpi with integer types. I don't think we've formed these since the shuffle lowering rewrite.
llvm-svn: 293592
2017-01-31 02:09:53 +00:00
Craig Topper 85935f69fb [X86] Remove duplicate patterns for X86VPermilpv that already exist in the instructions themselves.
llvm-svn: 293591
2017-01-31 02:09:51 +00:00
Craig Topper ced68315ce [X86] Remove patterns for selecting PSHUFD with FP types. We don't seem to do this anymore and the AVX case definitely should be using VPERMILPS anyway.
llvm-svn: 293590
2017-01-31 02:09:49 +00:00
Craig Topper b76494e017 [X86] Remove 'else' after 'return'. NFC
llvm-svn: 293589
2017-01-31 02:09:46 +00:00
Craig Topper f9d901f0ea [X86] Use integer broadcast instructions for integer broadcast patterns.
I'm not sure why we were using an FP instruction before and had to have a comment calling attention to it, but not justifying it.

llvm-svn: 293588
2017-01-31 02:09:43 +00:00
Matt Arsenault 6d5a8d48fd InferAddressSpaces: Support memory intrinsics
llvm-svn: 293587
2017-01-31 01:56:57 +00:00
Matt Arsenault 6c907a9bb3 InferAddressSpaces: Support atomics
llvm-svn: 293584
2017-01-31 01:40:38 +00:00
Matt Arsenault d89a6e11a7 InferAddressSpaces: Don't replace volatile users
llvm-svn: 293582
2017-01-31 01:30:16 +00:00
Matt Arsenault b6491cc854 AMDGPU: Implement hook for InferAddressSpaces
For now just port some of the existing NVPTX tests
and from an old HSAIL optimization pass which
approximately did the same thing.

Don't enable the pass yet until more testing is done.

llvm-svn: 293580
2017-01-31 01:20:54 +00:00
Matt Arsenault 850657a439 NVPTX: Move InferAddressSpaces to generic code
llvm-svn: 293579
2017-01-31 01:10:58 +00:00
Eugene Zelenko 342257ea92 [ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293578
2017-01-31 00:56:17 +00:00
Saleem Abdulrasool 6f5f001fdc TableGen: use fully qualified name for StringLiteral
Use the qualified name for StringLiteral (llvm::StringLiteral) when
generating the sources.  This is needed as the generated files may be
used out-of-tree (e.g. swift) where you may not have a
`using namespace llvm;` resulting in an undefined lookup.

llvm-svn: 293577
2017-01-31 00:45:01 +00:00
Eli Friedman 10d1ff64fe [SCEV] Simplify/generalize howFarToZero solving.
Make SolveLinEquationWithOverflow take the start as a SCEV, so we can
solve more cases. With that implemented, get rid of the special case
for powers of two.

The additional functionality probably isn't particularly useful,
but it might help a little for certain cases involving pointer
arithmetic.

Differential Revision: https://reviews.llvm.org/D28884

llvm-svn: 293576
2017-01-31 00:42:42 +00:00
Reid Kleckner 71012aa945 Remove LLVM_CONFIG from config headers
It appears to be dead, and it needlessly caused me to rebuild all of
LLVM when I changed CMAKE_INSTALL_PREFIX.

llvm-svn: 293574
2017-01-31 00:34:23 +00:00
Vedant Kumar 359785ddad Fix llvm-readobj build error after r293569
Clang complains about an ambiguous call to printNumber() because it
can't work out what size_t should convert to. I picked uint64_t.

llvm-svn: 293573
2017-01-30 23:58:51 +00:00
Keno Fischer 578cf7aae7 [ExecutionDepsFix] Improve clearance calculation for loops
Summary:
In revision rL278321, ExecutionDepsFix learned how to pick a better
register for undef register reads, e.g. for instructions such as
`vcvtsi2sdq`. While this revision improved performance on a good number
of our benchmarks, it unfortunately also caused significant regressions
(up to 3x) on others. This regression turned out to be caused by loops
such as:

PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
      ^                                  |
      +----------------------------------+

In the previous version of the clearance calculation, we would visit
the blocks in order, remembering for each whether there were any
incoming backedges from blocks that we hadn't processed yet and if
so queuing up the block to be re-processed. However, for loop structures
such as the above, this is clearly insufficient, since the block B
does not have any unknown backedges, so we do not see the false
dependency from the previous interation's Def of xmm registers in B.

To fix this, we need to consider all blocks that are part of the loop
and reprocess them one the correct clearance values are known. As
an optimization, we also want to avoid reprocessing any later blocks
that are not part of the loop.

In summary, the iteration order is as follows:
Before: PH A B C D A'
Corrected (Naive): PH A B C D A' B' C' D'
Corrected (w/ optimization): PH A B C A' B' C' D

To facilitate this optimization we introduce two new counters for each
basic block. The first counts how many of it's predecssors have
completed primary processing. The second counts how many of its
predecessors have completed all processing (we will call such a block
*done*. Now, the criteria to reprocess a block is as follows:
    - All Predecessors have completed primary processing
    - For x the number of predecessors that have completed primary
      processing *at the time of primary processing of this block*,
      the number of predecessors that are done has reached x.

The intuition behind this criterion is as follows:
We need to perform primary processing on all predecessors in order to
find out any direct defs in those predecessors. When predecessors are
done, we also know that we have information about indirect defs (e.g.
in block B though that were inherited through B->C->A->B). However,
we can't wait for all predecessors to be done, since that would
cause cyclic dependencies. However, it is guaranteed that all those
predecessors that are prior to us in reverse postorder will be done
before us. Since we iterate of the basic blocks in reverse postorder,
the number x above, is precisely the count of the number of predecessors
prior to us in reverse postorder.

Reviewers: myatsina
Differential Revision: https://reviews.llvm.org/D28759

llvm-svn: 293571
2017-01-30 23:37:03 +00:00
Sanjay Patel 8c5f236197 [InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2) for vectors with splat constants
llvm-svn: 293570
2017-01-30 23:35:52 +00:00
Derek Schuff 6d76b7b455 [WebAssembly] Add wasm support for llvm-readobj
Create a WasmDumper subclass of ObjDumper to support Webassembly binary
files.

Patch by Sam Clegg

Differential Revision: https://reviews.llvm.org/D27355

llvm-svn: 293569
2017-01-30 23:30:52 +00:00
Matt Arsenault 9f432ec24c NVPTX: Trivial cleanups of NVPTXInferAddressSpaces
- Move DEBUG_TYPE below includes
- Change unknown address space constant to be consistent with other
  passes
- Grammar fixes in debug output

llvm-svn: 293567
2017-01-30 23:27:11 +00:00
Sanjay Patel abbb118a78 [InstCombine] add vector test for (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2); NFC
llvm-svn: 293566
2017-01-30 23:26:17 +00:00
Eugene Zelenko dde94e4c4f [Mips] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293565
2017-01-30 23:21:32 +00:00
Benjamin Kramer 365c9bd941 [ICP] Fix bool conversion warning and actually write out the reason instead of dropping it.
llvm-svn: 293564
2017-01-30 23:11:29 +00:00
Matt Arsenault 42b6478344 NVPTX: Refactor NVPTXInferAddressSpaces to check TTI
Add a new TTI hook for getting the generic address space value.

llvm-svn: 293563
2017-01-30 23:02:12 +00:00
Sanjay Patel 0c39d56a60 [InstCombine] enable more lshr(shl X, C1), C2 folds for vectors with splat constants
llvm-svn: 293562
2017-01-30 23:01:05 +00:00
Simon Pilgrim 3905e03a47 [X86][SSE] Fix unsigned <= 0 warning in assert. NFCI.
Thanks to @mkuper

llvm-svn: 293561
2017-01-30 22:58:44 +00:00
Simon Pilgrim a80a47afef [X86][SSE] Generalize the number of decoded shuffle inputs. NFCI.
combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs but everything before it now supports any number of shuffle inputs.

This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns.

llvm-svn: 293560
2017-01-30 22:48:49 +00:00
Dehao Chen 6775f5d629 Expose isLegalToPromot as a global helper function so that SamplePGO pass can call it for legality check.
Summary: SamplePGO needs to check if it is legal to promote a target before it actually promotes it.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29306

llvm-svn: 293559
2017-01-30 22:46:37 +00:00
Dehao Chen 6217fa44b8 Revert r292979 which causes compile time failure.
llvm-svn: 293557
2017-01-30 22:26:05 +00:00
Sanjay Patel 98cc841421 [InstCombine] add tests for more shift-shift patterns; NFC
llvm-svn: 293555
2017-01-30 22:24:36 +00:00
Eli Friedman 2345733246 Fix line endings.
llvm-svn: 293554
2017-01-30 22:04:23 +00:00
Tom Stellard 887a2562b7 AMDGPU: Fix release build broken by r293551
llvm-svn: 293553
2017-01-30 22:02:58 +00:00
Artem Tamazov 61eb79d7a7 Reapply [AMDGPU][mc][tests][NFC] Add coverage/smoke tests for Gfx7 and Gfx8.
llvm-svn: 293552
2017-01-30 21:59:21 +00:00
Tom Stellard ca16621b2a Re-commit AMDGPU/GlobalISel: Add support for simple shaders
Fix build when global-isel is disabled and fix a warning.

Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293551
2017-01-30 21:56:46 +00:00
Tim Northover 2bf8c9d381 GlobalISel: correctly translate invoke when callee is a register.
This should fix the GlobalISel verifier.

llvm-svn: 293550
2017-01-30 21:45:21 +00:00
Stanislav Mekhanoshin a3b72798af [AMDGPU] Internalize non-kernel symbols
Since we have no call support and late linking we can produce code
only for used symbols. This saves compilation time, size of the final
executable, and size of any intermediate dumps.

Run Internalize pass early in the opt pipeline followed by global
DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option.

Differential Revision: https://reviews.llvm.org/D29214

llvm-svn: 293549
2017-01-30 21:05:18 +00:00