Commit Graph

145388 Commits

Author SHA1 Message Date
Filipe Cabecinhas 33dd486f1d [AddressSanitizer] Add PS4 offset
llvm-svn: 295994
2017-02-23 17:10:28 +00:00
Sanjay Patel 68e4cb3c86 [InstCombine] use loop instead of recursion to peek through FPExt; NFCI
llvm-svn: 295992
2017-02-23 16:39:51 +00:00
Sanjay Patel adf2ab16e4 [InstCombine] use 'match' to reduce code; NFCI
llvm-svn: 295991
2017-02-23 16:26:03 +00:00
Jan Vesely 70293a045b AMDGPU/SI: Fix trunc i16 pattern
Hit on ASICs that support 16bit instructions.

Differential Revision: https://reviews.llvm.org/D30281

llvm-svn: 295990
2017-02-23 16:12:21 +00:00
Simon Pilgrim 4c0ea9d438 Strip trailing whitespace.
llvm-svn: 295989
2017-02-23 16:07:04 +00:00
Krzysztof Parzyszek af5ff65d67 [Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE
llvm-svn: 295981
2017-02-23 15:02:09 +00:00
Tobias Grosser 38c0ab45f5 [docs] Add information about how to checkout polly to getting started page
llvm-svn: 295974
2017-02-23 14:27:07 +00:00
Diana Picus a8cb0cd8f2 [ARM] GlobalISel: Lower call returns
Introduce a common ValueHandler for call returns and formal arguments, and
inherit two different versions for handling the differences (at the moment the
only difference is the way physical registers are marked as used).

llvm-svn: 295973
2017-02-23 14:18:41 +00:00
Alexey Bataev f77d1656af [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295972
2017-02-23 13:37:09 +00:00
Diana Picus a606713c33 [ARM] GlobalISel: Lower call parameters in regs
Add support for lowering calls with parameters than can fit into regs.  Use the
same ValueHandler that we used for function returns, but rename it to match its
new, extended purpose.

llvm-svn: 295971
2017-02-23 13:25:43 +00:00
Ayman Musa 4b2c968c43 [X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel.
(Quick fix to buildbot failure after rL295940 commit).

llvm-svn: 295970
2017-02-23 13:15:44 +00:00
Simon Dardis d410fc8f28 [mips][ias] Further relax operands of certain assembly instructions
This patch adjusts the most relaxed predicate of immediate operands to accept
immediate forms such as ~(0xf0000000|0x000f00000). Previously these forms
would be accepted by GAS and rejected by IAS.

This partially resolves PR/30383.

Thanks to Sean Bruno for reporting the issue!

Reviewers: slthakur, seanbruno

Differential Revision: https://reviews.llvm.org/D29218

llvm-svn: 295965
2017-02-23 12:40:58 +00:00
Kristof Beyls 5ac6adbb6d Fix assertion failure in ARMConstantIslandPass.
The ARMConstantIslandPass didn't have support for handling accesses to
constant island objects through ARM::t2LDRBpci instructions. This adds
support for that.

This fixes PR31997.

llvm-svn: 295964
2017-02-23 12:24:55 +00:00
Simon Pilgrim 858d8e672d Fix signed/unsigned comparison warning on MSVC
llvm-svn: 295962
2017-02-23 12:00:34 +00:00
Alexey Bataev 14b370c1bf Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"
This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84.

llvm-svn: 295957
2017-02-23 11:09:35 +00:00
Alexey Bataev 7ae653285d [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295956
2017-02-23 10:57:15 +00:00
Alexey Bataev 7337212e83 Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong"
This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d.

llvm-svn: 295951
2017-02-23 09:59:29 +00:00
Alexey Bataev 68f2402c61 [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295949
2017-02-23 09:40:38 +00:00
Ayman Musa 524dbdaa2b [X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding.
(Quick fix to buildbot fail). 

llvm-svn: 295946
2017-02-23 08:13:36 +00:00
Ayman Musa 6e670cf44f [X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions.
AVX versions of the converts work on f32/f64 types, while AVX512 version work on vectors.

Differential Revision: https://reviews.llvm.org/D29988

llvm-svn: 295940
2017-02-23 07:24:21 +00:00
Matt Arsenault f0a88dbaab LoadStoreVectorizer: Split even sized illegal chains properly
Implement isLegalToVectorizeLoadChain for AMDGPU to avoid
producing private address spaces accesses that will need to be
split up later. This was doing the wrong thing in the case
where the queried chain was an even number of elements.

A possible <4 x i32> store was being split into
store <2 x i32>
store i32
store i32

rather than
store <2 x i32>
store <2 x i32>

when legal.

llvm-svn: 295933
2017-02-23 03:58:53 +00:00
Craig Topper 185ced8b2b [X86][IR] In AutoUpgrade, check explicitly for xop.vpcmov and xop.vpcmov.256 instead of anything starting with xop.vpcmov
There were some older intrinsics that only existed for less than a month in 2012 that still exist in some out of tree test files that start with this string, but aren't able to be handled by the current upgrade code and fire an assert. Now we'll go back to treating them as not intrinsics at all and just passing them through to output.

Fixes PR32041, sort of.

llvm-svn: 295930
2017-02-23 03:22:14 +00:00
Matt Arsenault 3e9ce44eee TargetOptions: Fix not accounting for NoSignedZerosFPMath in ==
llvm-svn: 295928
2017-02-23 03:16:44 +00:00
Matthias Braun 1cba1408c0 Test if we can use raw strings on all platforms compiling LLVM.
llvm-svn: 295917
2017-02-23 01:09:01 +00:00
Eli Friedman 13f2e35311 Explicitly state the behavior of inbounds with a null pointer.
See https://llvm.org/bugs/show_bug.cgi?id=31439; this reflects LLVM's
behavior in practice, and should be compatible with C/C++ rules.

Differential Revision: https://reviews.llvm.org/D28026

llvm-svn: 295916
2017-02-23 00:48:18 +00:00
Matt Arsenault d4bca1e9ef AMDGPU: Replace disabled exp inputs with undef
llvm-svn: 295914
2017-02-23 00:44:03 +00:00
Matt Arsenault a9e16e6597 AMDGPU: Add another BFE pattern
This is the pattern that falls out of the instruction's
definition if offset == 0.

llvm-svn: 295912
2017-02-23 00:23:43 +00:00
Matt Arsenault 79a45db7f5 AMDGPU: Use clamp with f64
llvm-svn: 295908
2017-02-22 23:53:37 +00:00
Michael Kuperstein 6181c62b95 Revert r295868 because it breaks a different SLP lit test.
llvm-svn: 295906
2017-02-22 23:35:13 +00:00
Matt Arsenault d5c6515b68 AMDGPU: Fold FP clamp as modifier bit
The manual is unclear on the details of this. It's not
clear to me if denormals are not allowed with clamp,
or if that is only omod. Not allowing denorms for
fp16 or fp64 isn't useful so I also question if that
is really a restriction. Same with whether this is valid
without IEEE mode enabled.

llvm-svn: 295905
2017-02-22 23:27:53 +00:00
Wei Ding f2cce02eb2 AMDGPU : Update TrapCode based on Trap Handler ABI.
Differential Revision: http://reviews.llvm.org/D30232

llvm-svn: 295904
2017-02-22 23:22:19 +00:00
Justin Bogner d519a92a27 [libFuzzer] Update traces hooks test after r293741
This test now passes on darwin.

llvm-svn: 295902
2017-02-22 23:12:36 +00:00
Justin Bogner 59c8420018 [libFuzzer] Mark a test that infinite loops as unsupported
We need to investigate this, but for now it just causes too much
headache when trying to run these tests.

llvm-svn: 295900
2017-02-22 23:05:17 +00:00
Matt Arsenault f5262256a1 AMDGPU: Add replacement bfe intrinsics
llvm-svn: 295899
2017-02-22 23:04:58 +00:00
Sanjay Patel 4805ce0b17 [InstCombine] don't try SimplifyDemandedInstructionBits from add/sub because it's slow and unlikely to succeed
Notably, no regression tests change when we remove these calls, and these are expensive calls.

The motivation comes from the general acknowledgement that the compiler is getting slower:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109188.html
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html

And specifically the test case attached to PR32037:
https://bugs.llvm.org//show_bug.cgi?id=32037

Profiling the middle-end (opt) part of the compile:
$ ./opt -O2 row_common.bc -o /dev/null

...visitAdd and visitSub are near the top of the instcombine list, and the calls to SimplifyDemandedInstructionBits()
are high within each of those. Those calls account for 1%+ of the opt time in either debug or release profiles. And 
that's the rough win I see from this patch when testing opt built release from r295864 on an iMac with Haswell 4GHz
(model 4790K).

It seems unlikely that we'd be able to eliminate add/sub or change their operands given that add/sub normally affect
all bits, and the PR32037 example shows no IR difference after this change using -O2.

Also worth noting - the code comment in visitAdd:
// This handles stuff like (X & 254)+1 -> (X&254)|1
...isn't true. That transform is handled later with a call to haveNoCommonBitsSet().

Differential Revision: https://reviews.llvm.org/D30270

llvm-svn: 295898
2017-02-22 23:01:12 +00:00
Dylan McKay 19d9533496 [AVR] Disable integrated assembler for a few tests
Fixes the build.

llvm-svn: 295895
2017-02-22 22:41:13 +00:00
Eugene Zelenko db56e5a89a [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 295893
2017-02-22 22:32:51 +00:00
Krzysztof Parzyszek ab57c2bad3 [Hexagon] Implement @llvm.readcyclecounter()
llvm-svn: 295892
2017-02-22 22:28:47 +00:00
Matt Arsenault 7b6c5d28f5 AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR
This should avoid reporting any stack needs to be allocated in the
case where no stack is truly used. An unused stack slot is still
left around in other cases where there are real stack objects
but no spilling occurs.

llvm-svn: 295891
2017-02-22 22:23:32 +00:00
Daniel Berlin fccbda967a PredicateInfo: Support switch statements
Summary:
Depends on D29606 and D29682

Makes us pass GVN's edge.ll (we also will pass a few other testcases
they just need cleaning up).

Thoughts on the Predicate* hiearchy of classes especially welcome :)
(it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something).

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29747

llvm-svn: 295889
2017-02-22 22:20:58 +00:00
Daniel Berlin 211a1209a5 Add pair conversion functions to BasicBlockEdge.
llvm-svn: 295888
2017-02-22 22:20:53 +00:00
Daniel Berlin 17e8d0eae2 Move updating functions to MemorySSAUpdater.
Add updater to passes that now need it.
Move around code in MemorySSA to expose needed functions.

Summary: Mostly cleanup

Reviewers: george.burgess.iv

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D30221

llvm-svn: 295887
2017-02-22 22:19:55 +00:00
Matthew Simpson 835b246d7f [LV] Update floating-point induction test checks (NFC)
llvm-svn: 295885
2017-02-22 21:56:02 +00:00
Wei Mi 74d5a90fa6 [LSR] Canonicalize formula and put recursive Reg related with current loop in ScaledReg.
After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs.
Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg.
But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg.
Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant
parts RegA and RegB are not grouped together and cannot be promoted outside of loop.
With this patch, it will ensure lsr.iv to be generated later in the expr:
RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop.

Differential Revision: https://reviews.llvm.org/D26781

llvm-svn: 295884
2017-02-22 21:47:08 +00:00
Krzysztof Parzyszek 3596a81c69 [RDF] Support for partial structural aliases in RegisterAggr
llvm-svn: 295883
2017-02-22 21:42:15 +00:00
Zachary Turner 842972b740 [Support] Re-add the special OSX flags on mmap.
The problem appears to be that these flags can only be used
when mapping a file for read-only, not for readwrite.  So
we do that here.

llvm-svn: 295880
2017-02-22 21:24:06 +00:00
Krzysztof Parzyszek 65971d97b0 [Hexagon] Add intrinsics for masked vector stores
Patch by Harsha Jagasia.

llvm-svn: 295879
2017-02-22 21:23:09 +00:00
Matt Arsenault 93e65ea733 AMDGPU: Don't look at chain users when adjusting writemask
Fixes not adjusting using new intrinsics with chains.

llvm-svn: 295878
2017-02-22 21:16:41 +00:00
Matt Arsenault 707780b420 AMDGPU: Always allocate emergency stack slot at offset 0
This allows us to ensure that 0 is never a valid pointer
to a user object, and ensures that the offset is always legal
without needing a register to access it. This comes at the cost
of usable offsets and wasted stack space.

llvm-svn: 295877
2017-02-22 21:05:25 +00:00
Derek Schuff 8afd6565e7 [WebAssembly] Update llvm-readobj tests for switch to version 0x1
llvm-svn: 295875
2017-02-22 21:01:17 +00:00
Matt Arsenault 61ec6a03ca AMDGPU: Change exp with compr bit printing
llvm-svn: 295873
2017-02-22 20:37:12 +00:00
Wei Ding 6ade56e0a0 Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."
This reverts commit r295867.

llvm-svn: 295871
2017-02-22 20:29:22 +00:00
Dan Gohman 8e6ee2070a [WebAssembly] Update llvm-objdump tests for the new wasm version number.
llvm-svn: 295869
2017-02-22 20:24:16 +00:00
Alexey Bataev b551a81c28 [SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result
Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

llvm-svn: 295868
2017-02-22 20:06:40 +00:00
Wei Ding 4991d3570f AMDGPU : Update TrapCode based on Trap Handler ABI.
Differential Revision: http://reviews.llvm.org/D30232

llvm-svn: 295867
2017-02-22 20:05:06 +00:00
Rafael Espindola f133ccbd8d Move llvm_unreachable out of switch.
This should make gcc happy and still produce a clang warning if we add
another value to the enum.

llvm-svn: 295865
2017-02-22 19:42:14 +00:00
Matthias Braun e8a0f5ef3b Bring back 2>&1 redirection for this test
llvm-svn: 295864
2017-02-22 19:16:33 +00:00
Geoff Berry 6bb79157dd [AArch64] Extend AArch64RedundantCopyElimination to do simple copy propagation.
Summary:
Extend AArch64RedundantCopyElimination to catch cases where the register
that is known to be zero is COPY'd in the predecessor block.  Before
this change, this pass would catch cases like:

      CBZW %W0, <BB#1>
  BB#1:
      %W0 = COPY %WZR // removed

After this change, cases like the one below are also caught:

      %W0 = COPY %W1
      CBZW %W1, <BB#1>
  BB#1:
      %W0 = COPY %WZR // removed

This change results in a 4% increase in static copies removed by this
pass when compiling the llvm test-suite.  It also fixes regressions
caused by doing post-RA copy propagation (a separate change to be put up
for review shortly).

Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30113

llvm-svn: 295863
2017-02-22 19:10:45 +00:00
Matthew Simpson 5ef66ef248 [LV] Add scalar floating-point induction test (NFC)
llvm-svn: 295862
2017-02-22 19:09:38 +00:00
Davide Italiano e122d6885a [ModuleSummaryAnalysis] Don't crash when referencing unnamed globals.
Instead, just be conservative as these are unfrequent enough. Thanks
to Peter Collingbourne for the discussion about this on IRC.

llvm-svn: 295861
2017-02-22 18:53:38 +00:00
Dan Gohman 7ea5adfff4 [WebAssembly] Implement the wasm binary container header.
Also, update the version number to 0x1, which is what engines are now
expecting.

llvm-svn: 295860
2017-02-22 18:50:20 +00:00
Matthias Braun f1141285eb MIRTests: Remove unnecessary 2>&1 redirection
llc mir output goes to stdout nowadays, so the 2>&1 is not necessary
anymore for most tests.

llvm-svn: 295859
2017-02-22 18:47:41 +00:00
Karl-Johan Karlsson 6eaed7aceb [LoopVectorize] Added address space check when analysing interleaved accesses
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.

This is fixing pr31900.

Reviewers: HaoLiu, mssimpso, mkuper

Reviewed By: mssimpso, mkuper

Subscribers: llvm-commits, efriedma, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29717

This reverts r295042 (re-applies r295038) with an additional fix for the
buildbot problem.

llvm-svn: 295858
2017-02-22 18:37:36 +00:00
Dan Gohman 38b42b4a95 [WebAssembly] Define a table of function signatures for runtime library calls.
LLVM CodeGen emits references to external symbols that are never declared in
LLVM IR level, so they have no declared signature. However, WebAssembly requires
all functions be declared with signatures. This patch adds a table for providing
signatures for known runtime libcalls that will be used in subsequent patches to
emit declarations for such functions.

llvm-svn: 295857
2017-02-22 18:34:16 +00:00
Krzysztof Parzyszek ace1b89060 [RDF] Skip undef uses when calculating kill flags
llvm-svn: 295856
2017-02-22 18:29:16 +00:00
Krzysztof Parzyszek ba36b92bef [RDF] Only access block live-ins when tracking liveness
llvm-svn: 295855
2017-02-22 18:27:36 +00:00
Michal Gorny 5ddd2a5bda [Support] Provide linux/magic.h fallback for older kernels
The function for distinguishing local and remote files added in r295768
unconditionally uses linux/magic.h header to provide necessary
filesystem magic numbers. However, in kernel headers predating 2.6.18
the magic numbers are spread throughout multiple include files.
Furthermore, LLVM did not require kernel headers being installed so far.

To increase the portability across different versions of Linux kernel
and different Linux systems, add CMake header checks for linux/magic.h
and -- if it is missing -- the linux/nfs_fs.h and linux/smb.h headers
which contained the numbers previously.

Furthermore, since the numbers are static and the feature does not seem
critical enough to make LLVM require kernel headers at all, add fallback
constants for the case when none of the necessary headers is available.

Differential Revision: https://reviews.llvm.org/D30261

llvm-svn: 295854
2017-02-22 18:09:15 +00:00
Alexey Bataev 1aeb0e73e0 [SLP] Test with extra argument used several times.
llvm-svn: 295853
2017-02-22 17:47:28 +00:00
Dehao Chen 920677a997 Fix an obvious bug in SampleProfileReaderGCC.
Summary: The CallTargetProfile should be added to FProfile to be consistent with other profile readers.

Reviewers: dnovillo, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30233

llvm-svn: 295852
2017-02-22 17:27:21 +00:00
Dan Gohman a63e8eb138 [WebAssembly] Configure codegen to legalize f16 values.
llvm-svn: 295850
2017-02-22 16:28:00 +00:00
Bill Seurer 8e48f416ad [DAGCombiner] revert r295336
r295336 causes a bootstrapped clang to fail for many compilations on
powerpc BE.  See 
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315
for example.

Reverting as per the developer's request.

llvm-svn: 295849
2017-02-22 16:27:33 +00:00
Simon Pilgrim 13cdd57964 [X86][SSE] getTargetConstantBitsFromNode - insert constant bits directly into masks.
Minor optimization, don't create temporary mask APInts that are just going to be OR'd into the accumulate masks - insert directly instead.

llvm-svn: 295848
2017-02-22 15:38:13 +00:00
Simon Pilgrim 3a895c4873 [X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI.
llvm-svn: 295845
2017-02-22 15:04:55 +00:00
Simon Dardis e8bb39205d [Support] XFAIL is_local for mips
is_local can't pass on some our buildbots as some of our buildbots use network
shares for building and testing LLVM.

llvm-svn: 295840
2017-02-22 14:34:45 +00:00
Dmitry Preobrazhensky 4f3b96726b * [AMDGPU][mc][tests] Updated coverage/smoke tests for gfx7 and gfx8; minor test corrections.
NB: several old tests have been corrected because they violated constant bus limitations
llvm-svn: 295834
2017-02-22 13:59:39 +00:00
Simon Pilgrim 3b97067ae8 Fix -Wunused-but-set-variable warning by removing unused 'aggregateIsPacked' checking
llvm-svn: 295830
2017-02-22 13:37:31 +00:00
Benjamin Kramer 5a7e0f8357 [GlobalISel] Fix compiler warnings and make assert assert something.
llvm-svn: 295827
2017-02-22 12:59:47 +00:00
Alexey Bataev 2e0031b371 [SLP] Remove unused initial value from the variable, NFC.
llvm-svn: 295826
2017-02-22 12:57:58 +00:00
Igor Breger f7359d893a [X86][GlobalISel] Initial implementation , select G_ADD gpr, gpr
Summary: Initial implementation for X86InstructionSelector. Handle selection COPY and G_ADD/G_SUB gpr, gpr .

Reviewers: qcolombet, rovka, zvi, ab

Reviewed By: rovka

Subscribers: mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29816

llvm-svn: 295824
2017-02-22 12:25:09 +00:00
Simon Pilgrim 07056a06a0 [X86] Regenerate CSE test with codegen instead of just the instruction count
llvm-svn: 295819
2017-02-22 10:12:46 +00:00
Roger Ferrer Ibanez 56db97d4de [ARM] Fix constant islands pass.
The pass tries to fix a spill of LR that turns out to be unnecessary.
So it removes the tPOP but forgets to remove tPUSH.
This causes the stack be misaligned upon returning the function.

Thus, remove the tPUSH as well in this case.

Differential Revision: https://reviews.llvm.org/D30207

llvm-svn: 295816
2017-02-22 09:06:21 +00:00
Benjamin Kramer 48e9af6ee8 Write to a temporary file in test instead of random file in the test directory.
llvm-svn: 295815
2017-02-22 09:02:27 +00:00
Ayman Musa ceea56c705 [X86] Fix memory operands definition for some instructions.
Change integer memory operands to FP memory operands to some FP instructions.

Differential Revision: https://reviews.llvm.org/D30201

llvm-svn: 295813
2017-02-22 08:06:29 +00:00
Justin Bogner 8281c81413 OptDiag: Add const to some interfaces that don't modify anything. NFC
This needed a const_cast for the dominator tree recalculation in
OptimizationRemarkEmitter, but we do that all over the place already
and it's safe.

llvm-svn: 295812
2017-02-22 07:38:17 +00:00
Javed Absar b672722810 [ARM] Classification Improvements to ARM Sched-Models. NFCI.
This patch adds missing sched classes for Thumb2 instructions.
This has been missing so far, and as a consequence, machine
scheduler models for individual sub-targets have tended to
be larger than they needed to be. These patches should help
write schedulers better and faster in the future
for ARM sub-targets.

Reviewer: Diana Picus
Differential Revision: https://reviews.llvm.org/D29953

llvm-svn: 295811
2017-02-22 07:22:57 +00:00
Craig Topper 56d4022997 [AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available
This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.

I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.

Differential revision: https://reviews.llvm.org/D30186

llvm-svn: 295810
2017-02-22 06:54:18 +00:00
Sanjoy Das 5cd6c5cacf [ValueTracking] Make poison propagation more aggressive
Summary:
Motivation: fix PR31181 without regression (the actual fix is still in
progress).  However, the actual content of PR31181 is not relevant
here.

This change makes poison propagation more aggressive in the following
cases:

 1. poision * Val == poison, for any Val.  In particular, this changes
    existing intentional and documented behavior in these two cases:
     a. Val is 0
     b. Val is 2^k * N
 2. poison << Val == poison, for any Val
 3. getelementptr is poison if any input is poison

I think all of these are justified (and are axiomatically true in the
new poison / undef model):

1a: we need poison * 0 to be poison to allow transforms like these:

  A * (B + C) ==> A * B + A * C

If poison * 0 were 0 then the above transform could not be allowed
since e.g. we could have A = poison, B = 1, C = -1, making the LHS

  poison * (1 + -1) = poison * 0 = 0

and the RHS

  poison * 1 + poison * -1 = poison + poison = poison

1b: we need e.g. poison * 4 to be poison since we want to allow

  A * 4 ==> A + A + A + A

If poison * 4 were a value with all of their bits poison except the
last four; then we'd not be able to do this transform since then if A
were poison the LHS would only be "partially" poison while the RHS
would be "full" poison.

2: Same reasoning as (1b), we'd like have the following kinds
transforms be legal:

  A << 1 ==> A + A

Reviewers: majnemer, efriedma

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D30185

llvm-svn: 295809
2017-02-22 06:52:32 +00:00
Sean Silva 9011aca5f4 Use const-ref in range-loop for to avoid copying pairs of std::string
No reason to create temporaries.

Differential Revision: https://reviews.llvm.org/D29871

Patch by sergio.martins!

llvm-svn: 295807
2017-02-22 06:34:04 +00:00
Dan Gohman 18eafb6c68 [WebAssembly] Add skeleton MC support for the Wasm container format
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.

Differential Revision: https://reviews.llvm.org/D26722

llvm-svn: 295803
2017-02-22 01:23:18 +00:00
Rui Ueyama e67e162654 Fix -Wcovered-switch-default.
llvm-svn: 295799
2017-02-22 01:01:45 +00:00
Matt Arsenault 1f17c66890 AMDGPU: Add cvt.pkrtz intrinsic
Convert llvm.SI.packf16 test uses

llvm-svn: 295797
2017-02-22 00:27:34 +00:00
Michael Kuperstein c2af82b4b7 [LoopUnroll] Enable PGO-based loop peeling by default.
This enables peeling of loops with low dynamic iteration count by default,
when profile information is available.

Differential Revision: https://reviews.llvm.org/D27734

llvm-svn: 295796
2017-02-22 00:27:34 +00:00
Matt Arsenault 3ea06336fc AMDGPU: Remove some uses of llvm.SI.export in tests
Merge some of the old, smaller tests into more complete versions.

llvm-svn: 295792
2017-02-22 00:02:21 +00:00
Matt Arsenault 9417505f7d AMDGPU: Remove llvm.AMDGPU.clamp intrinsic
llvm-svn: 295789
2017-02-21 23:46:04 +00:00
Matt Arsenault 2fdf2a1a18 AMDGPU: Redefine clamp node as clamp 0.0-1.0
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.

Also allow using clamp with f16, and use knowledge
of dx10_clamp.

llvm-svn: 295788
2017-02-21 23:35:48 +00:00
Artem Belevich 29bbdc1c32 [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.

This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
  of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
  which returns an array of flags marking boundaries of vectorized
  load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
  that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011

llvm-svn: 295784
2017-02-21 22:56:05 +00:00
Matt Arsenault 7d6b71db4f AMDGPU: Formatting fixes
llvm-svn: 295783
2017-02-21 22:50:41 +00:00
Matt Arsenault f0a4823b91 DAG: Check if extract_vector_elt is legal or custom
Avoids test regressions in future AMDGPU commits when
more vector types are custom lowered.

llvm-svn: 295782
2017-02-21 22:47:27 +00:00
Evandro Menezes a8d3301ee1 [AArch64, X86] Add statistics for the MacroFusion pass
llvm-svn: 295777
2017-02-21 22:16:13 +00:00
Evandro Menezes b9b7f4b8d3 [AArch64, X86] Guard against both instrs being wild cards
If both instrs are wild cards, the result can be a crash.

llvm-svn: 295776
2017-02-21 22:16:11 +00:00
Evandro Menezes bc9a13db0e [AArch64] Add test case for fusion of literal generation
Add test case from https://reviews.llvm.org/D28698 that was somehow lost in
transit.

llvm-svn: 295775
2017-02-21 22:16:09 +00:00
Evandro Menezes ec330cc283 [AArch64] Add test case for fusion of AES crypto operations
Add test case from https://reviews.llvm.org/D28491 that was somehow lost in
transit.

llvm-svn: 295774
2017-02-21 22:16:06 +00:00
Eugene Zelenko 49e2fc4f5f [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 295773
2017-02-21 22:07:52 +00:00
Zachary Turner e1ca5a294c Try to fix the buildbot on OSX.
Since I'm only seeing failures on OSX, and it's saying
permission denied, I'm suspecting this is due to the addition
of the MAP_RESILIENT_CODESIGN and/or MAP_RESILIENT_MEDIA flags.
Speculatively trying to remove those to get the bots working.

llvm-svn: 295770
2017-02-21 21:31:28 +00:00
Zachary Turner 6bc2dac132 Try to fix Android build.
llvm-svn: 295769
2017-02-21 21:13:10 +00:00
Zachary Turner 392ed9d342 [Support] Add a function to check if a file resides locally.
Differential Revision: https://reviews.llvm.org/D30010

llvm-svn: 295768
2017-02-21 20:55:47 +00:00
Xin Tong ccee0e0c05 Make default value for disable-licm-promotion in licm explicit.
llvm-svn: 295767
2017-02-21 20:53:48 +00:00
Rafael Espindola 23a76be5ad Don't modify archive members unless really needed.
For whatever reason ld64 requires that member headers (not the member
themselves) should be aligned. The only way to do that is to edit the
previous member so that it ends at an aligned boundary.

Since modifying data put in an archive is an undesirable property,
llvm-ar should only do it when it is absolutely necessary.

llvm-svn: 295765
2017-02-21 20:40:54 +00:00
Evgeniy Stepanov 1fd19c6e5d Fix PR31896.
Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset).

llvm-svn: 295762
2017-02-21 20:17:34 +00:00
Zachary Turner 43313b3e89 Try to fix line endings.
llvm-svn: 295759
2017-02-21 19:52:57 +00:00
Sanjay Patel cb731f1538 [InstCombine] canonicalize non-obivous forms of integer min/max
This is part of trying to clean up our handling of min/max patterns in IR.
By converting these to canonical form, we're more likely to recognize them
because there are various places in InstCombine that don't use 
matchSelectPattern or m_SMax and friends.

The backend fixups referenced in the now deleted TODO comment were added with:
https://reviews.llvm.org/rL291392
https://reviews.llvm.org/rL289738

If there's any codegen fallout from this change, we should be able to address
it in DAGCombiner or target-specific lowering. 

llvm-svn: 295758
2017-02-21 19:33:53 +00:00
Matt Arsenault f3ffe75a1b AMDGPU: Remove dead declarations in tests
llvm-svn: 295757
2017-02-21 19:31:33 +00:00
Zachary Turner 3788818730 Remove svn:eol-style property from 2 files.
There are still over 3400 files remaining with this property set, but there are tens of thousands more with the property not set.  Until we decide what to do on a global scale, this at least unblocks me temporarily.

llvm-svn: 295756
2017-02-21 19:29:56 +00:00
Matt Arsenault b2e6811ec1 AMDGPU: Remove dead declarations from MIR tests
llvm-svn: 295755
2017-02-21 19:27:36 +00:00
Matt Arsenault c2a44e4c3c AMDGPU: Remove llvm.AMDGPU.flbit intrinsic
llvm-svn: 295754
2017-02-21 19:27:33 +00:00
Matt Arsenault e0bf7d02f0 AMDGPU: Don't use stack space for SGPR->VGPR spills
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.

I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.

The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.

llvm-svn: 295753
2017-02-21 19:12:08 +00:00
Xin Tong ebfe01c121 [LoopSimplify] Simplify how we compute UniqueExit
Summary: Simplify how we compute UniqueExit. Reuse ExitBlockSet.

Reviewers: sanjoy, efriedma, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30182

llvm-svn: 295751
2017-02-21 19:10:58 +00:00
Xin Tong a05a6c101d More comments for getUniqueExitBlocks. NFCI
llvm-svn: 295750
2017-02-21 19:08:03 +00:00
Adrian Prantl 11b2d7dad8 Teach the IR verifier to reject conflicting debug info for function arguments.
Conflicting debug info for function arguments causes hard-to-debug
assertions in the DWARF backend, so the Verifier should reject it.
For performance reasons this only checks function arguments from
non-inlined debug intrinsics for now.

rdar://problem/30520286

llvm-svn: 295749
2017-02-21 19:03:15 +00:00
Geoff Berry 5d534b6a11 [CodeGenPrepare] Sink and duplicate more 'and' instructions.
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64.  The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used.  One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813

llvm-svn: 295746
2017-02-21 18:53:14 +00:00
Wei Ding 16289cfcfc AMDGPU : AMDGPU : Update AMDGPU Trap Handler ABI.
Differential Revision: http://reviews.llvm.org/D29913

llvm-svn: 295745
2017-02-21 18:48:01 +00:00
Dmitry Preobrazhensky e6e205344e Test commit
llvm-svn: 295740
2017-02-21 18:07:07 +00:00
Simon Pilgrim 8eb515d8c4 [X86] EltsFromConsecutiveLoads SDLoc argument should be const&.
There appears never to have been a time that the reference was updated.

llvm-svn: 295739
2017-02-21 17:42:28 +00:00
Vassil Vassilev 59e5a64435 Do not leak OpenedHandles.
Reviewed by Vedant Kumar (D30178)

llvm-svn: 295737
2017-02-21 17:30:43 +00:00
Simon Pilgrim 5afda30930 [X86][AVX512] Update VPBROADCASTQ test to combine from VPERMQ instead of VPERMI2Q.
VPERMI2Q doesn't have shuffle decoding from re-materializable constants.

llvm-svn: 295736
2017-02-21 17:04:11 +00:00
Simon Pilgrim f321ab6dd2 [X86][AVX] Rename shuffle combine tests to show combined shuffle type. NFCI.
llvm-svn: 295735
2017-02-21 16:45:31 +00:00
John Brawn cfd4f9cfec [ARM] Correct SP/PC handling in t2MOVr
Add a missing test that I forgot to svn add in my previous commit

llvm-svn: 295734
2017-02-21 16:45:04 +00:00
Simon Pilgrim 791955819c [X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets.
As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ.

llvm-svn: 295733
2017-02-21 16:41:44 +00:00
John Brawn a6e95e1652 [ARM] Correct SP/PC handling in t2MOVr
PC isn't allowed in the source operand of t2MOVr, so change the register class
to one without PC. SP handling is slightly trickier and changes depending on if
we're in ARMv8, so do that in checkTargetMatchPredicate.

Differential Revision: https://reviews.llvm.org/D30199

llvm-svn: 295732
2017-02-21 16:41:29 +00:00
Simon Pilgrim f98a32fa7f [X86][AVX2] Add AVX512 test targets to AVX2 shuffle combines.
llvm-svn: 295731
2017-02-21 16:29:28 +00:00
Simon Pilgrim 4cc6dd0cf6 [X86][AVX] Add tests showing missed VPBROADCASTQ folding on 32-bit targets.
As i64 isn't a value type on 32-bit targets, we fail to fold the VZEXT_LOAD into VPBROADCASTQ.

Also shows that we're not decoding VPERMIV3 shuffles very well....

llvm-svn: 295729
2017-02-21 16:05:35 +00:00
Simon Pilgrim 3546156122 [X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL.
This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns.

llvm-svn: 295723
2017-02-21 15:09:00 +00:00
Simon Pilgrim 0c094f504c [X86][SSE] Added SSE41 shuffle combining test file.
Currently just contains one case where we combine to VZEXT_MOVL instead of VZEXT which would avoid the need for a zero vector to be generated

llvm-svn: 295721
2017-02-21 14:51:15 +00:00
Anna Thomas ec36f3b79a [InstCombine] Do not exercise nested max/min pattern on abs
Summary:
This is a fix for assertion failure in
`getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern.

We should not be invoking the simplification rule for
ABS(MIN(~ x,y))) or ABS(MAX(~x,y)) combinations.

Added a test case which would cause an assertion failure without the patch.

Reviewers: sanjoy, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30051

llvm-svn: 295719
2017-02-21 14:40:28 +00:00
Igor Breger 812f319794 [AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index.
Differential Revision: https://reviews.llvm.org/D30189

llvm-svn: 295718
2017-02-21 14:01:25 +00:00
Alexey Bataev 64da79424e [SLP] Tests for shuffle/blending operations.
llvm-svn: 295717
2017-02-21 13:40:55 +00:00
Diana Picus 613b65696a [ARM] GlobalISel: Lower calls to void() functions
For now, we hardcode a BLX instruction, and generate an ADJCALLSTACKDOWN/UP pair
with amount 0.

llvm-svn: 295716
2017-02-21 11:33:59 +00:00
Pavel Labath 52a82e2ec6 tablegen: Fix android build
use llvm::to_string instead of std:: version.

llvm-svn: 295711
2017-02-21 09:19:41 +00:00
Craig Topper fe78d95a49 [X86] Remove ssse3 intrinsic tests from the avx intrinsics test file.
They are all covered by the SSSE3 intrinsics test with SSSE3, AVX, and AVX512 command lines.

llvm-svn: 295708
2017-02-21 08:06:08 +00:00
Craig Topper 55e2de869d [X86] Remove sse4.2 intrinsic tests from the avx intrinsics test file. Fix some other consistency issues.
They are all covered by the SSE4.2 intrinsics test with SSE4.2, AVX, and AVX512 command lines.

Merge sse42.ll into the other intrinsics test. Rename sse42_64.ll to be named like other intrinsic tests.

llvm-svn: 295707
2017-02-21 08:06:05 +00:00
Craig Topper 25191b4ac3 [X86] Remove sse4.1 intrinsic tests from the avx intrinsics test file.
They are all covered by the SSE4.1 intrinsics test with SSE4.1, AVX, and AVX512 command lines.

llvm-svn: 295706
2017-02-21 08:06:02 +00:00
Craig Topper da8e6f1337 [X86] Remove sse3 intrinsic tests from the avx intrinsics test file.
They are all covered by the SSE3 intrinsics test with SSE2, AVX, and AVX512 command lines.

llvm-svn: 295705
2017-02-21 08:05:59 +00:00
Evgeny Stupachenko 9909872e30 The patch introduces new way of narrowing complex (>UINT16 variants) solutions.
The new method introduced under "-lsr-exp-narrow" option (currenlty set to true).

Summary:

The method is based on registers number mathematical expectation and should be
 generally closer to optimal solution.
Please see details in comments to
 "LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function
 (in lib/Transforms/Scalar/LoopStrengthReduce.cpp).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D29862

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 295704
2017-02-21 07:34:40 +00:00
Craig Topper 002549b8be [X86] Remove aes intrinsic tests from the avx intrinsics test file.
They are all covered by the AES intrinsics test with a legacy command line and an AVX command line.

llvm-svn: 295702
2017-02-21 07:32:18 +00:00
Craig Topper 2a71fd95e8 [X86] Add an AVX command line and regenerate AES intrinsics test using the update_llc_test_checks.py
llvm-svn: 295701
2017-02-21 07:32:14 +00:00
Craig Topper dbf6f367e9 [X86] Remove sse2 intrinsic tests from the avx intrinsics test file.
They are all covered by the SSE2 intrinsics test with SSE2, AVX, and AVX512 command lines.

Also remove an unneeded lfence intrinsic test since it was already covered.

llvm-svn: 295700
2017-02-21 07:32:11 +00:00
Craig Topper 0d47fdcf3f [X86] Remove sse1 intrinsic tests from the avx intrinsics test file.
They are all covered by the SSE intrinsics test with SSE, AVX, and AVX512 command lines.

Also remove an unneeded sfence intrinsic test since it was already covered.

llvm-svn: 295699
2017-02-21 07:32:03 +00:00
Craig Topper d88389aa7e [X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs
Summary:
Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency.

This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.

Reviewers: zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30181

llvm-svn: 295697
2017-02-21 06:39:13 +00:00
Craig Topper 16d9730b86 [X86] Fix formatting. NFC
llvm-svn: 295695
2017-02-21 06:27:13 +00:00
Craig Topper d9fe664868 [AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns.
llvm-svn: 295693
2017-02-21 04:26:10 +00:00