Commit Graph

Matt Arsenault d4bb5e4831 AMDGPU: Enable store clustering
Also respect the TII hook for these, as the generic code does,
in case we want a flag to disable this later.

llvm-svn: 287021
2016-11-15 20:22:55 +00:00
Haicheng Wu faee2b71a7 [AArch64] Lower multiplication by a constant int to shl+add+shl
Lower a = b * C where C = (2^n + 1) * 2^m to

add     w0, w0, w0, lsl n
lsl     w0, w0, m
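
As a worked example of the formula above (my numbers, not from the commit): C = 10 = (2^2 + 1) * 2^1, so n = 2 and m = 1:

define i32 @mul10(i32 %b) {
  %r = mul i32 %b, 10    ; (b + (b << 2)) << 1 == 10 * b
  ret i32 %r
}

; expected lowering:
;   add     w0, w0, w0, lsl #2
;   lsl     w0, w0, #1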

Differential Revision: https://reviews.llvm.org/D229245

llvm-svn: 287019
2016-11-15 20:16:48 +00:00
Matt Arsenault 3666629837 AMDGPU: Analyze mubuf with immediate soffset
Fixes the analysis giving up on clustering common addr64 accesses with
a constant 0 soffset.

llvm-svn: 287018
2016-11-15 20:14:27 +00:00
Wei Mi 37c4aaaf52 Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.
llvm-svn: 287014
2016-11-15 19:42:05 +00:00
Stanislav Mekhanoshin ea91cca593 [AMDGPU] Add wave barrier builtin
The wave barrier represents the discardable barrier. Its main purpose is to
carry the convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to the convergence point simultaneously with SIMT, so no special
instruction is needed in the ISA. The barrier is discarded during code generation.
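
A minimal IR sketch of how the builtin surfaces (the intrinsic this change introduces):

declare void @llvm.amdgcn.wave.barrier() #0

define amdgpu_kernel void @kernel() {
  ; carries the convergent attribute and blocks illegal CFG
  ; transforms, but emits no machine instruction
  call void @llvm.amdgcn.wave.barrier()
  ret void
}

attributes #0 = { convergent nounwind }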

Differential Revision: https://reviews.llvm.org/D26585

llvm-svn: 287007
2016-11-15 19:00:15 +00:00
Sanjay Patel 22465125b3 [x86] auto-generate checks; NFC
Also, fix the test params to use an attribute rather than a CPU model
and remove the AVX run because that does nothing but check for a 'v'
prefix in all of these tests.

llvm-svn: 287003
2016-11-15 18:44:53 +00:00
Wei Mi 7ccf7651c0 [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
In RateRegister of the existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outer loop, the formula will be marked as Loser
and dropped.

Suppose we have IR in which %for.body is the outer loop and %for.body2 is the inner loop.
LSR only handles the innermost loop now, so only %for.body2 will be handled.

Using the logic above, a formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what, because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr-type reg related
to the outer loop. Only a formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept,
because the SCEVAddRecExpr related to the outer loop is folded into the initial value of the
SCEVAddRecExpr related to the current loop.

But in some cases, we do need to share the basic induction variable
reg({0,+,1}<%for.body2>) among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

Judging from the existing comment, the check tries to avoid considering multiple loop levels at the
same time. However, the existing LSR only handles the innermost loop, so any SCEVAddRecExpr with a
loop other than the current loop is an invariant and will be simple to handle, and the formula doesn't
have to be dropped.
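
A sketch of the loop nest being discussed, with names chosen to match the formulas above (illustrative, not the commit's test case):

define void @nest(i32* %array, i64 %size, i64 %n, i64 %m) {
entry:
  br label %for.body

for.body:                        ; outer loop
  %i = phi i64 [ 0, %entry ], [ %i.next, %for.end2 ]
  br label %for.body2

for.body2:                       ; inner loop, the one LSR works on
  %j = phi i64 [ 0, %for.body ], [ %j.next, %for.body2 ]
  ; address arithmetic here gives rise to regs such as
  ; reg(%array), reg({1,+, %size}<%for.body>), reg({0,+,1}<%for.body2>)
  %j.next = add i64 %j, 1
  %c2 = icmp ult i64 %j.next, %m
  br i1 %c2, label %for.body2, label %for.end2

for.end2:
  %i.next = add i64 %i, 1
  %c = icmp ult i64 %i.next, %n
  br i1 %c, label %for.body, label %exit

exit:
  ret void
}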

Differential Revision: https://reviews.llvm.org/D26429

llvm-svn: 286999
2016-11-15 18:35:53 +00:00
Pawel Bylica c3f6c97f71 Integer legalization: fix MUL expansion
Summary:
This fixes the runtime results produced by the fallback multiplication expansion introduced in r270720.

For tests I created a fuzz tester that compares the results with Boost.Multiprecision.
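
A minimal reproducer shape for the fallback path, assuming a target where the wide multiply is illegal and must be expanded:

define i128 @mul128(i128 %a, i128 %b) {
  ; expanded by the legalizer into narrower multiplies, adds and carries
  %r = mul i128 %a, %b
  ret i128 %r
}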

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26628

llvm-svn: 286998
2016-11-15 18:29:24 +00:00
Zaara Syeda a19c9e60e9 vector load store with length (left justified) llvm portion
llvm-svn: 286993
2016-11-15 17:54:19 +00:00
Simon Pilgrim ceffb43b1b [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)
This patch helps avoid poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND, which improves the truncation logic.

This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.

Fix for PR13248
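
An illustrative sketch of the signum-style pattern in question (not taken from the commit's tests):

define <8 x float> @signum(<8 x float> %x) {
  %cmp = fcmp ogt <8 x float> %x, zeroinitializer
  ; SINT_TO_FP of a v8i1 boolean vector; the early SIGN_EXTEND lets
  ; this legalize via v8i32 rather than the poor v8i1 -> v8i16 path
  %cvt = sitofp <8 x i1> %cmp to <8 x float>
  ret <8 x float> %cvt
}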

Differential Revision: https://reviews.llvm.org/D26583

llvm-svn: 286979
2016-11-15 16:24:40 +00:00
Tony Jiang 5f850cd1b1 [PowerPC] Implement BE VSX load/store builtins - llvm portion.
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behave exactly the same as vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store (stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after calling the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.

llvm-svn: 286967
2016-11-15 14:25:56 +00:00
Zvi Rackover f0b9b57bd3 [X86][FastISel] Fix lowering of overflow result on AVX512 targets
Summary:
    Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
    We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.

    Fixes pr30981.

    Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet

    Subscribers: qcolombet, llvm-commits

    Differential Revision: https://reviews.llvm.org/D26620

llvm-svn: 286958
2016-11-15 13:29:23 +00:00
Javed Absar f043dac25d [ARM] Add machine scheduler for Cortex-R52
This patch adds the Sched Machine Model for Cortex-R52.

Details of the pipeline and descriptions are in comments
in file ARMScheduleR52.td included in this patch.

Reviewers: rengolin, jmolloy

Differential Revision: https://reviews.llvm.org/D26500

llvm-svn: 286949
2016-11-15 11:34:54 +00:00
Asaf Badouh b573553424 DAGCombiner: fix combine of trunc and select
bugzilla: https://llvm.org/bugs/show_bug.cgi?id=29002 (pr29002)

Differential Revision: https://reviews.llvm.org/D26449

llvm-svn: 286938
2016-11-15 07:55:22 +00:00
Zvi Rackover 76dbf26599 [X86][GlobalISel] Add minimal call lowering support to the IRTranslator
Summary:
    Add basic functionality to support call lowering for X86.
    Currently only supports functions which return void and take zero arguments.
    Inspired by commit 286573.
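
A function shape this initial support can handle is as minimal as it gets:

define void @f() {
  ret void
}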

Reviewers: ab, qcolombet, t.p.northover

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26593

llvm-svn: 286935
2016-11-15 06:34:33 +00:00
Craig Topper 0637099f24 [AVX-512] Add an example test case for PR31018.
llvm-svn: 286934
2016-11-15 05:21:55 +00:00
Matt Arsenault c79dc70d50 AMDGPU: Fix f16 fabs/fneg
llvm-svn: 286931
2016-11-15 02:25:28 +00:00
Matt Arsenault 972034bda9 AMDGPU: Fix formatting of 1/2pi immediate
llvm-svn: 286912
2016-11-15 00:04:33 +00:00
Tom Stellard 9c884e495c MIRParser: Add support for parsing vreg reg alloc hints
Reviewers: qcolombet, MatzeB

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26573

llvm-svn: 286911
2016-11-15 00:03:14 +00:00
Evandro Menezes 9fc54826e0 [AArch64] Compute the Newton series for reciprocals natively
Implement the Newton series for the square root, its reciprocal, and the plain
reciprocal natively, using the specialized instructions in AArch64 to perform each
series iteration.
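
For reference, the standard Newton-Raphson steps being iterated are (my notation, not from the commit): for the reciprocal of a, x_{n+1} = x_n * (2 - a * x_n), and for the reciprocal square root, x_{n+1} = x_n * (3 - a * x_n^2) / 2. AArch64 provides FRECPS and FRSQRTS to compute these correction steps in a single instruction each.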

Differential revision: https://reviews.llvm.org/D26518

llvm-svn: 286907
2016-11-14 23:29:01 +00:00
Tim Northover e33b175411 GlobalISel: add tests for G_ZEXT/G_SEXT to types smaller than 32-bits.
Support was accidentally added in r286407, but there were no tests at the time.

llvm-svn: 286903
2016-11-14 22:50:22 +00:00
Tom Stellard 11e60ff7da RegAllocGreedy: Properly initialize this pass, so that -run-pass will work
Reviewers: qcolombet, MatzeB

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26572

llvm-svn: 286895
2016-11-14 21:50:13 +00:00
Tim Northover 46a6f0fbf0 Recommit: ARM: sort register lists by encoding in push/pop instructions.
For example we were producing

    push {r8, r10, r11, r4, r5, r7, lr}

This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.

Fixed usage of std::sort so that we (hopefully) use instantiations that
actually exist in GCC 4.8.

llvm-svn: 286881
2016-11-14 20:28:24 +00:00
Michael Kuperstein f221f13ccc [X86] Tests exhibiting bad partial reloading behavior. NFC.
llvm-svn: 286878
2016-11-14 19:58:11 +00:00
Geoff Berry 526c50588d [AArch64] Split 0 vector stores into scalar store pairs.
Summary:
Replace a vector store of a zero splat with scalar stores of WZR/XZR.
The load/store optimizer pass will merge them into store-pair instructions.
This should be better than a movi to create the vector zero followed by
a vector store if the zero constant is not re-used, since one
instruction and one register live range will be removed.

For example, the final generated code should be:

  stp xzr, xzr, [x0]

instead of:

  movi v0.2d, #0
  str q0, [x0]

Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26561

llvm-svn: 286875
2016-11-14 19:39:04 +00:00
Tim Northover 1b66f39cf2 Revert "ARM: sort register lists by encoding in push/pop instructions."
This reverts commit 286866. It broke a bot, something to do with exactly which
templates std::sort accepts.

llvm-svn: 286867
2016-11-14 19:05:28 +00:00
Tim Northover e908ea844c ARM: sort register lists by encoding in push/pop instructions.
For example we were producing

    push {r8, r10, r11, r4, r5, r7, lr}

This is misleading (r4, r5 and r7 are actually pushed before the rest), and
other components (stack folding recently) often forget to deal with the extra
complexity coming from the different order, leading to miscompiles. Finally, we
warn about our own code in -no-integrated-as mode without this, which is really
not a good idea.

llvm-svn: 286866
2016-11-14 19:02:17 +00:00
Sean Fertile a435e07de8 [PPC] Add intrinsic mapping to the xscvhpsp instruction
add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to
Single-Precision' instruction.

Differential review: https://reviews.llvm.org/D26536

llvm-svn: 286862
2016-11-14 18:43:59 +00:00
Changpeng Fang 8236fe103f AMDGPU/SI: Support data types other than V4f32 in image intrinsics
Summary:
  Extend image intrinsics to support data types of V1F32 and V2F32.

  TODO: we should define a mapping table to change the opcode for the V2F32 data type
  when just one channel is active, even though such a case should be very rare.

Reviewers:
  tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D26472

llvm-svn: 286860
2016-11-14 18:33:18 +00:00
Zvi Rackover 35bb7fdadc [X86] Adding reproducer for pr30981
llvm-svn: 286855
2016-11-14 18:10:44 +00:00
Sumanth Gundapaneni d428cf8b5f [Hexagon] Remove unsafe load instructions that affect Stack Slot Coloring
The stack slot coloring pass removes a store that is followed by a load
that deals with the same stack slot. The function isLoadFromStackSlot
is supposed to consider only the loads that have no side effects. This
patch fixes the issue by removing the unsafe loads from this function.
E.g.:
%vreg0<def> = L2_loadruh_io <fi#15>, 0
S2_storeri_io <fi#15>, 0, %vreg0

In this case, we load an unsigned-extended half word and store it to
the same stack slot. The stack slot coloring pass considers it safe to
remove the store. This patch marks all the non-vector byte and half word
loads as unsafe.

llvm-svn: 286843
2016-11-14 17:11:00 +00:00
Sean Fertile adda5b2d2b [PPC] add intrinsics for vec extract exp/significand and vec test data class.
Differential Revision: https://reviews.llvm.org/D26272

llvm-svn: 286829
2016-11-14 14:42:37 +00:00
Craig Topper b8596e4d1d [X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions.
-Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
-Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
-Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
-Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.

This should fix at least some of PR28850.

llvm-svn: 286787
2016-11-14 01:53:29 +00:00
Craig Topper 353e59b6d6 [AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects.
llvm-svn: 286786
2016-11-14 01:53:22 +00:00
Sanjay Patel cfcc42bdc2 [ValueTracking] recognize even more variants of smin/smax
Similar to:
https://reviews.llvm.org/rL285499
https://reviews.llvm.org/rL286318

We can't minimally expose this in IR tests because we don't have min/max intrinsics,
but the difference is visible in codegen because SelectionDAGBuilder::visitSelect() 
uses matchSelectPattern().

We're not canonicalizing these patterns in IR (yet), so I don't expect there to be any
regressions as noted here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html

llvm-svn: 286776
2016-11-13 20:04:52 +00:00
Matt Arsenault dc45274d54 AMDGPU: Implement SGPR spilling with scalar stores
This avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

llvm-svn: 286766
2016-11-13 18:20:54 +00:00
Igor Breger e2399f9e0e revert commit r286761, some builds failed on Win platforms
llvm-svn: 286765
2016-11-13 15:48:11 +00:00
Simon Pilgrim 055c09c1c0 [X86][SSE] Add zero lower 32-bits test case for PR30845
llvm-svn: 286764
2016-11-13 15:32:11 +00:00
Simon Pilgrim 8f7c56125e [X86][AVX512] Add masked VPMOVZX test case for PR26762
llvm-svn: 286763
2016-11-13 15:16:43 +00:00
Simon Pilgrim ce59a536f7 [X86][SSE] Add additional test case for PR30845
llvm-svn: 286762
2016-11-13 14:57:52 +00:00
Ayman Musa c09b3769ae [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.
Differential Revision: https://reviews.llvm.org/D26128

llvm-svn: 286761
2016-11-13 14:51:25 +00:00
Ayman Musa 46af8f9c6f [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions.
Differential Revision: https://reviews.llvm.org/D26022

llvm-svn: 286758
2016-11-13 14:29:32 +00:00
Craig Topper 43e97649a1 [AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords.
These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.

llvm-svn: 286754
2016-11-13 07:26:15 +00:00
Konstantin Zhuravlyov f86e4b7266 [AMDGPU] Add f16 support (VI+)
Differential Revision: https://reviews.llvm.org/D25975

llvm-svn: 286753
2016-11-13 07:01:11 +00:00
Craig Topper 706d897d8a [AVX-512] Move masked shift intrinsics tests to the autoupgrade test file. These missed being moved in r286725.
llvm-svn: 286746
2016-11-13 03:42:27 +00:00
Sanjay Patel a1b8c10bf6 [x86] add smin/smax with zero tests
These are vector tests corresponding to the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html

Apart from the lack of min/max matching, the and/andn difference 
shows a lack of DAG-level canonicalization.

llvm-svn: 286737
2016-11-13 00:32:39 +00:00
Simon Pilgrim 6e09afa9d0 [X86][SSE] Add test case for PR30845
llvm-svn: 286734
2016-11-12 23:44:58 +00:00
Craig Topper da6a63db1c [AVX-512] Remove the remaining masked shift by immediate or by single value. Autoupgrade them to recently introduced unmasked versions and a select.
After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shifts consistent between AVX-512 and the legacy intrinsics.

llvm-svn: 286725
2016-11-12 18:04:46 +00:00
Craig Topper 9d25c5e2fa [AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM.
Summary:
This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls, where we already support the sse2 and avx2 intrinsics. We need the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new intrinsics in the frontend.

This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.

Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26333

llvm-svn: 286711
2016-11-12 05:28:24 +00:00
Craig Topper 5cb13062d2 [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ
Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.
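
An illustrative shuffle (my example, assuming AVX-512F and v16i32) that rotates across the full 512-bit register and can now lower to a single VALIGND:

define <16 x i32> @rotate_dwords_right_1(<16 x i32> %a) {
  ; whole-register rotate by one dword; PALIGNR could only
  ; rotate within each 128-bit lane
  %r = shufflevector <16 x i32> %a, <16 x i32> undef,
       <16 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8,
                   i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 0>
  ret <16 x i32> %r
}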

Reviewers: delena, RKSimon

Subscribers: Farhana, llvm-commits

Differential Revision: https://reviews.llvm.org/D26297

llvm-svn: 286709
2016-11-12 05:05:27 +00:00
Tom Stellard b4c8e8e30b AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI
Summary: This fixes a regression caused by r286464.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26570

llvm-svn: 286687
2016-11-12 00:19:11 +00:00
Tom Stellard 9fdbec870c AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopies
Summary:
This pass was assuming that when a PHI instruction defined a register
used by another PHI instruction, the defining instruction would
be legalized before the using instruction.

This assumption was causing the pass to not legalize some PHI nodes
within divergent flow-control.

This fixes a bug that was uncovered by r285762.

Reviewers: nhaehnle, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26303

llvm-svn: 286676
2016-11-11 23:35:42 +00:00
Nemanja Ivanovic ec4b0c360f [PowerPC] Add remaining vector permute builtins in altivec.h - LLVM portion
This patch corresponds to review:
https://reviews.llvm.org/D26480

Adds all the intrinsics used for various permute builtins that will
be added to altivec.h.

llvm-svn: 286638
2016-11-11 21:42:01 +00:00
Chad Rosier 811e76dbcd [AArch64] Add test to show narrow zero store merging is disabled with strict align. NFC.
llvm-svn: 286617
2016-11-11 19:25:48 +00:00
Geoff Berry 25fa4999ff [AArch64] Fix bugs in isel lowering replaceSplatVectorStore.
Summary:
Fix off-by-one indexing error in loop checking that inserted value was a
splat vector.

Add code to check that INSERT_VECTOR_ELT nodes constructing the splat
vector have the expected constant index values.

Reviewers: t.p.northover, jmolloy, mcrosier

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D26409

llvm-svn: 286616
2016-11-11 19:25:20 +00:00
Adrian Prantl 554fd99dd5 Revert "Use private linkage for MergedGlobals variables" on Darwin.
This is a partial revert of r244615 (http://reviews.llvm.org/D11942),
which caused a major regression in debug info quality.

Turning the artificial __MergedGlobal symbols into private symbols
(l__MergedGlobal) means that the linker will not include them in the
symbol table of the final executable. Without a symbol table entry,
dsymutil is not able to process the debug info for any of the
merged globals and thus drops the debug info for all of them.

This patch is enabling the old behavior for all MachO targets while
leaving all other targets unaffected.

rdar://problem/29160481
https://reviews.llvm.org/D26531

llvm-svn: 286607
2016-11-11 17:50:09 +00:00
Nemanja Ivanovic 2efc3cb968 [PowerPC] Add vector conversion builtins to altivec.h - LLVM portion
This patch corresponds to review:
https://reviews.llvm.org/D26307

Adds all the intrinsics used for various conversion builtins that will
be added to altivec.h. These are type conversions between various types of
vectors.

llvm-svn: 286596
2016-11-11 14:41:19 +00:00
Chad Rosier 10c7aaaee9 [AArch64] Enable merging of adjacent zero stores for all subtargets.
This optimization merges adjacent zero stores into a wider store.

e.g.,

strh wzr, [x0]
strh wzr, [x0, #2]
; becomes
str wzr, [x0]

e.g.,

str wzr, [x0]
str wzr, [x0, #4]
; becomes
str xzr, [x0]

Previously, this was only enabled for Kryo and Cortex-A57.

Differential Revision: https://reviews.llvm.org/D26396

llvm-svn: 286592
2016-11-11 14:10:12 +00:00
Ulrich Weigand a0e7325023 [SystemZ] Support CL(G)T instructions
This adds support for the compare logical and trap (memory)
instructions that were added as part of the miscellaneous
instruction extensions feature with zEC12.

llvm-svn: 286587
2016-11-11 12:48:26 +00:00
Ulrich Weigand 92c2c672e5 [SystemZ] Support load-and-zero-rightmost-byte facility
This adds support for the LZRF/LZRG/LLZRGF instructions that were
added on z13, and uses them for code generation where appropriate.

SystemZDAGToDAGISel::tryRISBGZero is updated again to prefer LLZRGF
over RISBG where both would be possible.

llvm-svn: 286586
2016-11-11 12:46:28 +00:00
Ulrich Weigand 5dc7b67c62 [SystemZ] Use LLGT(R) instructions
This adds support for the 31-to-64-bit zero extension instructions
LLGT and LLGTR and uses them for code generation where appropriate.

Since this operation can also be performed via RISBG, we have to
update SystemZDAGToDAGISel::tryRISBGZero so that we prefer LLGT
over RISBG in case both are possible.  The patch includes some
simplification to the tryRISBGZero code; this is not intended
to cause any (further) functional change in codegen.

llvm-svn: 286585
2016-11-11 12:43:51 +00:00
Simon Pilgrim 807f9cf243 [SelectionDAG] Add support for vector demandedelts in BSWAP opcodes
llvm-svn: 286582
2016-11-11 11:51:29 +00:00
Simon Pilgrim 08dedfc589 [X86] Add knownbits vector BSWAP test
In preparation for demandedelts support

llvm-svn: 286579
2016-11-11 11:33:21 +00:00
Simon Pilgrim 813721e98a [SelectionDAG] Add support for vector demandedelts in UREM/SREM opcodes
llvm-svn: 286578
2016-11-11 11:23:43 +00:00
Simon Pilgrim 8bc531d349 [X86] Add knownbits vector UREM/SREM tests
In preparation for demandedelts support

llvm-svn: 286577
2016-11-11 11:11:40 +00:00
Simon Pilgrim 0652227814 [SelectionDAG] Add support for vector demandedelts in UDIV opcodes
llvm-svn: 286576
2016-11-11 10:47:24 +00:00
Simon Pilgrim da1a43e861 [X86] Add knownbits vector UDIV test
In preparation for demandedelts support

llvm-svn: 286575
2016-11-11 10:39:15 +00:00
Diana Picus 22274934f4 [ARM] Add plumbing for GlobalISel
Add GlobalISel skeleton, up to the point where we can select a ret void.

llvm-svn: 286573
2016-11-11 08:27:37 +00:00
Matthias Braun 325cd2c98a ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps()
addSchedBarrierDeps() is supposed to add use operands to the ExitSU
node. The current implementation adds uses for calls/barrier instructions
and the MBB live-outs in all other cases. The use
operands of conditional jump instructions were missed.

Also added code to macrofusion to set the latencies between nodes to
zero to avoid problems with the fusing nodes lingering around in the
pending list now.

Differential Revision: https://reviews.llvm.org/D25140

llvm-svn: 286544
2016-11-11 01:34:21 +00:00
Stanislav Mekhanoshin 6fc8a1cdaa Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies"
This reverts commit r286171, it breaks piglit test fs-discard-exit-2

llvm-svn: 286530
2016-11-11 00:22:34 +00:00
Matthias Braun f29b12dca8 ScheduleDAGInstrs: Ignore dependencies of constant physregs
There is no need to track dependencies for constant physregs, as they
don't change their value no matter in what order you read/write to them.

Differential Revision: https://reviews.llvm.org/D26221

llvm-svn: 286526
2016-11-10 23:46:44 +00:00
Simon Pilgrim 38f0045cb0 [SelectionDAG] Add support for vector demandedelts in ADD/SUB opcodes
llvm-svn: 286516
2016-11-10 22:41:49 +00:00
Justin Lebar ea27ef6969 [LSR] Tweak loop-strength-reduce-crash test. Test-only change.
Run opt instead of llc, and update the comment.

llvm-svn: 286515
2016-11-10 22:37:13 +00:00
Simon Pilgrim a0dee61df3 [X86] Updated knownbits vector ADD/SUB test
In preparation for demandedelts support

llvm-svn: 286513
2016-11-10 22:34:12 +00:00
Simon Pilgrim 8bbfacaf2c [X86] Add knownbits vector ADD test
llvm-svn: 286511
2016-11-10 22:21:04 +00:00
Simon Pilgrim fe3a54371d [SelectionDAG] Add support for splatted vectors in SUB opcode
llvm-svn: 286509
2016-11-10 21:57:42 +00:00
Simon Pilgrim 7e0a4b8fdf [X86] Add knownbits vector SUB test
llvm-svn: 286508
2016-11-10 21:50:23 +00:00
Matthias Braun 9d62c5571b RegisterCoalescer: Ignore interferences for constant physregs
When copying to/from a constant register interferences can be ignored.

Also update the documentation for isConstantPhysReg() to make it more
obvious that this transformation is valid.

Differential Revision: https://reviews.llvm.org/D26106

llvm-svn: 286503
2016-11-10 21:22:47 +00:00
Yaxun Liu d6fbe65040 AMDGPU: Emit runtime metadata as a note element in .note section
Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.

However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).

This patch lets the AMDGPU backend emit runtime metadata as a note element in the .note section.

Differential Revision: https://reviews.llvm.org/D25781

llvm-svn: 286502
2016-11-10 21:18:49 +00:00
Simon Pilgrim d67af68f06 [SelectionDAG] Add support for vector demandedelts in TRUNCATE opcodes
llvm-svn: 286481
2016-11-10 17:43:52 +00:00
Simon Pilgrim e517f0a417 [X86] Add knownbits vector TRUNC test
In preparation for demandedelts support

llvm-svn: 286477
2016-11-10 17:24:33 +00:00
Simon Pilgrim ee187fd6e7 [SelectionDAG] Add support for vector demandedelts in MUL opcodes
llvm-svn: 286471
2016-11-10 16:27:42 +00:00
Asaf Badouh bb2338e939 reproducer for pr29002
https://reviews.llvm.org/D26449

llvm-svn: 286470
2016-11-10 16:27:27 +00:00
Tom Stellard 115a61560e AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 286464
2016-11-10 16:02:37 +00:00
Simon Pilgrim 2cf393c8fe [X86] Add knownbits vector MUL test
In preparation for demandedelts support

llvm-svn: 286463
2016-11-10 15:57:33 +00:00
Simon Pilgrim ca57e53ded [SelectionDAG] Add support for vector demandedelts in SRA opcodes
llvm-svn: 286461
2016-11-10 15:05:09 +00:00
Simon Pilgrim 7be6d99442 [X86] Add knownbits vector arithmetic shift test
In preparation for demandedelts support

llvm-svn: 286457
2016-11-10 14:46:24 +00:00
Simon Pilgrim 37c9034bd6 [DAGCombiner] Correctly extract the ConstOrConstSplat shift value for SHL nodes
We were failing to extract a constant splat shift value if the shifted value was being masked.

The (shl (and (setcc) N01CV) N1CV) -> (and (setcc) N01CV<<N1CV) combine was unnecessarily preventing this.

llvm-svn: 286454
2016-11-10 14:35:09 +00:00
Chad Rosier c16824d217 Remove unnecessary check prefix directives. NFC.
llvm-svn: 286453
2016-11-10 14:28:44 +00:00
Simon Pilgrim 87f38fa85c [DAGCombiner] Show missed opportunity to UNDEF out-of-range SHL
Fails to match constant shift value due to presence of AND mask.

llvm-svn: 286452
2016-11-10 14:19:45 +00:00
Simon Pilgrim 3bf99c056a [SelectionDAG] Add support for vector demandedelts in SHL/SRL opcodes
llvm-svn: 286448
2016-11-10 13:52:42 +00:00
Simon Pilgrim ede8ad7c5a [X86] Add knownbits vector logical shift test
In preparation for demandedelts support

llvm-svn: 286447
2016-11-10 13:34:17 +00:00
Craig Topper bd298c37d1 [AVX-512] Allow legacy cvtpd2dq intrinsics to select EVEX encoded instruction when available.
llvm-svn: 286435
2016-11-10 07:47:17 +00:00
Craig Topper e0845d8e8c [AVX-512][X86] Convert avx_cvtt_ps2dq_256 and sse2_cvttps2dq intrinsics to ISD::FP_TO_SINT in the intrinsics table and delete patterns. While nearby also move CVTDQ2PS patterns into their instructions.
This allows these intrinsics to also use EVEX instructons.

llvm-svn: 286434
2016-11-10 07:24:52 +00:00
Craig Topper f37b9b9b5f [X86] Convert int_x86_avx_cvtt_pd2dq_256 to fp_to_sint using the intrinsics table. Removes extra patterns and allows legacy intrinsic to select EVEX encoded instructions when available.
llvm-svn: 286433
2016-11-10 06:45:39 +00:00
Craig Topper 924c5ec472 [AVX-512] Add test cases to show missed opportunities for using VALIGND/Q to handle shuffles.
llvm-svn: 286425
2016-11-10 03:39:19 +00:00
Peter Collingbourne 32ab3a817d Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.", with a fix for 32-bit x86.
Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions
that take a global address operand.

llvm-svn: 286420
2016-11-09 23:53:43 +00:00
Dylan McKay 0d4778f841 [AVR] Add a selection of CodeGen tests
Summary: This adds all of the CodeGen tests which currently pass.

Reviewers: arsenm, kparzysz

Subscribers: japaric, wdng

Differential Revision: https://reviews.llvm.org/D26388

llvm-svn: 286418
2016-11-09 23:46:52 +00:00
Tim Northover a9105be437 GlobalISel: translate invoke and landingpad instructions
Pretty bare-bones support for exception handling (no weird MSVC stuff, no SjLj
etc), but it should get things going.

llvm-svn: 286407
2016-11-09 22:39:54 +00:00
Krzysztof Parzyszek a540997ce4 [Hexagon] Separate Hexagon subreg indices for different register classes
For pairs of 32-bit registers: isub_lo, isub_hi.
For pairs of vector registers: vsub_lo, vsub_hi.

Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function
  HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg)
that returns the appropriate subreg index for RegClass.

llvm-svn: 286377
2016-11-09 16:19:08 +00:00
Krzysztof Parzyszek 601d7eb11a [Hexagon] Eliminate Insert4 pseudo-instruction, use combines instead
llvm-svn: 286368
2016-11-09 14:16:29 +00:00
Craig Topper f334ac19ad [AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.

If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.

It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.

Differential Revision: https://reviews.llvm.org/D26331

llvm-svn: 286345
2016-11-09 07:48:51 +00:00
Craig Topper 731bf9c5d6 [X86] Lower AVX512 and SSE intrinsics for CVTTPD2DQ to X86ISD::CVTTPD2DQ.
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26330

llvm-svn: 286344
2016-11-09 07:31:32 +00:00
Craig Topper ef1807fb73 [AVX-512] Add more varied alignments to tests for storing the lower 128-bits of a 256 or 512-bit subvector extract.
llvm-svn: 286343
2016-11-09 05:38:47 +00:00
Craig Topper 28e3dfc02b [AVX-512] Use alignedstore256 in patterns that look for stores of the lower 256-bits of a 512-bit vector to use a 256-bit aligned store.
Previously we were only checking for 16 byte alignment instead of 32 byte alignment. Fixes PR30947.

llvm-svn: 286342
2016-11-09 05:31:57 +00:00
Craig Topper abf5041537 [AVX-512] Add test cases to demonstrate PR30947. We accidentally use 32 byte aligned store instructions when the original store was only 16 byte aligned if the store is from the lower bits of a subvector extract.
llvm-svn: 286341
2016-11-09 05:31:53 +00:00
Craig Topper 5c842be9a0 [AVX-512] Make VBMI instruction set enabling imply that the BWI instruction set is also enabled.
Summary:
This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912.

Reviewers: delena, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26322

llvm-svn: 286339
2016-11-09 04:50:48 +00:00
Sanjay Patel e104554412 [ValueTracking] recognize obfuscated variants of umin/umax
The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern
to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs.

If these were written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares.
I.e., running the optimizer pessimizes the codegen for this case without this patch:

define <4 x i32> @umax_vec(<4 x i32> %x) {
  %cmp = icmp ugt <4 x i32> %x, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
  %sel = select <4 x i1> %cmp, <4 x i32> %x, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
  ret <4 x i32> %sel
}

$ ./opt umax.ll -S | ./llc -o - -mattr=avx

vpmaxud LCPI0_0(%rip), %xmm0, %xmm0

$ ./opt -instcombine umax.ll -S | ./llc -o - -mattr=avx

vpxor %xmm1, %xmm1, %xmm1
vpcmpgtd  %xmm0, %xmm1, %xmm1
vmovaps LCPI0_0(%rip), %xmm2    ## xmm2 = [2147483647,2147483647,2147483647,2147483647]
vblendvps %xmm1, %xmm0, %xmm2, %xmm0

Differential Revision: https://reviews.llvm.org/D26096

llvm-svn: 286318
2016-11-09 00:24:44 +00:00
Dan Gohman e81021a5cb [WebAssembly] Convert stackified IMPLICIT_DEF into constant 0.
Since IMPLICIT_DEF instructions are omitted in the output, when the output
of an IMPLICIT_DEF instruction is stackified, the resulting register lacks
an explicit push, leading to a push/pop mismatch. Fix this by converting
such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit
pushes.

llvm-svn: 286274
2016-11-08 19:40:38 +00:00
Tim Northover 5f7dea85c2 GlobalISel: support selecting fpext/fptrunc instructions on AArch64.
llvm-svn: 286253
2016-11-08 17:44:07 +00:00
Anton Korobeynikov 243a4700ce Fix PR27500: on MSP430 the branch destination offset is measured in words, not bytes.
Summary: In addition, the branch instructions will have proper BB destinations, not offsets, like before.

Reviewers: asl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23718

llvm-svn: 286252
2016-11-08 17:19:59 +00:00
Simon Pilgrim bdb3c38157 [X86][SSE] Regenerate test (just adds missing header)
llvm-svn: 286241
2016-11-08 15:42:49 +00:00
Simon Pilgrim 778596bf59 [TargetLowering] Fix undef vector element issue with true/false result handling
Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching.

The comment said we shouldn't handle constant splat vectors with undef elements. But the actual code was returning false if the build vector contained no undef elements.
This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs).

The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed.

Differential Revision: https://reviews.llvm.org/D26031

llvm-svn: 286238
2016-11-08 15:07:01 +00:00
Simon Pilgrim d02c55204b [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
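
The underlying identity (from "Hacker's Delight") that the vector expansion applies lane-wise, sketched here as scalar IR for one i32 (my sketch, not the patch's code):

define i32 @ctlz_via_ctpop(i32 %x) {
  ; smear the highest set bit into all lower positions
  %s1 = lshr i32 %x, 1
  %o1 = or i32 %x, %s1
  %s2 = lshr i32 %o1, 2
  %o2 = or i32 %o1, %s2
  %s4 = lshr i32 %o2, 4
  %o4 = or i32 %o2, %s4
  %s8 = lshr i32 %o4, 8
  %o8 = or i32 %o4, %s8
  %s16 = lshr i32 %o8, 16
  %o16 = or i32 %o8, %s16
  ; the complement now has exactly ctlz(x) set bits
  %not = xor i32 %o16, -1
  %r = call i32 @llvm.ctpop.i32(i32 %not)
  ret i32 %r
}

declare i32 @llvm.ctpop.i32(i32)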

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233
2016-11-08 14:10:28 +00:00
Roger Ferrer Ibanez 80c0f33c29 [AArch64] Fix incorrect CSEL node created
Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64
transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to
"a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have
the same type which can lead to a wrong CSEL node that crashes later
due to nonsensical copies.
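
A sketch of the problematic shape (illustrative), where %a is double but the select operands are float, so substituting %a for 0.0 would be type-incorrect:

define float @csel(double %a, float %x) {
  %cmp = fcmp oeq double %a, 0.000000e+00
  ; "a == 0.0 ? 0.0 : x" -- rewriting the true operand to %a
  ; would mix an f64 value into an f32 select
  %sel = select i1 %cmp, float 0.000000e+00, float %x
  ret float %sel
}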

Differential Revision: https://reviews.llvm.org/D26394

llvm-svn: 286231
2016-11-08 13:34:41 +00:00
Simon Dardis e7cc54058d [mips] Re-enable small data section test.
llvm-svn: 286230
2016-11-08 13:03:45 +00:00
Craig Topper c6a0339fb0 [AVX-512] Add an avx512f without avx512vl command line to vec_fp_to_int.ll and regenerate. This will make a change in a future patch easier to see. NFC
llvm-svn: 286216
2016-11-08 06:58:53 +00:00
Tim Northover 9ac0eba672 GlobalISel: support selecting G_SELECT on AArch64.
llvm-svn: 286185
2016-11-08 00:45:29 +00:00
Tim Northover 7d88da6a46 GlobalISel: constrain PHI registers on AArch64.
Self-referencing PHI nodes need their destination operands to be constrained
because nothing else is likely to do so. For now we just pick a register class
naively.

Patch mostly by Ahmed again.

llvm-svn: 286183
2016-11-08 00:34:06 +00:00
Chad Rosier 583a307e17 [AArch64] Remove dead check prefixes after r286110. NFC.
llvm-svn: 286174
2016-11-07 23:13:59 +00:00
Chad Rosier d8447a7d30 [AArch64] Rename test to reflect changes after r286110. NFC.
llvm-svn: 286173
2016-11-07 23:13:55 +00:00
Stanislav Mekhanoshin 92e01ee90b [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user if we have only one register
for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions.
Changed the BE to report that we have many condition registers. That way the IR
LICM pass will hoist an invariant comparison out of a loop and codegen prepare
will not sink it.

With that done, a condition is calculated in one block and used in another.
The current behavior is to store a workitem's condition in a VGPR using v_cndmask
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue, forward propagation of a v_cmp 64-bit result
to a user is implemented. An additional side effect of this is that we may
consume fewer VGPRs at the cost of more SGPRs in case holding multiple
conditions is needed, and that is a clear win in most cases.

llvm-svn: 286171
2016-11-07 23:04:50 +00:00
Sanjin Sijaric 6f020d91a1 [AArch64] Transfer memory operands when lowering vector load/store intrinsics
Summary:
Some vector loads and stores generated from AArch64 intrinsics alias each other
unnecessarily, preventing better scheduling.  We just need to transfer memory
operands during lowering.

Reviewers: mcrosier, t.p.northover, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26313

llvm-svn: 286168
2016-11-07 22:39:02 +00:00
Derek Schuff 0d41b7b3f3 [WebAssembly] Emit a BasePointer when we have overly-aligned stack objects
Because we shift the stack pointer by an unknown amount, we need an
additional pointer. In the case where we have variable-size objects
as well, we can't reuse the frame pointer, thus three pointers.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D26263

llvm-svn: 286160
2016-11-07 22:00:48 +00:00
Matt Arsenault f530e8b3f0 AMDGPU: Remove unnecessary and on conditional branch
The comment explaining why this was necessary is incorrect
in its description of v_cmp's behavior for inactive workitems.

llvm-svn: 286134
2016-11-07 19:09:33 +00:00
Matt Arsenault 52f14ec596 AMDGPU: Preserve vcc undef flags when inverting branch
If the branch was on a read-undef of vcc, passes that used
analyzeBranch to invert the branch condition wouldn't preserve
the undef flag resulting in a verifier error.

Fixes verifier failures in a future commit.

Also fix verifier error when inserting copy for vccz
corruption bug.

llvm-svn: 286133
2016-11-07 19:09:27 +00:00
Richard Smith 857efb0880 Add -O0 support for @llvm.invariant.group.barrier by discarding it if it gets to ISel.
Differential Revision: https://reviews.llvm.org/D26292

llvm-svn: 286119
2016-11-07 16:47:20 +00:00
Amara Emerson 614b44bbe9 This patch adds support for 16 bit floating point registers to the inline asm register selection on AArch64.
Without this patch, register allocation for the example below fails.

define half @test(half %a1, half %a2) #0 {
entry:
  %0 = tail call half asm "sqrshl ${0:h}, ${1:h}, ${2:h}", "=w,w,w" (half %a1, half %a2) #1
  ret half %0
}

Patch by Florian Hahn.

Differential Revision: https://reviews.llvm.org/D25080

llvm-svn: 286111
2016-11-07 15:42:12 +00:00
Chad Rosier d6daac4746 [AArch64] Removed the narrow load merging code in the ld/st optimizer.
This feature has been disabled for some time now, so remove cruft.

Differential Revision: https://reviews.llvm.org/D26248

llvm-svn: 286110
2016-11-07 15:27:22 +00:00
James Molloy b03e0879fc [Thumb1] Move padding earlier when synthesizing TBBs off of the PC
When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding.

If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.

In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4.

llvm-svn: 286107
2016-11-07 13:38:21 +00:00
Simon Pilgrim b56c731f18 [X86][AVX512] Add AVX512VL/AVX512BWVL vector truncation tests
llvm-svn: 286105
2016-11-07 13:34:29 +00:00
Simon Pilgrim 02666ac9c3 [X86][SSE] Drop unnecessary -mcpu argument from trunc tests
cpu/triple duplication

llvm-svn: 286104
2016-11-07 13:28:20 +00:00
Craig Topper b110e04851 [AVX-512] Remove masked pmovzx/pmovsx builtins and autoupgrade them to selects and native zext/sext.
This mostly reuses earlier autoupgrade support for the sse and avx equivalents. Just needed to add the code to add the select.

llvm-svn: 286092
2016-11-07 02:12:57 +00:00
Craig Topper 7e545335d6 [AVX-512] Remove 128/256 masked pshufb intrinsics. Autoupgrade them to legacy intrinsics and a select.
llvm-svn: 286089
2016-11-07 00:13:39 +00:00
Saleem Abdulrasool 804e12eeb5 ARM: lower fpowi appropriately for Windows ARM
This handles the last case of the builtin function calls that we would
generate code which differed from Microsoft's ABI.  Rather than
generating a call to `__pow{d,s}i2` we now promote the parameter to a
float or double and invoke `powf` or `pow` instead.

Addresses PR30825!

llvm-svn: 286082
2016-11-06 19:46:54 +00:00
Simon Pilgrim 39df78e384 [SelectionDAG] Add support for vector demandedelts in XOR opcodes
llvm-svn: 286075
2016-11-06 16:49:19 +00:00
Simon Pilgrim 3ac353cb51 [X86] Add knownbits vector xor test
In preparation for demandedelts support

llvm-svn: 286074
2016-11-06 16:36:29 +00:00
Craig Topper 46de41330c [AVX-512] Remove intrinsics for 128/256-bit masked variable shift. Instead upgrade them to a select and the older AVX2 intrinsic.
llvm-svn: 286073
2016-11-06 16:29:19 +00:00
Craig Topper af9b3fe752 [AVX-512] Remove intrinsics for 128/256-bit masked shift by immediate. Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
llvm-svn: 286072
2016-11-06 16:29:14 +00:00
Simon Pilgrim dd4809a603 [SelectionDAG] Add support for vector demandedelts in OR opcodes
llvm-svn: 286071
2016-11-06 16:29:09 +00:00
Craig Topper c9467ed31e [AVX-512] Remove intrinsics for 128/256-bit masked shift by single element in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
llvm-svn: 286070
2016-11-06 16:29:08 +00:00
Craig Topper 1b468b4e3a [AVX-512] Remove a 512-bit test cases from the avx512vl test file. It already exists in the avx512f test file.
llvm-svn: 286069
2016-11-06 16:29:03 +00:00
Simon Pilgrim c104185580 [X86] Add knownbits vector or test
In preparation for demandedelts support

llvm-svn: 286068
2016-11-06 16:05:59 +00:00
Craig Topper 6b3e7b47d8 [X86] Add a few more fptoui test cases to the vec_fp_to_int.ll. The codegen for these test cases will be improved for AVX512 in a future commit.
llvm-svn: 286063
2016-11-06 07:50:25 +00:00
Craig Topper 5471fc29e4 [AVX-512] Add missing EVEX version of pattern for (v2f64 (extloadv2f32 addr:)) -> VCVTPS2PDZ128rm
llvm-svn: 286059
2016-11-06 04:12:52 +00:00
Craig Topper bd156195b0 [AVX-512] Add avx512vl command line to the fpext test and add -show-mc-encoding to show where we aren't using EVEX instructions.
llvm-svn: 286058
2016-11-06 04:12:49 +00:00
Craig Topper 1162857ec4 [AVX-512] Lower AVX cvtpd2ps intrinsic to ISD::FP_ROUND so it can use EVEX instruction when available.
llvm-svn: 286057
2016-11-06 04:12:46 +00:00
Craig Topper 9a4a3af5dd [AVX-512] Lower SSE/AVX cvtdq2ps intrinsics directly to ISD::SINT_TO_FP so they can use EVEX instructions when available.
llvm-svn: 286056
2016-11-06 04:12:42 +00:00
Craig Topper a4a51f1afe [AVX-512] Add -show-mc-encoding to legacy vector intrinsic tests so we can see when VEX or EVEX encoded instructions are being emitted. Make sure the tests all have an avx2 command line and an skx command line.
llvm-svn: 286055
2016-11-06 02:03:58 +00:00
Justin Lebar 54b0be048e [LoopStrengthReduce] Don't use a DenseSet<int64_t> when we might add any valid int64_t to the set.
Summary:
SmallSetVector uses DenseSet, but that means we need to reserve some
values for the empty and tombstone keys.

It seems to me we should have a general way to let us store full-range
ints inside of DenseSets, and furthermore that we probably shouldn't
silently let you add ints into DenseSets without explicitly promising
that they're in range.  But that's a battle for another day; for now,
just fix this code, since we currently do something Very Bad when
compiling ffmpeg.

Fixes PR30914.

Reviewers: jeremyhu

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D26323

llvm-svn: 286038
2016-11-05 16:47:25 +00:00
Krzysztof Parzyszek b7eb7fc892 [Hexagon] Account for <def,read-undef> when validating moves for predication
llvm-svn: 286009
2016-11-04 20:41:03 +00:00
Zvi Rackover 85bc64c734 [X86] Broadcast from memory instructions aren't unfoldable
Broadcast from memory instructions should be treated as moves. They can't be unfolded.

Fixes pr30693.

llvm-svn: 285998
2016-11-04 15:15:19 +00:00
Zvi Rackover 1522b33195 Add bugpoint-reduced reproducer for pr30693
llvm-svn: 285997
2016-11-04 14:53:22 +00:00
Tom Stellard 2d2d33f1dc Revert "AMDGPU: Add VI i16 support"
This reverts commit r285939 and r285948.  These broke some conformance tests.

llvm-svn: 285995
2016-11-04 13:06:34 +00:00
Weiming Zhao 962eaaea9c [Cortex-M0] Atomic lowering
Summary: ARMv6m supports dmb and other fence instructions, but not ldrex/strex etc. So for some atomic loads/stores, LLVM should inline instructions instead of lowering to __sync_ calls.
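
An illustrative case: under this change, an acquire load on ARMv6-M can be emitted inline as a plain load plus dmb instead of a __sync_* libcall:

define i32 @load_acquire(i32* %p) {
  %v = load atomic i32, i32* %p acquire, align 4
  ret i32 %v
}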

Reviewers: rengolin, efriedma, t.p.northover, jmolloy

Subscribers: efriedma, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D26120

llvm-svn: 285969
2016-11-03 21:49:08 +00:00
Tom Stellard 2b3379cdff AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 285939
2016-11-03 17:13:50 +00:00
Alexander Timofeev f867a40bf6 [AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.
This change explores the fact that LDS reads may be reordered even if they access
the same location.

Prior to the change, the algorithm immediately stopped as soon as any memory
access was encountered between loads that are expected to be merged
together, although a read-after-read conflict cannot affect execution
correctness.

Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Improvement is also expected on any massive sequences of reads from LDS.

Differential Revision: https://reviews.llvm.org/D25944

llvm-svn: 285919
2016-11-03 14:37:13 +00:00
James Molloy e7d97368f2 Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"
This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 .

llvm-svn: 285912
2016-11-03 14:08:01 +00:00
James Molloy b60d8b1987 [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.

For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
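
For reference, a sketch of the IR shape that reaches ISel as (CMPZ (AND x, #bitmask), #0); with mask 255 the bitmask touches the LSB, so case 1 above applies and a single LSLS suffices:

define i1 @mask_is_zero(i32 %x) {
  %a = and i32 %x, 255     ; contiguous bitmask touching the LSB
  %c = icmp eq i32 %a, 0   ; compare against zero
  ret i1 %c
}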

llvm-svn: 285893
2016-11-03 10:18:20 +00:00
Craig Topper 7b9cc1474f [AVX-512] Use 'vnot' instead of 'not' in patterns involving vXi1 vectors.
This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR.

Reviewers: delena, igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26134

llvm-svn: 285878
2016-11-03 06:04:28 +00:00
Elena Demikhovsky caaceef4b3 Expandload and Compressstore intrinsics
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.
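
A sketch of the new intrinsics as they appear in IR (signatures as I read them from this change; treat as illustrative):

define <8 x i32> @expand(i32* %p, <8 x i1> %mask, <8 x i32> %passthru) {
  ; reads the active elements contiguously from %p and expands
  ; them into the enabled lanes of the result
  %v = call <8 x i32> @llvm.masked.expandload.v8i32(i32* %p, <8 x i1> %mask, <8 x i32> %passthru)
  ret <8 x i32> %v
}

define void @compress(<8 x i32> %v, i32* %p, <8 x i1> %mask) {
  ; packs the enabled lanes of %v and stores them contiguously to %p
  call void @llvm.masked.compressstore.v8i32(<8 x i32> %v, i32* %p, <8 x i1> %mask)
  ret void
}

declare <8 x i32> @llvm.masked.expandload.v8i32(i32*, <8 x i1>, <8 x i32>)
declare void @llvm.masked.compressstore.v8i32(<8 x i32>, i32*, <8 x i1>)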

llvm-svn: 285876
2016-11-03 03:23:55 +00:00
Krzysztof Parzyszek ead77016d8 [Hexagon] Remove registers coalesced in expand-condsets from live intervals
llvm-svn: 285846
2016-11-02 17:59:54 +00:00
Matt Arsenault bf9ee26aea AMDGPU: Cleanup some xfailed tests
Some of these are already fixed or tested somewhere else.

llvm-svn: 285840
2016-11-02 17:24:54 +00:00
Nicolai Haehnle 368972c3b3 AMDGPU: Allow additional implicit operands on MOVRELS instructions
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.

Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25656

llvm-svn: 285835
2016-11-02 17:03:11 +00:00
Matt Arsenault 44deb7914e BranchRelaxation: Fix computing indirect branch block size
llvm-svn: 285828
2016-11-02 16:18:29 +00:00
Matt Arsenault 663ab8c119 AMDGPU: Use brev for materializing SGPR constants
This is already done with VGPR immediates and saves 4 bytes.

llvm-svn: 285765
2016-11-01 23:14:20 +00:00
Matt Arsenault 3d463193a9 AMDGPU: Default to using scalar mov to materialize immediate
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.

Also start verifying the register classes of inlineasm operands.

llvm-svn: 285762
2016-11-01 22:55:07 +00:00
Konstantin Zhuravlyov d971a1123f [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32
This will prevent following regression when enabling i16 support (D18049):

test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll

Differential Revision: https://reviews.llvm.org/D25802

llvm-svn: 285716
2016-11-01 17:49:33 +00:00
Tom Stellard 94c21bc088 AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.

The only way to make this work in a target-independent way would be to add logic
to each target's TargetLowering construction to mark these nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot.  I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.

Reviewers: bogner, ab, t.p.northover, arsenm

Subscribers: wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25999

llvm-svn: 285704
2016-11-01 16:31:48 +00:00
Sjoerd Meijer b187f5d988 This is a 1-character fix for an ARM build attribute test (r284571): the
purpose of the test was to have 2 different function attribute sets, but due
to a typo there was only one; both had number #0.
llvm-svn: 285701
2016-11-01 15:59:37 +00:00
Chris Dewhurst 30de1de144 [Sparc][LEON] Test for FixFDIVSQRT erratum fix.
Note: The test is per the differential review, but the other changed code in the review was for an optimisation that didn't quite work. Nevertheless, the test is valid for the unoptimised version of the fix.

Differential Review: https://reviews.llvm.org/D24658

llvm-svn: 285692
2016-11-01 14:23:37 +00:00
James Molloy 70a3d6df52 [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]

The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:
Before: lsls r0, r0, #2    After: add  r0, pc
        adr  r1, .LJTI0_0         ldrb r0, [r0, #6]
        ldr  r0, [r0, r1]         lsls r0, r0, #1
        mov  pc, r0               add  pc, r0
  => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

Before: lsls r0, r4, #2    After: lsls r4, r4, #1
        adr  r1, .LJTI0_0         add  r4, pc
        ldr  r0, [r0, r1]         ldrh r4, [r4, #6]
        mov  pc, r0               lsls r4, r4, #1
                                  add  pc, r4
  => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

llvm-svn: 285690
2016-11-01 13:37:41 +00:00
Valery Pykhtin 8a89d3662a [AMDGPU] Expand vector mulhu/mulhs
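
mulhu/mulhs produce the high half of a widened multiply; the vector expansion
applies this per lane. A scalar LLVM IR sketch of the unsigned semantics
(illustrative only, not the code the backend emits):

  define i32 @mulhu32(i32 %a, i32 %b) {
    %za = zext i32 %a to i64
    %zb = zext i32 %b to i64
    %p  = mul i64 %za, %zb     ; full 64-bit product
    %hi = lshr i64 %p, 32      ; keep the high 32 bits
    %r  = trunc i64 %hi to i32
    ret i32 %r
  }
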
Differential Revision: https://reviews.llvm.org/D26077

llvm-svn: 285684
2016-11-01 10:26:48 +00:00
Nemanja Ivanovic e70fa63390 [PowerPC] Implement vector shift builtins - llvm portion
This patch corresponds to review https://reviews.llvm.org/D26095.
Committing on behalf of Tony Jiang.

llvm-svn: 285681
2016-11-01 09:42:32 +00:00
Sanjay Patel 70c5f02d25 [DAG] disable nsw/nuw for add/sub/mul when simplifying based on demanded bits (PR30841)
This bug was exposed by using nsw/nuw for more aggressive folds in:
https://reviews.llvm.org/rL284844

The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(),
but we can't just flip flag bits in the DAG; we have to create a new node that has the
bits cleared.
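
As an illustrative sketch (not a test from the patch) of why the flags have
to be dropped:

  ; Assume the surrounding code proves the top bit of %x is clear, so the
  ; add cannot wrap and nuw is valid for the original operands.
  %a = add nuw i32 %x, %x
  %r = and i32 %a, 65535   ; only the low 16 bits are demanded
  ; If demanded-bits simplification substitutes a simpler operand into the
  ; add, the original no-wrap proof may no longer hold, so the replacement
  ; node must be built without nsw/nuw.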

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30841 

llvm-svn: 285656
2016-10-31 23:28:45 +00:00
Saleem Abdulrasool e1aa782bd0 CodeGen: further loosen -O0 CG for WoA division
Generate the slowest possible codepath for noopt CodeGen.  Even trying to be
clever with the negated jump can cause out-of-range jumps.  Use a wide branch
instead. Although the code is modelled simplistically, later optimizations
would recombine the branching into `cbz` where possible.  This re-enables the
previous optimization and hopefully gives us working code in all cases.

Addresses PR30356!

llvm-svn: 285649
2016-10-31 22:12:37 +00:00
Justin Lebar ed1e312f05 [NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass.  We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.

Reviewers: tra

Subscribers: llvm-commits, jholewinski, jingyue, mgorny

Differential Revision: https://reviews.llvm.org/D26165

llvm-svn: 285642
2016-10-31 21:51:42 +00:00
Nemanja Ivanovic 60bdfe5a7c [PPC] add absolute difference altivec instructions and matching intrinsics
This patch corresponds to review https://reviews.llvm.org/D26072.
Committing on behalf of Sean Fertile.

llvm-svn: 285627
2016-10-31 19:47:52 +00:00
Tim Northover 037af52c8b GlobalISel: allow truncating pointer casts on AArch64.
llvm-svn: 285615
2016-10-31 18:31:09 +00:00
Tim Northover cdf23f1d93 GlobalISel: translate stack protector intrinsics
llvm-svn: 285614
2016-10-31 18:30:59 +00:00
Krzysztof Parzyszek 22586dcb2a [Hexagon] Don't expand mux instructions with both sources identical
llvm-svn: 285588
2016-10-31 15:45:09 +00:00
Manuel Klimek 7c41f20a04 Add triple to test so it does not fail on Windows.
llvm-svn: 285560
2016-10-31 11:40:14 +00:00
Manuel Klimek bab67d2af4 Delete .s file that did not test anything, and check in test that works.
In D26098, Davide Italiano submitted a .s file instead of the .ll file
that was the last stage of the review.

llvm-svn: 285559
2016-10-31 11:18:39 +00:00
Craig Topper d4e580705d [AVX-512] Add missing patterns for selecting masked vector extracts that started from shuffles.
llvm-svn: 285546
2016-10-31 05:55:57 +00:00
Sanjay Patel 339a51ac13 [DAG] x | x --> x
llvm-svn: 285522
2016-10-30 18:19:35 +00:00
Sanjay Patel 13aee345ca [DAG] x & x --> x
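
Together with the matching `or` fold above, this mirrors the long-standing IR
identities (illustrative):

  %a = and i32 %x, %x   ; --> %x
  %o = or  i32 %x, %x   ; --> %x
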
llvm-svn: 285521
2016-10-30 18:13:30 +00:00
Sanjay Patel 8a5f9810a0 [x86] add tests for basic logic op folds
llvm-svn: 285520
2016-10-30 18:04:19 +00:00
Sanjay Patel 36eeb6d6f6 [ValueTracking] recognize more variants of smin/smax
Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202. 
There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding
smax pattern and commuted variants.

The codegen tests demonstrate the effect of ValueTracking on the backend via SelectionDAGBuilder. We
can't expose these differences minimally in IR because we don't have smin/smax intrinsics for IR.
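
For a flavor of the patterns involved (illustrative variants, not the exact
tests): both of these selects compute smin(%a, 0), but only the first is the
obvious form.

  %c1 = icmp slt i32 %a, 0
  %m1 = select i1 %c1, i32 %a, i32 0   ; canonical smin(%a, 0)

  %c2 = icmp sgt i32 %a, -1
  %m2 = select i1 %c2, i32 0, i32 %a   ; commuted variant, same smin(%a, 0)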

Differential Revision: https://reviews.llvm.org/D26091

llvm-svn: 285499
2016-10-29 16:21:19 +00:00
Sanjay Patel e9fa95e572 [x86] add tests for smin/smax matchSelPattern (D26091)
llvm-svn: 285498
2016-10-29 16:02:57 +00:00
Simon Pilgrim 75a697a17e [DAGCombiner] (REAPPLIED) Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, even when we may only be interested in some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

This approach was found to be easier than trying to add a full per-element known-bits solution, while offering similar usefulness for the combines where computeKnownBits is typically used.
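
A motivating example (illustrative IR, not from the patch): when only one lane
is demanded, lanes with unknown bits need not pessimize the result.

  define i32 @lane0_known(i32 %y) {
    %v = insertelement <2 x i32> <i32 42, i32 undef>, i32 %y, i32 1
    %e = extractelement <2 x i32> %v, i32 0   ; only lane 0 is demanded
    %r = and i32 %e, 63                       ; lane 0 is known to be 42
    ret i32 %r
  }

With DemandedElts restricted to lane 0, computeKnownBits can report %e as fully
known rather than intersecting its bits with the unknown %y in lane 1.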

I've only added support for a few opcodes so far (the ones that have proven straightforward to test); all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

It looked like this had caused compile-time regressions on some buildbots (and it was reverted in rL285381), but it appears to have just been a harmless bystander!

Differential Revision: https://reviews.llvm.org/D25691

llvm-svn: 285494
2016-10-29 11:29:39 +00:00
Elena Demikhovsky 519b4ccd70 Fixed FMA + FNEG combine.
The masked form of FMA should be excluded from this optimization.
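
The underlying scalar identity is fneg(fma(a, b, c)) == fma(-a, b, -c); a
minimal LLVM IR illustration:

  define float @fneg_fma(float %a, float %b, float %c) {
    %f = call float @llvm.fma.f32(float %a, float %b, float %c)
    %n = fsub float -0.000000e+00, %f   ; fneg of the fma result
    ret float %n                        ; equal to fma(-a, b, -c)
  }
  declare float @llvm.fma.f32(float, float, float)

With a masked FMA, masked-off lanes pass a source operand through unmodified,
so folding the negation into the operands would change those lanes; hence the
masked form must be skipped.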

Differential Revision: https://reviews.llvm.org/D25984

llvm-svn: 285492
2016-10-29 08:44:46 +00:00
Matt Arsenault c88ba36eab AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed.

llvm-svn: 285490
2016-10-29 04:05:06 +00:00
Davide Italiano 86168b23cf [DAGCombiner] Fix a crash visiting `AND` nodes.
Instead of asserting that the shift count is != 0, we just bail out,
as it's not profitable to try to optimize a node which will be
removed anyway.
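
A degenerate input of the sort involved (illustrative):

  %s = lshr i32 %x, 0    ; shift by zero; this node will be folded away anyway
  %a = and i32 %s, 255   ; the AND combine now bails instead of asserting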

Differential Revision:  https://reviews.llvm.org/D26098

llvm-svn: 285480
2016-10-28 23:55:32 +00:00
Tom Stellard 6695ba0440 AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions
Summary:
Flat instructions can return out of order, so we always need to wait
for all outstanding flat operations.

Reviewers: tony-tye, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D25998

llvm-svn: 285479
2016-10-28 23:53:48 +00:00
Matt Arsenault 7b6475568d AMDGPU: Add definitions for scalar store instructions
Also add the glc bit to the scalar loads, since it exists on VI
and changes the caching behavior.

This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI, which do not have it.

llvm-svn: 285463
2016-10-28 21:55:15 +00:00
Justin Lebar f0a80ba385 [NVPTX] Compute 'rem' using the result of 'div', if possible.
Summary:
In isel, transform

  Num % Den

into

  Num - (Num / Den) * Den

if the result of Num / Den is already available.
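
In LLVM IR terms, the effect is (illustrative):

  ; Before: two separate operations.
  %q = sdiv i32 %num, %den
  %r = srem i32 %num, %den

  ; After (conceptually): the remainder reuses the existing quotient.
  %q = sdiv i32 %num, %den
  %t = mul i32 %q, %den
  %r = sub i32 %num, %t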

Reviewers: tra

Subscribers: hfinkel, llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26090

llvm-svn: 285461
2016-10-28 21:44:00 +00:00
Matt Arsenault b5f2bb1a88 AMDGPU: Change check prefix in test
llvm-svn: 285449
2016-10-28 20:33:01 +00:00
Matt Arsenault 4eae301995 AMDGPU: Diagnose using too many SGPRs
This is possible when using inline asm.

llvm-svn: 285447
2016-10-28 20:31:47 +00:00
Krzysztof Parzyszek 2717175c99 Handle non-~0 lane masks on live-in registers in LivePhysRegs
When LivePhysRegs adds live-in registers, it recognizes ~0 as a special
lane mask indicating the entire register. If the lane mask is not ~0,
it will only add the subregisters that overlap the specified lane mask.

The problem is that if a live-in register does not have subregisters,
and the lane mask is not ~0, it will not be added to the live set.
(The given lane mask may simply be the lane mask of its register class.)

If a register does not have subregisters, add it to the live set if
the lane mask is non-zero.

Differential Revision: https://reviews.llvm.org/D26094

llvm-svn: 285440
2016-10-28 20:06:37 +00:00
Matt Arsenault 08906a3c62 AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number of reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.

llvm-svn: 285435
2016-10-28 19:43:31 +00:00