Commit Graph

17909 Commits

Craig Topper e7f2611160 [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table.
llvm-svn: 282687
2016-09-29 05:54:39 +00:00
Craig Topper 7da0465062 [X86] Add 512-bit VPBROADCASTB and VPBROADCASTW tests.
llvm-svn: 282685
2016-09-29 05:54:32 +00:00
Craig Topper 816a1d7783 [X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables.
llvm-svn: 282684
2016-09-29 05:54:28 +00:00
Matt Arsenault e6740754f0 AMDGPU: Partially fix control flow at -O0
Fixes to make spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Krzysztof Parzyszek dcb1bcae0b IfConversion: Add implicit uses for redefined regs with live subregisters
Normally, if conversion would add implicit uses for redefined registers,
e.g. R0<def> = add_if ..., R0<imp-use>. However, if only subregisters of
R0 are known to be live but not R0 itself, such implicit uses will not be
added, causing prior definitions of such subregisters and R0 itself to
become dead.

llvm-svn: 282626
2016-09-28 20:07:41 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Simon Pilgrim fea5c7a051 [X86][AVX] Add test showing that VBROADCAST loads don't correctly respect dependencies
llvm-svn: 282613
2016-09-28 17:59:30 +00:00
Adrian Prantl 7f5866c227 Teach LiveDebugValues about lexical scopes.
This addresses PR26055, "LiveDebugValues is very slow."

Contrary to the old LiveDebugVariables pass, LiveDebugValues currently
doesn't look at the lexical scopes before inserting a DBG_VALUE
intrinsic. This means that we often propagate DBG_VALUEs much further
down than necessary. This is especially noticeable in large C++
functions with many inlined method calls that all use the same
"this"-pointer.

For example, in the following code it makes no sense to propagate the
inlined variable a from the first inlined call to f() into any of the
subsequent basic blocks, because the variable will always be out of
scope:

void sink(int a);
void __attribute__((always_inline)) f(int a) { sink(a); }
void foo(int i) {
   f(i);
   if (i)
     f(i);
   f(i);
}

This patch reuses the LexicalScopes infrastructure we have for
LiveDebugVariables to take this into account.

The effect on compile time and memory consumption is quite noticeable:
I tested a benchmark that is a large C++ source with an enormous
amount of inlined "this"-pointers that would previously eat >24GiB
(most of them for DBG_VALUE intrinsics) and whose compile time was
dominated by LiveDebugValues. With this patch applied the memory
consumption is 1GiB and 1.7% of the time is spent in LiveDebugValues.

https://reviews.llvm.org/D24994
Thanks to Daniel Berlin and Keith Walker for reviewing!

llvm-svn: 282611
2016-09-28 17:51:14 +00:00
Artem Belevich 3e1211581c [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

llvm-svn: 282607
2016-09-28 17:25:38 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failures with MCJIT.

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a
  simplified store-merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner, as it separates the
  handling of non-interfering loads/stores from the store-merging logic.

  When merging stores, we search up the chain through a single load, and
  find all possible stores by looking down through a load and a
  TokenFactor to all the stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).
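
  For illustration, here is a toy sketch of the shape of that search (not
  the actual SelectionDAG code; the types are hypothetical): starting from
  a candidate store's chain, walk through TokenFactors and collect the
  stores that are parallel on the chain.

    #include <vector>

    // Toy chain node: a store, a TokenFactor joining several chains, or
    // anything else (load, entry token, ...).
    struct ChainNode {
      bool IsStore = false;
      bool IsTokenFactor = false;
      std::vector<ChainNode *> ChainOps; // incoming chain operands
    };

    // Collect stores reachable through TokenFactors only; these are
    // parallel on the chain and thus candidates for merging.
    void collectParallelStores(ChainNode *N, std::vector<ChainNode *> &Out) {
      if (N->IsStore)
        Out.push_back(N);
      else if (N->IsTokenFactor)
        for (ChainNode *Op : N->ChainOps)
          collectParallelStores(Op, Out);
    }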

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code.
    2. Unifies the chain aggregation in the merged stores across
       code paths.
    3. Re-adds the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes, as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations.

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.ll -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we don't do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But it looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Guy Blank 2bdc74a471 [X86][FastISel] Use a COPY from K register to a GPR instead of a K operation
The KORTEST was introduced due to a bug where a TEST instruction used a K register,
but it turns out that the opposite case, a KORTEST using a GPR, is now happening.

The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.

Differential Revision: https://reviews.llvm.org/D24953

llvm-svn: 282580
2016-09-28 11:22:17 +00:00
Michael Kuperstein 3e06eafc20 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Sanjay Patel 764ae8bd72 [x86] add folds for FP logic with vector zeros
The 'or' case shows up in copysign. The copysign code also had 
redundant checking for a scalar zero operand with 'and', so I 
removed that. 

I'm not sure how to test vector 'and', 'andn', and 'xor' yet, 
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.

llvm-svn: 282546
2016-09-27 22:28:13 +00:00
Geoff Berry b124331db7 [TargetRegisterInfo, AArch64] Add target hook for isConstantPhysReg().
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant.  Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true.  This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.

Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy

Subscribers: junbuml, aemerson, mcrosier, rengolin

Differential Revision: https://reviews.llvm.org/D24570

llvm-svn: 282543
2016-09-27 22:17:27 +00:00
Keith Walker 83ebef5db3 Propagate DBG_VALUE entries when there are unvisited predecessors
Variables are sometimes missing their debug location information in
blocks in which the variables should be available. This would occur
when one or more predecessor blocks had not yet been visited by the
routine which propagated the information from predecessor blocks.

This is addressed by only considering predecessor blocks which have
already been visited.
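
A toy sketch of the join with that restriction (the data structures are
hypothetical, not the actual pass):

  #include <set>
  #include <utility>
  #include <vector>

  struct Block {
    std::vector<Block *> Preds;
    bool Visited = false;
    std::set<unsigned> AvailableVars; // variables with known locations
  };

  // Intersect over *visited* predecessors only; an unvisited predecessor
  // carries no information yet and must not empty the result.
  std::set<unsigned> joinPredecessors(const Block &B) {
    std::set<unsigned> Out;
    bool First = true;
    for (const Block *P : B.Preds) {
      if (!P->Visited)
        continue;
      if (First) {
        Out = P->AvailableVars;
        First = false;
        continue;
      }
      std::set<unsigned> Keep;
      for (unsigned V : Out)
        if (P->AvailableVars.count(V))
          Keep.insert(V);
      Out = std::move(Keep);
    }
    return Out;
  }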

The solution to this problem was suggested by Daniel Berlin on the
LLVM developer mailing list.

Differential Revision: https://reviews.llvm.org/D24927

llvm-svn: 282506
2016-09-27 16:46:07 +00:00
Simon Dardis d2ed8abb15 [mips] Disable tail calls temporarily
Disable tail calls while the remaining bugs are fixed. Enable only for tests.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D24912

llvm-svn: 282487
2016-09-27 13:15:54 +00:00
Nemanja Ivanovic 6f22b41398 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal", and "vector compare not equal or zero"
instructions, as well as the "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.

llvm-svn: 282478
2016-09-27 08:42:12 +00:00
Craig Topper 71f1c64320 [X86] Add test case for PR30511 and r282341.
llvm-svn: 282473
2016-09-27 06:44:30 +00:00
Craig Topper 4ffe5d5af0 [X86] Expand all-ones-vector test to cover 256-bit and 512-bit vectors.
llvm-svn: 282472
2016-09-27 06:44:27 +00:00
Davide Italiano a9f85d68cc [CodeGen] Add support for emitting .init_array instead of .ctors on FreeBSD.
PR: 30494
llvm-svn: 282451
2016-09-26 22:53:15 +00:00
Davide Italiano f5d77f4aad [CodeGen] Switch test as FreeBSD will support .init_array soon.
llvm-svn: 282450
2016-09-26 22:38:17 +00:00
Derek Schuff 92d300eb8f [WebAssembly] Use the frame pointer instead of the stack pointer
When we have dynamic allocas we have a frame pointer, and
when we're lowering frame indexes we should make sure we use it.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24889

llvm-svn: 282442
2016-09-26 21:18:03 +00:00
Evandro Menezes 055767d5f4 [AArch64] Fix test triplet
Specify the proper target triple so that the test passes under Windows too.

llvm-svn: 282423
2016-09-26 18:09:21 +00:00
Tom Stellard 1b9748c6a2 AMDGPU/SI: Don't crash on anonymous GlobalValues
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).

Reviewers: arsenm, b-sumner

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D24865

llvm-svn: 282420
2016-09-26 17:29:25 +00:00
Geoff Berry 256fcf975f [AArch64] Improve add/sub/cmp isel of uxtw forms.
Don't match the UXTW extended reg forms of ADD/ADDS/SUB/SUBS if the
32-bit to 64-bit zero-extend can be done for free by taking advantage
of the 32-bit defining instruction zeroing the upper 32-bits of the X
register destination.  This enables better instruction selection in a
few cases, such as:

  sub x0, xzr, x8
  instead of:
  mov x8, xzr
  sub x0, x8, w9, uxtw

  madd x0, x1, x1, x8
  instead of:
  mul x9, x1, x1
  add x0, x9, w8, uxtw

  cmp x2, x8
  instead of:
  sub x8, x2, w8, uxtw
  cmp x8, #0

  add x0, x8, x1, lsl #3
  instead of:
  lsl x9, x1, #3
  add x0, x9, w8, uxtw

Reviewers: t.p.northover, jmolloy

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24747

llvm-svn: 282413
2016-09-26 15:34:47 +00:00
Evandro Menezes e45de8a5ec Add support to optionally limit the size of jump tables.
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables.  As sophisticated as such
branch predictors are, they tend to have well-defined limits beyond which
their effectiveness is hampered or even nullified.  One such limit is the
number of possible destinations for a given indirect branch that such
branch predictors can handle.

This patch adds a limit that a target may set on the number of
destination addresses in a jump table.
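
As an illustration of the intent, a minimal sketch of such a check (the
helper name and the "0 means unlimited" convention are assumptions, not
the actual TargetLowering hook):

  #include <climits>

  // Hypothetical helper: decide whether a switch with NumDestinations
  // distinct targets may still be lowered as a jump table under a
  // target-imposed cap (0 = no limit).
  bool fitsJumpTableLimit(unsigned NumDestinations,
                          unsigned MaxJumpTableSize) {
    unsigned Limit = MaxJumpTableSize ? MaxJumpTableSize : UINT_MAX;
    return NumDestinations <= Limit;
  }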

Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.

Differential revision: https://reviews.llvm.org/D21940

llvm-svn: 282412
2016-09-26 15:32:33 +00:00
James Molloy 9abb2fa5bb [ARM] Promote small global constants to constant pools
If a constant is unnamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282387
2016-09-26 07:26:24 +00:00
Zvi Rackover 839d15a194 [X86] Optimization for replacing LEA with MOV at frame index elimination time
Summary:
Replace a LEA instruction of the form 'lea (%esp), %ebx' --> 'mov %esp, %ebx'

MOV is preferable over LEA because usually there are more issue-slots available to execute MOVs than LEAs. Latest processors also support zero-latency MOVs.

Fixes pr29022.

Reviewers: hfinkel, delena, igorb, myatsina, mkuper

Differential Revision: https://reviews.llvm.org/D24705

llvm-svn: 282385
2016-09-26 06:42:07 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Craig Topper 60d3ef1d72 [AVX-512] Fix some patterns predicates to properly enforce priority for various versions of CVTDQ2PD instruction.
llvm-svn: 282358
2016-09-25 16:34:02 +00:00
Craig Topper d8b2bd492c [AVX-512] Add the scalar unsigned integer to fp conversion instructions to hasUndefRegUpdate.
llvm-svn: 282356
2016-09-25 16:33:57 +00:00
Sanjay Patel 752ad8fde7 [x86] don't try to create a vector integer inst for an SSE1 target (PR30512)
This bug was introduced with:
http://reviews.llvm.org/rL272511

We need to restrict the lowering to v4f32 comparisons because that's all SSE1 can handle.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=28044

llvm-svn: 282336
2016-09-24 20:24:06 +00:00
Sanjay Patel 0b36337d61 [x86] fix FCOPYSIGN lowering to create constants instead of ConstantPool loads
This is similar to:
https://reviews.llvm.org/rL279958

By not prematurely lowering to loads, we should be able to more easily eliminate
the 'or' with zero instructions seen in copysign-constant-magnitude.ll.

We should also be able to extend this code to handle vectors.

llvm-svn: 282312
2016-09-23 23:17:29 +00:00
Matthias Braun 729c989083 llc: Add -start-before/-stop-before options
Differential Revision: https://reviews.llvm.org/D23089

llvm-svn: 282302
2016-09-23 21:46:02 +00:00
Matthias Braun 1acb55e67c ScheduleDAG: Match enum names when printing sdep kinds
It is less confusing to have the same names in the debug print as the
enum members.

llvm-svn: 282273
2016-09-23 18:28:31 +00:00
James Molloy 85124c76fc Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r282241. It caused http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19882.

llvm-svn: 282249
2016-09-23 13:35:43 +00:00
Nemanja Ivanovic d2c3c51a70 [Power9] Exploit move and splat instructions for build_vector improvement
This patch corresponds to review:
https://reviews.llvm.org/D21135

This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld

In order to improve some build_vector and extractelement patterns.

llvm-svn: 282246
2016-09-23 13:25:31 +00:00
James Molloy 1ce54d6be2 [ARM] Promote small global constants to constant pools
If a constant is unnamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 282241
2016-09-23 12:15:58 +00:00
Tom Stellard e88bbc34c6 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

llvm-svn: 282223
2016-09-23 01:33:26 +00:00
Arnold Schwaighofer 0fd32c005b i386 does not support optimized swifterror handling
rdar://28432565

llvm-svn: 282186
2016-09-22 20:06:25 +00:00
Hans Wennborg c4b1d20ba2 Win64: Don't emit unwind info for "leaf" functions (PR30337)
According to MSDN (see the PR), functions which don't touch any callee-saved
registers (including %rsp) don't need any unwind info.

This patch makes LLVM not emit unwind info for such functions, to save
binary size.

Differential Revision: https://reviews.llvm.org/D24748

llvm-svn: 282185
2016-09-22 19:50:05 +00:00
Nemanja Ivanovic 8dacca943a [PowerPC] Sign extend sub-word values for atomic comparisons
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.

llvm-svn: 282182
2016-09-22 19:06:38 +00:00
Nirav Dave 9011da3d44 [DAG] Fix incorrect alignment of ext load.
Correctly derive the alignment from the loaded size, not the output value size.

Reviewers: jyknight, tstellarAMD, arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23356

llvm-svn: 282177
2016-09-22 17:28:43 +00:00
Krzysztof Parzyszek b66efb855c [PPC] Set SP after loading data from stack frame, if no red zone is present
Follow-up to r280705: Make sure that the SP is only restored after all data
is loaded from the stack frame, if there is no red zone.

This completes the fix for https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24466

llvm-svn: 282174
2016-09-22 17:22:43 +00:00
Tim Northover a5e38fa00d GlobalISel: handle stack-based parameters on AArch64.
llvm-svn: 282153
2016-09-22 13:49:25 +00:00
Nemanja Ivanovic 6e7879c5e6 [Power9] Add exploitation of non-permuting memory ops
This patch corresponds to review:
https://reviews.llvm.org/D19825

The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instructions have a predicate that
ensures they won't be selected on Power9 and newer CPUs.

llvm-svn: 282143
2016-09-22 09:52:19 +00:00
Craig Topper 202b453a8a [AVX-512] Add support for commuting VPTERNLOG instructions.
VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors, the bits from each source are concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.

We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands' bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped, since the relevant bits from all sources are the same value.
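
For illustration, a toy version of that immediate rewrite (a hypothetical
helper, not the actual X86 commuting code):

  #include <cstdint>

  // Commute a VPTERNLOG immediate when sources OpA and OpB are swapped
  // (0 = src1, 1 = src2, 2 = src3). For lookup index Idx, bit 2 comes
  // from src1, bit 1 from src2, and bit 0 from src3.
  uint8_t commuteTernlogImm(uint8_t Imm, unsigned OpA, unsigned OpB) {
    unsigned BitA = 2 - OpA, BitB = 2 - OpB; // operand number -> index bit
    uint8_t NewImm = 0;
    for (unsigned Idx = 0; Idx < 8; ++Idx) {
      // Take the bit from the index with the two operand bits exchanged;
      // indices 0 and 7 map to themselves, so bits 0 and 7 never move.
      unsigned A = (Idx >> BitA) & 1, B = (Idx >> BitB) & 1;
      unsigned SrcIdx = (Idx & ~((1u << BitA) | (1u << BitB))) |
                        (A << BitB) | (B << BitA);
      NewImm |= ((Imm >> SrcIdx) & 1) << Idx;
    }
    return NewImm;
  }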

This refactors and reuses parts of the FMA3 commuting code, since FMA3 is also a three-operand instruction.

llvm-svn: 282132
2016-09-22 03:00:50 +00:00
Arnold Schwaighofer de2490d0dc Disable tail calls if there is an swifterror argument
ISel does not handle them correctly yet, i.e. we crash trying to emit
tail-call code.

radar://28407842

llvm-svn: 282088
2016-09-21 16:53:36 +00:00
Cameron McInally 2aa85e1210 [AVX512] Fix return types on int_x86_avx512_gatherXXX_di intrinsics
The return type should match the pass through vector type.

Differential Revision: https://reviews.llvm.org/D24744

llvm-svn: 282081
2016-09-21 16:06:10 +00:00
Nico Weber 903859c0e4 Revert r281715, it caused PR30475
llvm-svn: 282076
2016-09-21 15:33:24 +00:00
Tim Northover 9a46718378 GlobalISel: produce correct code for signext/zeroext ABI flags.
We still don't really have an equivalent of "AssertXExt" in DAG, so we don't
exploit the guarantees on the receiving side yet, but this should produce
conservatively correct code on iOS ABIs.

llvm-svn: 282069
2016-09-21 12:57:45 +00:00
Simon Dardis 9a66bbecae [mips] LLVM PR/30197 - Tail call incorrectly clobbers arguments for mips
The postRA scheduler performs alias analysis to determine if stores and loads
can moved past each other. When a function has more arguments than argument
registers for the calling convention used, excess arguments are spilled onto the
stack. LLVM by default assumes that argument slots are immutable, unless the
function contains a tail call. Without the knowledge of that a function contains
a tail call site, stores and loads to fixed stack slots may be re-ordered
causing the out-going arguments to clobber the incoming arguments before the
incoming arguments are supposed to be dead.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D24077

llvm-svn: 282063
2016-09-21 09:43:40 +00:00
NAKAMURA Takumi 0d57392ef3 llvm/test/CodeGen/NVPTX/zero-cs.ll: Relax an expression to match in -Asserts.
LLVM ERROR: Cannot select: 0x3607bf0: i32 = ExternalSymbol'__powidf2'

llvm-svn: 282053
2016-09-21 04:43:11 +00:00
Jacques Pienaar 98345fc0a1 [NVPTX] Check if callsite is defined when computing argument alignment
Summary: In getArgumentAlignment, check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is null, fall back to the ABI type alignment; otherwise compute the alignment as before.

Reviewers: eliben, jpienaar

Subscribers: jlebar, vchuravy, cfe-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D9168

llvm-svn: 282045
2016-09-21 01:57:57 +00:00
Petr Hosek 1290355f5a Mark ELF sections whose name start with .note as note
Previously, such a section would be marked as SHT_PROGBITS, which
makes it impossible to use an initialized C variable declaration
to emit an (allocated) ELF note. The new behavior is also consistent
with the ELF assembly parser.
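
A sketch of that use case (section name and field values are
illustrative; the section attribute is a GCC/Clang extension):

  #include <cstdint>

  // An initialized variable in a ".note.*" section; with this change it
  // is emitted as an allocated SHT_NOTE section instead of SHT_PROGBITS.
  // Layout follows the standard ELF note header (namesz, descsz, type),
  // followed by the name and descriptor.
  struct ElfNote {
    uint32_t namesz; // length of name, including the terminating NUL
    uint32_t descsz; // length of the descriptor
    uint32_t type;   // note type, defined by the producer
    char name[4];
    uint32_t desc;
  };

  __attribute__((section(".note.example"), used))
  static const ElfNote ExampleNote = {4, 4, 1, "abc", 42};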

Differential Revision: https://reviews.llvm.org/D24692

llvm-svn: 282010
2016-09-20 20:21:13 +00:00
Sanjay Patel db3438abcb [x86] split up tests, regenerate checks
Note that we fail to eliminate 'or' with 0!

llvm-svn: 282005
2016-09-20 19:31:30 +00:00
Evandro Menezes ba4926efde Revert "[AArch64] Use the reciprocal estimation machinery"
This reverts commit b7d42b0048f65346e9fa37fb65defeea7ce8c337 per request by
Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).

llvm-svn: 282000
2016-09-20 19:02:06 +00:00
Evandro Menezes 61a1273d27 Revert "[AArch64] Properly validate the reciprocal estimation."
This reverts commit ad8ca1528242e2a4cb363e3779309e70eb7a430e per request by
Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).

llvm-svn: 281999
2016-09-20 19:02:02 +00:00
Tim Northover b18ea162df GlobalISel: split aggregates for PCS lowering
This should match the existing behaviour for passing complicated struct and
array types, in particular HFAs come through like that from Clang.

For C & C++ we still need to somehow support all the weird ABI flags, or at
least those that are present in the IR (signext, byval, ...), and stack-based
parameter passing.

llvm-svn: 281977
2016-09-20 15:20:36 +00:00
Simon Pilgrim 1c38e1189f [X86][SSE] Regenerate multiple combine tests
llvm-svn: 281973
2016-09-20 14:42:45 +00:00
Elena Demikhovsky d3ff7c288b AVX-512: Fixed a bug in lowering saturated operations on KNL.
The generated code is still not optimal.

Differential Revision: https://reviews.llvm.org/D24723

llvm-svn: 281966
2016-09-20 11:02:26 +00:00
Craig Topper 9820e341f9 [AVX-512] Use 512-bit vcvtps2ph/vcvtph2ps to implement fp_to_f16/f16_to_fp when F16C and VLX are not supported.
Fixes PR23941.

llvm-svn: 281958
2016-09-20 05:44:47 +00:00
Matthias Braun 4040c0f4ec BranchFolder: Fix invalid undef flags after merge.
It is legal to merge instructions with different undef flags; however, we
must drop the undef flag from the merged instruction if it isn't present
everywhere.

This fixes http://llvm.org/PR30199

llvm-svn: 281957
2016-09-20 01:14:42 +00:00
Sanjay Patel e9ac83dbe3 [x86] auto-generate checks
llvm-svn: 281950
2016-09-19 23:44:50 +00:00
Simon Pilgrim 233374c4d1 [X86][SSE] Updated vector abs tests
Renamed and added v2i64 / v4i64 tests

llvm-svn: 281937
2016-09-19 20:50:35 +00:00
Matthias Braun e1d9628bba LiveRangeCalc: Fix reporting of invalid vreg usage in liveness calculation
Machine programs need a definition of each vreg before reaching a use
(the definition may come from an IMPLICIT_DEF instruction). This class
of errors is not detected by the MachineVerifier because of efficiency
concerns. LiveRangeCalc used to report these problems; make it do that
again (followup to r279625).

Also use report_fatal_error() instead of llvm_unreachable() as the error
reporting is only present in asserts build anyway.

llvm-svn: 281914
2016-09-19 16:49:45 +00:00
Nico Weber 9be7267516 Revert r281841, it does not work on Windows (PR30443).
llvm-svn: 281905
2016-09-19 15:22:04 +00:00
Tim Northover eaee28b5ca ARM: check alignment before transforming ldr -> ldm (or similar).
ldm and stm instructions always require 4-byte alignment on the pointer, but we
weren't checking this before trying to reduce code-size by replacing a
post-indexed load/store with them. Unfortunately, we were also dropping this
information in DAG ISel, but that's easy enough to fix.

llvm-svn: 281893
2016-09-19 09:11:09 +00:00
Elena Demikhovsky 59f6f8c05d [X86 Codegen Test] Divided masked_memop into several files. NFC.
The masked_memop.ll test became huge. I extracted the AVX-512 specific tests into separate files.

llvm-svn: 281892
2016-09-19 08:58:43 +00:00
Craig Topper 61403201ea [X86,AVX-512] Use INSERT_SUBREG instead of SUBREG_TO_REG when the input is not the output of an instruction.
SUBREG_TO_REG is supposed to indicate that the super register has been zeroed, but we can't prove that if we don't know where it came from.

llvm-svn: 281885
2016-09-19 02:53:43 +00:00
Craig Topper b3b5033179 [AVX-512] Add support for lowering fp_to_f16 and f16_to_fp when VLX is supported regardless of whether F16C is also supported.
Still need to add support for lowering using AVX512F when neither VLX or F16C is supported.

llvm-svn: 281884
2016-09-19 02:53:37 +00:00
Dean Michael Berris 4640154446 [XRay] ARM 32-bit no-Thumb support in LLVM
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories for the XRay ARM port. The other 2 are:

https://reviews.llvm.org/D23932 (Clang test)
https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 281878
2016-09-19 00:54:35 +00:00
Simon Pilgrim 6c21e6a54e [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly supported on pre-AVX512 hardware).

While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

Differential Revision: https://reviews.llvm.org/D24343

llvm-svn: 281852
2016-09-18 12:45:23 +00:00
Wei Mi ab24cd189f Change the order of the split store from high - low to low - high.
It is a trivial change which makes the testcase easier to reuse
for the store splitting in CodeGenPrepare.

llvm-svn: 281846
2016-09-18 06:10:32 +00:00
Simon Pilgrim aeb764a7b7 [X86][SSE] Added vector udiv combine tests
llvm-svn: 281842
2016-09-17 22:02:23 +00:00
Simon Pilgrim 2cc2900296 [X86][SSE] Added vector fcopysign combine tests
Also demonstrating the poor lowering of fcopysign...

llvm-svn: 281841
2016-09-17 21:31:34 +00:00
Simon Pilgrim d4473f1126 [X86][SSE] Added vector mul combine tests
llvm-svn: 281839
2016-09-17 20:06:16 +00:00
Simon Pilgrim 6736096ac3 [X86][SSE] Improve target shuffle mask extraction
Add ability to extract vXi64 'vzext_movl' masks on 32-bit targets

llvm-svn: 281834
2016-09-17 18:50:54 +00:00
Simon Pilgrim 06bfabbfb6 [X86][AVX] Test target shuffle combining on 32 and 64-bit targets
llvm-svn: 281833
2016-09-17 18:42:41 +00:00
Simon Pilgrim 06f85e43cf [X86][AVX2] Add target shuffle constant folding tests
llvm-svn: 281830
2016-09-17 17:42:15 +00:00
Simon Pilgrim a7785cdee6 [X86][AVX] Add target shuffle constant folding tests
llvm-svn: 281829
2016-09-17 17:41:14 +00:00
Simon Pilgrim 44f3aead3d [X86][XOP] Add target shuffle constant folding tests
llvm-svn: 281828
2016-09-17 17:40:40 +00:00
Simon Pilgrim f0c22ea1a4 [X86][SSSE3] Add target shuffle constant folding tests
llvm-svn: 281827
2016-09-17 17:40:08 +00:00
Ron Lieberman da5df7c99e [Hexagon] segv while processing SUnit with nullNodePtr
Added BoundaryNode check to isBestZeroLatency function.

llvm-svn: 281825
2016-09-17 16:21:09 +00:00
Matt Arsenault ac0fc849cf AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.

The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.

llvm-svn: 281824
2016-09-17 16:09:55 +00:00
Matt Arsenault d99ef1144b AMDGPU: Push bitcasts through build_vector
This reduces the number of copies and reg_sequences
when using fp constant vectors. This significantly
reduces the code size in local-stack-alloc-bug.ll

llvm-svn: 281822
2016-09-17 15:44:16 +00:00
Matt Arsenault 7b1dc2c983 AMDGPU: Use i64 scalar compare instructions
VI added eq/ne for i64, so use them.

llvm-svn: 281800
2016-09-17 02:02:19 +00:00
Tom Stellard 7998db634c AMDGPU/SI: Fix kernel argument ABI for HSA
Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA kernel ABI.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24621

llvm-svn: 281789
2016-09-16 22:20:24 +00:00
Matt Arsenault 6408c9135c AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase where I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this change, only one is emitted, or sometimes a copy.

llvm-svn: 281786
2016-09-16 22:11:18 +00:00
Tom Stellard bbeb45aff6 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determining the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

llvm-svn: 281781
2016-09-16 21:53:00 +00:00
Matt Arsenault 7ccf6cd104 AMDGPU: Use SOPK compare instructions
llvm-svn: 281780
2016-09-16 21:41:16 +00:00
Tom Stellard 0b76fc4c77 AMDGPU/SI: Add support for triples with the mesa3d operating system
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22783

llvm-svn: 281779
2016-09-16 21:34:26 +00:00
Derek Schuff 3b04b7eba1 [WebAssembly] Fix function types of CFGStackify tests
Make the function's declared type match its (lack of) return type

llvm-svn: 281773
2016-09-16 20:58:31 +00:00
Simon Pilgrim 99cc6f8043 [X86][SSE] Added vector sub combine tests
llvm-svn: 281769
2016-09-16 20:00:51 +00:00
Simon Pilgrim 28568c477d [X86][SSE] Added vector add combine tests
Some work great and others currently demonstrate the anti-vector bias prevalent in DAGCombiner.

llvm-svn: 281768
2016-09-16 19:20:41 +00:00
Michael Kuperstein dadb71be88 Make test slightly more explicit. NFC.
llvm-svn: 281759
2016-09-16 18:20:43 +00:00
Ahmed Bougacha 85ef4a1c47 [AArch64][GlobalISel] Add default regbank mapping for int<>FP.
llvm-svn: 281739
2016-09-16 15:12:46 +00:00
Ahmed Bougacha 7b3b2e7f65 [AArch64][GlobalISel] Add default regbank mapping for G_FCMP.
llvm-svn: 281738
2016-09-16 15:12:43 +00:00
Ahmed Bougacha 90637f6196 [AArch64][GlobalISel] Add default regbank mapping for FP ops.
These should have all their operands - even scalars - go on FPR.

llvm-svn: 281737
2016-09-16 15:12:40 +00:00
Ahmed Bougacha 74db8faa71 [AArch64][GlobalISel] Test default regbank mapping for G_ICMP.
Also relax a RegisterBankInfo verifier check that's incompatible with
1-bit mappings.

llvm-svn: 281735
2016-09-16 14:44:54 +00:00
Ahmed Bougacha 7306313e6d [AArch64][GlobalISel] Add default regbank mappings for mixed-type ops.
We used to only support instructions with same-type operands.
Instead, use the per-register type information to map each
operand more accurately.

llvm-svn: 281734
2016-09-16 14:44:51 +00:00
Ahmed Bougacha 02629aae3b [AArch64][GlobalISel] Add tests for default RegBank mappings. NFC.
llvm-svn: 281733
2016-09-16 14:44:48 +00:00
Keith Walker 830a8c1fbd Place the lowered phi instruction(s) before the DEBUG_VALUE entry
When a phi node is finally lowered to a machine instruction it is
important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.

Renamed the existing SkipPHIsAndLabels to SkipPHIsLabelsAndDebug to
more fully describe that it also skips debug entries. Then used the
"new" function SkipPHIsAndLabels when the debug information should not
be skipped when placing the lowered "load" instructions so that it is
placed before the debug entries.

Differential Revision: https://reviews.llvm.org/D23760 

llvm-svn: 281727
2016-09-16 14:07:29 +00:00
Sjoerd Meijer 227825346e Reverting r281719, this is causing buildbot failures and timeouts again.
llvm-svn: 281722
2016-09-16 13:16:52 +00:00
Sjoerd Meijer 23385c87a4 This is an attempt to reapply r280808: [ARM] Lower UDIV+UREM to UDIV+MLS
(and the same for SREM)

This was causing buildbot failures earlier (timeouts in the LNT suite).
However, we haven't been able to reproduce this and suspect it
was caused by another (reverted) patch.

llvm-svn: 281719
2016-09-16 12:10:09 +00:00
James Molloy 0dc4708fca [ARM] Promote small global constants to constant pools
If a constant is unnamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.

llvm-svn: 281715
2016-09-16 10:17:04 +00:00
Evandro Menezes 19b2aed308 [AArch64] Support for FP FMA when -ffp-contract=fast
Currently, the machine combiner can proceed with matching when -ffast-math is on.
It should also match when only -ffp-contract=fast is specified, as was the
case before, when DAGCombiner was doing the job.

Patch by: Abderrazek Zaafrani <a.zaafrani@samsung.com>.

Differential Revision: https://reviews.llvm.org/D24366

llvm-svn: 281649
2016-09-15 19:55:23 +00:00
Evgeniy Stepanov a0601a40f7 Revert "[ARM] Promote small global constants to constant pools"
This reverts r281604, which adds text relocations to ARM binaries.

llvm-svn: 281645
2016-09-15 19:13:32 +00:00
James Molloy fe7fd879d7 [ARM] Promote small global constants to constant pools
If a constant is unnamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.

llvm-svn: 281604
2016-09-15 12:30:27 +00:00
Tim Northover 22d82cf179 GlobalISel: legalize GEP instructions with small offsets.
llvm-svn: 281602
2016-09-15 11:02:19 +00:00
Tim Northover 4cf0a482bc GlobalISel: relax type constraints on G_ICMP to allow pointers.
llvm-svn: 281600
2016-09-15 10:40:38 +00:00
Sanjoy Das 23f06e53d8 [Stackmap] Added callsite counts to emitted function information.
Summary:
It was previously not possible for tools to use solely the stackmap
information emitted to reconstruct the return addresses of callsites in
the map, which is necessary to use the information to walk a stack. This
patch adds per-function callsite counts when emitting the stackmap
section in order to resolve the problem. Note that this slightly alters
the stackmap format, so external tools parsing these maps will need to
be updated.

**Problem Details:**
Records only store their offset from the beginning of the function they
belong to. While these records and the functions are output in program
order, it is not possible to determine where the end of one function's
records is without the callsite count when processing the records to
compute return addresses.
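
A toy sketch of what the counts enable (field names are assumptions, not
the exact stackmap layout): with a per-function record count, a parser
can attribute each function-relative record offset to the right function
address.

  #include <cstdint>
  #include <vector>

  struct FuncInfo { uint64_t Address; uint64_t RecordCount; };
  struct Record   { uint32_t OffsetFromFunc; };

  // Records appear in program order; the per-function counts delimit
  // which records belong to which function.
  std::vector<uint64_t> returnAddresses(const std::vector<FuncInfo> &Funcs,
                                        const std::vector<Record> &Recs) {
    std::vector<uint64_t> Out;
    size_t R = 0;
    for (const FuncInfo &F : Funcs)
      for (uint64_t I = 0; I < F.RecordCount && R < Recs.size(); ++I, ++R)
        Out.push_back(F.Address + Recs[R].OffsetFromFunc);
    return Out;
  }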

Patch by Kavon Farvardin!

Reviewers: atrick, ributzka, sanjoy

Subscribers: nemanjai

Differential Revision: https://reviews.llvm.org/D23487

llvm-svn: 281532
2016-09-14 20:22:03 +00:00
Sanjay Patel 3caf155bd3 [x86] regenerate checks
llvm-svn: 281531
2016-09-14 20:21:28 +00:00
Sanjay Patel c0899b961a [x86] regenerate checks
llvm-svn: 281529
2016-09-14 20:16:24 +00:00
Evgeniy Stepanov e97d3b90b9 Revert "[ARM] Promote small global constants to constant pools"
Breaks Android tests by introducing text relocations to ARM binaries.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/25362/steps/run%20asan%20lit%20tests%20%5Barm%2Fbullhead-userdebug%2FMTC20F%5D/logs/stdio

llvm-svn: 281526
2016-09-14 20:02:30 +00:00
Sanjay Patel d56a27e242 [x86] regenerate checks
llvm-svn: 281523
2016-09-14 19:42:03 +00:00
Matt Arsenault f40b70fa75 Revert "AMDGPU: Use SOPK compare instructions"
Accidentally committed

llvm-svn: 281514
2016-09-14 18:04:42 +00:00
Matt Arsenault f757c87959 AMDGPU: Use SOPK compare instructions
llvm-svn: 281513
2016-09-14 18:03:53 +00:00
Simon Pilgrim a369219ce6 [X86][SSE] Improve recognition of i64 sitofp conversions that can be performed as i32 (PR29078)
Until AVX512DQ we only support i64/vXi64 sitofp conversion as scalars.

This patch sees if the sign bit extends far enough that we can truncate to an i32 type and then perform sitofp without loss of precision.
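
A toy model of the condition on concrete values (the in-tree check
reasons about known sign bits rather than constants, so this helper is
only illustrative):

  #include <cstdint>

  // An i64 whose upper 33 bits are all copies of the sign bit already
  // fits in i32, so i64->fp sitofp can be done as i32->fp exactly.
  bool sitofpFitsInI32(int64_t V) {
    return V == static_cast<int64_t>(static_cast<int32_t>(V));
  }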

Differential Revision: https://reviews.llvm.org/D24345

llvm-svn: 281502
2016-09-14 17:15:26 +00:00
Matt Arsenault 2bc198a333 AMDGPU: Support folding FrameIndex operands
This avoids test regressions in a future commit.

llvm-svn: 281491
2016-09-14 15:51:33 +00:00
Matt Arsenault fa5f767a38 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.

llvm-svn: 281488
2016-09-14 15:19:03 +00:00
James Molloy 13065b00ba [ARM] Promote small global constants to constant pools
If a constant is unnamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

llvm-svn: 281484
2016-09-14 14:47:27 +00:00
Simon Pilgrim 67fdd15cf9 [X86] Added i128 lshr+shl -> mask combine test
llvm-svn: 281480
2016-09-14 14:29:16 +00:00
Nemanja Ivanovic d5deb4896c Fix code-gen crash on Power9 for insert_vector_elt with variable index (PR30189)
This patch corresponds to review:
https://reviews.llvm.org/D24021

In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).

llvm-svn: 281479
2016-09-14 14:19:09 +00:00
Simon Pilgrim ba325e3a73 [X86][SSE] Don't blend vector shifts with MOVSS/MOVSD directly, lower from generic shuffle
Shuffle lowering will correctly lower to MOVSS/MOVSD/PBLEND, improving commutation opportunities

llvm-svn: 281471
2016-09-14 14:08:18 +00:00
James Molloy 9790d8f81d Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"
This reverts commit r281323. It caused chromium test failures and a selfhost failure.

llvm-svn: 281451
2016-09-14 09:45:28 +00:00
Tim Northover 1c7825fd79 GlobalISel: mark pointer stores as legal on AArch64.
llvm-svn: 281448
2016-09-14 08:28:54 +00:00
Sjoerd Meijer 724023a1ec This reapplies r281304. The issue was that I had missed
copying the new isAdd field in the tablegen data structure.

llvm-svn: 281447
2016-09-14 08:20:03 +00:00
Elena Demikhovsky 0569d9d588 AVX-512: Fixed a bug in kortest.z intrinsic
Lowering was wrong: the X86ISD::SETCC node should return i8 type.

llvm-svn: 281446
2016-09-14 08:06:54 +00:00
Igor Breger 74813fc19c [AVX512BW] Change truncStore action (v16i16->v16i8). It can be legal only with AVX512VL.
Differential Revision: http://reviews.llvm.org/D24547

llvm-svn: 281445
2016-09-14 08:04:28 +00:00
Craig Topper 4e2d5a43cf [X86] Remove the VCVTSI2SD32 with rounding intrinsic. It's not used by clang and not needed since 32-bit integer to double is always exact.
llvm-svn: 281442
2016-09-14 06:27:46 +00:00
Ahmed Bougacha 7398b178f1 [AArch64] Simplify patchpoint/stackmap size test (r281301). NFC.
llvm-svn: 281407
2016-09-13 22:16:40 +00:00
Pawel Bylica c397f0b272 [CodeGen] Fix invalid shift in mul expansion
Summary: When expanding mul in type legalization, make sure the type of the shift amount can actually fit the value. This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354.
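
A minimal sketch of the invariant being enforced (the helper is
hypothetical):

  #include <cstdint>

  // A shift amount is only valid if its value is representable in the
  // shift-amount type (e.g. a shift by 300 does not fit in an i8).
  bool shiftAmountFits(uint64_t ShiftAmt, unsigned ShiftAmtTypeBits) {
    return ShiftAmtTypeBits >= 64 || ShiftAmt < (1ull << ShiftAmtTypeBits);
  }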

Reviewers: hfinkel, majnemer, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D24478

llvm-svn: 281403
2016-09-13 21:55:41 +00:00
Michael Kuperstein 59f8305305 [DAG] Allow build-to-shuffle combine to combine builds from two wide vectors.
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).
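
A toy sketch of the mask construction (types hypothetical): each output
element extracted from the first source keeps its lane index, and each
element from the second source gets its lane index plus the first
source's width.

  #include <vector>

  struct Extract { unsigned SrcVec; unsigned Lane; }; // SrcVec: 0 or 1

  std::vector<int> buildShuffleMask(const std::vector<Extract> &Elts,
                                    unsigned Src0NumElts) {
    std::vector<int> Mask;
    for (const Extract &E : Elts)
      Mask.push_back(E.SrcVec == 0 ? int(E.Lane)
                                   : int(Src0NumElts + E.Lane));
    return Mask;
  }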

Differential Revision: https://reviews.llvm.org/D24491

llvm-svn: 281402
2016-09-13 21:53:32 +00:00
Krzysztof Parzyszek d19d0507c8 [Hexagon] Better handling of HVX vector lowering
- Expand SELECT_CC and BR_CC for vector types.
- Implement TLI::isShuffleMaskLegal.

llvm-svn: 281397
2016-09-13 21:16:07 +00:00
Matthias Braun 1af1414d4d AArch64: Cleanup tailcall CC check, enable swiftcc.
Cleanup/change the code that checks for possible tailcall conventions to
look the same as the one in the X86 target. This makes the distinction
between calling conventions that can guarantee tailcalls and the ones
that may tailcall more obvious.

- Add Swift to the mayTailCall list
- PreserveMost seemed to be incorrectly part of the guaranteed tail call
  list; move it to the mayTailCall list.

llvm-svn: 281376
2016-09-13 19:27:38 +00:00
Matt Arsenault 25dba30017 AMDGPU: Support commuting a FrameIndex operand
llvm-svn: 281369
2016-09-13 19:03:12 +00:00
Simon Pilgrim 4a8eba3e96 [DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.

Followup to D23007

llvm-svn: 281362
2016-09-13 18:33:29 +00:00
Douglas Katzman 8ea02f4e1c [Myriad]: set LeonCASA processor feature
llvm-svn: 281359
2016-09-13 17:51:41 +00:00
Simon Pilgrim 1f9ddf48a6 [X86][SSE] Added AVX512F and additional vector truncate test cases
trunc16i16_16i8 is currently commented out due to PR25684

llvm-svn: 281356
2016-09-13 17:34:56 +00:00
Simon Pilgrim bd28a85d14 [DAGCombiner] Use APInt directly in (shl (ext (shl x, c1)), c2) combine
Fix failure to detect out of range shift constants leading to assert in ConstantSDNode::getZExtValue()

Followup to D23007

llvm-svn: 281354
2016-09-13 17:15:28 +00:00
Simon Pilgrim 2e99e0ff63 [X86] Regenerated shift combine tests.
Added x86_64 tests

llvm-svn: 281341
2016-09-13 14:41:39 +00:00
Krzysztof Parzyszek b558ae2125 [Hexagon] Clear the flow queue after visiting a single instruction
llvm-svn: 281339
2016-09-13 14:36:55 +00:00
James Molloy 043d613791 Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r281314. Speculatively revert as it's possible this caused linker errors: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19656

llvm-svn: 281327
2016-09-13 12:45:51 +00:00
Pablo Barrio bb6984d401 [ARM] Add ".code 32" to functions in the ARM instruction set
Before, only Thumb functions were marked as ".code 16". These
".code x" directives are effective until the next directive of its
kind is encountered. Therefore, in code with interleaved ARM and
Thumb functions, it was possible to declare a function as ARM and
end up with a Thumb function after assembly. A test has been added.

An existing test has also been fixed to take this change into
account.

Reviewers: aschwaighofer, t.p.northover, jmolloy, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D24337

llvm-svn: 281324
2016-09-13 12:18:15 +00:00
James Molloy d246c598de [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
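
A minimal sketch of that mask test (standalone, using C++20 <bit>):

  #include <bit>
  #include <cstdint>

  // A nonzero 32-bit mask is one consecutive run of set bits iff
  // 32 - clz(bm) - ctz(bm) == popcount(bm).
  bool isContiguousMask(uint32_t BM) {
    return BM != 0 &&
           32 - std::countl_zero(BM) - std::countr_zero(BM) ==
               std::popcount(BM);
  }
  // e.g. 0x00000FF0 -> true, 0x00000F0F -> false.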

llvm-svn: 281323
2016-09-13 12:12:32 +00:00
James Molloy 3e4bc66134 [ARM] Promote small global constants to constant pools
If a constant is unnamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

llvm-svn: 281314
2016-09-13 10:28:11 +00:00
Eric Liu 882dc72b38 [WebAssembly] Trying to fix broken tests in CodeGen/WebAssembly caused by r281285.
Reviewers: bkramer, ddcc, dschuff, sunfish

Subscribers: jfb, llvm-commits, dschuff

Differential Revision: https://reviews.llvm.org/D24497

llvm-svn: 281312
2016-09-13 10:05:44 +00:00
Ayman Musa 0c2da88f82 Remove MVT::i1 xor instruction before SELECT. (Performance improvement).
Differential Revision: https://reviews.llvm.org/D23764

llvm-svn: 281308
2016-09-13 09:12:45 +00:00
Sjoerd Meijer 520a18df9c Revert r281304, as it is causing buildbot failures in Hexagon
hwloop regression tests. These tests pass locally; I will investigate
where these differences come from.

llvm-svn: 281306
2016-09-13 08:51:59 +00:00
Sjoerd Meijer 05453991fe This adds a new field isAdd to MCInstrDesc. The ARM and Hexagon instruction
descriptions now tag add instructions, and the Hexagon backend is using this to
identify loop induction statements.

Patch by Sam Parker and Sjoerd Meijer.

Differential Revision: https://reviews.llvm.org/D23601

llvm-svn: 281304
2016-09-13 08:08:06 +00:00
Elena Demikhovsky b906df9fe5 AVX-512: Fix for PR28175 - Scalar code optimization.
Optimized (truncate (assertzext x) to i1) and anyext i1 to i8/16/32.
Optimization of these patterns is one more step towards i1 optimization on AVX-512.

Differential Revision: https://reviews.llvm.org/D24456

llvm-svn: 281302
2016-09-13 07:57:00 +00:00
Diana Picus 4b97288184 [AArch64] Support stackmap/patchpoint in getInstSizeInBytes
We currently return 4 for stackmaps and patchpoints, which is very optimistic
and can in rare cases cause the branch relaxation pass to fail to relax certain
branches.

This patch causes getInstSizeInBytes to return a pessimistic estimate of the
size as the number of bytes requested in the stackmap/patchpoint. In the future,
we could provide a more accurate estimate by sharing some of the logic in
AArch64::LowerSTACKMAP/PATCHPOINT.
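
A minimal sketch of the new behavior (opcode names and helper are
illustrative, not the actual AArch64InstrInfo code):

  #include <cstdint>

  enum class Opc { Stackmap, Patchpoint, Other };

  // Return a pessimistic upper bound for branch relaxation: the number
  // of bytes requested by the stackmap/patchpoint, else the fixed
  // 4-byte AArch64 instruction width.
  uint64_t instSizeInBytes(Opc Op, uint64_t RequestedBytes) {
    switch (Op) {
    case Opc::Stackmap:
    case Opc::Patchpoint:
      return RequestedBytes;
    case Opc::Other:
      return 4;
    }
    return 4;
  }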

Fixes part of https://llvm.org/bugs/show_bug.cgi?id=28750

Differential Revision: https://reviews.llvm.org/D24073

llvm-svn: 281301
2016-09-13 07:45:17 +00:00
Craig Topper 4619c9e6a8 [X86] Remove masked shufpd/shufps intrinsics and autoupgrade to native vector shuffles. They were removed from clang previously but accidentally left in the backend.
llvm-svn: 281300
2016-09-13 07:40:53 +00:00
Peter Collingbourne d4135bbc30 DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

llvm-svn: 281284
2016-09-13 01:12:59 +00:00
Hans Wennborg 8a42d4b9cc X86: Conditional tail calls should not have isBarrier = 1
That confuses e.g. machine basic block placement, which then doesn't
realize that control can fall through a block that ends with a conditional
tail call. Instead, isBranch=1 should be set.

Also, mark EFLAGS as used by these instructions.

llvm-svn: 281281
2016-09-13 00:21:32 +00:00
Nico Weber 7c31d0ebc0 Revert r281215, it caused PR30358.
llvm-svn: 281263
2016-09-12 21:40:50 +00:00
Dehao Chen 9bbb941acf Lower consecutive select instructions correctly.
Summary: If consecutive select instructions are lowered separately in CGP, it will introduce redundant condition checks and branches that cannot be removed by later optimization phases. This patch lowers all consecutive select instructions at the same time to avoid inefficient code, as demonstrated in https://llvm.org/bugs/show_bug.cgi?id=29095

Reviewers: davidxl

Subscribers: vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D24147

llvm-svn: 281252
2016-09-12 20:23:28 +00:00
Elena Demikhovsky 57c602aad3 AVX-512: Added a test for -O0 mode. NFC.
llvm-svn: 281246
2016-09-12 19:03:21 +00:00
Elena Demikhovsky 5730bf0429 AVX-512: Simplified masked_gather_scatter test. NFC.
llvm-svn: 281244
2016-09-12 18:50:47 +00:00
Nicolai Haehnle e58e0e3fe3 AMDGPU: Do not clobber SCC in SIWholeQuadMode
Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22198

llvm-svn: 281230
2016-09-12 16:25:20 +00:00
James Molloy 3d06ff22b7 Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r281213. It made a bot go bang: http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-full/builds/14625

llvm-svn: 281228
2016-09-12 16:18:23 +00:00
Ahmed Bougacha b678219aa6 [BranchFolding] Unique added live-ins after hoisting code.
We're not supposed to have duplicate live-ins.

llvm-svn: 281224
2016-09-12 16:05:31 +00:00
Ahmed Bougacha 45bfa8772f [X86] Copy imp-uses when folding tailcall into conditional branch.
r280832 added 32-bit support for emitting conditional tail-calls, but
dropped imp-used parameter registers.  This went unnoticed until
r281113, which added 64-bit support, as this is only exposed with
parameter passing via registers.

Don't drop the imp-used parameters.

llvm-svn: 281223
2016-09-12 16:05:27 +00:00
Igor Breger a3e36da6f2 Add a select i1 test, a reproducer for PR30249.
llvm-svn: 281218
2016-09-12 15:27:02 +00:00
James Molloy 1e1b56bd48 [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
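
As a hedged sketch of case 1 (names invented, mask 0xFF):

  define i1 @low8_zero(i32 %x) {
    %m = and i32 %x, 255        ; bitmask touches the LSB
    %z = icmp eq i32 %m, 0
    ret i1 %z
  }

Instead of an AND followed by a CMP, the low 8 bits can be tested with a
single flag-setting shift (roughly "lsls r1, r0, #24"), leaving the EQ/NE
condition unchanged.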

llvm-svn: 281215
2016-09-12 14:30:48 +00:00
James Molloy 8f82d45ff4 [ARM] Promote small global constants to constant pools
If a constant is unnamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:

      ldr r0, .CPI0
      bl printf
      bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

We can emit:

      adr r0, .CPI0
      bl printf
      bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).

llvm-svn: 281213
2016-09-12 13:42:16 +00:00
Pablo Barrio 0bebc38abb Fix the Thumb test for vfloat intrinsics
Summary:
This test was not testing the intrinsics. A function like this:

define %v4f32 @test_v4f32.floor(%v4f32 %a){
...
        %1 = call %v4f32 @llvm.floor.v4f32(%v4f32 %a)
...
}

is transformed into the following assembly:

_test_v4f32.floor:              @ @test_v4f32.floor
...
        bl _floorf
...

In each function tested, there are two CHECK directives: one that checks
for the label and another for the intrinsic that should be used
inside the function (in our case, "floor"). However, although the
first CHECK matched the label, the second did not match the
intrinsic; instead it matched the second "floor" on the same line as the label.

This is fixed by making the first CHECK match the entire line.

Reviewers: jmolloy, rengolin

Subscribers: rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D24398

llvm-svn: 281211
2016-09-12 13:14:14 +00:00
Tim Northover 032548fc5e GlobalISel: support translation of global addresses.
llvm-svn: 281207
2016-09-12 12:10:41 +00:00
Tim Northover a7653b3919 GlobalISel: translate GEP instructions.
Unlike SDag, we use a separate G_GEP instruction (much simplified, only taking
a single byte offset) to preserve the pointer type information through
selection.

llvm-svn: 281205
2016-09-12 11:20:22 +00:00
Tim Northover d28d3cc079 GlobalISel: disambiguate types when printing MIR
Some generic instructions have multiple types. While in theory these can always be
discovered by inspecting the single definition of each generic vreg, in
practice those definitions won't always be local and traipsing through a big
function to find them will not be fun.

So this changes MIRPrinter to print out the type of uses as well as defs, if
they're known to be different or not known to be the same.

On the parsing side, we're a little more flexible: provided each register is
given a type in at least one place it's mentioned (and all types are
consistent) we accept the MIR. This doesn't introduce ambiguity but makes
writing tests manually a bit less painful.

llvm-svn: 281204
2016-09-12 11:20:10 +00:00
Elena Demikhovsky de1b494555 AVX-512: Added a test case that should be optimized in the future. NFC.
llvm-svn: 281196
2016-09-12 06:26:03 +00:00
NAKAMURA Takumi cf6aaa9e1a llvm/test/CodeGen/AMDGPU/infinite-loop-evergreen.ll REQUIRES +Asserts.
This might not *crash* with -Asserts. I saw it cause an infinite loop in codegen.

llvm-svn: 281190
2016-09-12 04:27:28 +00:00
James Molloy 3e1ce05752 [AArch64] Fixup test after r281160
How I missed this locally is beyond me. I suspect llc didn't recompile. This is just changing the CHECK line back to what it was before r280364.

llvm-svn: 281161
2016-09-11 08:24:04 +00:00
Craig Topper 3639cda748 [AVX-512] Add test cases to demonstrate opportunities for commuting vpternlog. Commuting will be added in a future commit.
llvm-svn: 281157
2016-09-11 05:33:43 +00:00
Craig Topper fb4564cf21 [AVX-512] Add VPTERNLOG to load folding tables.
llvm-svn: 281156
2016-09-11 05:33:40 +00:00
Craig Topper 2c86705755 [X86] Side effecting asm in AVX512 integer stack folding test should return 2 x i64 not 8 x i64.
llvm-svn: 281155
2016-09-11 05:33:38 +00:00
Justin Lebar 6d6b11a4a6 [NVPTX] Use ldg for explicitly invariant loads.
Summary:
With this change (plus some changes to prevent !invariant from being
clobbered within llvm), clang will be able to model the __ldg CUDA
builtin as an invariant load, rather than as a target-specific llvm
intrinsic.  This will let the optimizer play with these loads --
specifically, we should be able to vectorize them in the load-store
vectorizer.

Reviewers: tra

Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc

Differential Revision: https://reviews.llvm.org/D23477

llvm-svn: 281152
2016-09-11 01:39:04 +00:00
Arnold Schwaighofer 6c57f4f56d It should also be legal to pass a swifterror parameter to a call as a swifterror
argument.

rdar://28233388

llvm-svn: 281147
2016-09-10 19:42:53 +00:00
Arnold Schwaighofer 112ff66505 We also need to pass swifterror in R12 under swiftcc, not only under ccc
rdar://28190687

llvm-svn: 281138
2016-09-10 14:16:55 +00:00
Matt Arsenault 124384f08d AMDGPU: Fix immediate folding logic when shrinking instructions
If the literal is being folded into src0, it doesn't matter
if it's an SGPR because it's being replaced with the literal.

Also fixes the initial selection of 32-bit versions of some instructions,
which also confused commuting.

llvm-svn: 281117
2016-09-09 23:32:53 +00:00
Hans Wennborg 6ecf619be9 X86: Fold tail calls into conditional branches also for 64-bit (PR26302)
This extends the optimization in r280832 to also work for 64-bit. The only
quirk is that we can't do this for 64-bit Windows (yet).

Differential Revision: https://reviews.llvm.org/D24423

llvm-svn: 281113
2016-09-09 22:37:27 +00:00
Matt Arsenault 0efdd06b22 AMDGPU: Run LoadStoreVectorizer pass by default
llvm-svn: 281112
2016-09-09 22:29:28 +00:00
Simon Pilgrim a3d1e03cd7 [X86][XOP] Fix VPERMIL2PD mask creation on 32-bit targets
Use getConstVector helper to correctly create v2i64/v4i64 constants on 32-bit targets

llvm-svn: 281105
2016-09-09 21:47:21 +00:00
Michael Kuperstein b1153848ff [X86] Regenerate test. NFC.
llvm-svn: 281099
2016-09-09 21:36:17 +00:00
Arnold Schwaighofer 7d7b4b4014 Create phi nodes for swifterror values at the end of the phi instructions list
ISel makes assumptions about the order of phi nodes.

rdar://28190150

llvm-svn: 281095
2016-09-09 21:18:47 +00:00
Justin Lebar b5e884976b [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.
Summary:
Previously these only worked via NVPTX-specific intrinsics.

This change will allow us to convert these target-specific intrinsics
into the general LLVM versions, allowing existing LLVM passes to reason
about their behavior.

It also gets us some minor codegen improvements as-is, from situations
where we canonicalize code into one of these llvm intrinsics.

Reviewers: majnemer

Subscribers: llvm-commits, jholewinski, tra

Differential Revision: https://reviews.llvm.org/D24300

llvm-svn: 281092
2016-09-09 21:07:26 +00:00
Wei Ding 06f8d39424 AMDGPU: Fix incorrect data type for the mqsad_u32_u8 instruction.
Differential Revision: http://reviews.llvm.org/D23700

llvm-svn: 281081
2016-09-09 19:31:51 +00:00
Tom Stellard b2869eb6e9 AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA
Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24405

llvm-svn: 281080
2016-09-09 19:28:00 +00:00
Chris Dewhurst c59f7c745b [Sparc][LEON] Removed the parts of the errata fixes implemented using inline assembly, as this is not the desired behaviour for end-users. Small change to a unit test to implement this without requiring the inline assembly.
llvm-svn: 281047
2016-09-09 14:16:51 +00:00
James Molloy 57d9dfa9ac [ARM] ADD with a negative offset can become SUB for free
So model that directly in TTI::getIntImmCost().

llvm-svn: 281044
2016-09-09 13:35:36 +00:00
James Molloy 1454e90f86 [ARM] icmp %x, -C can be lowered to a simple ADDS or CMN
Tell TargetTransformInfo about this so ConstantHoisting is informed.

llvm-svn: 281043
2016-09-09 13:35:28 +00:00
Simon Pilgrim 153b408433 [SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type
Fixes issue with rL280927 identified by Mikael Holmén

llvm-svn: 281042
2016-09-09 13:31:52 +00:00
James Molloy 4d86bed0bb [Thumb] Select (CMPZ X, -C) -> (CMPZ (ADDS X, C), 0)
The CMPZ #0 disappears during peepholing, leaving just a tADDi3, tADDi8 or t2ADDri. This avoids having to materialize the expensive negative constant in Thumb-1, and allows a shrinking from a 32-bit CMN to a 16-bit ADDS in Thumb-2.
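
A hedged IR sketch of the pattern (names invented):

  define i1 @isneg5(i32 %x) {
    %c = icmp eq i32 %x, -5
    ret i1 %c
  }

Rather than materializing -5 and comparing against it, this can select an
ADDS of #5 and test the Z flag, since x == -5 iff x + 5 == 0.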

llvm-svn: 281040
2016-09-09 12:52:24 +00:00
Tim Northover 25d1286e5a GlobalISel: remove G_TYPE and G_PHI
These instructions were only necessary when type information was stored in the
MachineInstr (because only generic MachineInstrs possessed a type). Now that
it's in MachineRegisterInfo, COPY and PHI work fine.

llvm-svn: 281037
2016-09-09 11:47:31 +00:00
Tim Northover 0f140c769a GlobalISel: move type information to MachineRegisterInfo.
We want each register to have a canonical type, which means the best place to
store this is in MachineRegisterInfo rather than on every MachineInstr that
happens to use or define that register.

Most changes following from this are pretty simple (you need an MRI anyway if
you're going to be doing any transformations, so just check the type there).
But legalization doesn't really want to check redundant operands (when, for
example, a G_ADD only ever has one type) so I've made use of MCInstrDesc's
operand type field to encode these constraints and limit legalization's work.

As an added bonus, more validation is possible, both in MachineVerifier and
MachineIRBuilder (coming soon).

llvm-svn: 281035
2016-09-09 11:46:34 +00:00
Simon Dardis ba92b034bf Revert "[mips] Fix c.<cc>.<fmt> instruction definition."
This reverts commit r281022. The Mips buildbot broke due to the unhandled
register class FCC.

llvm-svn: 281033
2016-09-09 11:06:01 +00:00
Sam Kolton a2e5c88baf [AMDGPU] Assembler: rename amd_kernel_code_t asm names according to spec
Summary:
Also removed duplicate code from AMDGPUTargetAsmStreamer.
This change only changes how amd_kernel_code_t is parsed and printed. No variable names are changed.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D24296

llvm-svn: 281028
2016-09-09 10:08:02 +00:00
James Molloy 0f41227b21 [Thumb1] Teach optimizeCompareInstr about thumb1 compares
This avoids us doing a completely unneeded "cmp r0, #0" after a flag-setting instruction if we only care about the Z or C flags.

Add LSL/LSR to the whitelist while we're here and add testing. This code could really do with a spring clean.

llvm-svn: 281027
2016-09-09 09:51:06 +00:00
Simon Dardis 8efa979029 [mips] Fix c.<cc>.<fmt> instruction definition.
As part of this effort, remove MipsFCmp nodes and use tablegen
patterns rather than custom lowering through C++.

Unexpectedly, this improves codesize for microMIPS as previous floating
point setcc expansions would materialize 0 and 1 into GPRs before using
the relevant mov[tf].[sd] instruction. Now $zero is used directly.

Reviewers: dsanders, vkalintiris, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D23118

llvm-svn: 281022
2016-09-09 09:22:52 +00:00
Chris Dewhurst ddad6e028e [Sparc][LEON] Added a unit test for the CASA instruction supported by some LEON processors.
llvm-svn: 281021
2016-09-09 09:08:13 +00:00
Craig Topper 149e6bdc16 [AVX-512] Add VPCMP instructions to the load folding tables and make them commutable.
llvm-svn: 281013
2016-09-09 01:36:10 +00:00
Craig Topper 1a1ac11625 [AVX-512] Add more integer vector comparison tests with loads. Some of these show opportunities where we can commute to fold loads.
Commutes will be added in a followup commit.

llvm-svn: 281012
2016-09-09 01:36:04 +00:00
Michael Kuperstein ceff022800 [X86] Add more baseline tests for "irregular" shuffles. NFC.
This adds more tests for shuffles where the output width does not match
the input width and/or the output is generated from more than two inputs.

llvm-svn: 281005
2016-09-09 00:49:29 +00:00
Hans Wennborg c39ef776fc Win64: Don't use REX prefix for direct tail calls
The REX prefix should be used on indirect jmps, but not direct ones.
For direct jumps, the unwinder looks at the offset to determine if
it's inside the current function.

Differential Revision: https://reviews.llvm.org/D24359

llvm-svn: 281003
2016-09-08 23:35:10 +00:00
Krzysztof Parzyszek a1218728d3 [RDF] Further improve handling of multiple phis reached from shadows
llvm-svn: 280987
2016-09-08 20:48:42 +00:00
Krzysztof Parzyszek a696b1b641 [Hexagon] Expand sext- and zextloads of vector types, not just extloads
Recent change exposed this issue, breaking the Hexagon buildbots.

llvm-svn: 280973
2016-09-08 17:42:14 +00:00
Matt Arsenault be90f70d3a AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32
llvm-svn: 280972
2016-09-08 17:35:41 +00:00
Matt Arsenault bbb47da8a1 AMDGPU: Support commuting with immediate in src0
llvm-svn: 280970
2016-09-08 17:19:29 +00:00
Renato Golin 049f387112 Revert "[XRay] ARM 32-bit no-Thumb support in LLVM"
And associated commits, as they broke the Thumb bots.

This reverts commit r280935.
This reverts commit r280891.
This reverts commit r280888.

llvm-svn: 280967
2016-09-08 17:10:39 +00:00
James Molloy c6a6144966 [SDAGBuilder] Don't create a binary tree for switches in minsize mode
This bloats codesize - all of the non-leaf nodes are extra code.

llvm-svn: 280932
2016-09-08 13:12:22 +00:00
James Molloy 753c18f5c0 [Thumb1] AND with a constant operand can be converted into BIC
So model the cost of materializing the constant operand C as the minimum of
the costs of C and ~C.
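
A small sketch of why (names invented): for

  define i32 @clearlow(i32 %x) {
    %r = and i32 %x, -256     ; C = 0xffffff00, ~C = 0xff
    ret i32 %r
  }

C itself is expensive to materialize in Thumb1, but ~C fits an 8-bit
immediate, so the AND can be emitted as a MOVS of #255 followed by BICS.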

llvm-svn: 280929
2016-09-08 12:58:12 +00:00
James Molloy 7c7255e40b [Thumb1] Fix cost calculation for complemented immediates
Materializing something like "-3" can be done as 2 instructions:
  MOV r0, #2
  MVN r0, r0

This has a cost of 2, not 3. It looks like we were already trying to detect this pattern in TII::getIntImmCost(), but were taking the complement of the zero-extended value instead of the sign-extended value which is unlikely to ever produce a number < 256.

There were no tests failing after changing this... :/

llvm-svn: 280928
2016-09-08 12:58:04 +00:00
Simon Pilgrim cc7b4b511b [SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.

This is an initial step towards determining the sign bits of a vector (PR29079).

Differential Revision: https://reviews.llvm.org/D24253

llvm-svn: 280927
2016-09-08 12:57:51 +00:00
Simon Pilgrim a01ee07a19 [DAGCombiner] Enable AND combines of splatted constant vectors
Allow AND combines to use a vector splatted constant as well as a constant scalar.

Preliminary part of D24253.

llvm-svn: 280926
2016-09-08 12:36:39 +00:00
Pablo Barrio 2b7ed1339c Revert "[ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)"
This reverts commit r280808.

It is possible that this change results in an infinite loop. This
is causing timeouts in some tests on ARM, and a Chromebook bot is
failing.

llvm-svn: 280918
2016-09-08 10:05:57 +00:00
Michael Kuperstein f79af6f8c4 [CGP] Be less conservative about tail-duplicating a ret to allow tail calls
CGP tail-duplicates rets into blocks that end with a call that feeds the ret.
This puts the call in tail position, potentially allowing the DAG builder to
lower it as a tail call. To avoid tail duplication in cases where we won't
form the tail call, CGP tried to predict whether this is going to be possible,
and avoids doing it when lowering as a tail call will definitely fail.
However, it was being too conservative by always throwing away calls to
functions with a signext/zeroext attribute on the return type.

Instead, we can use the same logic the builder uses to determine whether the
attributes work out.
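
A minimal IR sketch of the shape this affects (names invented); once the ret
is duplicated into %a and %b, both calls are in tail position, and the
zeroext attribute no longer blocks that as long as it matches the caller's:

  declare zeroext i8 @g()
  declare zeroext i8 @h()

  define zeroext i8 @f(i1 %c) {
  entry:
    br i1 %c, label %a, label %b
  a:
    %x = call zeroext i8 @g()
    br label %exit
  b:
    %y = call zeroext i8 @h()
    br label %exit
  exit:
    %r = phi i8 [ %x, %a ], [ %y, %b ]
    ret i8 %r
  }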

Differential Revision: https://reviews.llvm.org/D24315

llvm-svn: 280894
2016-09-08 00:48:37 +00:00
Dean Michael Berris 17d94e279e [XRay] ARM 32-bit no-Thumb support in LLVM
This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:

1. https://reviews.llvm.org/D23932 (Clang test)
2. https://reviews.llvm.org/D23933 (compiler-rt)

Differential Revision: https://reviews.llvm.org/D23931

llvm-svn: 280888
2016-09-08 00:19:04 +00:00
Elena Demikhovsky dcc86d5bb6 Shift-left (ISD::SHL) operation crashes in the "DAG Legalization" phase.
https://llvm.org/bugs/show_bug.cgi?id=29058.

While legalizing a node we try to legalize its operands.
If an operand node is replaced during legalization the user node may be destroyed.

Differential Revision: https://reviews.llvm.org/D24244

llvm-svn: 280862
2016-09-07 20:54:33 +00:00
Krzysztof Parzyszek 2db0c8b75f [RDF] Fix liveness analysis for phi nodes with shadow uses
Shadow uses need to be analyzed together, since each individual shadow
will only have a partial reaching def. All shadows together may cover
a given register ref, while each individual shadow may not.

llvm-svn: 280855
2016-09-07 20:37:05 +00:00
Wei Mi b96ebe6a1a Rename test pr30298.ll to shrink_vmul_sse.ll, to make the name more meaningful, NFC.
Add the PR number and a comment in pr30298.ll to explain what it is testing.

llvm-svn: 280843
2016-09-07 18:46:15 +00:00
Wei Mi f100d4e93d Don't reduce the width of vector mul if the target doesn't support SSE2.
The patch is to fix PR30298, which is caused by rL272694. The solution is to
bail out if the target has no SSE2.

Differential Revision: https://reviews.llvm.org/D24288

llvm-svn: 280837
2016-09-07 18:22:17 +00:00
Hans Wennborg 6c45f9233c Add more triples to the conditional-tailcall.ll test
llvm-svn: 280835
2016-09-07 18:19:31 +00:00
Saleem Abdulrasool 02d9851c1c CodeGen: ensure that libcalls are always AAPCS CC
The original commit was too aggressive about marking LibCalls as AAPCS.  The
libcalls contain libc/libm/libunwind calls which are not AAPCS, but C.

llvm-svn: 280833
2016-09-07 17:56:09 +00:00
Hans Wennborg 75e25f6812 X86: Fold tail calls into conditional branches where possible (PR26302)
When branching to a block that immediately tail calls, it is possible to fold
the call directly into the branch if the call is direct and there is no stack
adjustment, saving one byte.

Example:

  define void @f(i32 %x, i32 %y) {
  entry:
    %p = icmp eq i32 %x, %y
    br i1 %p, label %bb1, label %bb2
  bb1:
    tail call void @foo()
    ret void
  bb2:
    tail call void @bar()
    ret void
  }

before:

  f:
          movl    4(%esp), %eax
          cmpl    8(%esp), %eax
          jne     .LBB0_2
          jmp     foo
  .LBB0_2:
          jmp     bar

after:

  f:
          movl    4(%esp), %eax
          cmpl    8(%esp), %eax
          jne     bar
  .LBB0_1:
          jmp     foo

I don't expect any significant size savings from this (on a Clang bootstrap I
saw 288 bytes), but it does make the code a little tighter.

This patch only does 32-bit, but 64-bit would work similarly.

Differential Revision: https://reviews.llvm.org/D24108

llvm-svn: 280832
2016-09-07 17:52:14 +00:00
Yaxun Liu 638914009a AMDGPU: Add hidden kernel arguments to runtime metadata
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden arguments should be included in the runtime metadata. Also updated the kernel argument kind metadata.

Differential Revision: https://reviews.llvm.org/D23424

llvm-svn: 280829
2016-09-07 17:44:00 +00:00
Simon Pilgrim d311311beb Fix typo in test - it should be masking bits 0-15, not bit 16
llvm-svn: 280816
2016-09-07 15:19:07 +00:00
Simon Pilgrim 32cfa5ba83 [X86][SSE] Added or combine tests for known bits of vectors
Part of the yak shaving for D24253

llvm-svn: 280813
2016-09-07 14:49:50 +00:00
Simon Pilgrim 65cdc058b6 [X86][SSE] Added and+or+zext combine tests for known bits of vectors
Part of the yak shaving for D24253

llvm-svn: 280810
2016-09-07 14:00:52 +00:00
Simon Pilgrim 2415144425 [X86][SSE] Added and+or combine tests currently failing with vectors
(and (or x, C), D) -> D if (C & D) == D

Part of the yak shaving for D24253

llvm-svn: 280809
2016-09-07 13:40:03 +00:00
Pablo Barrio fc752bb70a [ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)
Summary:
This saves a library call to __aeabi_uidivmod. However, the
processor must feature hardware division in order to benefit from
the transformation.

Reviewers: scott-0, jmolloy, compnerd, rengolin

Subscribers: t.p.northover, compnerd, aemerson, rengolin, samparker, llvm-commits

Differential Revision: https://reviews.llvm.org/D24133

llvm-svn: 280808
2016-09-07 12:49:15 +00:00
Vasileios Kalintiris 1ed49fd384 [mips] Disable the TImode shift libcalls for 32-bit targets.
Summary:
The o32 ABI does not support the TImode helpers. For the time being,
disable just the shift libcalls as they break recursive builds on MIPS.

Reviewers: sdardis

Subscribers: llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D24259

llvm-svn: 280798
2016-09-07 10:01:18 +00:00
Hal Finkel 42c83f131e [PowerPC] Fix address-offset folding for plain addi
When folding an addi into a memory access that can take an immediate offset, we
were implicitly assuming that the existing offset was zero. This was incorrect.
If we're dealing with an addi with a plain constant, we can add it to the
existing offset (assuming that doesn't overflow the immediate, etc.), but if we
have anything else (i.e. something that will become a relocation expression),
we'll go back to requiring the existing immediate offset to be zero (because we
don't know what the requirements on that relocation expression might be - e.g.
maybe it is paired with some addis in some relevant way).

On the other hand, when dealing with a plain addi with a regular constant
immediate, the alignment restrictions (from the TOC base pointer, etc.) are
irrelevant.

I've added the test case from PR30280, which demonstrated the bug, but also
demonstrates a missed optimization opportunity (i.e. we don't need the memory
accesses at all).

Fixes PR30280.

llvm-svn: 280789
2016-09-07 07:36:11 +00:00
Elena Demikhovsky f0ddd1b8b5 AVX512F: FMA intrinsic + FNEG - sequence optimization
The previous commit (r280368 - https://reviews.llvm.org/D23313) does not cover the AVX-512F (KNL) set.
The FNEG(x) operation is lowered to (bitcast (vpxor (bitcast x), (bitcast constfp(0x80000000)))).
This happens because FP XOR is not supported for 512-bit data types on KNL, so we use integer XOR instead.
I added a pattern match for the integer XOR.

Differential Revision: https://reviews.llvm.org/D24221

llvm-svn: 280785
2016-09-07 06:54:28 +00:00
Craig Topper b880ad3a71 [AVX-512] Add support for commuting masked instructions in findCommutedOpIndices. The default implementation doesn't skip the mask input or the preserved input.
llvm-svn: 280781
2016-09-07 04:46:11 +00:00
Saleem Abdulrasool a7ade33d16 Revert "CodeGen: ensure that libcalls are always AAPCS CC"
This reverts SVN r280683.  Revert until I figure out why this is breaking lli
tests.

llvm-svn: 280778
2016-09-07 03:17:19 +00:00
Hal Finkel 8ca2ed22b2 [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).

Fixes PR30276.

For convenience, here's the commit message from r246507, which explains the
problem in greater detail:

[DAGCombine] Fixup SETCC legality checking

SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))
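     (for instance, (setcc a, b, setlt) or (setcc a, b, seteq) can fold to (setcc a, b, setle))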

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 280767
2016-09-06 23:02:23 +00:00
Konstantin Zhuravlyov 864718c666 [AMDGPU] Wave and register controls
- Add missing test

llvm-svn: 280749
2016-09-06 20:29:10 +00:00
Konstantin Zhuravlyov 1d65026ca6 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Wei Ding 5e832e866e AMDGPU : Add XNACK feature to GPUs that support it.
Differential Revision: http://reviews.llvm.org/D24276

llvm-svn: 280742
2016-09-06 19:55:17 +00:00
Krzysztof Parzyszek 7c9b012629 [RDF] Ignore undef use operands
llvm-svn: 280717
2016-09-06 17:03:13 +00:00
Simon Pilgrim 1b4462b7c1 [SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ), Idx ) -> In
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.

This has come up in several x86 shuffle lowering cases where we are crossing 128-bit lanes.

Differential Revision: https://reviews.llvm.org/D24254

llvm-svn: 280715
2016-09-06 16:42:05 +00:00
Simon Dardis b432a3ed7e [mips] Tighten FastISel restrictions
LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower
arguments assuming that it was using the paired 32-bit registers to
perform operations for f64. This mode of operation is not supported
for MIPSR6.

This patch resolves the reported issue by adding additional checks
for unsupported floating point unit configuration.

Thanks to mike.k for reporting this issue!

Reviewers: seanbruno, vkalintiris

Differential Review: https://reviews.llvm.org/D23795

llvm-svn: 280706
2016-09-06 12:36:24 +00:00
Krzysztof Parzyszek 020ec299bf [PPC] Claim stack frame before storing into it, if no red zone is present
Unlike PPC64, PPC32/SVR4 does not have a red zone. In its absence, there is
no guarantee that this part of the stack will not be modified by an
interrupt. To avoid this, make sure to claim the stack frame first before
storing into it.

This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24093

llvm-svn: 280705
2016-09-06 12:30:00 +00:00
Craig Topper 4fa3b50fc3 [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast.
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.

llvm-svn: 280696
2016-09-06 06:56:59 +00:00
Craig Topper cf9f1b8dfa [AVX-512] Add a test case to show that we don't select masked vpermi2ps when the index operand comes from a bitcast.
It doesn't work because we're looking for a bitcast from the v4i32 index operand to v4f32 for the passthru part of the DAG. But since the index is bitcasted from v2i64 and bitcasts fold, we actually have a bitcast from v2i64 to v4f32 in the passthru part of the DAG.

Taken from optimized output from clang's test case.

llvm-svn: 280695
2016-09-06 05:45:27 +00:00
Saleem Abdulrasool bfa25bd1ac ARM: workaround bundled operation predication
This is a Windows ARM specific issue.  If the code path in the if conversion
ends up using a relocation which will form a IMAGE_REL_ARM_MOV32T, we end up
with a bundle to ensure that the mov.w/mov.t pair is not split up.  This is
normally fine, however, if the branch is also predicated, then we end up trying
to predicate the bundle.

For now, report a bundle as being unpredicatable.  Although this is false, this
would trigger a failure case previously anyways, so this is no worse.  That is,
there should not be any code which would previously have been if converted and
predicated which would not be now.

Under certain circumstances, it may be possible to "predicate the bundle".  This
would require scanning all bundle instructions, and ensure that the bundle
contains only predicatable instructions, and converting the bundle into an IT
block sequence.  If the bundle is larger than the maximal IT block length (4
instructions), it would require materializing multiple IT blocks from the single
bundle.

llvm-svn: 280689
2016-09-06 04:00:12 +00:00
Craig Topper 62d0a5e7d3 [AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets.
llvm-svn: 280684
2016-09-06 00:31:10 +00:00
Saleem Abdulrasool a6519b1d54 CodeGen: ensure that libcalls are always AAPCS CC
All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM
AAPCS VFP CC hosts.  Tweak the default initialisation to ARM AAPCS CC rather
than C CC for ARM/thumb targets.

The changes to the tests are necessary to ensure that the calling convention for
the lowered library calls are honoured.  Furthermore, these adjustments cause
certain branch invocations to change to branch-and-link since the returned value
needs to be moved across registers (d0 -> r0, r1).

llvm-svn: 280683
2016-09-06 00:28:43 +00:00
Craig Topper dfc4fc9f02 [AVX-512] Teach fastisel load/store handling to use EVEX encoded instructions for 128/256-bit vectors and scalar single/double.
Still need to fix the register classes to allow the extended range of registers.

llvm-svn: 280682
2016-09-05 23:58:40 +00:00
Craig Topper 70e1348031 [X86] Update fast-isel store test to have more 256 and 512-bit test cases. Add command lines for AVX and AVX512 feature sets.
llvm-svn: 280681
2016-09-05 23:58:37 +00:00
Craig Topper f54ebca2dd [X86] Update fast-isel vector load test to have more 256 and 512-bit test cases. Add a command line for SKX features too.
llvm-svn: 280680
2016-09-05 23:58:32 +00:00
Simon Pilgrim 6dd4b3526a [X86][SSE] Add test cases for PR29078
'Failure to recognise i64 sitofp/uitofp conversions that can be performed as i32'

llvm-svn: 280671
2016-09-05 18:11:17 +00:00
Simon Pilgrim efab350967 [X86][SSE] Add test cases for PR29079
'Failure to recognise uitofp conversions that can be performed as sitofp'

llvm-svn: 280670
2016-09-05 18:04:38 +00:00
Simon Pilgrim 5f14ff75ec [X86][SSE] Regenerate odd shuffle tests with common prefixes
llvm-svn: 280661
2016-09-05 14:15:38 +00:00
Igor Breger a2f8ca9a34 [AVX512] Fix v8i1/v16i1 zext + bitcast lowering pattern. Explicitly zero upper bits.
Differential Revision: http://reviews.llvm.org/D23983

llvm-svn: 280650
2016-09-05 08:26:51 +00:00
Craig Topper d9ca3d97ef [AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator to not use the extended XMM/YMM register space.
Previously we were widening these copies to the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available.

llvm-svn: 280648
2016-09-05 06:43:06 +00:00
Craig Topper 42b69dd961 [X86] Add AVX and AVX512 command lines to the vec_ss_load_fold test.
llvm-svn: 280645
2016-09-05 02:20:53 +00:00
Simon Pilgrim cded03f163 [X86] Regenerate x64 mmx/f64 return value tests
llvm-svn: 280634
2016-09-04 18:14:45 +00:00
Craig Topper 4177345d7f [AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR.
llvm-svn: 280633
2016-09-04 18:13:33 +00:00
Simon Pilgrim 128047fde5 [X86] Regenerate trunc-store legalization test
llvm-svn: 280631
2016-09-04 17:50:03 +00:00
Simon Pilgrim c228f5c195 [X86][SSE] Regenerate fcmp/uitofp combine tests
llvm-svn: 280629
2016-09-04 17:16:01 +00:00
Igor Breger 7e2a0dfa0c revert r279960.
https://llvm.org/bugs/show_bug.cgi?id=30249

llvm-svn: 280625
2016-09-04 14:03:52 +00:00
Simon Pilgrim 9a36318c54 EOL fixes
llvm-svn: 280624
2016-09-04 13:30:46 +00:00
Hal Finkel 73390c7acd [PowerPC] Zero-extend constants in FastISel
As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which
are illegal types promoted to i32 on PowerPC, is a choice constrained by
assumptions within the infrastructure. Specifically, the logic in
FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI
operands will be zero extended, and so, at least when materializing constants
that are PHI operands, we must do the same.

The rest of our fast-isel implementation does not appear to depend on the fact
that we were sign-extending i8/i16 constants, and all other targets also appear
to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we
had been doing this only for i1 constants, and sign-extending the others).
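
A hedged sketch of the constraint (names invented); i16 is promoted to i32
on PowerPC, and ComputePHILiveOutRegInfo assumes the constant operand below
is zero-extended in the promoted register:

  define i32 @use(i1 %c, i16 %v) {
  entry:
    br i1 %c, label %a, label %b
  a:
    br label %m
  b:
    br label %m
  m:
    %p = phi i16 [ -1, %a ], [ %v, %b ]
    %e = zext i16 %p to i32
    ret i32 %e
  }

Fast-isel must therefore materialize the -1 as 0x0000ffff, not 0xffffffff.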

Fixes PR27721.

llvm-svn: 280614
2016-09-04 06:07:19 +00:00
Craig Topper af0d63d2e7 [AVX-512] Remove masked integer add/sub/mull intrinsics and upgrade to native IR.
llvm-svn: 280611
2016-09-04 02:09:53 +00:00
Xinliang David Li 241e6c7086 [Profile] preserve branch metadata lowering select in CGP
CGP currently drops select's MD_prof profile data when
generating conditional branches, which can lead to bad
code layout. The patch fixes the issue.

Differential Revision: http://reviews.llvm.org/D24169

llvm-svn: 280600
2016-09-03 21:26:36 +00:00
Craig Topper 907b580d72 [AVX-512] Add integer ADD/SUB instructions to load folding tables. Add an AVX512 stack folding test.
llvm-svn: 280593
2016-09-03 17:20:07 +00:00
Nicolai Haehnle 3bba6a8438 AMDGPU: Reduce the duration of whole-quad-mode
Summary:
This contains two changes that reduce the time spent in WQM, with the
intention of reducing bandwidth required by VMEM loads:

1. Sampling instructions by themselves don't need to run in WQM, only their
   coordinate inputs need it (unless of course there is a dependent sampling
   instruction). The initial scanInstructions step is modified accordingly.

2. When switching back from WQM to Exact, switch back as soon as possible.
   This affects the logic in processBlock.

This should always be a win or at best neutral.

There are also some cleanups (e.g. remove unused ExecExports) and some new
debugging output.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22092

llvm-svn: 280590
2016-09-03 12:26:38 +00:00
Nicolai Haehnle a246dccc26 AMDGPU: Fix an interaction between WQM and polygon stippling
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.

The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.

Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23131

llvm-svn: 280589
2016-09-03 12:26:32 +00:00
Matt Arsenault 2510a31677 AMDGPU: Fix spilling of m0
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.

llvm-svn: 280584
2016-09-03 06:57:55 +00:00
Craig Topper 892ce56901 [AVX-512] Add EVEX encoded VPCMPEQ and VPCMPGT to the load folding tables.
llvm-svn: 280581
2016-09-03 04:37:50 +00:00
Wei Mi c37307a5f4 Fix buildbot error.
Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86.

llvm-svn: 280568
2016-09-03 01:43:28 +00:00
Hal Finkel 7b104d4721 [PowerPC] For larger offsets, when possible, fold offset into addis toc@ha
When we have an offset into a global, etc. that is accessed relative to the TOC
base pointer, and the offset is larger than the minimum alignment of the global
itself and the TOC base pointer (which is 8-byte aligned), we can still fold
the @toc@ha into the memory access, but we must update the addis instruction's
symbol reference with the offset as the symbol addend. When there is only one
use of the addi to be folded and only one use of the addis that would need its
symbol's offset adjusted, then we can make the adjustment and fold the @toc@l
into the memory access.

llvm-svn: 280545
2016-09-02 21:37:07 +00:00
Jan Vesely ea45746d5a AMDGPU/R600: EXTRACT_VECTOR_ELT should only bypass BUILD_VECTOR if the vectors have the same number of elements.
Fixes R600 piglit regressions since r280298

Differential Revision: https://reviews.llvm.org/D24174

llvm-svn: 280535
2016-09-02 20:13:19 +00:00
Krzysztof Parzyszek 3bf4aeccd6 Do not consider subreg defs as reads when computing subrange liveness
Subregister definitions are considered uses for the purpose of tracking
liveness of the whole register. At the same time, when calculating live
interval subranges, subregister defs should not be treated as uses.

Differential Revision: https://reviews.llvm.org/D24190

llvm-svn: 280532
2016-09-02 19:48:55 +00:00
Jan Vesely 00864886f4 AMDGPU/R600: Expand unaligned writes to local and global AS
LOCAL and GLOBAL AS only
PRIVATE needs special treatment

Differential Revision: https://reviews.llvm.org/D23971

llvm-svn: 280526
2016-09-02 19:07:06 +00:00
Jan Vesely cd6b12b12e AMDGPU: Reorganize store tests
Split by AS.
Merge with some previously failing tests.

Differential Revision: https://reviews.llvm.org/D23969

llvm-svn: 280523
2016-09-02 18:52:28 +00:00
Kyle Butt 8699921c4b IfConversion: Fix bug introduced by rescanning diamonds.
It was passing the wrong values for predicate-clobbering, which was simple to miss.
Added an assert to make this easier to catch in the future.

llvm-svn: 280517
2016-09-02 18:29:26 +00:00
Andrea Di Biagio fd503e5af3 [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).

The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.
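
A hedged vector example (names invented) of a case the old bound got wrong:

  define <4 x i16> @t(<4 x i32> %x) {
    %s = shl <4 x i32> %x, <i32 20, i32 20, i32 20, i32 20>
    %t = trunc <4 x i32> %s to <4 x i16>
    ret <4 x i16> %t
  }

Here vt is v4i16, so vt.size is 64 and the old K < 32 check allowed sinking
the truncate, producing a 20-bit shift of 16-bit elements, which is
undefined. The corrected K < 16 check rejects it.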

Differential Revision: https://reviews.llvm.org/D24154

llvm-svn: 280482
2016-09-02 11:29:09 +00:00
Craig Topper ad79bf4712 [AVX-512] Move tests for masked floating point logical operations to avx512dqvl-intrinsics-upgrade.ll since they have now been autoupgraded.
llvm-svn: 280467
2016-09-02 06:11:31 +00:00
Craig Topper 45d6503089 [AVX-512] Add more patterns for masked and broadcasted logical operations where the select or broadcast has a floating point type.
These are needed in order to remove the masked floating point logical operation intrinsics and use native IR.

llvm-svn: 280465
2016-09-02 05:29:13 +00:00
Craig Topper 00aecd97bf [AVX-512] Add execution domain fixing for logical operations with broadcast loads. This builds on the handling of masked ops since we need to keep element size the same.
llvm-svn: 280464
2016-09-02 05:29:09 +00:00
Hal Finkel 5ef4b03106 [PowerPC] hasAndNotCompare should return true
As Sanjay suggested when he added the hook, PPC should return true from
hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a
compare).
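
A hedged sketch of the pattern the hook favours (names invented):

  define i1 @masked_off(i32 %a, i32 %b) {
    %nb = xor i32 %b, -1       ; ~b
    %r  = and i32 %a, %nb      ; a & ~b: a single "andc" on PPC
    %c  = icmp eq i32 %r, 0    ; the record form "andc." can feed CR0
    ret i1 %c
  }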

Fixes PR27203.

llvm-svn: 280457
2016-09-02 02:58:25 +00:00
Hal Finkel a39fd4bc53 [PowerPC] Add a pattern for a runtime bit check
Following a suggestion by Sanjay, we should lower:

  %shl = shl i32 1, %y
  %and = and i32 %x, %shl
  %cmp = icmp eq i32 %and, %shl
  ret i1 %cmp

into:

  subfic r4, r4, 32
  rlwnm r3, r3, r4, 31, 31

Add this pattern and some associated patterns for the 64-bit case and the
not-equal case. Fixes PR27356.

llvm-svn: 280454
2016-09-02 02:34:44 +00:00
Hal Finkel b54579fab6 [PowerPC] Don't apply the PPC64 address-formation peephole for offsets greater than 7
When applying our address-formation PPC64 peephole, we are reusing the @ha TOC
addis value with the low parts associated with different offsets (i.e.
different effective symbol addends). We were assuming this was okay so long as
the offsets were less than the alignment of the global variable being accessed.
This ignored the fact, however, that the TOC base pointer itself need only be
8-byte aligned. As a result, what we were doing is legal only for offsets less
than 8 regardless of the alignment of the object being accessed.

Fixes PR28727.

llvm-svn: 280441
2016-09-02 00:28:20 +00:00
Hal Finkel 1e8218cc09 [PowerPC] Don't consider fusion in PPC64 address-formation peephole
The logic in this function assumes that the P8 supports fusion of addis/addi,
but it does not. As a result, there is no advantage to restricting our peephole
application, merging addi instructions into dependent memory accesses, even
when the addi has multiple users, regardless of whether or not we're optimizing
for size.

We might need something like this again for the P9; I suspect we'll revisit
this code when we work on P9 tuning.

llvm-svn: 280440
2016-09-02 00:27:50 +00:00
Michael Kuperstein 7bc54cebea [Legalizer] Don't throw away false low half when expanding GT/LT SETCC
When expanding a SETCC for which the low half is known to evaluate to false,
we can only throw it away for LT/GT comparisons, not LE/GE.
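
For instance, expanding an unsigned i64 "a <= b" on a 32-bit target as
(a.hi < b.hi) or (a.hi == b.hi and a.lo <= b.lo): if a.lo = 5 and b.lo = 3,
the low-half result is known false, but replacing the whole expansion with
"a.hi <= b.hi" would wrongly return true whenever the high halves are equal;
only a strict "a.hi < b.hi" is correct. For LT/GT the strict high-half
compare and the original condition coincide, so the low half can be dropped.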

This fixes PR29170.

Differential Revision: https://reviews.llvm.org/D24151

llvm-svn: 280424
2016-09-01 23:02:32 +00:00
Michael Kuperstein 5f17d08f49 [SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.

This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
                       (concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)

llvm-svn: 280418
2016-09-01 21:32:09 +00:00
Heejin Ahn c0f18172f5 [WebAssembly] Add asm.js-style setjmp/longjmp handling for wasm (reland r280302)
Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism.

Reviewers: jpp, dschuff

Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D24121

llvm-svn: 280415
2016-09-01 21:05:15 +00:00
Tim Northover 8d8812c5d7 GlobalISel: add a G_PHI instruction to give phis a type.
They're another source of generic vregs, which are going to need a type on the
definition when we remove the register width from MachineRegisterInfo.

llvm-svn: 280412
2016-09-01 20:45:41 +00:00
Andrey Turetskiy cde38b6a99 [X86] Loosen memory folding requirements for cvtdq2pd and cvtps2pd instructions.
According to the spec, the cvtdq2pd and cvtps2pd instructions don't require the memory
operand to be aligned to 16 bytes. This patch removes this requirement from the memory folding table.

Differential Revision: https://reviews.llvm.org/D23919

llvm-svn: 280402
2016-09-01 18:50:02 +00:00
Yaxun Liu add05a8d95 AMDGPU: Add runtime metadata for pointee alignment of argument.
Add runtime metadata for pointee alignment of pointer-type kernel arguments. The key is KeyArgPointeeAlign and the value is a 32-bit unsigned integer.

Differential Revision: https://reviews.llvm.org/D24145

llvm-svn: 280399
2016-09-01 18:46:49 +00:00
Michael Kuperstein 65bc3c89ff [DAGCombine] Don't fold a trunc if it feeds an anyext
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).

To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).

Differential Revision: https://reviews.llvm.org/D23893

llvm-svn: 280386
2016-09-01 17:59:24 +00:00
Simon Dardis bd27154757 [mips] interAptiv based generic schedule model
This scheduler model describes a generic processor covering all MIPS ISAs,
based around the interAptiv and P5600 timings.

Reviewers: vkalintiris, dsanders

Differential Revision: https://reviews.llvm.org/D23551

llvm-svn: 280374
2016-09-01 14:53:53 +00:00
Krzysztof Parzyszek 07d9f53b51 [Hexagon] Deal with undefs when extending live intervals
Reapply r280275, since MSVC accepts r280358.

llvm-svn: 280369
2016-09-01 13:59:35 +00:00
Elena Demikhovsky 4d7738dfde Optimized FMA intrinsic + FNEG, like
-(a*b+c)

and FNEG + FMA, like
a*b-c or (-a)*b+c.
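
Assuming the usual x86 FMA3 naming, the intended selections are roughly:

  -(a*b+c)   ->  VFNMSUB   (computes -(a*b) - c)
  a*b-c      ->  VFMSUB    (computes (a*b) - c)
  (-a)*b+c   ->  VFNMADD   (computes -(a*b) + c)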

The bug description is here: https://llvm.org/bugs/show_bug.cgi?id=28892

Differential revision: https://reviews.llvm.org/D23313

llvm-svn: 280368
2016-09-01 13:58:53 +00:00
James Molloy 88cad7e5cf [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280364
2016-09-01 12:58:13 +00:00
Hal Finkel 5081ac27c7 Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:

  ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)

where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- we could do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but it is better to lower the CFA construct
itself directly, since it can be easily represented as a fixed-offset FrameIndex). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.

This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.
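
For reference, a minimal usage sketch of the intrinsic being lowered
(assuming the i8* signature current at the time):

  declare i8* @llvm.eh.dwarf.cfa(i32)

  define i8* @cfa() {
    %p = call i8* @llvm.eh.dwarf.cfa(i32 0)
    ret i8* %p
  }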

Fixes PR26761.

Differential Revision: https://reviews.llvm.org/D24038

llvm-svn: 280350
2016-09-01 10:28:47 +00:00
Hal Finkel 40d7f5c277 Add a counter-function insertion pass
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.

Because we don't want the presence of these functions to affect optimization,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.

Differential Revision: https://reviews.llvm.org/D22825

llvm-svn: 280347
2016-09-01 09:42:39 +00:00
Dean Michael Berris e8ae5baaf7 [XRay] Detect and emit sleds for sibling/tail calls
Summary:
This change promotes the 'isTailCall(...)' member function to
TargetInstrInfo as a query interface for determining on a per-target
basis whether a given MachineInstr is a tail call instruction. We build
upon this in the XRay instrumentation pass to emit special sleds for
tail call optimisations, where we emit the correct kind of sled.

The tail call sleds look like a mix between the function entry and
function exit sleds. Form-wise, the sled comes before the "jmp"
instruction that implements the tail call similar to how we do it for
the function entry sled. Functionally, because we know this is a tail
call, it behaves much like an exit sled -- i.e. at runtime we may use
the exit trampolines instead of a different kind of trampoline.

A follow-up change to recognise these sleds will be done in compiler-rt,
so that we can start intercepting these initially as exits, but also
have the option to have different log entries to more accurately reflect
that this is actually a tail call.

Reviewers: echristo, rSerge, majnemer

Subscribers: mehdi_amini, dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D23986

llvm-svn: 280334
2016-09-01 01:29:13 +00:00
Heejin Ahn 10a7086700 Revert "Add asm.js-style setjmp/longjmp handling for wasm"
This reverts commit r280302, it broke the integration tests.

llvm-svn: 280329
2016-09-01 00:44:37 +00:00
Heejin Ahn 23d57103a4 Add asm.js-style setjmp/longjmp handling for wasm
Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism.

Reviewers: jpp, dschuff

Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D23928

llvm-svn: 280302
2016-08-31 22:40:34 +00:00
Reid Kleckner 109448ee81 Revert "Add an optional parameter with a list of undefs to extendToIndices"
This reverts commit r280268, it causes all MSVC 2013 to ICE. This
appears to have been fixed in a later MSVC 2013 update, because I cannot
reproduce it locally. That said, all upstream LLVM bots are broken right
now, so I am reverting.

Also reverts dependent change r280275, "[Hexagon] Deal with undefs when
extending live intervals".

llvm-svn: 280301
2016-08-31 22:36:02 +00:00
Matt Arsenault b50eb8dc2b AMDGPU: Fix introducing stack access on unaligned v16i8
llvm-svn: 280298
2016-08-31 21:52:27 +00:00
Tim Northover 11a2354670 GlobalISel: use G_TYPE to annotate physregs with a type.
More preparation for dropping source types from MachineInstrs: registers coming
out of already-selected code (i.e. non-generic instructions) don't have a type,
but that information is needed so we must add it manually.

This is done via a new G_TYPE instruction.

llvm-svn: 280292
2016-08-31 21:24:02 +00:00
Derek Schuff 1b258d313c [WebAssembly] Disable folding of GA+reg into load/store constant offsets
Summary:
If the register has a negative value then unsigned overflow will occur;
this case is sometimes even created intentionally by LSR. For now
disable GA+reg folding. Fixes PR29127
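
A small, self-contained illustration of the overflow (all values made up):

    #include <cstdint>
    #include <cstdio>

    int main() {
      uint32_t GA = 1024;  // hypothetical global address
      int32_t Reg = -4;    // negative index, e.g. produced by LSR
      // Wasm computes the address as an unsigned 32-bit add and traps on
      // wraparound, so folding GA into the constant offset is unsound:
      uint64_t Wide = (uint64_t)(uint32_t)Reg + GA;
      printf("wraps: %d\n", Wide > UINT32_MAX); // prints 1
      return 0;
    }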

Differential Revision: https://reviews.llvm.org/D24053

llvm-svn: 280285
2016-08-31 20:27:20 +00:00
Quentin Colombet bd850f4185 Actually check for the diagnostic to be emitted!
This makes the test case in r280273 actually useful!

llvm-svn: 280276
2016-08-31 18:53:32 +00:00
Krzysztof Parzyszek e21a0b3b9f [Hexagon] Deal with undefs when extending live intervals
llvm-svn: 280275
2016-08-31 18:52:09 +00:00
Tom Stellard ba5730884b AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte aligned
Summary: This fixes some OpenCV tests that were broken by libclc commit r276443.

Reviewers: arsenm, jvesely

Subscribers: arsenm, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24051

llvm-svn: 280274
2016-08-31 18:46:07 +00:00
Quentin Colombet 1c06a73a7c [TargetPassConfig] Add a hook to tell whether GlobalISel should warn on fallback.
Thanks to this patch, we now have a way to easily see if GlobalISel
failed.

llvm-svn: 280273
2016-08-31 18:43:04 +00:00
Philip Reames 2b1084ac93 [statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't do so along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.
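
As a hedged sketch of the attribute-based configuration mentioned above
(the attribute name and value here are assumptions, not quoted from the
patch):

    #include "llvm/IR/Function.h"
    using namespace llvm;

    // e.g. declare void @__llvm_deoptimize(...) "deopt-lowering"="live-in"
    static bool wantsLiveInDeoptLowering(const Function &F) {
      return F.getFnAttribute("deopt-lowering").getValueAsString() ==
             "live-in";
    }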

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
2016-08-31 15:12:17 +00:00
Simon Pilgrim 6199b4fd49 [X86][SSE] Improve awareness of (v)cvtpd2ps implicit zeroing of upper 64-bits of xmm result
Associate x86_sse2_cvtpd2ps with X86ISD::VFPROUND to avoid inserting unnecessary zeroing shuffles.

Differential Revision: https://reviews.llvm.org/D23797

llvm-svn: 280249
2016-08-31 15:09:34 +00:00
Sjoerd Meijer 46b5b88387 Clang patch r280064 introduced ways to set the FP exception and denormal
types. This is the LLVM counterpart: it adds options that map onto the FP
exception and denormal build attributes, allowing better FP math library
selection.

Differential Revision: https://reviews.llvm.org/D24070

llvm-svn: 280246
2016-08-31 14:17:38 +00:00
Krzysztof Parzyszek 64488118bf Fixed spill stack objects are mutable
Differential Revision: https://reviews.llvm.org/D24039

llvm-svn: 280244
2016-08-31 13:52:17 +00:00
James Molloy 76c9d423a7 Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches"
This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280233
2016-08-31 13:16:45 +00:00
Nikolay Haustov eba808957e AMDGPU/SI: Handle aliases in AMDGPUAlwaysInlinePass
Summary:
Simply replace uses of aliases to functions with the aliasee. This came
up when bitcode linking to the builtin library left calls to aliases
unresolved.

Also made minor improvements to existing test.

Reviewers: tstellarAMD, alex-t, vpykhtin

Subscribers: arsenm, wdng, rampitec

Differential Revision: https://reviews.llvm.org/D24023

llvm-svn: 280221
2016-08-31 11:18:33 +00:00
James Molloy c53b40b509 [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now that it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has side effects for the sake of argument; if something is safe to speculate we could indeed sink it nevertheless, but that cannot happen in the general case and would cause many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280217
2016-08-31 10:46:33 +00:00
Simon Pilgrim 7b09af193a [X86][SSE] Improve awareness of fptrunc implicit zeroing of upper 64-bits of xmm result
Add patterns to avoid inserting unnecessary zeroing shuffles when lowering fptrunc to (v)cvtpd2ps

Differential Revision: https://reviews.llvm.org/D23797

llvm-svn: 280214
2016-08-31 10:35:13 +00:00
Craig Topper 8f6827c945 [AVX-512] Add patterns to select masked logical operations if the select has a floating point type.
This is needed in order to replace the masked floating point logical op intrinsics with native IR.

llvm-svn: 280195
2016-08-31 05:37:52 +00:00
Craig Topper 0f8fb47637 [AVX-512] Add test cases for masked floating point logic operations with bitcasts between the logic ops and the select. We don't currently select masked operations for these cases.
Test cases taken from optimized clang output after trying to convert the masked floating point logical op intrinsics to native IR.

llvm-svn: 280194
2016-08-31 05:37:50 +00:00
Craig Topper de8b1a0012 [X86] Regenerate a test using update_llc_test_checks.py.
llvm-svn: 280193
2016-08-31 05:37:47 +00:00
Dean Michael Berris 047669f18c [XRay] Support multiple return instructions in a single basic block
Add a .mir test to catch this case, and fix the xray-instrumentation
pass to handle it appropriately.

llvm-svn: 280192
2016-08-31 05:20:08 +00:00
Hal Finkel 97a189c716 [PowerPC] Don't spill the frame pointer twice
When a function contains something, such as inline asm, which explicitly
clobbers the register used as the frame pointer, don't spill it twice. If we
need a frame pointer, it will be saved/restored in the prologue/epilogue code.
Explicitly spilling it again will reuse the same spill slot used by the
prologue/epilogue code, thus clobbering the saved value. The same applies
to the base-pointer or PIC-base register.

Partially fixes PR26856. Thanks to Ulrich for his analysis and the small
inline-asm reproducer.
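
A hedged sketch of the idea (simplified, not the literal patch):

    // Once a frame pointer is known to be required, drop it from the
    // callee-saved set so the only save/restore is the one emitted by
    // the prologue/epilogue code.
    void PPCFrameLowering::determineCalleeSaves(MachineFunction &MF,
                                                BitVector &SavedRegs,
                                                RegScavenger *RS) const {
      TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
      if (hasFP(MF)) {
        SavedRegs.reset(PPC::X31); // 64-bit frame pointer
        SavedRegs.reset(PPC::R31); // 32-bit frame pointer
      }
      // The same reasoning applies to the base pointer / PIC base reg.
    }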

llvm-svn: 280188
2016-08-31 00:52:03 +00:00
Tim Northover 991b12bf09 GlobalISel: combine extracts & sequences created for legalization
Legalization ends up creating many G_SEQUENCE/G_EXTRACT pairs which leads to
inefficient codegen (even for -O0), so add a quick pass over the function to
remove them again.

llvm-svn: 280155
2016-08-30 20:51:25 +00:00
Matt Arsenault a609e2d5ce AMDGPU: Relax SGPR asm constraint register class
The 's' constraint should map to SReg_32 to be as general as possible. This can avoid a copy
from m0.

llvm-svn: 280154
2016-08-30 20:50:08 +00:00
Tim Northover e5102de678 GlobalISel: forbid physical registers on generic MIs.
We're intending to move to a world where the type of a register is determined
by its (unique) def. This is incompatible with physregs, which are untyped.

It also means the other passes don't have to worry quite so much about
register-class compatibility and inserting COPYs appropriately.

llvm-svn: 280132
2016-08-30 18:52:46 +00:00
Hal Finkel 18d0e3f44c [PowerPC] Force entry alignment in .got2
Implement Bill's suggested fix for 32-bit targets for PR22711 (for the
alignment of each entry). As pointed out in the bug report, we could just force
the section alignment, since we only add pointer-sized things currently, but
this fix is somewhat more future-proof.

llvm-svn: 280049
2016-08-30 01:43:38 +00:00
Hal Finkel b074a608ce [PowerPC] Add support for -mlongcall
The "long call" option forces the use of the indirect calling sequence for all
calls (even those that don't really need it). GCC provides this option; it is
helpful, under certain circumstances, for building very large binaries and
some other specialized use cases.

Fixes PR19098.

llvm-svn: 280040
2016-08-30 00:59:23 +00:00
Hal Finkel a819cda059 [PowerPC] Add triple to test/CodeGen/PowerPC/atomic-2.ll for ppc64le
Otherwise, running the test on Darwin systems will not work.

llvm-svn: 280034
2016-08-30 00:22:22 +00:00
Hal Finkel 3d70a9dbb7 [PowerPC] Fix i8/i16 atomics for little-Endian targets without partword atomics
For little-Endian PowerPC, we generally target only P8 and later by default.
However, generic (older) 64-bit configurations are still an option, and in that
case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics
without true i8/i16 atomic operations, we emulate using i32 atomics in
combination with a bunch of shifting and masking, etc. The amount by which to
shift in little-Endian mode is different from the amount in big-Endian mode (it
is inverted -- meaning we can leave off the xor when computing the amount).
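
In other words (sketch only, not the emitted code):

    // Shift amount (in bits) to place an i8 inside its aligned i32 word.
    unsigned shiftAmountBits(unsigned Addr, bool IsLittleEndian) {
      unsigned ByteInWord = Addr & 3;   // byte index within the word
      if (IsLittleEndian)
        return ByteInWord * 8;          // no inversion needed
      return (ByteInWord ^ 3) * 8;      // big-Endian: xor inverts the index
    }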

Fixes PR22923.

llvm-svn: 280022
2016-08-29 22:25:36 +00:00
Krzysztof Parzyszek 354832e585 Propagate TBAA info in SelectionDAG::getIndexedLoad
Patch by Pranav Bhandarkar.

llvm-svn: 279998
2016-08-29 19:50:15 +00:00
Tom Stellard 0d23ebe888 AMDGPU/SI: Implement a custom MachineSchedStrategy
Summary:
GCNSchedStrategy re-uses most of GenericScheduler; it just uses
a different method to compute the excess and critical register
pressure limits.

It's not enabled by default; to enable it you need to pass -misched=gcn
to llc.

Shader DB stats:

32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23688

llvm-svn: 279995
2016-08-29 19:42:52 +00:00
Tom Stellard c2ff0eb697 AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler
Summary:
The SILoadStoreOptimizer can now look ahead more than one instruction when
looking for instructions to merge, which greatly increases the number of
loads/stores that we are able to merge.

Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate.  It also gives more consistent results, since it is no longer
affected by minor scheduling changes.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23814

llvm-svn: 279991
2016-08-29 19:15:22 +00:00
Tim Northover edb3c8ccb8 GlobalISel: legalize frem to a libcall on AArch64.
llvm-svn: 279988
2016-08-29 19:07:16 +00:00
Matt Arsenault b90fc9b3b4 AMDGPU/R600: Fix fixups used for constant arrays
Fixes bug 29289

llvm-svn: 279986
2016-08-29 19:01:48 +00:00
Kyle Butt 092c4dd5b6 IfConversion: Fix branch predication bug.
This bug shows up with diamonds that share unpredicable, unanalyzable branches.
There's an included test case from Hexagon. What was happening was that we were
attempting to predicate the branch instruction despite the fact that it was
checked to be the same. Now for unanalyzable branches we skip over the branch
instructions when predicating the block.

Differential Revision: https://reviews.llvm.org/D23939

llvm-svn: 279985
2016-08-29 18:27:12 +00:00
Reid Kleckner cfec5ff1b9 Make vec_fabs.ll pass with MSVC 2013
We should revert this change once we drop support for MSVC 2013.

llvm-svn: 279979
2016-08-29 16:35:43 +00:00
Sanjay Patel b57d0a2fda [TargetLowering] remove fdiv and frem from canOpTrap() (PR29114)
Assuming the default FP env, we should not treat fdiv and frem any differently
in terms of trapping behavior than any other FP op. I.e., FP ops do not trap
with the default FP env.

This matches how we treat these ops in IR with isSafeToSpeculativelyExecute(). There's a 
similar bug in Constant::canTrap().

This bug manifests in PR29114:
https://llvm.org/bugs/show_bug.cgi?id=29114
...as a sequence of scalar divisions instead of a vector division on x86 for a <3 x float> 
type.
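
A simplified sketch of the hook after this change:

    // Only integer division/remainder remain "trapping"; FP ops,
    // including fdiv and frem, do not trap with the default FP env.
    bool TargetLoweringBase::canOpTrap(unsigned Op, EVT VT) const {
      if (!VT.isInteger())
        return false;
      switch (Op) {
      case ISD::SDIV:
      case ISD::UDIV:
      case ISD::SREM:
      case ISD::UREM:
        return true;
      default:
        return false;
      }
    }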

Differential Revision: https://reviews.llvm.org/D23974

llvm-svn: 279970
2016-08-29 13:32:41 +00:00
Krzysztof Parzyszek 0a955d6dcb Do not use MRI::getMaxLaneMaskForVReg as a mask covering whole register
MRI::getMaxLaneMaskForVReg does not always cover the whole register.
For example, on X86 the upper 16 bits of EAX cannot be accessed via
any subregister. Consequently, there is no lane mask that only covers
that part of EAX. getMaxLaneMaskForVReg will return the union of
the lane masks for all subregisters, and in the case of EAX, that union
will not cover the upper 16 bits.

This fixes https://llvm.org/bugs/show_bug.cgi?id=29132

llvm-svn: 279969
2016-08-29 13:15:35 +00:00
Tom Stellard 5d3f71f721 AMDGPU/SI: Improve register allocation hints for sopk instructions
Summary:
For shrinking SOPK instructions, we were creating a hint to tell the
register allocator to use the register allocated for src0 for the dst
operand as well.  However, this sometimes fails, depending on the order
in which virtual registers are assigned physical registers.

To fix this, I've added a second allocation hint which does the reverse,
asks that the register allocated for dst is used for src0.
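
Sketched (simplified, using the current hint API):

    #include "llvm/CodeGen/MachineRegisterInfo.h"
    using namespace llvm;

    static void hintBothWays(MachineRegisterInfo &MRI, Register Dst,
                             Register Src0) {
      MRI.setRegAllocationHint(Dst, /*Type=*/0, Src0);  // dst prefers src0
      MRI.setRegAllocationHint(Src0, /*Type=*/0, Dst);  // src0 prefers dst
    }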

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23862

llvm-svn: 279968
2016-08-29 13:06:10 +00:00
Rafael Espindola 412a529551 Use the correct ctor/dtor section for dynamic-no-pic.
llvm-svn: 279967
2016-08-29 12:47:22 +00:00
Igor Breger 24281b4740 Fixed a bug in type legalizer for masked gather.
The problem occurs when the node is not updated in place: UpdateNodeOperation() returns a node that already exists.
In that case an assertion fails in PromoteIntegerOperand(), because N has 2 results (value + chain).

Differential Revision: http://reviews.llvm.org/D23756

llvm-svn: 279961
2016-08-29 09:12:31 +00:00
Igor Breger 1a388871b9 [AVX512] In some cases the KORTEST instruction may be used instead of a ZEXT + TEST sequence.
Differential Revision: http://reviews.llvm.org/D23490

llvm-svn: 279960
2016-08-29 08:52:52 +00:00
Craig Topper 713085e60a [X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered.
This allows broadcast loads to be used when available.

llvm-svn: 279958
2016-08-29 04:49:31 +00:00
Craig Topper 71584cd0f0 [AVX-512] Add 512-bit fabs tests with and without AVX512DQ.
llvm-svn: 279956
2016-08-29 04:49:24 +00:00
Craig Topper 850feaf3b7 [AVX-512] Add support for selecting 512-bit VPABSB/VPABSW when BWI is available.
llvm-svn: 279951
2016-08-28 22:20:51 +00:00
Craig Topper a47fc6e5b5 [AVX-512] Add testcases showing that we don't emit 512-bit vpabsb/vpabsw. Will be fixed in a future commit.
llvm-svn: 279949
2016-08-28 22:20:45 +00:00
Sanjay Patel cd7d0c6aca [x86] add tests for <3 x N> vector types (PR29114)
llvm-svn: 279939
2016-08-28 18:31:32 +00:00
Simon Pilgrim 5369cd9e9c [X86][AVX512] Only combine EVEX targets shuffles to shuffles of the same number of vector elements
Over-eager combining prevents the correct folding of writemasks.

At the moment this occurs for ALL EVEX shuffles; in the future we need to check that the user of the root shuffle is a VSELECT that can fold to a writemask.

llvm-svn: 279934
2016-08-28 17:27:14 +00:00
Hal Finkel 5728200f33 [PowerPC] Implement lowering for atomicrmw min/max/umin/umax
Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.

llvm-svn: 279933
2016-08-28 16:17:58 +00:00
Craig Topper abe80cc04d [AVX-512] Promote AND/OR/XOR to v2i64/v4i64/v8i64 even when we have AVX512F/AVX512VL.
Previously we weren't creating masked logical operations if bitcasts appeared between the logic operation and the select. The IR optimizers can move bitcasts across logic operations and create these cases. To minimize the number of cases we need to handle, this change promotes all logic ops to an i64 vector type just like when only SSE or AVX is available.

Unfortunately, this also has the consequence of making it difficult to select unmasked VPANDD/VPORD/VPXORD in all the cases it was previously used. This is the cause of most of the test changes. This shouldn't result in any functional change though.

llvm-svn: 279929
2016-08-28 06:06:28 +00:00
Craig Topper 8046e2033e [AVX-512] Add tests to show that we don't select masked logic ops if there are bitcasts between the logic op and the select.
This is taken from optimized IR of clang test cases for masked logic ops.

llvm-svn: 279928
2016-08-28 06:06:24 +00:00
Jan Vesely 38814fa2fd AMDGPU/R600: Enable Load combine
Fix and improve tests

Differential Revision: https://reviews.llvm.org/D23899

llvm-svn: 279925
2016-08-27 19:09:43 +00:00
Craig Topper 144fdef66b [X86] Enable FR32/FR64 cmpeq/cmpne/cmpunord/cmpord to be commuted.
llvm-svn: 279913
2016-08-27 05:22:12 +00:00
Craig Topper 4891c724aa [AVX-512] Add load folding for EVEX vcmpps/pd/ss/sd.
llvm-svn: 279912
2016-08-27 05:22:08 +00:00
Matt Arsenault 2712d4a3d8 AMDGPU: Select mulhi 24-bit instructions
llvm-svn: 279902
2016-08-27 01:32:27 +00:00
Matt Arsenault 22e417956d AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

llvm-svn: 279901
2016-08-27 01:00:37 +00:00
Quentin Colombet 374796d678 [GlobalISel] Add a fallback path to SDISel.
When global-isel fails on a MachineFunction MF, MF will be cleaned up
and given to SDISel.
Thanks to this fallback, we can already perform correctness tests even if
we support only a small portion of the functions in a test.

llvm-svn: 279891
2016-08-27 00:18:31 +00:00
Michael Kuperstein aea50f8b84 [X86] Add baseline test for "odd" shuffles. NFC.
Adds a baseline test for lowering shuffles where the width of the output
vector is not twice the size of the input vectors. Many of those sequences
are suboptimal, and will hopefully be improved in follow-up patches.

llvm-svn: 279888
2016-08-27 00:10:24 +00:00
Tom Stellard e175d8aba5 AMDGPU/SI: Canonicalize offset order for merged DS instructions
Summary:
If the scheduler clusters the loads, then the offsets will be sorted,
but it is possible for the scheduler to schedule loads together
without explicitly clustering them, which would give us non-sorted
offsets.

Also, we will want to do this if we move the load/store optimizer before
the scheduler.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23776

llvm-svn: 279870
2016-08-26 21:36:47 +00:00
Manman Ren 66b54e9f32 Swift Calling Convention: add support for AArch64.
It will just be the same as the regular calling convention.

rdar://28029509

llvm-svn: 279853
2016-08-26 19:28:17 +00:00
Tim Northover 85cf564c51 AArch64: avoid assertion on illegal types in performFDivCombine.
In the code to detect fixed-point conversions and make use of AArch64's special
instructions, we weren't prepared for weird types. The fptosi direction got
fixed recently, but not the similar sitofp code.

llvm-svn: 279852
2016-08-26 18:52:31 +00:00
Chad Rosier 58f505ba24 [AArch64] Avoid materializing constant values when generating csel instructions.
Differential Revision: https://reviews.llvm.org/D23677

llvm-svn: 279849
2016-08-26 18:05:50 +00:00
Tim Northover bc1701c7fb GlobalISel: mark G_FPEXT legal from float to double.
llvm-svn: 279845
2016-08-26 17:46:22 +00:00
Tim Northover 30bd36e3fc GlobalISel: mark G_FCMP legal on float & double.
llvm-svn: 279844
2016-08-26 17:46:19 +00:00
Tim Northover 051b8ad3d9 GlobalISel: simplify G_ICMP legalization regime.
It's unclear how the old

    %res(32) = G_ICMP { s32, s32 } intpred(eq), %0, %1

is actually different from an s1 version

    %res(1) = G_ICMP { s1, s32 } intpred(eq), %0, %1

so we'll remove it for now.

llvm-svn: 279843
2016-08-26 17:46:17 +00:00
Tim Northover cecee56abb GlobalISel: legalize sdiv and srem operations.
llvm-svn: 279842
2016-08-26 17:46:13 +00:00
Tim Northover 7a753d9bec GlobalISel: legalize under-width divisions.
llvm-svn: 279841
2016-08-26 17:46:06 +00:00
Tim Northover 1d18a99a53 GlobalISel: mark selects legal
llvm-svn: 279840
2016-08-26 17:46:03 +00:00
Tim Northover 5d0eaa4e79 GlobalISel: mark float/int conversions legal
llvm-svn: 279839
2016-08-26 17:45:58 +00:00
Chad Rosier 39c1dbb845 [AArch64] Avoid materializing constant 1 by using csinc, rather than csel.
This is similar to what was done in r261675, but for CSINC rather than CSINV.

Differential Revision: https://reviews.llvm.org/D23892

llvm-svn: 279822
2016-08-26 14:01:55 +00:00
Pablo Barrio b8ec630583 Handle empty functions with debug info in load/store opt pass
Summary:
In functions that contained debug info but were empty otherwise,
the ARM load/store optimizer could abort. This was because the
function MergeReturnIntoLDM handled the special case where a
MachineBasicBlock is empty by calling MBB.empty(). However, this
returns false in the presence of debug info, although the block
should be considered empty in the eyes of the load/store optimizer.
This has been fixed by handling the case where searching through the
block finds only debug instructions.
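
The check amounts to something like (sketch):

    #include "llvm/CodeGen/MachineBasicBlock.h"
    using namespace llvm;

    // MBB.empty() is false when the block holds only DBG_VALUEs, so skip
    // debug instructions explicitly when deciding whether it is "empty".
    static bool isEffectivelyEmpty(const MachineBasicBlock &MBB) {
      return MBB.getLastNonDebugInstr() == MBB.end();
    }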

Reviewers: rengolin, dexonsmith, llvm-commits, jmolloy

Subscribers: t.p.northover, aemerson, rengolin, samparker

Differential Revision: https://reviews.llvm.org/D23847

llvm-svn: 279820
2016-08-26 13:00:39 +00:00
Simon Pilgrim 091c4c781c [X86][SSE4A] The EXTRQ/INSERTQ bit extraction/insertion ops should be in the integer domain
llvm-svn: 279811
2016-08-26 09:55:41 +00:00
Craig Topper 8f27f51192 [X86][SSE] Add CMPSS/CMPSD intrinsic scalar load folding support.
llvm-svn: 279806
2016-08-26 07:08:00 +00:00
Matt Arsenault f403df38eb Replace subregister uses when processing tied operands
This was for some reason skipping operands that are subregisters
instead of keeping the same subregister index.

v_movreld_b32 expects src0 to be the subregister of the tied
super register use/def.

e.g.

v_movreld_b32 v0, v9, <imp-def, tied3> v[0:3], <imp-use, tied2> v[0:3]

was being replaced with

v[4:7] = copy v[0:3]
v_movreld_b32 v0, v9, <imp-def, tied3> v[4:7], <imp-use, tied2> v[4:7],

which really writes to v[0:3]

llvm-svn: 279804
2016-08-26 06:31:32 +00:00
Michael Kuperstein 2ee911e985 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289),
due to miscompiles in the test suite. The FastISel change was left in, because
it apparently fixes an unrelated issue.

(Recommit of r279782 which was broken due to a bad merge.)

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279788
2016-08-25 22:48:11 +00:00
Michael Kuperstein 6e271f4ce8 Revert r279782 due to debug buildbot breakage.
llvm-svn: 279785
2016-08-25 22:14:45 +00:00
Michael Kuperstein a6ccc8d365 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 and its follow-ups (r276347, r277289), due to
miscompiles in the test suite. The FastISel change was left in, because it
apparently fixes an unrelated issue.

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279782
2016-08-25 21:55:41 +00:00
Tim Northover fe880a8801 GlobalISel: mark simple ops legal even on types < 32-bit.
The 32-bit variants of these operations don't depend on the bits not being
operated on, so they also naturally model operations narrower than the actual
register width.

llvm-svn: 279760
2016-08-25 17:37:39 +00:00
Tim Northover 7a1ec0141a GlobalISel: mark pointer constants as legal on AArch64.
llvm-svn: 279759
2016-08-25 17:37:35 +00:00
Tim Northover 438c77ca1a GlobalISel: perform multi-step legalization
llvm-svn: 279758
2016-08-25 17:37:32 +00:00
Tim Northover 2c4a838e24 GlobalISel: mark small extends as legal on AArch64
llvm-svn: 279757
2016-08-25 17:37:25 +00:00
Michael Kuperstein 40887c5566 [X86] 512-bit VPAVG requires AVX512BW
Fix VPAVG detection to require AVX512BW, not AVX512F for 512-bit widths,
and change associated asserts to assert in the right direction...

This fixes PR29111.

llvm-svn: 279755
2016-08-25 17:17:46 +00:00
Ron Lieberman c93d123b86 [Hexagon] vector store print tracing.
Add a vector store print tracing option for Hexagon vector instructions.

https://reviews.llvm.org/D23870

llvm-svn: 279739
2016-08-25 13:35:48 +00:00
Simon Pilgrim 3125501bba [X86][AVX] Improved AVX512F/AVX512VL SubVectorBroadcast tests
llvm-svn: 279736
2016-08-25 12:50:13 +00:00
Simon Pilgrim 0ad9f3e93b [X86][AVX] Provide SubVectorBroadcast fallback if load fold fails (PR29133)
Fix for PR29133, matching the approach that was taken for AVX1 scalar broadcasts.

llvm-svn: 279735
2016-08-25 12:45:16 +00:00
Matthias Braun 1eb473680a MachineFunctionProperties/MIRParser: Rename AllVRegsAllocated->NoVRegs, compute it
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register allocation and simply describes that no vregs are
used in a machine function. With that we can simply compute the property
and do not need to dump/parse it in .mir files.
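
The computation is trivial (sketch):

    #include "llvm/CodeGen/MachineFunction.h"
    using namespace llvm;

    // After parsing the MIR body, derive the property instead of reading
    // it from the file.
    static void computeNoVRegs(MachineFunction &MF) {
      if (MF.getRegInfo().getNumVirtRegs() == 0)
        MF.getProperties().set(
            MachineFunctionProperties::Property::NoVRegs);
    }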

Differential Revision: http://reviews.llvm.org/D23850

llvm-svn: 279698
2016-08-25 01:27:13 +00:00
Kyle Butt 90e51b1bef Test: Add REQUIRES: asserts to test that now requires stats.
Test was modified in r279670

llvm-svn: 279690
2016-08-25 00:06:52 +00:00
Krzysztof Parzyszek 6dff336ad1 [Hexagon] Check for block end when skipping debug instructions
llvm-svn: 279681
2016-08-24 22:36:35 +00:00
Matthias Braun a319e2cae0 MIRParser/MIRPrinter: Compute HasInlineAsm instead of printing/parsing it
llvm-svn: 279680
2016-08-24 22:34:06 +00:00
Matthias Braun 5dce48e0a7 Missed a test in my last commit
llvm-svn: 279679
2016-08-24 22:32:11 +00:00
Matthias Braun f1b20c5225 MachineRegisterInfo/MIR: Initialize tracksSubRegLiveness early, do not print/parser it
tracksSubRegLiveness only depends on the Subtarget and a cl::opt, so there
is no need to change it or save/parse it in a .mir file.
Make the field const and move the initialization from LiveIntervalAnalysis to
the MachineRegisterInfo constructor. Also clean up some code and fix some
instances to use MachineRegisterInfo::subRegLivenessEnabled() instead
of TargetSubtargetInfo::enableSubRegLiveness().

llvm-svn: 279676
2016-08-24 22:17:45 +00:00
Kyle Butt a8c7371d16 CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
        %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
        br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
        %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
        %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
        br label %bb

bb:             ; preds = %cond_true, %bb.outer
        %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
        %tmp. = sub i32 0, %b_addr.021.0.ph
        %tmp.40 = mul i32 %indvar, %tmp.
        %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
        %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
        br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
        %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
        %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
        %indvar.next = add i32 %indvar, 1
        br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
        %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
        %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
        br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
        %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
        ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

llvm-svn: 279671
2016-08-24 21:34:27 +00:00
Kyle Butt 6262ca3448 IfConversion: Rescan diamonds.
The cost of predicating a diamond is only the instructions that are not shared
between the two branches. Additionally, if a predicate-clobbering instruction
occurs in the shared portion of the branches (e.g. a cond move), it may still
be possible to if-convert the sub-CFG. This change handles these two facts by
rescanning the non-shared portion of a diamond sub-CFG to recalculate both the
predication cost and whether both blocks are pred-clobbering.

Fixed 2 bugs before recommitting. Branch instructions must be compared and found
identical before diamond conversion. Also, predicate-clobbering instructions in
the shared prefix disqualifies a potential diamond conversion. Includes tests
for both.

llvm-svn: 279670
2016-08-24 21:34:24 +00:00
Changpeng Fang 75f0968b39 AMDGCN/SI: Implement readlane/readfirstlane intrinsics
Summary:
  This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.

Reviewed by:
  arsenm and tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D22489

llvm-svn: 279660
2016-08-24 20:35:23 +00:00
Rafael Espindola 70c6a3976b Use isTargetMachO instead of isTargetDarwin.
llvm-svn: 279655
2016-08-24 19:02:29 +00:00
Simon Pilgrim e14653e17d [X86][SSE] Add MINSD/MAXSD/MINSS/MAXSS intrinsic scalar load folding support
These are no different in load behaviour to the existing ADD/SUB/MUL/DIV scalar ops but were missing from isNonFoldablePartialRegisterLoad

llvm-svn: 279652
2016-08-24 18:40:53 +00:00
Simon Pilgrim 941bd6bbae [X86][SSE] Add support for combining VZEXT_MOVL target shuffles
Includes adding more general support for the pattern: VZEXT_MOVL(VZEXT_LOAD(ptr)) -> VZEXT_LOAD(ptr)

This has unearthed a couple of latent poor codegen issues (MINSS/MAXSS scalar load folding and MOVDDUP/BROADCAST load folding patterns), which will be fixed shortly.

It's also reduced a couple of tests so that they no longer reach the instruction threshold necessary to be combined to PSHUFB (see PR26183).

llvm-svn: 279646
2016-08-24 18:07:53 +00:00
Tim Northover 65f6336ff9 GlobalISel: fix cmp test to be in SSA form
llvm-svn: 279633
2016-08-24 15:37:51 +00:00
Simon Pilgrim 2725217680 [X86][SSE] Regenerate scalar math load folding tests for 32 and 64 bit targets
llvm-svn: 279630
2016-08-24 15:07:11 +00:00
Wei Ding 1041a646a9 AMDGPU : Add V_SAD_U32 instruction pattern.
Differential Revision: http://reviews.llvm.org/D23069

llvm-svn: 279629
2016-08-24 14:59:47 +00:00
Krzysztof Parzyszek a7ed090bba Create subranges for new intervals resulting from live interval splitting
The register allocator can split a live interval of a register into a set
of smaller intervals. After the allocation of registers is complete, the
rewriter will modify the IR to replace virtual registers with the
corresponding physical registers. At this stage, if a register corresponding
to a subregister of a virtual register is used, the rewriter will check
if that subregister is undefined, and if so, it will add the <undef> flag
to the machine operand. The function verifying liveness of the subregister
would assume that it is undefined, unless any of the subranges of the
live interval proves otherwise.
The problem is that the live intervals created during splitting do not
have any subranges, even if the original parent interval did. This could
result in the <undef> flag placed on a register that is actually defined.

Differential Revision: http://reviews.llvm.org/D21189

llvm-svn: 279625
2016-08-24 13:37:55 +00:00
Simon Dardis f114820912 [mips] Preparatory work for a generic scheduler
Extend instruction definitions for nearly all ISAs to include
appropriate instruction itineraries. Change MIPS16's gp prologue
generation to use real instructions instead of using a pseudo
instruction.

Reviewers: dsanders, vkalintiris

Differential Revision: https://reviews.llvm.org/D23548

llvm-svn: 279623
2016-08-24 13:00:47 +00:00
Simon Pilgrim 7a50c8c2ba [X86][AVX2] Ensure on 32-bit targets that we broadcast f64 types not i64 (PR29101)
llvm-svn: 279622
2016-08-24 12:42:31 +00:00
Simon Pilgrim 3c8cd3df5e [X86][F16C] Regenerated f16c tests
llvm-svn: 279621
2016-08-24 11:56:15 +00:00
Matthias Braun 733fe3676c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279602
2016-08-24 01:52:46 +00:00
Matthias Braun 79f85b3b8f MIRParser/MIRPrinter: Compute isSSA instead of printing/parsing it.
Specifying isSSA is an extra line at best and results in invalid MI at
worst. Compute the value instead.

Differential Revision: http://reviews.llvm.org/D22722

llvm-svn: 279600
2016-08-24 01:32:41 +00:00
Richard Smith 8c3fbdc6c4 Revert r279564. It introduces undefined behavior (binding a reference to a
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.

llvm-svn: 279580
2016-08-23 22:08:27 +00:00
Tim Northover d0cfb7344e GlobalISel: add some G_TRUNCs to make icmp test valid MIR.
llvm-svn: 279579
2016-08-23 22:07:31 +00:00
Tim Northover 4bdf473590 GlobalISel: add forgotten test-case for G_ICMP
llvm-svn: 279569
2016-08-23 21:11:36 +00:00
Tim Northover bdf67c9a00 GlobalISel: make truncate/extend casts uniform
They really should have both types represented, but early variants were created
before MachineInstrs could have multiple types so they're rather ambiguous.

llvm-svn: 279567
2016-08-23 21:01:33 +00:00
Tim Northover b3a0be4d38 GlobalISel: legalize conditional branches on AArch64.
llvm-svn: 279565
2016-08-23 21:01:20 +00:00
Matthias Braun 4c1f1f120c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279564
2016-08-23 20:58:29 +00:00
Tim Northover 456a3c03ac GlobalISel: mark pointer casts legal on AArch64.
llvm-svn: 279553
2016-08-23 19:30:38 +00:00
Tim Northover 3c73e367c0 GlobalISel: legalize 1-bit load/store and mark 8/16 bit variants legal on AArch64.
llvm-svn: 279548
2016-08-23 18:20:09 +00:00
Simon Pilgrim 95580d6ed2 [X86][SSE] Demonstrate inability to recognise that the (v)cvtpd2dq & (v)cvttpd2dq intrinsics implicitly zero the upper half of the xmm
llvm-svn: 279527
2016-08-23 16:11:21 +00:00
Krzysztof Parzyszek 38e2ccc8d0 [Hexagon] Packetize return value setup with the return instruction
Commit r279241 unintentionally reverted that ability.

llvm-svn: 279526
2016-08-23 16:01:01 +00:00
Simon Pilgrim 04b99fcded [X86][AVX] Updated fptosi_2f64_to_4i32 test to show missed opportunity to implicitly zero the upper elements
llvm-svn: 279521
2016-08-23 15:10:39 +00:00
Simon Pilgrim 22c415a696 [X86][AVX] Add v2i32 fp to int conversion tests
llvm-svn: 279520
2016-08-23 15:00:52 +00:00
Simon Pilgrim cb96142fb8 [X86][AVX] Add AVX2/AVX512 fp to int conversion tests
llvm-svn: 279518
2016-08-23 14:37:35 +00:00
Simon Pilgrim 2ed547513d [X86][SSE] Demonstrate inability to recognise that the (v)cvtpd2ps intrinsics implicitly zero the upper half of the xmm
llvm-svn: 279511
2016-08-23 11:26:28 +00:00
Simon Pilgrim 9eb978b47b [X86][SSE] Demonstrate inability to recognise that (v)cvtpd2ps implicitly zeroes the upper half of the xmm
llvm-svn: 279508
2016-08-23 10:35:24 +00:00
Oliver Stannard 9aa6f010a4 [ARM] Generate consistent frame records for Thumb2
There is no officially documented ABI for frame pointers in Thumb2,
but we should try to emit something which is useful.

We use r7 as the frame pointer for Thumb code, which currently means
that if a function needs to save a high register (r8-r11), it will get
pushed to the stack between the frame pointer (r7) and link register
(r14). This means that while a stack unwinder can follow the chain of
frame pointers up the stack, it cannot know the offset to lr, so does
not know which functions correspond to the stack frames.

To fix this, we need to push the callee-saved registers in two batches,
with the first push saving the low registers, fp and lr, and the second
push saving the high registers. This is already implemented, but
previously only used for iOS. This patch turns it on for all Thumb2
targets when frame pointers are required by the ABI, and the frame
pointer is r7 (Windows uses r11, so this isn't a problem there). If
frame pointer elimination is enabled we still emit a single push/pop
even if we need a frame pointer for other reasons, to avoid increasing
code size.

We must also ensure that lr is pushed to the stack when using a frame
pointer, so that we end up with a complete frame record. Situations that
could cause this were rare, because we already push lr in most
situations so that we can return using the pop instruction.

Differential Revision: https://reviews.llvm.org/D23516

llvm-svn: 279506
2016-08-23 09:19:22 +00:00
Matthias Braun 7f66202d38 Revert "CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.

This reverts commit r279502.

llvm-svn: 279503
2016-08-23 05:17:11 +00:00
Matthias Braun fd936841eb CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279502
2016-08-23 03:20:09 +00:00
Matt Arsenault 567631bdd4 BranchRelaxation: Fix handling of blocks with multiple conditional
branches

Looping over all terminators exposed AArch64 tests that hit an
assert when analyzeBranch failed. I believe these cases were
miscompiled before.

e.g.
  fcmp s0, s1
  b.ne LBB0_1
  b.vc LBB0_2
  b LBB0_2
LBB0_1:
  ; Large block
LBB0_2:
 ; ...

Both of the individual conditional branches need to
be expanded, since neither can reach the final block.

Split the original block into ones which analyzeBranch
will be able to understand.

llvm-svn: 279499
2016-08-23 01:30:30 +00:00
Matt Arsenault 78fc9daf8d AMDGPU: Split SILowerControlFlow into two pieces
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is that now there may be COPY instructions
which do not have the necessary implicit exec uses
if they are later lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464
2016-08-22 19:33:16 +00:00
James Molloy 5bf2114265 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
[Recommitting now an unrelated assertion in SROA is sorted out]

The new version has several advantages:
  1) IMHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279460
2016-08-22 19:07:15 +00:00
James Molloy 475f4a763f Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279443. It caused buildbot failures.

llvm-svn: 279447
2016-08-22 18:13:12 +00:00
James Molloy 353052698a [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279443
2016-08-22 17:40:23 +00:00
Simon Pilgrim c8ad5c069c [X86][AVX] Don't use SubVectorBroadcast if there are additional users of the chain (PR29088)
We could improve on this by making X86SubVBroadcast a full memory intrinsic similar to X86vzload

llvm-svn: 279441
2016-08-22 16:47:55 +00:00
Simon Pilgrim 2279e59573 [X86][SSE] Avoid specifying unused arguments in SHUFPD lowering
As discussed on PR26491, we are missing the opportunity to make use of the smaller MOVHLPS instruction because we set both arguments of a SHUFPD when using it to lower a single input shuffle.

This patch sets the lowered argument to UNDEF if that shuffle element is undefined. This in turn makes it easier for target shuffle combining to decode UNDEF shuffle elements, allowing combines to MOVHLPS to occur.

A fix to match against MOVHPD stores was necessary as well.

This builds on the improved MOVLHPS/MOVHLPS lowering and memory folding support added in D16956

Adding similar support for SHUFPS will have to wait until have better support for target combining of binary shuffles.

Differential Revision: https://reviews.llvm.org/D23027

llvm-svn: 279430
2016-08-22 12:56:54 +00:00
Guy Blank 9ae797a798 [AVX512][FastISel] Do not use K registers in TEST instructions
In some cases, FastISel was emitting a TEST instruction with a K-register input, which is illegal.
Changed to use KORTEST when dealing with K registers.

Differential Revision: https://reviews.llvm.org/D23163

llvm-svn: 279393
2016-08-21 08:02:27 +00:00
Duncan P. N. Exon Smith 8f44c98d04 ARM: Avoid dereferencing end() in ARMFrameLowering::emitEpilogue
This fixes the crash from PR29072, where the MachineBasicBlock::iterator
wasn't being properly checked against MachineBasicBlock::end() before
iterating.  This was another bug exposed by the new
ilist::iterator::operator*() assertion from r279314.

This testcase is poor quality.  bugpoint couldn't reduce any further,
and I haven't had time to dig into what's going on so I can't invent a
better one.  I didn't even get good CHECK lines in: this is just a
crasher.

I'm committing anyway since this is a real crash with an obvious fix,
but I'll leave PR29072 open and ask an ARM maintainer to help improve
the testcase.

llvm-svn: 279391
2016-08-21 00:08:10 +00:00
Simon Pilgrim 636422a898 [X86][SSE] Regenerate 32-bit buildvector test
llvm-svn: 279389
2016-08-20 23:09:57 +00:00
Simon Pilgrim ead5076753 [X86][SSE] Regenerate subvector extraction widening test
llvm-svn: 279388
2016-08-20 22:00:53 +00:00
Simon Pilgrim b65d476549 [X86] Regenerate fp truncate tests
llvm-svn: 279387
2016-08-20 21:56:33 +00:00
Simon Pilgrim 8275583be6 Regenerate test
llvm-svn: 279386
2016-08-20 21:37:30 +00:00
Simon Pilgrim a1142579f1 Regenerate test
llvm-svn: 279385
2016-08-20 21:35:45 +00:00
Simon Pilgrim ae9b81e684 [X86][XOP] Tweak vpermil2pd test to stop it being combined away
llvm-svn: 279384
2016-08-20 21:07:41 +00:00
Simon Pilgrim cb0ba1067f [X86][SSE] Added vector interleave test (PR21281)
llvm-svn: 279375
2016-08-20 17:07:38 +00:00
Tim Northover a11be04769 GlobalISel: support legalization of G_FCONSTANTs
llvm-svn: 279341
2016-08-19 22:40:08 +00:00
Tim Northover ea904f9424 GlobalISel: teach legalizer how to handle integer constants.
llvm-svn: 279340
2016-08-19 22:40:00 +00:00
Matthias Braun a7d6fc9618 MachineFunction: Cleanup/simplify MachineFunctionProperties::print()
- Always compile print() regardless of LLVM_ENABLE_DUMP. (We usually
  only guard dump() functions with that.)
- Only show the set properties to reduce output clutter.
- Remove the unused variant that even shows the unset properties.
- Fix comments

llvm-svn: 279338
2016-08-19 22:31:45 +00:00
Tim Northover b78e4cafde GlobalISel: translate floating-point round/extend
llvm-svn: 279320
2016-08-19 20:48:23 +00:00
Tim Northover d5c23bcfc9 GlobalISel: translate floating-point comparisons
llvm-svn: 279319
2016-08-19 20:48:16 +00:00
Justin Lebar d13880a750 [NVPTX] Switch nvptx-use-infer-addrspace to true.
Summary:
This switches us to use a different, more powerful algorithm for address
space inference.  I've tested this locally and it seems to work great.
Once we're more confident in it, we can remove the old pass altogether.

Reviewers: jingyue

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D23694

llvm-svn: 279317
2016-08-19 20:46:45 +00:00
Reid Kleckner 98a48afa5d Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279229. It breaks intrinsic function calls in
diamonds.

llvm-svn: 279313
2016-08-19 20:22:39 +00:00