Commit Graph

24962 Commits

Author SHA1 Message Date
Craig Topper 875e9f8fa4 [X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead.
While there improve the coverage of the intrinsic testing and add fast-isel tests.

llvm-svn: 335944
2018-06-29 05:43:26 +00:00
Jessica Paquette 0c5d3ffbb8 [MachineOutliner] Never add the outliner in -O0
This is a recommit of r335879.

We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.

This also removes -O0 from the outliner DWARF test.

llvm-svn: 335930
2018-06-28 21:49:24 +00:00
Martin Storsjo 2a9bd7b756 [COFF] Fix constant sharing regression for MinGW
This fixes a regression since SVN r334523, where the object files
built targeting MinGW were rejected by GNU binutils tools. Prior to
that commit, we only put constants in comdat for MSVC configurations.

Differential Revision: https://reviews.llvm.org/D48567

llvm-svn: 335918
2018-06-28 20:28:29 +00:00
Craig Topper 90317d1d94 [X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc.
This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy.

Differential Revision: https://reviews.llvm.org/D48706

llvm-svn: 335895
2018-06-28 17:58:01 +00:00
Jonas Devlieghere b757fc3878 Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models""
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.

  LLVM ERROR: unsupported relocation with subtraction expression, symbol
  '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
  expression

llvm-svn: 335894
2018-06-28 17:56:43 +00:00
Simon Pilgrim 9c70d48cb2 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED)
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.

llvm-svn: 335886
2018-06-28 17:33:41 +00:00
Jessica Paquette d6261bef7b Revert "[MachineOutliner] Add always and never options to -enable-machine-outliner"
I accidentally committed this instead of D48683 because I haven't had coffee
yet.

llvm-svn: 335883
2018-06-28 17:26:19 +00:00
Jessica Paquette f3a44fe833 Revert "[MachineOutliner] Never add the outliner in -O0"
This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091.

It relies on r335872 since that introduces the machine outliner
flags test. I meant to commit D48683 in that commit, but got mixed
up and committed D48682 instead. So, I'm reverting this and
r335872, since D48682 hasn't made it through review yet.

llvm-svn: 335882
2018-06-28 17:26:18 +00:00
Jessica Paquette c9d675266e [MachineOutliner] Never add the outliner in -O0
We shouldn't add the outliner when compiling at -O0 even if
-enable-machine-outliner is passed in. This makes sure that we
don't add it in this case.

This also updates machine-outliner-flags to reflect the change
and improves the comment describing what that test does.

llvm-svn: 335879
2018-06-28 17:05:57 +00:00
Matthias Braun da5e7e11d1 SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Jessica Paquette 1ccb66c5fb [MachineOutliner] Add always and never options to -enable-machine-outliner
To enable the MachineOutliner by default on AArch64, we need to be able to
disable the MachineOutliner and also provide an option to "always" enable the
outliner.

This adds that capability. It allows the user to still use the old
-enable-machine-outliner option, which defaults to "always". This is building
up to allowing the user to specify "always" versus the target-default
outlining behaviour.

llvm-svn: 335872
2018-06-28 16:39:42 +00:00
Haojian Wu 2103990e63 Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV"
This reverts commit r335821.

This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.

llvm-svn: 335871
2018-06-28 16:25:57 +00:00
Stanislav Mekhanoshin 67aa18f165 [AMDGPU] Early expansion of 32 bit udiv/urem
This allows hoisting of a common code, for instance if denominator
is loop invariant. Current change is expansion only, adding licm to
the target pass list going to be a separate patch. Given this patch
changes to codegen are minor as the expansion is similar to that on
DAG. DAG expansion still must remain for R600.

Differential Revision: https://reviews.llvm.org/D48586

llvm-svn: 335868
2018-06-28 15:59:18 +00:00
Stanislav Mekhanoshin 298a61590a [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16
Differential Revision: https://reviews.llvm.org/D48677

llvm-svn: 335866
2018-06-28 15:24:46 +00:00
Sjoerd Meijer c89ca5582a [ARM] Parallel DSP Pass
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.

Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).

Patch by: Sam Parker and Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D48128

llvm-svn: 335850
2018-06-28 12:55:29 +00:00
Matt Arsenault 1fb9013368 AMDGPU: Error on calls from graphics shaders
In principle nothing should stop these from working, but
work is necessary to create an ABI for dealing with the stack
related registers.

llvm-svn: 335829
2018-06-28 10:18:36 +00:00
Matt Arsenault 513e0c0ea4 AMDGPU: Fix assert on aggregate type kernel arguments
Just fix the crash for now by not doing the optimization since
figuring out how to properly convert the bits for an arbitrary
struct is a pain.

Also fix a crash when there is only an empty struct argument.

llvm-svn: 335827
2018-06-28 10:18:11 +00:00
Simon Pilgrim abebe4c746 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

llvm-svn: 335821
2018-06-28 09:54:28 +00:00
Sameer AbuAsal 9b65ffb097 [RISCV] Add machine function pass to merge base + offset
Summary:
   In r333455 we added a peephole to fix the corner cases that result
   from separating base + offset lowering of global address.The
   peephole didn't handle some of the cases because it only has a basic
   block view instead of a function level view.

   This patch replaces that logic with a machine function pass. In
   addition to handling the original cases it handles uses of the global
   address across blocks in function and folding an offset from LW\SW
   instruction. This pass won't run for OptNone compilation, so there
   will be a negative impact overall vs the old approach at O0.

Reviewers: asb, apazos, mgrang

Reviewed By: asb

Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones

Differential Revision: https://reviews.llvm.org/D47857

llvm-svn: 335786
2018-06-27 20:51:42 +00:00
Fangrui Song 5dc371a7a6 [WebAssembly] Try fixing test/CodeGen/WebAssembly/vector_sdiv.ll
llvm-svn: 335771
2018-06-27 19:35:50 +00:00
Daniel Sanders bdeb880d14 [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64
Now that we have the ability to legalize based on MMO's. Add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.

Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.

llvm-svn: 335767
2018-06-27 19:03:21 +00:00
Sanjay Patel d052de856d [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.

Differential Revision: https://reviews.llvm.org/D48085

llvm-svn: 335761
2018-06-27 18:16:40 +00:00
Jessica Paquette f472f6159a [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.

This teaches the outliner to check whether or not a specific canidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.

https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655

llvm-svn: 335758
2018-06-27 17:43:27 +00:00
Craig Topper 812fcb35e7 [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position
If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index.

Fixes PR37938.

llvm-svn: 335754
2018-06-27 16:47:39 +00:00
Craig Topper 069628b4df [X86] Add test cases for D48606.
llvm-svn: 335753
2018-06-27 16:47:36 +00:00
Simon Pilgrim 8a02b25313 [X86][SSE] Add missing AVX512 rotation tests
Increase coverage to make sure we're not doing anything stupid without AVX512BW

llvm-svn: 335746
2018-06-27 16:00:53 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Stanislav Mekhanoshin 1a1687f1bb [AMDGPU] Convert rcp to rcp_iflag
If a source of rcp instruction is a result of any conversion from
an integer convert it into rcp_iflag instruction. No FP exception
can ever happen except division by zero if a single precision rcp
argument is a representation of an integral number.

Differential Revision: https://reviews.llvm.org/D48569

llvm-svn: 335742
2018-06-27 15:33:33 +00:00
Luke Geeson 316327150b [AArch64] Reverting FP16 vcvth_n_s64_f16 to fix
llvm-svn: 335737
2018-06-27 14:34:40 +00:00
Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extended the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectores. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjust the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Ivan A. Kosarev 7231598fce [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439

llvm-svn: 335733
2018-06-27 13:57:52 +00:00
Simon Pilgrim d3e583a52d [DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion
For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs

llvm-svn: 335727
2018-06-27 12:45:31 +00:00
Simon Pilgrim 41afbcb9ca [X86][SSE] Include MIN_SIGNED element in non-uniform SDIV pow2 tests
llvm-svn: 335721
2018-06-27 10:59:36 +00:00
Simon Pilgrim dfbcc66adc [DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0)
Fixes PR37569.

llvm-svn: 335719
2018-06-27 10:21:06 +00:00
Simon Pilgrim 0a566bc0ae [DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector expansion (PR37569)
llvm-svn: 335717
2018-06-27 09:41:22 +00:00
Luke Geeson 68cb233c0f [AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns
llvm-svn: 335715
2018-06-27 09:20:13 +00:00
Simon Pilgrim c9e60adcb5 [X86] Add test for SDIV by sign bit (minsigned) value
llvm-svn: 335671
2018-06-26 22:03:00 +00:00
Jessica Paquette 67599c2e1e [X86][AsmParser] Recommit r335658
Recommit of r335658 so that it does not change the behaviour of any
existing error output.

llvm-svn: 335668
2018-06-26 21:30:34 +00:00
Jessica Paquette 0a80af0761 Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode"
This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b.

It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly.

llvm-svn: 335660
2018-06-26 20:57:19 +00:00
Jessica Paquette 0e40d4bfc3 [X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode
Right now, when we use RIP-relative instructions in 32-bit mode, we'll just
assert and crash.

This adds an error message which tells the user that they can't do that in
32-bit mode, so that we don't crash (and also can see the issue outside of
assert builds).

llvm-svn: 335658
2018-06-26 20:33:46 +00:00
Stanislav Mekhanoshin dacda79ee6 [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic
This intrinsic selects v_mad_f32 regardless of fp32 denorm support.

Differential Revision: https://reviews.llvm.org/D48573

llvm-svn: 335654
2018-06-26 20:04:19 +00:00
Matt Arsenault 8c4a35237a AMDGPU: Add pass to lower kernel arguments to loads
This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem alltogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed  on them as the equivalent !range
metadata is not valid on pointer  typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

llvm-svn: 335650
2018-06-26 19:10:00 +00:00
Brendon Cahoon b7169c435a [Hexagon] Add a "generic" cpu
Add the generic processor for Hexagon so that it can be used
with 3rd party programs that create a back-end with the
"generic" CPU. This patch also enables the JIT for Hexagon.

Differential Revision: https://reviews.llvm.org/D48571

llvm-svn: 335641
2018-06-26 18:44:05 +00:00
Simon Pilgrim 7f55af37f4 [DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion (PR37119)
Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported.

llvm-svn: 335637
2018-06-26 17:46:51 +00:00
Simon Pilgrim 1576df53a9 [X86][SSE] Add another sdiv by (nonuniform) minus one test (PR37119)
Include a test that divides by -1 but not by 1 (another special case)

llvm-svn: 335629
2018-06-26 17:06:05 +00:00
Krzysztof Parzyszek 70f027022c Account for undef values from predecessors in extendSegmentsToUses
It is legal for a PHI node not to have a live value in a predecessor
as long as the end of the predecessor is jointly dominated by an undef
value.

llvm-svn: 335607
2018-06-26 14:37:16 +00:00
Than McIntosh 3190993a02 [X86,ARM] Retain split-stack prolog check for sibling calls
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).

This fixes PR37807.

Reviewers: cherry, javed.absar

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48444

llvm-svn: 335604
2018-06-26 14:11:30 +00:00
Dan Gohman 910ba33d0c [WebAssembly] Fix lowering of varargs functions with non-legal fixed arguments.
CallLoweringInfo's NumFixedArgs field gives the number of fixed arguments
before legalization. The ISD::OutputArg "Outs" array holds legalized
arguments, so when indexing into it to find the non-fixed arguemn, we need
to use the number of arguments after legalization.

Fixes PR37934.

llvm-svn: 335576
2018-06-26 03:18:38 +00:00
Craig Topper c42ed4e3c4 [X86] Use XOR for SUB (C, X) during isel if will help fold an immediate
Summary:
Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining.

This seems like the safest way to do this trick. And we consider doing it as a DAG combine later.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48557

llvm-svn: 335575
2018-06-26 03:11:15 +00:00
Craig Topper 689e363ff2 [X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction.
This recommits r335562 and 335563 as a single commit.

The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the sigature of the builtin that software expects.

By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away.

llvm-svn: 335568
2018-06-26 01:37:02 +00:00