Commit Graph

12061 Commits

Author SHA1 Message Date
Craig Topper 95eb88abfe [X86] Add support for combining FMSUB/FNMADD/FNMSUB ISD nodes with an fneg input.
Previously we could only negate the FMADD opcodes. This used to be mostly ok when we lowered FMA intrinsics during lowering. But with the move to llvm.fma from target specific intrinsics, we can combine (fneg (fma)) to (fmsub) earlier. So if we start with (fneg (fma (fneg))) we would get stuck at (fmsub (fneg)).

This patch fixes that so we can also combine things like (fmsub (fneg)).

llvm-svn: 336304
2018-07-05 02:52:56 +00:00
Craig Topper e4b9257b69 [X86] Remove some of the packed FMA3 intrinsics since we no longer use them in clang.
There's a regression in here due to inability to combine fneg inputs of X86ISD::FMSUB/FNMSUB/FNMADD nodes.

More removals to come, but I wanted to stop and fix the regression that showed up in this first.

llvm-svn: 336303
2018-07-05 02:52:54 +00:00
Simon Pilgrim 0c3b421b1a [X86][SSE] Add v16i16 shl x,c -> pmullw test
llvm-svn: 336277
2018-07-04 14:20:58 +00:00
Simon Pilgrim 6a99ce27f6 [X86][SSE] Add SSE2 target to some shift tests
Show the difference in behaviour cf SSE41 (no PMULLD, PBLENDW etc.)

Raised by D48936

llvm-svn: 336271
2018-07-04 13:58:13 +00:00
Simon Pilgrim c3e1617bf9 [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values (REAPPLIED)
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.

Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247.

llvm-svn: 336250
2018-07-04 09:12:48 +00:00
Simon Pilgrim 61fdf3b33c [X86][SSE] Add reduced crash test case for r336113 - [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values
The patch was reverted at r336189 due to crashes

llvm-svn: 336247
2018-07-04 08:55:23 +00:00
Max Kazantsev e8e01143ec [ImplicitNullChecks] Check for rewrite of register used in 'test' instruction
The following code pattern:

       mov %rax, %rcx
       test %rax, %rax
       %rax = ....
       je  throw_npe
       mov(%rcx), %r9
       mov(%rax), %r10

gets transformed into the following incorrect code after implicit null check pass:
        mov %rax, %rcx
       %rax = ....
       faulting_load_op("movl (%rax), %r10", throw_npe)
       mov(%rcx), %r9

For implicit null check pass, if the register that is checked for null value (ie, the register used in the 'test' instruction) is written into before the condition jump, we should avoid doing the optimization.

Patch by Surya Kumari Jangala!

Differential Revision: https://reviews.llvm.org/D48627
Reviewed By: skatkov

llvm-svn: 336241
2018-07-04 08:01:26 +00:00
Roman Lebedev 93357f52c6 [X86] Add tests for low/high bit clearing with different attributes.
D48768 may turn some of these into shifts.

Reviewers: spatel

Reviewed By: spatel

Subscribers: spatel, RKSimon, llvm-commits, craig.topper

Differential Revision: https://reviews.llvm.org/D48767

llvm-svn: 336224
2018-07-03 19:12:37 +00:00
Craig Topper bc598f0d61 [X86][AsmParser] Don't consider %eip as a valid register outside of 32-bit mode.
This might make the error message added in r335668 unneeded, but I'm not sure yet.

The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising so be explicit.

llvm-svn: 336217
2018-07-03 17:40:51 +00:00
Simon Pilgrim 74cc4cfa94 [DAGCombiner] visitSDIV - Permit MIN_SIGNED_VALUE in pow2 vector codegen
Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code

llvm-svn: 336198
2018-07-03 14:11:32 +00:00
Benjamin Kramer fd171f2f89 Revert "[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values"
This reverts commit r336113. It causes crashes.

llvm-svn: 336189
2018-07-03 11:15:17 +00:00
Craig Topper 6121699b11 [X86] Add avx512vl command line to break-false-dep.ll
llvm-svn: 336169
2018-07-03 04:43:49 +00:00
Krzysztof Parzyszek fd97494984 [X86] Add phony registers for high halves of regs with low halves
Add registers still missing after r328016 (D43353):
- for bits 15-8  of SI, DI, BP, SP (*H), and R8-R15 (*BH),
- for bits 31-16 of R8-R15 (*WH).

Thanks to Craig Topper for pointing it out.

llvm-svn: 336134
2018-07-02 19:05:09 +00:00
Craig Topper 56440b9745 [X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned.
Similarily, don't fold fp128 loads into SSE instructions if the load isn't aligned. Unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.

Should fix PR38001

llvm-svn: 336121
2018-07-02 17:01:54 +00:00
Simon Pilgrim 2bc8e079f2 [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.

llvm-svn: 336113
2018-07-02 15:14:07 +00:00
Simon Pilgrim a6be2437e7 [X86][SSE] Add v8i16 shift test for 2 shift values that doesn't match basic blend
We have special case support for 2 shift values for basic blends, but irregular shift patterns end up using the generic lowering, despite shuffle lowering being good enough to handle more complex blends.

llvm-svn: 336112
2018-07-02 14:53:41 +00:00
Craig Topper df99cdb95b [X86] Fix a few test names in avx512-intrinsics-fast-isel.ll to match their clang intrinsic names.
I thought I fixed these yesterday, but I guess I missed a few.

llvm-svn: 336071
2018-07-01 23:49:06 +00:00
Simon Pilgrim fae337704e [DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119)
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks.

Thanks to @zvi for the original patch.

Differential Revision: https://reviews.llvm.org/D45806

llvm-svn: 336048
2018-06-30 12:22:55 +00:00
Craig Topper 50a10ba6e0 [X86] Update some avx512 fast-isel tests to match their real clang IRgen.
Especially of note was the test_mm_mask_set1_epi64 and other set1 tests that were truncating the element to be broadcasted to i8 and broadcasting that instead of a whole 64 bit value.

Some of the others were just correcting mask sizes on parameters due to bugs in the clang test case they were generated from that have now been fixed.

Some were converting i8 to <4 x i1>/<2 x i1> by truncating to i4/i2 and then bitcasting. But the clang codegen is bitcast to <8 x i1>, then extract to <4 x i1>/<2 x i1>. This is likely to incur less trouble from the integer type legalizer in the backend.

llvm-svn: 336045
2018-06-30 07:25:29 +00:00
Craig Topper db1d7f2b16 [X86] Change some chec-prefixes from X32 to X86 to match the FileCheck command line.
I think this test changed and these test cases were created around the same time and missed the change.

llvm-svn: 336044
2018-06-30 06:45:10 +00:00
Craig Topper 8f6ace5bcd [X86] Remove test cases from avx512vl-intrinsics-fast-isel.ll for intrinsics that don't really exist in clang.
llvm-svn: 336043
2018-06-30 06:45:09 +00:00
Craig Topper 59f2f38fe0 [X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead.
llvm-svn: 336035
2018-06-30 01:32:04 +00:00
Craig Topper 87b107dd69 [X86] Limit the number of target specific nodes emitted in LowerShiftParts
The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And its just better to limit the number of places we emit target specific nodes.

The changed test cases still aren't optimal.

Differential Revision: https://reviews.llvm.org/D48619

llvm-svn: 335998
2018-06-29 17:24:07 +00:00
Simon Pilgrim aab8660e23 [X86][SSE] Support v16i8/v32i8 vector rotations
This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op.

Unfortunately I haven't found a decent way to share much of this code with the shift equivalent.

Differential Revision: https://reviews.llvm.org/D48655

llvm-svn: 335957
2018-06-29 09:36:39 +00:00
Craig Topper 875e9f8fa4 [X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead.
While there improve the coverage of the intrinsic testing and add fast-isel tests.

llvm-svn: 335944
2018-06-29 05:43:26 +00:00
Martin Storsjo 2a9bd7b756 [COFF] Fix constant sharing regression for MinGW
This fixes a regression since SVN r334523, where the object files
built targeting MinGW were rejected by GNU binutils tools. Prior to
that commit, we only put constants in comdat for MSVC configurations.

Differential Revision: https://reviews.llvm.org/D48567

llvm-svn: 335918
2018-06-28 20:28:29 +00:00
Craig Topper 90317d1d94 [X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc.
This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy.

Differential Revision: https://reviews.llvm.org/D48706

llvm-svn: 335895
2018-06-28 17:58:01 +00:00
Jonas Devlieghere b757fc3878 Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models""
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.

  LLVM ERROR: unsupported relocation with subtraction expression, symbol
  '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
  expression

llvm-svn: 335894
2018-06-28 17:56:43 +00:00
Simon Pilgrim 9c70d48cb2 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED)
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.

llvm-svn: 335886
2018-06-28 17:33:41 +00:00
Matthias Braun da5e7e11d1 SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Haojian Wu 2103990e63 Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV"
This reverts commit r335821.

This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.

llvm-svn: 335871
2018-06-28 16:25:57 +00:00
Simon Pilgrim abebe4c746 [DAGCombiner] Ensure we use the correct CC result type in visitSDIV
We could get away with it for constant folded cases, but not for rL335719.

Thanks to Krzysztof Parzyszek for noticing.

llvm-svn: 335821
2018-06-28 09:54:28 +00:00
Sanjay Patel d052de856d [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

#include <stdio.h>
  
int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}

$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.

Differential Revision: https://reviews.llvm.org/D48085

llvm-svn: 335761
2018-06-27 18:16:40 +00:00
Craig Topper 812fcb35e7 [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position
If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index.

Fixes PR37938.

llvm-svn: 335754
2018-06-27 16:47:39 +00:00
Craig Topper 069628b4df [X86] Add test cases for D48606.
llvm-svn: 335753
2018-06-27 16:47:36 +00:00
Simon Pilgrim 8a02b25313 [X86][SSE] Add missing AVX512 rotation tests
Increase coverage to make sure we're not doing anything stupid without AVX512BW

llvm-svn: 335746
2018-06-27 16:00:53 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Simon Pilgrim d3e583a52d [DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion
For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs

llvm-svn: 335727
2018-06-27 12:45:31 +00:00
Simon Pilgrim 41afbcb9ca [X86][SSE] Include MIN_SIGNED element in non-uniform SDIV pow2 tests
llvm-svn: 335721
2018-06-27 10:59:36 +00:00
Simon Pilgrim dfbcc66adc [DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0)
Fixes PR37569.

llvm-svn: 335719
2018-06-27 10:21:06 +00:00
Simon Pilgrim 0a566bc0ae [DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector expansion (PR37569)
llvm-svn: 335717
2018-06-27 09:41:22 +00:00
Simon Pilgrim c9e60adcb5 [X86] Add test for SDIV by sign bit (minsigned) value
llvm-svn: 335671
2018-06-26 22:03:00 +00:00
Jessica Paquette 67599c2e1e [X86][AsmParser] Recommit r335658
Recommit of r335658 so that it does not change the behaviour of any
existing error output.

llvm-svn: 335668
2018-06-26 21:30:34 +00:00
Jessica Paquette 0a80af0761 Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode"
This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b.

It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly.

llvm-svn: 335660
2018-06-26 20:57:19 +00:00
Jessica Paquette 0e40d4bfc3 [X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode
Right now, when we use RIP-relative instructions in 32-bit mode, we'll just
assert and crash.

This adds an error message which tells the user that they can't do that in
32-bit mode, so that we don't crash (and also can see the issue outside of
assert builds).

llvm-svn: 335658
2018-06-26 20:33:46 +00:00
Simon Pilgrim 7f55af37f4 [DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion (PR37119)
Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported.

llvm-svn: 335637
2018-06-26 17:46:51 +00:00
Simon Pilgrim 1576df53a9 [X86][SSE] Add another sdiv by (nonuniform) minus one test (PR37119)
Include a test that divides by -1 but not by 1 (another special case)

llvm-svn: 335629
2018-06-26 17:06:05 +00:00
Than McIntosh 3190993a02 [X86,ARM] Retain split-stack prolog check for sibling calls
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).

This fixes PR37807.

Reviewers: cherry, javed.absar

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48444

llvm-svn: 335604
2018-06-26 14:11:30 +00:00
Craig Topper c42ed4e3c4 [X86] Use XOR for SUB (C, X) during isel if will help fold an immediate
Summary:
Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining.

This seems like the safest way to do this trick. And we consider doing it as a DAG combine later.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48557

llvm-svn: 335575
2018-06-26 03:11:15 +00:00
Craig Topper 689e363ff2 [X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction.
This recommits r335562 and 335563 as a single commit.

The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the sigature of the builtin that software expects.

By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away.

llvm-svn: 335568
2018-06-26 01:37:02 +00:00