Commit Graph

14408 Commits

Sanjay Patel 8f49aede82 [x86] avoid crashing with illegal vector type (PR31672)
https://llvm.org/bugs/show_bug.cgi?id=31672

llvm-svn: 292758
2017-01-22 17:06:12 +00:00
Craig Topper 8e0724d332 [X86] Don't allow commuting to form phsub operations.
Fixes PR31714.

llvm-svn: 292713
2017-01-21 06:59:38 +00:00
Simon Pilgrim 3e5b525699 Remove trailing whitespace. NFCI.
llvm-svn: 292613
2017-01-20 15:15:59 +00:00
Simon Pilgrim 0da4d2bc03 [CostModel][X86] Removed unused cost. NFCI.
SHL v8i32 is already handled in the SSE41 cost table

llvm-svn: 292612
2017-01-20 15:14:38 +00:00
Simon Pilgrim db101e4d57 [X86][SSE] Improve comments describing combineTruncatedArithmetic. NFCI.
llvm-svn: 292502
2017-01-19 18:18:32 +00:00
Simon Pilgrim 5f2f53b106 [X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended
As discussed on D28219 - it is profitable to combine trunc(binop(s/zext(x), s/zext(y))) to binop(trunc(s/zext(x)), trunc(s/zext(y))), assuming the trunc(ext()) will simplify further
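A minimal scalar C sketch of the shape being targeted (hypothetical helper, not from the patch): the i16->i32 extends feeding the add are cancelled by the i32->i16 truncate, so the operation can be performed at the narrow width.

  #include <stdint.h>

  /* trunc(add(sext(x), sext(y))) -> add(x, y): the extends and the
     truncate cancel once the add is done at the narrow width. */
  int16_t add_then_trunc(int16_t x, int16_t y) {
      return (int16_t)((int32_t)x + (int32_t)y);
  }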

llvm-svn: 292493
2017-01-19 16:25:02 +00:00
Elena Demikhovsky e01512cecf Recommiting unsigned saturation with a bugfix.
A test case that crashed is added to avx512-trunc.ll.
(PR31589)

llvm-svn: 292479
2017-01-19 12:08:21 +00:00
Craig Topper 200ea31684 [AVX-512] Support ADD/SUB/MUL of mask vectors
Summary:
Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND.

We already do this for scalar i1 operations so I just extended it to vectors of i1.
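The i1 identities this relies on can be checked exhaustively; a small C program (illustrative only, not part of the patch):

  #include <assert.h>

  /* For 1-bit values: add and sub are XOR (hence KXOR) and mul is AND
     (hence KAND). */
  int main(void) {
      for (unsigned a = 0; a <= 1; ++a)
          for (unsigned b = 0; b <= 1; ++b) {
              assert(((a + b) & 1u) == (a ^ b));
              assert(((a - b) & 1u) == (a ^ b));
              assert((a * b) == (a & b));
          }
      return 0;
  }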

Reviewers: zvi, delena

Reviewed By: delena

Subscribers: guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D28888

llvm-svn: 292474
2017-01-19 07:12:35 +00:00
Craig Topper c227529105 [X86] Merge LowerADD and LowerSUB into a single LowerADD_SUB since they are identical.
llvm-svn: 292469
2017-01-19 03:49:29 +00:00
Craig Topper b561e66384 [AVX-512] Use VSHUF instructions instead of two inserts as fallback for subvector broadcasts that can't fold the load.
llvm-svn: 292466
2017-01-19 02:34:29 +00:00
Michael Kuperstein d3d2925933 Revert r291670 because it introduces a crash.
r291670 doesn't crash on the original testcase from PR31589,
but it crashes on a slightly more complex one.

PR31589 has the new reproducer.

llvm-svn: 292444
2017-01-18 23:05:58 +00:00
Kirill Bobyrev 6afbaf0944 Revert 292404 due to buildbot failures.
llvm-svn: 292407
2017-01-18 16:34:25 +00:00
Kirill Bobyrev 9ad06dbe17 [X86] Minor code cleanup to fix several clang-tidy warnings. NFC
llvm-svn: 292404
2017-01-18 16:15:47 +00:00
Michael Zuckerman 0c0240ce84 [X86] Improve mul combine for negative multiplier (2^c - 1)
This patch improves the mul instruction combine function (combineMul)
by adding a new layer of logic.
In this patch, we add the ability to fold (mul x, -((1 << c) - 1))
or (mul x, -((1 << c) + 1)) into (neg((x << c) - x)) or (neg((x << c) + x)) respectively.
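A worked C example of the identity for c = 3 (hypothetical helper, not from the patch); the unsigned arithmetic mirrors the wrapping two's complement semantics of the DAG nodes:

  #include <assert.h>
  #include <stdint.h>

  /* x * -((1 << 3) - 1) == -((x << 3) - x), i.e. x * -7 becomes
     shift + subtract + negate. */
  int32_t mul_by_neg7(int32_t x) {
      uint32_t ux = (uint32_t)x;
      return (int32_t)(0u - ((ux << 3) - ux));
  }

  int main(void) {
      for (int32_t x = -1000; x <= 1000; ++x)
          assert(mul_by_neg7(x) == x * -7);
      return 0;
  }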

Differential Revision: https://reviews.llvm.org/D28232

llvm-svn: 292358
2017-01-18 09:31:13 +00:00
Marina Yatsina 197db00e3e [X86] Fix for bugzilla 31576 - add support for "data32" instruction prefix
This patch fixes bugzilla 31576 (https://llvm.org/bugs/show_bug.cgi?id=31576).

"data32" instruction prefix was not defined in the llvm.
An exception had to be added to the X86 tablegen and AsmPrinter because both "data16" and "data32" are encoded to 0x66 (but in different modes).

Differential Revision: https://reviews.llvm.org/D28468

llvm-svn: 292352
2017-01-18 08:07:51 +00:00
Joerg Sonnenberger 270dd41f75 Remove an overeager assert from r288844.
llvm-svn: 292244
2017-01-17 19:29:15 +00:00
Bob Wilson f2d0b68b3b Revert r291640 change to fold X86 comparison with atomic_load_add.
Even with the fix from r291630, this still causes problems. I get
widespread assertion failures in the Swift runtime's WeakRefCount::increment()
function. I sent a reduced testcase in reply to the commit.

llvm-svn: 292242
2017-01-17 19:18:57 +00:00
Craig Topper 729d30d0ae [AVX-512] Add support for taking a bitcast between a SUBV_BROADCAST and VSELECT and moving it to the input of the SUBV_BROADCAST if it will help with using a masked operation.
llvm-svn: 292201
2017-01-17 06:49:59 +00:00
Craig Topper fba613e407 [X86] Merge the disassemblers handling of the different TYPE_RELs by getting the size information from the ENCODING field. NFCI
llvm-svn: 292096
2017-01-16 06:49:09 +00:00
Craig Topper ad944a1cac [X86] Reduce the number of operand 'types' the disassembler needs to deal with. NFCI
We were frequently checking for a list of types and the different types
conveyed no real information. So lump them together explicitly.

llvm-svn: 292095
2017-01-16 06:49:03 +00:00
Craig Topper 3173a1f8ff [AVX-512] Teach the disassembler about all of the EVEX gather and scatter instructions.
llvm-svn: 292094
2017-01-16 05:44:33 +00:00
Craig Topper 33ac064137 [AVX-512] Begin giving the disassembler a way to recognize that VSIB is a different encoding than regular addressing modes.
This first part teaches it not to report an error when EVEX.V2 is used by a VSIB instruction.

llvm-svn: 292093
2017-01-16 05:44:25 +00:00
Craig Topper 7dfd583644 [AVX-512] Correct memory operand size for VPGATHERQPS and VPGATHERQD
with ZMM index. Similar for SCATTER and the prefetch gather and scatter
instructions.

Fixes PR31618.

llvm-svn: 292088
2017-01-16 00:55:58 +00:00
Craig Topper 8be6ebce2b [AVX-512] Fix register class in one of the gather/scatter memory operands so that all 32 bit registers can be allowed.
llvm-svn: 292087
2017-01-16 00:55:50 +00:00
Simon Pilgrim 6ed996cdf0 [CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types
We already have patterns in place to support 128/256-bit shifts without AVX512VL

llvm-svn: 292077
2017-01-15 20:44:00 +00:00
Michael Zuckerman 6baa3838e9 Fix blend mask by switching the sides of the operands, since the Blend node uses the opposite mask from the Select node.
llvm-svn: 292066
2017-01-15 16:43:14 +00:00
Craig Topper f1388ef006 [AVX-512] Remove unnecessary duplicate broadcast patterns. NFC
llvm-svn: 292053
2017-01-15 06:15:45 +00:00
Craig Topper 52317e8b6e [AVX-512] Replicate some broadcast patterns to VLX and disable the AVX2 patterns when VLX is available.
llvm-svn: 292051
2017-01-15 05:47:45 +00:00
Craig Topper c294cff863 [X86] Remove untested MOVDDUP patterns.
These all involve bitcasts around the memory operands. This isn't
something we normally do for isel patterns. I suspect DAG combine should
convert the load type making this unnecessary.

llvm-svn: 292050
2017-01-15 05:21:29 +00:00
Simon Pilgrim d419b73a42 [CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed
llvm-svn: 292023
2017-01-14 19:24:23 +00:00
Simon Pilgrim 8e5ecf8ad1 [X86][XOP] Added support for VPMADCSWD 'extend+hadd' IFMA patterns
VPMADCSWD acts as VPADDD( VPMADDWD( x, y ), z ) - multiply+extend+hadd and add to a v4i32 accumulator
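A per-lane scalar model of that behaviour (my sketch of the non-saturating form; integer overflow is not modeled):

  #include <stdint.h>

  /* One v4i32 lane of VPMADCSWD: two i16 x i16 products, extended to
     i32, horizontally added, then added to the accumulator. */
  int32_t madcswd_lane(int16_t x0, int16_t y0, int16_t x1, int16_t y1,
                       int32_t acc) {
      return acc + (int32_t)x0 * y0 + (int32_t)x1 * y1;
  }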

llvm-svn: 292021
2017-01-14 18:52:13 +00:00
Simon Pilgrim b290805e94 [X86][XOP] Added support for VPMACSDQH/VPMACSDQL 'extension' IFMA patterns
VPMACSDQH/VPMACSDQL act as VPADDQ( VPMULDQ( x, y ), z ) - multiplying and extending either the odd or even v4i32 input elements and adding to a v2i64 accumulator

llvm-svn: 292020
2017-01-14 18:08:54 +00:00
Simon Pilgrim a1631749f8 [X86][XOP] Added support for VPMACSWW/VPMACSDD 'lossy' IFMA patterns
VPMACSWW/VPMACSDD act as add( mul( x, y ), z ) - ignoring any upper bits from both the multiply and add stages

llvm-svn: 292019
2017-01-14 17:13:52 +00:00
Craig Topper 63e2cd6caa [AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when its beneficial.
Isel now selects masked move instructions for vselect instead of blendm. But sometimes it is beneficial to register allocation to remove the tied register constraint by using blendm instructions.

This also picks up cases where the masked move was created due to a masked load intrinsic.

Differential Revision: https://reviews.llvm.org/D28454

llvm-svn: 292005
2017-01-14 07:50:52 +00:00
Craig Topper 09b7e0f01d [AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible.
We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX is available. Or if it's not, but register allocation has selected a non-extended register, we will use VEX VXORPS. And if it's an extended register without VLX, we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD.

This makes it possible for the register allocator to have all 32 registers available to work with.

llvm-svn: 292004
2017-01-14 07:29:24 +00:00
Craig Topper 9cc685a56e [X86] Simplify the code that calculates a scaled blend mask. We don't need a second loop.
llvm-svn: 291996
2017-01-14 04:29:15 +00:00
Craig Topper 9850210d03 [AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. Also add 32-bit mode command lines to the test case that exercises this just to make sure we sanely handle the 64-bit immediate there.
This fixes an undefined-behavior sanitizer failure from r291888.

llvm-svn: 291994
2017-01-14 04:19:35 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Simon Pilgrim 7f2a6d5e8c [X86][AVX512] Add support for variable ASHR v2i64/v4i64 support without VLX
Use v8i64 variable ASHR instructions if we don't have VLX.

This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll.

Differential Revision: https://reviews.llvm.org/D28604

llvm-svn: 291901
2017-01-13 13:16:19 +00:00
Diana Picus 116bbab4e4 [CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.

See https://reviews.llvm.org/D28057 for the whole discussion.

Differential Revision: https://reviews.llvm.org/D28556

llvm-svn: 291891
2017-01-13 09:58:52 +00:00
Michael Zuckerman 558a4d8419 [X86][AVX512] Adding missing shuffle lowering to blend mask instructions
Some shuffles can be lowered to blend mask instructions (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ).
In this patch, I added a new pattern match for this case.

Reviewers:
1. craig.topper
2. guyblank
3. RKSimon
4. igorb     

Differential Revision: https://reviews.llvm.org/D28483

llvm-svn: 291888
2017-01-13 09:06:00 +00:00
Craig Topper 1ec84c2a18 [AVX-512] Remove unmasked BLENDM instructions from the wrong load folding table. The unmasked versions read memory from operand 2, but were in the operand 3 table.
This isn't the most interesting set of blendm instructions, as the unmasked versions aren't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch.

llvm-svn: 291885
2017-01-13 07:28:56 +00:00
Craig Topper 46b6ecf41e [X86] Move some entries in the load folding tables for more appropriate grouping. NFC
llvm-svn: 291884
2017-01-13 07:28:53 +00:00
Nikolai Bozhenov f02ac0eeb2 [X86] Replace AND+IMM64 with SRL/SHL
Emit SHRQ/SHLQ instead of ANDQ with a 64-bit constant mask if the result
is unused and the mask has only higher/lower bits set. For example, with
this patch LLVM emits

  shrq $41, %rdi
  je

instead of

  movabsq $0xFFFFFE0000000000, %rcx
  testq   %rcx, %rdi
  je

This reduces the number of instructions, code size and register pressure.
The transformation is applied only in cases where the mask cannot be
encoded as an immediate value within the TESTQ instruction.
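The kind of source-level test that triggers the transform looks like this (hypothetical example using the mask from the asm above):

  #include <stdbool.h>
  #include <stdint.h>

  /* The mask covers only bits 41..63, so the compare against zero can be
     done with shrq $41 instead of materializing the 64-bit immediate. */
  bool high_bits_set(uint64_t x) {
      return (x & 0xFFFFFE0000000000ULL) != 0;
  }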

Differential Revision: https://reviews.llvm.org/D28198

llvm-svn: 291806
2017-01-12 19:54:27 +00:00
Nikolai Bozhenov 6bdf92cec7 [X86] Tune bypassing of slow division for Intel CPUs
64-bit integer division in Intel CPUs is extremely slow, much slower
than 32-bit division. On the other hand, 8-bit and 16-bit divisions
aren't any faster. The only important exception is Atom where DIV8
is fastest. Because of that, the patch
1) Enables bypassing of 64-bit division for Atom, Silvermont and
   all big cores.
2) Modifies 64-bit bypassing to use 32-bit division instead of the
   16-bit one. This doesn't make the shorter division slower but
   increases the chances of taking it. Moreover, it's much more likely
   to prove at compile time that a value fits in 32 bits and doesn't
   require a run-time check (e.g. zext i32 to i64). A sketch of the
   bypass check follows below.
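A scalar C sketch of the bypass check the backend emits (conceptual only; the real code is generated during lowering):

  #include <stdint.h>

  /* If both operands fit in 32 bits, take the much faster 32-bit divide;
     otherwise fall back to the full 64-bit division. */
  uint64_t udiv_bypass(uint64_t a, uint64_t b) {
      if (((a | b) >> 32) == 0)
          return (uint32_t)a / (uint32_t)b;
      return a / b;
  }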

Differential Revision: https://reviews.llvm.org/D28196

llvm-svn: 291800
2017-01-12 19:34:15 +00:00
Craig Topper 24c3a2395f [AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support.
llvm-svn: 291747
2017-01-12 06:49:12 +00:00
Craig Topper 69ab67b279 [AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq.
llvm-svn: 291746
2017-01-12 06:49:08 +00:00
Elad Cohen c5ba925ef2 [X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask
r289653 added a case where `vselect <cond> <vector1> <all-zeros>`
is transformed to:
`vselect xor(cond, DAG.getConstant(1, DL, CondVT)) <all-zeros> <vector1>`
This was not aimed to catch cases where Cond is not a vXi1
mask, but it does. Moreover, when the Cond type is vXiN (N > 1),
then xor(cond, DAG.getConstant(1, DL, CondVT)) != NOT(cond).
This patch changes the above to xor with allones, and avoids
entering the case for non-mask Conds.
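A two-line C demonstration of why xor with 1 is not NOT for multi-bit conditions (illustrative only):

  #include <assert.h>
  #include <stdint.h>

  int main(void) {
      uint8_t cond = 0x02;  /* any value with bits above bit 0 */
      assert((uint8_t)(cond ^ 0x01) != (uint8_t)~cond); /* xor 1 flips only bit 0 */
      assert((uint8_t)(cond ^ 0xFF) == (uint8_t)~cond); /* xor all-ones == NOT */
      return 0;
  }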

llvm-svn: 291745
2017-01-12 06:49:03 +00:00
Peter Collingbourne 1b5f1cfdb4 X86: Remove dead code. NFC.
llvm-svn: 291721
2017-01-11 23:00:28 +00:00
Simon Pilgrim 0c1faf432b Remove trailing whitespace. NFCI.
llvm-svn: 291680
2017-01-11 16:38:20 +00:00
Elena Demikhovsky 9d0e7c33d3 X86 CodeGen: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512,
and by VPACKUS* instructions on SSE* targets.
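Per lane, truncate-with-unsigned-saturation is just a clamp before the truncate; a scalar C model (hypothetical helper, not from the patch):

  #include <stdint.h>

  /* The per-lane shape of a VPMOVUS*-style saturating truncate: clamp to
     the destination type's maximum, then truncate. */
  uint8_t trunc_usat_u16_to_u8(uint16_t v) {
      return (uint8_t)(v > 0xFF ? 0xFF : v);
  }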

Differential Revision: https://reviews.llvm.org/D28216

llvm-svn: 291670
2017-01-11 12:59:32 +00:00
Simon Pilgrim 5a81fefad3 [X86][AVX512BW] Vectorize v64i8 vector shifts
Differential Revision: https://reviews.llvm.org/D28447

llvm-svn: 291665
2017-01-11 10:36:51 +00:00
Elad Cohen 0c2601073e [X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d}
The code emitted by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd,
(v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes
redundant (v)movss/(v)movsd instructions. This patch adds patterns for
optimizing these sequences.

Differential revision: https://reviews.llvm.org/D28455

llvm-svn: 291660
2017-01-11 09:11:48 +00:00
Mohammed Agabaria 2c96c43388 [X86] updating TTI costs for arithmetic instructions on X86/SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

Also adds a special optimization case that replaces pmulld with a pmullw/pmulhw/pshuf
sequence when the real operand bitwidth is <= 16.
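A scalar model of one 32-bit lane of that replacement (my reconstruction; the real sequence works on whole vectors):

  #include <stdint.h>

  /* If the operands really fit in 16 bits, the 32-bit product can be
     reassembled from a 16x16 multiply's low and high halves. */
  int32_t mul_lane_16x16(int16_t a, int16_t b) {
      int32_t full = (int32_t)a * (int32_t)b;          /* what pmulld computes */
      uint16_t lo  = (uint16_t)full;                   /* pmullw lane */
      uint16_t hi  = (uint16_t)((uint32_t)full >> 16); /* pmulhw lane */
      return (int32_t)(((uint32_t)hi << 16) | lo);     /* interleave (pshuf) */
  }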

Differential Revision: https://reviews.llvm.org/D28104 

llvm-svn: 291657
2017-01-11 08:23:37 +00:00
Hans Wennborg 6573976f57 Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)
This was reverted because it would miscompile code where the cmp had
multiple uses. That was due to a deficiency in the existing code, which
was fixed in r291630 (see the PR for details).

This re-commit includes an extra test for the kind of code that got
miscompiled: @test_sub_1_setcc_jcc.
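The source-level shape the fold targets, sketched in C (hypothetical function): the old value equals C exactly when the decremented value hits zero, so the flags set by the locked subtract suffice.

  #include <stdatomic.h>
  #include <stdbool.h>

  /* old == c  <=>  value after the subtraction == 0, so the setcc can
     use the flags of the `lock sub` directly. */
  bool drops_to_zero(_Atomic long *p, long c) {
      return atomic_fetch_sub(p, c) == c;
  }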

llvm-svn: 291640
2017-01-11 01:36:57 +00:00
Hans Wennborg 12de693747 [X86] Don't run combineSetCCAtomicArith() when the cmp has multiple uses
We would miscompile the following:

  void g(int);
  int f(volatile long long *p) {
    bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0;
    g(b ? 12 : 34);
    return b ? 56 : 78;
  }

into

  pushq   %rax
  lock            incq    (%rdi)
  movl    $12, %eax
  movl    $34, %edi
  cmovlel %eax, %edi
  callq   g(int)
  testq   %rax, %rax   <---- Bad.
  movl    $56, %ecx
  movl    $78, %eax
  cmovsl  %ecx, %eax
  popq    %rcx
  retq

because the code failed to take into account that the cmp has multiple
uses, replaced one of them, and left the other one comparing garbage.

llvm-svn: 291630
2017-01-11 00:49:54 +00:00
Michael Zuckerman bcd03e7f3b [X86][AVX512]Improving shuffle lowering by using AVX-512 EXPAND* instructions
This patch fix PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351

1.  This patch adds new type of shuffle lowering
2.  We can use the expand instruction when the shuffle pattern interleaves zero
    elements with a[0]...a[n] (n >= 0), where the a[] elements appear in
    ascending order - see the scalar model after this list.
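A scalar model of the expand semantics being matched (illustrative only; in the shuffle case the mask is derived from the shuffle pattern, not a runtime operand):

  /* Consecutive source elements are scattered to the set positions of the
     mask; all other destination elements are zeroed. */
  void expand_v8(const int *src, int *dst, unsigned mask) {
      int k = 0;
      for (int i = 0; i < 8; ++i)
          dst[i] = ((mask >> i) & 1u) ? src[k++] : 0;
  }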

Reviewers: 1. igorb  
           2. guyblank  
           3. craig.topper  
           4. RKSimon 

Differential Revision: https://reviews.llvm.org/D28352

llvm-svn: 291584
2017-01-10 18:57:17 +00:00
Craig Topper d55b83128b AMD family 17h (znver1) enablement
Summary:
This patch enables the following
1. AMD family 17h architecture using "znver1" tune flag (-march, -mcpu).
2. ISAs that are enabled for "znver1" architecture.
3. Checks the ADX ISA from CPUID to identify the "znver1" flag when -march=native is used.
4. The FMA4 and XOP ISAs are disabled as they have been dropped from amdfam17.
5. For the time being, it uses the btver2 scheduler model.
6. Test file is updated to check this flag.

This item is linked to clang review item https://reviews.llvm.org/D28018

Patch by Ganesh Gopalasubramanian

Reviewers: RKSimon, craig.topper

Subscribers: vprasad, RKSimon, ashutosh.nema, llvm-commits

Differential Revision: https://reviews.llvm.org/D28017

llvm-svn: 291543
2017-01-10 06:01:16 +00:00
Craig Topper 2ed461e5c4 [X86] When lowering uniform shifts, use X86ISD::VZEXT instead of using a ZERO_EXTEND_VECTOR_INREG. If we emit the ZERO_EXTEND_VECTOR_INREG too late it doesn't get lowered properly and makes it through to isel and fails.
Fixes PR31593.

llvm-svn: 291535
2017-01-10 04:12:24 +00:00
Michael Kuperstein 1559e8863e Revert r291092 because it introduces a crash.
See PR31589 for details.

llvm-svn: 291478
2017-01-09 21:04:46 +00:00
Vyacheslav Klochkov d497d36083 X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB.
Differential Revision: https://reviews.llvm.org/D28087

llvm-svn: 291473
2017-01-09 20:26:17 +00:00
Simon Pilgrim 0f23b2ba1a [X86][AVX512] Enable v16i8/v32i8 vector shifts to use an extend+shift+truncate pattern.
Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets.
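Per element, the lowering has this scalar shape (my sketch, assuming the shift amount is already in range):

  #include <stdint.h>

  /* Widen i8 to i32, shift at the wider width, truncate back. */
  uint8_t shl_i8_widened(uint8_t v, uint8_t amt) {
      return (uint8_t)((uint32_t)v << (amt & 7));
  }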

Cost model updates to follow.

llvm-svn: 291451
2017-01-09 17:20:03 +00:00
Simon Pilgrim d990cd371b [X86][AVX512DQ] Enable v16i16 vector shifts to use an extend+shift+truncate pattern.
Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will already have lowered with vpsravw).

Cost model updates to follow.

llvm-svn: 291445
2017-01-09 15:15:45 +00:00
Craig Topper 96ab6fd2eb [AVX-512] Change another pattern that was using BLENDM to use masked moves. A future patch will convert it back to BLENDM if it's beneficial to register allocation.
llvm-svn: 291419
2017-01-09 04:19:34 +00:00
Craig Topper 6393afce97 [AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros.
Previously we emitted a VPTERNLOG and a separate masked move.

llvm-svn: 291415
2017-01-09 02:44:34 +00:00
Craig Topper f51ba1e3da [AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX.
llvm-svn: 291402
2017-01-08 21:32:30 +00:00
Simon Pilgrim e6d948b857 Strip trailing whitespace.
llvm-svn: 291395
2017-01-08 18:37:42 +00:00
Simon Pilgrim b2a80950fe Fix line endings and strip trailing whitespace.
llvm-svn: 291393
2017-01-08 16:45:39 +00:00
Sanjay Patel bf51c8a975 [x86] fix usage of stale operands when lowering select
I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR.

The debug output shows nodes like this:

t4: i32 = xor t2, Constant:i32<-1>
    t21: i8 = setcc t4, Constant:i32<0>, setlt:ch
  t14: i32 = select t21, t4, Constant:i32<-1>

And because the select is holding onto the t4 (xor) node while EmitTest creates a new 
x86-specific xor node, the lowering results in:

  t4: i32 = xor t2, Constant:i32<-1>
  t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1>
t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1

Differential Revision: https://reviews.llvm.org/D28374

llvm-svn: 291392
2017-01-08 15:53:40 +00:00
Simon Pilgrim 9c58950eeb [CostModel][X86] Fixed vXi8 uniform shift costs.
The 'fast' costs should only apply to shifts by uniform constants (uniform non-constant shifts are lowered using the slow default implementation).

Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.

Added missing AVX2/AVX512BW costs as well.

llvm-svn: 291391
2017-01-08 14:14:36 +00:00
Simon Pilgrim 1fa5487c05 [CostModel][X86] Moved legal uniform shift costs earlier.
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.

llvm-svn: 291390
2017-01-08 13:12:03 +00:00
Craig Topper 5c46c7526e [AVX-512] Remove redundant patterns that select unaligned moves with zero masking for patterns that already use the aligned form. NFC
llvm-svn: 291383
2017-01-08 05:46:21 +00:00
Simon Pilgrim 9681c407b4 [CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs
SSE41 provides pmulld, which allows a simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.
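A scalar walk-through of that pattern (my reconstruction of the per-lane arithmetic): the shift amount is slid into a float's exponent field to build 2^amt, which pmulld then turns into a left shift.

  #include <stdint.h>
  #include <string.h>

  uint32_t shl_via_pmulld(uint32_t v, uint32_t amt) {  /* amt in [0, 31] */
      uint32_t bits = (amt << 23) + 0x3F800000u; /* pslld $23 + paddd 1.0f */
      float f;
      memcpy(&f, &bits, sizeof f);               /* lane reinterpreted */
      uint32_t pow2 = (uint32_t)f;               /* cvttps2dq: 2^amt */
      return v * pow2;                           /* pmulld: v << amt */
  }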

llvm-svn: 291372
2017-01-07 22:27:43 +00:00
Craig Topper a74e3088df [AVX-512] Remove patterns from the other VBLENDM instructions. They are all redundant with masked move instructions.
We should probably teach the two address instruction pass to turn masked moves into BLENDM when it's beneficial to the register allocator.

llvm-svn: 291371
2017-01-07 22:20:34 +00:00
Craig Topper 81f20aa336 [AVX-512] Remove patterns from masked broadcast versions of BLENDM instructions.
All but (v2f64 broadcast f64) are handled with VBROADCAST instructions. The v2f64 version can be handled with VMOVDDUP.

We may want to consider converting to BLENDM instructions in the two address instruction pass if it's beneficial to register allocation.

llvm-svn: 291369
2017-01-07 22:20:26 +00:00
Craig Topper da84ff3ed4 [AVX-512] Add masked forms of the alternate MOVDDUP patterns.
I'm not too sure how to get isel to even select all of the unmasked forms, but at least we have a consistent set now.

llvm-svn: 291368
2017-01-07 22:20:23 +00:00
Simon Pilgrim a470296367 [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.
llvm-svn: 291366
2017-01-07 22:08:09 +00:00
Simon Pilgrim 82e3e05fe2 [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above
We were matching against the general vector shift costs before the uniform splat costs.

llvm-svn: 291365
2017-01-07 21:47:10 +00:00
Simon Pilgrim e70644dab7 [CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion.
llvm-svn: 291364
2017-01-07 21:33:00 +00:00
Simon Pilgrim 725997154d [CostModel][X86] Merge separate AVX1 cost LUTs. NFCI.
llvm-svn: 291355
2017-01-07 18:19:25 +00:00
Simon Pilgrim a4109d6433 [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.
llvm-svn: 291354
2017-01-07 17:54:10 +00:00
Simon Pilgrim df7de7a87e [CostModel][X86] Added missing AVX2 arithmetic costs.
Allows us to correctly fall through to the lower AVX1 costs if the lookup failed.

llvm-svn: 291353
2017-01-07 17:27:39 +00:00
Simon Pilgrim 100eae1ee0 [CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI.
llvm-svn: 291352
2017-01-07 17:03:51 +00:00
Simon Pilgrim a1b8e2c725 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
llvm-svn: 291347
2017-01-07 15:37:50 +00:00
Craig Topper 42b848a683 [X86] Disable load unfolding for 128-bit MOVDDUP instructions since the load size is smaller than the register size so unfolding would increase the load size.
llvm-svn: 291338
2017-01-07 06:56:54 +00:00
Simon Pilgrim 3128d6b520 [X86][SSE] Pass float domain flag to shuffle combine match functions. NFCI.
Early step towards ignoring domain above a certain shuffle depth.

llvm-svn: 291248
2017-01-06 17:34:30 +00:00
Simon Pilgrim bd3c6824d4 [X86][SSE] Simplify float domain requirement in unary shuffle matching.
The AVX1-only limit is never actually required in matchUnaryVectorShuffle

llvm-svn: 291244
2017-01-06 17:00:59 +00:00
Simon Pilgrim a08d7b9913 Remove trailing whitespace. NFCI.
llvm-svn: 291240
2017-01-06 15:31:52 +00:00
Simon Pilgrim 9b8c7caf4e [X86] Add X86Subtarget argument. NFCI.
All callers of getTargetVShiftNode have access to X86Subtarget already so pass it along instead of re-extracting it.

llvm-svn: 291239
2017-01-06 15:29:17 +00:00
Simon Pilgrim d8333372bc [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.
Set the costs on the lowest target that supports the type.

llvm-svn: 291229
2017-01-06 11:12:53 +00:00
Craig Topper e86fb932ea [AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp.
llvm-svn: 291214
2017-01-06 05:18:48 +00:00
Simon Pilgrim aa186c632d [CostModel][X86] Tidyup arithmetic costs code. NFCI.
Remove unnecessary braces, remove single-use variables, and keep LUTs to a similar naming convention.

llvm-svn: 291187
2017-01-05 22:48:02 +00:00
Simon Pilgrim 4c050c2190 [CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI.
llvm-svn: 291165
2017-01-05 19:42:43 +00:00
Simon Pilgrim 6f72eba606 Remove trailing whitespace. NFCI.
llvm-svn: 291163
2017-01-05 19:24:25 +00:00
Simon Pilgrim 5b06e4d319 [CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI.
llvm-svn: 291162
2017-01-05 19:19:39 +00:00
Simon Pilgrim a8bf97569a [CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI.
Removes the need for yet another LUT.

llvm-svn: 291158
2017-01-05 19:01:50 +00:00
Simon Pilgrim 430d34fc14 [CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.
Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead.

llvm-svn: 291152
2017-01-05 18:36:48 +00:00
Simon Pilgrim b01e844241 [CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL
Matches the other MUL/ADD/SUB 256-bit cases on AVX1

llvm-svn: 291149
2017-01-05 18:20:25 +00:00
Simon Pilgrim f74700aa8c [CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI.
llvm-svn: 291146
2017-01-05 17:56:19 +00:00
Sanjay Patel dea5a7bd53 fewer braces; NFC
llvm-svn: 291126
2017-01-05 16:47:32 +00:00