Commit Graph

14408 Commits

Author SHA1 Message Date
Elena Demikhovsky 9d0e7c33d3 X86 CodeGen: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
And VPACKUS* instructions on SEE* targets.

Differential Revision: https://reviews.llvm.org/D28216

llvm-svn: 291670
2017-01-11 12:59:32 +00:00
Simon Pilgrim 5a81fefad3 [X86][AVX512BW] Vectorize v64i8 vector shifts
Differential Revision: https://reviews.llvm.org/D28447

llvm-svn: 291665
2017-01-11 10:36:51 +00:00
Elad Cohen 0c2601073e [X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d}
The code emiited by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd,
(v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes
redundant (v)movss/(v)movsd instructions. This patch adds patterns for
optimizing these sequences.

Differential revision: https://reviews.llvm.org/D28455

llvm-svn: 291660
2017-01-11 09:11:48 +00:00
Mohammed Agabaria 2c96c43388 [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. 
In case if the real operands bitwidth <= 16.

Differential Revision: https://reviews.llvm.org/D28104 

llvm-svn: 291657
2017-01-11 08:23:37 +00:00
Hans Wennborg 6573976f57 Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)
This was reverted because it would miscompile code where the cmp had
multiple uses. That was due to a deficiency in the existing code, which
was fixed in r291630 (see the PR for details).

This re-commit includes an extra test for the kind of code that got
miscompiled: @test_sub_1_setcc_jcc.

llvm-svn: 291640
2017-01-11 01:36:57 +00:00
Hans Wennborg 12de693747 [X86] Dont run combineSetCCAtomicArith() when the cmp has multiple uses
We would miscompile the following:

  void g(int);
  int f(volatile long long *p) {
    bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0;
    g(b ? 12 : 34);
    return b ? 56 : 78;
  }

into

  pushq   %rax
  lock            incq    (%rdi)
  movl    $12, %eax
  movl    $34, %edi
  cmovlel %eax, %edi
  callq   g(int)
  testq   %rax, %rax   <---- Bad.
  movl    $56, %ecx
  movl    $78, %eax
  cmovsl  %ecx, %eax
  popq    %rcx
  retq

because the code failed to take into account that the cmp has multiple
uses, replaced one of them, and left the other one comparing garbage.

llvm-svn: 291630
2017-01-11 00:49:54 +00:00
Michael Zuckerman bcd03e7f3b [X86][AVX512]Improving shuffle lowering by using AVX-512 EXPAND* instructions
This patch fix PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351

1.  This patch adds new type of shuffle lowering
2.  We can use the expand instruction, When the shuffle pattern is as following:
    { 0*a[0]0*a[1]...0*a[n] , n >=0 where a[] elements in a ascending order}.

Reviewers: 1. igorb  
           2. guyblank  
           3. craig.topper  
           4. RKSimon 

Differential Revision: https://reviews.llvm.org/D28352

llvm-svn: 291584
2017-01-10 18:57:17 +00:00
Craig Topper d55b83128b AMD family 17h (znver1) enablement
Summary:
This patch enables the following
1. AMD family 17h architecture using "znver1" tune flag (-march, -mcpu).
2. ISAs that are enabled for "znver1" architecture.
3. Checks ADX isa from cpuid to identify "znver1" flag when -march=native is used.
4. ISAs FMA4, XOP are disabled as they are dropped from amdfam17.
5. For the time being, it uses the btver2 scheduler model.
6. Test file is updated to check this flag.

This item is linked to clang review item https://reviews.llvm.org/D28018

Patch by Ganesh Gopalasubramanian

Reviewers: RKSimon, craig.topper

Subscribers: vprasad, RKSimon, ashutosh.nema, llvm-commits

Differential Revision: https://reviews.llvm.org/D28017

llvm-svn: 291543
2017-01-10 06:01:16 +00:00
Craig Topper 2ed461e5c4 [X86] When lowering uniform shifts, use X86ISD::VZEXT instead of using a ZERO_EXTEND_VECTOR_INREG. If we emit the ZERO_EXTEND_VECTOR_INREG too late it doesn't get lowered properly and makes it through to isel and fails.
Fixes PR31593.

llvm-svn: 291535
2017-01-10 04:12:24 +00:00
Michael Kuperstein 1559e8863e Revert r291092 because it introduces a crash.
See PR31589 for details.

llvm-svn: 291478
2017-01-09 21:04:46 +00:00
Vyacheslav Klochkov d497d36083 X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB.
Differential Revision: https://reviews.llvm.org/D28087

llvm-svn: 291473
2017-01-09 20:26:17 +00:00
Simon Pilgrim 0f23b2ba1a [X86][AVX512] Enable v16i8/v32i8 vector shifts to use an extend+shift+truncate pattern.
Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets.

Cost model updates to follow.

llvm-svn: 291451
2017-01-09 17:20:03 +00:00
Simon Pilgrim d990cd371b [X86][AVX512DQ] Enable v16i16 vector shifts to use an extend+shift+truncate pattern.
Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will have already have lowered with vpsravw).

Cost model updates to follow.

llvm-svn: 291445
2017-01-09 15:15:45 +00:00
Craig Topper 96ab6fd2eb [AVX-512] Change another pattern that was using BLENDM to use masked moves. A future patch will conver it back to BLENDM if its beneficial to register allocation.
llvm-svn: 291419
2017-01-09 04:19:34 +00:00
Craig Topper 6393afce97 [AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros.
Previously we emitted a VPTERNLOG and a separate masked move.

llvm-svn: 291415
2017-01-09 02:44:34 +00:00
Craig Topper f51ba1e3da [AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX.
llvm-svn: 291402
2017-01-08 21:32:30 +00:00
Simon Pilgrim e6d948b857 Strip trailing whitespace.
llvm-svn: 291395
2017-01-08 18:37:42 +00:00
Simon Pilgrim b2a80950fe Fix line endings and strip trailing whitespace.
llvm-svn: 291393
2017-01-08 16:45:39 +00:00
Sanjay Patel bf51c8a975 [x86] fix usage of stale operands when lowering select
I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR.

The debug output shows nodes like this:

t4: i32 = xor t2, Constant:i32<-1>
    t21: i8 = setcc t4, Constant:i32<0>, setlt:ch
  t14: i32 = select t21, t4, Constant:i32<-1>

And because the select is holding onto the t4 (xor) node while EmitTest creates a new 
x86-specific xor node, the lowering results in:

  t4: i32 = xor t2, Constant:i32<-1>
  t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1>
t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1

Differential Revision: https://reviews.llvm.org/D28374

llvm-svn: 291392
2017-01-08 15:53:40 +00:00
Simon Pilgrim 9c58950eeb [CostModel][X86] Fixed vXi8 uniform shift costs.
The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation).

Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.

Added missing AVX2/AVX512BW costs as well.

llvm-svn: 291391
2017-01-08 14:14:36 +00:00
Simon Pilgrim 1fa5487c05 [CostModel][X86] Moved legal uniform shift costs earlier.
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.

llvm-svn: 291390
2017-01-08 13:12:03 +00:00
Craig Topper 5c46c7526e [AVX-512] Remove redundant patterns that select unaligned moves with zero masking for patterns that already use the aligned form. NFC
llvm-svn: 291383
2017-01-08 05:46:21 +00:00
Simon Pilgrim 9681c407b4 [CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs
SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.

llvm-svn: 291372
2017-01-07 22:27:43 +00:00
Craig Topper a74e3088df [AVX-512] Remove patterns from the other VBLENDM instructions. They are all redundant with masked move instructions.
We should probably teach the two address instruction pass to turn masked moves into BLENDM when its beneficial to the register allocator.

llvm-svn: 291371
2017-01-07 22:20:34 +00:00
Craig Topper 81f20aa336 [AVX-512] Remove patterns from masked broadcast versions of BLENDM instructions.
All but (v2f64 broadcast f64) are handled with VBROADCAST instructions. The v2f64 version can be handled with VMOVDDUP.

We may want to consider converting to BLENDM instructions in the two address instruction pass if its beneficial to register allocation.

llvm-svn: 291369
2017-01-07 22:20:26 +00:00
Craig Topper da84ff3ed4 [AVX-512] Add masked forms of the alternate MOVDDUP patterns.
I'm not too sure how to get isel to select even all of the unmasked forms, but at least we have a consistent set now.

llvm-svn: 291368
2017-01-07 22:20:23 +00:00
Simon Pilgrim a470296367 [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.
llvm-svn: 291366
2017-01-07 22:08:09 +00:00
Simon Pilgrim 82e3e05fe2 [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above
We were matching against general vector shift costs before the uniform splat costs

llvm-svn: 291365
2017-01-07 21:47:10 +00:00
Simon Pilgrim e70644dab7 [CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion.
llvm-svn: 291364
2017-01-07 21:33:00 +00:00
Simon Pilgrim 725997154d [CostModel][X86] Merge separate AVX1 cost LUTs. NFCI.
llvm-svn: 291355
2017-01-07 18:19:25 +00:00
Simon Pilgrim a4109d6433 [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.
llvm-svn: 291354
2017-01-07 17:54:10 +00:00
Simon Pilgrim df7de7a87e [CostModel][X86] Added missing AVX2 arithmetic costs.
Allows us to correctly fall through to the lower AVX1 costs if look up failed.

llvm-svn: 291353
2017-01-07 17:27:39 +00:00
Simon Pilgrim 100eae1ee0 [CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI.
llvm-svn: 291352
2017-01-07 17:03:51 +00:00
Simon Pilgrim a1b8e2c725 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
llvm-svn: 291347
2017-01-07 15:37:50 +00:00
Craig Topper 42b848a683 [X86] Disable load unfolding for 128-bit MOVDDUP instructions since the load size is smaller than the register size so unfolding would increase the load size.
llvm-svn: 291338
2017-01-07 06:56:54 +00:00
Simon Pilgrim 3128d6b520 [X86][SSE] Pass float domain flag to shuffle combine match functions. NFCI.
Early step towards ignoring domain above a certain shuffle depth.

llvm-svn: 291248
2017-01-06 17:34:30 +00:00
Simon Pilgrim bd3c6824d4 [X86][SSE] Simplify float domain requirement in unary shuffle matching.
The AVX1-only limit is never actually required in matchUnaryVectorShuffle

llvm-svn: 291244
2017-01-06 17:00:59 +00:00
Simon Pilgrim a08d7b9913 Remove trailing whitespace. NFCI.
llvm-svn: 291240
2017-01-06 15:31:52 +00:00
Simon Pilgrim 9b8c7caf4e [X86] Add X86Subtarget argument. NFCI.
All callers of getTargetVShiftNode have access to X86Subtarget already so pass it along instead of re-extracting it.

llvm-svn: 291239
2017-01-06 15:29:17 +00:00
Simon Pilgrim d8333372bc [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.
Set the costs on the lowest target that supports the type.

llvm-svn: 291229
2017-01-06 11:12:53 +00:00
Craig Topper e86fb932ea [AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp.
llvm-svn: 291214
2017-01-06 05:18:48 +00:00
Simon Pilgrim aa186c632d [CostModel][X86] Tidyup arithmetic costs code. NFCI.
Remove unnecessary braces, remove one use variables and keep LUTs to similar naming convention.

llvm-svn: 291187
2017-01-05 22:48:02 +00:00
Simon Pilgrim 4c050c2190 [CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI.
llvm-svn: 291165
2017-01-05 19:42:43 +00:00
Simon Pilgrim 6f72eba606 Remove trailing whitespace. NFCI.
llvm-svn: 291163
2017-01-05 19:24:25 +00:00
Simon Pilgrim 5b06e4d319 [CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI.
llvm-svn: 291162
2017-01-05 19:19:39 +00:00
Simon Pilgrim a8bf97569a [CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI.
Removes need for yet another LUT.

llvm-svn: 291158
2017-01-05 19:01:50 +00:00
Simon Pilgrim 430d34fc14 [CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.
Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead.

llvm-svn: 291152
2017-01-05 18:36:48 +00:00
Simon Pilgrim b01e844241 [CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL
Matches other MUL/ADD/SUB 256-bit case on AVX1

llvm-svn: 291149
2017-01-05 18:20:25 +00:00
Simon Pilgrim f74700aa8c [CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI.
llvm-svn: 291146
2017-01-05 17:56:19 +00:00
Sanjay Patel dea5a7bd53 less braces; NFC
llvm-svn: 291126
2017-01-05 16:47:32 +00:00
Simon Pilgrim bca02f9e20 [CostModel][X86] Add support for broadcast shuffle costs
Currently only for broadcasts with input and output of the same width.

Differential Revision: https://reviews.llvm.org/D27811

llvm-svn: 291122
2017-01-05 15:56:08 +00:00
Zvi Rackover 4b7d724d62 [X86] Optimize vector shifts with variable but uniform shift amounts
Summary:
For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register.
The lower 64-bits of the register are evaluated to determine the shift amount.
This patch improves the construction of the vector containing the shift amount.

Reviewers: craig.topper, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28353

llvm-svn: 291120
2017-01-05 15:11:43 +00:00
Simon Pilgrim a62395a4bd [CostModel][X86] Pulled out common type legalization code
llvm-svn: 291109
2017-01-05 14:33:32 +00:00
Mohammed Agabaria 23599ba794 Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling.
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.

Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.

Differential Revision: https://reviews.llvm.org/D27518

llvm-svn: 291106
2017-01-05 14:03:41 +00:00
Mohammed Agabaria 189e2d29ba [Test Commit] fixing some format issue in X86TTI to match clang-format output.
llvm-svn: 291095
2017-01-05 09:51:02 +00:00
Elena Demikhovsky 143cbc425b AVX-512: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
Differential revision: https://reviews.llvm.org/D28216

llvm-svn: 291092
2017-01-05 08:21:09 +00:00
Eric Christopher 568c113ac0 Remove dead and unused variable NumSentinelElements.
Fixes PR31529.

llvm-svn: 290998
2017-01-04 20:05:18 +00:00
Simon Pilgrim bb895f3e9c [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that was assumed.

llvm-svn: 290962
2017-01-04 14:01:33 +00:00
Simon Pilgrim 939b8cd708 [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.

llvm-svn: 290956
2017-01-04 12:08:41 +00:00
Florian Hahn 5815f6c53c [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: aprantl, MatzeB, mkuper

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290955
2017-01-04 12:08:35 +00:00
Ayman Musa 02f9533823 [X86][AVX512] Passing the appropriate memory operand class to INT_{U}COMIS{S|D} instructions
Replacing the memory operand in the intrinsic versions of the comis/ucomis instrucions from f128mem to ssmem/sdmem accordingly.

Differential Revision: https://reviews.llvm.org/D28138

llvm-svn: 290948
2017-01-04 08:21:54 +00:00
Simon Pilgrim c76ea4b638 [X86] Attempt to pre-truncate arithmetic operations if useful
In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.

This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold).

Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.

I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.

Differential Revision: https://reviews.llvm.org/D28219

llvm-svn: 290947
2017-01-04 08:05:42 +00:00
Craig Topper d0aa53b9ae [AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources.
These are best handled with a vinsert32x4 or vinsert64x2 instruction.

llvm-svn: 290946
2017-01-04 07:32:03 +00:00
Craig Topper 83115a809f [AVX-512] Simplify code for creating 512-bit SHUF128 operations.
We don't need two loops and we can safely assume assume and hardcode the size of the widened mask.

llvm-svn: 290942
2017-01-04 07:31:51 +00:00
Craig Topper 48d232d3e7 [X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle.
llvm-svn: 290872
2017-01-03 07:36:41 +00:00
Craig Topper 785e58fdc9 [AVX-512] Simplify the code added in r290870 to recognized 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask.
llvm-svn: 290871
2017-01-03 07:36:39 +00:00
Craig Topper 9496e3f916 [AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts.
llvm-svn: 290870
2017-01-03 07:00:40 +00:00
Craig Topper fa875a1d3d [AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions.
llvm-svn: 290869
2017-01-03 05:46:18 +00:00
Craig Topper be9ef55152 [X86] Remove trailing whitespace and an unnecessary line wrap. NFC
llvm-svn: 290867
2017-01-03 05:46:06 +00:00
Craig Topper 06bae884bd [X86] Fix header comment. NFC
llvm-svn: 290866
2017-01-03 05:46:05 +00:00
Craig Topper c849172105 [AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation.
llvm-svn: 290865
2017-01-03 05:46:02 +00:00
Craig Topper 0cda8bbf74 [AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors. There are some codegen problems here that I'll try to fix in future commits.
llvm-svn: 290864
2017-01-03 05:45:57 +00:00
Craig Topper 4d47c6ae57 [AVX-512] Remove vextract intrinsics and autoupgrade to native shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal.
Hopefully we can improve that in future patches.

llvm-svn: 290863
2017-01-03 05:45:46 +00:00
Dean Michael Berris f7e7b938ea [XRay] Merge instrumentation point table emission code into AsmPrinter.
Summary:
No need to have this per-architecture.  While there, unify 32-bit ARM's
behaviour with what changed elsewhere and start function names lowercase
as per the coding standards.  Individual entry emission code goes to the
entry's own class.

Fully tested on amd64, cross-builds on both ARMs and PowerPC.

Reviewers: dberris

Subscribers: aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28209

llvm-svn: 290858
2017-01-03 04:30:21 +00:00
Elena Demikhovsky d96200d60a Fixed shuffle-reverse cost on AVX-512.
(This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately).

llvm-svn: 290812
2017-01-02 11:44:10 +00:00
Elena Demikhovsky 21706cbd24 AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.

In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426).

* Shiffle-broadcast cost will be changed in Simon's upcoming patch.

Differential Revision: https://reviews.llvm.org/D28118

llvm-svn: 290810
2017-01-02 10:37:52 +00:00
Reid Kleckner cd46c1df80 Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"
This reverts commit r290694. It broke sanitizer tests on Win64. I'll
probably bring this back, but the jump tables will just live in .text
like they do for MSVC.

llvm-svn: 290714
2016-12-29 17:07:10 +00:00
Reid Kleckner c9e0a153cf [COFF] Use 32-bit jump table entries in .rdata for Win64
Summary:
We were already using 32-bit jump table entries, but this was a
consequence of the default PIC model on Win64, and not an intentional
design decision. This patch ensures that we always use 32-bit label
difference jump table entries on Win64 regardless of the PIC model. This
is a good idea because it saves executable size and object file size.

Moving the jump tables to .rdata cleans up the disassembled object code
and reduces the available ROP targets, but it requires adding one more
RIP-relative lea to the code.  COFF doesn't have relocations to express
the difference between two arbitrary symbols, so we can't use the jump
table label in the label difference like we do elsewhere.

Fixes PR31488

Reviewers: majnemer, compnerd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28141

llvm-svn: 290694
2016-12-29 00:12:39 +00:00
Gadi Haber 19c4fc5e62 This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.
There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.

Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky 
Differential Revision: https://reviews.llvm.org/D27901

llvm-svn: 290663
2016-12-28 10:12:48 +00:00
Craig Topper e77e901130 [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables.
llvm-svn: 290591
2016-12-27 06:51:09 +00:00
Craig Topper 2da265b7bf [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select.
llvm-svn: 290583
2016-12-27 05:30:14 +00:00
Craig Topper 89b3e0223f [AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can add them to InstCombine with the 128 and 256 bit versions.
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded.

llvm-svn: 290573
2016-12-27 03:46:05 +00:00
Craig Topper 83f2145c18 [AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div into masked instructions.
llvm-svn: 290564
2016-12-27 01:56:24 +00:00
Craig Topper 5ef13ba18b [AVX-512] Fix some patterns to use extended register classes.
llvm-svn: 290536
2016-12-26 07:26:07 +00:00
Craig Topper f56d985f77 [AVX-512] Don't assume that the rounding mode argument to intrinsics is a constant. While clang will guarantee this, nothing in the backend will.
A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering.

llvm-svn: 290532
2016-12-26 01:40:17 +00:00
Michael Zuckerman 86602e85dd revert commit 290516
llvm-svn: 290517
2016-12-25 12:45:18 +00:00
Michael Zuckerman 45aa420640 Commit try added new empty line
llvm-svn: 290516
2016-12-25 12:01:34 +00:00
Florian Hahn 898127fe36 Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.
llvm-svn: 290425
2016-12-23 12:26:11 +00:00
Florian Hahn 1d6b1a7b79 [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: mkuper, MatzeB, aprantl

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290423
2016-12-23 11:35:00 +00:00
Wei Mi f3f01aba48 Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.
This is for splitMergedValStore in DAG Combine to share the target query interface
with similar logic in CodeGenPrepare.

Differential Revision: https://reviews.llvm.org/D24707

llvm-svn: 290363
2016-12-22 19:38:22 +00:00
Ayman Musa 9ff608cdc6 [X86][AVX2] Passing the appropriate memory operand class to VPMADDWD instruction.
Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem.

Differential Revision: https://reviews.llvm.org/D28024

llvm-svn: 290333
2016-12-22 08:42:46 +00:00
Simon Pilgrim 081abbb164 [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.

Differential Revision: https://reviews.llvm.org/D27756

llvm-svn: 290267
2016-12-21 20:00:10 +00:00
Michael Zuckerman 85e12d2851 revert first commit . removing empty line in X86.h
llvm-svn: 290255
2016-12-21 12:48:01 +00:00
Michael Zuckerman 58838cf29d First commit adding new line to X86.h
llvm-svn: 290254
2016-12-21 12:44:47 +00:00
Elena Demikhovsky 7c7bf1b432 Added a template for building target specific memory node in DAG.
I added API for creation a target specific memory node in DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. 
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern.

Differential Revision: https://reviews.llvm.org/D27899

llvm-svn: 290250
2016-12-21 10:43:36 +00:00
Oren Ben Simhon cb692157b7 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing a warning.

llvm-svn: 290248
2016-12-21 09:47:31 +00:00
Oren Ben Simhon de2eea7298 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing build issues.

llvm-svn: 290244
2016-12-21 08:59:42 +00:00
Oren Ben Simhon 3b95157090 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. 
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.

The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it.
This aubmit also includes additional lit tests to cover better HVAs corner cases.

Differential Revision: https://reviews.llvm.org/D27392

llvm-svn: 290240
2016-12-21 08:31:45 +00:00
Simon Pilgrim 688114d888 [X86][SSE] Ensure we're only combining shuffles with legal mask types.
I haven't managed to get this to fail yet but its technically possible for the AND -> shuffle decomposition to result in illegal types.

llvm-svn: 290183
2016-12-20 17:09:52 +00:00
Michael LeMay 20565e2aa1 [TargetInstrInfo] replace redundant expression in getMemOpBaseRegImmOfs
Summary:
The expression for computing the return value of getMemOpBaseRegImmOfs has only
one possible value. The other value would result in a return earlier in the
function. This patch replaces the expression with its only possible value.

Reviewers: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27437

llvm-svn: 290133
2016-12-19 21:02:41 +00:00
Craig Topper 1fd4196337 [X86] When recognizing vector loads or VZEXT_LOAD in selectScalarSSELoad make sure we pass the load's user rather than load itself to the second operand of IsLegalToFold.
llvm-svn: 290089
2016-12-19 08:35:56 +00:00
Craig Topper 375aa90291 [X86] Remove all of the patterns that use X86ISD:FAND/FXOR/FOR/FANDN except for the ones needed for SSE1. Anything SSE2 or above uses the integer ISD opcode.
This removes 11721 bytes from the DAG isel table or 2.2%

llvm-svn: 290073
2016-12-19 00:42:28 +00:00
Daniel Jasper 373f9a6a0c Revert r289955 and r289962. This is causing lots of ASAN failures for us.
Not sure whether it causes and ASAN false positive or whether it
actually leads to incorrect code or whether it even exposes bad code.
Hans, I'll get you instructions to reproduce this.

llvm-svn: 290066
2016-12-18 14:36:38 +00:00
Michael Zuckerman 4b88a770ef [X86] [AVX512] Minor fix in encoding of scalar EVEX instructions. NFC.
Commit on behalf of Gadi Haber  

Removed EVEX_V512 prefix from scalar EVEX instructions since HW ignores L'L bits anyway (LIG). 4 instructions are modified.
The changed encodings are validated with XED.
Rviewers: delena, igorb

Differential revision: https://reviews.llvm.org/D27802

llvm-svn: 290065
2016-12-18 14:29:00 +00:00
Simon Pilgrim e940daf532 [X86][SSE] Add support for combining target shuffles to SHUFPS.
As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point.

llvm-svn: 290064
2016-12-18 14:26:02 +00:00
Craig Topper 7029db0eaa [X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations if they are available. This will allow a bunch of patterns to be removed.
These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine.

For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode.

For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now.

llvm-svn: 290060
2016-12-18 07:54:23 +00:00
Craig Topper add9cc697a [AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when DQI and VLX instructions are available.
This can give the register allocator more registers to use.

llvm-svn: 290057
2016-12-18 06:23:14 +00:00
Craig Topper 2baef8f466 [AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops for scalars. I missed this in r290049.
llvm-svn: 290055
2016-12-18 04:17:00 +00:00
Craig Topper d3295c6a3a [AVX-512] Use EVEX encoded logic operations for scalar types when they are available. This gives the register allocator more registers to work with.
llvm-svn: 290049
2016-12-17 19:26:00 +00:00
Hans Wennborg ef57755427 Fix -Wself-assign from r289955
llvm-svn: 289962
2016-12-16 17:16:46 +00:00
Hans Wennborg 35f21cba13 [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)
atomic_load_add returns the value before addition, but sets EFLAGS based on the
result of the addition. That means it's setting the flags based on effectively
subtracting C from the value at x, which is also what the outer cmp does.

This targets a pattern that occurs frequently with reference counting pointers:

  void decrement(long volatile *ptr) {
    if (_InterlockedDecrement(ptr) == 0)
      release();
  }

Clang would previously compile it (for 32-bit at -Os) as:

00000000 <?decrement@@YAXPCJ@Z>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   31 c9                   xor    %ecx,%ecx
   6:   49                      dec    %ecx
   7:   f0 0f c1 08             lock xadd %ecx,(%eax)
   b:   83 f9 01                cmp    $0x1,%ecx
   e:   0f 84 00 00 00 00       je     14 <?decrement@@YAXPCJ@Z+0x14>
  14:   c3                      ret

and with this patch it becomes:

00000000 <?decrement@@YAXPCJ@Z>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   f0 ff 08                lock decl (%eax)
   7:   0f 84 00 00 00 00       je     d <?decrement@@YAXPCJ@Z+0xd>
   d:   c3                      ret

(Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add
or pre-decrement operator generate the same code.)

Differential Revision: https://reviews.llvm.org/D27781

llvm-svn: 289955
2016-12-16 16:34:59 +00:00
Simon Pilgrim 4b73c3de50 [X86][AVX] Call lowerVectorShuffleWithSHUFPS directly instead of calling DAG.getVectorShuffle (PR27885)
We've already done the hardwork of ensuring the mask is safe for 'SHUFPS'.

llvm-svn: 289950
2016-12-16 15:23:32 +00:00
Simon Pilgrim 9519bd9232 [X86][AVX512] use a single shufps for 512-bit vectors when it can save instructions
This is the 512-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837

This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

llvm-svn: 289946
2016-12-16 14:30:04 +00:00
Simon Pilgrim f159a3414f [X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain.
We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead.

llvm-svn: 289937
2016-12-16 11:48:51 +00:00
Ahmed Bougacha 5228603387 [GlobalISel] Drop workaround for Legalizer member/class sharing a name. NFC.
MachineLegalizer used to be the name of both the class and the member,
causing GCC errors. r276522 fixed that by renaming the member to just
'Legalizer'.  The 'class' workaround isn't necessary anymore; drop it.

llvm-svn: 289848
2016-12-15 18:45:30 +00:00
Sanjay Patel a97358bc8e [x86] use a single shufps for 256-bit vectors when it can save instructions
This is the 256-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837

This patch is based on the draft by @sroland (Roland Scheidegger) that is
attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

llvm-svn: 289846
2016-12-15 18:43:46 +00:00
Sanjay Patel a0d8a278a7 [x86] use a single shufps when it can save instructions
This is a tiny patch with a big pile of test changes.
This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

My motivating case looks like this:

  - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
  - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
  - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]

  + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]

And this happens several times in the diffs. For chips with domain-crossing penalties,
the instruction count and size reduction should usually overcome any potential 
domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such
as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so
using shufps is a pure win.

So the test case diffs all appear to be improvements except one test in 
vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate 
zero elements and one test in combine-sra.ll where multiple uses prevent the expected
shuffle combining.

Differential Revision: https://reviews.llvm.org/D27692

llvm-svn: 289837
2016-12-15 18:03:38 +00:00
Simon Pilgrim 7522f54feb [X86][SSE] Fix domains for scalar store instructions
As discussed on D27692

llvm-svn: 289834
2016-12-15 17:09:24 +00:00
Simon Pilgrim ba46422694 [X86][AVX512] Moved instruction domain lookups to the right table. NFCI.
Avoid duplicating instructions in the int32/int64 domains.

llvm-svn: 289830
2016-12-15 16:38:51 +00:00
Simon Pilgrim d7518896ff [X86][SSE] Fix domains for VZEXT_LOAD type instructions
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.

Differential Revision: https://reviews.llvm.org/D27684

llvm-svn: 289825
2016-12-15 16:05:29 +00:00
Simon Pilgrim 2f7f0e7a48 [CostModel][X86] Updated reverse shuffle costs
llvm-svn: 289819
2016-12-15 14:24:07 +00:00
Michael Zuckerman 1ce2a23a1e Fix bug 30945- [AVX512] Failure to flip vector comparison to remove not mask instruction
adding new optimization opportunity by adding new X86ISelLowering pattern. The test case was shown in https://llvm.org/bugs/show_bug.cgi?id=30945.

Test explanation:
Select gets three arguments mask, op and op2. In this case, the Mask is a result of ICMP. The ICMP instruction compares (with equal operand) the zero initializer vector and the result of the first ICMP.

In general, The result of "cmp eq, op1, zero initializers" is "not(op1)" where op1 is a mask. By rearranging of the two arguments inside the Select instruction, we can get the same result. Without the necessary of the middle phase ("cmp eq, op1, zero initializers").

Missed optimization opportunity: 
vpcmpled %zmm0, %zmm1, %k0
knotw %k0, %k1

can be combine to 
vpcmpgtd %zmm0, %zmm2, %k1

Reviewers: 
1. delena
2. igorb 

Commited after check all 
Differential Revision: https://reviews.llvm.org/D27160

llvm-svn: 289653
2016-12-14 14:57:10 +00:00
Stephan Bergmann 17c7f70362 Replace APFloatBase static fltSemantics data members with getter functions
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.

Differential Revision: https://reviews.llvm.org/D26671

llvm-svn: 289647
2016-12-14 11:57:17 +00:00
Philip Reames 1f1bbac8da [peephole] Enhance folding logic to work for STATEPOINTs
The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are:

    Support for folding multiple operands at once which reference the same load
    Support for folding multiple loads into a single instruction
    Walk all the operands of the instruction for varidic instructions (this is a bug fix!)

Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here.

Differential Revision: https://reviews.llvm.org/D24103

llvm-svn: 289510
2016-12-13 01:38:41 +00:00
Sanjay Patel 62104ee6d9 [x86] fix formatting; NFC
llvm-svn: 289476
2016-12-12 22:31:01 +00:00
Simon Pilgrim 4cbe1834e4 Update inline argument comment. NFCI.
combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time.

llvm-svn: 289430
2016-12-12 13:43:15 +00:00
Simon Pilgrim 5ebd2b542b [X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts.
Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift.

llvm-svn: 289429
2016-12-12 13:33:58 +00:00
Simon Pilgrim 369cd349b9 [X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ
PMULDQ returns the 64-bit result of the signed multiplication of the lower 32-bits of vXi64 vector inputs, we can lower with this if the sign bits stretch that far.

Differential Revision: https://reviews.llvm.org/D27657

llvm-svn: 289426
2016-12-12 10:49:15 +00:00
Craig Topper 36ecce9bed [X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the X86ISD::VZEXT_LOAD opcode.
Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics.

llvm-svn: 289424
2016-12-12 07:57:24 +00:00
Craig Topper f2c6f7abf3 [X86] Change CMPSS/CMPSD intrinsic instructions to use sse_load_f32/f64 as its memory pattern instead of full vector load.
These intrinsics only load a single element. We should use sse_loadf32/f64 to give more options of what loads it can match.

Currently these instructions are often only getting their load folded thanks to the load folding in the peephole pass. I plan to add more types of loads to sse_load_f32/64 so we can match without the peephole.

llvm-svn: 289423
2016-12-12 07:57:21 +00:00
Craig Topper 081c0e2864 [X86] Remove some intrinsic instructions from hasPartialRegUpdate
Summary:
These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits.

As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded.

Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too.

Reviewers: spatel, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27611

llvm-svn: 289419
2016-12-12 05:07:17 +00:00
Simon Pilgrim 831435cb14 [X86][SSE] Add support for combining target shuffles to SHUFPD.
llvm-svn: 289407
2016-12-11 21:26:25 +00:00
Ayman Musa 7ec4ed55d3 [X86][AVX512] Add missing patterns for broadcast fallback in case load node has multiple uses (for v4i64 and v4f64).
When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded.
A fallback pattern is added to catch these cases and provide another solution.

Differential Revision: https://reviews.llvm.org/D27661

llvm-svn: 289404
2016-12-11 20:11:17 +00:00
Oren Ben Simhon 9683ecbff6 [X86] Regcall - Adding support for mask types
Regcall calling convention passes mask types arguments in x86 GPR registers.
The review includes the changes required in order to support v32i1, v16i1 and v8i1.

Differential Revision: https://reviews.llvm.org/D27148

llvm-svn: 289383
2016-12-11 14:10:52 +00:00
Craig Topper e7166ce237 [X86] Fix a comment to say 'an FMA' instead of 'a FMA'. NFC
llvm-svn: 289352
2016-12-11 01:28:08 +00:00
Craig Topper 1f1b441267 [X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for being able to constant fold them in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289350
2016-12-11 01:26:44 +00:00
Craig Topper edab02b50b [X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being able to constant fold it in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289344
2016-12-10 23:09:43 +00:00
Simon Pilgrim a03e350e69 [X86][SSE] Ensure UNPCK inputs are a consistent value type in LowerHorizontalByteSum
llvm-svn: 289341
2016-12-10 21:16:45 +00:00
Craig Topper abe7c5b5e9 [AVX-512] Remove 128/256 masked vpermil instrinsics and autoupgrade to a select around the unmasked avx1 intrinsics.
llvm-svn: 289340
2016-12-10 21:15:52 +00:00
Simon Pilgrim fb58550d73 [X86][SSE] Move ZeroVector creation into the shuffle pattern case where its actually used.
Also fix the ZeroVector's type - I've no idea how this hasn't caused problems........

llvm-svn: 289336
2016-12-10 19:49:55 +00:00
Craig Topper 18b57da491 [AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available.
llvm-svn: 289335
2016-12-10 19:35:39 +00:00
Craig Topper 8e288e0b68 [X86] Clarify indentation. NFC
llvm-svn: 289334
2016-12-10 19:35:36 +00:00
Craig Topper 85f0e57c33 [X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a single boolean flag passed to a helper function. Just check the opcode and create the flag.
llvm-svn: 289333
2016-12-10 19:35:33 +00:00
Craig Topper a39b650d72 [X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements.
Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements.

Similar things have already been done for other convert intrinsics.

llvm-svn: 289316
2016-12-10 06:02:48 +00:00
Simon Pilgrim 017b7a71d8 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.

llvm-svn: 289232
2016-12-09 17:53:11 +00:00
Craig Topper 38b1b5d44f [X86] Modify patterns from memory form of RCP/RSQRT/SQRT intrinsics to only allow (scalar_to_vector (loadf32/load64)) instead of anything that sse_load_f32/f64 can match.
sse_load_f32/f64 can also match loads that are zero extended to vectors. We shouldn't match that because we wouldn't be able to get the instruction to zero the upper bits like the intrinsic semantics would require for such a case.

There is a test case that does depend on this behavior.

llvm-svn: 289193
2016-12-09 07:57:21 +00:00
Craig Topper a55b483bb5 [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics
Summary:
Scalar intrinsics have specific semantics about the which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.

This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1(one of the multiply inputs) or operand 3(the additon/subtraction input) should pass thru its upper bits.

We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version.

We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that.

This fixes PR30913.

Reviewers: delena, zvi, v_klochkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27144

llvm-svn: 289190
2016-12-09 06:42:28 +00:00
Craig Topper c4f2b0996d [X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables.
llvm-svn: 289186
2016-12-09 05:20:11 +00:00
Craig Topper 2aeb456425 [AVX-512] Add vpermilps/pd to load folding tables.
llvm-svn: 289173
2016-12-09 02:18:11 +00:00
Peter Collingbourne 235c275b20 IR, X86: Understand !absolute_symbol metadata on global variables.
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.

As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html

Differential Revision: https://reviews.llvm.org/D25878

llvm-svn: 289087
2016-12-08 19:01:00 +00:00