Commit Graph

10452 Commits

Author SHA1 Message Date
Sanjay Patel 7765c93be2 [DAG, x86] allow store merging before and after legalization (PR34217)
rL310710 allowed store merging to occur after legalization to catch stores that are created late,
but this exposes a logic hole seen in PR34217:
https://bugs.llvm.org/show_bug.cgi?id=34217

We will miss merging stores if the target lowers vector extracts into target-specific operations.
This patch allows store merging to occur both before and after legalization, if the target so chooses,
to get maximum merging.

I don't think the potential regressions in the other tests are relevant. Those tests check the
correctness of weird IR constructs rather than perf, and I think they are still correct.

Differential Revision: https://reviews.llvm.org/D37987

llvm-svn: 313564
2017-09-18 20:54:26 +00:00
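
For illustration, a reduced .ll sketch of the kind of pattern involved (function and value names are hypothetical, not from the commit's tests):

    define void @store_lanes(<4 x i32> %v, i32* %p) {
      %p1 = getelementptr i32, i32* %p, i64 1
      %p2 = getelementptr i32, i32* %p, i64 2
      %p3 = getelementptr i32, i32* %p, i64 3
      %e0 = extractelement <4 x i32> %v, i32 0
      %e1 = extractelement <4 x i32> %v, i32 1
      %e2 = extractelement <4 x i32> %v, i32 2
      %e3 = extractelement <4 x i32> %v, i32 3
      ; if the target lowers these extracts into target-specific operations
      ; before legalization, the stores only become mergeable afterwards
      store i32 %e0, i32* %p
      store i32 %e1, i32* %p1
      store i32 %e2, i32* %p2
      store i32 %e3, i32* %p3
      ret void
    }
    ; ideally merged into one 16-byte store, e.g. movups %xmm0, (%rdi)
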
Craig Topper 39cdb84560 [X86] Make sure we still emit zext for GR32 to GR64 when the source of the zext is AssertZext
The AssertZext we might see in this case is only giving information about the lower 32 bits. It isn't providing information about the upper 32 bits. So we should emit a zext.

This fixes PR28540.

Differential Revision: https://reviews.llvm.org/D37729

llvm-svn: 313563
2017-09-18 20:49:13 +00:00
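
A hedged sketch of why the extra zext is needed (hypothetical function names; here the AssertZext comes from the zeroext return attribute and only describes bits inside the 32-bit value):

    declare zeroext i8 @get()

    define i64 @use() {
      %v = call zeroext i8 @get()
      %w = zext i8 %v to i32  ; carries an AssertZext describing bits 7:0 only
      %z = zext i32 %w to i64 ; must still emit an explicit movl to zero 63:32
      ret i64 %z
    }
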
Sanjay Patel 74d12b5697 [x86] add tests for PR34217; NFC
llvm-svn: 313548
2017-09-18 18:07:50 +00:00
Simon Pilgrim 4aa28b9730 [X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for 256-bit vector compare results.
As commented on D37849, AVX1 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts

llvm-svn: 313547
2017-09-18 17:58:31 +00:00
Sanjay Patel 078d5d978c [x86] regenerate checks; NFC
llvm-svn: 313545
2017-09-18 17:33:47 +00:00
Simon Pilgrim 0b21ef1fa3 [SelectionDAG] Add BITCAST handling to ComputeNumSignBits for splatted sign bits.
For cases where we are BITCASTing to vectors of smaller elements, if the entire source was a splatted sign (src's NumSignBits == SrcBitWidth) we can say that the dst's NumSignBits == DstBitWidth, as we're just splitting those sign bits across multiple elements.

We could generalize this but at the moment the only use case I have is to peek through bitcasts to vector comparison results.

Differential Revision: https://reviews.llvm.org/D37849

llvm-svn: 313543
2017-09-18 16:45:05 +00:00
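
A reduced .ll sketch of the splatted-sign case (hypothetical names): the sign-extended compare result is all sign bits per element, and the bitcast merely splits them, so every smaller element is also all sign bits.

    define <4 x i32> @splat_sign(<2 x i64> %a, <2 x i64> %b) {
      %c = icmp sgt <2 x i64> %a, %b
      %s = sext <2 x i1> %c to <2 x i64>       ; NumSignBits == 64 per element
      %bc = bitcast <2 x i64> %s to <4 x i32>
      ret <4 x i32> %bc                        ; NumSignBits == 32 per element
    }
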
Craig Topper 77d7f331dd [X86] Fix two more places to prefer VPERMQ/PD over VPERM2X128 when AVX2 is enabled
The shuffle combining and lowerVectorShuffleAsLanePermuteAndBlend were both still trying to use VPERM2X128 for unary shuffles when AVX2 is enabled. VPERM2X128 takes two inputs, meaning when we use it for a unary shuffle one of those inputs is left undefined, creating a false dependency on whatever register gets allocated there.

If we have VPERMQ/PD we should prefer those since they only have a single input.

Differential Revision: https://reviews.llvm.org/D37947

llvm-svn: 313542
2017-09-18 16:39:49 +00:00
Simon Pilgrim 00161c9961 [X86][SSE] Improve support for vselect(Cond, 0, X) -> ANDN(Cond, X)
As discussed on PR28925 and D37849.

Differential Revision: https://reviews.llvm.org/D37975

llvm-svn: 313532
2017-09-18 14:23:23 +00:00
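
A reduced sketch of the vselect(Cond, 0, X) form (hypothetical names): the sign-extended condition mask is inverted and ANDed with X, which is exactly PANDN/ANDNPS.

    define <4 x i32> @sel_zero(<4 x i32> %a, <4 x i32> %b, <4 x i32> %x) {
      %c = icmp eq <4 x i32> %a, %b
      %r = select <4 x i1> %c, <4 x i32> zeroinitializer, <4 x i32> %x
      ret <4 x i32> %r
    }
    ; SSE: pcmpeqd %xmm1, %xmm0 ; pandn %xmm2, %xmm0   (~mask & x)
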
Simon Pilgrim 360629d170 [X86][SSE] Add vselect with zero tests (PR28925)
llvm-svn: 313529
2017-09-18 13:32:33 +00:00
Nikolai Bozhenov 84af99b3b1 [X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs.
Summary:
Subregister liveness tracking is not implemented for X86 backend, so
sometimes the whole super register is said to be live, when only a
subregister is really live. That might happen if the def and the use
are located in different MBBs, see added fixup-bw-isnt.mir test.

However, using knowledge of the specific instructions handled by the
bw-fixup-pass we can get more precise liveness information which this
change does.

Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper

Reviewed By: craig.topper

Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya

Patch by Andrei Elovikov <andrei.elovikov@intel.com>

Differential Revision: https://reviews.llvm.org/D37559

llvm-svn: 313524
2017-09-18 10:17:59 +00:00
Mohammed Agabaria 77cb080c2d [X86][Codegen] adding masked gathers tests for avx2
related to patch: https://reviews.llvm.org/D35772
Adding llvm gather tests before gather codegen support.

Differential Revision: https://reviews.llvm.org/D37800

llvm-svn: 313516
2017-09-18 06:49:54 +00:00
Craig Topper a6054328e8 [X86] Teach the execution domain fixing tables to use movlhps inplace of unpcklpd for the packed single domain.
MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter.

llvm-svn: 313509
2017-09-18 04:40:58 +00:00
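
The equivalence the domain table relies on, as a reduced sketch (hypothetical name): both instructions concatenate the low 64 bits of the two sources, so only the bypass domain and the encoding differ.

    define <2 x double> @unpck_lo(<2 x double> %a, <2 x double> %b) {
      %s = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 0, i32 2>
      ret <2 x double> %s
    }
    ; double domain: unpcklpd %xmm1, %xmm0
    ; single domain: movlhps  %xmm1, %xmm0   (same result bits, one byte
    ;                shorter in the legacy encoding)
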
Craig Topper 87f7381edf [X86] Teach execution domain fixing to convert between FP and int unpack instructions.
llvm-svn: 313508
2017-09-18 03:29:54 +00:00
Craig Topper d4341920d5 [X86] Teach execution domain fixing to convert between VPERMILPS and VPSHUFD.
llvm-svn: 313507
2017-09-18 03:29:47 +00:00
Craig Topper ee6646d7de [X86] Teach shuffle lowering to use MOVLHPS/MOVHLPS for lowering v4f32 unary shuffles with SSE1 only.
llvm-svn: 313504
2017-09-17 22:36:41 +00:00
Craig Topper 6c221690a3 [X86] Add a couple more unary shuffles to the sse1 shuffle test.
These can be implemented with movlhps and movhlps.

llvm-svn: 313503
2017-09-17 22:36:39 +00:00
Jatin Bhateja 356e3e2c1d Adding test cases for PR34629 & PR34634.
Differential Revision: https://reviews.llvm.org/D37962

llvm-svn: 313490
2017-09-17 18:16:26 +00:00
Igor Breger f1d388a5c5 [GlobalISel][X86] Legalize i1 G_ADD/G_SUB/G_MUL/G_XOR/G_OR/G_AND instructions.
llvm-svn: 313483
2017-09-17 11:34:17 +00:00
Igor Breger 0f382ccb68 [GlobalISel][X86] Use correct physical register in mir tests.NFC.
llvm-svn: 313479
2017-09-17 08:30:42 +00:00
Igor Breger 21200ed7af [GlobalISel][X86] G_FCONSTANT support.
Summary: G_FCONSTANT support, port the implementation from X86FastIsel.

Reviewers: zvi, delena, guyblank

Reviewed By: delena

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D37734

llvm-svn: 313478
2017-09-17 08:08:13 +00:00
Sanjay Patel 65d6780703 [x86] enable storeOfVectorConstantIsCheap() target hook
This allows vector-sized store merging of constants in DAGCombiner using the existing code in MergeConsecutiveStores(). 
All of the twisted logic that decides exactly what vector operations are legal and fast for each particular CPU is
handled separately in there using the appropriate hooks.

For the motivating tests in merge-store-constants.ll, we already produce the same vector code in IR via the SLP vectorizer. 
So this is just providing a backend backstop for code that doesn't go through that pass (-O1). More details in PR24449:
https://bugs.llvm.org/show_bug.cgi?id=24449 (this change should be the last step to resolve that bug)

Differential Revision: https://reviews.llvm.org/D37451

llvm-svn: 313458
2017-09-16 13:29:12 +00:00
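
A reduced sketch of the motivating shape from merge-store-constants.ll (hypothetical names):

    define void @store_consts(i32* %p) {
      %p1 = getelementptr i32, i32* %p, i64 1
      %p2 = getelementptr i32, i32* %p, i64 2
      %p3 = getelementptr i32, i32* %p, i64 3
      store i32 1, i32* %p
      store i32 2, i32* %p1
      store i32 3, i32* %p2
      store i32 4, i32* %p3
      ret void
    }
    ; with the hook enabled, MergeConsecutiveStores can form a single
    ; <4 x i32> constant store: a constant-pool load plus movups %xmm0, (%rdi)
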
Craig Topper 23f78c1662 [X86] Add isel patterns to be able to fold loads into VPERM2F128 even when the load is on the first input to the SDNode.
We just need to toggle bits 1 and 5 of the immediate and swap the sources. The peephole pass could trigger commuting/folding for this later, but it's easy enough to fix in isel.

Disable the peephole pass on the main vperm2x128 test so we know we're doing this through isel.

llvm-svn: 313455
2017-09-16 09:16:48 +00:00
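
A sketch of the commutation trick (illustrative values; imm bits [1:0] select the low 128-bit result lane and bits [5:4] the high lane, with 0 = src1.lo, 1 = src1.hi, 2 = src2.lo, 3 = src2.hi; only the second source can be a memory operand):

    define <8 x float> @fold_first(<8 x float>* %p, <8 x float> %b) {
      %a = load <8 x float>, <8 x float>* %p
      %s = shufflevector <8 x float> %a, <8 x float> %b,
           <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
      ret <8 x float> %s
    }
    ; with the load on src1, imm 0x20 (lo = src1.lo, hi = src2.lo) is not
    ; foldable; swapping the sources and toggling bits 1 and 5 gives imm 0x02
    ; (lo = src2.lo, hi = src1.lo), the same data with the load in the
    ; foldable src2 position, e.g. vperm2f128 $0x02, (%rdi), %ymm0, %ymm0
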
Craig Topper 0d1b519f78 [X86] Remove unused check lines that got left behind when I moved tests to the intrinsic upgrade file and regenerated.
llvm-svn: 313454
2017-09-16 09:16:46 +00:00
Craig Topper 8374ffde08 [X86] Remove the vperm2f128 test file I just added in r313450.
I missed that we already had a pretty thorough test file for these instructions.

llvm-svn: 313451
2017-09-16 07:51:01 +00:00
Craig Topper f264fcc704 [X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles.
I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates.

llvm-svn: 313450
2017-09-16 07:36:14 +00:00
Craig Topper aa499c1cb2 [X86] Fix some FileCheck lines that use the wrong prefix.
Assume they were moved during autoupgrading and not changed.

llvm-svn: 313448
2017-09-16 07:13:39 +00:00
Craig Topper 950b19515a [X86] Don't set reserved bits in the immediate in the test cases for vperm2f128.
I'm going to autoupgrade these intrinsics in a future commit. This bit will never be set in the resulting output, so I'm pre-removing the bit.

llvm-svn: 313434
2017-09-16 02:11:21 +00:00
Craig Topper 9313df747d [X86] Remove slash in front of a CHECK line in a test.
llvm-svn: 313433
2017-09-16 01:43:21 +00:00
Craig Topper d02179cd9c [X86] Remove usages of vperm2f intrinsics from fast isel tests to match what clang generates after r313418.
llvm-svn: 313424
2017-09-15 23:53:43 +00:00
Hans Wennborg 534bfbd3ba Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs."
This caused PR34629: asserts firing when building Chromium. It also broke some
buildbots building test-suite as reported on the commit thread.

> Summary:
>    1/  Operand folding during complex pattern matching for LEAs has been
>        extended, such that it promotes Scale to accommodate similar operand
>        appearing in the DAG.
>        e.g.
>           T1 = A + B
>           T2 = T1 + 10
>           T3 = T2 + A
>        For above DAG rooted at T3, X86AddressMode will now look like
>           Base = B , Index = A , Scale = 2 , Disp = 10
>
>    2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
>        so that if there is an opportunity then complex LEAs (having 3 operands)
>        could be factored out.
>        e.g.
>           leal 1(%rax,%rcx,1), %rdx
>           leal 1(%rax,%rcx,2), %rcx
>        will be factored as following
>           leal 1(%rax,%rcx,1), %rdx
>           leal (%rdx,%rcx)   , %edx
>
>    3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
>       thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet
>
> Reviewed By: lsaba
>
> Subscribers: spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313376
2017-09-15 18:40:26 +00:00
Craig Topper 7a183e2760 [X86] Prefer VPERMQ over VPERM2F128 for any unary shuffle, not just the ones that can be done with a insertf128
The early out for AVX2 in lowerV2X128VectorShuffle is positioned in a weird spot below some shuffle mask equivalency checks.

But I think we want to allow VPERMQ for any unary shuffle.

Differential Revision: https://reviews.llvm.org/D37893

llvm-svn: 313373
2017-09-15 18:11:13 +00:00
Craig Topper e0d724cf51 [X86] Don't create i64 constants on 32-bit targets when lowering v64i1 constant build vectors
When handling a v64i1 build vector of constants on 32-bit targets we were creating an illegal i64 constant that we then bitcasted back to v64i1. We need to instead create two 32-bit constants, bitcast them to v32i1 and concat the result. We should also take care to handle the halves being all zeros/ones after the split.

This patch splits the build vector and then recursively lowers the two pieces. This allows us to handle the all ones and all zeros cases with minimal effort. Ideally we'd just do the split and concat, and let lowering get called again on the new nodes, but getNode has special handling for CONCAT_VECTORS that reassembles the pieces back into a single BUILD_VECTOR. Hopefully the two temporary BUILD_VECTORs we create along the way, which don't get returned, don't cause any issues.

Fixes PR34605.

Differential Revision: https://reviews.llvm.org/D37858

llvm-svn: 313366
2017-09-15 17:09:03 +00:00
Craig Topper 143797eb89 [X86] Add isel pattern infrastructure to begin recognizing when we're inserting 0s into the upper portions of a vector register and the producing instruction has already produced the zeros.
Currently if we're inserting 0s into the upper elements of a vector register we insert an explicit move of the smaller register to implicitly zero the upper bits. But if we can prove that they are already zero we can skip that. This is based on a similar idea of what we do to avoid emitting explicit zero extends for GR32->GR64.

Unfortunately, this is harder for vector registers because there are several opcodes that don't have VEX equivalent instructions, but can write to XMM registers. Among these are SHA instructions and a MMX->XMM move. Bitcasts can also get in the way.

So for now I'm starting with explicitly allowing only VPMADDWD, because we emit zeros in combineLoopMAddPattern and that was placing an extra instruction into the reduction loop.

I'd like to allow PSADBW as well after D37453, but that's currently blocked by a bitcast. We either need to peek through bitcasts or canonicalize insert_subvectors with zeros to remove bitcasts on the value being inserted.

Longer term we should probably have a cleanup pass that removes superfluous zeroing moves even when the producer is in another basic block which is something these isel tricks can't do. See PR32544.

Differential Revision: https://reviews.llvm.org/D37653

llvm-svn: 313365
2017-09-15 17:09:00 +00:00
Simon Pilgrim 905e79c4dc [X86][SSE] Add vector test cases for integer multiplies
Mainly inspired by PR34474 / D37896

llvm-svn: 313353
2017-09-15 11:17:42 +00:00
Jatin Bhateja 908c8b37c2 [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.
Summary:
   1/  Operand folding during complex pattern matching for LEAs has been
       extended, such that it promotes Scale to accommodate similar operand
       appearing in the DAG.
       e.g.
          T1 = A + B
          T2 = T1 + 10
          T3 = T2 + A
        For above DAG rooted at T3, X86AddressMode will now look like
          Base = B , Index = A , Scale = 2 , Disp = 10

   2/  During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
       so that if there is an opportunity then complex LEAs (having 3 operands)
       could be factored out.
       e.g.
          leal 1(%rax,%rcx,1), %rdx
          leal 1(%rax,%rcx,2), %rcx
       will be factored as following
          leal 1(%rax,%rcx,1), %rdx
          leal (%rdx,%rcx)   , %edx

   3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
      thus avoiding creation of any complex LEAs within a loop.

Reviewers: lsaba, RKSimon, craig.topper, qcolombet

Reviewed By: lsaba

Subscribers: spatel, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35014

llvm-svn: 313343
2017-09-15 05:29:51 +00:00
Simon Pilgrim 0b220c7524 [X86] Regenerate test. NFCI.
llvm-svn: 313259
2017-09-14 13:00:27 +00:00
Simon Pilgrim 47d8f62472 Regenerate test (broadcast comment). NFCI.
llvm-svn: 313258
2017-09-14 12:41:19 +00:00
Ayman Musa ab68449c53 [X86] When applying the shuffle-to-zero-extend transformation on floating point, bitcast to integer first.
Fix issue described in PR34577.

Differential Revision: https://reviews.llvm.org/D37803

llvm-svn: 313256
2017-09-14 12:06:38 +00:00
Simon Pilgrim 8bd2d8780a [DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or.

Original patch by @tstellar

Differential Revision: https://reviews.llvm.org/D19325

llvm-svn: 313251
2017-09-14 10:38:30 +00:00
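
The transform in reduced .ll form (hypothetical names); shifts distribute over 'or', so the constant can be pre-shifted:

    define i32 @shl_or(i32 %x) {
      %o = or i32 %x, 7
      %s = shl i32 %o, 4
      ret i32 %s
    }
    ; becomes:  %s2 = shl i32 %x, 4
    ;           %r  = or i32 %s2, 112    ; 7 << 4
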
Simon Pilgrim 11e2969a35 Fix line endings. NFCI.
llvm-svn: 313246
2017-09-14 10:30:22 +00:00
Dean Michael Berris 01fd7c8bd4 [XRay][CodeGen] Use the current function symbol as the associated symbol for the instrumentation map
Summary:
XRay had been assuming that the previous section is the "text" section
of the function when lowering the instrumentation map. Unfortunately
this is not a safe assumption, because we may be coming from lowering
debug type information for the function being lowered.

This fixes an issue with combining -gsplit-dwarf, -generate-type-units,
-debug-compile and -fxray-instrument for sole member functions. When the
split dwarf section is stripped, we're left with references from the
xray_instr_map to the debug section. The change now uses the function's
symbol instead of the previous section's start symbol.

We found the bug while attempting to strip the split debug sections off
an XRay-instrumented object file, which had a peculiar edge-case for
single-function classes where the single function is being lowered.
Because XRay had associated the instrumentation map for a function with
the debug types section instead of the function's section, the objcopy
call will fail due to the misplaced reference from the xray_instr_map
section.

Reviewers: pcc, dblaikie, echristo

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D37791

llvm-svn: 313233
2017-09-14 07:08:23 +00:00
NAKAMURA Takumi 38fac5905e Move llvm/test/CodeGen/X86/clear-liverange-spillreg.mir to SystemZ. It was in the wrong place.
llvm-svn: 313218
2017-09-14 00:03:23 +00:00
Hans Wennborg 06e2a384c2 Revert r312719 "[MachineCombiner] Update instruction depths incrementally for large BBs."
This caused PR34596.

> [MachineCombiner] Update instruction depths incrementally for large BBs.
>
> Summary:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and its relevant successors/predecessors.
>
> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 313213
2017-09-13 23:23:09 +00:00
Wei Mi a2a135a01c Add a comment for the test. NFC.
llvm-svn: 313199
2017-09-13 21:47:13 +00:00
Wei Mi c0d066468e [RegAlloc] Keep a copy of live interval for the spilled vregs in HoistSpillHelper.
This is to fix PR34502. After rL311401, the live range of the spilled vreg will be
cleared. HoistSpill needs to use the live range of the original vreg before splitting
to know the moving range of the spills. The patch saves a copy of the live interval for
the spilled vreg inside of HoistSpillHelper.

Differential Revision: https://reviews.llvm.org/D37578

llvm-svn: 313197
2017-09-13 21:41:30 +00:00
Gadi Haber 35f4d7ca46 [X86][Skylake] Replacing -mcpu=skx by -mattr in a codegen test. NFC.
NFC.
Replacing -mcpu=skx by -mattr in the run command of the codegen test: avx512-gather-scatter-intrin.ll.

Reviewers: delena
Revision: https://reviews.llvm.org/D37799
llvm-svn: 313144
2017-09-13 12:39:18 +00:00
Simon Pilgrim f613a45bf3 [X86][FMA4] Test FMA4 commutation with repeated ops as well as FMA3
llvm-svn: 313143
2017-09-13 11:21:38 +00:00
Simon Pilgrim 322fc53725 [X86][FMA] Added *213 fma instructions to scheduling tests
Annoyingly, the 132/231 variants are pretty tricky to create when you need them, due to weak FMA commutation patterns.

llvm-svn: 313142
2017-09-13 11:12:56 +00:00
Gadi Haber a753080d1e [X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC.
NFC.
Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options.
The fix is in preparation for a large patch of adding all SKL scheduling information.

Reviewers: delena, zvi, RKSimon
Revision: https://reviews.llvm.org/D37796
llvm-svn: 313138
2017-09-13 09:28:25 +00:00
Gadi Haber 04de4ce9e2 [X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC.
NFC.
Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options.
The fix is in preparation for a large patch of adding all SKL scheduling information.

Reviewers: delena, zvi
Revision: https://reviews.llvm.org/D37796
llvm-svn: 313137
2017-09-13 09:28:18 +00:00
Gadi Haber fb47ab7cdd NFC.
Updating codegen test bmi2-schedule.ll to use the SKYLAKE and KNL prefixes as preparation for an upcoming patch to add all SKL scheduling information.

llvm-svn: 313136
2017-09-13 09:27:39 +00:00
Igor Breger 5c721199dd [GlobalISel][X86] support G_FPEXT operation.
Summary: Support G_FPEXT operation. Selection done via TableGen'erated code.

Reviewers: zvi, guyblank, aymanmus, m_zuckerman

Reviewed By: zvi

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34816

llvm-svn: 313135
2017-09-13 09:05:23 +00:00
Uriel Korach 5d5da5f531 [X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm)
This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR.

differential revision: https://reviews.llvm.org/D37693.

llvm-svn: 313134
2017-09-13 09:02:36 +00:00
Uriel Korach 53872a2d89 [X86] Add explicit mc-encoding checks to X86/viabs.ll. NFC.
Add explicit mc-encoding checks showing that the AVX512VL ABS intrinsics are actually mapped to EVEX encoding.
This is a pre-commit for a soon to come patch which will lower x86 target specific ABS intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D37688

llvm-svn: 313131
2017-09-13 08:33:55 +00:00
Craig Topper 2b6bfda561 [X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a BEXTR64rr instruction from a shift/and pair.
Fixes PR34589.

llvm-svn: 313126
2017-09-13 07:53:21 +00:00
Elena Demikhovsky 6cab129464 [X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector
Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using the PMOVZXBD and PMOVSXBD instructions.

llvm-svn: 313121
2017-09-13 06:40:26 +00:00
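
A minimal sketch of the pattern (hypothetical name):

    define <2 x i32> @zextload_v2i8(<2 x i8>* %p) {
      %v = load <2 x i8>, <2 x i8>* %p, align 2
      %z = zext <2 x i8> %v to <2 x i32>
      ret <2 x i32> %z
    }
    ; SSE4.1: a single pmovzxbd from memory (pmovsxbd for the sext case)
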
Sanjay Patel 659279450e [x86] eliminate unnecessary vector compare for AVX masked store
The masked store instruction only cares about the sign-bit of each mask element,
so the compare s<0 isn't needed.

As noted in PR11210:
https://bugs.llvm.org/show_bug.cgi?id=11210
...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR.
(Although more testing will be needed to confirm that.)

I filed a bug to track improvements for AVX512:
https://bugs.llvm.org/show_bug.cgi?id=34584

Differential Revision: https://reviews.llvm.org/D37446

llvm-svn: 313089
2017-09-12 23:24:05 +00:00
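
A reduced sketch of the redundancy (hypothetical names): the sign bits of the sext'ed compare result are identical to the sign bits of %mask itself, and the masked store only reads sign bits, so %mask can be passed directly.

    declare void @llvm.x86.avx.maskstore.ps.256(i8*, <8 x i32>, <8 x float>)

    define void @mstore(i8* %p, <8 x i32> %mask, <8 x float> %v) {
      %c = icmp slt <8 x i32> %mask, zeroinitializer
      %s = sext <8 x i1> %c to <8 x i32>          ; redundant: same sign bits
      call void @llvm.x86.avx.maskstore.ps.256(i8* %p, <8 x i32> %s, <8 x float> %v)
      ret void
    }
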
Craig Topper 958106d0f1 [X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel
Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so it's as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp.

This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we don't recognize yet. We already had this problem for several other TBM patterns, so I think this is fine and we can address all of them together.

I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before we got to isel.

I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch.

Differential Revision: https://reviews.llvm.org/D37592

llvm-svn: 313054
2017-09-12 17:40:25 +00:00
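
The pattern in reduced .ll form (hypothetical name); BEXTR's control word is (length << 8) | start:

    define i32 @extract_bits(i32 %x) {
      %s = lshr i32 %x, 4
      %m = and i32 %s, 255      ; (1 << 8) - 1: extract bits 11:4
      ret i32 %m
    }
    ; BMI: movl $0x804, %eax ; bextrl %eax, %edi, %eax   (start 4, length 8)
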
Elena Demikhovsky 18ff5c1374 Added "zext" from v2i8 to v2i32. In the next patch I'll optimize the sequence.
llvm-svn: 313052
2017-09-12 17:27:53 +00:00
Simon Pilgrim 76418aae74 [X86][AVX2] Add gather/movntdqa/pmaskmov/pmovmskb/pslldq/psrldq instructions to scheduling tests
llvm-svn: 313039
2017-09-12 15:52:01 +00:00
Simon Pilgrim 0af5a772e0 [X86][AVX2] Add further instructions to scheduling tests
llvm-svn: 313032
2017-09-12 15:01:20 +00:00
Simon Pilgrim d2d2b37cc9 [X86][AVX2] Add integer broadcast scheduling tests
llvm-svn: 313026
2017-09-12 12:59:20 +00:00
Simon Pilgrim 5a931c641e [X86][AVX2] Add additional fp-broadcast/subvector/shuffle scheduling tests
llvm-svn: 313022
2017-09-12 11:17:01 +00:00
Simon Pilgrim ef9a9d709a [X86][AVX] Add vperm2f128 scheduling test
llvm-svn: 313021
2017-09-12 11:10:59 +00:00
Simon Pilgrim f336d9ce3c [X86][AVX2] Remove old (unused) intrinsic declarations
llvm-svn: 313020
2017-09-12 11:09:30 +00:00
Yael Tsafrir 47668b5e03 [X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR
Differential Revision: https://reviews.llvm.org/D37560

llvm-svn: 313013
2017-09-12 07:50:35 +00:00
Craig Topper afdc36ed74 [X86] Add an extra instruction to TruncAssertSext.ll to prevent the 'or' from being narrowed so that the movl is really required to avoid a miscompile.
If we allow the OR to be narrowed then the upper bits really are zero and we can't tell if the zeroing movl was removed on purpose.

While here regenerate the test with update_llc_test_checks.py

llvm-svn: 312995
2017-09-12 03:50:44 +00:00
Craig Topper 66e4ace1c8 [X86] Rename TruncAssertZext.ll test to TruncAssertSext.ll, since it's testing AssertSext.
llvm-svn: 312991
2017-09-12 01:30:10 +00:00
Adrian Prantl 16aa4cf7ef llvm-dwarfdump: Make -brief the default and add a -verbose option instead.
Differential Revision: https://reviews.llvm.org/D37717

llvm-svn: 312972
2017-09-11 23:05:20 +00:00
Adrian Prantl 7bc1b28291 llvm-dwarfdump: Replace -debug-dump=sect option with individual options.
As discussed on llvm-dev in
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117301.html
this changes the command line interface of llvm-dwarfdump to match the
one used by the dwarfdump utility shipping on macOS. In addition to
being shorter to type this format also has the advantage of allowing
more than one section to be specified at the same time.

In a nutshell, with this change

  $ llvm-dwarfdump --debug-dump=info
  $ llvm-dwarfdump --debug-dump=apple-objc

becomes

  $ dwarfdump --debug-info --apple-objc

Differential Revision: https://reviews.llvm.org/D37714

llvm-svn: 312970
2017-09-11 22:59:45 +00:00
Zvi Rackover 255488a1e0 X86 Tests: More AVX512 conversions tests. NFC
Adding more tests for AVX512 fp<->int conversions that were missing.

llvm-svn: 312921
2017-09-11 15:54:38 +00:00
Simon Pilgrim b092bd321a [X86][SSE] Add support for X86ISD::PACKSS to ComputeNumSignBitsForTargetNode
Helps improve combineLogicBlendIntoPBLENDV support by allowing us to peek through PACKSS truncations of vector comparison results.

Differential Revision: https://reviews.llvm.org/D37680

llvm-svn: 312916
2017-09-11 14:03:47 +00:00
Simon Pilgrim d0ff65b50e [X86][SSE] Add further test cases showing failure to compute sign bits through PACKSS
Suggested in D37680

Note: had to drop AVX512VL tests as there is an infinite loop in the new tests that needs further investigation (not relevant to D37680).
llvm-svn: 312910
2017-09-11 12:18:43 +00:00
Gadi Haber 3ddffced43 [X86][SKX][KNL] Updating several CodeGen tests to use the attr flag instead of mcpu flag
NFC.
 Updated 3 Codegen regression tests to use the -mattr flag instead of the -mcpu flags as follows:
 Instead of -mcpu=skx use -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq
 Instead of -mcpu=knl use -mattr=+avx512f

Reviewers: delena
Revision: https://reviews.llvm.org/D37674
llvm-svn: 312909
2017-09-11 11:26:20 +00:00
Michael Zuckerman 9707ba0957 [Interleaved][Stride 3] Adding test for the VF=64 case with AVX512.
llvm-svn: 312907
2017-09-11 10:57:15 +00:00
Simon Pilgrim f6fa1d0369 [X86][SSE] Add test showing failure to compute sign bits through PACKSS
Prevents combineLogicBlendIntoPBLENDV from merging to PBLENDV

llvm-svn: 312906
2017-09-11 10:50:03 +00:00
Igor Breger 1f14364d64 [GlobalISel][X86] G_ANYEXT support.
Summary: G_ANYEXT support

Reviewers: zvi, delena

Reviewed By: delena

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D37675

llvm-svn: 312903
2017-09-11 09:41:13 +00:00
Elena Demikhovsky cc477bbcea Fixed a bug in splitting Scatter operation in the Type Legalizer.
After the split of the Scatter operation, the order of the new instructions is well defined - Lo goes before Hi. Otherwise the semantics of Scatter (from LSB to MSB) are broken.
I'm chaining 2 nodes to prevent reordering.

Differential Revision https://reviews.llvm.org/D37670

llvm-svn: 312894
2017-09-11 06:18:15 +00:00
Elena Demikhovsky 9afc3d7b82 Added a test that demonstrates a bug in Scatter scheduling.
The bug is going to be fixed in an upcoming patch.

llvm-svn: 312883
2017-09-10 13:20:42 +00:00
Simon Pilgrim ed27bea373 [X86] Add v2i4 store test case (PR20012)
llvm-svn: 312874
2017-09-09 20:28:50 +00:00
Simon Pilgrim e932c7fafa [X86] Add v2i2 test case (PR20011)
llvm-svn: 312873
2017-09-09 20:22:35 +00:00
Simon Pilgrim da41ca5a25 [X86][FMA] Regenerate FMA tests
llvm-svn: 312871
2017-09-09 19:25:59 +00:00
Simon Pilgrim 97a56866a2 [X86][SSE] i32 vector multiplications test cases from PR6399
llvm-svn: 312868
2017-09-09 18:18:17 +00:00
Simon Pilgrim a866a190d6 [X86][MOVBE] Fix typo in MOVBE scheduling test names
Copy+paste is not your friend

llvm-svn: 312867
2017-09-09 17:52:44 +00:00
Craig Topper 3be1db82b6 [X86] Don't disable slow INC/DEC if optimizing for size
Summary:
Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size.

This appears to match gcc behavior.

Reviewers: chandlerc, zvi, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37177

llvm-svn: 312866
2017-09-09 17:11:59 +00:00
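
A sketch of the size difference being recovered (hypothetical name):

    define i32 @inc(i32 %x) optsize {
      %r = add i32 %x, 1
      ret i32 %r
    }
    ; incl %edi is 2 bytes in 64-bit mode; addl $1, %edi is 3 bytes,
    ; so under optsize the inc form is now kept even on slow-inc/dec CPUs
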
Craig Topper 56af2cad89 [X86] Simplify the slow-incdec test and add test cases with optsize.
I think we want to consider using inc/dec with optsize.

llvm-svn: 312804
2017-09-08 17:33:54 +00:00
Simon Pilgrim 2e4fb24173 [X86] Added PR31045 test case
Reduced version of 'addr-calc-crash.ll' that was included in D27044, which had already been fixed by D31286/rL298633

llvm-svn: 312786
2017-09-08 10:49:11 +00:00
Jatin Bhateja a251312719 [X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum'
Differential Revision: https://reviews.llvm.org/D37614

llvm-svn: 312778
2017-09-08 09:15:36 +00:00
Chandler Carruth acbcf06f03 [x86] Flesh out the custom ISel for RMW arithmetic ops with used flags to
cover the bitwise operators.

Nothing really exciting here, this just stamps out the rest of the core
operations that can RMW memory and set flags.

Still not implemented here: ADC, SBB. Those will require more
interesting logic to channel the flags *in*, and I'm not currently
planning to try to tackle that. It might be interesting for someone who
wants to improve our code generation for bignum implementations.

Differential Revision: https://reviews.llvm.org/D37141

llvm-svn: 312768
2017-09-08 00:17:12 +00:00
Chandler Carruth 52a31bf268 [x86] Extend the manual ISel of `add` and `sub` with both RMW memory
operands and used flags to support matching immediate operands.

This is a bit trickier than register operands, and we still want to fall
back on register operands even for things that appear to be
"immediates" when they won't actually select into the operation's
immediate operand. This also requires us to handle things like selecting
`sub` vs. `add` to minimize the number of bits needed to represent the
immediate, and picking the shortest immediate encoding. In order to do
that, we in turn need to scan to make sure that CF isn't used, as it will
get inverted.

The end result seems very nice though, and we're now generating
optimal instruction sequences for these patterns IMO.

A follow-up patch will further expand this to other operations with RMW
memory operands. But handling `add` and `sub` provides useful starting points
to flesh out the machinery and make sure interesting and complex cases
can be handled.

Thanks to Craig Topper who provided a few fixes and improvements to this
patch in addition to the review!

Differential Revision: https://reviews.llvm.org/D37139

llvm-svn: 312764
2017-09-07 23:54:24 +00:00
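
An illustrative case of the sub-vs-add immediate selection (hypothetical name; as noted above, only valid when CF is unused, since it gets inverted):

    define void @rmw_add128(i32* %p) {
      %v = load i32, i32* %p
      %a = add i32 %v, 128     ; 128 doesn't fit add's sign-extended imm8
      store i32 %a, i32* %p
      ret void
    }
    ; subl $-128, (%rdi) uses an imm8 where addl $128, (%rdi) needs an imm32
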
Paul Robinson bb92137080 [DWARF] Line 0 should not have a discriminator.
It's meaningless and takes up extra space in the line table.

Differential Revision: https://reviews.llvm.org/D37364

llvm-svn: 312751
2017-09-07 22:15:44 +00:00
Michael Zuckerman 5a385940d3 [X86][LLVM] Expanding support for lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3).
This patch expands the support of lowerInterleavedLoad to {8|16|32}xi8 stride 3.

LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3, VF={8|16|32}), and we plan to include the store (deinterleaved) side as well.

The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7

into

a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7

Reviewers: zvi, igor, guyblank, dorit, Ayal

llvm-svn: 312722
2017-09-07 14:02:13 +00:00
Florian Hahn d39b8a3533 [MachineCombiner] Update instruction depths incrementally for large BBs.
Summary:
For large basic blocks with lots of combinable instructions, the
MachineTraceMetrics computations in MachineCombiner can dominate the compile
time, as computing the trace information is quadratic in the number of
instructions in a BB and it's relevant successors/predecessors.

In most cases, knowing the instruction depth should be enough to make
combination decisions. As we already iterate over all instructions in a basic
block, the instruction depth can be computed incrementally. This reduces the
cost of machine-combine drastically in cases where lots of instructions
are combined. The major drawback is that AFAIK, computing the critical path
length cannot be done incrementally. Therefore we only compute
instruction depths incrementally, for basic blocks with more
instructions than inc_threshold. The -machine-combiner-inc-threshold
option can be used to set the threshold and allows for easier
experimenting and checking if using incremental updates for all basic
blocks has any impact on the performance.

Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn

Reviewed By: fhahn

Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D36619

llvm-svn: 312719
2017-09-07 12:49:39 +00:00
Alexander Ivchenko f3a3cd198e [x86] Update to cmov promotion tests for D36711; NFC
Adding i8 -> [i16, i32, i64] and i32 -> i64 cases.
This way we can see what the current codegen looks like.

llvm-svn: 312707
2017-09-07 08:59:05 +00:00
Zvi Rackover 25799d93f0 X86: Improve AVX512 fptoui lowering
Summary:
Add patterns for
  fptoui <16 x float> to <16 x i8>
  fptoui <16 x float> to <16 x i16>

Reviewers: igorb, delena, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37505

llvm-svn: 312704
2017-09-07 07:40:34 +00:00
Sanjay Patel e96f875deb [x86] fix triple and regenerate checks for psubus; NFC
Patch by Yulia Koval!

Differential Revision: https://reviews.llvm.org/D37523

llvm-svn: 312662
2017-09-06 19:05:20 +00:00
Wei Mi 818d50a93d [TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when the parent
function returns the intrinsic's first argument.

llvm.memcpy/memset/memmove return void, but the expanded libcalls return
their first argument. Currently, if the parent function has any return
value, llvm.memcpy cannot be turned into a tail call after expansion.

The patch handles that case in SelectionDAGBuilder: when the caller
function returns the same value as the first argument of llvm.memcpy,
the tail call is allowed.

Differential Revision: https://reviews.llvm.org/D37406

llvm-svn: 312641
2017-09-06 16:05:17 +00:00
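
A reduced sketch of the now-allowed case (hypothetical names): the function returns its first argument, which is exactly what the expanded memcpy libcall returns, so the call can become a jump.

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)

    define i8* @copy(i8* %d, i8* %s, i64 %n) {
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 %n, i32 1, i1 false)
      ret i8* %d      ; matches memcpy's return value -> emit jmp memcpy
    }
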
Chandler Carruth 585bfc8443 [x86] Fix PR34377 by disabling cmov conversion when we relied on it
performing a zext of a register.

On the PR there is discussion of how to more effectively handle this,
but this patch prevents us from miscompiling code.

Differential Revision: https://reviews.llvm.org/D37504

llvm-svn: 312620
2017-09-06 06:28:08 +00:00
Zvi Rackover 5ebe94a84d X86 Tests: Tidy up AVX512 conversion tests. NFC.
Rename functions to a consistent format to make it easier to track coverage.

llvm-svn: 312619
2017-09-06 05:33:04 +00:00
Jatin Bhateja 80b5e38c4e Updating a test reference for rL312608.
Differential Revision: https://reviews.llvm.org/D37501

llvm-svn: 312614
2017-09-06 03:58:14 +00:00
Jatin Bhateja 2c139f77c7 [X86] Allow cross-lane permutations for sub targets supporting AVX2.
Summary:
Most instructions in AVX work “in-lane”, that is, each source element is applied only to other
elements of the same lane; thus a cross-lane permutation is costly and needs more than one instruction.
AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register
and vectorized table lookup.

This should also Fix PR34369

Differential Revision: https://reviews.llvm.org/D37388

llvm-svn: 312608
2017-09-06 02:58:47 +00:00
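
A reduced sketch of a cross-lane shuffle that benefits (hypothetical name):

    define <8 x i32> @reverse(<8 x i32> %v) {
      %s = shufflevector <8 x i32> %v, <8 x i32> undef,
           <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
      ret <8 x i32> %s
    }
    ; AVX2: one vpermd with a constant index vector; AVX1 would need a
    ; vperm2f128 lane swap plus an in-lane shuffle
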
Reid Kleckner e33c94f1b0 Add llvm.codeview.annotation to implement MSVC __annotation
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.

Reviewers: majnemer

Subscribers: eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D36904

llvm-svn: 312569
2017-09-05 20:14:58 +00:00
Craig Topper 784fa8a4e3 [X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns
We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512.

With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroes. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128

The same thing can happen for AVX with vblendps and those separate patterns already exist.

For AVX512, (v4f32 (X86vzmovl VR128)) will select a VMOVSS instruction instead of VBLENDPS due to there not being an EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too.

For SSE1-SSSE3 we can rely on (v4f32 (X86vzmovl VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too.

So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register.

llvm-svn: 312564
2017-09-05 19:09:02 +00:00
Zvi Rackover 2096893f34 X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC.
Some of the cases show missing patterns I intend to fix shortly.

llvm-svn: 312560
2017-09-05 18:24:39 +00:00
Craig Topper 33caeadd90 [AVX512] Remove patterns for (v8f32 (X86vzmovl (insert_subvector undef, (v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64.
We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32.

llvm-svn: 312543
2017-09-05 17:33:58 +00:00
Simon Pilgrim 49f9ba37d8 [X86] Limit store merge size when implicitfloat is enabled (PR34421)
As suggested by @niravd : https://bugs.llvm.org/show_bug.cgi?id=34421#c2

Differential Revision: https://reviews.llvm.org/D37464

llvm-svn: 312534
2017-09-05 13:40:29 +00:00
Simon Pilgrim 8dbd745b09 [X86] Regenerate scalar rotation tests
llvm-svn: 312530
2017-09-05 12:28:30 +00:00
Simon Pilgrim 08246d185b [X86][AVX512] Use AVX512 attributes instead of -mcpu in vector shift tests
llvm-svn: 312529
2017-09-05 12:23:45 +00:00
Simon Pilgrim 3cbe005a69 [X86][AVX512] Use AVX512 attributes instead of -mcpu
llvm-svn: 312528
2017-09-05 12:05:14 +00:00
Sanjay Patel 8d7c8c7960 [x86] add tests for vector store merge opportunity; NFC
llvm-svn: 312504
2017-09-04 22:01:25 +00:00
Sanjay Patel 543f3fda83 [x86] auto-generate complete checks; NFC
llvm-svn: 312503
2017-09-04 21:46:05 +00:00
Sanjay Patel 4e10b61d8f [x86] add/regenerate complete checks; NFC
llvm-svn: 312502
2017-09-04 21:43:32 +00:00
Sanjay Patel d413303b83 [x86] add test for unnecessary cmp + masked store; NFC
As noted in PR11210:
https://bugs.llvm.org/show_bug.cgi?id=11210
...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR.
(Although more testing will be needed to confirm that.)

llvm-svn: 312496
2017-09-04 17:21:17 +00:00
Sam McCall f71bb198ed Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
This crashes on boringSSL on PPC (will send reduced testcase)

This reverts commit r312328.

llvm-svn: 312490
2017-09-04 15:47:00 +00:00
Simon Pilgrim 91751b42f6 [X86][AVX512] Add support for VPERMILPS v16f32 shuffle lowering (PR34382)
Avoid use of VPERMPS where we don't need it by instead using the variable mask version of VPERMILPS for unary shuffles.

llvm-svn: 312486
2017-09-04 13:51:57 +00:00
Simon Pilgrim adffa8b2e9 Added shuffle test case from PR34382
llvm-svn: 312485
2017-09-04 13:43:13 +00:00
Simon Pilgrim 62c78f27d4 Added shuffle test case from PR34369
llvm-svn: 312481
2017-09-04 11:08:47 +00:00
Ayman Musa 5defce3986 [X86] Replace -mcpu option with -mattr in LIT tests added in https://reviews.llvm.org/rL312442
llvm-svn: 312474
2017-09-04 09:31:32 +00:00
Igor Breger 2661ae48c7 [GlobalISel][X86] G_PHI support.
llvm-svn: 312473
2017-09-04 09:06:45 +00:00
Dean Michael Berris ebc1659016 [XRay][CodeGen] Use PIC-friendly code in XRay sleds and remove synthetic references in .text
Summary:
This is a re-roll of D36615 which uses PLT relocations in the back-end
to the call to __xray_CustomEvent() when building in -fPIC and
-fxray-instrument mode.

Reviewers: pcc, djasper, bkramer

Subscribers: sdardis, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D37373

llvm-svn: 312466
2017-09-04 05:34:58 +00:00
Craig Topper 76f44015e7 [X86] Add a combine to recognize when we have two insert subvectors that together write the whole vector, but the starting vector isn't undef.
In this case we should replace the starting vector with undef.

llvm-svn: 312462
2017-09-04 01:13:36 +00:00
Craig Topper bc13af84f2 [X86] Add a combine to turn (insert_subvector zero, (insert_subvector zero, X, Idx), Idx) into an insert of X into the larger zero vector.
llvm-svn: 312460
2017-09-03 22:25:52 +00:00
Craig Topper fcf6bc5503 [X86] Add more patterns to use moves to zero the upper portions of a vector register that I missed in r312450.
llvm-svn: 312459
2017-09-03 22:25:50 +00:00
Craig Topper 788fbe08db [X86] Combine inserting a vector of zeros into a vector of zeros to just the larger vector.
llvm-svn: 312458
2017-09-03 22:25:49 +00:00
Craig Topper 8ee36ffb54 [X86] Add patterns to turn an insert into lower subvector of a zero vector into a move instruction which will implicitly zero the upper elements.
Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed.

llvm-svn: 312450
2017-09-03 17:52:25 +00:00
Craig Topper fa82efb50a [X86] Add VBLENDPS/VPBLENDD to the execution domain fixing tables.
llvm-svn: 312449
2017-09-03 17:52:23 +00:00
Craig Topper bb6506d251 [X86] Canonicalize (concat_vectors X, zero) -> (insert_subvector zero, X, 0).
In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern.

llvm-svn: 312448
2017-09-03 17:52:19 +00:00
Ayman Musa 2927ea0b19 [X86] Add -mtriple option to LIT tests added in https://reviews.llvm.org/rL312442
llvm-svn: 312443
2017-09-03 15:06:26 +00:00
Ayman Musa ef8f61bce6 [X86][AVX512] Add simple tests for all AVX512 shuffle instructions.
As part of an effort to thoroughly check the behavior of CodeGen with the IR shufflevector instruction, we generated many tests while predicting the best X86 sequence that may be generated.

This is a subset of the generated tests that we think may add value to our X86 set of tests.

Some of the checks are not optimal and will be changed after fixing:
1. PR34394
2. PR34382
3. PR34380
4. PR34359

Differential Revision: https://reviews.llvm.org/D37329

llvm-svn: 312442
2017-09-03 13:53:44 +00:00
Ayman Musa ac12849d32 [X86] Add RUN line for LIT test committed in "rL312438: [X86] Fix crash on assert of non-simple type after type-legalization.".
llvm-svn: 312439
2017-09-03 10:44:18 +00:00
Ayman Musa 44cde94935 [X86] Fix crash on assert of non-simple type after type-legalization
The function combineShuffleToVectorExtend in DAGCombine might generate an illegally typed node after the "legalize types" phase, causing an assertion on non-simple type to fail afterwards.

Adding a type check in case the combine is running after the type legalize pass.

Differential Revision: https://reviews.llvm.org/D37330

llvm-svn: 312438
2017-09-03 09:09:16 +00:00
Craig Topper 619b759a57 [X86] Teach fastisel to handle zext/sext i8->i16 and sext i1->i8/i16/i32/i64
Summary:
ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32-bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means without handling in fast isel we end up triggering a fast isel abort.

We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code.

Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one.

The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed.

Reviewers: spatel, zvi, igorb, guyblank, RKSimon

Reviewed By: guyblank

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37320

llvm-svn: 312422
2017-09-02 18:53:46 +00:00
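
A sketch of the described i1 sign-extend sequence (hypothetical name):

    define i32 @sext_i1(i1 %c) {
      %s = sext i1 %c to i32
      ret i32 %s
    }
    ; fast-isel now emits roughly:
    ;   andb   $1, %dil     ; isolate the i1
    ;   negb   %dil         ; 0 -> 0x00, 1 -> 0xff
    ;   movsbl %dil, %eax   ; continue as a normal i8 -> i32 sext
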
Sanjay Patel f4425e9a66 [x86] eliminate redundant shuffle of horizontal math ops when both inputs are the same
This is limited to a set of patterns based on the example in PR34111:
https://bugs.llvm.org/show_bug.cgi?id=34111
...but as I was investigating this, I see that horizontal patterns can go wrong in many,
many other ways that would not be handled by this patch. Each data type may even go
differently in the DAG after starting with the same basic IR pattern, so even proper IR
canonicalization won't fix it all.

Differential Revision: https://reviews.llvm.org/D37357

llvm-svn: 312379
2017-09-01 21:09:04 +00:00
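
One instance of the PR34111-style pattern, as a reduced sketch (hypothetical name): hadd of a vector with itself duplicates its two sums across both halves, so a half-swapping shuffle of the result is a no-op.

    declare <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float>, <4 x float>)

    define <4 x float> @hadd_swap(<4 x float> %x) {
      ; h = [x0+x1, x2+x3, x0+x1, x2+x3]
      %h = call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %x, <4 x float> %x)
      ; swapping the halves reproduces h exactly, so the shuffle can go away
      %s = shufflevector <4 x float> %h, <4 x float> undef,
                         <4 x i32> <i32 2, i32 3, i32 0, i32 1>
      ret <4 x float> %s
    }
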
Craig Topper 2a75b6f26b [X86] Add test case I forgot to commit with r312285.
llvm-svn: 312335
2017-09-01 16:40:24 +00:00
Geoff Berry 65528f2991 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues addressed since original review:
- Moved removal of dead instructions found by
  LiveIntervals::shrinkToUses() outside of loop iterating over
  instructions to avoid instructions being deleted while pointed to by
  iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

llvm-svn: 312328
2017-09-01 14:27:20 +00:00
Craig Topper 70e581cdd6 [X86] Add isel patterns for memory forms of FMA3 intrinsic instructions
llvm-svn: 312309
2017-09-01 07:58:13 +00:00
Sanjay Patel 841acbbca0 [x86] add more tests for horizontal ops; NFC
llvm-svn: 312279
2017-08-31 20:59:25 +00:00
Daniel Jasper c0a976d417 Revert r311525: "[XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text"
Breaks builds internally. Will forward repro instructions to author.

llvm-svn: 312243
2017-08-31 15:17:17 +00:00
Yael Tsafrir 185c81725e [X86] Added run line to intrinsics upgrade test. NFC.
llvm-svn: 312241
2017-08-31 13:56:22 +00:00
Ashutosh Nema bfcac0b480 AMD family 17h (znver1) scheduler model update.
Summary:
This patch enables the following:
1) Regex-based instruction itineraries for integer instructions.
2) The instructions are grouped according to their nature
   (move, arithmetic, logic, misc, control transfer).
3) FP instructions and their itineraries are added, including values
   for SSE4A, BMI, BMI2 and SHA instructions.

Patch by Ganesh Gopalasubramanian

Reviewers: RKSimon, craig.topper

Subscribers: vprasad, shivaram, ddibyend, andreadb, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D36617

llvm-svn: 312237
2017-08-31 12:38:35 +00:00
Hans Wennborg 24775a0a6c Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!")

> Issues identified by buildbots addressed since original review:
> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
> - The pass no longer forwards COPYs to physical register uses, since
>   doing so can break code that implicitly relies on the physical
>   register number of the use.
> - The pass no longer forwards COPYs to undef uses, since doing so
>   can break the machine verifier by creating LiveRanges that don't
>   end on a use (since the undef operand is not considered a use).
>
>   [MachineCopyPropagation] Extend pass to do COPY source forwarding
>
>   This change extends MachineCopyPropagation to do COPY source forwarding.
>
>   This change also extends the MachineCopyPropagation pass to be able to
>   be run during register allocation, after physical registers have been
>   assigned, but before the virtual registers have been re-written, which
>   allows it to remove virtual register COPY LiveIntervals that become dead
>   through the forwarding of all of their uses.

llvm-svn: 312178
2017-08-30 22:11:37 +00:00
Geoff Berry feffb0c8af Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues identified by buildbots addressed since original review:
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

llvm-svn: 312154
2017-08-30 18:41:07 +00:00
Adrian Prantl 05782218ab Canonicalize the representation of an empty expression in DIGlobalVariableExpression
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().

If someone needs to upgrade out-of-tree testcases:
  perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.

llvm-svn: 312144
2017-08-30 18:06:51 +00:00
Craig Topper afce0baacd [AVX512] Don't use 32-bit elements version of AND/OR/XOR/ANDN during isel unless we're matching a masked op or broadcast
Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does.

Since there's no functional difference, just remove the extra patterns and save some isel table size.

Differential Revision: https://reviews.llvm.org/D36854

llvm-svn: 312138
2017-08-30 16:38:33 +00:00
Igor Breger 36d447d8a8 [GlobalISel][X86] Support variadic function calls.
Summary: Support variadic function calls. Port the implementation from X86FastISel.

Reviewers: zvi, guyblank, oren_ben_simhon

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D37261

llvm-svn: 312130
2017-08-30 15:10:15 +00:00
Balaram Makam 42adadfca0 Re-land MachineInstr: Reason locally about some memory objects before going to AA.
Summary:
Reverts r311008 to reinstate r310825 with a fix.

Refine alias checking for pseudo vs value to be conservative.
This fixes the original failure in buildbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs.

Reviewers: hfinkel, nemanjai, efriedma

Reviewed By: efriedma

Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D36900

llvm-svn: 312126
2017-08-30 14:57:12 +00:00
Gadi Haber 767d98bad8 [X86][Skylake] Fixing duplicated prefixes in the run command of Code Gen regression tests
NFC.
Replaced duplicated HASWELL prefixes in run commands in the X86 Code Gen regression tests with the SKYLAKE prefix when -mcpu is set to skylake.
The fix is needed in preparation for an upcoming patch containing the Skylake scheduling info.

Reviewers: zvi, RKSimon, aymanmus, igorb

Differential Revision: https://reviews.llvm.org/D37258

llvm-svn: 312103
2017-08-30 08:08:50 +00:00
Craig Topper 17854ecf24 [AVX512] Correct isel patterns to support selecting masked vbroadcastf32x2/vbroadcasti32x2
Summary:
This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern.

Fixes PR34357

We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch.

Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to?  There's a 128-bit v2i32 integer broadcast but not an fp one.

Reviewers: aymanmus, zvi, igorb

Reviewed By: aymanmus

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37286

llvm-svn: 312101
2017-08-30 07:48:39 +00:00
Craig Topper 48a7917079 [AVX512] Use 256-bit extract instructions for extracting bits [255:128] from a 512-bit register
This enables the use of a smaller encoding by using a VEX instruction when possible.

Differential Revision: https://reviews.llvm.org/D37092

llvm-svn: 312100
2017-08-30 07:26:12 +00:00
Craig Topper ef1f71669e [X86] Apply SlowIncDec feature to Sandybridge/Ivybridge CPUs as well
Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge.

Differential Revision: https://reviews.llvm.org/D37250

llvm-svn: 312099
2017-08-30 05:00:35 +00:00
Craig Topper 641e2af9e8 [X86] Provide a separate feature bit for macro fusion support instead of basing it on the AVX flag
Summary:
Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge.

This is really strange, as AMD now supports AVX. It also means that if the user explicitly disables AVX we disable macro fusion.

This patch adds an explicit macro fusion feature. I've also enabled it for the generic 64-bit CPU (which doesn't have AVX).

This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature.

Reviewers: spatel, chandlerc, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37280

llvm-svn: 312097
2017-08-30 04:34:48 +00:00
Reid Kleckner a058736c9c [dwarfdump] Pretty print location expressions and location lists
Summary:
Based on Fred's patch here: https://reviews.llvm.org/D6771

I can't seem to commandeer the old review, so I'm creating a new one.

With that change, location expressions are pretty printed inline in the
DIE tree. The output looks like this for debug_loc entries:

    DW_AT_location [DW_FORM_data4]        (0x00000000
       0x0000000000000001 - 0x000000000000000b: DW_OP_consts +3
       0x000000000000000b - 0x0000000000000012: DW_OP_consts +7
       0x0000000000000012 - 0x000000000000001b: DW_OP_reg0 RAX, DW_OP_piece 0x4
       0x000000000000001b - 0x0000000000000024: DW_OP_breg5 RDI+0)

And like this for debug_loc.dwo entries:
    DW_AT_location [DW_FORM_sec_offset]   (0x00000000
      Addr idx 2 (w/ length 190): DW_OP_consts +0, DW_OP_stack_value
      Addr idx 3 (w/ length 23): DW_OP_reg0 RAX, DW_OP_piece 0x4)

Simple locations without ranges are printed inline:

   DW_AT_location [DW_FORM_block1]       (DW_OP_reg4 RSI, DW_OP_piece 0x4, DW_OP_bit_piece 0x20 0x0)

The debug_loc(.dwo) dumping was changed accordingly to factor the code.

Reviewers: dblaikie, aprantl, friss

Subscribers: mgorny, javed.absar, hiraditya, llvm-commits, JDevlieghere

Differential Revision: https://reviews.llvm.org/D37123

llvm-svn: 312042
2017-08-29 21:41:21 +00:00
Guy Blank 9203afcf0d [X86] Add test cases to demonstrate selecting GPR instructions when
using mask based ones would be more appropriate.

llvm-svn: 311996
2017-08-29 11:58:03 +00:00
Jatin Bhateja 1f41a505d6 [X86] Adding a test to demonstrate aggressive folding for LEA factorization.
Differential Revision: https://reviews.llvm.org/D37257

llvm-svn: 311994
2017-08-29 10:49:33 +00:00
Craig Topper 62c47a2aa5 Mark Knights Landing as having slow two memory operand instructions
Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly.

Patch by David Zarzycki.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37224

llvm-svn: 311979
2017-08-29 05:14:27 +00:00
Craig Topper 029a21dfdc [DAGCombiner] Teach visitEXTRACT_SUBVECTOR to turn extracts of BUILD_VECTOR into smaller BUILD_VECTORs
Only do this before operations are legalized, or if BUILD_VECTOR is Legal for the target.

Differential Revision: https://reviews.llvm.org/D37186

llvm-svn: 311892
2017-08-28 15:28:33 +00:00
Gadi Haber d76f7b824e [X86][Haswell] Updating HSW instruction scheduling information
This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target.
We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling.
The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792.
Information includes the latency, number of micro-ops, and ports used by each HSW instruction.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud

Differential Revision: https://reviews.llvm.org/D36663

llvm-svn: 311879
2017-08-28 10:04:16 +00:00
Craig Topper 80075a5fb7 [AVX512] Add more patterns for using masked moves for subvector extracts of the lowest subvector. This time with bitcasts between the vselect and the extract.
llvm-svn: 311856
2017-08-27 19:03:36 +00:00
Sanjay Patel a7a61d9768 [DAGCombiner] allow undef shuffle operands when eliminating bitcasts (PR34111)
As noted in the FIXME, this could be improved more, but this is the smallest fix
that helps:
https://bugs.llvm.org/show_bug.cgi?id=34111

llvm-svn: 311853
2017-08-27 17:29:30 +00:00
Sanjay Patel 4e4ba615b2 [x86] add haddps test for PR34111; NFC
llvm-svn: 311852
2017-08-27 17:15:49 +00:00
Jatin Bhateja 23eaf52d7d [X86] Adding more tests for horizontal [F]HADD/[F]SUB for AVX512 vectors types
llvm-svn: 311847
2017-08-27 12:43:25 +00:00
Craig Topper 36bd247f64 [X86] Add a target-specific DAG combine to combine extract_subvector from all zero/one build_vectors.
llvm-svn: 311841
2017-08-27 05:39:57 +00:00
Craig Topper a088362e88 [AVX512] Add patterns to match masked extract_subvector with bitcasts between the vselect and the extract_subvector. Remove the late DAG combine.
We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we've formed two different extract_subvector from the same node where one had a bitcast and the other didn't.

Add some more test cases to ensure we've also got most of the zero masking covered too.

llvm-svn: 311837
2017-08-26 22:24:57 +00:00
Jatin Bhateja c2f41b9f0b [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32.
Differential Revision: https://reviews.llvm.org/D37183

llvm-svn: 311834
2017-08-26 19:02:49 +00:00
Jatin Bhateja e4ca95d6aa [DAGCombiner] Extending pattern detection for vector shuffle.
Summary:
If all the operands of a BUILD_VECTOR extract elements from the same vector, then split the
vector efficiently based on the maximum vector access index.

This will also fix PR33784

Reviewers: zvi, delena, RKSimon, thakis

Reviewed By: RKSimon

Subscribers: chandlerc, eladcohen, llvm-commits

Differential Revision: https://reviews.llvm.org/D35788

llvm-svn: 311833
2017-08-26 19:02:36 +00:00
Jatin Bhateja b60cfbefac Revert rL311247 to rectify the commit message.
Summary: This reverts commit rL311247.

Differential Revision: https://reviews.llvm.org/D36927

llvm-svn: 311832
2017-08-26 19:02:17 +00:00
Craig Topper d27386a9ed [AVX512] Add patterns to use masked moves to implement masked extract_subvector of the lowest subvector.
This only supports 32 and 64 bit element sizes for now. But we could probably do 16 and 8-bit elements with BWI.

llvm-svn: 311821
2017-08-25 23:34:59 +00:00
Craig Topper b89dbf0220 [AVX512] Add additional test cases for masked extract subvector.
This includes tests for extracting 128-bits from a 256-bit vector and zero masking.

llvm-svn: 311820
2017-08-25 23:34:57 +00:00
Craig Topper e81de105a5 [X86] Add patterns to show more failures to use TBM instructions when we're trying to check flags.
We can probably add patterns to fix some of them. But the ones that use 'and' as their root node emit an X86ISD::CMP node in front of the 'and' and then pattern match that to a 'test' instruction. We can't use a tablegen pattern to fix that because we can't remap the cmp result to the flag output of a TBM instruction.

llvm-svn: 311819
2017-08-25 23:34:55 +00:00
Chandler Carruth 4b611a896d [x86] Teach the backend to fold more read-modify-write memory operands
to instructions.

These can't be reasonably matched in tablegen due to the handling of
flags, so we have to do this in C++ code. We only did it for `inc` and
`dec` historically; this starts fleshing that out to more interesting
instructions. Notably, this handles transferring operands to `add` and
`sub`.

Currently this forces them into a register. The next patch will add
support for keeping immediate operands as immediates. Then I'll extend
this beyond just `add` and `sub`.
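
As a source-level illustration (a hypothetical example, not taken from the patch), this is the kind of read-modify-write pattern the fold targets:

```
/* Hypothetical example (not from the patch): a read-modify-write through
   a pointer. With the fold, the backend can emit a single memory-operand
   instruction such as `addl %esi, (%rdi)` instead of a separate load,
   add, and store. */
void rmw_add(int *p, int x) {
  *p += x;
}
```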

I'm not super thrilled by the repeated switches in the code but
everything else I tried was really ugly or problematic.

Many thanks to Craig Topper for the suggestions about where to even
begin here and how to make this stuff work.

Differential Revision: https://reviews.llvm.org/D37130

llvm-svn: 311806
2017-08-25 22:50:52 +00:00
Sanjay Patel 50a446ef10 [x86] regenerate checks; NFC
llvm-svn: 311793
2017-08-25 19:25:03 +00:00
Chandler Carruth 46259260c7 [x86] NFC - normalize test case formatting of IR and generate CHECK
lines with the script rather than using manually written checks.

llvm-svn: 311753
2017-08-25 02:32:51 +00:00
Craig Topper 355d8cff49 [X86] Add TBM instructions to X86InstrInfo::isDefConvertible.
This allows us to remove "test" instructions and use the flags from the TBM instructions directly.

llvm-svn: 311747
2017-08-25 01:59:06 +00:00
Chandler Carruth 5b491808f5 [x86] Back out one aspect of r311318: don't generically set
FeatureSlowUAMem32.

The idea was to mark things that are slow on widely available processors
as slow in the generic CPU so that the code generated for that CPU would
be fast across those processors. However, for this feature that doesn't
work out very well at all.

The problem here is that you can very easily enable AVX or AVX2 on top
of this generic CPU. For example, this can happen just by using AVX2
intrinsics from Clang within a region of code guarded by a dynamic CPU
feature test. When you do that, the generated code with SlowUAMem32 set
is ... amazingly slower. The problem is that there really aren't very
good alternatives to the unaligned loads, and so our vector codegen
regresses significantly.
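
A sketch of the pattern being described (hypothetical code, assuming the GCC/Clang `__builtin_cpu_supports` builtin and the `target("avx2")` function attribute; the function names are made up):

```
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical sketch (not from the patch): an AVX2 path selected by a
   dynamic CPU test on top of a generic-CPU build. With FeatureSlowUAMem32
   set on the generic CPU, the unaligned 256-bit loads below would be
   split into two 128-bit halves and regress badly. */
__attribute__((target("avx2")))
static int64_t sum64_avx2(const int64_t *p, int n) {
  __m256i acc = _mm256_setzero_si256();
  int i = 0;
  for (; i + 4 <= n; i += 4)  /* unaligned 256-bit loads */
    acc = _mm256_add_epi64(acc, _mm256_loadu_si256((const __m256i *)(p + i)));
  int64_t lanes[4];
  _mm256_storeu_si256((__m256i *)lanes, acc);
  int64_t s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  for (; i < n; ++i)          /* scalar tail */
    s += p[i];
  return s;
}

int64_t sum64(const int64_t *p, int n) {
  if (__builtin_cpu_supports("avx2"))  /* dynamic CPU feature test */
    return sum64_avx2(p, n);
  int64_t s = 0;
  for (int i = 0; i < n; ++i)
    s += p[i];
  return s;
}
```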

The other issue is that there are plenty of AMD CPUs with AVX1 that
don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2
instead of this special feature. =/

It would be nice to have the target attribute logic be able to
enable/disable more than just one feature at a time and control this in
a more fine grained and useful way, but that doesn't seem easy. Given
that it is only Sandybridge and Ivybridge that set this feature, for now
I'm just backing it out of the generic CPU. That has the additional
advantage of going back to the previous state that people seemed vaguely
happy with.

llvm-svn: 311740
2017-08-25 00:56:05 +00:00
Chandler Carruth 8ac488b161 [x86] Fix an amazing goof in the handling of sub, or, and xor lowering.
The comment for this code indicated that it should work similarly to our
handling of add lowering above: if we see uses of an instruction other
than flag usage and store usage, it tries to avoid the specialized
X86ISD::* nodes that are designed for flag+op modeling and emits an
explicit test.

Problem is, only the add case actually did this. In all the other cases,
the logic was incomplete and inverted. Any time the value was used by
a store, we bailed on the specialized X86ISD node. All of this appears
to have been historical where we had different logic here. =/

Turns out, we have quite a few patterns designed around these nodes. We
should actually form them. I fixed the code to match what we do for add,
and it has quite a positive effect just within some of our test cases.
The only thing close to a regression I see is using:

  notl %r
  testl %r, %r

instead of:

  xorl -1, %r

But we can add a pattern or something to fold that back out. The
improvements seem more than worth this.

I've also worked with Craig to update the comments to no longer be
actively contradicted by the code. =[ Some of this still remains
a mystery to both Craig and myself, but this seems like a large step in
the direction of consistency and slightly more accurate comments.

Many thanks to Craig for help figuring out this nasty stuff.

Differential Revision: https://reviews.llvm.org/D37096

llvm-svn: 311737
2017-08-25 00:34:07 +00:00
Sanjay Patel e404cbff66 [DAG] convert vector select-of-constants to logic/math
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming in
to the backend and transform it to better DAG ops.

Steps in this patch:

  1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
     enable this transform. Other targets will probably want that anyway to distinguish scalars
     from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.

  2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
     vector ext is often free.

Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.
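
A scalar model of the per-lane fold (a hypothetical example, not from the patch):

```
#include <assert.h>

/* Hypothetical scalar model (not from the patch) of the vselect fold:
   per lane, select(cond, C+1, C) == zext(cond) + C, and
   select(cond, C-1, C) == sext(cond) + C, where sext of an i1 is 0 or -1.
   This avoids materializing a second constant vector. */
int main(void) {
  const int C = 41;
  for (int cond = 0; cond <= 1; ++cond) {
    assert((cond ? C + 1 : C) == cond + C);   /* zext(cond) is 0 or 1  */
    assert((cond ? C - 1 : C) == -cond + C);  /* sext(cond) is 0 or -1 */
  }
  return 0;
}
```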

Differential Revision: https://reviews.llvm.org/D36840

llvm-svn: 311731
2017-08-24 23:24:43 +00:00
Michael Zuckerman 9ee61d9b00 Adding base lit test for x86interleaved
llvm-svn: 311658
2017-08-24 14:11:28 +00:00
Chandler Carruth dc2556934c [x86] NFC: Clean up two tests and generate precise checks for them.
Mostly this involved giving unnamed values names and running the IR
through `opt` to re-format it, merging in any important comments from
the original. I then deleted pointless comments and inlined the function
attributes for ease of reading and editing.

All of this is to make it much easier to see the instructions being
generated here and evaluate any updates to the tests.

llvm-svn: 311634
2017-08-24 07:38:36 +00:00
Igor Breger 47be5fbbe9 [GlobalISel][X86] Support G_IMPLICIT_DEF.
Summary: Support G_IMPLICIT_DEF.

Reviewers: zvi, guyblank, t.p.northover

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D36733

llvm-svn: 311633
2017-08-24 07:06:27 +00:00
Wei Ding a131d3fb29 Add 'llvm.experimental.constrained.fma' Intrinsic.
Differential Revision: http://reviews.llvm.org/D36335

llvm-svn: 311629
2017-08-24 04:18:24 +00:00
Hans Wennborg c39ec95d88 [DAG] Fix Node Replacement in PromoteIntBinOp
When one operand is a user of another in a promoted binary operation,
we may replace and delete the returned value before returning,
triggering an assertion. Reorder node replacements to prevent this.

Fixes PR34137.

Landing on behalf of Nirav.

Differential Revision: https://reviews.llvm.org/D36581

llvm-svn: 311623
2017-08-24 01:08:27 +00:00
Reid Kleckner 6d353348e5 Parse and print DIExpressions inline to ease IR and MIR testing
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.

This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.

See also PR22780, for making DIExpression not be an MDNode.

Reviewers: aprantl, dexonsmith, dblaikie

Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37075

llvm-svn: 311594
2017-08-23 20:31:27 +00:00
Craig Topper 853a8d9ffc [AVX512] Don't create SHRUNKBLEND SDNodes for 512-bit vectors
There are no 512-bit blend instructions so we shouldn't create SHRUNKBLEND for them.

On a side note, it looks like there may be a missed opportunity for constant folding TESTM when LHS and RHS are equal.

This fixes PR34139.

Differential Revision: https://reviews.llvm.org/D36992

llvm-svn: 311572
2017-08-23 16:41:02 +00:00
Dean Michael Berris 0884b73220 [XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text
Summary:
This change achieves two things:

  - Redefine the Custom Event handling instrumentation points emitted by
    the compiler to not require dynamic relocation of references to the
    __xray_CustomEvent trampoline.

  - Remove the synthetic reference we emit at the end of a function that
    we used to keep auxiliary sections alive in favour of SHF_LINK_ORDER
    associated with the section where the function is defined.

To achieve the custom event handling change, we've had to introduce the
concept of sled versioning -- this will need to be supported by the
runtime to allow us to understand how to turn on/off the new version of
the custom event handling sleds. That change has to land first before we
change the way we write the sleds.

To remove the synthetic reference, we rely on a relatively new linker
feature that preserves the sections that are associated with each other.
This allows us to limit the effects on the .text section of ELF
binaries.

Because we're still using absolute references that are resolved at
runtime for the instrumentation map (and function index) maps, we mark
these sections write-able. In the future we can re-define the entries in
the map to use relative relocations instead that can be statically
determined by the linker. That change will be a bit more invasive so we
defer this for later.

Depends on D36816.

Reviewers: dblaikie, echristo, pcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36615

llvm-svn: 311525
2017-08-23 04:49:41 +00:00
Matthias Braun d6c0868da5 Fix tail-merge-after-mbp test
The output of this test changed after the fix in r311520 to have
-run-pass=block-placement behave like it does in a normal pipeline.
Adjust the test.

llvm-svn: 311521
2017-08-23 03:49:53 +00:00
Sanjay Patel 0ab50f6d68 [x86] auto-generate full checks; NFC
I don't see anything Darwin-specific here, so I made the target generic x86-64.

llvm-svn: 311465
2017-08-22 16:27:00 +00:00
Sanjay Patel 40b8e3bfe5 [x86] simplify runs and auto-generate full checks
I've replaced the two OS-specific runs with a generic run because
there's no functional difference in the resulting output that
we're checking. Also, the script still doesn't work with a Win
target.

llvm-svn: 311463
2017-08-22 16:21:45 +00:00
Craig Topper b49f0893b2 [X86] Prevent several calls to ISD::isConstantSplatVector from returning a narrower APInt than the original scalar type
ISD::isConstantSplatVector can shrink to the smallest splat width. But we don't check the size of the resulting APInt at all. This can cause us to misinterpret the results.

This patch just adds a flag to prevent the APInt from changing width.

Fixes PR34271.

Differential Revision: https://reviews.llvm.org/D36996

llvm-svn: 311429
2017-08-22 05:40:17 +00:00
Craig Topper 8078dd2984 [X86] When selecting sse_load_f32/f64 pattern, make sure there's only one use of every node all the way back to the root of the match
Summary: With masked operations, it's possible for an operation node like fadd, fsub, etc. to be used by multiple different vselects. Since the pattern matching will start at the vselect, we need to make sure the operation node itself is only used once before we can fold a load. Otherwise we'll end up folding the same load into multiple instructions.

Reviewers: RKSimon, spatel, zvi, igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36938

llvm-svn: 311342
2017-08-21 16:04:04 +00:00
Igor Breger 685889cf9b [GlobalISel][X86] Support G_BRCOND operation.
Summary: Support G_BRCOND operation. For now don't try to fold cmp/trunc instructions.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D34754

llvm-svn: 311327
2017-08-21 10:51:54 +00:00
Igor Breger 1b5e3d3e28 [GlobalISel][X86] LowerCall: for now, don't handle ByVal function arguments.
llvm-svn: 311321
2017-08-21 08:59:59 +00:00
Michael Zuckerman bdb6673151 [InterLeaved] Adding lit test for future work: interleaved load stride 3
llvm-svn: 311320
2017-08-21 08:56:39 +00:00
Chandler Carruth 98c51cbee1 [x86] Teach the "generic" x86 CPU to avoid patterns that are slow on
widely used processors.

This occurred to me when I saw that we were generating 'inc' and 'dec'
when we shouldn't for Haswell and newer. However, there were a few "X is
slow" things that we should probably just set.

I've avoided any of the "X is fast" features because most of those would
be pretty serious regressions on processors where X isn't actually fast.
The slow things are likely to be negligible costs on processors where
these aren't slow and a significant win when they are slow.

In retrospect this seems somewhat obvious. Not sure why we didn't do
this a long time ago.

Differential Revision: https://reviews.llvm.org/D36947

llvm-svn: 311318
2017-08-21 08:45:22 +00:00
Chandler Carruth 63dd5e0ef6 [x86] Handle more cases where we can re-use an atomic operation's flags
rather than doing a separate comparison.

This both saves an explicit comparison and avoids the use of `xadd`
which introduces register constraints and other challenges to the
generated code.

The motivating case is from atomic reference counts where `1` is the
sentinel rather than `0` for whatever reason. This can and should be
lowered efficiently on x86 by just using a different flag, however the
x86 code only handled the `0` case.
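
A hypothetical illustration of the motivating pattern (not from the patch; the type and function names are made up), using C11 atomics:

```
#include <stdatomic.h>

/* Hypothetical illustration (not from the patch): a reference count where
   the old value `1` is the sentinel. The backend can branch on the flags
   produced by the locked RMW itself instead of emitting `xadd` followed
   by a separate compare against the returned value. */
typedef struct obj {
  atomic_long refs;
  void (*destroy)(struct obj *);
} obj_t;

void obj_release(obj_t *o) {
  if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel) == 1)
    o->destroy(o); /* the old count was 1: we were the last owner */
}
```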

There remain some further opportunities here that are currently hidden
due to canonicalization. I've included test cases that show these and
FIXMEs. However, I don't at the moment have any production use cases and
they seem substantially harder to address.

Differential Revision: https://reviews.llvm.org/D36945

llvm-svn: 311317
2017-08-21 08:45:19 +00:00
Craig Topper d6f4be97e6 [AVX-512] Don't change which instructions we use for unmasked subvector broadcasts when AVX512DQ is enabled.
There's no functional difference between the AVX512DQ instructions if we're not masking.

This change unifies test checks and removes extra isel entries. Similar was done for subvector insert and extracts recently.

llvm-svn: 311308
2017-08-21 05:29:02 +00:00
Craig Topper 485cca1ecb [AVX512] Add 128->256 vbroadcastf64x2/vbroadcasti64x2 instructions to the EVEX->VEX table.
llvm-svn: 311307
2017-08-21 05:03:28 +00:00
Craig Topper d63b33f9c4 [AVX512] Add a test to check what happens when a load is referenced by two different masked scalar intrinsics with the same op inputs, but different masking node.
We're missing some single use checks in the sse_load_f32/f64 handling that cause us to replicate the load.

llvm-svn: 311300
2017-08-20 19:47:00 +00:00
Igor Breger 88a3d5c855 [GlobalISel][X86] Support call ABI.
Summary: Support call ABI. For now, only the Linux C and X86_64_SysV calling conventions are supported. Variadic functions are not supported.

Reviewers: zvi, guyblank, oren_ben_simhon

Reviewed By: oren_ben_simhon

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34602

llvm-svn: 311279
2017-08-20 09:25:22 +00:00
Igor Breger b3a860a5e8 [GlobalISel][X86] Support asymmetric copy from/to GPR physical register.
Usually this case is generated by ABI lowering; it requires performing truncate/anyext.

llvm-svn: 311278
2017-08-20 07:14:40 +00:00
Chandler Carruth 9ef881efab [x86] Fix an even stranger corner case where we have multiple levels of
cmov self-referencing.

Pointed out by Amjad Aboud in code review; test case slightly simplified
from the one he posted.

llvm-svn: 311267
2017-08-19 23:35:50 +00:00
Craig Topper a0319bb434 [AVX512] Use alignedstore256 in a pattern that's emitting a 256-bit movaps from an extract subvector operation.
llvm-svn: 311263
2017-08-19 22:02:02 +00:00
Jatin Bhateja 6b4c205685 [DAGCombiner] Extending pattern detection for vector shuffle.
Summary:
    If all the operands of a BUILD_VECTOR extract elements from the same vector, then split the
    vector efficiently based on the maximum vector access index.

    Reviewers: zvi, delena, RKSimon, thakis

    Reviewed By: RKSimon

    Subscribers: chandlerc, eladcohen, llvm-commits

    Differential Revision: https://reviews.llvm.org/D35788

llvm-svn: 311255
2017-08-19 18:08:59 +00:00
Jatin Bhateja 66f7958e91 Revert rL311247 to rectify the commit message.
Summary: This reverts commit rL311247.

Differential Revision: https://reviews.llvm.org/D36927

llvm-svn: 311252
2017-08-19 17:59:58 +00:00
Jatin Bhateja 6f0d0d23b0 Merge branch 'arcpatch-D35788'
llvm-svn: 311247
2017-08-19 17:00:04 +00:00
Jatin Bhateja 1c56863739 Revert rL311242 "Extension of shuffle vector pattern detection, updating post rebase."
Summary:

This reverts commit rL311242.

Differential Revision: https://reviews.llvm.org/D36924

llvm-svn: 311246
2017-08-19 16:40:06 +00:00
Jatin Bhateja 313f97dd84 Extension of shuffle vector pattern detection, updating post rebase.
llvm-svn: 311242
2017-08-19 15:58:36 +00:00
Chandler Carruth 93a645525c [x86] Teach the cmov converter to aggressively convert cmovs with memory
operands into control flow.

We have seen periodically performance problems with cmov where one
operand comes from memory. On modern x86 processors with strong branch
predictors and speculative execution, this tends to be much better done
with a branch than cmov. We routinely see cmov stalling while the load
is completed rather than continuing, and if there are subsequent
branches, they cannot be speculated in turn.
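
The shape of code this targets, as a hypothetical C example (not from the patch; the function name is made up):

```
/* Hypothetical illustration (not from the patch): a select fed by a load.
   Lowered as cmov, the load must complete before the cmov can execute;
   lowered as a branch, a good predictor lets execution continue
   speculatively past the condition. */
long pick(long cond, const long *p, long fallback) {
  long loaded = *p;                 /* load feeding the select */
  return cond ? loaded : fallback;  /* candidate for cmov with a memory operand */
}
```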

Also, in many (even simple) cases, macro fusion causes the control flow
version to be fewer uops.

Consider the IACA output for the initial sequence of code in a very hot
function in one of our internal benchmarks that motivates this, and notice the
micro-op reduction provided.
Before, SNB:
```
Throughput Analysis Report
--------------------------
Block Throughput: 2.20 Cycles       Throughput Bottleneck: Port1

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1    |           | 1.0 |           |           |     |     | CP | mov rcx, rdi
|   0*   |           |     |           |           |     |     |    | xor edi, edi
|   2^   | 0.1       | 0.6 | 0.5   0.5 | 0.5   0.5 |     | 0.4 | CP | cmp byte ptr [rsi+0xf], 0xf
|   1    |           |     | 0.5   0.5 | 0.5   0.5 |     |     |    | mov rax, qword ptr [rsi]
|   3    | 1.8       | 0.6 |           |           |     | 0.6 | CP | cmovbe rax, rdi
|   2^   |           |     | 0.5   0.5 | 0.5   0.5 |     | 1.0 |    | cmp byte ptr [rcx+0xf], 0x10
|   0F   |           |     |           |           |     |     |    | jb 0xf
Total Num Of Uops: 9
```
After, SNB:
```
Throughput Analysis Report
--------------------------
Block Throughput: 2.00 Cycles       Throughput Bottleneck: Port5

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1    | 0.5       | 0.5 |           |           |     |     |    | mov rax, rdi
|   0*   |           |     |           |           |     |     |    | xor edi, edi
|   2^   | 0.5       | 0.5 | 1.0   1.0 |           |     |     |    | cmp byte ptr [rsi+0xf], 0xf
|   1    | 0.5       | 0.5 |           |           |     |     |    | mov ecx, 0x0
|   1    |           |     |           |           |     | 1.0 | CP | jnbe 0x39
|   2^   |           |     |           | 1.0   1.0 |     | 1.0 | CP | cmp byte ptr [rax+0xf], 0x10
|   0F   |           |     |           |           |     |     |    | jnb 0x3c
Total Num Of Uops: 7
```
The difference even manifests in a throughput cycle rate difference on Haswell.
Before, HSW:
```
Throughput Analysis Report
--------------------------
Block Throughput: 2.00 Cycles       Throughput Bottleneck: FrontEnd

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   0*   |           |     |           |           |     |     |     |     |    | mov rcx, rdi
|   0*   |           |     |           |           |     |     |     |     |    | xor edi, edi
|   2^   |           |     | 0.5   0.5 | 0.5   0.5 |     | 1.0 |     |     |    | cmp byte ptr [rsi+0xf], 0xf
|   1    |           |     | 0.5   0.5 | 0.5   0.5 |     |     |     |     |    | mov rax, qword ptr [rsi]
|   3    | 1.0       | 1.0 |           |           |     |     | 1.0 |     |    | cmovbe rax, rdi
|   2^   | 0.5       |     | 0.5   0.5 | 0.5   0.5 |     |     | 0.5 |     |    | cmp byte ptr [rcx+0xf], 0x10
|   0F   |           |     |           |           |     |     |     |     |    | jb 0xf
Total Num Of Uops: 8
```
After, HSW:
```
Throughput Analysis Report
--------------------------
Block Throughput: 1.50 Cycles       Throughput Bottleneck: FrontEnd

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   0*   |           |     |           |           |     |     |     |     |    | mov rax, rdi
|   0*   |           |     |           |           |     |     |     |     |    | xor edi, edi
|   2^   |           |     | 1.0   1.0 |           |     | 1.0 |     |     |    | cmp byte ptr [rsi+0xf], 0xf
|   1    |           | 1.0 |           |           |     |     |     |     |    | mov ecx, 0x0
|   1    |           |     |           |           |     |     | 1.0 |     |    | jnbe 0x39
|   2^   | 1.0       |     |           | 1.0   1.0 |     |     |     |     |    | cmp byte ptr [rax+0xf], 0x10
|   0F   |           |     |           |           |     |     |     |     |    | jnb 0x3c
Total Num Of Uops: 6
```

Note that this cannot be usefully restricted to inner loops. Much of the
hot code we see hitting this is not in an inner loop or not in a loop at
all. The optimization still remains effective and indeed critical for
some of our code.

I have run a suite of internal benchmarks with this change. I saw a few
very significant improvements and a very few minor regressions,
but overall this change rarely has a significant effect. However, the
improvements were very significant, and in quite important routines
responsible for a great deal of our C++ CPU cycles. The gains pretty
clearly outweigh the regressions for us.

I also ran the test-suite and SPEC2006. Only 11 binaries changed at all
and none of them showed any regressions.

Amjad Aboud at Intel also ran this over their benchmarks and saw no
regressions.

Differential Revision: https://reviews.llvm.org/D36858

llvm-svn: 311226
2017-08-19 05:01:19 +00:00
Simon Pilgrim f36cca88fb [X86][ADX] Regenerate ADX intrinsics tests
llvm-svn: 311198
2017-08-18 21:21:14 +00:00
Simon Pilgrim 879ce046ad [X86][BMI2] Added scheduling test for RORX/SARX/SHLX/SHRX instructions
llvm-svn: 311171
2017-08-18 16:26:39 +00:00
Simon Pilgrim 358aeae7b8 [X86][AES] Add scheduling latency/throughput tests for AES instructions
llvm-svn: 311167
2017-08-18 15:26:51 +00:00
Simon Pilgrim 9eb0869e91 [X86][PCLMUL] Add scheduling latency/throughput test for PCLMULQDQ instruction
Added it to the SSE42 tests as targets seem to always have both

llvm-svn: 311166
2017-08-18 15:08:30 +00:00
Simon Pilgrim ccaec26175 [X86][SHA] Add scheduling latency/throughput tests for SHA instructions
llvm-svn: 311164
2017-08-18 14:55:50 +00:00
Simon Pilgrim 7f506f7d72 [X86][MOVBE] Add scheduling latency/throughput tests for MOVBE instructions
llvm-svn: 311163
2017-08-18 14:44:31 +00:00
Simon Pilgrim 320f89782a [X86][BMI2] Added scheduling test for MULX instructions
llvm-svn: 311159
2017-08-18 13:22:18 +00:00
Geoff Berry bd47e8a4f7 Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" round 2
This reverts commit r311135.

sanitizer-x86_64-linux-android buildbot is timing out with just this
patch applied.

llvm-svn: 311142
2017-08-18 01:43:11 +00:00
Richard Smith c0541dfa3e Increase tail dup threshold for -O3 from 3 to 4.
We see a modest performance improvement from this slightly higher tail dup threshold.

Differential Revision: https://reviews.llvm.org/D36775

llvm-svn: 311139
2017-08-17 23:38:41 +00:00
Craig Topper 1fae3ae6f0 [X86] Remove SSE/AVX patterns for AND/XOR/OR/ANDN that checked for the inputs being bitcasted from floating point types.
There's really no reason to do this; we should just let isel pick the integer version and let the execution dependency fixing pass take care of moving to FP if necessary.

It's not very reliable to look for bitcasts at the edges of patterns. If for some reason one input was bitcasted and the other wasn't, or if one was a v4f32 bitcast and one was a v2f64 bitcast, we would have fallen back to the integer pattern anyway.

llvm-svn: 311138
2017-08-17 23:20:57 +00:00
Geoff Berry 51f52c4fca Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Two issues identified by buildbots were addressed:
    - The pass no longer forwards COPYs to physical register uses, since
      doing so can break code that implicitly relies on the physical
      register number of the use.
    - The pass no longer forwards COPYs to undef uses, since doing so
      can break the machine verifier by creating LiveRanges that don't
      end on a use (since the undef operand is not considered a use).

    [MachineCopyPropagation] Extend pass to do COPY source forwarding

    This change extends MachineCopyPropagation to do COPY source forwarding.

    This change also extends the MachineCopyPropagation pass to be able to
    be run during register allocation, after physical registers have been
    assigned, but before the virtual registers have been re-written, which
    allows it to remove virtual register COPY LiveIntervals that become dead
    through the forwarding of all of their uses.

    Reviewers: qcolombet, javed.absar, MatzeB, jonpa

    Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny

    Differential Revision: https://reviews.llvm.org/D30751

llvm-svn: 311135
2017-08-17 23:06:55 +00:00
Sanjay Patel f2d67f7ecc [x86] add tests for vector select-of-constants; NFC
We've discussed canonicalizing to this form in IR, so the backend
should be prepared to lower these in ways better than what we see
here in most cases.

llvm-svn: 311103
2017-08-17 17:07:37 +00:00
Adrian Prantl 6a57daad81 Improve line debug info when translating a CaseBlock to SDNodes.
The SelectionDAGBuilder translates various conditional branches into
CaseBlocks which are then translated into SDNodes. If a conditional
branch results in multiple CaseBlocks only the first CaseBlock is
translated into SDNodes immediately, the rest of the CaseBlocks are
put in a queue and processed when all LLVM IR instructions in the
basic block have been processed.

When a CaseBlock is transformed into SDNodes the SelectionDAGBuilder
is queried for the current LLVM IR instruction and the resulting
SDNodes are annotated with the debug info of the current
instruction (if it exists and has debug metadata).

When the deferred CaseBlocks are processed, the SelectionDAGBuilder
does not have a current LLVM IR instruction, and the resulting SDNodes
will not have any debuginfo. As DwarfDebug::beginInstruction() outputs
a .loc directive for the first instruction in a labeled
block (typically the case for something coming from a CaseBlock), this
tends to produce a line-0 directive.

This patch changes the handling of CaseBlocks to store the current
instruction's debug info into the CaseBlock when it is created (and the
SelectionDAGBuilder knows the current instruction) and to always use
the stored debug info when translating a CaseBlock to SDNodes.

Patch by Frej Drejhammar!

Differential Revision: https://reviews.llvm.org/D36671

llvm-svn: 311097
2017-08-17 16:57:13 +00:00
Craig Topper 3a622a14f9 [AVX512] Don't switch unmasked subvector insert/extract instructions when AVX512DQI is enabled.
There's no reason to switch instructions with and without DQI. It just creates extra isel patterns and test divergences.

There is however value in enabling the masked version of the instructions with DQI.

This required introducing some new multiclasses to enable this splitting.

Differential Revision: https://reviews.llvm.org/D36661

llvm-svn: 311091
2017-08-17 15:40:25 +00:00
Simon Pilgrim 8be9f4af4f [DAGCombiner] Add support for non-uniform constant vectors to (mul x, (1 << c)) -> x << c
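A scalar model of the combine (a hypothetical example, not from the patch); after vectorization the multiplier becomes a per-lane constant vector such as <2, 4, 8, 16>:

```
/* Hypothetical scalar model (not from the patch): multiplying by a power
   of two is the same as shifting, and the combine now applies this even
   when the constant differs per vector lane. Assumes 0 <= c[i] < 31. */
void mul_pow2(int *out, const int *x, const int *c, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = x[i] * (1 << c[i]);  /* foldable to: out[i] = x[i] << c[i] */
}
```
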
llvm-svn: 311083
2017-08-17 13:03:34 +00:00
Elad Cohen 124d32829c [SelectionDAG] Teach the vector-types operand scalarizer about SETCC
When v1i1 is legal (e.g. AVX512) the legalizer can reach
a case where a v1i1 SETCC with an illegal vector type operand
wasn't scalarized (since v1i1 is legal) but its operands do
have to be scalarized. This used to assert because SETCC was
missing from the vector operand scalarizer.

This patch attempts to teach the legalizer to handle these cases
by scalarizing the operands, converting the node into a scalar
SETCC node.

Differential revision: https://reviews.llvm.org/D36651

llvm-svn: 311071
2017-08-17 08:06:36 +00:00
Geoff Berry 4e38e02e6f Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
This reverts commit r311038.

Several buildbots are breaking, and at least one appears to be due to
the forwarding of physical regs enabled by this change.  Reverting while
I investigate further.

llvm-svn: 311062
2017-08-17 04:04:11 +00:00
Sanjay Patel 4abc3f6036 [x86] add cmov promotion tests for D36711; NFC
This way we can see what the current codegen looks like.
I've also explicitly added/removed the cmov attribute from the RUN lines,
so we know exactly what we're checking in the runs.

llvm-svn: 311052
2017-08-16 22:50:11 +00:00
Geoff Berry 87f8d25150 [MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.

This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.

Reviewers: qcolombet, javed.absar, MatzeB, jonpa

Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny

Differential Revision: https://reviews.llvm.org/D30751

llvm-svn: 311038
2017-08-16 20:50:01 +00:00
Simon Pilgrim 38e8a023fa [X86] Regenerate immediate store merging tests
llvm-svn: 311016
2017-08-16 16:22:19 +00:00
Balaram Makam c5698befb6 Revert "MachineInstr: Reason locally about some memory objects before going to AA."
r310825 caused the clang-ppc64le-linux-lnt bot to go red
(http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/5712)
because of a test-suite failure of
SingleSource/UnitTests/2003-07-09-SignedArgs

This reverts commit 0028f6a87224fb595a1c19c544cde9b003035996.

llvm-svn: 311008
2017-08-16 14:17:43 +00:00
Igor Breger ce5ea38135 [GlobalISel][X86] Fix mir tests, use correct physical register.NFC.
llvm-svn: 310996
2017-08-16 07:25:51 +00:00
Sanjay Patel 92653865e6 [x86] fold the mask op on 8- and 16-bit rotates
Ref the post-commit thread for r310770:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170807/478507.html

The motivating cases as 'C' source examples can look like this:

unsigned char rotate_right_8(unsigned char v, int shift) {
  // shift &= 7;
  v = ( v >> shift ) | ( v << ( 8 - shift ) );
  return v;
}

https://godbolt.org/g/K6rc1A

Notice that the source doesn't contain UB-safe masked shift amounts, but instcombine created those 
in order to produce narrow rotate patterns. This should be the last step needed to resolve PR34046:
https://bugs.llvm.org/show_bug.cgi?id=34046

Differential Revision: https://reviews.llvm.org/D36644

llvm-svn: 310849
2017-08-14 15:55:43 +00:00
Elad Cohen 6a9edda356 [SelectionDAG] combine vextract (v1iX extract_subvector(vNiX, Idx))
into vextract(vNiX,Idx) when creating vextract with getNode().
This case appeared in AVX512 after fixing pr33349 in r310552.

Differential revision: https://reviews.llvm.org/D36571

llvm-svn: 310828
2017-08-14 10:49:45 +00:00
Balaram Makam d9f53414de MachineInstr: Reason locally about some memory objects before going to AA.
This addresses a FIXME in MachineInstr::mayAlias.

llvm-svn: 310825
2017-08-14 09:41:40 +00:00
Elad Cohen 3a90a0c10d Revert "[DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)"
This reverts commit r310782.

llvm-svn: 310822
2017-08-14 09:06:00 +00:00
Simon Pilgrim 8d9e6e607a [X86][BMI] Add BEXTR demanded bits test cases (PR34042)
llvm-svn: 310802
2017-08-13 20:35:38 +00:00
Gadi Haber bed2c50607 [X86][SandyBridge] Additional updates to the SNB instructions scheduling information
This is a continuation patch for commit r307529 which completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target (see also https://reviews.llvm.org/D35019).

In this patch we added the scheduling information of additional SNB instructions that were missing from the patch commit r307529, fixed the scheduling of several resource groups that include only port0 instead of port05 (i.e., port0 OR port5) and fixed several incorrect instructions' scheduling in the r307529 commit.

The patch also includes the X87 instructions which were missing in previous patch commit r307529 as reported in bugzilla bug 34080.

Reviewers: zvi, RKSimon, chandlerc, igorb, m_zuckerman, craig.topper, aymanmus, dim

Differential Revision: https://reviews.llvm.org/D36388

llvm-svn: 310792
2017-08-13 13:59:24 +00:00
Simon Pilgrim 631991fdde [X86][AVX512] Added additional shuffle+trunc test case.
An existing test should have covered this but a typo caused it to fail. I've kept both as the codegen for the typo case needs addressing as well. 

llvm-svn: 310791
2017-08-13 12:30:36 +00:00
Simon Pilgrim ad1c5566a7 [X86][TBM] Add tests showing failure to fold RFLAGS result into TBM instructions.
We also currently fail to select TBM instructions at all.

llvm-svn: 310790
2017-08-13 12:16:00 +00:00
Simon Pilgrim 808ce12878 [X86][TBM] Regenerate bextri intrinsics tests. NFCI.
llvm-svn: 310788
2017-08-13 11:56:15 +00:00
Guy Blank de425ae753 [X86][AVX512] Add combine for TESTM
Add an X86 combine for TESTM when one of the operands is a BUILD_VECTOR(0,0,...).

TESTM op0, BUILD_VECTOR(0,0,...) -> BUILD_VECTOR(0,0,...)
TESTM BUILD_VECTOR(0,0,...), op1 -> BUILD_VECTOR(0,0,...)

Differential Revision:
https://reviews.llvm.org/D36536

llvm-svn: 310787
2017-08-13 08:03:37 +00:00
Craig Topper 44cb1ffb6a [X86] When handling addcarry intrinsic, create the flag result with the correct type so we don't crash if we use a memory instruction
Summary:
Previously we were creating the flag result with MVT::Other, which is interpreted as a Chain node. If we used a memory form of the instruction we would end up with a copyToReg that consumed the chain result of the adcx instruction instead of the flag result.

Pretty sure we should be using MVT::i32 here, that's what we do other places we create these node types.

We should probably consider this for 5.0 as well.
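
A hypothetical C reproduction of the shape involved (not from the patch; it assumes the x86 `_addcarry_u64` intrinsic from <immintrin.h>, and the function name is made up):

```
#include <immintrin.h>

/* Hypothetical illustration (not from the patch): a carry chain built with
   the addcarry intrinsic, storing each result back to memory. The memory
   form of the instruction is where the wrongly typed flag result used to
   cause a crash. */
unsigned char add256(unsigned long long a[4], const unsigned long long b[4]) {
  unsigned char c = 0;
  for (int i = 0; i < 4; ++i)
    c = _addcarry_u64(c, a[i], b[i], &a[i]);  /* result written to memory */
  return c; /* final carry out */
}
```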

Reviewers: RKSimon, zvi, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36645

llvm-svn: 310784
2017-08-12 20:19:44 +00:00
Simon Pilgrim 5a86f0e717 [DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)
If all the operands of a BUILD_VECTOR extract elements from the same vector, then split the vector efficiently based on the maximum vector access index.

Reapplied with fix to only work with simple value types.

Committed on behalf of @jbhateja (Jatin Bhateja)

Differential Revision: https://reviews.llvm.org/D35788

llvm-svn: 310782
2017-08-12 17:43:25 +00:00
Simon Pilgrim 32546d1434 [X86] Regenerate merge store tests. NFCI.
Gives us a much better idea of what is going on than just relying on a few checks.

llvm-svn: 310780
2017-08-12 17:27:35 +00:00
Richard Smith 3704eba1d1 D36604: PR34148: Do not assume we can use a copy relocation for an `external_weak` global
An `external_weak` global may be intended to resolve as a null pointer if it's
not defined, so it doesn't make sense to use a copy relocation for it.

Differential Revision: https://reviews.llvm.org/D36604

llvm-svn: 310773
2017-08-11 23:52:28 +00:00
Sanjay Patel 2b452c7192 [x86] add tests for rotate left/right with masked shifter; NFC
As noted in the test comment, instcombine now produces the masked
shift value even when it's not included in the source, so we should
handle this.
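
A hypothetical example of the masked form these tests cover (not from the patch; it is a variation on the rotate source shown in r310849):

```
/* Hypothetical illustration (not from the patch): the masked shift amount
   that instcombine now produces. The mask is wider than the 8-bit type,
   and over-rotating the narrow value gives the same result on x86. */
unsigned char rotate_right_8_masked(unsigned char v, int shift) {
  shift &= 31; /* mask wider than the type, inserted by instcombine */
  return (unsigned char)((v >> (shift & 7)) | (v << ((8 - shift) & 7)));
}
```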

Although the AMD/Intel docs don't say it explicitly, over-rotating
the narrow ops produces the same results. An existence proof that
this works as expected on all x86 comes from gcc 4.9 or later:
https://godbolt.org/g/K6rc1A

llvm-svn: 310770
2017-08-11 22:38:40 +00:00
Sanjay Patel 5d6df36fde [x86] regenerate test checks, add 64-bit run; NFC
llvm-svn: 310767
2017-08-11 22:05:33 +00:00
Craig Topper ac217b7aa3 [X86] Don't use fsin/fcos/fsincos instructions ever
Summary:
Previously we would use these instructions if sse was disabled and fastmath was enabled.

As mentioned in D28335, this is a bad idea.

Reviewers: efriedma, scanon, DavidKreitzer

Reviewed By: DavidKreitzer

Subscribers: zvi, llvm-commits

Differential Revision: https://reviews.llvm.org/D36344

llvm-svn: 310762
2017-08-11 20:55:29 +00:00
Rafael Espindola b8956a70d3 Fix access to undefined weak symbols in pic code
When the access to a weak symbol is not a call, the access has to be
able to produce the value 0 at runtime.

We were sometimes producing code sequences where that was not possible
if the code was loaded more than 4GB away from 0.
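
A hypothetical example of such an access (not from the patch; the symbol name is made up):

```
#include <stdio.h>

/* Hypothetical illustration (not from the patch): a non-call access to an
   undefined weak symbol. The address computation must be able to produce a
   literal 0 at runtime, which some PC-relative sequences cannot if the
   code is loaded more than 4GB away from address 0. */
extern int maybe_present __attribute__((weak));

int main(void) {
  if (&maybe_present)  /* non-call access: take the address */
    printf("defined: %d\n", maybe_present);
  else
    printf("undefined, address resolved to null\n");
  return 0;
}
```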

llvm-svn: 310756
2017-08-11 20:49:27 +00:00
Craig Topper 561092f233 [AVX512] Remove and autoupgrade many of the broadcast intrinsics
Summary:
This autoupgrades most of the broadcast intrinsics. They've been unused in clang for some time.

This leaves the 32x2 intrinsics because they are still used in clang.

Reviewers: RKSimon, zvi, igorb

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36606

llvm-svn: 310725
2017-08-11 16:22:45 +00:00
Craig Topper 0f30fe9634 [x86] Enable some support for lowerVectorShuffleWithUndefHalf with AVX-512
Summary:
This teaches 512-bit shuffles to detect unused halves in order to reduce shuffle size.

We may need to refine the 512-bit exit point. I couldn't remember if we had good cross lane shuffles for 8/16 bit with AVX-512 or not.

I believe this is a step towards being able to handle D36454 without a special case.

From here we need to improve our ability to combine extract_subvector with insert_subvector and other extract_subvectors. And we need to support narrowing binary operations where we don't demand all elements. This may be improvements to DAGCombiner::narrowExtractedVectorBinOp (by recognizing an insert_subvector in addition to concat), or we may need a target specific combiner.

Reviewers: RKSimon, zvi, delena, jbhateja

Reviewed By: RKSimon, jbhateja

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36601

llvm-svn: 310724
2017-08-11 16:20:05 +00:00
Sanjay Patel 169dae70a6 [x86] use more shift or LEA for select-of-constants (2nd try)
The previous rev (r310208) failed to account for overflow when subtracting the
constants to see if they're suitable for shift/lea. This version adds a check
for that, and more tests were added in r310490.

We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d

For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.
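
A hypothetical illustration of the negative/positive case (not from the patch; the constants and function name are made up):

```
/* Hypothetical illustration (not from the patch): a select between a
   positive and a negative constant whose difference is a power of two.
   It can be lowered as setcc + shift + add instead of a cmov or branch. */
int sel(int c) {
  int t = (c != 0);     /* setcc: 0 or 1 */
  return (t << 3) - 3;  /* == (c ? 5 : -3); 5 - (-3) == 8 == 1 << 3 */
}
```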

The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.

Some notes about the test diffs:

1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choosing -1 or 1 is the case that got me looking at this again. We could avoid the 
   push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a 
   post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
   that's a regression, but those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs.
5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.

Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.

Differential Revision: https://reviews.llvm.org/D35340

llvm-svn: 310717
2017-08-11 15:44:14 +00:00