Commit Graph

9930 Commits

Author SHA1 Message Date
Guy Blank 509d1b2a5a [X86][AVX512] regenerate avx512-insert-extract.ll
llvm-svn: 307654
2017-07-11 11:51:49 +00:00
Serguei Katkov 0e831c996c Revert Revert [MBP] do not rotate loop if it creates extra branch
This is a second attempt to land this patch.

The first one resulted in a crash of clang sanitizer buildbot.
The fix is here and regression test is added.

This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.

We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.

Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one

So if C is not a predecessor of H then we introduce extra branch.

This change actually prohibits rotation of the loop if both true
  Best Exit has next element in chain as successor.
  Last element in chain is not a predecessor of first element of chain.

Reviewers: iteratee, xur, sammccall, chandlerc	
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34745

llvm-svn: 307631
2017-07-11 08:34:58 +00:00
Igor Breger 324d3791f8 [GlobalISel][X86] Use correct AND instructions.
AND8ri8 not supported in 64bit.

llvm-svn: 307630
2017-07-11 08:04:51 +00:00
Serguei Katkov 0b7b59ada3 [CGP] Relax a bit restriction for optimizeMemoryInst to extend scope
CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all instructions combining the address for memory instruction is in the same
block as memory instruction itself.

However if any of these instruction are placed after memory instruction then
address calculation will not be folded to memory instruction.

The added test case shows an example.

Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34862

llvm-svn: 307628
2017-07-11 06:24:44 +00:00
Matthias Braun b38736706e Revert "[DAG] Improve Aliasing of operations to static alloca"
Reverting as it breaks tramp3d-v4 in the llvm test-suite. I added some
comments to https://reviews.llvm.org/D33345 about it.

This reverts commit r307546.

llvm-svn: 307589
2017-07-10 20:51:30 +00:00
Andrew V. Tischenko ae9d6db769 [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 (PR28573).
The new version of the model is definitely faster.

Differential Revision:
https://reviews.llvm.org/D35198

llvm-svn: 307552
2017-07-10 16:36:03 +00:00
Nirav Dave 163e1ad9dc [DAG] Improve Aliasing of operations to static alloca
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 307546
2017-07-10 15:39:41 +00:00
Gadi Haber f4d154c089 This patch completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target.
The SandyBridge architects have provided us with a more accurate information about each instruction latency, number of uOPs and used ports and I used it to replace the existing estimated SNB instructions scheduling and to add missing scheduling information.

Please note that the patch extensively affects the X86 MC instr scheduling for SNB.

Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX.

The updated and extended information about each instruction includes the following details:
•static latency of the instruction
•number of uOps from which the instruction consists of
•all ports used by the instruction's' uOPs

For example, the following code dictates that instructions, ADC64mr, ADC8mr, SBB64mr, SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use ports 4, ports 2 or 3 and port 0 and ports 0 or 1 or 5:

def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> {
let Latency = 9;
let NumMicroOps = 6;
let ResourceCycles = [1,2,2,1];

}
def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>;

Note that apart for the header, most of the X86SchedSandyBridge.td file was generated by a script.

Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb

Differential Revision:  https://reviews.llvm.org/D35019#inline-304691

llvm-svn: 307529
2017-07-10 09:53:16 +00:00
Igor Breger d8b51e134e [GlobalISel][X86] Support G_LOAD/G_STORE i1.
Summary: Support G_LOAD/G_STORE i1.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35178

llvm-svn: 307527
2017-07-10 09:26:09 +00:00
Igor Breger d48c5e4855 [GlobalISel][X86] extend G_ZEXT support.
Summary:
Mark G_ZEXT/G_SEXT i1 to i8/i16,  i8 to i16 as legal.
Support G_ZEXT i1 to i8/i16 instruction selection ( C++ code).
This patch requred to support G_LOAD/G_STORE i1.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D35177

llvm-svn: 307526
2017-07-10 09:07:34 +00:00
Davide Italiano c4b0ccd049 [X86] Relax an assertion when legalizing vector types.
WidenVSELECTAndMask can fold (and it folds in this case) so we
get a BUILD_VECTOR of constants as mask. convertMask() seems to
work fine when the input is a vector of constants, and we still
need to call it to extend/add elements at the end. but the current
code just asserts on anything but a SETCC or AND/OR/XOR of 2xSETCC.
This change was discussed briefly with Simon Pilgrim, who also
suggests we might consider dropping this assertion in the future.

Fixes PR33715.

llvm-svn: 307508
2017-07-09 19:22:48 +00:00
Simon Pilgrim 8247687e0f [X86][AVX512] Regenerate AVX512VL comparison tests.
Show poor codegen on KNL targets as mentioned on D35179

llvm-svn: 307500
2017-07-09 15:47:43 +00:00
Igor Breger 769cd05232 [GlobalISel][X86] Add legalizer tests for G_LOAD/G_STORE operations. NFC.
llvm-svn: 307494
2017-07-09 07:25:57 +00:00
Igor Breger b80b44b7b9 [FastISel] fix a fallback diagnostic.
Summary: FastISel was marked as failed in case instruction selection succeeded.

Reviewers: qcolombet, zvi, rovka, ab

Reviewed By: zvi

Subscribers: javed.absar, ab, qcolombet, bogner, llvm-commits

Differential Revision: https://reviews.llvm.org/D34438

llvm-svn: 307489
2017-07-09 05:55:20 +00:00
Hiroshi Inoue 713b5ba2de fix trivial typos; NFC
sucessor -> successor 

llvm-svn: 307488
2017-07-09 05:54:44 +00:00
Sanjay Patel 18ee908ca2 [x86] add SBB optimization for SETBE (ule) condition code
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess 
with missing optimizations. We handle some patterns, but miss logical variants.

To clean that up, we should convert all select-of-constants to logic/math and 
enhance the combining for the expected patterns from that. Selecting 0 or -1 
needs extra attention to produce the optimal code as shown here.

Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs

Earlier steps in this series:
rL306040
rL306072
rL307404 (D34652)

As acknowledged in the earlier review, there's a possibility that some Intel
uarch would prefer to produce an xor to clear the fake register operand with
sbb %eax, %eax. This will likely need to be addressed in a separate pass.

llvm-svn: 307471
2017-07-08 14:04:48 +00:00
Sanjay Patel dd36f75733 [x86] add SBB optimization for SETAE (uge) condition code
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess 
with missing optimizations. We handle some patterns, but miss logical variants.

To clean that up, we should convert all select-of-constants to logic/math and 
enhance the combining for the expected patterns from that. DAGCombiner already 
has the foundation to allow the transforms, so we just need to fill in the holes 
for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the
optimal code as shown here.

Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs

Earlier steps in this series:
rL306040
rL306072

Differential Revision: https://reviews.llvm.org/D34652

llvm-svn: 307404
2017-07-07 14:56:20 +00:00
Wei Mi 20526b2725 [ConstHoisting] choose to hoist when frequency is the same.
The patch is to adjust the strategy of frequency based consthoisting:
Previously when the candidate block has the same frequency with the existing
blocks containing a const, it will not hoist the const to the candidate block.
For that case, now we change the strategy to hoist the const if only existing
blocks have more than one block member. This is helpful for reducing code size.

Differential Revision: https://reviews.llvm.org/D35084

llvm-svn: 307328
2017-07-06 22:32:27 +00:00
Simon Pilgrim a80cb1d7a7 [X86][SSE] Tests for bitcasting iX integers to vXi1 boolean vectors
Including sign/zero extension to legal types

llvm-svn: 307301
2017-07-06 19:33:10 +00:00
Simon Pilgrim 0fee3372c9 [X86][SSE] Dropped -mcpu from bitcast+setcc tests
Use triple and attribute only for consistency

Added SSE2/AVX tests on 256-bit vectors to test PACKSS behaviour

llvm-svn: 307289
2017-07-06 18:27:34 +00:00
Wei Mi 90707394e3 [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.
When the formulae search space is huge, LSR uses a series of heuristic to keep
pruning the search space until the number of possible solutions are within
certain limit.

The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs,
which picks the register which is used by the most LSRUses and deletes the other
formulae which don't use the register. This is a effective way to prune the search
space, but quite often not a good way to keep the best solution. We saw cases before
that the heuristic pruned the best formula candidate out of search space.

To relieve the problem, we introduce a new heuristic called
NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to
reduce the search space while keeping the best formula, we want to keep as many
formulae with different Scale and ScaledReg as possible. That is because the central
idea of LSR is to choose a group of loop induction variables and use those induction
variables to represent LSRUses. An induction variable candidate is often represented
by the Scale and ScaledReg in a formula. If we have more formulae with different
ScaledReg and Scale to choose, we have better opportunity to find the best solution.
That is why we believe pruning search space by only keeping the best formula with the
same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use
two criteria to choose the best formula with the same Scale and ScaledReg. The first
criteria is to select the formula using less non shared registers, and the second
criteria is to select the formula with less cost got from RateFormula. The patch
implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the
last resort.

Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly
testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help
on the two improved internal cases we saw.

Differential Revision: https://reviews.llvm.org/D34583

llvm-svn: 307269
2017-07-06 15:52:14 +00:00
Simon Pilgrim 713600747e [X86][SSE4A] Add support for shuffle combining to INSERTQI.
llvm-svn: 307268
2017-07-06 15:34:17 +00:00
Simon Pilgrim 03641df383 [X86][SSE4A] Add test showing missed opportunities to combine INSERTQI shuffle
llvm-svn: 307265
2017-07-06 14:52:24 +00:00
Sanjay Patel 2a341620e7 [x86] fix over-specified triple and auto-generate checks; NFC
llvm-svn: 307262
2017-07-06 14:15:15 +00:00
Simon Pilgrim cc0f785dca [X86][SSE4A] Add support for shuffle combining to EXTRQ.
llvm-svn: 307254
2017-07-06 12:22:58 +00:00
Simon Pilgrim 40c0ae200f [X86][SSE4A] Add scheduling tests for SSE4A instructions
llvm-svn: 307251
2017-07-06 11:26:43 +00:00
Simon Pilgrim ac78daf517 {DAGCombiner] Fold (rot x, 0) -> x
llvm-svn: 307184
2017-07-05 18:27:11 +00:00
Simon Pilgrim 49123d4bb0 [X86] Test bitfield loadstore tests on i686 as well
llvm-svn: 307182
2017-07-05 18:09:30 +00:00
Andrew Zhogin 45d192823e [DAGCombiner] visitRotate patch to optimize pair of ROTR/ROTL instructions into one with combined shift operand.
For two ROTR operations with shifts C1, C2; combined shift operand will be (C1 + C2) % bitsize.

Differential revision: https://reviews.llvm.org/D12833

llvm-svn: 307179
2017-07-05 17:55:42 +00:00
Simon Pilgrim 55006b407b [X86][SSE] Dropped -mcpu from bitcast+setcc mask tests
Use triple and attribute only for consistency

llvm-svn: 307176
2017-07-05 17:30:30 +00:00
Igor Breger 55e2f5963a [GlobalIsel] allow x86_fp80 values to be dumped.
Summary:
Otherwise the fallback path fails with an assertion on x86_64 targets,
when "x86_fp80" is encountered.

Reviewers: t.p.northover, zvi, guyblank

Reviewed By: zvi

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34975

llvm-svn: 307140
2017-07-05 11:11:10 +00:00
Nirav Dave b320ef9fab Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset
Relanding after rewriting undef.ll test to avoid host-dependant
endianness.

As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.

Tests of note:

  * test/CodeGen/X86/build-vector* - Improved.
  * test/CodeGen/BPF/undef.ll - Improved store alignment allows an
    additional store merge

  * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
    case we already do not handle well. Here, the DAG is improved, but
    scheduling causes a code size degradation.

Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D34472

llvm-svn: 307114
2017-07-05 01:21:23 +00:00
Gadi Haber 689426e3cb NFC.
Made some updates to the half.ll test under CodeGen to make it friendly to the update_llc_test_checks .py tool as follows:
1.Removing the llc flag -asm-verbose=false
2.Grouping the multiple check-prefix directives
3.Apply update_llc_test_checks.py tool on the test

This change is needed to easily update scheduling changes in an upcoming patch.

Reviewers: zvi, RKSimon, craig.topper 

Differential Revision: https://reviews.llvm.org/D34934

llvm-svn: 307108
2017-07-04 21:51:05 +00:00
Simon Pilgrim ac3e7f3f57 [X86][SSE4A] Add support for combining from non-v16i8 EXTRQI/INSERTQI shuffles
With the improved shuffle decoding we can now combine EXTRQI/INSERTQI shuffles from non-v16i8 vector types

llvm-svn: 307099
2017-07-04 18:11:02 +00:00
Anna Thomas 505941e7d6 [FastISel] Move gc intrinsic test to X86 directory
Move from generic to X86 directory since gc intrinsics only supposed in
X86 64 bit.
Add target triple as well.
Fixes build failure in i686-linux-RA  caused by rL307084.

llvm-svn: 307086
2017-07-04 15:24:08 +00:00
Simon Pilgrim d128222f0c [X86] Add combine tests for vector rotates
Reference tests for D12833

llvm-svn: 307073
2017-07-04 12:33:53 +00:00
Gadi Haber 4980790e81 NFC commit.
Converting the Codegen test "extractelement-legalization-store-ordering.ll" to be "update_llc_test_checks" friendly.

The changes to the test are needed for an upcoming scheduling patch.

Reviewers: zvi, RKSimon

Differential Revision: https://reviews.llvm.org/D34935

llvm-svn: 307066
2017-07-04 07:18:03 +00:00
Craig Topper ad140cfb68 [X86] Add comment string for broadcast loads from the constant pool.
Summary:
When broadcasting from the constant pool its useful to print out the final vector similar to what we do for normal moves from the constant pool.

I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead.

Reviewers: spatel, RKSimon, zvi

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34923

llvm-svn: 307062
2017-07-04 05:46:11 +00:00
Anton Yartsev 66d32c5e06 [legalize-types] Clean up softening machinery.
The patch makes SoftenFloatResult/Operand logic just the same as all other legalization routines have: SoftenFloatResult() now fills the SoftenFloats map and SoftenFloatOperand() perform all needed replacements. This prevents softening mashinery from leaving stale entries in SoftenFloats map (that resulted in errors during the legalize type checking) and clarifies softening. The patch replaces https://reviews.llvm.org/D29265.

Differential Revision: https://reviews.llvm.org/D31946

llvm-svn: 307053
2017-07-04 01:08:55 +00:00
Simon Pilgrim fa6e675267 [X86][SSE4A] Add support for combining from EXTRQI/INSERTQI shuffles
llvm-svn: 307048
2017-07-03 20:58:16 +00:00
Simon Pilgrim bdfb3b1d5f [X86][SSE4A] Add SSE4A shuffle tests on pre-SSSE3 hardware
llvm-svn: 307042
2017-07-03 16:53:11 +00:00
Simon Pilgrim b5c68a6717 [X86][SSE4A] Test SSE4A shuffle combining on SSE42 capable target as well
llvm-svn: 307038
2017-07-03 15:55:54 +00:00
Zvi Rackover d7a1c334ce DAGCombine: Combine BUILD_VECTOR to TRUNCATE
Summary:
Add a combine for creating a truncate to replace a build_vector composed of extracts with
indices that form a stride-2^N series.

Example:
v8i32 V = ...

v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6))
-->
v4i32 truncate (bitcast V to v4i64)

Related discussion in llvm-dev about canonicalizing shuffles to
truncates in LLVM IR:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html.

Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena

Reviewed By: delena

Subscribers: guyblank, delena, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D34077

llvm-svn: 307036
2017-07-03 15:47:40 +00:00
Sanjay Patel e9b1d16a8c [x86] auto-generate complete checks for tests; NFC
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.
There were also over-specifications in the RUN params such as CPU model.

llvm-svn: 307033
2017-07-03 15:27:19 +00:00
Sanjay Patel d3173740fd [x86] auto-generate complete checks for tests; NFC
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.
There were also several over-specifications in the RUN params such as CPU model or OS requirement

llvm-svn: 307028
2017-07-03 15:04:05 +00:00
Simon Pilgrim decfaca033 [X86][SSE4A] Add tests showing missed opportunities to combine EXTRQI/INSERTQI shuffles
llvm-svn: 307027
2017-07-03 15:01:07 +00:00
Sanjay Patel dab798a25f [x86] auto-generate complete checks for tests; NFC
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.

llvm-svn: 307024
2017-07-03 14:29:45 +00:00
Igor Breger 5c787ab346 [GlobalISel][X86] fix %ptr(p0) = G_CONSTANT selection.
llvm-svn: 307019
2017-07-03 11:06:54 +00:00
Simon Pilgrim f05c5ef441 [X86][AVX512] Test AVX512VPOPCNTDQ CTPOP with/without AVX512BW
llvm-svn: 306991
2017-07-02 19:52:20 +00:00
Simon Pilgrim a9655ffb42 [X86][AVX512VPOPCNTDQ] Improve support for v16i8/v8i16/v16i16/ CTPOP
Zero extend to v16i32/v8i64, use VPOPCNTDQ instructions and truncate back.

llvm-svn: 306990
2017-07-02 19:32:37 +00:00
Simon Pilgrim 3f5ed96f92 [X86][AVX512] Cleanup tzcnt tests triples and attributes
Avoid use of specific -mcpu

llvm-svn: 306989
2017-07-02 18:51:48 +00:00
Simon Pilgrim df55dd09d6 [X86][AVX512] Cleanup popcnt tests triples and attributes
Avoid use of specific -mcpu

llvm-svn: 306988
2017-07-02 18:35:22 +00:00
Sanjay Patel 7d263c1a27 [x86] auto-generate complete checks for tests; NFC
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.

llvm-svn: 306984
2017-07-02 15:24:08 +00:00
Sanjay Patel dd076f0178 [x86] remove unnecessary RUN for test after auto-generating checks; NFC
llvm-svn: 306983
2017-07-02 15:16:17 +00:00
Sanjay Patel c22223e6cd [x86] update test to use FileCheck and auto-generate checks; NFC
llvm-svn: 306982
2017-07-02 15:15:18 +00:00
Sanjay Patel 27cccc96c2 [x86] auto-generate complete checks for tests; NFC
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.

llvm-svn: 306981
2017-07-02 14:50:35 +00:00
Simon Pilgrim 8971b2904e [X86][SSE] Attempt to combine 64-bit and 32-bit shuffles to unary shuffles before bit shifts
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive

llvm-svn: 306978
2017-07-02 14:16:25 +00:00
Simon Pilgrim 4cb5613c38 [X86][SSE] Attempt to combine 64-bit and 16-bit shuffles to unary shuffles before bit shifts
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive

The 32-bit shuffles are a bit tricky and will be dealt with in a later patch

llvm-svn: 306977
2017-07-02 13:19:10 +00:00
Simon Pilgrim 638af5f1c4 [X86][SSE] Add test showing missed opportunity to combine to pshuflw
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive

llvm-svn: 306976
2017-07-02 12:56:10 +00:00
Gadi Haber dc25c2b08b [X86] Rerun "update_llc_test_checks" tool on CodeGen tests. NFC.
This is NFC after rerunning the "update_llc_test_checks.py" tool on the CodeGen X86 tests in order to submit a patch.
Minor differences due to added "End of Function" lines.

Reviewers: zvi
Differential Revision: https://reviews.llvm.org/D34933

llvm-svn: 306973
2017-07-02 12:01:33 +00:00
Igor Breger 717bd36c83 [GlobalISel][X86] Support G_GLOBAL_VALUE operation.
Summary: Support G_GLOBAL_VALUE operation. For now most of the PIC configurations not implemented yet.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34738

Conflicts:
	test/CodeGen/X86/GlobalISel/regbankselect-X86_64.mir

llvm-svn: 306972
2017-07-02 08:58:29 +00:00
Igor Breger b186a69aa5 [GlobalISel][X86] Support vector type G_UNMERGE_VALUES selection.
Summary:
Support vector type G_UNMERGE_VALUES selection.
For now G_UNMERGE_VALUES marked as legal for any type, so nothing to do in legalizer.

Reviewers: t.p.northover, qcolombet, zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D33665

llvm-svn: 306971
2017-07-02 08:15:49 +00:00
Hiroshi Inoue bb703e8960 fix trivial typos; NFC
suport -> support

llvm-svn: 306968
2017-07-02 03:24:54 +00:00
Simon Pilgrim 3bad6f3167 [X86][RDSEED] Split off i64 intrinsic tests and test i16/i32 on 32-bit target as well.
llvm-svn: 306961
2017-07-01 16:42:16 +00:00
Simon Pilgrim 2d320161e5 [X86][RDRAND] Split off i64 intrinsic tests and test i16/i32 on 32-bit target as well.
llvm-svn: 306960
2017-07-01 16:41:12 +00:00
Simon Pilgrim 2b679e1812 [X86] Removed reference to update_test_checks.py
llvm-svn: 306959
2017-07-01 16:34:29 +00:00
Simon Pilgrim ad7f0844ea [X86][AVX] Remove duplicate autogeneration note
llvm-svn: 306958
2017-07-01 16:32:02 +00:00
Nirav Dave a35938d827 Revert "[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset"
This reverts commit r306819 which appears be exposing underlying
issues in a stage1 ppc64be build

llvm-svn: 306820
2017-06-30 12:56:02 +00:00
Nirav Dave c5a48c1ee8 [DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.

Tests of note:

  * test/CodeGen/X86/build-vector* - Improved.
  * test/CodeGen/BPF/undef.ll - Improved store alignment allows an
    additional store merge

  * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
    case we already do not handle well. Here, the DAG is improved, but
    scheduling causes a code size degradation.

Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D34472

llvm-svn: 306819
2017-06-30 12:23:41 +00:00
Simon Pilgrim e5e9232260 [X86] Updated 32-bit memcmp tests to run with/without SSE2
llvm-svn: 306816
2017-06-30 11:23:59 +00:00
Taewook Oh 0e35ea3b7c Remove redundant copy in recurrences
Summary:
If there is a chain of instructions formulating a recurrence, commuting operands can help removing a redundant copy. In the following example code,

```
BB#1: ; Loop Header
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: ; Loop Latch
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0
  %vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
```

Existing two-address generation pass generates following code:

```
BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6:
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>
```

This is suboptimal because the assembly code generated has a redundant copy at the end of #BB6 to feed %vreg13 to BB#1:

```
.LBB0_6:
  addl  %esi, %edi
  addl  %ebx, %edi
  cmpl  $10, %edi
  movl  %edi, %esi
  jl  .LBB0_1
```

This redundant copy can be elimiated by making instructions in the recurrence chain to compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. With that change, code after two-address generation becomes

```
BB#1:
  %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13
  ...

BB#6: derived from LLVM BB %bb7
    Predecessors according to CFG: BB#5 BB#4
  %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15
  %vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0
  %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1
  %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2
  CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3
  %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3
  JL_1 <BB#1>, %EFLAGS<imp-use,kill>
  JMP_1 <BB#7>
```

and the final assembly does not have redundant copy:

```
.LBB0_6:
  addl  %edi, %eax
  addl  %ebx, %eax
  cmpl  $10, %eax
  jl  .LBB0_1
```

Reviewers: qcolombet, MatzeB, wmi

Reviewed By: wmi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31821

llvm-svn: 306758
2017-06-29 23:11:24 +00:00
Daniel Jasper 559aa75382 Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue"
I am 99% sure that this breaks the PPC ASAN build bot:
http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio

If it doesn't go back to green, we can recommit (and fix the original
commit message at the same time :) ).

llvm-svn: 306676
2017-06-29 13:58:24 +00:00
Igor Breger 0cddd34876 [GlobalISel][X86] Support vector type G_MERGE_VALUES selection.
Summary:
Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES marked as legal for any type, so nothing to do in legalizer.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D33958

llvm-svn: 306665
2017-06-29 12:08:28 +00:00
Simon Pilgrim 9a68e69c68 [X86][SSE] Dropped -mcpu from palignr tests
Use triple and attribute only for consistency 

Add AVX tests as well

llvm-svn: 306664
2017-06-29 11:13:39 +00:00
Simon Pilgrim e2eacbfc23 [X86][SSE] Regenerate shuffle test with update_llc_test_checks.py
llvm-svn: 306663
2017-06-29 11:11:37 +00:00
Simon Pilgrim 0afe97f480 [X86][SSE] Dropped -mcpu from vector shift tests
Use triple and attribute only for consistency 

llvm-svn: 306662
2017-06-29 11:09:53 +00:00
Simon Pilgrim 91539ce2d3 [X86][SSE] Dropped -mcpu from zero insertion tests
Use triple and attribute only for consistency 

llvm-svn: 306661
2017-06-29 11:08:11 +00:00
Michael Zuckerman 4bcb9c3349 [LLVM][X86][Goldmont] Adding new target-cpu: Goldmont
[LLVM SIDE]
Connecting the GoldMont processor to his feature.

Reviewers: 
1. igorb
2. zvi
3. delena
4. RKSimon
5. craig.topper        

Differential Revision: https://reviews.llvm.org/D34504

llvm-svn: 306658
2017-06-29 10:00:33 +00:00
Zvi Rackover da3943d600 [X86] Adding shuffle tests demonstrating missed vcompress opportunities. NFC
llvm-svn: 306646
2017-06-29 06:22:01 +00:00
Chih-Hung Hsieh 514dafdae3 Another test commit.
llvm-svn: 306567
2017-06-28 17:12:51 +00:00
Simon Pilgrim 48b30c3d55 [X86] Added BSWAP tests for illegal i64/i128/i256 'wide' scalar integers
llvm-svn: 306546
2017-06-28 14:07:50 +00:00
Simon Pilgrim 4f5fcb03ad [X86][SSE] Dropped -mcpu from vector bswap tests
Use triple and attribute only for consistency 

llvm-svn: 306545
2017-06-28 13:59:15 +00:00
Michael Zuckerman d0e663a697 [X86][LLVM][test]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test.
Exapnding the test to include AVX target. 
Adding base tast (to trunk) for Store strid=4 vf=32. 

llvm-svn: 306543
2017-06-28 13:42:45 +00:00
Igor Breger 86cf07a32e [GlobalISel][X86] Test G_CONSTANT i32 0 TableGen'erated selection.NFC.
llvm-svn: 306537
2017-06-28 12:43:21 +00:00
Igor Breger d5b59cf914 [GlobalISel][X86] Support bitwise operations : G_AND, G_OR, G_XOR
Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code.

Reviewers: zvi, guyblank, aymanmus, m_zuckerman

Reviewed By: aymanmus

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34605

llvm-svn: 306533
2017-06-28 11:39:04 +00:00
Michael Zuckerman f66840020c Reverting commit 306414 on behalf of @gadi.haber
llvm-svn: 306532
2017-06-28 11:23:31 +00:00
Simon Pilgrim b9fa16bc53 [X86][AVX2] Dropped -mcpu from avx2 arithmetic/intrinsics tests
Use triple and attribute only for consistency 

llvm-svn: 306531
2017-06-28 10:54:54 +00:00
Petar Jovanovic 7b3a38ec30 [X86] Correct dwarf unwind information in function epilogue
CFI instructions that set appropriate cfa offset and cfa register are now
inserted in emitEpilogue() in X86FrameLowering.

Majority of the changes in this patch:

1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register
in a function when basic blocks are reordered, merged, split, duplicated.

These changes are target independent and described below.

Changed CFI instructions so that they:

1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal

Add information to each MachineBasicBlock about cfa offset and cfa register
that are valid at its entry and exit (incoming and outgoing CFI info). Add
support for updating this information when basic blocks are merged, split,
duplicated, created. Add a verification pass (CFIInfoVerifier) that checks
that outgoing cfa offset and register of predecessor blocks match incoming
values of their successors.

Incoming and outgoing CFI information is used by a late pass
(CFIInstrInserter) that corrects CFA calculation rule for a basic block if
needed. That means that additional CFI instructions get inserted at basic
block beginning to correct the rule for calculating CFA. Having CFI
instructions in function epilogue can cause incorrect CFA calculation rule
for some basic blocks. This can happen if, due to basic block reordering,
or the existence of multiple epilogue blocks, some of the blocks have wrong
cfa offset and register values set by the epilogue block above them.

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D18046

llvm-svn: 306529
2017-06-28 10:21:17 +00:00
Sanjay Patel 4b23fa0abf [CGP] add specialization for memcmp expansion with only one basic block
llvm-svn: 306485
2017-06-27 23:15:01 +00:00
Sanjay Patel 70b36f193d [CGP] eliminate a sub instruction in memcmp expansion
As noted in D34071, there are some IR optimization opportunities that could be 
handled by normal IR passes if this expansion wasn't happening so late in CGP.

Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, 
so I'm proposing this change:
  %s = sub i32 %x, %y
  %r = icmp ne %s, 0
    =>
  %r = icmp ne %x, %y

Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.

The fact that the PowerPC backend doesn't eliminate the 'subf.' might be 
something for PPC folks to investigate separately.

Differential Revision: https://reviews.llvm.org/D34416

llvm-svn: 306471
2017-06-27 21:46:34 +00:00
Chih-Hung Hsieh ff680f0386 Another test commit
llvm-svn: 306420
2017-06-27 16:18:41 +00:00
Gadi Haber 13759a7ed6 Updated and extended the information about each instruction in HSW and SNB to include the following data:
•static latency
•number of uOps from which the instructions consists
•all ports used by the instruction

Reviewers: 
 RKSimon 
 zvi  
aymanmus  
m_zuckerman 

Differential Revision: https://reviews.llvm.org/D33897
 

llvm-svn: 306414
2017-06-27 15:05:13 +00:00
Ayman Musa 721d97f7b8 Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371
[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).

AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Differential Revision: https://reviews.llvm.org/D33188

llvm-svn: 306402
2017-06-27 12:08:37 +00:00
Simon Pilgrim 71d8b67bea [X86][AVX512] Regenerate avx512 arithmetic tests
llvm-svn: 306389
2017-06-27 10:13:56 +00:00
Igor Breger 925f088bae [GlobalISel][X86] Add fp32/62 legalizer, regbank-select, selection tests for G_FADD, G_FSUB, G_FMUL, G_FDIV. NFC.
llvm-svn: 306370
2017-06-27 07:01:54 +00:00
Wolfgang Pieb 9f65858235 DAGCombine: Make sure we only eliminate trunc/extend when the scales of truncation and extension match.
This fixes PR33368.

Reviewer: rksimon

Differential Revision:  https://reviews.llvm.org/D34069

llvm-svn: 306345
2017-06-26 23:05:51 +00:00
Sanjay Patel b859910eb2 [x86] add tests for missing sbb transforms; NFC
llvm-svn: 306337
2017-06-26 22:20:07 +00:00
Simon Pilgrim d58f051792 [X86][SSE] Check SSE2/SSE3 codegen tests on i686 and x86_64
llvm-svn: 306314
2017-06-26 18:20:46 +00:00
Simon Pilgrim f07663876a [X86][SSE] Add combine tests for PMULDQ/PMULUDQ
Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations

llvm-svn: 306302
2017-06-26 16:22:52 +00:00
Ahmed Bougacha 58a197414e [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc.
The non-AVX-512 behavior was changed in r248266 to match N1778
(C bindings for IEEE-754 (2008)), which defined the four functions
to not raise the inexact exception ("rint" is still defined as raising
it).

Update the AVX-512 lowering of these functions to match that: it should
not be different.

llvm-svn: 306299
2017-06-26 16:00:24 +00:00
Simon Pilgrim 0ad0e5802b [X86] Add test case for PR15981
llvm-svn: 306296
2017-06-26 15:53:11 +00:00
Sanjay Patel 15748d239e [x86] transform vector inc/dec to use -1 constant (PR33483)
Convert vector increment or decrement to sub/add with an all-ones constant:

add X, <1, 1...> --> sub X, <-1, -1...>
sub X, <1, 1...> --> add X, <-1, -1...>

The all-ones vector constant can be materialized using a pcmpeq instruction that is 
commonly recognized as an idiom (has no register dependency), so that's better than 
loading a splat 1 constant.

AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better
way to produce 512 one-bits.

The general advantages of this lowering are:
1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, 
   so in theory, this could be better for perf, but...

2. That seems unlikely to affect any OOO implementation, and I can't measure any real 
   perf difference from this transform on Haswell or Jaguar, but...

3. It doesn't look like it from the diffs, but this is an overall size win because we 
   eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting 
   a scalar load (which might itself be a bug), then we're replacing a scalar constant 
   load + broadcast with a single cheap op, so that should always be smaller/better too.

4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 
   and psub x, -1, so we should use that form for +1 too because we can. If there's some
   reason to favor a constant load on some CPU, let's make the reverse transform for all
   of these cases (either here in the DAG or in a later machine pass).

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=33483

Differential Revision: https://reviews.llvm.org/D34336

llvm-svn: 306289
2017-06-26 14:19:26 +00:00
Michael Zuckerman ce7e187f84 [X86][LLVM][test]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test.
Adding base tast (to trunk) for Store strid=4 vf=32. 

llvm-svn: 306286
2017-06-26 13:27:32 +00:00
Serguei Katkov 0e70206c8f This reverts commit r306272.
Revert "[MBP] do not rotate loop if it creates extra branch"

It breaks the sanitizer build bots. Need to fix this.

llvm-svn: 306276
2017-06-26 06:51:45 +00:00
Serguei Katkov b01fff06ed [MBP] do not rotate loop if it creates extra branch
This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.

We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.

Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one

So if C is not a predecessor of H then we introduce extra branch.

This change actually prohibits rotation of the loop if both true
1) Best Exit has next element in chain as successor.
2) Last element in chain is not a predecessor of first element of chain.

Reviewers: iteratee, xur
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34271

llvm-svn: 306272
2017-06-26 05:27:27 +00:00
Simon Pilgrim 9956364a1f [X86] Add test case for PR15705
llvm-svn: 306246
2017-06-25 16:12:45 +00:00
Elena Demikhovsky 72f991cded AVX-512: Fixed a crash during legalization of <3 x i8> type
The compiler fails with assertion during legalization of SETCC for <3 x i8> operands.
The result is extended to <4 x i8> and then truncated <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>.

Differential Revision: https://reviews.llvm.org/D34503

llvm-svn: 306242
2017-06-25 13:36:20 +00:00
Igor Breger f5035d6ee5 [GlobalISel][X86] Support vector type G_EXTRACT selection.
Summary:
Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33957

llvm-svn: 306240
2017-06-25 11:42:17 +00:00
Nirav Dave cedfeb364f Add bitcast store-merge test.
llvm-svn: 306158
2017-06-23 20:52:14 +00:00
Sanjay Patel 3de6bad65f [x86] fix value types for SBB transform (PR33560)
I'm not sure yet why this wouldn't fail in the simple case,
but clearly I used the wrong value type with:
https://reviews.llvm.org/rL306040

...and the bug manifests with:
https://bugs.llvm.org/show_bug.cgi?id=33560

llvm-svn: 306139
2017-06-23 18:42:15 +00:00
Simon Pilgrim 19cee0d56c [X86][AVX] Regenerate i256 bitcasted store test
Check on slow/fast unaligned memory targets

llvm-svn: 306138
2017-06-23 18:34:56 +00:00
Simon Pilgrim dfa436079f Regenerate extract-store.ll tests
llvm-svn: 306131
2017-06-23 17:19:44 +00:00
Sanjay Patel 021f32fd0f [x86] auto-generate complete checks; NFC
llvm-svn: 306114
2017-06-23 15:29:49 +00:00
Sanjay Patel 02469b63c2 [x86] auto-generate complete checks; NFC
llvm-svn: 306113
2017-06-23 15:22:27 +00:00
Sanjay Patel 563e5afa0e [x86] remove overridden target settings in test; NFC
r306109 was supposed to make this change, but I committed the wrong version.

llvm-svn: 306110
2017-06-23 15:06:30 +00:00
Sanjay Patel 8e06df4303 [x86] rename test file and auto-generate complete checks; NFC
The command-line params override the target setting in the file itself, so delete that.
Also, remove the cpu and arch because those don't matter and neither does the OS specification in the triple.

llvm-svn: 306109
2017-06-23 14:58:21 +00:00
Simon Pilgrim 859b48d2d3 [X86][AVX] Extended vector average tests
Added AVX1 tests and merged AVX1/AVX2/AVX512 checks where possible

llvm-svn: 306107
2017-06-23 14:38:00 +00:00
Simon Pilgrim dbd20ffee1 [X86][SSE] Dropped -mcpu from vector average tests
Use triple and attribute only for consistency 

llvm-svn: 306104
2017-06-23 14:16:50 +00:00
Simon Pilgrim dbf8f5ace7 [X86][SSE] Dropped -mcpu from scalar math tests
Use triple and attribute only for consistency 

llvm-svn: 306097
2017-06-23 13:07:20 +00:00
Simon Pilgrim 5d3d716815 [X86][SSE] Dropped -mcpu from insertps tests
Use triple and attribute only for consistency 

llvm-svn: 306092
2017-06-23 11:00:49 +00:00
Sanjay Patel 359ae44fb4 [x86] add/sub (X==0) --> sbb(cmp X, 1)
This is very similar to the transform in:
https://reviews.llvm.org/rL306040
...but in this case, we use cmp X, 1 to set the carry bit as needed.

Again, we can show that all of these are logically equivalent (although
InstCombine currently canonicalizes to a form not seen here), and if
we believe IACA, then this is the smallest/fastest code. Eg, with SNB:

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1    | 1.0       |     |           |           |     |     |    | cmp edi, 0x1
|   2    |           | 1.0 |           |           |     | 1.0 | CP | sbb eax, eax


The larger motivation is to clean up all select-of-constants combining/lowering 
because we're missing some common cases.

llvm-svn: 306072
2017-06-22 23:47:15 +00:00
Farhana Aleen 4b652a5335 Supported lowerInterleavedStore() in X86InterleavedAccess.
Reviewers: RKSimon, DavidKreitzer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32658

llvm-svn: 306068
2017-06-22 22:59:04 +00:00
Sanjay Patel ff051957fc [x86] add more tests for select --> sbb transform; NFC
These are siblings of the tests added with r306032.

llvm-svn: 306064
2017-06-22 22:17:05 +00:00
Craig Topper 792fc92be2 [AVX-512] Remove and autoupgrade the masked integer compare intrinsics
Summary:
These intrinsics aren't used by clang and haven't been for a while.

There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today.

Reviewers: spatel, RKSimon, zvi, igorb

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34389

llvm-svn: 306047
2017-06-22 20:11:01 +00:00
Sanjay Patel 41a34e4111 [x86] add/sub (X==0) --> sbb(neg X)
Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480),
lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb'
codegen in 1 out of the 4 tests. This is a step towards smoothing that out.

First, show that all of these IR forms are equivalent:
http://rise4fun.com/Alive/mx

Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge
(later Intel and AMD chips are similar based on Agner's tables):

This is the "obvious" x86 codegen (what gcc appears to produce currently):

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1*   |           |     |           |           |     |     |    | xor eax, eax
|   1    | 1.0       |     |           |           |     |     | CP | test edi, edi
|   1    |           |     |           |           |     | 1.0 | CP | setnz al
|   1    |           | 1.0 |           |           |     |     | CP | neg eax


This is the adc version:
|   1*   |           |     |           |           |     |     |    | xor eax, eax
|   1    | 1.0       |     |           |           |     |     | CP | cmp edi, 0x1
|   2    |           | 1.0 |           |           |     | 1.0 | CP | adc eax, 0xffffffff


And this is sbb:
|   1    | 1.0       |     |           |           |     |     |    | neg edi
|   2    |           | 1.0 |           |           |     | 1.0 | CP | sbb eax, eax

If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be
clearly better than the alternatives going forward.

llvm-svn: 306040
2017-06-22 18:11:19 +00:00
Sanjay Patel 96e4e0967e [x86] add tests for select --> sbb transform; NFC
llvm-svn: 306032
2017-06-22 17:01:14 +00:00
whitequark cebe8241ca [X86] Add support for "probe-stack" attribute
This commit adds prologue code emission for stack probe function
calls.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D34387

llvm-svn: 306010
2017-06-22 15:42:53 +00:00
Igor Breger 1c29be7e4f [GlobalISel][X86] Support vector type G_INSERT legalization/selection.
Summary:
Support vector type G_INSERT legalization/selection.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33956

llvm-svn: 305989
2017-06-22 09:43:35 +00:00
Elena Demikhovsky 2dac0b4d58 AVX-512: Lowering Masked Gather intrinsic - fixed a bug
Masked gather for vector length 2 is lowered incorrectly for element type i32.
The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD.
The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal.

In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well.

Differential revision: https://reviews.llvm.org/D34343

llvm-svn: 305987
2017-06-22 06:47:41 +00:00
Davide Italiano 9b8e3d308f [Solaris] emit .init_array instead of .ctors on Solaris (Sparc/x86)
Patch by Fedor Sergeev.

Differential Revision:  https://reviews.llvm.org/D33868

llvm-svn: 305948
2017-06-21 20:36:32 +00:00
Simon Pilgrim 550cb7e82c [X86][SSE] Dropped -mcpu from 256-bit vector shuffle tests
Use triple and attribute only for consistency 

llvm-svn: 305916
2017-06-21 14:51:23 +00:00
Simon Pilgrim 9d0c2b7bad [X86][SSE] Dropped -mcpu from 128-bit vector shuffle tests
Use triple and attribute only for consistency 

llvm-svn: 305913
2017-06-21 14:23:02 +00:00
Simon Pilgrim 5309b7d5c9 [X86][SSE] Regenerate merge store tests
llvm-svn: 305910
2017-06-21 13:46:42 +00:00
Simon Pilgrim e74e08fe61 [X86][SSE] Dropped -mcpu from vector blend shuffle tests and regenerate
Use triple and attribute only for consistency 

llvm-svn: 305909
2017-06-21 13:45:33 +00:00
Simon Pilgrim 98aab7c6fc [X86][SSE] Dropped -mcpu from vector shuffle tests
Use triple and attribute only for consistency 

llvm-svn: 305908
2017-06-21 13:26:52 +00:00
Simon Pilgrim 6d5d6b542b [X86][SSE] Dropped -mcpu from vector zero extend tests
Use triple and attribute only for consistency 

llvm-svn: 305907
2017-06-21 13:17:14 +00:00
Simon Pilgrim c388ec32e0 [X86][SSE] Dropped -mcpu from variable shuffle tests
Use triple and attribute only for consistency 

llvm-svn: 305906
2017-06-21 13:15:41 +00:00
Simon Pilgrim 73814a2594 [X86][AVX] Add AVX1 shuffle truncation tests
llvm-svn: 305905
2017-06-21 12:58:56 +00:00
Simon Pilgrim db6c3fa872 [X86][SSE] Add SSE2/SSE42 shuffle truncation tests
llvm-svn: 305904
2017-06-21 12:58:19 +00:00
Zvi Rackover 845ca8fba9 [X86] Rerun the update_llc_test_checks tool on test. NFC.
llvm-svn: 305897
2017-06-21 11:21:43 +00:00
Guy Blank 52d73fce85 [DAGCombiner] Add another combine from build vector to shuffle
Add support for combining a build vector to a shuffle.
When the build vector is of extracted elements from 2 vectors (vec1, vec2) where vec2 is 2 times smaller than vec1.

llvm-svn: 305883
2017-06-21 07:38:41 +00:00
Dean Michael Berris 28ecff5cf1 [XRay] Reduce synthetic references emitted by XRay
Summary:
When we're building with XRay instrumentation, we use a trick that
preserves references from the function to a function sled index. This
index table lives in a separate section, and without this trick the
linker is free to garbage-collect this section and all the segments it
refers to. Until we're able to tell the linkers to preserve these
sections, we use this reference trick to keep around both the index and
the entries in the instrumentation map.

Before this change we emitted both a synthetic reference to the label in
the instrumentation map, and to the entry in the function map index.
This change removes the first synthetic reference and only emits one
synthetic reference to the index -- the index entry has the references
to the labels in the instrumentation map, so the linker will still
preserve those if the function itself is preserved.

This reduces the amount of synthetic references we emit from 16 bytes to
just 8 bytes in x86_64, and similarly to other platforms.

Reviewers: dblaikie

Subscribers: javed.absar, kpw, pelikan, llvm-commits

Differential Revision: https://reviews.llvm.org/D34340

llvm-svn: 305880
2017-06-21 06:39:42 +00:00
Serguei Katkov 0b0dc57dd8 [ImplicitNullChecks] Uphold an invariant in areMemoryOpsAliased
Right now areMemoryOpsAliased has an assertion justified as:

MMO1 should have a value due it comes from operation we'd like to use
as implicit null check.
assert(MMO1->getValue() && "MMO1 should have a Value!");
However, it is possible for that invariant to not be upheld in the
following situation (conceptually):

Null check %RAX
NotNullSucc:

%RAX = LEA %RSP, 16            // I0
%RDX = MOV64rm %RAX            // I1
With the current code, we will have an early exit from
ImplicitNullChecks::isSuitableMemoryOp on I0 with SR_Unsuitable.
However, I1 will look plausible (since it loads from %RAX) and
will go ahead and call areMemoryOpsAliased(I1, I0). This will cause
us to fail the assert mentioned above since I1 does not load from an
IR level value and thus is allowed to have a non-Value base address.

The fix is to bail out earlier whenever we see an unsuitable
instruction overwrite PointerReg. This would guarantee that when we
call areMemoryOpsAliased, we're guaranteed to be looking at an
instruction that loads from or stores to an IR level value.

Original Patch Author: sanjoy
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34385

llvm-svn: 305879
2017-06-21 06:38:23 +00:00
Sanjay Patel 0656629b87 [x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)

llvm-svn: 305802
2017-06-20 15:58:30 +00:00
Simon Pilgrim 4822b5b649 [X86][SSE] Relax 0/-1 vector element insertion to work for any vector with >=16bit elements
Shuffle lowering/combining now does a good job for 256/512-bit vectors - we don't need to prevent this

llvm-svn: 305801
2017-06-20 15:19:02 +00:00
Simon Pilgrim b4a77fe83a Fixed test name. NFCI.
llvm-svn: 305787
2017-06-20 10:24:06 +00:00
Igor Breger 1dcd5e8dc8 [GlobalISel][X86] Get correct RegClass for given RegBank.
Summary:
In some cases RegClass depends on target feature. Hight (16-31) vector registers exist only if AVX512f available.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: t.p.northover, guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33952

Conflicts:
	test/CodeGen/X86/GlobalISel/select-memop-scalar.mir

llvm-svn: 305784
2017-06-20 09:15:10 +00:00
Igor Breger 14535f0fc2 [GlobalISel] combine not symmetric merge/unmerge nodes.
Summary:
In some cases legalization ends up with not symmetric merge/unmerge nodes.
Transform it to merge/unmerge nodes.

Reviewers: t.p.northover, qcolombet, zvi

Reviewed By: t.p.northover

Subscribers: rovka, kristof.beyls, guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D33626

llvm-svn: 305783
2017-06-20 08:54:17 +00:00
Igor Breger 22ab175658 [GlobalISel][X86] add legalizer mir tests. NFC
llvm-svn: 305781
2017-06-20 08:30:48 +00:00
Sanjoy Das 7ba830d61c Fix machine instruction in test case
The AMD64rm instruction used in the test case was incorrect.  Since
the first input register to AND64rm is tied to output register, they
must be the same.

Thanks for Jesper Antonsson for pointing this out!

llvm-svn: 305756
2017-06-19 22:35:48 +00:00