Commit Graph

9930 Commits

Author SHA1 Message Date
Nirav Dave 07871007aa [DAG] Avoid deleting nodes before combining them.
When replacing a node and it's operand, replacing the operand node may
cause the deletion of the original node leading to an assertion
failure. Case around these replacements to avoid this without relying
on inspecting the DELETED_NODE opcode in various extend
dagcombiner cases.

Fixes PR32515.

Reviewers: dbabokin, RKSimon, davide, chandlerc

Subscribers: chandlerc, llvm-commits

Differential Revision: https://reviews.llvm.org/D34095

llvm-svn: 308330
2017-07-18 17:39:15 +00:00
Simon Pilgrim 964a1f1fb0 [X86][AVX] Regenerate shift test to show constant broadcast comment
llvm-svn: 308323
2017-07-18 16:07:12 +00:00
Simon Pilgrim 483927aefb [x86, CGP] increase memcmp() expansion up to 4 load pairs
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.

Notes:

    Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.

    There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.

    We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:

https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion.

Committed on behalf of Sanjay Patel

Differential Revision: https://reviews.llvm.org/D35067

llvm-svn: 308322
2017-07-18 15:55:30 +00:00
Simon Pilgrim c2cbb525ec [X86] Add optsize and minsize memcmp tests (D35067)
llvm-svn: 308311
2017-07-18 14:26:07 +00:00
Simon Pilgrim 420e5eadc2 [X86] Added cmov target to memcmp test
As discussed by @spatel on D35067:

"I added the cmov attribute to the 32-bit codegen test because it removes some noise for that file. I think the intent for the SSE vs no-SSE runs is to show the potential difference for the 16 and 32 byte cases rather than the lack of cmov (which has been available for all CPUs since SSE1, so that's why it shows up automatically with -mattr=sse2)."

llvm-svn: 308309
2017-07-18 14:19:34 +00:00
Simon Pilgrim 4793a11df9 [DAGCombine] Fix issue with out of bound constant rotation (PR33828)
Take the modulo of rotations by a constant greater than or equal to the bit-width

llvm-svn: 308302
2017-07-18 12:31:46 +00:00
Simon Pilgrim 0636fbd737 [X86][AVX512] Add ISD::ROTL/ISD::ROTR constant folding tests
llvm-svn: 308295
2017-07-18 11:18:38 +00:00
Simon Pilgrim 8d0fc91adc [X86] Add test case for PR32282
llvm-svn: 308286
2017-07-18 10:09:40 +00:00
Chandler Carruth 0781d52cb3 [x86] Add a missing triple, without which the CPU won't parse.
Notably, this is failing on our PPC build bots:
http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/8338/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Apr33772.ll

llvm-svn: 308272
2017-07-18 08:16:32 +00:00
Chandler Carruth a15e080b05 Revert r308025 due to uncovering a crash in SelectionDAG. This is filed
with a minimal test case in http://llvm.org/PR33833.

Original commit message:
  Improve Aliasing of operations to static alloca

llvm-svn: 308271
2017-07-18 07:53:47 +00:00
Craig Topper f54a500101 [X86] Prevent an assertion failure if a gather intrinsic is passed a non-constant scale value.
This isn't legal code, but we shouldn't crash on it. Now we just don't convert the gather intrinsic if the scale isn't constant and let it go through to isel where we'll report an isel failure.

Fixes PR33772.

llvm-svn: 308267
2017-07-18 06:49:23 +00:00
Martin Storsjo 2f24e93481 [AArch64] Extend CallingConv::X86_64_Win64 to AArch64 as well
Rename the enum value from X86_64_Win64 to plain Win64.

The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.

Differential Revision: https://reviews.llvm.org/D34474

llvm-svn: 308208
2017-07-17 20:05:19 +00:00
Mandeep Singh Grang ed64963f1e [llvm] Remove redundant check-prefix=CHECK from tests. NFC.
Reviewers: t.p.northover, oren_ben_simhon, niravd, mcrosier

Reviewed By: oren_ben_simhon, mcrosier

Subscribers: nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35466

llvm-svn: 308193
2017-07-17 17:32:45 +00:00
Simon Pilgrim 948eca371e [X86] Add LEA scheduling tests
llvm-svn: 308180
2017-07-17 14:37:17 +00:00
Simon Pilgrim 1cbe8c2ca5 [X86][AVX512] Add lowering of vXi32/vXi64 ISD::ROTL/ISD::ROTR
Add support for lowering to ISD::ROTL/ISD::ROTR, including rotate by immediate

Differential Revision: https://reviews.llvm.org/D35463

llvm-svn: 308177
2017-07-17 14:11:30 +00:00
Simon Pilgrim 105a3716bb Fixed line endings. NFCI.
llvm-svn: 308175
2017-07-17 13:58:20 +00:00
Simon Pilgrim 11199b2ee5 [X86][AVX] Fix typo in vector rotate tests
Was preventing rotate matching

llvm-svn: 308171
2017-07-17 10:35:51 +00:00
Simon Pilgrim 5aa70e7fe5 [X86][AVX512] Add constant splat vector rotate tests for D35463
llvm-svn: 308169
2017-07-17 10:09:48 +00:00
Simon Pilgrim 701e25edce [X86][AVX512] Regenerate shift tests
llvm-svn: 308168
2017-07-17 09:53:45 +00:00
Andrew Zhogin 67a64041b9 [DAGCombiner] Recognise vector rotations with non-splat constants
Fixes PR33691.

Differential revision: https://reviews.llvm.org/D35381

llvm-svn: 308150
2017-07-16 23:11:45 +00:00
Simon Pilgrim 2899ec88fc [X86][AVX512] Add 512-bit vector rotate tests
llvm-svn: 308146
2017-07-16 19:26:49 +00:00
Amjad Aboud 4563c062b1 [X86] X86::CMOV to Branch heuristic based optimization.
LLVM compiler recognizes opportunities to transform a branch into IR select instruction(s) - later it will be lowered into X86::CMOV instruction, assuming no other optimization eliminated the SelectInst.
However, it is not always profitable to emit X86::CMOV instruction. For example, branch is preferable over an X86::CMOV instruction when:
1. Branch is well predicted
2. Condition operand is expensive, compared to True-value and the False-value operands

In CodeGenPrepare pass there is a shallow optimization that tries to convert SelectInst into branch, but it is not enough.
This commit, implements machine optimization pass that converts X86::CMOV instruction(s) into branch, based on a conservative heuristic.

Differential Revision: https://reviews.llvm.org/D34769

llvm-svn: 308142
2017-07-16 17:39:56 +00:00
Simon Pilgrim dad2aef037 [X86] Add F16C scheduling tests
llvm-svn: 308138
2017-07-16 14:34:18 +00:00
Simon Pilgrim 6f26f3d07f [X86] Add POPCNT scheduling tests
llvm-svn: 308137
2017-07-16 14:22:39 +00:00
Simon Pilgrim b884b208ee [X86] Add BMI2 scheduling tests
llvm-svn: 308136
2017-07-16 14:09:15 +00:00
Simon Pilgrim dfb6eb279f [X86] Add BMI1 scheduling tests
llvm-svn: 308135
2017-07-16 13:59:44 +00:00
Simon Pilgrim 7194513268 [X86] Add LZCNT scheduling tests
llvm-svn: 308133
2017-07-16 13:40:44 +00:00
Simon Pilgrim 73ef87978f [X86][SSE4A] Add EXTRQ/INSERTQ values to BTVER2 scheduling model
llvm-svn: 308132
2017-07-16 12:06:06 +00:00
Simon Pilgrim 7d43bcfd2d [X86][AVX] Regenerate tests with constant broadcast comments
llvm-svn: 308131
2017-07-16 11:43:16 +00:00
Simon Pilgrim e47df64a18 [X86][AVX] Regenerate vector tzcnt tests with constant broadcast comments
llvm-svn: 308130
2017-07-16 11:40:23 +00:00
Simon Pilgrim 17f20f48c2 [X86][AVX] Regenerate vector idiv tests with constant broadcast comments
llvm-svn: 308129
2017-07-16 11:38:14 +00:00
Simon Pilgrim 77ce072f6b [X86][AVX] Regenerate combine tests with constant broadcast comments
llvm-svn: 308128
2017-07-16 11:36:11 +00:00
Simon Pilgrim f9ea0959d9 [X86][AVX] Regenerate tests with constant broadcast comments
llvm-svn: 308110
2017-07-15 21:17:35 +00:00
Simon Pilgrim c2221ee767 [X86][AVX] Regenerate tests with constant broadcast comments
llvm-svn: 308109
2017-07-15 20:28:09 +00:00
Nirav Dave a8f63af9d1 Improve Aliasing of operations to static alloca
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.

Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 308025
2017-07-14 13:56:21 +00:00
Andrew Zhogin af3d5fe83b [X86][tests] Added rotate_vec.ll CodeGen test. NFC precommit for bug 33691 fix.
llvm-svn: 307937
2017-07-13 18:57:40 +00:00
Simon Pilgrim bb85cb16e3 [DAGCombiner] Fix issue with rotate combines asserting if the constant value types differ from the result type.
llvm-svn: 307900
2017-07-13 10:41:49 +00:00
Sanjay Patel ac29895173 [x86] add select-of-constant tests; NFC
We're using cmov in these cases, but we could reduce to simpler ops.

llvm-svn: 307859
2017-07-12 22:42:39 +00:00
Daniel Neilson 965613ef1b Add element atomic memset intrinsic
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.

Reviewers: eli.friedman, reames, mkazantsev, skatkov

Reviewed By: reames

Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D34885

llvm-svn: 307854
2017-07-12 21:57:23 +00:00
Sanjay Patel 4450e73b5e [x86] improve SBB optimizations for SETB/SETA with subtract
This is another step towards removing a combine that turns sext
into select of constants and preparing the backend for an IR
future where select is the canonical form.

Earlier commits in this area:
https://reviews.llvm.org/rL306040
https://reviews.llvm.org/rL306072
https://reviews.llvm.org/rL307404 (https://reviews.llvm.org/D34652)
https://reviews.llvm.org/rL307471

llvm-svn: 307821
2017-07-12 17:56:46 +00:00
Sanjay Patel 6d6c06879c [x86] add tests for improving sbb transforms; NFC
We're subtracting X from X the hard way...

llvm-svn: 307819
2017-07-12 17:44:50 +00:00
Davide Italiano a63981aaa9 [X86/FastIsel] Fall-back to SelectionDAG when lowering soft-floats.
FastIsel can't handle them, so we would end up crashing during
register class selection.
Fixes PR26522.

Differential Revision:  https://reviews.llvm.org/D35272

llvm-svn: 307797
2017-07-12 15:26:06 +00:00
Daniel Neilson 57226ef33c Add element atomic memmove intrinsic
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.

Reviewers: eli.friedman, reames, mkazantsev, skatkov

Reviewed By: reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34884

llvm-svn: 307796
2017-07-12 15:25:26 +00:00
Simon Pilgrim 8dfbc772d7 [X86][SSE] Fix file check prefix warning breaking buildbots
llvm-svn: 307790
2017-07-12 13:41:13 +00:00
Kamil Rytarowski cce21c1dfe Make shell redirection construct portable
Summary:
NetBSD shell sh(1) does not support ">& /dev/null" construct.
This is bashism. The portable and POSIX solution is to use:
"> /dev/null 2>&1".

This change fixes 22 Unexpected Failures on NetBSD/amd64
for the "check-llvm" target.

Sponsored by <The NetBSD Foundation>

Reviewers: joerg, dim, rnk

Reviewed By: joerg, rnk

Subscribers: rnk, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D35277

llvm-svn: 307789
2017-07-12 13:24:46 +00:00
Simon Pilgrim ebbb969d21 [X86][SSE] Add 512-bit (iX bitcast(vXi1)) test cases
Improves test coverage for pre-AVX512 targets as well

llvm-svn: 307783
2017-07-12 12:44:10 +00:00
Michael Zuckerman fce5c67920 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
Adding base test for AVX512 

llvm-svn: 307761
2017-07-12 08:01:44 +00:00
Sanjay Patel 7c026cb1af [x86] auto-generate full checks; NFC
llvm-svn: 307718
2017-07-11 22:04:36 +00:00
Michael Zuckerman 1fe5628aa0 reverting 307677.
llvm-svn: 307698
2017-07-11 19:46:11 +00:00
Michael Zuckerman 4b6d01a008 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.
Base test for avx512
adding new base test to trunk befor commit change on the test

llvm-svn: 307677
2017-07-11 17:17:49 +00:00
Guy Blank 509d1b2a5a [X86][AVX512] regenerate avx512-insert-extract.ll
llvm-svn: 307654
2017-07-11 11:51:49 +00:00
Serguei Katkov 0e831c996c Revert Revert [MBP] do not rotate loop if it creates extra branch
This is a second attempt to land this patch.

The first one resulted in a crash of clang sanitizer buildbot.
The fix is here and regression test is added.

This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.

We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.

Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one

So if C is not a predecessor of H then we introduce extra branch.

This change actually prohibits rotation of the loop if both true
  Best Exit has next element in chain as successor.
  Last element in chain is not a predecessor of first element of chain.

Reviewers: iteratee, xur, sammccall, chandlerc	
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34745

llvm-svn: 307631
2017-07-11 08:34:58 +00:00
Igor Breger 324d3791f8 [GlobalISel][X86] Use correct AND instructions.
AND8ri8 not supported in 64bit.

llvm-svn: 307630
2017-07-11 08:04:51 +00:00
Serguei Katkov 0b7b59ada3 [CGP] Relax a bit restriction for optimizeMemoryInst to extend scope
CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all instructions combining the address for memory instruction is in the same
block as memory instruction itself.

However if any of these instruction are placed after memory instruction then
address calculation will not be folded to memory instruction.

The added test case shows an example.

Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34862

llvm-svn: 307628
2017-07-11 06:24:44 +00:00
Matthias Braun b38736706e Revert "[DAG] Improve Aliasing of operations to static alloca"
Reverting as it breaks tramp3d-v4 in the llvm test-suite. I added some
comments to https://reviews.llvm.org/D33345 about it.

This reverts commit r307546.

llvm-svn: 307589
2017-07-10 20:51:30 +00:00
Andrew V. Tischenko ae9d6db769 [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 (PR28573).
The new version of the model is definitely faster.

Differential Revision:
https://reviews.llvm.org/D35198

llvm-svn: 307552
2017-07-10 16:36:03 +00:00
Nirav Dave 163e1ad9dc [DAG] Improve Aliasing of operations to static alloca
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.

Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.

Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.

The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.

Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand

Reviewed By: rnk

Subscribers: sdardis, nemanjai, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33345

llvm-svn: 307546
2017-07-10 15:39:41 +00:00
Gadi Haber f4d154c089 This patch completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target.
The SandyBridge architects have provided us with a more accurate information about each instruction latency, number of uOPs and used ports and I used it to replace the existing estimated SNB instructions scheduling and to add missing scheduling information.

Please note that the patch extensively affects the X86 MC instr scheduling for SNB.

Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX.

The updated and extended information about each instruction includes the following details:
•static latency of the instruction
•number of uOps from which the instruction consists of
•all ports used by the instruction's' uOPs

For example, the following code dictates that instructions, ADC64mr, ADC8mr, SBB64mr, SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use ports 4, ports 2 or 3 and port 0 and ports 0 or 1 or 5:

def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> {
let Latency = 9;
let NumMicroOps = 6;
let ResourceCycles = [1,2,2,1];

}
def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>;

Note that apart for the header, most of the X86SchedSandyBridge.td file was generated by a script.

Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb

Differential Revision:  https://reviews.llvm.org/D35019#inline-304691

llvm-svn: 307529
2017-07-10 09:53:16 +00:00
Igor Breger d8b51e134e [GlobalISel][X86] Support G_LOAD/G_STORE i1.
Summary: Support G_LOAD/G_STORE i1.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35178

llvm-svn: 307527
2017-07-10 09:26:09 +00:00
Igor Breger d48c5e4855 [GlobalISel][X86] extend G_ZEXT support.
Summary:
Mark G_ZEXT/G_SEXT i1 to i8/i16,  i8 to i16 as legal.
Support G_ZEXT i1 to i8/i16 instruction selection ( C++ code).
This patch requred to support G_LOAD/G_STORE i1.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D35177

llvm-svn: 307526
2017-07-10 09:07:34 +00:00
Davide Italiano c4b0ccd049 [X86] Relax an assertion when legalizing vector types.
WidenVSELECTAndMask can fold (and it folds in this case) so we
get a BUILD_VECTOR of constants as mask. convertMask() seems to
work fine when the input is a vector of constants, and we still
need to call it to extend/add elements at the end. but the current
code just asserts on anything but a SETCC or AND/OR/XOR of 2xSETCC.
This change was discussed briefly with Simon Pilgrim, who also
suggests we might consider dropping this assertion in the future.

Fixes PR33715.

llvm-svn: 307508
2017-07-09 19:22:48 +00:00
Simon Pilgrim 8247687e0f [X86][AVX512] Regenerate AVX512VL comparison tests.
Show poor codegen on KNL targets as mentioned on D35179

llvm-svn: 307500
2017-07-09 15:47:43 +00:00
Igor Breger 769cd05232 [GlobalISel][X86] Add legalizer tests for G_LOAD/G_STORE operations. NFC.
llvm-svn: 307494
2017-07-09 07:25:57 +00:00
Igor Breger b80b44b7b9 [FastISel] fix a fallback diagnostic.
Summary: FastISel was marked as failed in case instruction selection succeeded.

Reviewers: qcolombet, zvi, rovka, ab

Reviewed By: zvi

Subscribers: javed.absar, ab, qcolombet, bogner, llvm-commits

Differential Revision: https://reviews.llvm.org/D34438

llvm-svn: 307489
2017-07-09 05:55:20 +00:00
Hiroshi Inoue 713b5ba2de fix trivial typos; NFC
sucessor -> successor 

llvm-svn: 307488
2017-07-09 05:54:44 +00:00
Sanjay Patel 18ee908ca2 [x86] add SBB optimization for SETBE (ule) condition code
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess 
with missing optimizations. We handle some patterns, but miss logical variants.

To clean that up, we should convert all select-of-constants to logic/math and 
enhance the combining for the expected patterns from that. Selecting 0 or -1 
needs extra attention to produce the optimal code as shown here.

Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs

Earlier steps in this series:
rL306040
rL306072
rL307404 (D34652)

As acknowledged in the earlier review, there's a possibility that some Intel
uarch would prefer to produce an xor to clear the fake register operand with
sbb %eax, %eax. This will likely need to be addressed in a separate pass.

llvm-svn: 307471
2017-07-08 14:04:48 +00:00
Sanjay Patel dd36f75733 [x86] add SBB optimization for SETAE (uge) condition code
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess 
with missing optimizations. We handle some patterns, but miss logical variants.

To clean that up, we should convert all select-of-constants to logic/math and 
enhance the combining for the expected patterns from that. DAGCombiner already 
has the foundation to allow the transforms, so we just need to fill in the holes 
for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the
optimal code as shown here.

Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs

Earlier steps in this series:
rL306040
rL306072

Differential Revision: https://reviews.llvm.org/D34652

llvm-svn: 307404
2017-07-07 14:56:20 +00:00
Wei Mi 20526b2725 [ConstHoisting] choose to hoist when frequency is the same.
The patch is to adjust the strategy of frequency based consthoisting:
Previously when the candidate block has the same frequency with the existing
blocks containing a const, it will not hoist the const to the candidate block.
For that case, now we change the strategy to hoist the const if only existing
blocks have more than one block member. This is helpful for reducing code size.

Differential Revision: https://reviews.llvm.org/D35084

llvm-svn: 307328
2017-07-06 22:32:27 +00:00
Simon Pilgrim a80cb1d7a7 [X86][SSE] Tests for bitcasting iX integers to vXi1 boolean vectors
Including sign/zero extension to legal types

llvm-svn: 307301
2017-07-06 19:33:10 +00:00
Simon Pilgrim 0fee3372c9 [X86][SSE] Dropped -mcpu from bitcast+setcc tests
Use triple and attribute only for consistency

Added SSE2/AVX tests on 256-bit vectors to test PACKSS behaviour

llvm-svn: 307289
2017-07-06 18:27:34 +00:00
Wei Mi 90707394e3 [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.
When the formulae search space is huge, LSR uses a series of heuristic to keep
pruning the search space until the number of possible solutions are within
certain limit.

The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs,
which picks the register which is used by the most LSRUses and deletes the other
formulae which don't use the register. This is a effective way to prune the search
space, but quite often not a good way to keep the best solution. We saw cases before
that the heuristic pruned the best formula candidate out of search space.

To relieve the problem, we introduce a new heuristic called
NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to
reduce the search space while keeping the best formula, we want to keep as many
formulae with different Scale and ScaledReg as possible. That is because the central
idea of LSR is to choose a group of loop induction variables and use those induction
variables to represent LSRUses. An induction variable candidate is often represented
by the Scale and ScaledReg in a formula. If we have more formulae with different
ScaledReg and Scale to choose, we have better opportunity to find the best solution.
That is why we believe pruning search space by only keeping the best formula with the
same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use
two criteria to choose the best formula with the same Scale and ScaledReg. The first
criteria is to select the formula using less non shared registers, and the second
criteria is to select the formula with less cost got from RateFormula. The patch
implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the
last resort.

Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly
testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help
on the two improved internal cases we saw.

Differential Revision: https://reviews.llvm.org/D34583

llvm-svn: 307269
2017-07-06 15:52:14 +00:00
Simon Pilgrim 713600747e [X86][SSE4A] Add support for shuffle combining to INSERTQI.
llvm-svn: 307268
2017-07-06 15:34:17 +00:00
Simon Pilgrim 03641df383 [X86][SSE4A] Add test showing missed opportunities to combine INSERTQI shuffle
llvm-svn: 307265
2017-07-06 14:52:24 +00:00
Sanjay Patel 2a341620e7 [x86] fix over-specified triple and auto-generate checks; NFC
llvm-svn: 307262
2017-07-06 14:15:15 +00:00
Simon Pilgrim cc0f785dca [X86][SSE4A] Add support for shuffle combining to EXTRQ.
llvm-svn: 307254
2017-07-06 12:22:58 +00:00
Simon Pilgrim 40c0ae200f [X86][SSE4A] Add scheduling tests for SSE4A instructions
llvm-svn: 307251
2017-07-06 11:26:43 +00:00
Simon Pilgrim ac78daf517 {DAGCombiner] Fold (rot x, 0) -> x
llvm-svn: 307184
2017-07-05 18:27:11 +00:00
Simon Pilgrim 49123d4bb0 [X86] Test bitfield loadstore tests on i686 as well
llvm-svn: 307182
2017-07-05 18:09:30 +00:00
Andrew Zhogin 45d192823e [DAGCombiner] visitRotate patch to optimize pair of ROTR/ROTL instructions into one with combined shift operand.
For two ROTR operations with shifts C1, C2; combined shift operand will be (C1 + C2) % bitsize.

Differential revision: https://reviews.llvm.org/D12833

llvm-svn: 307179
2017-07-05 17:55:42 +00:00
Simon Pilgrim 55006b407b [X86][SSE] Dropped -mcpu from bitcast+setcc mask tests
Use triple and attribute only for consistency

llvm-svn: 307176
2017-07-05 17:30:30 +00:00
Igor Breger 55e2f5963a [GlobalIsel] allow x86_fp80 values to be dumped.
Summary:
Otherwise the fallback path fails with an assertion on x86_64 targets,
when "x86_fp80" is encountered.

Reviewers: t.p.northover, zvi, guyblank

Reviewed By: zvi

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34975

llvm-svn: 307140
2017-07-05 11:11:10 +00:00
Nirav Dave b320ef9fab Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset
Relanding after rewriting undef.ll test to avoid host-dependant
endianness.

As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.

Tests of note:

  * test/CodeGen/X86/build-vector* - Improved.
  * test/CodeGen/BPF/undef.ll - Improved store alignment allows an
    additional store merge

  * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
    case we already do not handle well. Here, the DAG is improved, but
    scheduling causes a code size degradation.

Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D34472

llvm-svn: 307114
2017-07-05 01:21:23 +00:00
Gadi Haber 689426e3cb NFC.
Made some updates to the half.ll test under CodeGen to make it friendly to the update_llc_test_checks .py tool as follows:
1.Removing the llc flag -asm-verbose=false
2.Grouping the multiple check-prefix directives
3.Apply update_llc_test_checks.py tool on the test

This change is needed to easily update scheduling changes in an upcoming patch.

Reviewers: zvi, RKSimon, craig.topper 

Differential Revision: https://reviews.llvm.org/D34934

llvm-svn: 307108
2017-07-04 21:51:05 +00:00
Simon Pilgrim ac3e7f3f57 [X86][SSE4A] Add support for combining from non-v16i8 EXTRQI/INSERTQI shuffles
With the improved shuffle decoding we can now combine EXTRQI/INSERTQI shuffles from non-v16i8 vector types

llvm-svn: 307099
2017-07-04 18:11:02 +00:00
Anna Thomas 505941e7d6 [FastISel] Move gc intrinsic test to X86 directory
Move from generic to X86 directory since gc intrinsics only supposed in
X86 64 bit.
Add target triple as well.
Fixes build failure in i686-linux-RA  caused by rL307084.

llvm-svn: 307086
2017-07-04 15:24:08 +00:00
Simon Pilgrim d128222f0c [X86] Add combine tests for vector rotates
Reference tests for D12833

llvm-svn: 307073
2017-07-04 12:33:53 +00:00
Gadi Haber 4980790e81 NFC commit.
Converting the Codegen test "extractelement-legalization-store-ordering.ll" to be "update_llc_test_checks" friendly.

The changes to the test are needed for an upcoming scheduling patch.

Reviewers: zvi, RKSimon

Differential Revision: https://reviews.llvm.org/D34935

llvm-svn: 307066
2017-07-04 07:18:03 +00:00
Craig Topper ad140cfb68 [X86] Add comment string for broadcast loads from the constant pool.
Summary:
When broadcasting from the constant pool its useful to print out the final vector similar to what we do for normal moves from the constant pool.

I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead.

Reviewers: spatel, RKSimon, zvi

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34923

llvm-svn: 307062
2017-07-04 05:46:11 +00:00
Anton Yartsev 66d32c5e06 [legalize-types] Clean up softening machinery.
The patch makes SoftenFloatResult/Operand logic just the same as all other legalization routines have: SoftenFloatResult() now fills the SoftenFloats map and SoftenFloatOperand() perform all needed replacements. This prevents softening mashinery from leaving stale entries in SoftenFloats map (that resulted in errors during the legalize type checking) and clarifies softening. The patch replaces https://reviews.llvm.org/D29265.

Differential Revision: https://reviews.llvm.org/D31946

llvm-svn: 307053
2017-07-04 01:08:55 +00:00
Simon Pilgrim fa6e675267 [X86][SSE4A] Add support for combining from EXTRQI/INSERTQI shuffles
llvm-svn: 307048
2017-07-03 20:58:16 +00:00
Simon Pilgrim bdfb3b1d5f [X86][SSE4A] Add SSE4A shuffle tests on pre-SSSE3 hardware
llvm-svn: 307042
2017-07-03 16:53:11 +00:00
Simon Pilgrim b5c68a6717 [X86][SSE4A] Test SSE4A shuffle combining on SSE42 capable target as well
llvm-svn: 307038
2017-07-03 15:55:54 +00:00
Zvi Rackover d7a1c334ce DAGCombine: Combine BUILD_VECTOR to TRUNCATE
Summary:
Add a combine for creating a truncate to replace a build_vector composed of extracts with
indices that form a stride-2^N series.

Example:
v8i32 V = ...

v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6))
-->
v4i32 truncate (bitcast V to v4i64)

Related discussion in llvm-dev about canonicalizing shuffles to
truncates in LLVM IR:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html.

Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena

Reviewed By: delena

Subscribers: guyblank, delena, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D34077

llvm-svn: 307036
2017-07-03 15:47:40 +00:00
Sanjay Patel e9b1d16a8c [x86] auto-generate complete checks for tests; NFC
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.
There were also over-specifications in the RUN params such as CPU model.

llvm-svn: 307033
2017-07-03 15:27:19 +00:00
Sanjay Patel d3173740fd [x86] auto-generate complete checks for tests; NFC
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.
There were also several over-specifications in the RUN params such as CPU model or OS requirement

llvm-svn: 307028
2017-07-03 15:04:05 +00:00
Simon Pilgrim decfaca033 [X86][SSE4A] Add tests showing missed opportunities to combine EXTRQI/INSERTQI shuffles
llvm-svn: 307027
2017-07-03 15:01:07 +00:00
Sanjay Patel dab798a25f [x86] auto-generate complete checks for tests; NFC
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.

llvm-svn: 307024
2017-07-03 14:29:45 +00:00
Igor Breger 5c787ab346 [GlobalISel][X86] fix %ptr(p0) = G_CONSTANT selection.
llvm-svn: 307019
2017-07-03 11:06:54 +00:00
Simon Pilgrim f05c5ef441 [X86][AVX512] Test AVX512VPOPCNTDQ CTPOP with/without AVX512BW
llvm-svn: 306991
2017-07-02 19:52:20 +00:00
Simon Pilgrim a9655ffb42 [X86][AVX512VPOPCNTDQ] Improve support for v16i8/v8i16/v16i16/ CTPOP
Zero extend to v16i32/v8i64, use VPOPCNTDQ instructions and truncate back.

llvm-svn: 306990
2017-07-02 19:32:37 +00:00