Commit Graph

10024 Commits

Author SHA1 Message Date
Igor Breger ce5ea38135 [GlobalISel][X86] Fix mir tests, use correct physical register.NFC.
llvm-svn: 310996
2017-08-16 07:25:51 +00:00
Sanjay Patel 92653865e6 [x86] fold the mask op on 8- and 16-bit rotates
Ref the post-commit thread for r310770:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170807/478507.html

The motivating cases as 'C' source examples can look like this:

unsigned char rotate_right_8(unsigned char v, int shift) {
  // shift &= 7;
  v = ( v >> shift ) | ( v << ( 8 - shift ) );
  return v;
}

https://godbolt.org/g/K6rc1A

Notice that the source doesn't contain UB-safe masked shift amounts, but instcombine created those 
in order to produce narrow rotate patterns. This should be the last step needed to resolve PR34046:
https://bugs.llvm.org/show_bug.cgi?id=34046

Differential Revision: https://reviews.llvm.org/D36644

llvm-svn: 310849
2017-08-14 15:55:43 +00:00
Elad Cohen 6a9edda356 [SelectionDAG] combine vextract (v1iX extract_subvector(vNiX, Idx))
into vextract(vNiX,Idx) when creating vextract with getNode().
This case appeared in AVX512 after fixing pr33349 in r310552.

Differential revision: https://reviews.llvm.org/D36571

llvm-svn: 310828
2017-08-14 10:49:45 +00:00
Balaram Makam d9f53414de MachineInstr: Reason locally about some memory objects before going to AA.
This addresses a FIXME in MachineInstr::mayAlias.

llvm-svn: 310825
2017-08-14 09:41:40 +00:00
Elad Cohen 3a90a0c10d Revert "[DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)"
This reverts commit r310782.

llvm-svn: 310822
2017-08-14 09:06:00 +00:00
Simon Pilgrim 8d9e6e607a [X86][BMI] Add BEXTR demanded bits test cases (PR34042)
llvm-svn: 310802
2017-08-13 20:35:38 +00:00
Gadi Haber bed2c50607 [X86][SandyBridge] Additional updates to the SNB instructions scheduling information
This is a continuation patch for commit r307529 which completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target (see also https://reviews.llvm.org/D35019).

In this patch we added the scheduling information of additional SNB instructions that were missing from the patch commit r307529, fixed the scheduling of several resource groups that include only port0 instead of port05 (i.e., port0 OR port5) and fixed several incorrect instructions' scheduling in the r307529 commit.

The patch also includes the X87 instructions which were missing in previous patch commit r307529 as reported in bugzilla bug 34080.

Reviewers: zvi, RKSimon, chandlerc, igorb, m_zuckerman, craig.topper, aymanmus, dim

Differential Revision: https://reviews.llvm.org/D36388

llvm-svn: 310792
2017-08-13 13:59:24 +00:00
Simon Pilgrim 631991fdde [X86][AVX512] Added additional shuffle+trunc test case.
An existing test should have covered this but a typo caused it to fail. I've kept both as the codegen for the typo case needs addressing as well. 

llvm-svn: 310791
2017-08-13 12:30:36 +00:00
Simon Pilgrim ad1c5566a7 [X86][TBM] Add tests showing failure to fold RFLAGS result into TBM instructions.
And fails to select TBM instructions at all.

llvm-svn: 310790
2017-08-13 12:16:00 +00:00
Simon Pilgrim 808ce12878 [X86][TBM] Regenerate bextri intrinsics tests. NFCI.
llvm-svn: 310788
2017-08-13 11:56:15 +00:00
Guy Blank de425ae753 [X86][AVX512] Add combine for TESTM
Add an X86 combine for TESTM when one of the operands is a BUILD_VECTOR(0,0,...).

TESTM op0, BUILD_VECTOR(0,0,...) -> BUILD_VECTOR(0,0,...)
TESTM BUILD_VECTOR(0,0,...), op1 -> BUILD_VECTOR(0,0,...)

Differential Revision:
https://reviews.llvm.org/D36536

llvm-svn: 310787
2017-08-13 08:03:37 +00:00
Craig Topper 44cb1ffb6a [X86] When handling addcarry intrinsic, create the flag result with the correct type so we don't crash if we use a memory instruction
Summary:
Previously we were creating the flag result with MVT::Other which is interpretted as a Chain node. If we used a memory form of the instruction we would end up with a copyToReg that consumed the chain result of the adcx instruction instead of the flag result.

Pretty sure we should be using MVT::i32 here, that's what we do other places we create these node types.

We should probably consider this for 5.0 as well.

Reviewers: RKSimon, zvi, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36645

llvm-svn: 310784
2017-08-12 20:19:44 +00:00
Simon Pilgrim 5a86f0e717 [DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)
If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index.

Reapplied with fix to only work with simple value types.

Committed on behalf of @jbhateja (Jatin Bhateja)

Differential Revision: https://reviews.llvm.org/D35788

llvm-svn: 310782
2017-08-12 17:43:25 +00:00
Simon Pilgrim 32546d1434 [X86] Regenerate merge store tests. NFCI.
Gives us a much better idea of what is going on than just relying on a few checks.

llvm-svn: 310780
2017-08-12 17:27:35 +00:00
Richard Smith 3704eba1d1 D36604: PR34148: Do not assume we can use a copy relocation for an `external_weak` global
An `external_weak` global may be intended to resolve as a null pointer if it's
not defined, so it doesn't make sense to use a copy relocation for it.

Differential Revision: https://reviews.llvm.org/D36604

llvm-svn: 310773
2017-08-11 23:52:28 +00:00
Sanjay Patel 2b452c7192 [x86] add tests for rotate left/right with masked shifter; NFC
As noted in the test comment, instcombine now produces the masked
shift value even when it's not included in the source, so we should
handle this.

Although the AMD/Intel docs don't say it explicitly, over-rotating
the narrow ops produces the same results. An existence proof that
this works as expected on all x86 comes from gcc 4.9 or later:
https://godbolt.org/g/K6rc1A

llvm-svn: 310770
2017-08-11 22:38:40 +00:00
Sanjay Patel 5d6df36fde [x86] regenerate test checks, add 64-bit run; NFC
llvm-svn: 310767
2017-08-11 22:05:33 +00:00
Craig Topper ac217b7aa3 [X86] Don't use fsin/fcos/fsincos instructions ever
Summary:
Previously we would use these instructions if sse was disabled and fastmath was enabled.

As mentioned in D28335, this is a bad idea.

Reviewers: efriedma, scanon, DavidKreitzer

Reviewed By: DavidKreitzer

Subscribers: zvi, llvm-commits

Differential Revision: https://reviews.llvm.org/D36344

llvm-svn: 310762
2017-08-11 20:55:29 +00:00
Rafael Espindola b8956a70d3 Fix access to undefined weak symbols in pic code
When the access to a weak symbol is not a call, the access has to be
able to produce the value 0 at runtime.

We were sometimes producing code sequences where that was not possible
if the code was leaded more than 4g away from 0.

llvm-svn: 310756
2017-08-11 20:49:27 +00:00
Craig Topper 561092f233 [AVX512] Remove and autoupgrade many of the broadcast intrinsics
Summary:
This autoupgrades most of the broadcast intrinsics. They've been unused in clang for some time.

This leaves the 32x2 intrinsics because they are still used in clang.

Reviewers: RKSimon, zvi, igorb

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36606

llvm-svn: 310725
2017-08-11 16:22:45 +00:00
Craig Topper 0f30fe9634 [x86] Enable some support for lowerVectorShuffleWithUndefHalf with AVX-512
Summary:
This teaches 512-bit shuffles to detect unused halfs in order to reduce shuffle size.

We may need to refine the 512-bit exit point. I couldn't remember if we had good cross lane shuffles for 8/16 bit with AVX-512 or not.

I believe this is step towards being able to handle D36454 without a special case.

From here we need to improve our ability to combine extract_subvector with insert_subvector and other extract_subvectors. And we need to support narrowing binary operations where we don't demand all elements. This may be improvements to DAGCombiner::narrowExtractedVectorBinOp(by recognizing an insert_subvector in addition to concat) or we may need a target specific combiner.

Reviewers: RKSimon, zvi, delena, jbhateja

Reviewed By: RKSimon, jbhateja

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36601

llvm-svn: 310724
2017-08-11 16:20:05 +00:00
Sanjay Patel 169dae70a6 [x86] use more shift or LEA for select-of-constants (2nd try)
The previous rev (r310208) failed to account for overflow when subtracting the
constants to see if they're suitable for shift/lea. This version add a check
for that and more test were added in r310490.

We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d

For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.

The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.

Some notes about the test diffs:

1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. We could avoid the 
   push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a 
   post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
   that's a regression, but those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs.
5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.

Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.

Differential Revision: https://reviews.llvm.org/D35340

llvm-svn: 310717
2017-08-11 15:44:14 +00:00
Nirav Dave 0a48e5d506 Improve handling of insert_subvector of bitcast values
Fix insert_subvector / extract_subvector merges of bitcast values.

Reviewers: efriedma, craig.topper, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D34571

llvm-svn: 310711
2017-08-11 13:21:41 +00:00
Nirav Dave d1b3f09faa [X86][DAG] Switch X86 Target to post-legalized store merge
Move store merge to happen after intrinsic lowering to allow lowered
stores to be merged.

Some regressions due in MergeConsecutiveStores to missing
insert_subvector that are addressed in follow up patch.

Reviewers: craig.topper, efriedma, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34559

llvm-svn: 310710
2017-08-11 13:21:35 +00:00
Simon Pilgrim 83cf3a29b5 [DAGCombiner] Remove shuffle support from simplifyShuffleMask
rL310372 enabled simplifyShuffleMask to support undef shuffle mask inputs, but its causing hangs.

Removing support until I can triage the problem

llvm-svn: 310699
2017-08-11 08:37:00 +00:00
Nirav Dave 4d28c0ff4f [DAG] Relax type restriction for store merge
Summary: Allow stores of bitcastable types to be merged by peeking through BITCAST nodes and recasting stored values constant and vector extract nodes as necessary.

Reviewers: jyknight, hfinkel, efriedma, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34569

llvm-svn: 310655
2017-08-10 19:52:45 +00:00
Taewook Oh f5040b9685 Make .file directive to have basename only
Summary:
Currently LLVM puts directory along with the filename in .file directive, but this behavior doesn't match gcc. There's a no clear description about which one is right (https://sourceware.org/binutils/docs/as/File.html#File), but one document (https://sourceware.org/gdb/current/onlinedocs/stabs/ELF-Linker-Relocation.html) suggests that STT_FILE symbol in elf file is expected to have basename only, which should have a same sting file .file directive according to (https://docs.oracle.com/cd/E26502_01/html/E28388/eoiyg.html).

This also affects badly on the build system that uses hashing, as the directory info could be differnt from developer to developer even when they're working on same file.

Reviewers: pcc, mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36018

llvm-svn: 310642
2017-08-10 18:17:11 +00:00
Nirav Dave 926e2d39bf [X86] Keep dependencies when constructing loads in combineStore
Summary:
Preserve chain dependecies between old and new loads constructed to
prevent loads from reordering below later stores.

Fixes PR34088.

Reviewers: craig.topper, spatel, RKSimon, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36528

llvm-svn: 310604
2017-08-10 15:12:32 +00:00
Guy Blank 136b543745 [SelectionDAG] Allow constant folding for implicitly truncating BUILD_VECTOR nodes.
In FoldConstantArithmetic, handle BUILD_VECTOR nodes that do implicit truncation on the elements.

This is similar to what is done in FoldConstantVectorArithmetic.

Differential Revision:
https://reviews.llvm.org/D36506

llvm-svn: 310593
2017-08-10 14:09:50 +00:00
Elad Cohen 22ba97a0a6 [SelectionDAG] When scalarizing vselect, don't assert on
a legal cond operand.

When scalarizing the result of a vselect, the legalizer currently expects
to already have scalarized the operands. While this is true for the true/false
operands (which have the same type as the result), it is not case for the
condition operand. On X86 AVX512, v1i1 is legal - this leads to operations such
as '< N x type> vselect < N x i1> < N x type> < N x type>' where < N x type > is
illegal to hit an assertion during the scalarization.

The handling is similar to r205625.
This also exposes the fact that (v1i1 extract_subvector) should be legal
and selectable on AVX512 - We do this by custom lowering to vector_extract_elt.
This still leaves us in some cases with redundant dag nodes which will be
combined in a separate soon to come patch.

This fixes pr33349.

Differential revision: https://reviews.llvm.org/D36511

llvm-svn: 310552
2017-08-10 07:44:23 +00:00
Guy Blank 7f60c991ae [X86][AVX512] Choose correct registers in vpbroadcastb/w
Fixes the vpbroadcastb/w instructions which use GPRs as source operands, to use the correct registers.
The full GPR should be used, and not the subregister, as it happens before the patch.

Fixes pr33795

Differential Revision:
https://reviews.llvm.org/D36479

llvm-svn: 310498
2017-08-09 17:21:01 +00:00
Sanjay Patel 6f80d6b46f [x86] add more tests for select-of-constants; NFC
This is to help recommit a fixed version of r310208. As shown in PR34097, 
we could miscompile if subtraction of the constants overflowed.

llvm-svn: 310490
2017-08-09 15:57:02 +00:00
Serguei Katkov 6ea2e81cf6 [ImplicitNullCheck] Fix the bug when dependent instruction accesses memory
It is possible that dependent instruction may access memory.
In this case we must reject optimization because the memory change will
be visible in null handler basic block. So we will execute an instruction which
we must not execute if check fails.

Reviewers: sanjoy, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36392

llvm-svn: 310443
2017-08-09 05:17:02 +00:00
Simon Pilgrim 91b7b991d4 [DAGCombiner] simplifyShuffleMask - handle UNDEF inputs from shuffles as well as BUILD_VECTOR
Minor extension to D36393

llvm-svn: 310372
2017-08-08 16:10:33 +00:00
Amjad Aboud 6fa6813aec [X86] Improved X86::CMOV to Branch heuristic.
Resolved PR33954.
This patch contains two more constraints that aim to reduce the noise cases where we convert CMOV into branch for small gain, and end up spending more cycles due to overhead.

Differential Revision: https://reviews.llvm.org/D36081

llvm-svn: 310352
2017-08-08 12:17:56 +00:00
Simon Pilgrim ef44228acb [DAGCombiner] Simplify shuffle mask index if the referenced input element is UNDEF
Fixes one of the cases in PR34041.

Differential Revision: https://reviews.llvm.org/D36393

llvm-svn: 310344
2017-08-08 11:03:30 +00:00
Daniel Sanders 0554004698 [globalisel][tablegen] Add support for importing 'imm' operands.
Summary:
This patch enables the import of rules containing 'imm' operands that do not
constrain the acceptable values using predicates. Support for ImmLeaf will
arrive in a later patch.

Depends on D35681

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar

Reviewed By: rovka

Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D35833

llvm-svn: 310343
2017-08-08 10:44:31 +00:00
Simon Pilgrim 8c1167df5c [X86][AVX] Added test for broadcast shuffle from binary sources with undefs (D36393)
llvm-svn: 310317
2017-08-07 22:20:06 +00:00
Evgeny Stupachenko c675290680 Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720).
The root cause of reverting was fixed - PR33514.

Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 310289
2017-08-07 19:56:34 +00:00
Simon Pilgrim 0242cead2c [X86][AVX] Add full test coverage of subvector_broadcasts from registers
X86SubVBroadcast is for memory subvector broadcasts, but we must test that it handles all cases without the load as well just in case.

This was noticed while I was triaging the test cases from PR34041.

llvm-svn: 310268
2017-08-07 16:49:09 +00:00
Simon Pilgrim 8b9c4f2855 [X86][AVX] Cleanup subvector broadcast tests - remove old prefixes.
llvm-svn: 310265
2017-08-07 15:50:43 +00:00
Sanjay Patel 807f92b8ff [x86] revert r310208 to investigate test-suite failures (PR34105 / PR34097)
llvm-svn: 310264
2017-08-07 15:47:48 +00:00
Michael Zuckerman 680ac10aa7 [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4).
This patch expands the support of lowerInterleavedStore to 16x8i stride 4.

LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future.

The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 16 chars:

c0, c1, , c16
m0, m1, , m16
y0, y1, , y16
k0, k1, ., k16

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....

Differential Revision: https://reviews.llvm.org/D35829

llvm-svn: 310252
2017-08-07 13:22:39 +00:00
Simon Pilgrim e2bb83a4ee [X86][AVX] Added test for broadcast shuffle with undefs (PR34041)
llvm-svn: 310249
2017-08-07 12:24:33 +00:00
Sanjay Patel a923c2ee95 [x86] use more shift or LEA for select-of-constants
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d

For this patch, I'm enhancing an existing x86 transform that uses fake multiplies 
(they always become shl/lea) to avoid cmov or branching. The current code misses 
cases where we have a negative constant and a positive constant, so this is just 
trying to plug that hole.

The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start 
with a select in IR, create a select DAG node, convert it into a sext, convert it 
back into a select, and then lower it to sext machine code.

Some notes about the test diffs:

1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. I 
   think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on 
   a different reg? That's a post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if 
   that's a regression, but I think those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs.
5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.

Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.

Differential Revision: https://reviews.llvm.org/D35340

llvm-svn: 310208
2017-08-06 16:27:07 +00:00
Simon Pilgrim aeb2b1e36d [X86][X87] Regenerate inline-asm tests
llvm-svn: 310201
2017-08-06 12:17:10 +00:00
Simon Pilgrim 5489781cc7 [X86][X87] Add test case for PR34080
Test with/without the sandybridge (default) model for SSE2, SSE3 and AVX targets.

pre-SSE3 the issue is the order of the fpsw and fpcw load/stores (with SSE3 trunc-store FIST instructions avoid the sw/cw manipulations).

llvm-svn: 310198
2017-08-06 11:22:33 +00:00
Craig Topper 6bfa2aee78 [X86] Enable isel to use the PAUSE instruction even when SSE2 is disabled
Summary:
On older processors this instruction encoding is treated as a NOP.

MSVC doesn't disable intrinsics based on features the way clang/gcc does. Because the PAUSE instruction encoding doesn't crash older processors, some software out there uses these intrinsics without checking for SSE2.

This change also seems to also be consistent with gcc behavior.

Fixes PR34079

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36361

llvm-svn: 310190
2017-08-05 23:34:44 +00:00
Reid Kleckner 61776e6910 Commit the local change I had to make my test pass
llvm-svn: 310153
2017-08-05 00:15:40 +00:00
Reid Kleckner 7662d50d10 [X86] Teach fastisel to select calls to dllimport functions
Summary:
Direct calls to dllimport functions are very common Windows. We should
add them to the -O0 fast path.

Reviewers: rafael

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D36197

llvm-svn: 310152
2017-08-05 00:10:43 +00:00