Commit Graph

15167 Commits

Author SHA1 Message Date
Nirav Dave d1b3f09faa [X86][DAG] Switch X86 Target to post-legalized store merge
Move store merge to happen after intrinsic lowering to allow lowered
stores to be merged.

There are some regressions in MergeConsecutiveStores due to missing
insert_subvector handling; these are addressed in a follow-up patch.

Reviewers: craig.topper, efriedma, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34559

llvm-svn: 310710
2017-08-11 13:21:35 +00:00
Simon Pilgrim b59c2d9d73 [CostModel][X86] Add SSE2 two-src shuffle costs
llvm-svn: 310654
2017-08-10 19:32:35 +00:00
Simon Pilgrim 7354531b82 [CostModel][X86] Add AVX1 two-src shuffle costs
llvm-svn: 310650
2017-08-10 19:02:51 +00:00
Simon Pilgrim ac2e50a4ca [CostModel][X86] Add AVX2 two-src shuffle costs
llvm-svn: 310645
2017-08-10 18:29:34 +00:00
Simon Pilgrim 702e5fa391 [CostModel][X86] Improve single src shuffle costs
Add missing SK_PermuteSingleSrc costs for AVX2 targets and earlier, and also add some of the simpler SK_PermuteTwoSrc costs to support splitting of SK_PermuteSingleSrc shuffles.

llvm-svn: 310632
2017-08-10 17:27:20 +00:00
Krzysztof Parzyszek bea30c6286 Add "Restored" flag to CalleeSavedInfo
The liveness-tracking code assumes that the registers that were saved
in the function's prolog are live outside of the function. Specifically,
it assumes that saved registers are also live-on-exit from the function.
This isn't always the case, as illustrated by the LR register on ARM.

Differential Revision: https://reviews.llvm.org/D36160

llvm-svn: 310619
2017-08-10 16:17:32 +00:00
Nirav Dave 926e2d39bf [X86] Keep dependencies when constructing loads in combineStore
Summary:
Preserve chain dependencies between old and new loads constructed to
prevent loads from reordering below later stores.

Fixes PR34088.

Reviewers: craig.topper, spatel, RKSimon, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36528

llvm-svn: 310604
2017-08-10 15:12:32 +00:00
Elad Cohen 22ba97a0a6 [SelectionDAG] When scalarizing vselect, don't assert on
a legal cond operand.

When scalarizing the result of a vselect, the legalizer currently expects
to have already scalarized the operands. While this is true for the true/false
operands (which have the same type as the result), it is not the case for the
condition operand. On X86 AVX512, v1i1 is legal - this leads to operations such
as '<N x type> vselect <N x i1>, <N x type>, <N x type>' where <N x type> is
illegal, which hit an assertion during scalarization.

The handling is similar to r205625.
This also exposes the fact that (v1i1 extract_subvector) should be legal
and selectable on AVX512 - we do this by custom lowering to vector_extract_elt.
This still leaves us in some cases with redundant DAG nodes, which will be
combined in a separate, soon-to-come patch.

This fixes pr33349.

Differential revision: https://reviews.llvm.org/D36511

llvm-svn: 310552
2017-08-10 07:44:23 +00:00
Coby Tayree 7683ca04eb [X86][Asm] Allow negative immediate to appear before bracketed expression
Currently, only a non-negative immediate is allowed prior to a bracketed expression (memory reference).
MASM / GAS have no problem coping with the negative side of the real line (negative immediates), so we should be able to as well.

Differential Revision: https://reviews.llvm.org/D36229

llvm-svn: 310528
2017-08-09 21:49:17 +00:00
Guy Blank 7f60c991ae [X86][AVX512] Choose correct registers in vpbroadcastb/w
Fixes the vpbroadcastb/w instructions which use GPRs as source operands to use the correct registers:
the full GPR should be used, not the subregister as was the case before this patch.

Fixes pr33795

Differential Revision:
https://reviews.llvm.org/D36479

llvm-svn: 310498
2017-08-09 17:21:01 +00:00
Coby Tayree 3638655325 [X86][Asm] Allow far jmp/call to be picked when using explicit FWORD size specifier
Currently, a far jmp/call which utilizes a 48-bit memory operand could only be invoked via the 'lcall/ljmp' mnemonics (Intel style).
This patch aligns those variants with the formal Intel spec.

Differential Revision: https://reviews.llvm.org/D35846

llvm-svn: 310485
2017-08-09 15:34:55 +00:00
Haojian Wu c1cae0bd64 Fix -Wpessimizing-move warning.
llvm-svn: 310469
2017-08-09 12:49:20 +00:00
Coby Tayree 3bfb365f52 [AsmParser][AVX512] Enhance OpMask/Zero/Merge syntax check robustness
Adopt a stricter approach regarding which marks should/can appear after a destination register when operating on an AVX512 platform.

Differential Revision: https://reviews.llvm.org/D35785

llvm-svn: 310467
2017-08-09 12:32:05 +00:00
Craig Topper b049158a55 [X86] Add the rest of the ADC and SBB instructions to isDefConvertible.
I don't know if this really affects anything. It just seemed odd that we had all of the ADD/SUB/AND/OR/XOR instructions but not these.

llvm-svn: 310447
2017-08-09 06:17:49 +00:00
Quentin Colombet 8dd90fb54b Revert "[GlobalISel] Remove the GISelAccessor API."
This reverts commit r310115.

It causes a linker failure for one of the AArch64 unittests on one
of the Linux bots:
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/3429

: && /home/fedora/gcc/install/gcc-7.1.0/bin/g++   -fPIC
-fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W
-Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -pedantic -Wno-long-long
-Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -O2
-L/home/fedora/gcc/install/gcc-7.1.0/lib64 -Wl,-allow-shlib-undefined
-Wl,-O3 -Wl,--gc-sections
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o  -o
unittests/Target/AArch64/AArch64Tests
lib/libLLVMAArch64CodeGen.so.6.0.0svn lib/libLLVMAArch64Desc.so.6.0.0svn
lib/libLLVMAArch64Info.so.6.0.0svn lib/libLLVMCodeGen.so.6.0.0svn
lib/libLLVMCore.so.6.0.0svn lib/libLLVMMC.so.6.0.0svn
lib/libLLVMMIRParser.so.6.0.0svn lib/libLLVMSelectionDAG.so.6.0.0svn
lib/libLLVMTarget.so.6.0.0svn lib/libLLVMSupport.so.6.0.0svn -lpthread
lib/libgtest_main.so.6.0.0svn lib/libgtest.so.6.0.0svn -lpthread
-Wl,-rpath,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib
&& :
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x0):
undefined reference to `vtable for llvm::LegalizerInfo'
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x8):
undefined reference to `vtable for llvm::RegisterBankInfo'

The particularity of this bot is that it is built with
BUILD_SHARED_LIBS=ON

However, I have not been able to reproduce the problem so far.
Reverting to unblock the bot.

llvm-svn: 310425
2017-08-08 22:22:30 +00:00
Amjad Aboud 6fa6813aec [X86] Improved X86::CMOV to Branch heuristic.
Resolved PR33954.
This patch adds two more constraints that aim to reduce the noisy cases where we convert a CMOV into a branch for a small gain and end up spending more cycles due to overhead.

Differential Revision: https://reviews.llvm.org/D36081

llvm-svn: 310352
2017-08-08 12:17:56 +00:00
Evgeny Stupachenko c675290680 Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720).
The root cause of the revert was fixed - PR33514.

Summary:
The patch makes instruction count the highest priority for
the LSR solution on X86 (previously register count had the highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 310289
2017-08-07 19:56:34 +00:00
Sanjay Patel 807f92b8ff [x86] revert r310208 to investigate test-suite failures (PR34105 / PR34097)
llvm-svn: 310264
2017-08-07 15:47:48 +00:00
Michael Zuckerman 680ac10aa7 [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4).
This patch expands the support of lowerInterleavedStore to the 16 x i8, stride 4 case.

LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4, VF=16), and we plan to include more patterns in the future.

The goal of the patch is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3, each
holding 16 chars:

c0, c1, ..., c16
m0, m1, ..., m16
y0, y1, ..., y16
k0, k1, ..., k16

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
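
As a rough illustration (the function and parameter names below are made up,
not from the patch), the scalar form of this stride-4, VF=16 pattern is the
loop below; the vectorizer turns the four channel vectors into a single
interleaved store that X86InterleavedAccess now lowers with shuffles:

  #include <cstdint>

  void interleave4x16(uint8_t *out, const uint8_t *c, const uint8_t *m,
                      const uint8_t *y, const uint8_t *k) {
    // Store the four 16-char channels interleaved: c0 m0 y0 k0 c1 m1 y1 k1 ...
    for (int i = 0; i < 16; ++i) {
      out[4 * i + 0] = c[i];
      out[4 * i + 1] = m[i];
      out[4 * i + 2] = y[i];
      out[4 * i + 3] = k[i];
    }
  }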

Differential Revision: https://reviews.llvm.org/D35829

llvm-svn: 310252
2017-08-07 13:22:39 +00:00
Sanjay Patel a923c2ee95 [x86] use more shift or LEA for select-of-constants
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d

For this patch, I'm enhancing an existing x86 transform that uses fake multiplies 
(they always become shl/lea) to avoid cmov or branching. The current code misses 
cases where we have a negative constant and a positive constant, so this is just 
trying to plug that hole.
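
For illustration, a minimal sketch of such a case (the constants here are
arbitrary, not taken from the patch): one negative and one positive constant
whose difference is a power of two, so the cmov can become a shift and an add:

  // b ? -3 : 1 can be computed as 1 - (b << 2), avoiding cmov or a branch.
  int pick(bool b) {
    return b ? -3 : 1;
  }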

The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start 
with a select in IR, create a select DAG node, convert it into a sext, convert it 
back into a select, and then lower it to sext machine code.

Some notes about the test diffs:

1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. I 
   think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on 
   a different reg? That's a post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if 
   that's a regression, but I think those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs.
5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.

Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.

Differential Revision: https://reviews.llvm.org/D35340

llvm-svn: 310208
2017-08-06 16:27:07 +00:00
Simon Pilgrim 17e290f1d3 [X86] Add comment to match closing Defs = [FPSW]. NFCI.
llvm-svn: 310202
2017-08-06 13:21:09 +00:00
Craig Topper 6bfa2aee78 [X86] Enable isel to use the PAUSE instruction even when SSE2 is disabled
Summary:
On older processors this instruction encoding is treated as a NOP.

MSVC doesn't disable intrinsics based on features the way clang/gcc does. Because the PAUSE instruction encoding doesn't crash older processors, some software out there uses these intrinsics without checking for SSE2.
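
A sketch of the kind of code in the wild this is about (names are
illustrative): a spin-wait loop that calls _mm_pause() without guarding on
SSE2, relying on the encoding being a NOP on older processors:

  #include <atomic>
  #include <immintrin.h>

  void spin_until_ready(const std::atomic<bool> &ready) {
    while (!ready.load(std::memory_order_acquire))
      _mm_pause();  // 'rep; nop' encoding - harmless on pre-SSE2 CPUs
  }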

This change also seems to be consistent with gcc behavior.

Fixes PR34079

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36361

llvm-svn: 310190
2017-08-05 23:34:44 +00:00
Reid Kleckner 7662d50d10 [X86] Teach fastisel to select calls to dllimport functions
Summary:
Direct calls to dllimport functions are very common on Windows. We should
add them to the -O0 fast path.
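
A minimal sketch of the case being added to the fast path (the function name
is hypothetical, Windows-only code):

  // A direct call to a dllimport function is really an indirect call through
  // the __imp_ pointer; fast-isel can now select such calls at -O0.
  __declspec(dllimport) int ExternalApi(int);

  int caller(int x) {
    return ExternalApi(x);
  }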

Reviewers: rafael

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D36197

llvm-svn: 310152
2017-08-05 00:10:43 +00:00
Quentin Colombet c046208c52 [GlobalISel] Remove the GISelAccessor API.
Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
so we can get rid of this mechanism altogether.

NFC.

llvm-svn: 310115
2017-08-04 20:15:46 +00:00
Quentin Colombet 250e050a50 [GlobalISel] Make GlobalISel a non-optional library.
With this change, the GlobalISel library is always built. In
particular, it is no longer possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable.

llvm-svn: 309990
2017-08-03 21:52:25 +00:00
Dinar Temirbulatov a0beedef1c [X86] SET0 to use XMM registers where possible PR26018 PR32862
Differential Revision: https://reviews.llvm.org/D35965

llvm-svn: 309926
2017-08-03 08:50:18 +00:00
Rafael Espindola 79e238afee Delete Default and JITDefault code models
IMHO it is an antipattern to have an enum value that is Default.

At any given piece of code it is not clear if we have to handle
Default or if it has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.

This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
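
A small sketch of the design change, using std::optional rather than the
actual LLVM types (illustrative only, not the real API):

  #include <optional>

  enum class CodeModel { Small, Kernel, Medium, Large };  // no Default member

  // The mapping from "unspecified" to a concrete value happens in exactly
  // one place, chosen by the target, instead of at every use site.
  CodeModel resolveCodeModel(std::optional<CodeModel> Requested) {
    return Requested.value_or(CodeModel::Small);
  }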

llvm-svn: 309911
2017-08-03 02:16:21 +00:00
Simon Pilgrim 486072d3d6 [X86][SSE] Added missing vector logic intrinsic schedules
Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them.

llvm-svn: 309715
2017-08-01 17:51:20 +00:00
Craig Topper 2a5bba7325 [X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask
Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set.
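
A sketch of the shape of code this targets (the mask width is an arbitrary
example): a contiguous low-bit mask too wide for the 32-bit immediate form:

  #include <cstdint>

  uint64_t low42(uint64_t x) {
    // 2^42 - 1 does not fit in an imm32, but it is a contiguous low-bit
    // mask, so BEXTR/BEXTRI can extract bits [0, 41] instead of loading a
    // 64-bit constant.
    return x & ((1ULL << 42) - 1);
  }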

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36129

llvm-svn: 309706
2017-08-01 17:18:14 +00:00
Simon Pilgrim 3f24ff6130 [X86][SSE] Added missing PACKSS/PACKUS intrinsic schedules
Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

Checked on Agner that these actually match the UNPACK schedules, but it's better to include a separate class.

llvm-svn: 309701
2017-08-01 16:47:48 +00:00
Simon Pilgrim 810677eba2 [X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedules
llvm-svn: 309699
2017-08-01 16:18:25 +00:00
Craig Topper 2462a713ae [AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32.
We were already using the 32-bit element opcode if BWI isn't enabled, but there's no reason to change the opcode if we have BWI. We will still use the 8/16-bit opcodes for masked stores, though.

This allows us to use the aligned opcode when we can, which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need.

This is a slight inconsistency with loads, which default to 64-bit element opcodes. I'll probably rectify that in a future patch.

Differential Revision: https://reviews.llvm.org/D35978

llvm-svn: 309693
2017-08-01 15:31:24 +00:00
Craig Topper 410d252f5b [AVX-512] Add unmasked subvector inserts and extract to the execution domain tables.
llvm-svn: 309632
2017-07-31 22:07:29 +00:00
Konstantin Belochapka b77d0a5cf1 [X86][MMX] Added custom lowering action for MMX SELECT (PR30418)
Fix for pr30418 - error in backend: Cannot select: t17: x86mmx = select_cc t2, Constant:i64<0>, t7, t8, seteq:ch
Differential Revision: https://reviews.llvm.org/D34661

llvm-svn: 309614
2017-07-31 20:11:49 +00:00
Craig Topper cb0e74975a [AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer vmovdqa64/vmovdqu64 instead.
These were taking priority over the aligned load instructions since there is no vmovdqa8/16. I don't think there is really a difference between aligned and unaligned on newer CPUs, so I don't think it matters which instructions we use.

But with this change we reduce the size of the isel table a little, and we allow the aligned information to pass through to the evex->vex pass and produce the same output as avx/avx2 in some cases.

I also generally dislike patterns rooted in a bitcast, which these were.

Differential Revision: https://reviews.llvm.org/D35977

llvm-svn: 309589
2017-07-31 17:35:44 +00:00
Simon Pilgrim 7b89ab5887 Strip trailing whitespace. NFCI.
llvm-svn: 309584
2017-07-31 17:09:27 +00:00
Simon Pilgrim 77bdbc197e Fix typo in comment.
llvm-svn: 309583
2017-07-31 17:06:55 +00:00
Amaury Sechet 4358c5217d Do not recombine FMA when that is not needed.
Summary: As per title. This creates useless recombines.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33848

llvm-svn: 309578
2017-07-31 16:56:25 +00:00
Alexey Bataev 3e9b3eb91d [Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC.
llvm-svn: 309563
2017-07-31 14:19:32 +00:00
Guy Blank b169d56dc3 [X86][AVX512] Add masked MOVS[S|D] patterns
Added patterns to recognize that an AND with 1 on the mask of a scalar masked
move is not needed, since only the lower bit is relevant for the
instruction.

Differential Revision:
https://reviews.llvm.org/D35897

llvm-svn: 309546
2017-07-31 08:26:14 +00:00
Craig Topper 97e9fa7954 [X86] Add pattern to use bzhi for 64-bit 'and' with a mask when there is a load involved.
We already had a pattern without a load, but with a load we were falling back to a regular 'and' due to pattern complexity priority.
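
A sketch of the load-plus-mask shape the new pattern covers (the mask width is
an arbitrary contiguous low-bit example):

  #include <cstdint>

  uint64_t low45(const uint64_t *p) {
    return *p & ((1ULL << 45) - 1);  // 2^45 - 1: BZHI can clear bits 45..63
  }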

llvm-svn: 309535
2017-07-31 05:55:54 +00:00
Coby Tayree 48d67cdbb4 [x86][inline-asm][ms-compat] legalize the use of "jc/jz short <op>"
MS ignores the keyword "short" when used after a jc/jz instruction; LLVM ought to do the same.
Test: D35893

Differential Revision: https://reviews.llvm.org/D35892

llvm-svn: 309509
2017-07-30 11:12:47 +00:00
Craig Topper 951f0ca104 [X86] Add addsub intrinsics to the intrinsic lowering table so we have a single set of isel patterns.
llvm-svn: 309502
2017-07-30 06:02:59 +00:00
Simon Pilgrim 718cb0ea62 [SelectionDAG][X86] CombineBT - more aggressively determine demanded bits
This patch is in 2 parts:

1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.

2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.
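
As a rough example of the second case (illustrative, not from the patch): a
bit test whose index arrives through an extension only demands the low bits of
that index, so the extend can be looked through:

  #include <cstdint>

  // Typically selected as a BT on x86; a 32-bit BT only uses the low five
  // bits of the index, which live entirely in the non-extended portion of n.
  bool test_bit(uint32_t x, uint8_t n) {
    return (x >> n) & 1;
  }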

Differential Revision: https://reviews.llvm.org/D35896

llvm-svn: 309486
2017-07-29 14:50:25 +00:00
Jessica Paquette d87f54493d [MachineOutliner] NFC: Change IsTailCall to a call class + frame class
This commit

- Removes IsTailCall and replaces it with a target-defined unsigned
- Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
- Adds a call class + frame class classification to OutlinedFunction and Candidate respectively

This accomplishes a couple things.

Firstly, we don't need the notion of *tail call* in the general outlining algorithm.

Secondly, we now can have different "outlining classes" for each candidate within a set of candidates.
This will make it easy to add new ways to outline sequences for certain targets and dynamically choose
an appropriate cost model for a sequence depending on the context that that sequence lives in.

Ultimately, this should get us closer to being able to do something like, say, avoiding saving the link
register when outlining AArch64 instructions.
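
A hypothetical sketch of the classification idea (the member names below are
illustrative, not the actual LLVM data structures):

  // The single IsTailCall bit becomes a target-defined unsigned, so each
  // candidate and each outlined function can select its own cost model.
  struct Candidate {
    unsigned CallClass = 0;   // how this particular call site is emitted
  };
  struct OutlinedFunction {
    unsigned FrameClass = 0;  // how the outlined function sets up its frame
  };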

llvm-svn: 309475
2017-07-29 02:55:46 +00:00
Adrian Prantl 8b9bb534a1 Remove the unused offset from DBG_VALUE (NFC)
Followup to r309426.
rdar://problem/33580047

llvm-svn: 309450
2017-07-28 23:00:45 +00:00
Jessica Paquette 809d708b8a [MachineOutliner] NFC: Split up getOutliningBenefit
This is some more cleanup in preparation for some actual
functional changes. This splits getOutliningBenefit into
two cost functions: getOutliningCallOverhead and
getOutliningFrameOverhead. These functions return the
number of instructions that would be required to call
a specific function and the number of instructions
that would be required to construct a frame for a
specific function. The actual outlining benefit logic
is moved into the outliner, which calls these functions.

The goal of refactoring getOutliningBenefit is to:

- Get us closer to getting rid of the IsTailCall flag

- Further split up "target-specific" things and
"general algorithm" things

llvm-svn: 309356
2017-07-28 03:21:58 +00:00
Reid Kleckner 07a5d4372e [X86] Fix latent bug in sibcall eligibility logic
The X86 tail call eligibility logic was correct when it was written, but
the addition of inalloca and argument copy elision broke its
assumptions. It was assuming that fixed stack objects were immutable.

Currently, we aim to emit a tail call if no arguments have to be
re-arranged in memory. This code would trace the outgoing argument
values back to check if they are loads from an incoming stack object.
If the stack argument is immutable, then we won't need to store it back
to the stack when we tail call.

Fortunately, stack objects track their mutability, so we can just make
the obvious check to fix the bug.
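
For illustration (hypothetical code, not from the patch), the shape of caller
this check traces is one that forwards its incoming stack arguments unchanged:

  int callee(int a, int b, int c, int d, int e, int f, int g);

  // Safe to emit as a tail call (a plain jmp) only if the incoming stack
  // slots holding the stack-passed arguments really are immutable.
  int caller(int a, int b, int c, int d, int e, int f, int g) {
    return callee(a, b, c, d, e, f, g);
  }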

This was http://crbug.com/749826

llvm-svn: 309343
2017-07-28 00:58:35 +00:00
Ahmed Bougacha c890993726 [X86] Don't lie about legality to TLI's demanded bits.
Like r309323, X86 had a typo where it passed the wrong flags to TLO.

Found by inspection; I haven't been able to tickle this into having
observable behavior.  I don't think it does, given that X86 doesn't have
custom demanded bits logic, and the generic logic doesn't have a lot of
exposure to illegal constructs.

llvm-svn: 309325
2017-07-27 21:28:59 +00:00
Dinar Temirbulatov aead31a36f [X86] SET0 to use XMM registers where possible PR26018 PR32862
Differential Revision: https://reviews.llvm.org/D35839

llvm-svn: 309298
2017-07-27 17:47:01 +00:00