Commit Graph

47567 Commits

Author SHA1 Message Date
Craig Topper f94ed26ea9 [X86] Directly legalize v16i16/v8i16 vselect to vXi8 vselect to use VPBLENDVB
The intrinsic legalization for masked truncate uses ISD::TRUNCATE which can be constant folded by getNode. This prevents getVectorMaskingNode from seeing the ISD::TRUNCATE special case where it should emit X86ISD::SELECT instead of ISD::VSELECT. This causes a vselect with a v16i1 or v8i1 condition to be emitted during vector legalization. but vector legalization doesn't revisit nodes it creates. DAG combine will then promote this condition to match the result type. Then op legalization will try to legalize it, but the custom lowering hook returned SDValue(). But op legalization doesn't have an Expand for VSELECT because it expects vector legalization to have taken care of it. So the operation sticks around and fails in isel.

This patch adds a custom legalization hook to morph it to a vXi8 vselect instead.

This also simplifies the normal vXi16 vselect handling because vector legalization was normally expanding to AND/ANDN/OR and DAG combine was turning that into VBLENDVB. So we can skip a step by doing it directly.

Fixes PR37499

Differential Revision: https://reviews.llvm.org/D47025

llvm-svn: 332743
2018-05-18 17:48:06 +00:00
Simon Pilgrim 007b50fd35 [X86][BtVer2] Improve simulation of (V)PINSR values
Include the 6cy delay transferring from the GPR to FPU.

llvm-svn: 332737
2018-05-18 17:09:41 +00:00
Simon Pilgrim 3ecb0b80f6 [X86][BtVer2] Partial vector stores (inc MMX) have a 2cy latency
llvm-svn: 332722
2018-05-18 14:22:22 +00:00
Simon Pilgrim c4b8d367a8 [X86][SSE] Ensure vector partial load/stores use the WriteVecLoad/WriteVecStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc.

Fixes BtVer2/SLM which have different behaviours for GPR stores.

llvm-svn: 332718
2018-05-18 14:08:01 +00:00
Simon Pilgrim e819199e2a [X86][AVX] VEXTRACTF128mr store is a WriteFStoreX not WriteFStore
llvm-svn: 332715
2018-05-18 13:17:51 +00:00
Simon Pilgrim d749b321b2 [X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc.

Fixes BtVer2/SLM which have different behaviours for GPR stores.

llvm-svn: 332714
2018-05-18 13:13:59 +00:00
Clement Courbet 8892c7db08 [ExynosM3] Fix scheduling info.
Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 332713
2018-05-18 13:10:41 +00:00
Simon Pilgrim a325dffd36 [X86][ZnVer1] Cleanup more single match instregexs
llvm-svn: 332712
2018-05-18 13:05:26 +00:00
Jonas Paulsson b51ccaf4d4 [SystemZ] Fix commit message of previous commit.
Sorry, the commit comment for r332703 is completely broken.
My mind slipped - the right description would be:

In SystemZDAGToDAGISel::Select(), in the handling for SELECT_CCMASK:

Check if UpdateNodeOperands() returns a different SDNode and in that
case call ReplaceNode.

Review: Ulrich Weigand.
llvm-svn: 332706
2018-05-18 12:07:16 +00:00
Alexander Ivchenko 5c54742da4 [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part)
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.

Comes with a clang patch (D46881).

Patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D46882

llvm-svn: 332705
2018-05-18 11:58:25 +00:00
Jonas Paulsson de54c058a6 [SystemZ] Fold AHIMux in foldMemoryOperandImpl.
AHIMux can be folded the same way as AHI.

Review: Ulrich Weigand
llvm-svn: 332703
2018-05-18 11:54:04 +00:00
Shiva Chen 6e07dfb148 [RISCV] Add WasForced parameter to MCAsmBackend::fixupNeedsRelaxationAdvanced
For RISCV branch instructions, we need to preserve relocation types when linker
relaxation enabled, so then linker could modify offset when the branch offsets
changed.

We preserve relocation types by define shouldForceRelocation.
IsResolved return by evaluateFixup will always false when shouldForceRelocation
return true. It will make RISCV MC Branch Relaxation always relax 16-bit
branches to 32-bit form, even if the symbol actually could be resolved.

To avoid 16-bit branches always relax to 32-bit form when linker relaxation
enabled, we add a new parameter WasForced to indicate that the symbol actually
couldn't be resolved and not forced by shouldForceRelocation return true.

RISCVAsmBackend::fixupNeedsRelaxationAdvanced could relax branches with
unresolved symbols by (!IsResolved && !WasForced).

RISCV MC Branch Relaxation is needed because RISCV could perform 32-bit
to 16-bit transformation in MC layer.

Differential Revision: https://reviews.llvm.org/D46350

llvm-svn: 332696
2018-05-18 06:42:21 +00:00
Eric Christopher 68f2218e1e Revert "Temporarily revert "[DEBUG] Initial adaptation of NVPTX target for debug info emission.""
This reapplies commits: r330271, r330592, r330779.

    [DEBUG] Initial adaptation of NVPTX target for debug info emission.

    Summary:
    Patch adds initial emission of the debug info for NVPTX target.
    Currently, only .file and .loc directives are emitted, everything else is
    commented out to not break the compilation of Cuda.

llvm-svn: 332689
2018-05-18 03:13:08 +00:00
Eli Friedman d268bf0a4d Fix unused lambda capture.
llvm-svn: 332686
2018-05-18 02:11:25 +00:00
Eli Friedman 4081a57af7 [MachineOutliner] Count savings from outlining in bytes.
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.

Differential Revision: https://reviews.llvm.org/D46921

llvm-svn: 332685
2018-05-18 01:52:16 +00:00
Keno Fischer e07153a859 [X86DomainReassignment] Don't compare stack-allocated values by address
Summary:
The Closure allocated in the main loop is allocated on the stack. However,
later in the code its address is taken (and used for comparisons). This
obviously doesn't work. In fact, the Closure will get the same stack address
during every loop iteration, rendering the check that intended to identify
Closure conflicts entirely ineffective. Fix this bug by giving every Closure
a unique ID and using that for comparison. Alternatively, we could heap
allocate the closure object.

Fixes PR37396
Fixes JuliaLang/julia#27032

Reviewers: craig.topper, guyblank

Reviewed By: craig.topper

Subscribers: vchuravy, llvm-commits

Differential Revision: https://reviews.llvm.org/D46800

llvm-svn: 332682
2018-05-18 01:03:01 +00:00
Keno Fischer 66ab99c3ee [X86DomainReassignment] Don't delete IMPLICIT_DEF nodes
Summary:
We cannot simply delete IMPLICIT_DEF nodes. They may be used
later (e.g. by a PHI) and deleting them will cause later passes (e.g.
LiveVariables) to crash. However, it seems fine to ignore them for
purposes of the domain reassignment (as we do with PHI).

Fixes PR37430
Fixes JuliaLang/julia#27080

Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D46797

llvm-svn: 332680
2018-05-18 00:40:52 +00:00
Changpeng Fang 860d460063 AMDGPU/SI: Don't promote alloca to vector for atomic load/store
Summary:
  Don't promote alloca to vector for atomic load/store

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D46085

llvm-svn: 332673
2018-05-17 21:49:44 +00:00
Peter Collingbourne 0d8fa1b6fd ARC, Nios2: Silence build warnings. NFCI.
llvm-svn: 332663
2018-05-17 20:46:01 +00:00
Sameer AbuAsal 1dc0a8fb18 [RISCV] Separate base from offset in lowerGlobalAddress
Summary:
When lowering global address, lower the base as a TargetGlobal first then
 create an SDNode for the offset separately and chain it to the address calculation

 This optimization will create a DAG where the base address of a global access will
 be reused between different access. The offset can later be folded into the immediate
 part of the memory access instruction.

  With this optimization we generate:

    lui a0, %hi(s)
    addi a0, a0, %lo(s) ; shared base address.

    addi a1, zero, 20 ; 2 instructions per access.
    sw a1, 44(a0)

    addi a1, zero, 10
    sw a1, 8(a0)

    addi a1, zero, 30
    sw a1, 80(a0)

    Instead of:

    lui a0, %hi(s+44) ; 3 instructions per access.
    addi a1, zero, 20
    sw a1, %lo(s+44)(a0)

    lui a0, %hi(s+8)
    addi a1, zero, 10
    sw a1, %lo(s+8)(a0)

    lui a0, %hi(s+80)
    addi a1, zero, 30
    sw a1, %lo(s+80)(a0)

    Which will save one instruction per access.

Reviewers: asb, apazos

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, apazos, asb, llvm-commits

Differential Revision: https://reviews.llvm.org/D46989

llvm-svn: 332641
2018-05-17 18:14:53 +00:00
Mandeep Singh Grang ef0ebf2806 [RISCV] Implement MC layer support for the tail pseudoinstruction
Summary:
This patch implements MC support for tail psuedo instruction.
A follow-up patch implements the codegen support as well as handling of the indirect tail pseudo instruction.

Reviewers: asb, apazos

Reviewed By: asb

Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, llvm-commits

Differential Revision: https://reviews.llvm.org/D46221

llvm-svn: 332634
2018-05-17 17:31:27 +00:00
Simon Pilgrim 2782a19fad [X86] Split WriteCMOV + WriteCMOV2 scheduler classes
Handle SNB+ targets which treat CMOVA/CMOVBE specially due to partial EFLAGS handling.

llvm-svn: 332626
2018-05-17 16:47:30 +00:00
Changpeng Fang 391bcf8893 AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops.
Summary:
  The current StructurizeCFG pass only works for CFG with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks
to one exit block for the Structurizer to work. However, infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working.

In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops.
This will make CFG with infinite loops be structurized.

Reviewer:
  nhaehnle

Differential Revision:
  https://reviews.llvm.org/D46340

llvm-svn: 332625
2018-05-17 16:45:01 +00:00
Petar Jovanovic daf5169398 [mips] Add support for Global INValidate ASE
This includes

  Instructions: ginvi, ginvt,

  Assembler directives: .set ginv, .set noginv, .module ginv, .module noginv

  Attribute: ginv

  .MIPS.abiflags: GINV (0x20000)

Patch by Vladimir Stefanovic.

Differential Revision: https://reviews.llvm.org/D46268

llvm-svn: 332624
2018-05-17 16:30:32 +00:00
Alex Bradbury 6a53023b4e [RISCV] Set isReMaterializable on ADDI and LUI instructions
The isReMaterlizable flag is somewhat confusing, unlike most other instruction 
flags it is currently interpreted as a hint (mightBeRematerializable would be 
a better name). While LUI is always rematerialisable, for an instruction like 
ADDI it depends on its operands. TargetInstrInfo::isTriviallyReMaterializable 
will call TargetInstrInfo::isReallyTriviallyReMaterializable, which in turn 
calls TargetInstrInfo::isReallyTriviallyReMaterializableGeneric. We rely on 
the logic in the latter to pick out instances of ADDI that really are 
rematerializable.

The isReMaterializable flag does make a difference on a variety of test 
programs. The recently committed remat.ll test case demonstrates how stack 
usage is reduce and a unnecessary lw/sw can be removed. Stack usage in the 
Proc0 function in dhrystone reduces from 192 bytes to 112 bytes.

For the sake of completeness, this patch also implements 
RISCVRegisterInfo::isConstantPhysReg. Although this is called from a number of 
places, it doesn't seem to result in different codegen for any programs I've 
thrown at it. However, it is called in the rematerialisation codepath and it 
seems sensible to implement something correct here.

Differential Revision: https://reviews.llvm.org/D46182

llvm-svn: 332617
2018-05-17 15:51:37 +00:00
Simon Pilgrim b5741f5c3d [X86][BtVer2] ADC/SBB take 2cy on an ALU pipe, not 1cy like ADD/SUB
llvm-svn: 332616
2018-05-17 15:43:23 +00:00
Alex Bradbury 5e41fc83c5 [Hexagon] Use addAliasForDirective for data directives
Data directives such as .word, .half, .hword are currently parsed using 
HexagonAsmParser::ParseDirectiveValue which effectively duplicates logic from 
AsmParser::parseDirectiveValue. This patch deletes that duplicated logic in 
favour of using addAliasForDirective.

Differential Revision: https://reviews.llvm.org/D46999

llvm-svn: 332607
2018-05-17 13:21:18 +00:00
Simon Pilgrim 0c0336e003 [X86] Split WriteADC/WriteADCRMW scheduler classes
For integer ALU instructions taking eflags as an input (ADC/SBB/ADCX/ADOX)

llvm-svn: 332605
2018-05-17 12:43:42 +00:00
Jonas Paulsson caafed5570 [SystemZ] Commenting (NFC)
Some minor commenting in scheduler files.

Review: Ulrich Weigand
llvm-svn: 332599
2018-05-17 11:53:56 +00:00
Simon Pilgrim ceb4933dc1 [X86][SNB] Minor scheduler cleanup
Merge 2 instregex and explain the VMOVDQArr/MOVDQArr difference

llvm-svn: 332591
2018-05-17 10:36:29 +00:00
Sander de Smalen 75cfa34156 [AArch64][SVE] Asm: Support for structured ST2, ST3 and ST4 (scalar+scalar) store instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46680

llvm-svn: 332584
2018-05-17 09:05:41 +00:00
Alex Bradbury cea6db0480 [RISCV] Add support for .half, .hword, .word, .dword directives
These directives are recognised by gas. Support is added through the use of 
addAliasForDirective.

Also match RISC-V gcc in preferring .half and .word for 16-bit and 32-bit data 
directives.

llvm-svn: 332574
2018-05-17 05:58:08 +00:00
Craig Topper a2c5264718 [X86] Add OptForSize to a couple load folding patterns. Remove some bad FIXME comments.
The FIXME comments were about preventing load folding to avoid a partial xmm update. But these instructions use GPR as input when the load isn't folded. This won't help prevent a partial xmm update.

llvm-svn: 332573
2018-05-17 05:41:11 +00:00
Dan Gohman aef674102c [WebAssembly] Fix the opcode number for i64.load16_u.
Fixes PR37488.

llvm-svn: 332561
2018-05-17 00:14:13 +00:00
Simon Pilgrim 820433f533 [X86][SNB] Remove unnecessary CVT InstRW overrides
llvm-svn: 332536
2018-05-16 22:14:29 +00:00
Eli Friedman ddbf6d6514 [MachineOutliner] Don't outline instructions that modify SP.
This breaks the code which saves and restores LR, so we can't outline
without doing something more complicated for stack adjustment.

Found by inspection; we get lucky in most cases because getMemOpInfo
only handles STRWpost, not any other pre/post-increment forms. But it
hits a couple of artificial testcases in the tree.

Differential Revision: https://reviews.llvm.org/D46920

llvm-svn: 332529
2018-05-16 21:20:16 +00:00
Krzysztof Parzyszek f18009dbc6 [Hexagon] Fix the order of operands when selecting QCAT
llvm-svn: 332526
2018-05-16 21:02:43 +00:00
Krzysztof Parzyszek e8a0ae7346 [Hexagon] Mark HVX vector predicate bitwise ops as legal, add patterns
llvm-svn: 332525
2018-05-16 21:00:24 +00:00
Simon Pilgrim 2e0f6c9b21 [X86][SSE] Reduce instruction/register usages for v4i32 vector shifts (PR37441)
As suggested by Fabian on PR37441, use PSHUFLW to extend shift amount types for use with PSRAD/PSRLD to reduce register pressure.

Some of this ideally would be done by combineTargetShuffle but its tricky to do as most of the shuffles are sharing inputs.

Differential Revision: https://reviews.llvm.org/D46959

llvm-svn: 332524
2018-05-16 20:52:52 +00:00
Konstantin Zhuravlyov c72ece6c2c AMDGPU : Recalculate SGPRs when trap handler is supported
Differential Revision: https://reviews.llvm.org/D29911

llvm-svn: 332523
2018-05-16 20:47:48 +00:00
Eric Christopher fb923d28a9 Fix up a misleading format warning.
llvm-svn: 332521
2018-05-16 20:33:59 +00:00
Eli Friedman 02709bcb78 [MachineOutliner] Don't save/restore LR for tail calls.
The cost computation assumes we do this correctly, but the actual
lowering was wrong.

Differential Revision: https://reviews.llvm.org/D46923

llvm-svn: 332514
2018-05-16 19:49:01 +00:00
Simon Pilgrim d5d77dcb46 [X86] Fix typo in instregex for CVTSI642SDrr
llvm-svn: 332510
2018-05-16 18:31:17 +00:00
Craig Topper 67aa726f8c [X86][AVX512DQ] Use packed instructions for scalar FP<->i64 conversions on 32-bit targets
As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead.

Fixes PR3163.

Differential Revision: https://reviews.llvm.org/D43441

llvm-svn: 332498
2018-05-16 17:40:07 +00:00
Tony Tye 43259df44a [AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution.
No longer require the queue pointer to be passed in in fixed SGPRs.

Differential Revision: https://reviews.llvm.org/D46769

llvm-svn: 332485
2018-05-16 16:19:34 +00:00
Sander de Smalen 22176a2242 [AArch64][SVE] Improve diagnostics for vectors with incorrect element-size.
For regular SVE vector operands, this patch introduces a more
sensible diagnostic when the vector has a wrong suffix (e.g. z0.s vs z0.b).

For example:
  add z0.s, z1.s, z2.b      -> invalid element width
               ^_____^
               mismatch

For the vector-with-shift/extend (e.g. z0.s, uxtw #2) this patch takes
a slightly different approach and instead returns a 'invalid operand'
if the element size is not as expected. This is because the diagnostics
are more specificied to suggest using the right shift/extend suffix. This
is a trade-off not to introduce more operand classes and still provide
useful diagnostics for LD1 and PRF instructions.

For example:
  ld1w z1.s, p0/z, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw)'
  ld1w z1.d, p0/z, [x0, z0.s] -> invalid operand
          ^________________^
               mismatch

For gather prefetches, both 'z0.s' and 'z0.d' would be allowed:
  prfw #0, p0, [x0, z0.s]   -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw) #2'
  prfw #0, p0, [x0, z0.d]   -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'

Without this change, the diagnostic would unnecessarily suggest a
different element size:
  prfw #0, p0, [x0, z0.s]   -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'

Reviewers: SjoerdMeijer, aemerson, fhahn, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46688

llvm-svn: 332483
2018-05-16 15:45:17 +00:00
Sirish Pande cabe50a308 [AArch64] Gangup loads and stores for pairing.
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.

Differential Revision https://reviews.llvm.org/D46477

llvm-svn: 332482
2018-05-16 15:36:52 +00:00
Sander de Smalen bbc4e9a4e3 [AArch64][SVE] Asm: Support for gather PRF prefetch instructions
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46686

llvm-svn: 332472
2018-05-16 14:16:01 +00:00
Simon Dardis 8ea0ecdedc [mips] Simplify some of the predicate scopes for (negative) multiply add/sub instructions (NFCI)
llvm-svn: 332464
2018-05-16 12:44:27 +00:00
Simon Dardis 6c35f8f445 [mips] Join existing scopes for DecoderNamespace (NFCI)
llvm-svn: 332462
2018-05-16 12:37:04 +00:00