Commit Graph

51309 Commits

Author SHA1 Message Date
Oliver Stannard defdb1070f [AArch64] Allow -mattr=tpidr-el[1|2|3]
Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base
register, rather than the default TPIDR_EL0.

Patch by Philip Derrin!

Differential revision: https://reviews.llvm.org/D54685

llvm-svn: 356657
2019-03-21 11:30:17 +00:00
Craig Topper 8d46403b8e [X86] Add CMPXCHG8B feature flag. Set it for all CPUs except i386/i486 including 'generic'. Disable use of CMPXCHG8B when this flag isn't set.
CMPXCHG8B was introduced on i586/pentium generation.

If its not enabled, limit the atomic width to 32 bits so the AtomicExpandPass will expand to lib calls. Unclear if we should be using a different limit for other configs. The default is 1024 and experimentation shows that using an i256 atomic will cause a crash in SelectionDAG.

Differential Revision: https://reviews.llvm.org/D59576

llvm-svn: 356631
2019-03-20 23:35:49 +00:00
Thomas Lively 5098f8589d [WebAssembly][NFC] Fix formatting error from rL356610
llvm-svn: 356622
2019-03-20 22:34:34 +00:00
Tim Renouf 2327c231d6 [AMDGPU] Do not generate spurious PAL metadata
My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata"
accidentally caused a spurious PAL metadata .note record to be emitted
for any AMDGPU output. That caused failures in the lld test
amdgpu-relocs.s. Fixed.

Differential Revision: https://reviews.llvm.org/D59613

Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e
llvm-svn: 356621
2019-03-20 22:02:09 +00:00
Craig Topper 0367553304 [X86] Call lowerShuffleAsBitMask for 512-bit vectors in lowerShuffleAsBlend.
This patch enables the use of lowerShuffleAsBitMask for 512-bit blends before
falling back to move immedate, GPR to k-register, and masked op.

I had to make some changes to support v8i64 when i64 is not a legal type. And to
support floating point types.

This trades a load for the move immediate and GPR move which is higher latency.
But its probably better for register pressure not having to hop through other
register classes. The load+and should play better with LICM and
rematerialization I think.

Differential Revision: https://reviews.llvm.org/D59479

llvm-svn: 356618
2019-03-20 21:30:20 +00:00
Michael Liao bbcb95a64e [AMDGPU] Fix dependency on `BinaryFormat`
Summary: - The linking is broken when this library is built as shared one.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59610

llvm-svn: 356617
2019-03-20 21:22:27 +00:00
Matt Arsenault 2065206a9d AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselect
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.

llvm-svn: 356611
2019-03-20 20:41:34 +00:00
Thomas Lively f6f4f84378 [WebAssembly] Target features section
Summary:
Implements a new target features section in assembly and object files
that records what features are used, required, and disallowed in
WebAssembly objects. The linker uses this information to ensure that
all objects participating in a link are feature-compatible and records
the set of used features in the output binary for use by optimizers
and other tools later in the toolchain.

The "atomics" feature is always required or disallowed to prevent
linking code with stripped atomics into multithreaded binaries. Other
features are marked used if they are enabled globally or on any
function in a module.

Future CLs will add linker flags for ignoring feature compatibility
checks and for specifying the set of allowed features, implement using
the presence of the "atomics" feature to control the type of memory
and segments in the linked binary, and add front-end flags for
relaxing the linkage policy for atomics.

Reviewers: aheejin, sbc100, dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, mgrang, jfb, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59173

llvm-svn: 356610
2019-03-20 20:26:45 +00:00
Michael Liao eea5177d30 [AMDGPU] Fix clamp bit DAG operand
Summary:
- Should use `targetconstant` instead of `constant` operand for clamp
  bit, which is expected as an immediate operand. Under certain
  conditions, such as a common `i1 false` constant is used in other
  place and selected before the instruction with clamp bit, register
  operand may be added instead of immediate one. Use `targetcosntant` to
  enforce that.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59608

llvm-svn: 356608
2019-03-20 20:18:56 +00:00
Pete Couperus b062239d63 [ARC] Add ARCOptAddrMode pass to generate postincrement loads/stores.
Build on newly introduced ARC postincrement loads/stores from r356200.

Patch By Denis Antrushin! <denis@synopsys.com>

Differential Revision: https://reviews.llvm.org/D59409

llvm-svn: 356606
2019-03-20 20:06:21 +00:00
Konstantin Zhuravlyov 88268e3e36 AMDHSA: Fix COMPUTE_PGM_RSRC2.USER_SGPR calculation when parsing ISA assembly
It must match https://llvm.org/docs/AMDGPUUsage.html#initial-kernel-execution-state

Differential Revision: https://reviews.llvm.org/D59570

llvm-svn: 356603
2019-03-20 19:44:47 +00:00
Eli Friedman 638be660d7 [ARM] Eliminate redundant "mov rN, sp" instructions in Thumb1.
This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
to something like "str r0, [sp]".

For regular stack variables, this optimization was already implemented:
we lower loads and stores using frame indexes, which are expanded later.
However, when constructing a call frame for a call with more than four
arguments, the existing optimization doesn't apply.  We need to use
stores which are actually relative to the current value of sp, and don't
have an associated frame index.

This patch adds a special case to handle that construct.  At the DAG
level, this is an ISD::STORE where the address is a CopyFromReg from SP
(plus a small constant offset).

This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
instruction can access SP directly, so the COPY gets eliminated by
existing code.

The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
cleanup: we shouldn't pretend that it can select anything other than
frame indexes.

Differential Revision: https://reviews.llvm.org/D59568

llvm-svn: 356601
2019-03-20 19:40:45 +00:00
Tim Renouf e7bd52f86e [AMDGPU] Added MsgPack format PAL metadata
Summary:
PAL metadata now supports both the old linear reg=val pairs format and
the new MsgPack format.

The MsgPack format uses YAML as its textual representation. On output to
YAML, a mnemonic name is provided for some hardware registers.

Differential Revision: https://reviews.llvm.org/D57028

Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
llvm-svn: 356591
2019-03-20 18:47:21 +00:00
Tim Renouf d737b551e9 [AMDGPU] Factored PAL metadata handling out into its own class
Summary:
This commit introduces a new AMDGPUPALMetadata class that:
* is inside the AMDGPU target;
* keeps an in-memory representation of PAL metadata;
* provides a method to read the frontend-supplied metadata from LLVM IR;
* provides methods for the asm printer to set metadata items;
* provides methods to write the metadata as a binary blob to put in a
  .note record or as an asm directive;
* provides a method to read the metadata as a binary blob from a .note
  record.

Because llvm-readobj cannot call directly into a target, I had to remove
llvm-readobj's ability to dump PAL metadata, pending a resolution to
https://reviews.llvm.org/D52821

Differential Revision: https://reviews.llvm.org/D57027

Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64
llvm-svn: 356582
2019-03-20 17:42:00 +00:00
Dmitry Preobrazhensky 04bd1185ad [AMDGPU][MC] Corrected checks for DS offset0 range
See bug 40889: https://bugs.llvm.org/show_bug.cgi?id=40889

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59313

llvm-svn: 356576
2019-03-20 17:13:58 +00:00
Dmitry Preobrazhensky 137976fae2 [AMDGPU][MC][GFX9] Added support of operands shared_base, shared_limit, private_base, private_limit, pops_exiting_wave_id
See bug 39297: https://bugs.llvm.org/show_bug.cgi?id=39297

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D59290

llvm-svn: 356561
2019-03-20 15:40:52 +00:00
Simon Pilgrim 2acca37a2d [X86] Use getConstantOperandAPInt to detect out-of-range shifts.
llvm-svn: 356549
2019-03-20 11:41:52 +00:00
Andrea Di Biagio 624f5deff4 [X86] Remove X86 specific dag nodes for RDTSC/RDTSCP/RDPMC. NFCI
This patch removes the following dag node opcodes from namespace X86ISD:

RDTSC_DAG,
RDTSCP_DAG,
RDPMC_DAG

The logic that expands RDTSC/RDPMC/XGETBV intrinsics is basically the same. The
only differences are:

    RDTSC/RDTSCP don't implicitly read ECX.
    RDTSCP also implicitly writes ECX.

I moved the common expansion logic into a helper function with the goal to get
rid of code repetition. That helper is now used for the expansion of
RDTSC/RDTSCP/RDPMC/XGETBV intrinsics.

No functional change intended.

Differential Revision: https://reviews.llvm.org/D59547

llvm-svn: 356546
2019-03-20 11:21:15 +00:00
David Stuttard fc2a747345 [AMDGPU] Allow MIMG with no uses in adjustWritemask in isel
Summary:
If an MIMG instruction has managed to get through to adjustWritemask in isel but
has no uses (and doesn't enable TFC) then prevent an assertion by not attempting
to adjust the writemask.

The instruction will be removed anyway.

Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58964

llvm-svn: 356540
2019-03-20 09:29:55 +00:00
Craig Topper 97d104cbee [X86] Re-disable cmpxchg16b for 32-bit mode assembly parsing.
This was broken recently when I factored the 64 bit mode check into hasCmpxchg16 without thinking about the AssemblerPredicate.

llvm-svn: 356531
2019-03-19 23:57:16 +00:00
Eli Friedman 2596e8b3e7 [ARM] Make sure to save/restore LR when we use tBfar.
This change does two things. One, it ensures compilation will abort
instead of miscompiling if ARMFrameLowering::determineCalleeSaves
chooses not to save LR in a case where it's necessary.  Two, it changes
the way we estimate the size of a function to be more conservative in
the presence of constant pool entries and jump tables.

EstimateFunctionSizeInBytes probably still isn't really conservative
enough, but I'm not sure how we can come up with a reliable estimate
before constant islands runs.

Differential Revision: https://reviews.llvm.org/D59439

llvm-svn: 356527
2019-03-19 21:48:08 +00:00
Amara Emerson 761ca2e53b [AArch64][GlobalISel] Add an optimization to select vector DUP instructions.
This adds pattern matching for the insert+shufflevector sequence so we can
generate dup instructions instead of the current TBL sequence.

Differential Revision: https://reviews.llvm.org/D59558

llvm-svn: 356526
2019-03-19 21:43:05 +00:00
Amara Emerson 18e2c5724a [AArch64][GlobalISel] Make v4s32 G_IMPLICIT_DEF legal.
llvm-svn: 356525
2019-03-19 21:43:02 +00:00
Matt Arsenault cf55a657f0 CodeGen: Refactor regallocator command line and target selection
This will allow targets more flexibility to replace the
register allocator core passes. In a future commit,
AMDGPU will run the core register assignment passes
twice, and will also want to disallow using the
standard -regalloc option.

llvm-svn: 356506
2019-03-19 19:33:12 +00:00
Simon Pilgrim 77482120da Fix for ABS legalization on PPC buildbot.
llvm-svn: 356498
2019-03-19 18:55:46 +00:00
Simon Pilgrim e744f513c4 [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - handle repeated shift amounts
If a value with multiple uses is only ever used for SSE shift amounts then we know that only the bottom 64-bits are needed.

llvm-svn: 356483
2019-03-19 17:23:25 +00:00
Simon Atanasyan db4601e60a [MIPS][microMIPS] Enable dynamic stack realignment
Dynamic stack realignment was disabled on micromips by checking if
target has standard encoding. We simply change the condition to skip
Mips16 only.

Patch by Mirko Brkusanin.

Differential Revision: http://reviews.llvm.org/D59499

llvm-svn: 356478
2019-03-19 17:01:24 +00:00
Jordan Rupprecht f74d45a775 [NFC] Fix unused variable in release builds
This was introduced in rL356468.

llvm-svn: 356477
2019-03-19 16:52:40 +00:00
Simon Pilgrim 7a8e5051f4 Fix unused variable warning. NFCI.
llvm-svn: 356474
2019-03-19 16:49:59 +00:00
Simon Pilgrim a56f2822d0 [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect
These changes are related to PR37743 and include:

    SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.

    Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.

    Add promoting the integer ABS node in the LegalizeIntegerType.

    Expand-based legalization of integer result for the ABS nodes.

    Expand-based legalization of ABS vector operations.

    Add some integer abs testcases for different typesizes for Thumb arch

    Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
        tmp = (SRA, Hi, 31)
        Lo = (UADDO tmp, Lo)
        Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
        Lo = (XOR tmp, Lo)

    The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
        (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).

    Change integer abs testcases for codegen with the ABS node support for AArch64.
        Indicate that the ABS is legal for the i64 type when the NEON is supported.
        Change the integer abs testcases to show changing of codegen.

    Add combine and legalization of ABS nodes for Thumb arch.

    Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.

For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743

Patch by: @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D49837

llvm-svn: 356468
2019-03-19 16:24:55 +00:00
Ryan Taylor 00e063ab92 [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics
Summary:
Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer

Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59265

llvm-svn: 356465
2019-03-19 16:07:00 +00:00
Neil Henning e85f6bd64f [AMDGPU] Ban i8 min3 promotion.
I found this really weird WWM-related case whereby through the WWM
transformations our isel lowering was trying to promote 2 min's into a
min3 for the i8 type, which our hardware doesn't support.

The new min3_i8.ll test case would previously spew the error:

PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68

Before the simple fix to our isel lowering to not do it for i8 MVT's.

Differential Revision: https://reviews.llvm.org/D59543

llvm-svn: 356464
2019-03-19 15:50:24 +00:00
Simon Atanasyan af40d4371d [mips] Fix crash on recursive using of .set
Switch to the `MCParserUtils::parseAssignmentExpression` for parsing
assignment expressions in the `.set` directive reduces code and allows
to print an error message instead of crashing in case of incorrect
recursive using of the `.set`.

Fix for the bug https://bugs.llvm.org/show_bug.cgi?id=41053.

Differential Revision: http://reviews.llvm.org/D59452

llvm-svn: 356461
2019-03-19 15:15:35 +00:00
Heejin Ahn c60bc94afc [WebAssembly] Small improvements in FixIrreducibleControlFlow (NFC)
Summary:
- Make some class member methods const
- Delete unnecessary includes
- Use a simpler form of `BuildMI`

Reviewers: kripken

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59454

llvm-svn: 356440
2019-03-19 05:26:33 +00:00
Heejin Ahn 34dc1f2483 [WebAssembly] Rename methods according to instruction name changes (NFC)
Reviewers: tlively, sbc100

Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59469

llvm-svn: 356438
2019-03-19 05:07:33 +00:00
Thomas Lively 0200d62ec7 [WebAssembly] Lower SIMD nnan setcc nodes
Summary:
Adds patterns to lower all the remaining setcc modes: lt, gt,
le, and ge. Fixes PR40912.

Reviewers: aheejin, sbc100, dschuff

Reviewed By: dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, jdoerfert, llvm-commits, srj

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59519

llvm-svn: 356431
2019-03-19 00:55:34 +00:00
Craig Topper b24bdf626a [X86] Disable CQTO and CLTQ instructions in the assembly parser outside 64-bit mode.
llvm-svn: 356419
2019-03-18 22:06:14 +00:00
Craig Topper e732bc6bea [X86] Allow any 8-bit immediate to be used with BT/BTC/BTR/BTS not just sign extended 8-bit immediates.
We need to allow [128,255] in addition to [-128, 127] to match gas.

llvm-svn: 356413
2019-03-18 21:33:59 +00:00
Sam Clegg b7708ec87f [WebAssembly] Don't override default implementation of isOffsetFoldingLegal. NFC.
The default implementation does we want and is going to more compatible
with dynamic linking (-fPIC) support that is planned.

This is NFC because currently we only build wasm with
`-relocation-model=static` which in turn means that the default
`isOffsetFoldingLegal` always returns true today.

Differential Revision: https://reviews.llvm.org/D54661

llvm-svn: 356410
2019-03-18 21:21:12 +00:00
Craig Topper f086e562f9 [X86] Use relocImm in the ROL8ri/ROL16ri/ROL32ri/ROL64ri patterns to be consistent with the ROR patterns.
llvm-svn: 356407
2019-03-18 20:43:15 +00:00
Craig Topper 0b9c640fe0 [X86] Replace uses of i64immSExt32_su with i64relocImmSExt32_su.
For the i8, i16, and i32 instructions we were using a relocImm. Presumably we should for i64 as well.

llvm-svn: 356406
2019-03-18 20:43:09 +00:00
Michael Liao efb4f9e568 [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59501

llvm-svn: 356405
2019-03-18 20:40:09 +00:00
Tim Renouf cfdfba996b [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic
Allow the clamp modifier on vop3 int arithmetic instructions in assembly
and disassembly.

This involved adding a clamp operand to the affected instructions in MIR
and MC, and thus having to fix up several places in codegen and MIR
tests.

Differential Revision: https://reviews.llvm.org/D59267

Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
llvm-svn: 356399
2019-03-18 19:35:44 +00:00
Tim Renouf 2e94f6e584 [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers
This commit allows v_cndmask_b32_e64 with abs, neg source
modifiers on src0, src1 to be assembled and disassembled.

This does appear to be allowed, even though they are floating point
modifiers and the operand type is b32.

To do this, I added src0_modifiers and src1_modifiers to the
MachineInstr, which involved fixing up several places in codegen and mir
tests.

Differential Revision: https://reviews.llvm.org/D59191

Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
llvm-svn: 356398
2019-03-18 19:25:39 +00:00
Amara Emerson 8627178d46 Revert r356304: remove subreg parameter from MachineIRBuilder::buildCopy()
After review comments, it was preferred to not teach MachineIRBuilder about
non-generic instructions beyond using buildInstr().

For AArch64 I've changed the buildCopy() calls to buildInstr() + a
separate addReg() call.

This also relaxes the MachineIRBuilder's COPY checking more because it may
not always have a SrcOp given to it.

llvm-svn: 356396
2019-03-18 19:20:10 +00:00
Tim Renouf 8723a56551 [MsgPack][AMDGPU] Fix unflushed raw_string_ostream bugs on windows expensive checks bot
This fixes a couple of unflushed raw_string_ostream bugs in recent
commits that only show up on a bot building on windows with expensive
checks.

Differential Revision: https://reviews.llvm.org/D59396

Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf
llvm-svn: 356394
2019-03-18 19:00:46 +00:00
Craig Topper f07062a798 [X86] Rename imm8_su/imm16_su/imm32_su to relocImm8_su/relocImm16_su/relocImm32_su/ to accurately reflect what they are.
llvm-svn: 356393
2019-03-18 18:54:06 +00:00
Adhemerval Zanella 270249de2b [AArch64] Small fix for getIntImmCost
It uses the generic AArch64_IMM::expandMOVImm to get the correct
number of instruction used in immediate materialization.

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58461

llvm-svn: 356391
2019-03-18 18:50:58 +00:00
Adhemerval Zanella a3cefa5d64 [AArch64] Optimize floating point materialization
This patch follows some ideas from r352866 to optimize the floating
point materialization even further. It changes isFPImmLegal to
considere up to 2 mov instruction or up to 5 in case subtarget has
fused literals.

The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but
the mov+fmov sequence is always better because of the reduced d-cache
pressure. The timings are still the same if you consider movw+movk+fmov
vs. adrp+ldr will be fused (although one instruction longer).

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58460

llvm-svn: 356390
2019-03-18 18:45:57 +00:00
Adhemerval Zanella 664c1ef528 [TargetLowering] Add code size information on isFPImmLegal. NFC
This allows better code size for aarch64 floating point materialization
in a future patch.

Reviewers: evandro

Differential Revision: https://reviews.llvm.org/D58690

llvm-svn: 356389
2019-03-18 18:40:07 +00:00