Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base
register, rather than the default TPIDR_EL0.
Patch by Philip Derrin!
Differential revision: https://reviews.llvm.org/D54685
llvm-svn: 356657
CMPXCHG8B was introduced on i586/pentium generation.
If its not enabled, limit the atomic width to 32 bits so the AtomicExpandPass will expand to lib calls. Unclear if we should be using a different limit for other configs. The default is 1024 and experimentation shows that using an i256 atomic will cause a crash in SelectionDAG.
Differential Revision: https://reviews.llvm.org/D59576
llvm-svn: 356631
My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata"
accidentally caused a spurious PAL metadata .note record to be emitted
for any AMDGPU output. That caused failures in the lld test
amdgpu-relocs.s. Fixed.
Differential Revision: https://reviews.llvm.org/D59613
Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e
llvm-svn: 356621
This patch enables the use of lowerShuffleAsBitMask for 512-bit blends before
falling back to move immedate, GPR to k-register, and masked op.
I had to make some changes to support v8i64 when i64 is not a legal type. And to
support floating point types.
This trades a load for the move immediate and GPR move which is higher latency.
But its probably better for register pressure not having to hop through other
register classes. The load+and should play better with LICM and
rematerialization I think.
Differential Revision: https://reviews.llvm.org/D59479
llvm-svn: 356618
Summary: - The linking is broken when this library is built as shared one.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59610
llvm-svn: 356617
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.
llvm-svn: 356611
Summary:
Implements a new target features section in assembly and object files
that records what features are used, required, and disallowed in
WebAssembly objects. The linker uses this information to ensure that
all objects participating in a link are feature-compatible and records
the set of used features in the output binary for use by optimizers
and other tools later in the toolchain.
The "atomics" feature is always required or disallowed to prevent
linking code with stripped atomics into multithreaded binaries. Other
features are marked used if they are enabled globally or on any
function in a module.
Future CLs will add linker flags for ignoring feature compatibility
checks and for specifying the set of allowed features, implement using
the presence of the "atomics" feature to control the type of memory
and segments in the linked binary, and add front-end flags for
relaxing the linkage policy for atomics.
Reviewers: aheejin, sbc100, dschuff
Subscribers: jgravelle-google, hiraditya, sunfish, mgrang, jfb, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59173
llvm-svn: 356610
Summary:
- Should use `targetconstant` instead of `constant` operand for clamp
bit, which is expected as an immediate operand. Under certain
conditions, such as a common `i1 false` constant is used in other
place and selected before the instruction with clamp bit, register
operand may be added instead of immediate one. Use `targetcosntant` to
enforce that.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59608
llvm-svn: 356608
This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
to something like "str r0, [sp]".
For regular stack variables, this optimization was already implemented:
we lower loads and stores using frame indexes, which are expanded later.
However, when constructing a call frame for a call with more than four
arguments, the existing optimization doesn't apply. We need to use
stores which are actually relative to the current value of sp, and don't
have an associated frame index.
This patch adds a special case to handle that construct. At the DAG
level, this is an ISD::STORE where the address is a CopyFromReg from SP
(plus a small constant offset).
This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
instruction can access SP directly, so the COPY gets eliminated by
existing code.
The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
cleanup: we shouldn't pretend that it can select anything other than
frame indexes.
Differential Revision: https://reviews.llvm.org/D59568
llvm-svn: 356601
Summary:
PAL metadata now supports both the old linear reg=val pairs format and
the new MsgPack format.
The MsgPack format uses YAML as its textual representation. On output to
YAML, a mnemonic name is provided for some hardware registers.
Differential Revision: https://reviews.llvm.org/D57028
Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
llvm-svn: 356591
Summary:
This commit introduces a new AMDGPUPALMetadata class that:
* is inside the AMDGPU target;
* keeps an in-memory representation of PAL metadata;
* provides a method to read the frontend-supplied metadata from LLVM IR;
* provides methods for the asm printer to set metadata items;
* provides methods to write the metadata as a binary blob to put in a
.note record or as an asm directive;
* provides a method to read the metadata as a binary blob from a .note
record.
Because llvm-readobj cannot call directly into a target, I had to remove
llvm-readobj's ability to dump PAL metadata, pending a resolution to
https://reviews.llvm.org/D52821
Differential Revision: https://reviews.llvm.org/D57027
Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64
llvm-svn: 356582
This patch removes the following dag node opcodes from namespace X86ISD:
RDTSC_DAG,
RDTSCP_DAG,
RDPMC_DAG
The logic that expands RDTSC/RDPMC/XGETBV intrinsics is basically the same. The
only differences are:
RDTSC/RDTSCP don't implicitly read ECX.
RDTSCP also implicitly writes ECX.
I moved the common expansion logic into a helper function with the goal to get
rid of code repetition. That helper is now used for the expansion of
RDTSC/RDTSCP/RDPMC/XGETBV intrinsics.
No functional change intended.
Differential Revision: https://reviews.llvm.org/D59547
llvm-svn: 356546
Summary:
If an MIMG instruction has managed to get through to adjustWritemask in isel but
has no uses (and doesn't enable TFC) then prevent an assertion by not attempting
to adjust the writemask.
The instruction will be removed anyway.
Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58964
llvm-svn: 356540
This change does two things. One, it ensures compilation will abort
instead of miscompiling if ARMFrameLowering::determineCalleeSaves
chooses not to save LR in a case where it's necessary. Two, it changes
the way we estimate the size of a function to be more conservative in
the presence of constant pool entries and jump tables.
EstimateFunctionSizeInBytes probably still isn't really conservative
enough, but I'm not sure how we can come up with a reliable estimate
before constant islands runs.
Differential Revision: https://reviews.llvm.org/D59439
llvm-svn: 356527
This adds pattern matching for the insert+shufflevector sequence so we can
generate dup instructions instead of the current TBL sequence.
Differential Revision: https://reviews.llvm.org/D59558
llvm-svn: 356526
This will allow targets more flexibility to replace the
register allocator core passes. In a future commit,
AMDGPU will run the core register assignment passes
twice, and will also want to disallow using the
standard -regalloc option.
llvm-svn: 356506
Dynamic stack realignment was disabled on micromips by checking if
target has standard encoding. We simply change the condition to skip
Mips16 only.
Patch by Mirko Brkusanin.
Differential Revision: http://reviews.llvm.org/D59499
llvm-svn: 356478
These changes are related to PR37743 and include:
SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.
Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.
Add promoting the integer ABS node in the LegalizeIntegerType.
Expand-based legalization of integer result for the ABS nodes.
Expand-based legalization of ABS vector operations.
Add some integer abs testcases for different typesizes for Thumb arch
Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
tmp = (SRA, Hi, 31)
Lo = (UADDO tmp, Lo)
Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
Lo = (XOR tmp, Lo)
The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
(ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).
Change integer abs testcases for codegen with the ABS node support for AArch64.
Indicate that the ABS is legal for the i64 type when the NEON is supported.
Change the integer abs testcases to show changing of codegen.
Add combine and legalization of ABS nodes for Thumb arch.
Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.
For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743
Patch by: @ikulagin (Ivan Kulagin)
Differential Revision: https://reviews.llvm.org/D49837
llvm-svn: 356468
I found this really weird WWM-related case whereby through the WWM
transformations our isel lowering was trying to promote 2 min's into a
min3 for the i8 type, which our hardware doesn't support.
The new min3_i8.ll test case would previously spew the error:
PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68
Before the simple fix to our isel lowering to not do it for i8 MVT's.
Differential Revision: https://reviews.llvm.org/D59543
llvm-svn: 356464
Switch to the `MCParserUtils::parseAssignmentExpression` for parsing
assignment expressions in the `.set` directive reduces code and allows
to print an error message instead of crashing in case of incorrect
recursive using of the `.set`.
Fix for the bug https://bugs.llvm.org/show_bug.cgi?id=41053.
Differential Revision: http://reviews.llvm.org/D59452
llvm-svn: 356461
Summary:
- Make some class member methods const
- Delete unnecessary includes
- Use a simpler form of `BuildMI`
Reviewers: kripken
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59454
llvm-svn: 356440
The default implementation does we want and is going to more compatible
with dynamic linking (-fPIC) support that is planned.
This is NFC because currently we only build wasm with
`-relocation-model=static` which in turn means that the default
`isOffsetFoldingLegal` always returns true today.
Differential Revision: https://reviews.llvm.org/D54661
llvm-svn: 356410
Allow the clamp modifier on vop3 int arithmetic instructions in assembly
and disassembly.
This involved adding a clamp operand to the affected instructions in MIR
and MC, and thus having to fix up several places in codegen and MIR
tests.
Differential Revision: https://reviews.llvm.org/D59267
Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
llvm-svn: 356399
This commit allows v_cndmask_b32_e64 with abs, neg source
modifiers on src0, src1 to be assembled and disassembled.
This does appear to be allowed, even though they are floating point
modifiers and the operand type is b32.
To do this, I added src0_modifiers and src1_modifiers to the
MachineInstr, which involved fixing up several places in codegen and mir
tests.
Differential Revision: https://reviews.llvm.org/D59191
Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
llvm-svn: 356398
After review comments, it was preferred to not teach MachineIRBuilder about
non-generic instructions beyond using buildInstr().
For AArch64 I've changed the buildCopy() calls to buildInstr() + a
separate addReg() call.
This also relaxes the MachineIRBuilder's COPY checking more because it may
not always have a SrcOp given to it.
llvm-svn: 356396
This fixes a couple of unflushed raw_string_ostream bugs in recent
commits that only show up on a bot building on windows with expensive
checks.
Differential Revision: https://reviews.llvm.org/D59396
Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf
llvm-svn: 356394
It uses the generic AArch64_IMM::expandMOVImm to get the correct
number of instruction used in immediate materialization.
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D58461
llvm-svn: 356391
This patch follows some ideas from r352866 to optimize the floating
point materialization even further. It changes isFPImmLegal to
considere up to 2 mov instruction or up to 5 in case subtarget has
fused literals.
The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but
the mov+fmov sequence is always better because of the reduced d-cache
pressure. The timings are still the same if you consider movw+movk+fmov
vs. adrp+ldr will be fused (although one instruction longer).
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D58460
llvm-svn: 356390
This allows better code size for aarch64 floating point materialization
in a future patch.
Reviewers: evandro
Differential Revision: https://reviews.llvm.org/D58690
llvm-svn: 356389