Commit Graph

56526 Commits

Author SHA1 Message Date
Matt Arsenault 015b640be4 AMDGPU: Add flag to use fixed function ABI
Pass all arguments to every function, rather than only passing the
minimum set of inputs needed for the call graph.
2020-03-13 13:27:05 -07:00
Matt Arsenault bb8622094d AMDGPU: Don't handle kernarg.segment.ptr in functions
Just lower this to null. Pass implicitarg.ptr in its place in the
argument list.
2020-03-13 12:51:12 -07:00
Nico Weber f82b32a51e Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit 5aa5c943f7.
Causes clang to assert, see
https://bugs.chromium.org/p/chromium/issues/detail?id=1061533#c4
for a repro.
2020-03-13 15:37:44 -04:00
Stanislav Mekhanoshin 32e90cbcd1 [AMDGPU] Disable endcf collapse
There are some functional regressions and I suspect our
scopes are not as perfectly enclosed as I expected.
Disable it for now.

Differential Revision: https://reviews.llvm.org/D76148
2020-03-13 12:33:22 -07:00
Simon Pilgrim 05c0d34918 [X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)
If we're extracting the 0th index of a v16i8 vector, we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implicit zero extension of PEXTRB.

The biggest perf diff is on SLM targets, where MOVD (uops=1, lat=3, tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4).

This matches what we already do for PEXTRW.

Differential Revision: https://reviews.llvm.org/D76138
2020-03-13 18:43:04 +00:00
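As a minimal IR illustration of the case the MOVD/PEXTRB change above targets (a sketch, not from the patch; the function name is made up):

  ; Lane-0 extract of a v16i8 with no zero extension required: now
  ; preferred as movd + truncate instead of pextrb on such targets.
  define i8 @extract_lane0(<16 x i8> %x) {
    %e = extractelement <16 x i8> %x, i32 0
    ret i8 %e
  }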
Philip Reames 1b86ad27a7 Use 15 byte long nops on modern Intel processors
Back in D42616, we switched our default nop length from 15 to 10 bytes because some platforms have painful decode stalls when encountering multiple instruction prefixes. (10-byte long nops come from the fact that prefixes are used to pad after 8 bytes, and some platforms have issues with more than two prefixes.)

Based on Agner's guides, it appears to be the case that modern Intel (SandyBridge and later) can decode an arbitrary number of prefixes without issue. Intel's guide only provides up to 9 bytes; I read that as providing a safe default for all their chips. Older chips and Atom series have serious decode stalls. I can't find a conclusive reference beyond those two.

Differential Revision: https://reviews.llvm.org/D75945
2020-03-13 10:51:09 -07:00
Simon Cook a26bd4ec16 [TableGen] Support combining AssemblerPredicates with ORs
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these could imply for use as a constraint (see comments in D65649).

AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that all the listed features must be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.

To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.

This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html

Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.

At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it is unnecessary.

Differential Revision: https://reviews.llvm.org/D74338
2020-03-13 17:13:51 +00:00
Simon Pilgrim a2db388dce [CostModel][X86] Improve ISD::CTTZ costs accounting for BSF/TZCNT implementations 2020-03-13 16:51:13 +00:00
Simon Pilgrim 846c614f54 [X86] combineExtractWithShuffle - pull out repeated getSizeInBits() call. NFC. 2020-03-13 15:36:04 +00:00
Simon Pilgrim fe047fbccc [X86] LowerEXTRACT_VECTOR_ELT - pull out repeated getOperand() calls. NFC.
Also, cleanup LowerEXTRACT_VECTOR_ELT_SSE4 comments which had references to non-constant extraction indices.
2020-03-13 15:36:02 +00:00
Andrzej Warzynski a0c15ed460 [AArch64][SVE] Add the @llvm.aarch64.sve.dup.x intrinsic
Summary:
This intrinsic implements the unpredicated duplication of scalar values
and is mapped to (through ISD::SPLAT_VECTOR):
  * DUP <Zd>.<T>, #<imm>
  * DUP <Zd>.<T>, <R><n|SP>

Reviewed by: sdesmalen

Differential Revision: https://reviews.llvm.org/D75900
2020-03-13 12:40:22 +00:00
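For the DUP intrinsic above, a minimal IR sketch, assuming the nxv4i32 overload name and signature:

  declare <vscale x 4 x i32> @llvm.aarch64.sve.dup.x.nxv4i32(i32)

  ; Splat a scalar into every lane of a scalable vector; lowered via
  ; ISD::SPLAT_VECTOR to DUP <Zd>.S, <Wn>.
  define <vscale x 4 x i32> @splat_i32(i32 %s) {
    %v = call <vscale x 4 x i32> @llvm.aarch64.sve.dup.x.nxv4i32(i32 %s)
    ret <vscale x 4 x i32> %v
  }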
David Green 2c6c169dbd [ARM] Optimise ASRL/LSRL to smaller shifts using demand bits.
The ASRL/LSRL long shifts are generated from 64-bit shifts. Once we have
them, it might turn out that enough of the 64-bit result is unused that
we can use a smaller shift to produce the same result. As the smaller
shift can in general be folded in more ways, such as into add
instructions in one of the test cases here, we use demanded-bits
analysis to prefer the smaller shifts where we can.

Differential Revision: https://reviews.llvm.org/D75371
2020-03-13 10:09:03 +00:00
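For the shift-narrowing change above, an illustrative source pattern (a plain 64-bit IR shift is what produces an ASRL long shift in the first place; a sketch only):

  ; Becomes an ASRL long shift on MVE; if later users demand only part
  ; of the 64-bit result, a narrower shift may now be used instead.
  define i64 @ashr64(i64 %x, i64 %amt) {
    %r = ashr i64 %x, %amt
    ret i64 %r
  }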
David Green f67d93dc23 [ARM] Constant long shift combines
This changes the way that asrl and lsrl intrinsics are lowered, going
via the ISel ASRL and LSLL nodes instead of straight to machine nodes.
On top of that, it adds some constant folds for long shifts, in case it
turns out that the shift amount is either constant or 0.

Differential Revision: https://reviews.llvm.org/D75553
2020-03-13 08:54:59 +00:00
QingShan Zhang d0fb34dc09 [PowerPC] Replace PPCISD::SExtVElems with ISD::SIGN_EXTEND_INREG to leverage the combine rules
PPCISD::SExtVElems was added by commit https://reviews.llvm.org/D34009. However,
we have another ISD node, ISD::SIGN_EXTEND_INREG, that perfectly matches the
semantics of SExtVElems, and the DAGCombiner has some combine rules for
SIGN_EXTEND_INREG that produce better code.

Differential Revision: https://reviews.llvm.org/D70771
2020-03-13 07:28:28 +00:00
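As a hedged illustration of the node this commit switches to: a shl/ashr pair like the following is what the DAG represents as ISD::SIGN_EXTEND_INREG:

  ; Sign-extend the low 8 bits of each lane in-register; at the DAG
  ; level this is ISD::SIGN_EXTEND_INREG, which now also covers what
  ; PPCISD::SExtVElems expressed.
  define <4 x i32> @sext_inreg(<4 x i32> %x) {
    %shl = shl <4 x i32> %x, <i32 24, i32 24, i32 24, i32 24>
    %r = ashr <4 x i32> %shl, <i32 24, i32 24, i32 24, i32 24>
    ret <4 x i32> %r
  }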
Craig Topper 09c8f38924 [X86] Add isel patterns for X86VBroadcast with i16 truncates from i16->i64 zextload/extload.
We can form vpbroadcastw with a folded load.

We had patterns for i16->i32 zextload/extload, but nothing prevents
i64 from occurring.

I'd like to move this all to DAG combine to fix more cases, but
this is a trivial fix to minimize test diffs when moving to a combine.
2020-03-13 00:10:48 -07:00
Amy Kwan 1ba3d2639d [PowerPC][NFC] Rename instruction formats in PPCInstrPrefix.td
This patch renames some of the instruction formats within PPCInstrPrefix.td to
adopt a more uniform naming convention. It also adds the naming-convention
extension `_MEM` to indicate instruction formats for memory ops.

Differential Revision: https://reviews.llvm.org/D75819
2020-03-13 00:50:08 -05:00
Matt Arsenault ccc6e780c8 AMDGPU: Directly annotate functions if they have calls
Currently we infer whether the flat-scratch-init kernel input should
be enabled based on calls. Move this handling, so we can decide if the
full set of ABI inputs is needed in kernels. Ideally we would have an
analysis of some sort, rather than the function attributes.
2020-03-12 19:10:59 -04:00
Stanislav Mekhanoshin a73528649c [AMDGPU] Simplify exec copies
The patch removes late endcf handling and only leaves the
related portion with redundant exec mask copy elimination.

Differential Revision: https://reviews.llvm.org/D76095
2020-03-12 14:54:19 -07:00
Simon Pilgrim e91feeed21 [AMDGPU] Add ISD::FSHR -> ALIGNBIT support
This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions.

This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default.

Differential Revision: https://reviews.llvm.org/D76070
2020-03-12 20:16:57 +00:00
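A minimal IR example of the ISD::FSHR(i32) pattern the ALIGNBIT change above lowers (the generic funnel-shift intrinsic; the function name is illustrative):

  declare i32 @llvm.fshr.i32(i32, i32, i32)

  ; Funnel shift right: with this patch AMDGPU selects v_alignbit_b32.
  define i32 @fshr32(i32 %hi, i32 %lo, i32 %amt) {
    %r = call i32 @llvm.fshr.i32(i32 %hi, i32 %lo, i32 %amt)
    ret i32 %r
  }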
Thomas Lively 4e589e6c26 [WebAssembly] Fix SIMD shift unrolling to avoid assertion failure
Summary:
Using the default DAG.UnrollVectorOp on v16i8 and v8i16 vectors
results in i8 or i16 nodes being inserted into the SelectionDAG. Since
those are illegal types, this causes a legalization assertion failure
for some code patterns, as uncovered by PR45178. This change unrolls
shifts manually to avoid this issue by adding and using a new optional
EVT argument to DAG.ExtractVectorElements to control the type of the
extract_element nodes.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76043
2020-03-12 12:20:14 -07:00
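An illustrative IR case for the unrolling fix above (a sketch of the shape uncovered by PR45178, not the exact reproducer):

  ; A non-splat v16i8 shift has to be unrolled; extracting the elements
  ; as i8 created illegal nodes, so they are now extracted as a legal
  ; wider type via the new EVT argument.
  define <16 x i8> @shl_v16i8(<16 x i8> %v, <16 x i8> %amt) {
    %r = shl <16 x i8> %v, %amt
    ret <16 x i8> %r
  }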
Stanislav Mekhanoshin 360aff0493 [AMDGPU] Simplify nested SI_END_CF
This is to replace the optimization from the SIOptimizeExecMaskingPreRA.
We have fewer opportunities in the control flow lowering because many
VGPR copies are still in place and will be removed later, but we know
for sure an instruction is SI_END_CF and not just an arbitrary S_OR_B64
with EXEC.

The subsequent change needs to convert s_and_saveexec into s_and and
address new TODO lines in tests; then the code block guarded by the
-amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer
will be removed.

Differential Revision: https://reviews.llvm.org/D76033
2020-03-12 11:25:07 -07:00
Zarko Todorovski d688312660 [PowerPC][AIX] Implement formal arguments passed in stack memory.
This patch is the callee side counterpart for https://reviews.llvm.org/D73209.
It removes the fatal error when we pass more formal arguments than available
registers.

Differential Revision: https://reviews.llvm.org/D74225
2020-03-12 11:48:00 -04:00
Xiangling Liao 3e53bf5781 [PowerPC32] Fix the `setcc` inconsistent result type problem
Summary:
On 32-bit PPC targets [AIX and BE], when we convert an `i64` to an `f32`, a `setcc` operand expansion is needed. The expansion sets the result type of the expanded `setcc` operation based on whether the subtarget uses CRBits or not. If the subtarget does use CRBits, as on AIX and BE, it sets the result type to `i1`, leading to an inconsistency with the original `setcc` result type [`i32`]. The underlying crash happens because we don't set the `setcc` result type consistently in those two places.

This patch fixes the problem by also setting the original `setcc` node's result type via the `getSetCCResultType` interface.

Reviewers: sfertile, cebowleratibm, hubert.reinterpretcast, Xiangling_L

Reviewed By: sfertile

Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75702
2020-03-12 10:50:37 -04:00
Simon Moll c8e1081da6 [VE][nfc] Use RRIm for RRINDm, remove the latter
Summary:
De-duplicate isel instruction classes by using RRIm for RRINDm. The latter
becomes obsolete.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D76063
2020-03-12 15:38:07 +01:00
Sean Fertile 8b39341fb0 [PowerPC][AIX] Fix printing of program counter for AIX assembly.
The program counter on AIX is the dollar sign.

Differential Revision: https://reviews.llvm.org/D75627
2020-03-12 10:37:18 -04:00
Andrzej Warzynski 46b9f14d71 [AArch64][SVE] Add intrinsics for non-temporal scatters/gathers
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
  * aarch64_sve_ldnt1_gather_index
  * aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.

As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.

The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
  * GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
  * SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1

The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75601
2020-03-12 13:55:56 +00:00
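For the gather intrinsic above, a sketch assuming the nxv2i64 overload and an assumed signature:

  declare <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.gather.index.nxv2i64(<vscale x 2 x i1>, i64*, <vscale x 2 x i64>)

  ; "scalar + vector of indices" addressing: the indices are scaled by
  ; the element size before the call is lowered to ldnt1.
  define <vscale x 2 x i64> @gather(<vscale x 2 x i1> %pg, i64* %base, <vscale x 2 x i64> %idx) {
    %v = call <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.gather.index.nxv2i64(<vscale x 2 x i1> %pg, i64* %base, <vscale x 2 x i64> %idx)
    ret <vscale x 2 x i64> %v
  }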
Simon Pilgrim 1e686d2689 [X86] Add FeatureFast7ByteNOP flag
Lets us remove another SLM proc family flag usage.

This is NFC, but we should probably check whether atom/glm/knl? should be using this flag as well...
2020-03-12 13:06:43 +00:00
Dylan McKay 2cf4b4de0c [AVR] Fix reads of uninitialized variables from constructor of AVRSubtarget
The initialization order was not correct. These bugs were discovered by
valgrind. Things appear to work fine in practice despite them, but this
patch should unblock switching the AVR backend on by default, as a
standard AVR llc invocation now runs without memory errors.

The AVRISelLowering constructor would run before the subtarget boolean
fields were initialized to false. Now, the initialization order is
correct.
2020-03-13 00:57:19 +13:00
Simon Pilgrim 4689eae820 [X86] combineOrShiftToFunnelShift - remove shift by immediate handling.
Now that D75114 has landed, DAGCombiner handles this case so the code is redundant.
2020-03-12 11:46:51 +00:00
Simon Tatham 3f8e714e2f [ARM,MVE] Add intrinsics and isel for MVE fused multiply-add.
Summary:
This adds the ACLE intrinsic family for the VFMA and VFMS
instructions, which perform fused multiply-add on vectors of floats.

I've represented the unpredicated versions in IR using the cross-
platform `@llvm.fma` IR intrinsic. We already had isel rules to
convert one of those into a vector VFMA in the simplest possible way;
but we didn't have rules to detect a negated argument and turn it into
VFMS, or rules to detect a splat argument and turn it into one of the
two vector/scalar forms of the instruction. Now we have all of those.

The predicated form uses a target-specific intrinsic as usual, but
I've stuck to just one, for a predicated FMA. The subtraction and
splat versions are code-generated by passing an fneg or a splat as one
of its operands, the same way as the unpredicated version.

In arm_mve_defs.h, I've had to introduce a tiny extra piece of
infrastructure: a record `id` for use in codegen dags which implements
the identity function. (Just because you can't declare a Tablegen
value of type dag which is //only// a `$varname`: you have to wrap it
in something. Now I can write `(id $varname)` to get the same effect.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75998
2020-03-12 11:13:50 +00:00
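A minimal IR sketch of the unpredicated form described above, using the cross-platform intrinsic named in the message:

  declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)

  ; Selects MVE VFMA; passing an fneg of one multiplicand would now
  ; select VFMS, and a splatted operand one of the vector/scalar forms.
  define <4 x float> @fma_vec(<4 x float> %a, <4 x float> %b, <4 x float> %acc) {
    %r = call <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %acc)
    ret <4 x float> %r
  }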
Dylan McKay 13be27482e [AVR] Fix read of uninitialized variable AVRSubtarget::ELFArch
Found by the LLVM MemorySanitizer tests when switching AVR to a default
backend.

ELFArch must be initialized before the call to
initializeSubtargetDependencies().

The uninitialized read would occur deep within TableGen'd code.
2020-03-13 00:09:13 +13:00
Qiu Chaofan 096d545376 [PowerPC] Add strict-fp intrinsic to FP arithmetic
This patch adds basic strict-fp intrinsics support to the PowerPC backend,
including basic arithmetic operations (add/sub/mul/div).

Reviewed By: steven.zhang, andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D63916
2020-03-12 17:02:54 +08:00
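For illustration, one of the four arithmetic ops covered above as a constrained intrinsic (a sketch with the usual strict-FP metadata):

  declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)

  ; Strict FP add: rounding mode and exception behaviour are carried as
  ; metadata, so the optimizer cannot reorder or fold it unsafely.
  define double @strict_fadd(double %a, double %b) strictfp {
    %r = call double @llvm.experimental.constrained.fadd.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
    ret double %r
  }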
Sebastian Neubauer 4327a9b46b [AMDGPU] Use progbits type for .AMDGPU.disasm section
The note section type implies a specific format that this section does
not have, so tools like readelf fail on it. Progbits has no format, and
another pipeline compiler already sets the type to progbits.

Differential Revision: https://reviews.llvm.org/D75913
2020-03-12 09:08:11 +01:00
Shengchen Kan 3a503ce663 [X86] Reduce the number of emitted fragments due to branch align
Summary:
Currently, a BoundaryAlign fragment may be inserted after the branch
that needs to be aligned, in order to truncate the current fragment;
this fragment is unused most of the time. To avoid that, we can insert
a new empty Data fragment instead. Non-relaxable instructions are
usually emitted into Data fragments, so the inserted empty Data
fragment is highly likely to be reused.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames, LuoYuanke

Subscribers: llvm-commits, dexonsmith, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75438
2020-03-12 15:37:35 +08:00
Djordje Todorovic 3b984641a7 [DebugInfo] Fix build failure on mingw
Add a workaround for X86::MOV16ri when describing call site
parameters.
2020-03-12 08:18:01 +01:00
QingShan Zhang 518292dbdf [PowerPC] Add the MacroFusion support for Power8
This patch is intended to implement the missing P8 MacroFusion for LLVM
according to the Power8 User's Manual, Section 10.1.12 "Instruction Fusion".

Differential Revision: https://reviews.llvm.org/D70651
2020-03-12 05:15:41 +00:00
Teresa Johnson 8f5e3c74b6 [PowerPC] Fix compile time issue in recursive CTR analysis code
Summary:
Avoid re-examining operands on recursive walk looking for CTR.
This was causing huge compile time after some earlier optimization
created a large expression.

The start of the expression (created by IndVarSimplify) looked like:

%469 = lshr i64 trunc (i128 xor (i128 udiv (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), ...

with the _ZN4absl13hash_internal13CityHashState5kSeedE referenced many times.

Reviewers: hfinkel

Subscribers: nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75790
2020-03-11 16:11:14 -07:00
Matt Arsenault 1e0c540360 AMDGPU: Don't hard error on LDS globals in functions
Instead, emit a trap and a warning. We force inlining of this
situation, so any function where this happens should be dead as
indirect or external calls are not yet supported. This should avoid
erroring on dead code.
2020-03-11 15:34:11 -04:00
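An illustrative (hypothetical) module for the case above: an LDS global, i.e. an addrspace(3) global, referenced from a non-kernel function:

  @lds = internal addrspace(3) global i32 undef, align 4

  ; Previously a hard error; now a trap plus a warning, since after the
  ; forced inlining any remaining use like this should be dead code.
  define void @use_lds() {
    store i32 0, i32 addrspace(3)* @lds
    ret void
  }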
Francesco Petrogalli 4dde9e9b02 [llvm][CodeGen] IR intrinsics for SVE2 contiguous conflict detection instructions.
Summary:
The IR intrinsics are mapped to the following SVE2 instructions:

* WHILERW <Pd>.<T>, <Xn>, <Xm>
* WHILEWR <Pd>.<T>, <Xn>, <Xm>

The intrinsics introduced in this patch are the IR counterpart of the
SVE ACLE functions `svwhilerw` and `svwhilewr` (all data type
variants).

Patch by Maciej Gąbka <maciej.gabka@arm.com>.

Reviewers: kmclaughlin, rengolin

Reviewed By: kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75862
2020-03-11 18:28:02 +00:00
Stanislav Mekhanoshin 9801e5469b [AMDGPU] Disable nested endcf collapse
The assumption is that conditional regions are perfectly nested
and a mask restored at the exit from the inner block will be
completely covered by a mask restored in the outer.

It turns out with our current structurizer this is not always
the case.

Disable the optimization for now, but I want to keep it around
for a while to either try after further structurizer changes or
to move it into control flow lowering where we have more info
and reuse the test.

Differential Revision: https://reviews.llvm.org/D75958
2020-03-11 11:24:20 -07:00
Jay Foad a46dba24fa [AMDGPU] Extend macro fusion for ADDC and SUBB to SUBBREV
Summary:
There's a lot of test case churn but the overall effect is to increase
the number of back-to-back v_sub,v_subbrev pairs, which can execute with
no delay even on gfx10.

Reviewers: arsenm, rampitec, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75999
2020-03-11 17:59:21 +00:00
Andrzej Warzynski a9f1583228 [AArch64][SVE] Add the @llvm.aarch64.sve.sel intrinsic
Reviewers: sdesmalen, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75928
2020-03-11 17:05:21 +00:00
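A sketch of the sel intrinsic above, assuming the nxv4i32 overload:

  declare <vscale x 4 x i32> @llvm.aarch64.sve.sel.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)

  ; Predicated select: takes lanes from %a where %pg is true, else %b.
  define <vscale x 4 x i32> @sel(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
    %r = call <vscale x 4 x i32> @llvm.aarch64.sve.sel.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
    ret <vscale x 4 x i32> %r
  }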
Matt Arsenault a2202f6a3f AMDGPU/GlobalISel: Manually RegBankSelect copies
This was failing on any pre-assigned copy to the VCC bank.

This is something of a workaround for the default implementation in
getInstrMappingImpl, and how it treats copy-like operations in
general.

Copy-like operations are considered to only have one result register
bank, rather than separate banks for each source like a normal
instruction. To avoid potentially mishandling reg_sequence with
impossible operand combinations, the generic implementation errors on
impossible costs. If the bank was already assigned, is treated it
as-if it were an unsatisfiable REG_SEQUENCE mapping. We really don't
get any value from any of what getInstrMappingImpl tries to do for
copies, so just directly emit the simple mapping we really want.
2020-03-11 11:12:12 -04:00
Sam Parker d941df363d [NFC][ARM] Reorder some logic
Move some logic around in LowOverheadLoop::ValidateLiveOut
2020-03-11 11:40:09 +00:00
Simon Pilgrim b3b4727a3e [X86] Replace (most) X86ISD::SHLD/SHRD usage with ISD::FSHL/FSHR generic opcodes (PR39467)
For the i32 and i64 cases, X86ISD::SHLD/SHRD are close enough to ISD::FSHL/FSHR that we can use them directly; we just need to account for the operand commutation for SHRD.

The i16 SHLD/SHRD case is annoying, as the shift amount is modulo-32 (vs funnel shift modulo-16), so I've added X86ISD::FSHL/FSHR equivalents, which match the generic implementation in all other respects.

Something I'm slightly concerned with is that ISD::FSHL/FSHR legality is controlled by the Subtarget.isSHLDSlow() feature flag - we don't normally use non-ISA features for this but it allows the DAG combines to continue to operate after legalization in a lot more cases.

The X86 *bits.ll changes are all affected by the same issue - we now have a "FSHR(-1,-1,amt) -> ROTR(-1,amt) -> (-1)" simplification that reduces the dependencies enough for the branch fall-through code to mess up.

Differential Revision: https://reviews.llvm.org/D75748
2020-03-11 11:17:49 +00:00
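For reference, the generic funnel-shift form the i32/i64 SHLD/SHRD cases above now go through (the function name is illustrative):

  declare i32 @llvm.fshl.i32(i32, i32, i32)

  ; fshl(a, b, amt) = high word of concat(a, b) shifted left by
  ; amt mod 32; selected as SHLD on x86 when profitable.
  define i32 @shld32(i32 %a, i32 %b, i32 %amt) {
    %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %amt)
    ret i32 %r
  }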
Anna Welker a6d3bec83f [TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.

Differential Revision: https://reviews.llvm.org/D75525
2020-03-11 10:23:41 +00:00
Victor Campos 8a12553223 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register offset cannot be reused in any of the remaining operands. Such
a restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-03-11 10:19:27 +00:00
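A minimal IR example of the pattern the LDRD/STRD change above improves:

  ; A volatile 64-bit load: previously emitted as two 32-bit LDRs, now
  ; as a single LDRD on ARMv5TE/Thumb-2 targets.
  define i64 @load_vol(i64* %p) {
    %v = load volatile i64, i64* %p, align 8
    ret i64 %v
  }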
Matt Arsenault edd0dfca0d AMDGPU/GlobalISel: Refine G_TRUNC legality rules
Scalarize most truncates. Avoid touching cases that could end up in
unresolvable infinite loops.
2020-03-10 15:32:22 -07:00
Matt Arsenault ce8a1f7294 GlobalISel: Implement fewerElementsVector for G_TRUNC
Extend fewerElementsVectorBasic to handle operands with different
element types.
2020-03-10 15:17:20 -07:00
Matt Arsenault 200b20639a AMDGPU: Use V_MAC_F32 for fmad.ftz
This avoids regressions in a future patch. I'm confused by the gfx9
use of legacy_mad. Was this a pointless instruction rename, or does it
use fmul_legacy handling? Why is regular mac available in that case?
2020-03-10 14:41:06 -07:00