32386 Commits

Author SHA1 Message Date
Matt Arsenault 872e899b75 AMDGPU/GlobalISel: Legalize unpacked d16 image operations
On targets that don't have the normal packed f16 layout, handle these
during legalization. Directly modify the register types. We can infer
this was a d16 load based on the mem operand size during selection.

A16 operands should possibly be handled here as well, but don't worry
about that yet.
2020-01-30 08:36:11 -05:00
Matt Arsenault d21182d692 AMDGPU/GlobalISel: Only map VOP operands to VGPRs
This trivially avoids violating the constant bus restriction.

Previously this was allowing one SGPR in the first source
operand, which technically also avoided violating this for most
operations (but not for special cases reading vcc).

We do need to write some new, smarter operand folds to pick the
optimal SGPR to use in some kind of post-isel fold, but that's purely
an optimization.

I was originally thinking we would pick which operands should be SGPRs
in RegBankSelect, but I think this isn't really manageable. There
would be additional complexity to handle every G_* instruction, and
then any nontrivial instruction patterns would need to know when to
avoid violating it, which is likely to be very error prone.

I think having all inputs being canonically copies to VGPRs will
simplify the operand folding logic. The current folding we do is
backwards, and only considers one operand at a time, relative to
operands it already has. It therefore poorly handles the case where
there is already a constant bus operand user. If all operands are
copies, it's somewhat simpler to consider all input operands at once
to choose the optimal constant bus user.

Since the failure mode for constant bus violations is now a verifier
error and not a selection failure, this moves towards a place where
we can turn on the fallback mode. The SGPR copy folding optimizations
can be left for later.
2020-01-30 08:32:35 -05:00
Matt Arsenault b4a0766c8d AMDGPU/GlobalISel: Select llvm.amdgcn.buffer.atomic.cmpswap 2020-01-30 08:22:43 -05:00
John Brawn 258d8dd76a [FPEnv][AArch64] Add lowering and instruction selection for STRICT_FP_ROUND
This gets selected to the appropriate fcvt instruction. Handling from there on
isn't fully correct yet, as we need to model fcvt reading and writing to fpsr
and fpcr.

Differential Revision: https://reviews.llvm.org/D73201
2020-01-30 12:51:25 +00:00
Sam Parker 06e12893ff [ARM][LowOverheadLoops] Skip debug values
While iterating through the loop, don't inspect any dbg values.

Differential Revision: https://reviews.llvm.org/D73688
2020-01-30 11:51:58 +00:00
John Brawn 2224407ef5 Add lowering of STRICT_FSETCC and STRICT_FSETCCS
These become STRICT_FCMP and STRICT_FCMPE, which then get selected to the
corresponding FCMP and FCMPE instructions, though the handling from there on
isn't fully correct as we don't model reads and writes to FPCR and FPSR.

Differential Revision: https://reviews.llvm.org/D73368
2020-01-30 10:40:55 +00:00
Connor Abbott ce06d50756 AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns
Summary:
The code was assuming in a few places that if there was only one exit
from the function that it was a normal return, which is invalid. It
could be an infinite loop, in which case we still need to insert the
usual fake edge so that the null export happens. This fixes shaders that
end with an infinite loop that discards.

Reviewers: arsenm, nhaehnle, critson

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71192
2020-01-30 10:55:02 +01:00
Sam Parker 6726d67bfd [ARM][LowOverheadLoops] Check scalar predicates
When trying to remove the loop iteration count, check that the
instruction will always execute.

Differential Revision: https://reviews.llvm.org/D73682
2020-01-30 09:13:04 +00:00
Amara Emerson 610f1d22f1 [AArch64][GlobalISel] During ISel try to convert G_PTR_ADD to G_ADD.
This lowering tries to look for G_PTR_ADD instructions and then converts
them to a standard G_ADD with a COPY on the source, and G_INTTOPTR on the
result. This is ok for address space 0 on AArch64 as p0 can be treated as
s64.

The motivation behind this is to expose the add semantics to the imported
tablegen patterns. We shouldn't need to check for uses being loads/stores,
because the selector works bottom up, uses before defs. By the time we
end up trying to select a G_PTR_ADD, we should have already attempted to
fold this into addressing modes and were therefore unsuccessful.

This gives some performance and code size improvements across the board.

Differential Revision: https://reviews.llvm.org/D73673
2020-01-29 23:04:52 -08:00
Matt Arsenault 7f3280ecdd AMDGPU/GlobalISel: Select permlane16/permlanex16 2020-01-29 17:55:31 -05:00
Amara Emerson c12f046eb9 [GlobalISel] Add new combine to convert scalar G_MUL to G_SHL.
For pow2 constants we should use G_SHL for pattern matching (and perf)
purposes later.

Vector support not yet implemented.
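
As a rough illustration of the scalar case (a Python sketch, not the combiner's actual code; the helper names `mul_to_shift_amount` and `combine_mul` are hypothetical), a multiply by a power-of-two constant can be rewritten as a left shift by the constant's log2:

```python
def mul_to_shift_amount(constant: int):
    """Return k if constant == 2**k (k >= 0), otherwise None."""
    if constant > 0 and constant & (constant - 1) == 0:
        return constant.bit_length() - 1
    return None


def combine_mul(x: int, constant: int, width: int = 64) -> int:
    """Model x * constant on a `width`-bit scalar, using a shift when possible."""
    mask = (1 << width) - 1
    shift = mul_to_shift_amount(constant)
    if shift is not None:
        # Power-of-two constant: the multiply is equivalent to a left shift.
        return (x << shift) & mask
    # Any other constant: keep the multiply.
    return (x * constant) & mask
```

Both paths return the same value; the point of the rewrite is only that the shift form is what later patterns expect to match.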

Differential Revision: https://reviews.llvm.org/D73659
2020-01-29 13:39:00 -08:00
Jessica Paquette 050cd443ca [AArch64][GlobalISel] Fix TBNZ/TBZ opcode selection
When the bit number is < 32, we have to use the W register variant for TB(N)Z.

This is because of the way the instruction is encoded.
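
A minimal sketch of the choice, assuming the W form covers bit numbers 0 through 31 and the X form covers 32 through 63 (the function name `tb_register_variant` is hypothetical and is not part of the selector):

```python
def tb_register_variant(bit: int) -> str:
    """Pick the register width for a TBZ/TBNZ test of bit number `bit`."""
    if not 0 <= bit < 64:
        raise ValueError("bit number must be in the range 0..63")
    # Assumption: bits 0..31 are tested through the 32-bit W register,
    # bits 32..63 through the 64-bit X register.
    return "W" if bit < 32 else "X"
```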

Differential Revision: https://reviews.llvm.org/D73660
2020-01-29 13:11:18 -08:00
Matt Arsenault d3cea95475 AMDGPU/GlobalISel: Fix tests in release build
Irritatingly, the failure output is different in release vs. debug
because the legality check is removed without asserts, so a register
ends up constrained only in release builds.
2020-01-29 12:27:16 -08:00
Amara Emerson 0da937bb5c [GlobalISel][IRTranslator] Follow convention and put constant offset of getelementptr arithmetic on RHS.
We were needlessly putting known constant values on the LHS of a G_MUL, which
is suboptimal.

Differential Revision: https://reviews.llvm.org/D73650
2020-01-29 11:37:19 -08:00
Fangrui Song 8903e61b66 [AsmPrinter][ELF] Define local aliases (.Lfoo$local) for GlobalObjects
For `MC_GlobalAddress` operands referencing **certain** GlobalObjects,
we can lower them to STB_LOCAL aliases to avoid costs brought by
assembler/linker's conservative decisions about symbol interposition:

* An assembler conservatively assumes a global default visibility symbol interposable (ELF
  semantics). So relocations in object files are needed even if the code generator assumed
  the definition exact and non-interposable.
* The relocations can cause the creation of PLT entries on some targets for -shared links.
  A linker conservatively assumes a global default visibility symbol interposable (if not
  otherwise constrained by -Bsymbolic/--dynamic-list/VER_NDX_LOCAL/etc).

"certain" refers to GlobalObjects in the intersection of
`hasExactDefinition() and !isInterposable()`: `external`, `appending`, `internal`, `private`.
Local linkages (`internal` and `private`) cannot be interposed. `appending` is for a
few objects that LLVM interprets specially. So the set effectively includes only `external`.

This patch emits STB_LOCAL aliases (.Lfoo$local) for such GlobalObjects, so that targets can lower
MC_GlobalAddress operands to STB_LOCAL aliases if applicable.
We may extend the scope and include GlobalAlias in the future.

LLVM's existing -fno-semantic-interposition behaviors give us license to do such optimizations:

* Various optimizations (ipconstprop, inliner, sccp, sroa, etc) treat normal ExternalLinkage
  GlobalObjects as non-interposable.
* Before D72197, MC resolved a PC-relative VK_None fixup to a non-local symbol at assembly time (no
  outstanding relocation), if the target is defined in the same section. Simply put, even if IR
  optimizations failed to optimize and allowed interposition for the function call in
  `void foo() {} void bar() { foo(); }`, the assembler would disallow it.

This patch sets up AsmPrinter infrastructure to make -fno-semantic-interposition more so.
With and without the patch, the object file output should be identical:
`.Lfoo$local` does not take a symbol table entry.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D73228
2020-01-29 10:58:43 -08:00
Austin Kerbow 2605adb69c [AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73585
2020-01-29 10:42:12 -08:00
Craig Topper 90c31b0f42 [X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall.
ISD::FROUND is defined to round to nearest with ties rounding
away from 0. This mode isn't supported in hardware on X86.

But as long as we aren't compiling with trapping math, we can
emulate this with trunc(X + copysign(nextafter(0.5, 0.0), X)).

We have to use nextafter to avoid some corner cases that adding
0.5 would have. For example, if X is nextafter(0.5, 0.0) it should
round to 0.0, but adding 0.5 would need one more bit of mantissa
than can be stored, so the sum rounds to 1.0. Adding nextafter(0.5, 0.0)
instead just increases the exponent by 1 and leaves the mantissa
as all 1s. The result is nextafter(1.0, 0.0), which rounds down to 0.0.
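
The trick can be tried out in a few lines of Python (a sketch on doubles only, not the X86 lowering itself; `round_half_away` is a hypothetical name, and `math.nextafter` needs Python 3.9 or later). The sketch truncates toward zero after the add: that agrees with floor for non-negative inputs and is also what negative inputs such as -2.3 need.

```python
import math


def round_half_away(x: float) -> float:
    """Round to nearest with ties away from zero, without a libcall-style round()."""
    if not math.isfinite(x):
        return x  # NaN and infinities pass through unchanged
    # Largest double strictly below 0.5, given the sign of x.
    bias = math.copysign(math.nextafter(0.5, 0.0), x)
    # Drop the fraction, then restore the sign so that e.g. -0.3 yields -0.0.
    return math.copysign(float(math.trunc(x + bias)), x)
```

With a plain 0.5 bias, the input `math.nextafter(0.5, 0.0)` would come out as 1.0; with the nextafter bias it comes out as 0.0, while 0.5 itself still rounds to 1.0.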

Technically this requires -fno-trapping-math which isn't our default.
But if we care about exceptions we should be using constrained
intrinsics. Constrained intrinsics would use STRICT_FROUND which
won't go through this code.

Fixes PR42195.

Differential Revision: https://reviews.llvm.org/D73607
2020-01-29 09:10:02 -08:00
Jay Foad d07a789579 [AMDGPU] Cluster FLAT instructions with both vaddr and saddr
Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73634
2020-01-29 17:01:35 +00:00
Craig Topper e5edd641fd [X86] Use a shorter sequence to implement FLT_ROUNDS
This code needs to map from the FPCW 2-bit encoding for rounding mode to the 2-bit encoding defined for FLT_ROUNDS. The previous implementation did some clever swapping of bits and adding 1 modulo 4 to do the mapping.

This patch instead uses an 8-bit immediate as a lookup table of four 2-bit values. Then we use the 2-bit FPCW encoding to index the lookup table by using a right shift and an AND. This requires extracting the 2-bit value from FPCW and multiplying it by 2 to make it usable as a shift amount, but it still results in less code.
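
The lookup-table idea can be sketched in Python as follows. The assumptions here are mine rather than taken from the patch text: the x87 rounding-control field sits in FPCW bits 11:10 (00 nearest, 01 down, 10 up, 11 toward zero), FLT_ROUNDS uses 0 toward zero, 1 nearest, 2 upward, 3 downward, and the resulting table constant is `0x2D`. The function name is hypothetical.

```python
# Four 2-bit FLT_ROUNDS values packed into one byte, indexed by the FPCW RC field:
#   RC=0 (nearest) -> 1, RC=1 (down) -> 3, RC=2 (up) -> 2, RC=3 (toward zero) -> 0
FLT_ROUNDS_TABLE = 0b00_10_11_01  # == 0x2D


def flt_rounds_from_fpcw(fpcw: int) -> int:
    """Map an x87 control word to the FLT_ROUNDS encoding via a packed table."""
    # Isolate bits 11:10 and shift by 9 (not 10) so the result is RC * 2,
    # i.e. directly usable as a shift amount into the 2-bit-wide table.
    shift = (fpcw & 0xC00) >> 9
    return (FLT_ROUNDS_TABLE >> shift) & 0x3
```

For the default control word `0x037F` (round to nearest) this yields 1.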

Differential Revision: https://reviews.llvm.org/D73599
2020-01-29 08:56:33 -08:00
Matt Arsenault 62129878a6 AMDGPU/GlobalISel: Fix tablegen selection for scalar bin ops
Fixes selection for scalar G_SMULH/G_UMULH. Also switches to using
tablegen selected add/sub, which switch to the signed version of the
opcode. This matches the current DAG behavior. We can't drop the
manual selection for add/sub yet, because it's still used both for VALU
add/sub and for G_PTR_ADD.
2020-01-29 08:55:54 -08:00
Kazushi (Jam) Marukawa fef80a2946 [VE] (conditional) branch modification & isel patterns
Summary:
InstInfo for branch modification, (conditional) branch isel patterns and tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D73632
2020-01-29 17:40:57 +01:00
Matt Arsenault b63629a58d GlobalISel: Fix mask computation in lowerInsert
This is supposed to be the high bit index, not the width. Use the
wrapping form of getBitsSet and avoid the bitflip.
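
To make the distinction concrete, here is a small Python model of a wrapping bit-range mask and of an insert built on it. It is only loosely modelled on the wrapping form of getBitsSet (set bits from `lo` up to but excluding `hi`, wrapping around when `lo > hi`); the function names and the exact edge-case behaviour are assumptions of this sketch.

```python
def bits_set_with_wrap(num_bits: int, lo: int, hi: int) -> int:
    """Mask with bits [lo, hi) set; if lo > hi the range wraps through the top bit."""
    full = (1 << num_bits) - 1
    if lo == hi:
        return full  # assumed convention: an empty wrapped range means "all bits"
    if lo < hi:
        return ((1 << hi) - (1 << lo)) & full
    # Wrapped: bits [lo, num_bits) plus bits [0, hi).
    return ((1 << hi) - 1) | (full & ~((1 << lo) - 1))


def insert_bits(dst: int, src: int, offset: int, width: int, num_bits: int = 32) -> int:
    """Insert the low `width` bits of src into dst at bit `offset`."""
    # The bits to keep are everything outside [offset, offset + width).
    # Note the first argument is the high bit index (offset + width), not the width.
    keep = bits_set_with_wrap(num_bits, offset + width, offset)
    field = (src & ((1 << width) - 1)) << offset
    return (dst & keep) | field
```

Passing `width` instead of `offset + width` gives the wrong mask whenever `offset` is non-zero, and building the keep-mask directly this way needs no separate bit flip.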
2020-01-29 08:25:36 -08:00
Jay Foad 0d7bd34312 [MachineScheduler] Ignore artificial edges when forming store chains
Summary:
BaseMemOpClusterMutation::apply forms store chains by looking for
control (i.e. non-data) dependencies from one mem op to another.

In the test case, clusterNeighboringMemOps successfully clusters the
loads, and then adds artificial edges to the loads' successors as
described in the comment:
      // Copy successor edges from SUa to SUb. Interleaving computation
      // dependent on SUa can prevent load combining due to register reuse.
The effect of this is that *data* dependencies from one load to a store
are copied as *artificial* dependencies from a different load to the
same store.

Then when BaseMemOpClusterMutation::apply looks at the stores, it finds
that some of them have a control dependency on a previous load, which
breaks the chains and means that the stores are not all considered part
of the same chain and won't all be clustered.

The fix is to only consider non-artificial control dependencies when
forming chains.
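
A simplified Python model of the chain-forming step may help here. It is a sketch under my own assumptions, not the scheduler's data structures: `Dep`, `group_into_chains`, and the use of -1 for "no chain predecessor" are all hypothetical. Each memory op joins the chain named by its first control predecessor, and the fix corresponds to skipping artificial edges in that search.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Dep:
    pred: int                  # node number of the predecessor
    is_ctrl: bool              # control (non-data) dependency
    is_artificial: bool = False


def group_into_chains(preds_by_node, ignore_artificial=True):
    """Group nodes by the first qualifying control predecessor (-1 if none)."""
    chains = defaultdict(list)
    for node, preds in preds_by_node.items():
        chain_id = -1
        for dep in preds:
            if dep.is_ctrl and not (ignore_artificial and dep.is_artificial):
                chain_id = dep.pred
                break
        chains[chain_id].append(node)
    return dict(chains)


# Three stores; two of them picked up artificial edges from clustered loads 1 and 2.
stores = {
    10: [Dep(pred=0, is_ctrl=False)],
    11: [Dep(pred=1, is_ctrl=True, is_artificial=True)],
    12: [Dep(pred=2, is_ctrl=True, is_artificial=True)],
}
```

With `ignore_artificial=True` all three stores land in one chain; with `False` they are split into three, which is the behaviour described above as preventing clustering.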

Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71717
2020-01-29 16:23:01 +00:00
Matt Arsenault 96352e0a1b AMDGPU/GlobalISel: Handle LDS with relocations case 2020-01-29 08:18:55 -08:00
Connor Abbott 87d98c1495 AMDGPU: Fix handling of infinite loops in fragment shaders
Summary:
Due to the fact that kill is just a normal intrinsic, even though it's
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there's no obvious place to make the loop branch to, it just makes it
return immediately, which skips the exports that are supposed to happen
at the end and hangs the GPU if all the threads end up being killed.

While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.

This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70781
2020-01-29 17:13:25 +01:00
Matt Arsenault 94e8ef4d4c AMDGPU/GlobalISel: Look through copies for source modifiers
When all VOP instructions are legalized to VGPRs, any SGPR source
modifiers will have a copy in the way.
2020-01-29 08:08:13 -08:00
Matt Arsenault 752e2e245a AMDGPU/GlobalISel: Rewrite fadd select tests
Convert to the style most others use with one test instruction per
function, and use an implicit use to ensure the result register class
is constrained.

Change-Id: I6109148b0e3c80aa5535796a37abca583c19a936
2020-01-29 07:49:38 -08:00
Connor Abbott 08b205bb48 Revert "AMDGPU: Fix handling of infinite loops in fragment shaders"
This reverts commit 0994c485e6.
2020-01-29 16:14:52 +01:00
Connor Abbott 13ab22ab22 Revert "AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns"
This reverts commit 323bfde20c.
2020-01-29 16:14:49 +01:00
Kazushi (Jam) Marukawa 0bec0e7151 [VE] udiv/sdiv/urem/srem/mul isel patterns
Summary:
udiv/sdiv/urem/srem/mul integer isel patterns and tests.
Pretend for now that integer division is always cheap in HW.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D73623
2020-01-29 15:59:50 +01:00
Matt Arsenault 02adfb5155 AMDGPU/GlobalISel: Manually select scalar f64 G_FNEG
This should be no problem to support with a pattern, but it turns out
there are just too many yaks to shave. The main problem is in the DAG
emitter, which I have no desire to sink effort into fixing.

If we had a bit to disable patterns in the DAG importer, fixing the
GlobalISelEmitter is more manageable.
2020-01-29 06:49:16 -08:00
Matt Arsenault c5c1bb3374 GlobalISel: Lower G_WRITE_REGISTER 2020-01-29 06:48:24 -08:00
Connor Abbott 323bfde20c AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns
Summary:
The code was assuming in a few places that if there was only one exit
from the function that it was a normal return, which is invalid. It
could be an infinite loop, in which case we still need to insert the
usual fake edge so that the null export happens. This fixes shaders that
end with an infinite loop that discards.

Reviewers: arsenm, nhaehnle, critson

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71192
2020-01-29 15:08:46 +01:00
Connor Abbott 0994c485e6 AMDGPU: Fix handling of infinite loops in fragment shaders
Summary:
Due to the fact that kill is just a normal intrinsic, even though it's
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there's no obvious place to make the loop branch to, it just makes it
return immediately, which skips the exports that are supposed to happen
at the end and hangs the GPU if all the threads end up being killed.

While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.

This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70781
2020-01-29 15:08:46 +01:00
Sanne Wouda 2939fc13c8 [AArch64] Add IR intrinsics for sq(r)dmulh_lane(q)
Summary:
Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h),
are represented in LLVM IR as a (by vector) sqdmulh and a vector of (repeated)
indices, like so:

   %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
   %vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)

When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.

This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade-off materialising several constants for a single vector load from the
constant pool, like so:

   int16x8_t v = {2,3,4,5,6,7,8,9};
   a = vqdmulh_laneq_s16(a, v, 0);
   b = vqdmulh_laneq_s16(b, v, 1);
   c = vqdmulh_laneq_s16(c, v, 2);
   d = vqdmulh_laneq_s16(d, v, 3);
   [...]
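
For readers unfamiliar with the operation, the per-element arithmetic of these calls can be modelled in Python as a signed saturating doubling multiply that returns the high half. This is a sketch of the arithmetic only; the helper names are hypothetical and vectors are plain lists of 16-bit values.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sqdmulh_s16(a: int, b: int) -> int:
    """Signed saturating doubling multiply, high half, on 16-bit elements."""
    high = (2 * a * b) >> 16  # arithmetic shift keeps the sign
    # Only INT16_MIN * INT16_MIN overflows; saturate instead of wrapping.
    return max(INT16_MIN, min(INT16_MAX, high))


def sqdmulh_lane_s16(a_vec, v, lane):
    """Multiply every element of a_vec by the single element v[lane]."""
    return [sqdmulh_s16(a, v[lane]) for a in a_vec]
```

The lane form reads one element of the shared vector per call, which is what lets several constants live in a single register.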

In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.

We could teach the compiler to recover the lane variants, but this would likely
require its own pass.  (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)

This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optimization and into instruction
selection:
- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq.

These 'lane' variants need an additional register class.  The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.

Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.

This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).

Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71469
2020-01-29 13:25:23 +00:00
Kazushi (Jam) Marukawa 6b587ee23c [VE] Isel patterns for fp32/64 and i32/64 conversion
Summary:
fp32/64 <> signed/unsigned i32/64 conversion isel patterns and tests

(This patch depends on `fsub` implemented by https://reviews.llvm.org/D73540 )

Reviewers: arsenm, craig.topper, rengolin, k-ishizaka

Reviewed By: arsenm

Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits

Tags: #ve, #llvm

Differential Revision: https://reviews.llvm.org/D73544
2020-01-29 14:10:22 +01:00
Kerry McLaughlin 3cf80822a9 [AArch64][SVE] Add SVE2 intrinsics for uniform DSP operations
Summary:
Implements the following intrinsics:
 - sqrdmlah, sqrdmlsh, sqrdmulh & sqdmulh
 - [s|u]hadd, [s|u]hsub, [s|u]rhadd & [s|u]hsubr
 - urecpe, ursqrte, sqabs & sqneg

Reviewers: sdesmalen, efriedma, dancgr, cameron.mcinally

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73493
2020-01-29 12:03:15 +00:00
Sam Parker dc0d84f09e [NFC][ARM] Add test 2020-01-29 06:59:21 -05:00
Kerry McLaughlin bd33a46213 [AArch64][SVE] Add SVE2 intrinsics for pairwise arithmetic
Summary:
Implements the following intrinsics:
 - addp
 - smaxp, sminp, umaxp & uminp
 - sadalp & uadalp

Reviewers: dancgr, efriedma, sdesmalen, c-rhodes

Reviewed By: c-rhodes

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73347
2020-01-29 10:31:31 +00:00
Kazushi (Jam) Marukawa f6bb58542a [VE] fp32/64 fadd/fsub/fdiv/fmul isel patterns
Summary: fp32/64 fadd/fsub/fdiv/fmul isel patterns and tests.

Reviewers: arsenm, craig.topper, rengolin, k-ishizaka

Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D73540
2020-01-29 11:00:56 +01:00
Fangrui Song bc15bf66dc [X86] matchAdd: don't fold a large offset into a %rip relative address
For `ret i64 add (i64 ptrtoint (i32* @foo to i64), i64 1701208431)`,

```
X86DAGToDAGISel::matchAdd
  ...
// AM.setBaseReg(CurDAG->getRegister(X86::RIP, MVT::i64));
  if (!matchAddressRecursively(N.getOperand(0), AM, Depth+1) &&
// Try folding offset but fail; there is a symbolic displacement, so offset cannot be too large
      !matchAddressRecursively(Handle.getValue().getOperand(1), AM, Depth+1))
    return false;
  ...
  // Try again after commuting the operands.
// AM.Disp = Val; foldOffsetIntoAddress() does not know there will be a symbolic displacement
  if (!matchAddressRecursively(Handle.getValue().getOperand(1), AM, Depth+1) &&
// AM.setBaseReg(CurDAG->getRegister(X86::RIP, MVT::i64));
      !matchAddressRecursively(Handle.getValue().getOperand(0), AM, Depth+1))
// Succeeded! Produced leaq sym+disp(%rip),...
    return false;
```

`foldOffsetIntoAddress()` currently does not know there is a symbolic
displacement and can fold a large offset.

The produced `leaq sym+disp(%rip), %rax` instruction is relocated by
an R_X86_64_PC32. If disp is large and sym+disp-rip>=2**31, there
will be a relocation overflow.
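
The overflow condition amounts to a signed 32-bit range check, which the following Python sketch spells out. The addresses are made up for illustration and `fits_pc32` is a hypothetical helper, not an LLVM or linker function.

```python
def fits_pc32(sym: int, disp: int, rip: int) -> bool:
    """True if sym + disp - rip is representable as a signed 32-bit value."""
    value = sym + disp - rip
    return -(1 << 31) <= value < (1 << 31)


# Hypothetical layout: the symbol sits 0x20000000 bytes past the instruction.
RIP = 0x400000
SYM = RIP + 0x20000000
```

With these made-up addresses a small displacement fits, but the displacement 1701208431 from the example pushes the sum past 2**31 and would overflow.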

This approach is still not elegant. Unfortunately the isRIPRelative
interface is a bit clumsy. I tried several solutions and eventually
picked this one.

Differential Revision: https://reviews.llvm.org/D73606
2020-01-28 22:30:52 -08:00
Derek Schuff d966bf830f [WebAssembly] Preserve debug frame base information through register coloring
Register coloring can re-assign virtual registers. This patch contains two fixes:

  1. When the frame base register is colored, update the DwarfFrameBase
     accordingly.
  2. When the frame base register is stackified, do not attempt to encode
     DW_AT_frame_base as a local. In the future we will presumably want to
     handle this case better, but for now we can emit worse debug info rather
     than crashing.

Differential Revision: https://reviews.llvm.org/D73581
2020-01-28 16:58:15 -08:00
Craig Topper 92ecc306af [X86] Add test case for llvm.flt.rounds 2020-01-28 16:27:59 -08:00
Danilo Carvalho Grael 1f85dfb2af [AArch64][SVE] Add SVE2 mla indexed intrinsics.
Summary:
Add SVE2 mla indexed intrinsics:
- smlalb, smlalt, umlalb, umlalt, smlslb, smlslt, umlslb, umlslt.

Reviewers: efriedma, sdesmalen, dancgr, cameron.mcinally, c-rhodes, rengolin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73576
2020-01-28 17:15:33 -05:00
Jessica Paquette dba29f7c3b [AArch64][GlobalISel] Fold G_AND into G_BRCOND
When the G_BRCOND is fed by an eq or ne G_ICMP, it may be possible to fold a
G_AND into the branch by producing a tbnz/tbz instead.

This happens when

  1. We have a ne/eq G_ICMP feeding into the G_BRCOND
  2. The G_ICMP is a comparison against 0
  3. One of the operands of the G_AND is a power of 2 constant

This is very similar to the code in AArch64TargetLowering::LowerBR_CC.
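
The three conditions can be sketched as a small Python predicate. This is an illustration only: the function name, the string predicates, and the returned tuple are hypothetical, and the mapping of ne to tbnz and eq to tbz is my reading of the fold.

```python
def fold_and_into_branch(pred: str, cmp_rhs: int, and_mask: int):
    """Return (opcode, bit) if the compare-and-branch can become a test-bit branch."""
    # 1. The compare feeding the branch must be eq or ne.
    if pred not in ("eq", "ne"):
        return None
    # 2. It must compare against 0.
    if cmp_rhs != 0:
        return None
    # 3. The AND mask must be a power-of-two constant (exactly one bit set).
    if and_mask <= 0 or and_mask & (and_mask - 1):
        return None
    bit = and_mask.bit_length() - 1
    # (x & (1 << bit)) != 0 branches when the bit is set, == 0 when it is clear.
    return ("tbnz" if pred == "ne" else "tbz", bit)
```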

Add opt-and-tbnz-tbz to test this.

Differential Revision: https://reviews.llvm.org/D73573
2020-01-28 14:00:31 -08:00
Michael Spang a2fb2c0ddc [GlobalMerge] Preserve symbol visibility when merging globals
Symbols created for merged external global variables have default
visibility. This can break programs when compiling with -Oz
-fvisibility=hidden as symbols that should be hidden will be exported at
link time.

Differential Revision: https://reviews.llvm.org/D73235
2020-01-28 13:26:18 -08:00
Roland McGrath 2b0e6fe2e2 [Fuchsia] Remove aarch64-fuchsia target-specific -mcmodel=kernel
Under --target=aarch64-fuchsia, -mcmodel=kernel has the effect of
(the default) -mcmodel=small plus -mtp=el1 (which did not exist when
this behavior was added). Fuchsia's kernel now uses -mtp=el1
directly instead of -mcmodel=kernel, so remove this special support.

Patch By: mcgrathr

Differential Revision: https://reviews.llvm.org/D73409
2020-01-28 11:32:08 -08:00
Amara Emerson 14c2cf8e18 [AArch64][GlobalISel] Don't bail out of the select(cmp(a, b)) -> csel optimization with multiple users.
It can still be beneficial to do the optimization if the result of the compare
is used by *another* select.

Differential Revision: https://reviews.llvm.org/D73511
2020-01-28 10:09:03 -08:00
Jonathan Roelofs 7f93ff58e1 [llvm] Fix broken cases of 'CHECK[^:]*$' in tests 2020-01-28 09:52:59 -07:00
Victor Huang 4b414d9ade [PowerPC][Future] Add pld and pstd to future CPU
Add the prefixed instructions pld and pstd to future CPU. These are load and
store instructions that require new operand types that are 34 bits. This patch
adds the two instructions as well as the operand types required.

Note that this patch also makes a minor change to tablegen to account for the
fact that some instructions are going to require shifts greater than 31 bits
for the new 34 bit instructions.

Differential Revision: https://reviews.llvm.org/D72574
2020-01-28 08:23:29 -06:00