Commit Graph

55266 Commits

Author SHA1 Message Date
Fangrui Song aa708763d3 [MC] Add parameter `Address` to MCInstPrinter::printInst
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.

It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.

Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.

The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.

In any case, downstream projects which don't know `Address` can pass 0 as
the argument.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72172
2020-01-06 20:42:22 -08:00
Matt Arsenault dc7b84c66c AMDGPU/GlobalISel: Fix unused variable warning in release 2020-01-06 22:31:33 -05:00
Matt Arsenault 452f6243c9 AMDGPU: Select llvm.amdgcn.interp.p2.f16 directly
This will enable automatic GlobalISel support in a future commit.
2020-01-06 20:34:21 -05:00
Matt Arsenault e93b1ffc84 AMDGPU: Use default operands for clamp/omod
We have a lot of complex pattern variants that just set the source
modifiers that are really handled, and then set the output modifiers
to 0. We're unlikely to ever match output modifiers from the use
instruction side, and we already match clamp/omod in a separate pass.
2020-01-06 20:22:13 -05:00
Heejin Ahn 21f7b36209 [WebAssembly] Fix landingpad-only case in Emscripten EH
Summary:
Previously we didn't set `Changed` to true when there are only landing
pads but not invokes. This fixes it and we set `Changed` to true
whenever we have landing pads. (There can't be invokes without landing
pads, so that case is covered too)

The test case for this has to be a separate file because this pass is a
`ModulePass` and `Changed` is computed based on the whole module.

Reviewers: tlively

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72308
2020-01-06 17:02:32 -08:00
Matt Arsenault 52afc93c38 AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTER 2020-01-06 19:16:32 -05:00
Matt Arsenault d4c9e13324 AMDGPU/GlobalISel: Select G_UADDE/G_USUBE 2020-01-06 18:27:52 -05:00
Matt Arsenault 4e85ca9562 AMDGPU/GlobalISel: Replace handling of boolean values
This solves selection failures with generated selection patterns,
which would fail due to inferring the SGPR reg bank for virtual
registers with a set register class instead of VCC bank. Use
instruction selection would constrain the virtual register to a
specific class, so when the def was selected later the bank no longer
was set to VCC.

Remove the SCC reg bank. SCC isn't directly addressable, so it
requires copying from SCC to an allocatable 32-bit register during
selection, so these might as well be treated as 32-bit SGPR values.

Now any scalar boolean value that will produce an outupt in SCC should
be widened during RegBankSelect to s32. Any s1 value should be a
vector boolean during selection. This makes the vcc register bank
unambiguous with a normal SGPR during selection.

Summary of how this should now work:

- G_TRUNC is always a no-op, and never should use a vcc bank result.

- SALU boolean operations should be promoted to s32 in RegBankSelect
  apply mapping

- An s1 value means vcc bank at selection. The exception is for
  legalization artifacts that use s1, which are never VCC. All other
  contexts should infer the VCC register classes for s1 typed
  registers. The LLT for the register is now needed to infer the
  correct register class. Extensions with vcc sources should be
  legalized to a select of constants during RegBankSelect.

- Copy from non-vcc to vcc ensures high bits of the input value are
  cleared during selection.

- SALU boolean inputs should ensure the inputs are 0/1. This includes
  select, conditional branches, and carry-ins.

There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT
selection ignores the usual register-bank from register class
functions, and can't handle truncates with VCC result banks. I think
this is OK, since the artifacts are specially treated anyway. This
does require some care to avoid producing cases with vcc. There will
also be no 100% reliable way to verify this rule is followed in
selection in case of register classes, and violations manifests
themselves as invalid copy instructions much later.

Standard phi handling also only considers the bank of the result
register, and doesn't insert copies to make the source banks
match. This doesn't work for vcc, so we have to manually correct phi
inputs in this case. We should add a verifier check to make sure there
are no phis with mixed vcc and non-vcc register bank inputs.

There's also some duplication with the LegalizerHelper, and some code
which should live in the helper. I don't see a good way to share
special knowledge about what types to use for intermediate operations
depending on the bank for example. Using the helper to replace
extensions with selects also seems somewhat awkward to me.

Another issue is there are some contexts calling
getRegBankFromRegClass that apparently don't have the LLT type for the
register, but I haven't yet run into a real issue from this.

This also introduces new unnecessary instructions in most cases, since
we don't yet try to optimize out the zext when the source is known to
come from a compare.
2020-01-06 18:26:42 -05:00
Matt Arsenault f3de8ab5cc GlobalISel: Implement lower for G_INTRINSIC_ROUND
Mostly copied from AMDGPU lowering implementation, except used
G_SITOFP instead of directly creating a select on -1.0, 0.0.
2020-01-06 18:26:42 -05:00
Philip Reames 08d17cb065 [X86] Move an enum definition into a header to simplify future patches [NFC] 2020-01-06 15:14:42 -08:00
Jinsong Ji 24ee4edee8 [PowerPC][NFC] Rename record instructions to use _rec suffix instead of o
We use o suffix to indicate record form instuctions,
(as it is similar to dot '.' in mne?)

This was fine before, as we did not support XO-form.
However, with https://reviews.llvm.org/D66902,
we now have XO-form support.

It becomes confusing now to still use 'o' for record form,
and it is weird to have something like 'Oo' .

This patch rename all 'o' instructions to use '_rec' instead.
Also rename `isDot` to `isRecordForm`.

Reviewed By: #powerpc, hfinkel, nemanjai, steven.zhang, lkail

Differential Revision: https://reviews.llvm.org/D70758
2020-01-06 22:27:07 +00:00
Matt Arsenault ee6b8722ff GlobalISel: Fix unsupported legalize action
This would complain about invalid legalizer rules otherwise.

Mark some operations as unsupported for AMDGPU. This currently seems
to produce the same legalize error as when no rules are defined, but
eventually this should produce a proper user facing error.
2020-01-06 17:21:51 -05:00
Matt Arsenault 7f2db2917d AMDGPU: Fix legalizing f16 fpow
The existing test only covered one case for r600. The use of
mul_legacy also looks suspicious to me, but leave it for now. The
patterns are also not making use of source modifiers.
2020-01-06 17:21:51 -05:00
Matt Arsenault a506efff18 AMDGPU: Use ImmLeaf
This solves one GlobalISel importer error, but the pattern still fails
for another reason.
2020-01-06 17:21:51 -05:00
Matt Arsenault 14d25052a2 AMDGPU: Use ImmLeaf for inline immediate predicates 2020-01-06 17:21:51 -05:00
Craig Topper 6a0564adcf [X86] Improve v4i32->v4f64 uint_to_fp for AVX1/AVX2 targets.
Use zext+or+fsub to do the conversion. Similar to D71971.

Differential Revision: https://reviews.llvm.org/D71971
2020-01-06 14:07:35 -08:00
Evgenii Stepanov 40a80a0a19 Lower TAGPstack with negative offset to SUBG.
Summary:
This never really occurs in the current codegen, so only a MIR test is
possible.

Reviewers: ostannard, pcc

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72123
2020-01-06 11:48:35 -08:00
Amara Emerson df3f4e0d77 [X86] Fix an 8 bit testb being selected when folding a volatile i32 load pattern.
Differential Revision: https://reviews.llvm.org/D71581
2020-01-06 11:46:42 -08:00
Jinsong Ji e29a2e6be4 [PowerPC][LoopVectorize] Extend getRegisterClassForType to consider double and other floating point type
In https://reviews.llvm.org/D67148, we use isFloatTy to test floating
point type, otherwise we return GPRRC.
So 'double' will be classified as GPRRC, which is not accurate.

This patch covers other floating point types.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D71946
2020-01-06 18:44:59 +00:00
diggerlin 83ec9b51ed [AIX] Use csect reference for function address constants
SUMMARY:
We currently emit a reference for function address constants as labels;
for example:

foo_ptr:
.long foo
however, there may be no such label in the case where the function is
undefined. Although the label exists when the function is defined, we
will (to be consistent) also use a csect reference in that case.

Address one comment
https://reviews.llvm.org/D71144#inline-653255

Reviewers: daltenty,hubert.reinterpretcast,jasonliu,Xiangling_L
Subscribers: cebowleratibm, wuzish, nemanjai

Differential Revision: https://reviews.llvm.org/D71144
2020-01-06 11:45:00 -05:00
David Green f88d52728b [ARM] Use the correct opcodes for Thumb2 segmented stack frame lowering
The segmented stack lowering code appears to be using ARM opcodes under
Thumb2. The MRC opcode will be the same for Thumb and ARM, but t2LDR
seems wrong. Either way, using the correct thumb vs arm opcodes is more
correct.

Differential Revision: https://reviews.llvm.org/D72074
2020-01-06 16:38:49 +00:00
David Green 0eb981b8ce [ARM] Use correct TRAP opcode for thumb in FastISel
We were previously unconditionally using the ARM::TRAP opcode, even
under Thumb. My understanding is that these are essentially the same
thing (they both result in a trap under Thumb), but the ARM::TRAP opcode
is marked as requiring IsARM, so it is more correct to use ARM::tTRAP.

Differential Revision: https://reviews.llvm.org/D72075
2020-01-06 16:38:49 +00:00
diggerlin 61b5e727b7 [AIX] Use csect reference for function address constants
SUMMARY:
We currently emit a reference for function address constants as labels;
for example:

foo_ptr:
.long foo
however, there may be no such label in the case where the function is
undefined. Although the label exists when the function is defined, we
will (to be consistent) also use a csect reference in that case.

Reviewers: daltenty,hubert.reinterpretcast,jasonliu,Xiangling_L
Subscribers: cebowleratibm, wuzish, nemanjai

Differential Revision: https://reviews.llvm.org/D71144
2020-01-06 11:38:22 -05:00
Simon Pilgrim ea2c159f96 [AMDGPU] Fix "use of uninitialized variable" static analyzer warning. NFCI.
Add "unreachable" default case to AMDGPUTargetStreamer::getArchNameFromElfMach
2020-01-06 16:36:56 +00:00
Simon Pilgrim 5bcc747393 Fix "use of uninitialized variable" static analyzer warnings. NFCI.
Add "unreachable" default cases like we do for the other switch()s in X86MCInstLower::Lower
2020-01-06 16:36:56 +00:00
Simon Tatham 34817e04fe [ARM,MVE] Fix many signedness errors in MVE intrinsics.
Summary:
Running an end-to-end test last week I noticed that a lot of the ACLE
intrinsics that operate differently on vectors of signed and unsigned
integers were ending up generating the signed version of the
instruction unconditionally. This is because the IR intrinsics had no
way to distinguish signed from unsigned: the LLVM type system just
calls them both `v8i16` (or whatever), so you need either separate
intrinsics for signed and unsigned, or a flag parameter that tells
ISel which one to choose.

This patch fixes all the problems of that kind that I've noticed, by
adding an i32 flag parameter to many of the IR intrinsics which is set
to 1 for unsigned (matching the existing practice in cases where we
got it right), and conditioning all the isel patterns on that flag. So
the fundamental change is in `IntrinsicsARM.td`, changing the
low-level IR intrinsics API; there are knock-on changes in
`arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the
modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the
new unsigned flags). The rest of this patch is boringly updating tests.

Reviewers: dmgreen, miyuki, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72270
2020-01-06 16:33:16 +00:00
Simon Tatham b99ef32d04 [ARM,MVE] Generate the right instruction for vmaxnmq_m_f16.
Summary:
Due to a copy-paste error in the isel patterns, the predicated version
of this intrinsic was expanding to the `VMAXNMT.F32` instruction
instead of `VMAXNMT.F16`. Similarly for vminnm.

Reviewers: dmgreen, miyuki, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72269
2020-01-06 16:28:20 +00:00
Matt Arsenault e4464bf3d4 AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR 2020-01-06 11:19:33 -05:00
Matt Arsenault f1c85ecdfc AMDGPU/GlobalISel: Select more G_EXTRACTs correctly
This assumed a 32-bit extract size, which would produce invalid copies
with 64-bit extracts. Handle the easy case. Ideally we would have a
way to get the proper subreg index for any 32-bit offset, but there
should probably be a tablegenerated way of getting the subreg index
for any size and offset.
2020-01-06 11:10:13 -05:00
Simon Pilgrim 5d986a68a5 [CostModel][X86] Add missing scalar i64->f32 uitofp costs 2020-01-06 13:17:02 +00:00
James Henderson d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
Sjoerd Meijer 0efc9e5a8c [ARM][MVE] More MVETailPredication debug messages. NFC.
I've added a few more debug messages to MVETailPredication because I wanted to
trace better which instructions are added/removed. And while I was at it, I
factored out one function which I thought was clearer, and have added some
comments to describe better the flow between MVETailPredication and
ARMLowOverheadLoops.

Differential Revision: https://reviews.llvm.org/D71549
2020-01-06 09:56:02 +00:00
Shengchen Kan 5173bfcbc4 Add interface emitPrefix for MCCodeEmitter
Differential Revision: https://reviews.llvm.org/D72047
2020-01-06 17:51:14 +08:00
Ehud Katz c5fb73c5d1 [APFloat] Add recoverable string parsing errors to APFloat
Implementing the APFloat part in PR4745.

Differential Revision: https://reviews.llvm.org/D69770
2020-01-06 10:09:01 +02:00
Craig Topper 95840866b7 [X86] Improve v2i64->v2f32 and v4i64->v4f32 uint_to_fp on avx and avx2 targets.
Summary:
Based on Simon's D52965, but improved to handle strict fp and improve some of the shuffling.

Rather than use v2i1/v4i1 and let type legalization continue, just generate all the code with legal types and use an explicit shuffle.

I also added an explicit setcc to the v4i64 code to match the semantics of vselect which doesn't just use the sign bit. I'm also using a v4i64->v4i32 truncate instead of the shuffle in Simon's original code. With the setcc this will become a pack.

Future work can look into using X86ISD::BLENDV and a different shuffle that only moves the sign bit.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71956
2020-01-05 17:44:08 -08:00
Liu, Chen3 ca3bf289a7 [NFC] Modify the format:
Drop the else since we alerady returned in the if.
2020-01-06 09:35:19 +08:00
Fangrui Song 5511861e6d [MC][ARM] Delete MCSection::HasData and move SHF_ARM_PURECODE logic to ARMELFObjectWriter::addTargetSectionFlags
This simplifies the generic interface and also makes SHF_ARM_PURECODE
more robust (fixes a TODO). Inspecting MCDataFragment contents covers
more cases than MCObjectStreamer::EmitBytes.
2020-01-05 14:20:34 -08:00
Simon Pilgrim e3bd011890 [X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes (PR43660)
Attempt to use combineLogicBlendIntoConditionalNegate for (select M, (sub 0, X), X) -> (sub (xor X, M), M)

We limit this to cases that can't easily replace the VSELECT with a shuffle (non-constant masks) or where a BLENDV is likely to occur (which tends to result in slower codegen).
2020-01-05 18:50:44 +00:00
Simon Pilgrim 6a6e6f04ec [X86] Move combineLogicBlendIntoConditionalNegate before combineSelect. NFCI.
Updates function order in preparation of future fix for PR43660
2020-01-05 17:17:41 +00:00
Simon Pilgrim 3db84f142a [X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END (NFC)
Silences a copy+paste analyzer warning - all they are doing are inserting NOOPs in exactly the same way.
2020-01-05 15:24:57 +00:00
David Green fb8c9a339a [ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors
This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing
the target independent code to handle more folds in more situations (for
example if the fast math flags are present, but the global
AllowFPOpFusion option isnt). It also splits apart the HasSlowFPVMLx
into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately
if needed.

Differential Revision: https://reviews.llvm.org/D72139
2020-01-05 11:24:04 +00:00
David Green c15a56f61a [ARM] Fill in FP16 FMA patterns
This adds fp16 variants of all the fma patterns in the ARM backend.

Differential Revision: https://reviews.llvm.org/D72138
2020-01-05 11:24:04 +00:00
Matt Arsenault d12f2a2998 GlobalISel: Scalarize all division operations
This only handled G_SDIV, but they all are trivially scalarizable.

Also define placeholder AMDGPU division legalizer rules.
2020-01-04 13:47:10 -05:00
Florian Hahn b8a3c34eee Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)."
This reverts commit 51ef53f3bd, as it
breaks some bots.
2020-01-04 18:44:38 +00:00
Florian Hahn 51ef53f3bd [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537
2020-01-04 18:29:35 +00:00
Florian Hahn 99f74a64a2 [SCEV] Remove unused ScalarEvolutionExpander.h includes (NFC). 2020-01-04 18:29:35 +00:00
Matt Arsenault 4e972224c4 AMDGPU/GlobalISel: Refine SMRD selection rules
Fix selecting these for volatile global loads, and ensure the loads
are constant enough.
2020-01-04 12:40:35 -05:00
Matt Arsenault d9b5063b25 AMDGPU/GlobalISel: Legalize more odd sized loads
The attempts to widen sufficently aligned, odd sized loads wasn't
consistently applied.
2020-01-04 12:38:39 -05:00
Matt Arsenault 5fb59f16e2 AMDGPU/GlobalISel: Assume vcc phis for any vcc input
This produces more intelligible looking results, more comparabble to
the DAG output in the simplest cases. This is probably wrong in
complex control flow, but RegBankSelect doesn't attempt analyzing if
this is on a masked path for selecting the bank yet.
2020-01-04 12:38:11 -05:00
Matt Arsenault 5eed4e2664 AMDGPU/GlobalISel: Implement applyMappingImpl less incorrectly
We're checking the current register bank of the registers in the
instruction, but the mapping may have inserted cross bank copies and
is expecting to replace the registers.

We mostly get away with this currently, because VGPR->SGPR copies are
illegal, and we assume this won't happen. In a future change, we'll
start relying on more cross register bank copies being inserted, and
this starts to break down.
2020-01-04 12:36:05 -05:00