Commit Graph

130588 Commits

Author SHA1 Message Date
Mircea Trofin 14a16fae43 [llvm][NFC] Rename CallAnalyzer::onCommonInstructionSimplification
Summary:
It is called when instructions aren't simplified, and the implementation
is expected to account for a penalty. Renamed to
onCommonInstructionMissedSimplification.

Reviewers: davidxl, eraman

Reviewed By: davidxl

Subscribers: hiraditya, baloghadamsoftware, haicheng, a.sidorin, Szelethus, donat.nagy, dkrupp, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73662
2020-01-29 21:07:36 -08:00
Johannes Doerfert 89c2e733e8 [Attributor] Pointer privatization attribute (argument promotion)
A pointer is privatizeable if it can be replaced by a new, private one.
Privatizing pointer reduces the use count, interaction between unrelated
code parts. This is a first step towards replacing argument promotion.
While we can already handle recursion (unlike argument promotion!) we
are restricted to stack allocations for now because we do not analyze
the uses in the callee.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D68852
2020-01-29 21:31:04 -06:00
Johannes Doerfert 791c9f1145 [Attributor] Fix TODO to avoid recomputation of results
The helpers AAReturnedFromReturnedValues and
AACallSiteReturnedFromReturned are useful not only to avoid code
duplication but also to avoid recomputation of results. If we have N
call sites we should not recompute the function return information N
times but once. These are mostly straightforward usages with some minor
improvements on the helpers and addition of a new one
(IRPosition::getAssociatedType) that knows about function return types.
2020-01-29 19:24:34 -06:00
Gabor Horvath 31ae0165c3 [LTO] Add optimization remarks for removed functions
This only works with regular LTO for now.

Differential Revision: https://reviews.llvm.org/D73597
2020-01-29 15:53:51 -08:00
Craig Topper 35625464c6 [X86] Fix the cost model for v16i16->v16i32 zero_extend/sign_extend with AVX2
We seem to be inheriting the cost from sse4.1. But if we have 256-bit registers we should be able to do this with just one extract to split the 16i16 and two v8i16->v8i32 operations so our cost should be 3 not 4.

Differential Revision: https://reviews.llvm.org/D73646
2020-01-29 15:52:10 -08:00
Matt Arsenault c5fffa4da3 GlobalISel: Add observer argument to legalizeIntrinsic
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.

I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
2020-01-29 18:33:45 -05:00
Matt Arsenault 7f3280ecdd AMDGPU/GlobalISel: Select permlane16/permlanex16 2020-01-29 17:55:31 -05:00
Cameron McInally 4f2e2acc4b [NFC][AArch64][SVE] Rename Destructive enumerator from DestructiveInstType
Rename Destructive enumerator in preparation for a larger set of patches to
support prefixing destructive oeprations with MOVPRFX.

Differential Revision: https://reviews.llvm.org/D73212
2020-01-29 15:42:26 -06:00
Amara Emerson c12f046eb9 [GlobalISel] Add new combine to convert scalar G_MUL to G_SHL.
For pow2 constants we should use G_SHL for pattern matching (and perf)
purposes later.

Vector support not yet implemented.

Differential Revision: https://reviews.llvm.org/D73659
2020-01-29 13:39:00 -08:00
Jessica Paquette 050cd443ca [AArch64][GlobalISel] Fix TBNZ/TBZ opcode selection
When the bit is <= 32, we have to use the W register variant for TB(N)Z.

This is because of the way the instruction is encoded.

Differential Revision: https://reviews.llvm.org/D73660
2020-01-29 13:11:18 -08:00
Hiroshi Yamauchi 24962ced81 [Loads] Handle simple cases with same base pointer with constant offsets in FindAvailableLoadedValue when AA is null.
Summary:
This will help with devirtualization (store forwarding with vtable pointers in
the presence of other stores into members in the constructor.) During inlining,
we don't have AA.

Reviewers: davidxl

Subscribers: mgorny, Prazek, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71307
2020-01-29 13:05:46 -08:00
Cameron McInally 00c2249910 [NFCI][AArch64][SVE] Set default DestructiveInstType in AArch64Inst class
Some housekeeping for the DestructiveInstType enum before a larger set of patches to support prefixing destructive oeprations with MOVPRFX.

Differential Revision: https://reviews.llvm.org/D73141
2020-01-29 15:00:19 -06:00
Victor Huang 1492b70a03 [PowerPC][Future] Add prefixed loads and stores for future CPU
A previous patch should have added pld and pstd and any support code in
the backend that is required for prefixed load and store type operations.
This patch adds a number of additional prefixed load and store type
instructions for the future CPU.

Differential Revision: https://reviews.llvm.org/D72577
2020-01-29 14:45:56 -06:00
Sterling Augustine c64b56617d Print discriminators when printing .debug_line in GNU style.
Summary:
gnu addr2line prints DWARF line table discriminators like so:

<file>:<line> (discriminator <Number>)

This matches that behavior.

Document how and when --output-style=GNU prints discriminators

Add test for new GNU-style discriminator printing.

Reviewers: rupprecht, labath, jhenderson

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73318
2020-01-29 12:22:12 -08:00
Nikita Popov e086e23024 [InstCombine] Support non-splat vectors in icmp eq + add/sub fold
For the

    icmp eq (add X, C1), C2 => icmp eq X, C2-C1
    icmp eq (sub C1, X), C2 => icmp eq X, C1-C2

folds, this allows C1 to be non-splat and contain undefs.
C2 is still splat, due to the structure of the code.

This is to address the remaining part of the regression in D73411,
where demanded element analysis replaces some elements with undef.

Differential Revision: https://reviews.llvm.org/D73647
2020-01-29 20:56:58 +01:00
Amara Emerson 0da937bb5c [GlobalISel][IRTranslator] Follow convention and put constant offset of getelementptr arithmetic on RHS.
We were needlessly putting known constant values on the LHS of a G_MUL, which
is suboptimal.

Differential Revision: https://reviews.llvm.org/D73650
2020-01-29 11:37:19 -08:00
Huihui Zhang 8f6761aa41 Revert "[AArch64] Fix data race on RegisterBank initialization."
Buildbot failure, revert first while looking at the issue.

This reverts commit a5a4a47d69.
2020-01-29 11:17:19 -08:00
Huihui Zhang af620fc36a Revert "[AMDGPU] Fix data race on RegisterBank initialization."
There looks to be buildbot failure related.

This reverts commit 8bb6c8a22a.
2020-01-29 11:16:27 -08:00
Huihui Zhang 2ec954579a Revert "[ARM] Fix data race on RegisterBank initialization."
There looks to be buildbot failure related.

This reverts commit 91618d940e.
2020-01-29 11:15:27 -08:00
Fangrui Song 8903e61b66 [AsmPrinter][ELF] Define local aliases (.Lfoo$local) for GlobalObjects
For `MC_GlobalAddress` operands referencing **certain** GlobalObjects,
we can lower them to STB_LOCAL aliases to avoid costs brought by
assembler/linker's conservative decisions about symbol interposition:

* An assembler conservatively assumes a global default visibility symbol interposable (ELF
  semantics). So relocations in object files are needed even if the code generator assumed
  the definition exact and non-interposable.
* The relocations can cause the creation of PLT entries on some targets for -shared links.
  A linker conservatively assumes a global default visibility symbol interposable (if not
  otherwise constrained by -Bsymbolic/--dynamic-list/VER_NDX_LOCAL/etc).

"certain" refers to GlobalObjects in the intersection of
`hasExactDefinition() and !isInterposable()`: `external`, `appending`, `internal`, `private`.
Local linkages (`internal` and `private`) cannot be interposed. `appending` is for very
few objects LLVM interpret specially.  So the set just includes `external`.

This patch emits STB_LOCAL aliases (.Lfoo$local) for such GlobalObjects, so that targets can lower
MC_GlobalAddress operands to STB_LOCAL aliases if applicable.
We may extend the scope and include GlobalAlias in the future.

LLVM's existing -fno-semantic-interposition behaviors give us license to do such optimizations:

* Various optimizations (ipconstprop, inliner, sccp, sroa, etc) treat normal ExternalLinkage
  GlobalObjects as non-interposable.
* Before D72197, MC resolved a PC-relative VK_None fixup to a non-local symbol at assembly time (no
  outstanding relocation), if the target is defined in the same section. Put it simply, even if IR
  optimizations failed to optimize and allowed interposition for the function call in
  `void foo() {} void bar() { foo(); }`, the assembler would disallow it.

This patch sets up AsmPrinter infrastructure to make -fno-semantic-interposition more so.
With and without the patch, the object file output should be identical:
`.Lfoo$local` does not take a symbol table entry.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D73228
2020-01-29 10:58:43 -08:00
Sterling Augustine 0758ac4e0c Handle non-absolute include dirs properly for both dwarf4 and dwarf5.
Summary:
Add test case for the same. This test case will also serve as a
starting point for later symbolizer tests.

Reviewers: dblaikie, jdoerfert

Subscribers: hiraditya, llvm-commits, jhenderson

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73583
2020-01-29 10:51:51 -08:00
Simon Pilgrim f7245ef897 [DAGCombiner] ISD::SHL/SRA/SRL - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 18:49:42 +00:00
Huihui Zhang d2e2fc450e [ConstantFold][SVE] Fix constant folding for scalable vector binary operations.
Summary:
Scalable vector should not be evaluated element by element.
Add support to handle scalable vector UndefValue.

Reviewers: sdesmalen, huntergr, spatel, lebedev.ri, apazos, efriedma, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71445
2020-01-29 10:49:08 -08:00
Austin Kerbow 2605adb69c [AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73585
2020-01-29 10:42:12 -08:00
Adrian Prantl 18dbe1b279 Run clang-format on DwarfExpression (NFC) 2020-01-29 10:23:12 -08:00
Adrian Prantl 816ee8a423 DwarfExpression: Factor out getOrCreateBaseType() (NFC) 2020-01-29 10:23:12 -08:00
Huihui Zhang 91618d940e [ARM] Fix data race on RegisterBank initialization.
Summary:
The initialization of RegisterBank needs to be done only once. The
logic of AlreadyInit has data race, use llvm::call_once instead.

This is continuing work of D73587.

Reviewers: arsenm, rovka, dsanders, t.p.northover, efriedma, apazos

Reviewed By: arsenm

Subscribers: wdng, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73605
2020-01-29 10:15:37 -08:00
Huihui Zhang 8bb6c8a22a [AMDGPU] Fix data race on RegisterBank initialization.
Summary:
The initialization of RegisterBank needs to be done only once. The
logic of AlreadyInit has data race, use llvm::call_once instead.

This is continuing work of D73587.

Reviewers: arsenm, tstellar, ronlieb, efriedma, apazos, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73604
2020-01-29 10:14:40 -08:00
Huihui Zhang a5a4a47d69 [AArch64] Fix data race on RegisterBank initialization.
Summary:
The initialization of RegisterBank needs to be done only once. The
logic of AlreadyInit has a data race, use llvm::call_once instead.

This issue was identified through thread sanitizer.

Reviewers: efriedma, apazos, qcolombet, dsanders

Reviewed By: efriedma

Subscribers: arsenm, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73587
2020-01-29 10:12:52 -08:00
Adrian Prantl aa6ec19c5f Add dwarfdump support for DW_OP_regval_type.
Differential Revision: https://reviews.llvm.org/D73598
2020-01-29 10:02:23 -08:00
Simon Pilgrim 25b8e96388 [DAGCombiner] ISD::MUL - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 17:26:22 +00:00
Craig Topper 90c31b0f42 [X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall.
ISD::FROUND is defined to round to nearest with ties rounding
away from 0. This mode isn't supported in hardware on X86.

But as long as we aren't compiling with trapping math, we can
emulate this with floor(X + copysign(nextafter(0.5, 0.0), X)).

We have to use nextafter to avoid some corner cases that adding
0.5 would have. For example, if X is nextafter(0.5, 0.0) it should
round to 0.0, but adding 0.5 would need one extra bit of mantissa
than can be stored so it rounds to 1.0. Adding nextafter(0.5, 0.0)
instead will just increase the exponent by 1 and leave the mantissa
as all 1s. This would be nextafter(1.0, 0.0) which will floor to 0.0.

Techically this requires -fno-trapping-math which isn't our default.
But if we care about exceptions we should be using constrained
intrinsics. Constrained intrinsics would use STRICT_FROUND which
won't go through this code.

Fixes PR42195.

Differential Revision: https://reviews.llvm.org/D73607
2020-01-29 09:10:02 -08:00
Jay Foad d07a789579 [AMDGPU] Cluster FLAT instructions with both vaddr and saddr
Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73634
2020-01-29 17:01:35 +00:00
Simon Pilgrim 4b04e11735 [DAGCombiner] Sub/SUBSAT - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 16:57:13 +00:00
Simon Pilgrim 48bd6a0986 [DAGCombiner] visitIMINMAX - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us instead of the ConstantSDNode variant where we have to do it ourselves.
2020-01-29 16:57:13 +00:00
Craig Topper e5edd641fd [X86] Use a shorter sequence to implement FLT_ROUNDS
This code needs to map from the FPCW 2-bit encoding for rounding mode to the 2-bit encoding defined for FLT_ROUNDS. The previous implementation did some clever swapping of bits and adding 1 modulo 4 to do the mapping.

This patch instead uses an 8-bit immediate as a lookup table of four 2-bit values. Then we use the 2-bit FPCW encoding to index the lookup table by using a right shift and an AND. This requires extracting the 2-bit value from FPCW and multipying it by 2 to make it usable as a shift amount. But still results in less code.

Differential Revision: https://reviews.llvm.org/D73599
2020-01-29 08:56:33 -08:00
Matt Arsenault 62129878a6 AMDGPU/GlobalISel: Fix tablegen selection for scalar bin ops
Fixes selection for scalar G_SMULH/G_UMULH. Also switches to using
tablegen selected add/sub, which switch to the signed version of the
opcode. This matches the current DAG behavior. We can't drop the
manual selection for add/sub yet, because it's still both for VALU
add/sub and for G_PTR_ADD.
2020-01-29 08:55:54 -08:00
Kazushi (Jam) Marukawa fef80a2946 [VE] (conditional) branch modification & isel patterns
Summary:
InstInfo for branch modification, (conditional) branch isel patterns and tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D73632
2020-01-29 17:40:57 +01:00
Matt Arsenault b63629a58d GlobalISel: Fix mask computation in lowerInsert
This is supposed to be the high bit index, not the width. Use the
wrapping form of getBitsSet and avoid the bitflip.
2020-01-29 08:25:36 -08:00
Matt Arsenault 68b102b97a AMDGPU: Directly select 16-bank LDS case of llvm.amdgcn.interp.p1.f16
Manually select this is as a tablegen workraound. Both SelectionDAG
and GlobalISel end up misplacing the copy to m0 when both instructions
in the output need it. Neither considers that both output instructions
depend on m0. I don't know of any other pattern we need to handle this
case, so it's less effort to just workaround this for now.
2020-01-29 08:24:31 -08:00
Jay Foad 0d7bd34312 [MachineScheduler] Ignore artificial edges when forming store chains
Summary:
BaseMemOpClusterMutation::apply forms store chains by looking for
control (i.e. non-data) dependencies from one mem op to another.

In the test case, clusterNeighboringMemOps successfully clusters the
loads, and then adds artificial edges to the loads' successors as
described in the comment:
      // Copy successor edges from SUa to SUb. Interleaving computation
      // dependent on SUa can prevent load combining due to register reuse.
The effect of this is that *data* dependencies from one load to a store
are copied as *artificial* dependencies from a different load to the
same store.

Then when BaseMemOpClusterMutation::apply looks at the stores, it finds
that some of them have a control dependency on a previous load, which
breaks the chains and means that the stores are not all considered part
of the same chain and won't all be clustered.

The fix is to only consider non-artificial control dependencies when
forming chains.

Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71717
2020-01-29 16:23:01 +00:00
Matt Arsenault 96352e0a1b AMDGPU/GlobalISel: Handle LDS with relocations case 2020-01-29 08:18:55 -08:00
Elia Geretto ab2300bc15 [PassManagerBuilder] Remove global extension when a plugin is unloaded
This commit fixes PR39321.

GlobalExtensions is not guaranteed to be destroyed when optimizer plugins are unloaded. If it is indeed destroyed after a plugin is dlclose-d, the destructor of the corresponding ExtensionFn is not mapped anymore, causing a call to unmapped memory during destruction.

This commit guarantees that extensions coming from external plugins are removed from GlobalExtensions when the plugin is unloaded if GlobalExtensions has not been destroyed yet.

Differential Revision: https://reviews.llvm.org/D71959
2020-01-29 16:15:45 +00:00
Connor Abbott 87d98c1495 AMDGPU: Fix handling of infinite loops in fragment shaders
Summary:
Due to the fact that kill is just a normal intrinsic, even though it's
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there's no obvious place to make the loop branch to, it just makes it
return immediately, which skips the exports that are supposed to happen
at the end and hangs the GPU if all the threads end up being killed.

While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.

This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70781
2020-01-29 17:13:25 +01:00
Matt Arsenault 94e8ef4d4c AMDGPU/GlobalISel: Look through copies for source modifiers
When all VOP instructions are legalized to VGPRs, any SGPR source
modifiers will have a copy in the way.
2020-01-29 08:08:13 -08:00
Stanislav Mekhanoshin c2ad7ee1a9 [AMDGPU] override isHighLatencyDef
SIMachineScheduler uses isHighLatencyInstruction with the same
sematincs, but TargetInstrInfo has virtual isHighLatencyDef
method, so override it instead.

Added FLAT to the list of high latency opcodes and a check for
mayLoad since stores are not technically high latency in terms
of data dependency.

This change did not produce any visible impact on our tests.

Differential Revision: https://reviews.llvm.org/D73582
2020-01-29 08:01:29 -08:00
Matt Arsenault f717483acd GlobalISel: Assert on invalid bitcast in MIRBuilder
The other casts validate, so this should too.
2020-01-29 07:49:39 -08:00
Simon Pilgrim 79748add70 Fix MSVC lamdba default capture mode warning. NFCI. 2020-01-29 15:47:04 +00:00
Hans Wennborg 31e07692d7 Work around PR44697 in CrashRecoveryContext 2020-01-29 16:35:07 +01:00
Connor Abbott 08b205bb48 Revert "AMDGPU: Fix handling of infinite loops in fragment shaders"
This reverts commit 0994c485e6.
2020-01-29 16:14:52 +01:00