Commit Graph

146839 Commits

Author SHA1 Message Date
Albion Fung ffbffaf6b6 [PowerPC] Improve codegen for int-to-fp conversion of subword vector extract
When an integer is converted into floating point in a subword vector extract,
it can be done in 2 instructions instead of the 3+ instructions currently
generated. This patch removes the unnecessary generation.

Differential: https://reviews.llvm.org/D100604
2021-05-11 15:00:11 -05:00
Amara Emerson 69069509b2 [AArch64][GlobalISel] Mark target generic instructions as HasNoSideEffects.
One test needed updating because the newly side-effect-free instructions were
now being DCE'd.
2021-05-11 12:38:53 -07:00
Roman Lebedev 97e04d41e6
[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): canonicalize to integer type
This way we don't have to duplicate i32/f32 and i64/f64 entries,
which had already been forgotten for a few tuples.
2021-05-11 21:35:58 +03:00
Fangrui Song 129f466e22 [GlobalOpt] Remove heap SROA
GlobalOpt implements a heap SROA (SROA for a malloc-allocated struct or array
of structs) which is largely undertested (heap-sra-[1234].ll are basically the
same test with very little difference) and does not trigger at all when
bootstrapping clang (it only supports the case of a single store).

The heap SROA implementation causes PR50027 (GEP is not properly handled; crash or miscompile).
Just drop the implementation. I have deleted some obviously duplicated tests
but kept `heap-sra-[12]{,-no-nullopt}.ll`.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102257
2021-05-11 11:34:37 -07:00
Amara Emerson ae2b36e8bd [AArch64][GlobalISel] Support truncstorei8/i16 w/ combine to form truncating G_STOREs.
This needs some tablegen changes so that we can actually import the patterns properly.

Differential Revision: https://reviews.llvm.org/D102204
2021-05-11 11:33:03 -07:00
Fangrui Song ec27c5f170 [RISCV] Prefer to lower MC_GlobalAddress operands to .Lfoo$local
Similar to X86 D73230 and AArch64 D101872

With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode,
for default visibility external linkage non-ifunc-non-COMDAT definitions.

For such dso_local definitions, variable access/taking the address of a
function/calling a function will go through a local alias to avoid GOT/PLT.

Reviewed By: jrtc27, luismarques

Differential Revision: https://reviews.llvm.org/D101875
2021-05-11 11:29:45 -07:00
Eli Friedman 61cbbba7a6 [ArgumentPromotion] Fix byval alignment handling.
Make sure the alignment of the generated operations matches the
alignment of the byval argument.  Previously, we were just ignoring
alignment and getting lucky.

While I'm here, also delete the unnecessary "tail" handling.
Passing a pointer to a byval argument to a "tail" call is UB, so
rewriting to an alloca doesn't require any special handling.

Differential Revision: https://reviews.llvm.org/D89819
2021-05-11 11:22:18 -07:00
Sam Powell cba508fb67 [TextAPI] Reformat llvm_unreachable message
Change llvm_unreachable message from "Unknown llvm.MachO.PlatformKind
enum" to "Unknown llvm::MachO::PlatformKind enum".

Differential revision: https://reviews.llvm.org/D102250
2021-05-11 09:59:26 -07:00
Craig Topper ce6e4f27dd [RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min.
My thought process is that if v2i64 is an LMUL=1 type then v2i32
should be an LMUL=1/2 type. We limit the fractional LMUL so that
SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This
ensures there's always a fractional LMUL available to truncate a type.
This does reduce the number of vsetvlis in some cases.

Some tests increase vsetvlis because the best container type for a
mask type is dependent on the LMUL+SEW that the mask was produced
from, but you can't tell that from the type. I think this is
something we need to solve in the machine IR when optimizing
vsetvlis.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D101215
2021-05-11 09:42:48 -07:00
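
A rough sketch of the clipping rule described above (the helper name, the
VLEN=128 assumption, and the formula are mine, not RISCVTargetLowering code):
the fractional LMUL for a fixed-length type is how many times the type fits
into one vector register, clipped so that SEW=64 never drops below LMUL=1,
SEW=32 never drops below LMUL=1/2, and so on.

    // Illustrative sketch only; not taken from the LLVM sources.
    #include <algorithm>
    #include <cstdio>

    // Returns the denominator D of the chosen LMUL = 1/D (D = 1 means LMUL = 1).
    unsigned lmulDenominator(unsigned VecSizeInBits, unsigned SEW,
                             unsigned MinVLen = 128) {
      unsigned Fits = std::max(MinVLen / VecSizeInBits, 1u); // times the type fits in a register
      unsigned Clip = 64 / SEW;                              // SEW=64 -> 1, SEW=32 -> 2, ...
      return std::min(Fits, Clip);
    }

    int main() {
      printf("v2i64 -> LMUL=1/%u\n", lmulDenominator(128, 64)); // 1/1
      printf("v2i32 -> LMUL=1/%u\n", lmulDenominator(64, 32));  // 1/2
      printf("v2i16 -> LMUL=1/%u\n", lmulDenominator(32, 16));  // 1/4
      return 0;
    }
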
Roman Lebedev 5f78ba001c
[X86][Codegen] Shift amount mod: sh? i64 x, (32-y) --> sh? i64 x, -(y+32)
I've seen this in RawSpeed's BitPumpMSB*::push() hotpath,
after fixing the buffer abstraction to a saner one,
while looking into a +5% runtime regression.
I was hoping that this would fix it, but it does not look like it does.

This seems to be at least no worse than the original pattern.
But I'm actually mainly interested in the case where we already
compute `(y+32)` (see the last test).

https://alive2.llvm.org/ce/z/ZCzJio

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101944
2021-05-11 19:39:41 +03:00
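
A quick check of the congruence behind this rewrite (my own illustration, not
part of the patch): a 64-bit shift amount is masked to 6 bits by the hardware,
and (32 - y) and -(y + 32) agree modulo 64 because -(y + 32) = (32 - y) - 64.

    // Illustrative check only; not taken from the LLVM sources.
    #include <cstdint>
    #include <cstdio>

    int main() {
      for (uint32_t y = 0; y < 64; ++y) {
        uint32_t a = (32u - y) & 63u;         // original amount, masked as the ISA does
        uint32_t b = (0u - (y + 32u)) & 63u;  // rewritten amount -(y+32), masked the same way
        if (a != b)
          printf("mismatch at y=%u\n", y);
      }
      printf("(32-y) and -(y+32) agree mod 64 for every y checked\n");
      return 0;
    }
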
Craig Topper dc00cbb505 [RISCV] Match trunc_vector_vl+sra_vl/srl_vl with splat shift amount to vnsra/vnsrl.
Limited to splats because we would need to truncate the shift
amount vector otherwise.

I tried to do this with new ISD nodes and a DAG combine to
avoid such a large pattern, but we don't form the splat until
LegalizeDAG and need DAG combine to remove a scalable->fixed->scalable
cast before it becomes visible to the shift node. By the time that
happens we've already visited the truncate node and won't revisit it.

I think I have an idea how to improve i64 on RV32 that I'll save for a
follow-up.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102019
2021-05-11 09:29:31 -07:00
Steven Wu 4eff946947 [IR][AutoUpgrade] Drop align attribute from void return types
Since D87304, `align` became an invalid attribute on non-pointer types and the
verifier will reject bitcode that has an invalid `align` attribute.

The problem is that, before the change, DeadArgumentElimination could easily
turn a pointer return type into a void return type without removing the
`align` attribute. Teach AutoUpgrade to remove the invalid `align` attribute
from return types to maintain bitcode compatibility.

rdar://77022993

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D102201
2021-05-11 08:23:55 -07:00
Congzhe Cao 29342291d2 [LoopInterchange] Fix legality for triangular loops
This is a bug fix in legality check.

When we encounter triangular loops such as the following forms:
    for (int i = 0; i < m; i++)
      for (int j = 0; j < i; j++), or

    for (int i = 0; i < m; i++)
      for (int j = 0; j*i < n; j++),

we should not perform interchange, since the number of executions of the loop
body would differ before and after interchange, producing incorrect results.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D101305
2021-05-11 11:00:46 -04:00
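
To make the legality issue concrete, here is a small illustration (mine, not
from the patch): counting loop-body executions shows that the iteration space
changes when the headers of such a triangular nest are swapped, because the
inner bound depends on the outer induction variable.

    // Illustrative example only; not taken from the LLVM sources or tests.
    #include <cstdio>

    int main() {
      const int m = 4;
      int before = 0, after = 0;

      // Original triangular nest: the body runs 0 + 1 + 2 + 3 = 6 times.
      for (int i = 0; i < m; i++)
        for (int j = 0; j < i; j++)
          ++before;

      // Naively swapped headers: 'i' no longer varies under the j < i bound,
      // so the body runs a different number of times (here, 0).
      int i = 0;
      for (int j = 0; j < i; j++)
        for (i = 0; i < m; i++)
          ++after;

      printf("executions before interchange: %d, after: %d\n", before, after);
      return 0;
    }
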
Aakanksha Patil c58912eca7 Fix typo "Execpt" in comments
Differential Revision: https://reviews.llvm.org/D101858
2021-05-11 10:47:01 -04:00
Paul C. Anagnostopoulos 46402eb103 Revert "[TableGen] Make the NUL character invalid in .td files"
At least one build uses a 'sed' that does not understand \x00.

This reverts commit cf9647011c4f05e1eb4423c6637d84e2f26b2042.
2021-05-11 10:43:13 -04:00
Florian Hahn faebc6bf10
[VPlan] Register recipe for instr if the simplified value is recipe.
If the simplified VPValue is a recipe, we need to register it for Instr,
in case it needs to be recorded. The way this is handled in general may
change soon, following some post-commit comments.

This fixes PR50298.
2021-05-11 14:32:34 +01:00
Roman Lebedev 69ed93a435
[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()
Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.

The AVX512 variant probably needs a similar change,
but there it is less obvious.
2021-05-11 16:28:00 +03:00
Paul C. Anagnostopoulos 6ca2bdb03c [TableGen] Make the NUL character invalid in .td files
Differential Revision: https://reviews.llvm.org/D101923
2021-05-11 09:20:42 -04:00
Simon Pilgrim 759b97e55a [X86] Replace repeated isa/cast<ConstantSDNode> calls with a single dyn_cast<>. NFCI.
Noticed while looking at D101944
2021-05-11 14:18:45 +01:00
Simon Pilgrim 9acc03ad92 [X86][SSE] Replace foldShuffleOfHorizOp with generalized version in canonicalizeShuffleMaskWithHorizOp
foldShuffleOfHorizOp only handled basic shufps(hop(x,y),hop(z,w)) folds - by moving this to canonicalizeShuffleMaskWithHorizOp we can work with more general/combined v4x32 shuffle masks, float/integer domains and support shuffle-of-packs as well.

The next step will be to support 256/512-bit vector cases.
2021-05-11 14:18:45 +01:00
Matt Arsenault bce3cca488 CodeGen: Fix null dereference before null check 2021-05-11 09:07:32 -04:00
Roman Lebedev c02476f315
[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again
Instead of handling power-of-two sized vector chunks,
try handling the large vector in a stream mode,
decreasing the operational vector size
once it no longer works for the elements left to process.

Notably, this improves costs for overaligned loads - loading padding is fine.
This more directly tracks when we need to insert/extract the YMM/XMM subvector;
some costs fluctuate because of that.

Reviewed By: RKSimon, ABataev

Differential Revision: https://reviews.llvm.org/D100684
2021-05-11 16:02:22 +03:00
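
A toy walk-through (my own sketch with made-up numbers, not the actual
X86TTIImpl code) of the streaming decomposition described above: cover the
remaining elements with the widest vector that still fits, then shrink the
operational width for the tail; for a load, reading a little padding past the
end is acceptable.

    // Illustrative sketch only; the element count and widths are arbitrary.
    #include <cstdio>

    int main() {
      const unsigned NumElts = 11, EltBits = 32; // e.g. a load of <11 x i32>
      unsigned Remaining = NumElts, VecBits = 256, Ops = 0;
      while (Remaining != 0) {
        unsigned PerOp = VecBits / EltBits;
        if (Remaining >= PerOp || VecBits == 128) {
          unsigned Covered = PerOp < Remaining ? PerOp : Remaining;
          printf("%u-bit op covers %u of %u remaining elements\n",
                 VecBits, Covered, Remaining);
          Remaining -= Covered;
          ++Ops;
        } else {
          VecBits /= 2; // shrink the operational vector size for the tail
        }
      }
      printf("total ops: %u\n", Ops);
      return 0;
    }
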
Sanjay Patel 49950cb1f6 [SLP] restrict matching of load combine candidates
The test example from https://llvm.org/PR50256 (and reduced here)
shows that we can match a load combine candidate even when there
are no "or" instructions. We can avoid that by confirming that we
do see an "or". This doesn't apply when matching an or-reduction
because that match begins from the operands of the reduction.

Differential Revision: https://reviews.llvm.org/D102074
2021-05-11 08:46:40 -04:00
Piotr Sobczak 09fe84abb4 [AMDGPU] Move code sinking before structurizer
Moving the code sinking pass before the structurizer creates more sinking
opportunities.

The extra flow edges introduced by the structurizer can have adverse
effects on sinking, because the sinking pass prefers moving instructions
to blocks with unique predecessors and the structurizer destroys that
property in some cases.

A notable example is moving high-latency image instructions across kills.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D101115
2021-05-11 14:07:23 +02:00
Stefan Pintilie c79bc5942d [PowerPC][Bug] Fix Bug in Stack Frame Update Code
The stack frame update code does not take into consideration spilling
to registers for callee-saved registers. The option -ppc-enable-pe-vector-spills
turns on spilling to registers for callee-saved registers and may expose a bug
in the code that moves a stack frame pointer update instruction.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D101366
2021-05-11 05:54:07 -05:00
Denis Antrushin df47368d40 [RegAllocFast] properly handle STATEPOINT instruction.
STATEPOINT is a fancy and complex pseudo instruction which
has both tied defs and a regmask operand.

Basic FastRA algorithm is as follows:

1. Mark registers used by defs as free
2. If instruction has regmask operand displace clobbered registers
   according to regmask.
3. Assign registers for use operands.

In the case of tied defs, step 1 is replaced with allocating registers
for them. But the regmask is still processed, which may displace already
allocated registers. As a result, a tied use and def may get assigned
to different registers.

This patch makes FastRA process the instruction's RegMask (if any) when
checking for physical register interference.
That way tied operands won't get registers clobbered by the regmask.

Reviewed By: arsenm, skatkov
Differential Revision: https://reviews.llvm.org/D99284
2021-05-11 17:27:00 +07:00
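
A minimal sketch of the interference check described above (the helper name
and structure are mine, not RegAllocFast's): an LLVM regmask is a bit vector
of preserved registers, so a physical register whose bit is clear is clobbered
by the instruction and must not be handed to a tied def.

    // Illustrative sketch only; not taken from the LLVM sources.
    #include <cstdint>
    #include <cstdio>

    // Bit set = register preserved across the instruction; bit clear = clobbered.
    bool usableForTiedDef(unsigned PhysReg, const uint32_t *RegMask) {
      if (!RegMask)
        return true; // the instruction has no regmask operand
      return (RegMask[PhysReg / 32] & (1u << (PhysReg % 32))) != 0;
    }

    int main() {
      uint32_t Mask[2] = {0x0000000Fu, 0u}; // pretend only registers 0-3 are preserved
      printf("reg 2 usable: %d\n", usableForTiedDef(2, Mask));   // 1
      printf("reg 17 usable: %d\n", usableForTiedDef(17, Mask)); // 0 (clobbered)
      return 0;
    }
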
Andy Wingo b2f21b145a [CodeGen][WebAssembly] Better lowering for WASM_SYMBOL_TYPE_GLOBAL symbols
As we have been missing support for WebAssembly globals on the IR level,
the lowering of WASM_SYMBOL_TYPE_GLOBAL to IR was incomplete.  This
commit fleshes out the lowering support, lowering references to and
definitions of addrspace(1) values to correctly typed
WASM_SYMBOL_TYPE_GLOBAL symbols.

Depends on D101608.

Differential Revision: https://reviews.llvm.org/D101913
2021-05-11 11:47:40 +02:00
Paulo Matos d7086af214 [WebAssembly] Support for WebAssembly globals in LLVM IR
This patch adds support for WebAssembly globals in LLVM IR, representing
them as pointers to global values, in a non-default, non-integral
address space.  Instruction selection legalizes loads and stores to
these pointers to new WebAssemblyISD nodes GLOBAL_GET and GLOBAL_SET.
Once the lowering creates the new nodes, tablegen pattern matches those
and converts them to Wasm global.get/set of the appropriate type.

Based on work by Paulo Matos in https://reviews.llvm.org/D95425.

Reviewed By: pmatos

Differential Revision: https://reviews.llvm.org/D101608
2021-05-11 11:19:29 +02:00
Alex Orlov 05d1ae4e18 * Add support for JSON output style to llvm-symbolizer
This patch adds a JSON output style to llvm-symbolizer to better support CLI automation by providing machine-readable output.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D96883
2021-05-11 13:10:54 +04:00
Hsiangkai Wang d8ec2b183e [RISCV] Fix the calculation of the offset of Zvlsseg spilling.
For Zvlsseg spilling, we need to convert the pseudo instructions
into multiple vector load/store instructions with appropriate offsets.
For example, for PseudoVSPILL3_M2, we need to convert it to

VS2R %v2, %base
ADDI %base, %base, (vlenb x 2)
VS2R %v4, %base
ADDI %base, %base, (vlenb x 2)
VS2R %v6, %base

We need to keep the size of the offset in the pseudo spilling instructions.
In this case, it is (vlenb x 2).

In the original implementation, we used the size of the frame object divided by
the number of vectors in the Zvlsseg type. The size of the frame object is not
necessarily the same as the size of the spilled data; it may be larger. So, we
change the offset to (VLENB x LMUL) in this patch. The calculation is more
direct and easier to understand.

Differential Revision: https://reviews.llvm.org/D101869
2021-05-11 10:13:18 +08:00
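
Worked numbers for the example above (the VLEN value is an assumption, not
something the patch specifies): with VLEN = 128 bits, VLENB is 16 bytes, and
PseudoVSPILL3_M2 uses LMUL = 2, so each ADDI step is VLENB x 2 = 32 bytes and
the three fields land 0, 32, and 64 bytes past the base pointer.

    // Illustrative arithmetic only; not taken from the LLVM sources.
    #include <cstdio>

    int main() {
      const unsigned VLENB = 16;       // assume VLEN = 128 bits -> 16 bytes per register
      const unsigned LMUL = 2, NF = 3; // PseudoVSPILL3_M2: 3 fields, 2 registers each
      for (unsigned Field = 0; Field < NF; ++Field)
        printf("field %u stored at base + %u bytes\n", Field, Field * VLENB * LMUL);
      return 0;
    }
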
Stanislav Mekhanoshin 22d295f695 [AMDGPU] Constant fold Intrinsic::amdgcn_perm
Differential Revision: https://reviews.llvm.org/D102203
2021-05-10 16:23:11 -07:00
Sam Clegg 3b8d2be527 Reland: "[lld][WebAssembly] Initial support merging string data"
This change was originally landed in: 5000a1b4b9
It was reverted in: 061e071d8c

This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.

Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)

Like the ELF linker, merging is only performed at `-O1` and above.

This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it does not currently support debug sections
because they are not represented by data segments (they are custom
sections).

Differential Revision: https://reviews.llvm.org/D97657
2021-05-10 16:03:38 -07:00
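
A tiny illustration (not lld's actual data structures) of the full tail
merging mentioned above: because "lo" is a suffix of "hello", only one copy of
the bytes needs to live in the merged segment, and the shorter string becomes
an offset into it.

    // Illustrative example only; not taken from the lld sources.
    #include <cstdio>

    int main() {
      const char pool[] = "hello";    // single NUL-terminated copy kept after merging
      const char *hello = pool;       // offset 0 into the merged segment
      const char *lo = pool + 3;      // "lo" reuses the tail of "hello" at offset 3
      printf("%s / %s\n", hello, lo); // prints: hello / lo
      return 0;
    }
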
Jessica Paquette 79be9c59c6 [AArch64][GlobalISel] Add post-legalizer lowering for NEON vector fcmps
This is roughly equivalent to the floating point portion of
`AArch64TargetLowering::LowerVSETCC`. The main part that's missing is the v4s16 bit.

This also adds helpers equivalent to `EmitVectorComparison`, and
`changeVectorFPCCToAArch64CC`. This moves `changeFCMPPredToAArch64CC` out of
the selector into AArch64GlobalISelUtils for the sake of code reuse.

This is done in post-legalizer lowering with pseudos to simplify selection.
The imported patterns end up handling selection for us this way.

Differential Revision: https://reviews.llvm.org/D101782
2021-05-10 15:40:06 -07:00
Nico Weber 061e071d8c Revert "[lld][WebAssembly] Initial support merging string data"
This reverts commit 5000a1b4b9.
Breaks tests, see https://reviews.llvm.org/D97657#2749151

Easily repros locally with `ninja check-llvm-mc-webassembly`.
2021-05-10 18:28:28 -04:00
Jessica Paquette 6d8b070d96 [AArch64][GlobalISel] Enable memcpy family combines on minsize functions
The combines in `tryCombineMemCpyFamily` have heuristics (e.g.
`TLI.getMaxStoresPerMemset`) which consider size. So, theoretically, enabling
these combines on minsize functions shouldn't be harmful.

With this enabled we save 0.9% geomean on CTMark at -Oz, and 5.1% on Bullet.
There are no code size regressions.

Differential Revision: https://reviews.llvm.org/D102198
2021-05-10 15:25:23 -07:00
Krzysztof Parzyszek 8b9c15c281 [Hexagon] Handle loads and stores of scalar predicate vectors
Handle v2i1, v4i1, and v8i1.
2021-05-10 16:42:22 -05:00
Sanjay Patel 5577e86691 [InstCombine] fold extract subvector of bitcast insertelt
This is visible in the original example from:
https://llvm.org/PR50055
(but this change doesn't solve the bug)

https://alive2.llvm.org/ce/z/vM_Yq-
2021-05-10 17:20:10 -04:00
Roman Lebedev 6a64c462eb
[X86] AMD Zen 3: same-reg AVX YMM VPCMP is dep breaking one-idiom
As measured by exegesis, and confirmed by ref docs.
Still not zero-cycle :)
2021-05-10 23:49:27 +03:00
Roman Lebedev 2953245337
[X86] AMD Zen 3: same-reg AVX XMM VPCMP is dep breaking one-idiom
As measured by exegesis, and confirmed by ref docs.
Again, it's not zero-cycle.
2021-05-10 23:49:26 +03:00
Roman Lebedev 0f3bcb97ef
[X86] AMD Zen 3: same-reg SSE XMM PCMP is dep breaking one-idiom
As measured by exegesis, and confirmed by ref docs.
Much like with MMX PCMP, it does actually have to execute, though.
2021-05-10 23:49:26 +03:00
Roman Lebedev b24edfff4f
[X86] AMD Zen 3: same-reg PCMPEQ is an MMX all-ones dep breaking idiom
They are, however, not zero-cycle, and do actually execute.

As measured by exegesis, and confirmed by ref docs.
2021-05-10 23:49:26 +03:00
Nikita Popov 463ea28e96 [InstCombine] Fold comparison of integers by parts
Let's say you represent (i32, i32) as an i64 from which the parts
are extracted with lshr/trunc. Then, if you compare two tuples by
parts you get something like A[0] == B[0] && A[1] == B[1], just
that the part extraction happens by lshr/trunc and not a narrow
load or similar.

The fold implemented here reduces such equality comparisons by
converting them into a comparison on a larger part of the integer
(which might be the whole integer). It handles both the "and of eq"
and the conjugated "or of ne" case.

I'm being conservative with one-use for now, though this could be
relaxed if profitable (the base pattern converts 11 instructions
into 5 instructions, but there's quite a few variations on how it
can play out).

Differential Revision: https://reviews.llvm.org/D101232
2021-05-10 22:22:39 +02:00
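
A small demonstration of the equivalence the fold relies on (my own example,
not from the patch): comparing the two halves of an i64 extracted with
lshr/trunc gives the same answer as comparing the whole i64 at once.

    // Illustrative example only; not taken from the LLVM sources or tests.
    #include <cstdint>
    #include <cstdio>

    int main() {
      uint64_t A = 0x0000002A00000007ULL; // packed pair: hi = 42, lo = 7
      uint64_t B = 0x0000002A00000007ULL;

      // "and of eq" on the parts, as the unoptimized code would do via lshr/trunc.
      bool ByParts = (uint32_t)(A >> 32) == (uint32_t)(B >> 32) &&
                     (uint32_t)A == (uint32_t)B;

      // What the fold produces: a single comparison on the wider integer.
      bool Whole = (A == B);

      printf("by parts: %d, whole: %d\n", ByParts, Whole); // both print 1
      return 0;
    }
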
Florian Hahn 93a9a8a8d9
[VecLib] Add support for vector fns from Darwin's libsystem.
This patch adds support for Darwin's libsystem math vector functions to
TLI. Darwin's libsystem provides a range of vector functions for libm
functions.

This initial patch only adds the 2 x double and 4 x float versions,
which are available on both X86 and ARM64. On X86, wider vector versions
are supported as well.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D101856
2021-05-10 21:19:58 +01:00
Sam Clegg 5000a1b4b9 [lld][WebAssembly] Initial support merging string data
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.

Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)

Like the ELF linker, merging is only performed at `-O1` and above.

This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it does not currently support debug sections
because they are not represented by data segments (they are custom
sections).

Differential Revision: https://reviews.llvm.org/D97657
2021-05-10 13:15:12 -07:00
Arthur Eubanks 85af8a8c1b [NFC] Use ArgListEntry indirect types more in ISel lowering
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().

A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.

The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.

Differential Revision: https://reviews.llvm.org/D101713
2021-05-10 13:05:15 -07:00
Nikita Popov aa9b02ac75 [Inliner] Fix noalias metadata handling for instructions simplified during cloning (PR50270)
Instead of using VMap, which may include instructions from the
caller as a result of simplification, iterate over the
(FirstNewBlock, Caller->end()) range, which will only include new
instructions.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50270.

Differential Revision: https://reviews.llvm.org/D102110
2021-05-10 21:59:59 +02:00
Stefan Pintilie 6215f49b8f [PowerPC] Spilling to registers does not require frame index scavenging
If spills are to registers instead of to the stack then a copy will be used
and frame index scavenging is not required.

This patch adds debug info to frame index scavenging and makes sure that
spilling to registers does not cause frame index scavenging.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D101360
2021-05-10 14:42:39 -05:00
Arthur Eubanks 16748bd2fb [TargetLowering] Only inspect attributes in the arguments for ArgListEntry
Parameter attributes are considered part of the function [1], and like
mismatched calling conventions [2], we can't have the verifier check for
mismatched parameter attributes.

[1] https://llvm.org/docs/LangRef.html#parameter-attributes
[2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D101806
2021-05-10 12:35:11 -07:00
Amara Emerson dc75499998 [GlobalISel][IRTranslator] Fix bit-test lowering dropping phi edges.
For contiguous ranges we drop the last bit-test case but in doing so we skip
adding the new MBB PHI edges to the list of replacement PHI edges, and as a
result we incorrectly omit them in the G_PHI in finishPendingPhis().

This was found when bootstrapping clang with -O3 and GlobalISel enabled on Apple Silicon.
2021-05-10 11:59:31 -07:00
Sanjay Patel 88d8f10baf [PassManager] add helper function to hold set of vector passes (2nd try)
This is closer to no-functional-change-intended than the 1st attempt.
As noted in D102002, there were at least 2 diffs that went
unchecked in pass manager regressions tests: different pass
parameters (SimplifyCFG) and an extension point/callback.
Those should be lifted from the original code blocks correctly
now.
2021-05-10 14:43:00 -04:00