Commit Graph

34024 Commits

Author SHA1 Message Date
Dmitry Preobrazhensky 933ebc4078 [AMDGPU][MC][GFX8+] Enabled clamp for v_mul_i32_i24_e64 and v_mul_u32_u24_e64
See bug 45925: https://bugs.llvm.org/show_bug.cgi?id=45925

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D80287
2020-05-22 14:11:31 +03:00
QingShan Zhang d1076d729a [NFC][Test] Add test coverage for fsqrt on PowerPC 2020-05-22 10:59:27 +00:00
Victor Campos 872ee78f65 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 8a12553223.

A bug has been found when generating code for Thumb2. In some very
specific cases, the prologue/epilogue emitter generates erroneous stack
offsets for the new LDRD instructions that access the stack.

This bug does not seem to be caused by the reverted patch though. Likely
the latter has made an undiscovered issue emerge in the
prologue/epilogue emission pass. Nevertheless, this reversion is
necessary since it is blocking users of the ARM backend.
2020-05-22 11:01:57 +01:00
Jessica Paquette 49a4f3f7d8 [AArch64][GlobalISel] Add a post-legalizer combiner with a very simple combine.
(This patch is by Jessica, I'm just committing it on her behalf because I need
a post-legalizer combiner for something else).

This supersedes D77250, which did equivalent work in the selector. This can be
done pre-legalization or post-legalization. Post-legalization is more likely to
hit, since G_IMPLICIT_DEFs tend to appear during legalization. There's no reason
to not do it pre-legalization though-- if it can be caught earlier, great.

(I also think that it might be worth reimplementing D78769 using a
target-specific post-legalization combine too after thinking about it for a
while.)

Differential Revision: https://reviews.llvm.org/D78852
2020-05-21 18:47:32 -07:00
Alexey Lapshin bf242c067e [AARCH64][NEON] Allow to sink operands of aarch64_neon_pmull64.
Summary:
This patch fixes a problem when pmull2 instruction is not
generated for vmull_high_p64 intrinsic.

ISel has a pattern for int_aarch64_neon_pmull64 intrinsic to generate
PMULL2 instruction. That pattern assumes that extraction operations
are located in the same basic block. We need to sink them
if they are not. Handle operands of int_aarch64_neon_pmull64
into AArch64TargetLowering::shouldSinkOperands.

Reviewed by: efriedma

Differential Revision: https://reviews.llvm.org/D80320
2020-05-22 01:35:24 +03:00
Arthur Eubanks fc937806ef Don't jump to landing pads in Control Flow Optimizer
Summary: Likely fixes https://bugs.llvm.org/show_bug.cgi?id=45858.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80047
2020-05-21 15:19:10 -07:00
Tim Renouf d13a508820 [AMDGPU] Fixed incorrect PAL metadata register naming
This only affects assembly and -filetype=asm codegen of PAL metadata.

Differential Revision: https://reviews.llvm.org/D78860

Change-Id: I7b822e1917bf7b403486820d31afc483be207652
2020-05-21 22:13:19 +01:00
Jean-Michel Gorius 7019cea26d [CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias
Summary:
To support all targets, the mayAlias member function needs to support instructions with multiple operands.

This revision also changes the order of the emitted instructions in some test cases.

Reviewers: efriedma, hfinkel, craig.topper, dmgreen

Reviewed By: efriedma

Subscribers: MatzeB, dmgreen, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80161
2020-05-21 23:02:54 +02:00
Stanislav Mekhanoshin 689e616ed0 [AMDGPU] Promote alloca to vector in opt
Promote alloca to vector before SROA and loop unroll. If we manage
to eliminate allocas before unroll we may choose to unroll less.

Differential Revision: https://reviews.llvm.org/D80386
2020-05-21 13:49:51 -07:00
Eli Friedman f09d220c71 [AArch64][SVE] Fill out missing unpredicated load/store patterns.
The set of patterns for unpredicated load/store was incomplete: it only
included non-extending stores.  Fill out the remaining patterns for
extending stores, and add the corresponding support to frame offset
lowering.

Differential Revision: https://reviews.llvm.org/D80349
2020-05-21 13:29:30 -07:00
Stanislav Mekhanoshin 71bbe5d799 [AMDGPU] Added opt pipeline test. NFC. 2020-05-21 11:58:35 -07:00
Stanislav Mekhanoshin 1dfd1b3e4b [AMDGPU] Tune threshold for cmp/select vector lowering
It was set in total vector size while the idea was to limit
a number of instructions. Now it started to work with doubles
and thresholds needs to be updated.

Differential Revision: https://reviews.llvm.org/D80322
2020-05-21 08:59:35 -07:00
Jon Roelofs 5fb979dd06 [llvm][test] Add missing FileCheck colons. NFC 2020-05-21 09:29:27 -06:00
Jon Roelofs 183d6af081 [llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC
Differential Revision: https://reviews.llvm.org/D79963
2020-05-21 09:29:27 -06:00
Sjoerd Meijer b0614509a0 [HardwareLoops] llvm.loop.decrement.reg definition
This is split off from D80316, slightly tightening the definition of overloaded
hardwareloop intrinsic llvm.loop.decrement.reg specifying that both operands
its result have the same type.
2020-05-21 10:48:16 +01:00
Denis Antrushin dedcefe09d [Statepoint] Constant fold FP deopt args.
We do not have any special handling for constant FP deopt arguments.
They are just spilled to stack or generated in register by MOVS
instruction. This is inefficient and, when we have too many such
constant arguments, may result in register allocation failure.
Instead, we can bitcast such constant FP operands to appropriately
sized integer and record as constant into statepoint and later, into
StackMap.

Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D80318
2020-05-21 11:02:54 +03:00
Chen Zheng 8086cdd1b0 [PowerPC] add more high latency opcodes for machine combiner pass
Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D80097
2020-05-21 02:39:20 -04:00
Craig Topper ae5ab2f40a [LegalizeDAG] Modify ExpandLegalINT_TO_FP to swap data for little/big endian instead of the pointers.
Will make it easier to pass the pointer info and alignment
correctly to the loads/stores.

While there also make the i32 stores independent and use a token
factor to join before the load.
2020-05-20 22:29:59 -07:00
Eli Friedman b4f9b34701 [AArch64] Fix unwind info generated by outliner.
The offsets were wrong. The result is now the same as what the compiler
would generate for a function that spills lr normally.

Differential Revision: https://reviews.llvm.org/D80238
2020-05-20 16:39:00 -07:00
Francis Visoiu Mistrih 770ba4f051 [AArch64] Fix GlobalISel tests on non-darwin platforms
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/6998
2020-05-20 16:26:58 -07:00
Francis Visoiu Mistrih 161122ea1c [AArch64] Provide Darwin variants of most calling conventions
With the new SVE stack layout, we now need to provide a Darwin variant
for all the calling conventions based on the main AAPCS CSR save order.

This also changes APCS_SwiftError to have a Darwin and a non-Darwin
version, assuming it could be used on other platforms these days, and
restricts the AArch64_CXX_TLS calling convention to Darwin.

Differential Revision: https://reviews.llvm.org/D73805
2020-05-20 16:03:48 -07:00
Stanislav Mekhanoshin 4eecf17164 [AMDGPU] Always expand ext/insertelement with divergent idx
Even though series of cmd/cndmask can produce quite a lot of
code that is still better than a loop. In case of doubles we
would even produce two loops.

Differential Revision: https://reviews.llvm.org/D80032
2020-05-20 15:51:29 -07:00
aartbik 645bba8d3d [llvm] [CodeGen] [X86] Fix issues with v4i1 instruction selection
Summary:
Fixes issue
https://bugs.llvm.org/show_bug.cgi?id=45995

Reviewers: mehdi_amini, nicolasvasilache, reidtatge, craig.topper, ftynse, bkramer

Reviewed By: craig.topper

Subscribers: RKSimon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80231
2020-05-20 11:34:56 -07:00
Arthur Eubanks 8a88755610 Reland [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 11:25:44 -07:00
Arthur Eubanks b8cbff51d3 Revert "[X86] Codegen for preallocated"
This reverts commit 810567dc69.

Some tests are unexpectedly passing
2020-05-20 10:04:55 -07:00
Arthur Eubanks 810567dc69 [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 09:20:38 -07:00
Matt Arsenault e8f6b0e583 AMDGPU/GlobalISel: Fix splitting 64-bit extensions
This was replicating the low bits into the high bits for G_ZEXT,
rather than using 0.
2020-05-20 11:13:32 -04:00
Jay Foad 3c84353804 [AMDGPU] Add the test from D49097. 2020-05-20 14:34:51 +01:00
Pierre-vh 835251f7d9 [Target][ARM] Make Low Overhead Loops coexist with VPT blocks.
Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOverheadLoops pass so it can handle those cases.

It also adds support for VCMPs before the VCTP.

Differential Revision: https://reviews.llvm.org/D78206
2020-05-20 12:24:55 +01:00
Kang Zhang 58684fbb6f [NFC][PowerPC] Add 2 new cases to test livevars pass 2020-05-20 05:32:09 +00:00
Stanislav Mekhanoshin 677929e352 [AMDGPU] Process V_MOV_B32_indirect in SET_GPR_IDX optimization
Differential Revision: https://reviews.llvm.org/D80256
2020-05-19 21:37:14 -07:00
Matt Arsenault 77f05e5b53 AMDGPU/GlobalISel: Fix bug in test register bank
The intent wasn't cases with illegal VGPR to SGPR copies.
2020-05-19 22:52:59 -04:00
QingShan Zhang 2b59e9f1bd [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression
We have the getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression.
However, during negating the expression, the cost might change as we are changing the DAG,
and then, hit the assertion if we negated the wrong expression as the cost is not trustful anymore.

This patch is target to remove the getNegatibleCost to avoid the out of sync with getNegatedExpression,
and check the cost during negating the expression. It also reduce the duplicated code between
getNegatibleCost and getNegatedExpression. And fix the crash for the test in D76638

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D77319
2020-05-20 02:12:16 +00:00
Matt Arsenault 21d2884a9c AMDGPU: Annotate functions that have stack objects
Relying on any MachineFunction state in the MachineFunctionInfo
constructor is hazardous, because the construction time is unclear and
determined by the first use. The function may be only partially
constructed, which is part of why we have many of these hacky string
attributes to track what we need for ABI lowering.

For SelectionDAG, all stack objects are created up-front before
calling convention lowering so stack objects are visible at
construction time. For GlobalISel, none of the IR function has been
visited yet and the allocas haven't been added to the MachineFrameInfo
yet. This should fix failing to set flat_scratch_init in GlobalISel
when needed.

This pass really needs to be turned into some kind of analysis, but I
haven't found a nice way use one here.
2020-05-19 18:51:00 -04:00
Cameron McInally e89a08aefd [SVE] MOVPRFX zero merging test renaming
Differential Revision: https://reviews.llvm.org/D80244
2020-05-19 17:33:19 -05:00
Matt Arsenault 08ae945318 GlobalISel: Copy correct flags to select
This was looking for a compare condition, and copying the compare
flags. I don't think this was ever correct outside of certain min/max
patterns which aren't checked, but this probably predates select
instructions having fast math flags.
2020-05-19 18:31:24 -04:00
Matt Arsenault 074b802654 AMDGPU: Fix DAG divergence for implicit function arguments
This should be directly implied from the register class, and there's
no need to special case live ins here. This was getting the wrong
answer for the queue ptr argument in callable functions, since it's
not an explicit IR argument and is always uniform.

Fixes not using scalar loads for the aperture in addrspacecast
lowering, and any other places that use implicit SGPR arguments.
2020-05-19 18:11:34 -04:00
Eli Friedman 5d2c3a0b8c [AArch64] Disable MachineOutliner on Windows.
The handling of unwind info is broken, so disable it for now.
2020-05-19 13:49:03 -07:00
Thomas Lively 8a43d41a40 [WebAssembly] Fix bug in custom shuffle combine
Summary:
The code previously assumed the source of the bitcast in the combined
pattern was a vector type, but this is not always true. This patch
adds a check to avoid an assertion failure in that case.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80164
2020-05-19 12:54:15 -07:00
Thomas Lively 3181273be7 [WebAssembly] Implement i64x2.mul and remove i8x16.mul
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80174
2020-05-19 12:50:44 -07:00
Craig Topper ccba60a784 [StackColoring] When remapping alloca's move the To alloca if the From alloca is before it.
If To is after From its possible that there's a use of From
between them.

Fixes issue reported here http://lists.llvm.org/pipermail/llvm-dev/2020-May/141421.html

Differential Revision: https://reviews.llvm.org/D80101
2020-05-19 10:37:27 -07:00
Matt Arsenault a7759d1785 GlobalISel: Fix IRTranslator for constantexpr selects
This was assuming a select is always an instruction, which is not
true.
2020-05-19 09:52:48 -04:00
Sam Parker e86f3075f8 [NFC][ARM] Add more tail predication tests 2020-05-19 14:01:10 +01:00
Carl Ritson eeece6dbe6 [AMDGPU] Add more VMEM to SALU WAR hazard tests. NFC 2020-05-19 19:52:13 +09:00
Jonas Paulsson b3bd0c37ec [SystemZ] Eliminate the need to create a zero vector by reusing the VPERM mask.
Try to avoid creating VGBMs by reusing the permutation mask if it contains a
zero. If the first byte was into (any byte of) a zero vector, then the first
byte of the mask can become zero and reused by putting the mask also as the
first operand. If there instead was a first-byte use of the other source
operand, then that zero index can be reused if the mask is placed as the
second operand.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D79925
2020-05-19 09:37:19 +02:00
Qiu Chaofan ad4f196e25 [NFC] [PowerPC] Refresh fma-mutate.ll using script
This is a clean-up after D78989. The old comments are out of date.
2020-05-19 13:39:58 +08:00
Chen Zheng a6be4d17e3 [PowerPC-QPX] adjust operands order of qpx fma instructions.
convert
  %3 = QVFMADD %2, %0, %1, implicit $rm
to
  %3 = QVFMADD %2, %1, %0, implicit $rm

Reviewed By: hfinkel, steven.zhang

Differential Revision: https://reviews.llvm.org/D78986
2020-05-18 22:59:51 -04:00
Yonghong Song 8e8f1bd75a [BPF] Return fail if disassembled insn registers out of range
Daniel reported a llvm-objdump segfault like below:
  $ llvm-objdump -D bpf_xdp.o
  ...
  0000000000000000 <.strtab>:
       0:       00 63 69 6c 69 75 6d 5f <unknown>
       1:       6c 62 36 5f 61 66 66 69 w2 <<= w6
  ...
  (llvm-objdump: lib/Target/BPF/BPFGenAsmWriter.inc:1087: static const char*
   llvm::BPFInstPrinter::getRegisterName(unsigned int): Assertion
   `RegNo && RegNo < 25 && "Invalid register number!"' failed.
   Stack dump:
   0.      Program arguments: llvm-objdump -D bpf_xdp.o
    ...
    abort
    ...
    llvm::BPFInstPrinter::getRegisterName(unsigned int)
    llvm::BPFInstPrinter::printMemOperand(llvm::MCInst const*,
                          int, llvm::raw_ostream&, char const*)
    llvm::BPFInstPrinter::printInstruction(llvm::MCInst const*,
                          unsigned long, llvm::raw_ostream&)
    llvm::BPFInstPrinter::printInst(llvm::MCInst const*,
                          unsigned long, llvm::StringRef, llvm::MCSubtargetInfo const&,
                          llvm::raw_ostream&)
   ...

Basically, since -D enables disassembly for all sections, .strtab is also disassembled,
but some strings are decoded as legal instructions but with illegal register numbers.
When llvm-objdump tries to print register name for these illegal register numbers,
assertion and segfault happens.

The patch fixed the issue by returning fail for a disassembled insn if
that insn contains a reg operand with illegal reg number.
The insn will be printed as "<unknown>" instead of causing an assertion.
2020-05-18 18:53:23 -07:00
Chen Zheng 4a69eda6f3 [PowerPC][MachineCombiner] add testcase for reassociating FMA - NFC 2020-05-18 21:18:01 -04:00
Yonghong Song ddff9799d2 [BPF] Prevent disassembly segfault for NOP insn
For a simple program like below:
  -bash-4.4$ cat t.c
  int test() {
    asm volatile("r0 = r0" ::);
    return 0;
  }
compiled with
  clang -target bpf -O2 -c t.c
the following llvm-objdump command will segfault.
  llvm-objdump -d t.o

  0:       bf 00 00 00 00 00 00 00 nop
  llvm-objdump: ../include/llvm/ADT/SmallVector.h:180
  ...
  Assertion `idx < size()' failed
  ...
  abort
  ...
  llvm::BPFInstPrinter::printOperand
  llvm::BPFInstPrinter::printInstruction
  ...

The reason is both NOP and MOV_rr (r0 = r0) having the same encoding.
The disassembly getInstruction() decodes to be a NOP instruciton but
during printInstruction() the same encoding is interpreted as
a MOV_rr instruction. Such a mismatcch caused the segfault.

The fix is to make NOP instruction as CodeGen only so disassembler
will skip NOP insn for disassembling.

Note that instruction "r0 = r0" should not appear in non inline
asm codes since BPF Machine Instruction Peephole optimization will
remove it.

Differential Revision: https://reviews.llvm.org/D80156
2020-05-18 17:40:18 -07:00