This reverts commit 8a12553223.
A bug has been found when generating code for Thumb2. In some very
specific cases, the prologue/epilogue emitter generates erroneous stack
offsets for the new LDRD instructions that access the stack.
This bug does not seem to be caused by the reverted patch though. Likely
the latter has made an undiscovered issue emerge in the
prologue/epilogue emission pass. Nevertheless, this reversion is
necessary since it is blocking users of the ARM backend.
(This patch is by Jessica, I'm just committing it on her behalf because I need
a post-legalizer combiner for something else).
This supersedes D77250, which did equivalent work in the selector. This can be
done pre-legalization or post-legalization. Post-legalization is more likely to
hit, since G_IMPLICIT_DEFs tend to appear during legalization. There's no reason
to not do it pre-legalization though-- if it can be caught earlier, great.
(I also think that it might be worth reimplementing D78769 using a
target-specific post-legalization combine too after thinking about it for a
while.)
Differential Revision: https://reviews.llvm.org/D78852
Summary:
This patch fixes a problem when pmull2 instruction is not
generated for vmull_high_p64 intrinsic.
ISel has a pattern for int_aarch64_neon_pmull64 intrinsic to generate
PMULL2 instruction. That pattern assumes that extraction operations
are located in the same basic block. We need to sink them
if they are not. Handle operands of int_aarch64_neon_pmull64
into AArch64TargetLowering::shouldSinkOperands.
Reviewed by: efriedma
Differential Revision: https://reviews.llvm.org/D80320
This only affects assembly and -filetype=asm codegen of PAL metadata.
Differential Revision: https://reviews.llvm.org/D78860
Change-Id: I7b822e1917bf7b403486820d31afc483be207652
Summary:
To support all targets, the mayAlias member function needs to support instructions with multiple operands.
This revision also changes the order of the emitted instructions in some test cases.
Reviewers: efriedma, hfinkel, craig.topper, dmgreen
Reviewed By: efriedma
Subscribers: MatzeB, dmgreen, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80161
Promote alloca to vector before SROA and loop unroll. If we manage
to eliminate allocas before unroll we may choose to unroll less.
Differential Revision: https://reviews.llvm.org/D80386
The set of patterns for unpredicated load/store was incomplete: it only
included non-extending stores. Fill out the remaining patterns for
extending stores, and add the corresponding support to frame offset
lowering.
Differential Revision: https://reviews.llvm.org/D80349
It was set in total vector size while the idea was to limit
a number of instructions. Now it started to work with doubles
and thresholds needs to be updated.
Differential Revision: https://reviews.llvm.org/D80322
This is split off from D80316, slightly tightening the definition of overloaded
hardwareloop intrinsic llvm.loop.decrement.reg specifying that both operands
its result have the same type.
We do not have any special handling for constant FP deopt arguments.
They are just spilled to stack or generated in register by MOVS
instruction. This is inefficient and, when we have too many such
constant arguments, may result in register allocation failure.
Instead, we can bitcast such constant FP operands to appropriately
sized integer and record as constant into statepoint and later, into
StackMap.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D80318
Will make it easier to pass the pointer info and alignment
correctly to the loads/stores.
While there also make the i32 stores independent and use a token
factor to join before the load.
The offsets were wrong. The result is now the same as what the compiler
would generate for a function that spills lr normally.
Differential Revision: https://reviews.llvm.org/D80238
With the new SVE stack layout, we now need to provide a Darwin variant
for all the calling conventions based on the main AAPCS CSR save order.
This also changes APCS_SwiftError to have a Darwin and a non-Darwin
version, assuming it could be used on other platforms these days, and
restricts the AArch64_CXX_TLS calling convention to Darwin.
Differential Revision: https://reviews.llvm.org/D73805
Even though series of cmd/cndmask can produce quite a lot of
code that is still better than a loop. In case of doubles we
would even produce two loops.
Differential Revision: https://reviews.llvm.org/D80032
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOverheadLoops pass so it can handle those cases.
It also adds support for VCMPs before the VCTP.
Differential Revision: https://reviews.llvm.org/D78206
We have the getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression.
However, during negating the expression, the cost might change as we are changing the DAG,
and then, hit the assertion if we negated the wrong expression as the cost is not trustful anymore.
This patch is target to remove the getNegatibleCost to avoid the out of sync with getNegatedExpression,
and check the cost during negating the expression. It also reduce the duplicated code between
getNegatibleCost and getNegatedExpression. And fix the crash for the test in D76638
Reviewed By: RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D77319
Relying on any MachineFunction state in the MachineFunctionInfo
constructor is hazardous, because the construction time is unclear and
determined by the first use. The function may be only partially
constructed, which is part of why we have many of these hacky string
attributes to track what we need for ABI lowering.
For SelectionDAG, all stack objects are created up-front before
calling convention lowering so stack objects are visible at
construction time. For GlobalISel, none of the IR function has been
visited yet and the allocas haven't been added to the MachineFrameInfo
yet. This should fix failing to set flat_scratch_init in GlobalISel
when needed.
This pass really needs to be turned into some kind of analysis, but I
haven't found a nice way use one here.
This was looking for a compare condition, and copying the compare
flags. I don't think this was ever correct outside of certain min/max
patterns which aren't checked, but this probably predates select
instructions having fast math flags.
This should be directly implied from the register class, and there's
no need to special case live ins here. This was getting the wrong
answer for the queue ptr argument in callable functions, since it's
not an explicit IR argument and is always uniform.
Fixes not using scalar loads for the aperture in addrspacecast
lowering, and any other places that use implicit SGPR arguments.
Summary:
The code previously assumed the source of the bitcast in the combined
pattern was a vector type, but this is not always true. This patch
adds a check to avoid an assertion failure in that case.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80164
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80174
Try to avoid creating VGBMs by reusing the permutation mask if it contains a
zero. If the first byte was into (any byte of) a zero vector, then the first
byte of the mask can become zero and reused by putting the mask also as the
first operand. If there instead was a first-byte use of the other source
operand, then that zero index can be reused if the mask is placed as the
second operand.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D79925
Daniel reported a llvm-objdump segfault like below:
$ llvm-objdump -D bpf_xdp.o
...
0000000000000000 <.strtab>:
0: 00 63 69 6c 69 75 6d 5f <unknown>
1: 6c 62 36 5f 61 66 66 69 w2 <<= w6
...
(llvm-objdump: lib/Target/BPF/BPFGenAsmWriter.inc:1087: static const char*
llvm::BPFInstPrinter::getRegisterName(unsigned int): Assertion
`RegNo && RegNo < 25 && "Invalid register number!"' failed.
Stack dump:
0. Program arguments: llvm-objdump -D bpf_xdp.o
...
abort
...
llvm::BPFInstPrinter::getRegisterName(unsigned int)
llvm::BPFInstPrinter::printMemOperand(llvm::MCInst const*,
int, llvm::raw_ostream&, char const*)
llvm::BPFInstPrinter::printInstruction(llvm::MCInst const*,
unsigned long, llvm::raw_ostream&)
llvm::BPFInstPrinter::printInst(llvm::MCInst const*,
unsigned long, llvm::StringRef, llvm::MCSubtargetInfo const&,
llvm::raw_ostream&)
...
Basically, since -D enables disassembly for all sections, .strtab is also disassembled,
but some strings are decoded as legal instructions but with illegal register numbers.
When llvm-objdump tries to print register name for these illegal register numbers,
assertion and segfault happens.
The patch fixed the issue by returning fail for a disassembled insn if
that insn contains a reg operand with illegal reg number.
The insn will be printed as "<unknown>" instead of causing an assertion.
For a simple program like below:
-bash-4.4$ cat t.c
int test() {
asm volatile("r0 = r0" ::);
return 0;
}
compiled with
clang -target bpf -O2 -c t.c
the following llvm-objdump command will segfault.
llvm-objdump -d t.o
0: bf 00 00 00 00 00 00 00 nop
llvm-objdump: ../include/llvm/ADT/SmallVector.h:180
...
Assertion `idx < size()' failed
...
abort
...
llvm::BPFInstPrinter::printOperand
llvm::BPFInstPrinter::printInstruction
...
The reason is both NOP and MOV_rr (r0 = r0) having the same encoding.
The disassembly getInstruction() decodes to be a NOP instruciton but
during printInstruction() the same encoding is interpreted as
a MOV_rr instruction. Such a mismatcch caused the segfault.
The fix is to make NOP instruction as CodeGen only so disassembler
will skip NOP insn for disassembling.
Note that instruction "r0 = r0" should not appear in non inline
asm codes since BPF Machine Instruction Peephole optimization will
remove it.
Differential Revision: https://reviews.llvm.org/D80156