In case the then-path of an if-region is empty, then merging with the
else-path should be handled with the inverse of the condition (leading
to that path).
Fix PR37662
Differential Revision: https://reviews.llvm.org/D78881
Summary:
Currently, `rewriteLoopExitValues()`'s logic is roughly as following:
> Loop over each incoming value in each PHI node.
> Query whether the SCEV for that incoming value is high-cost.
> Expand the SCEV.
> Perform sanity check (`isValidRewrite()`, D51582)
> Record the info
> Afterwards, see if we can drop the loop given replacements.
> Maybe perform replacements.
The problem is that we interleave SCEV cost checking and expansion.
This is A Problem, because `isHighCostExpansion()` takes special care
to not bill for the expansions that were already expanded, and we can reuse.
While it makes sense in general - if we know that we will expand some SCEV,
all the other SCEV's costs should account for that, which might cause
some of them to become non-high-cost too, and cause chain reaction.
But that isn't what we are doing here. We expand *all* SCEV's, unconditionally.
So every next SCEV's cost will be affected by the already-performed expansions
for previous SCEV's. Even if we are not planning on keeping
some of the expansions we performed.
Worse yet, this current "bonus" depends on the exact PHI node
incoming value processing order. This is completely wrong.
As an example of an issue, see @dmajor's `pr45835.ll` - if we happen to have
a PHI node with two(!) identical high-cost incoming values for the same basic blocks,
we would decide first time around that it is high-cost, expand it,
and immediately decide that it is not high-cost because we have an expansion
that we could reuse (because we expanded it right before, temporarily),
and replace the second incoming value but not the first one;
thus resulting in a broken PHI.
What we instead should do for now, is not perform any expansions
until after we've queried all the costs.
Later, in particular after `isValidRewrite()` is an assertion (D51582)
we could improve upon that, but in a more coherent fashion.
See [[ https://bugs.llvm.org/show_bug.cgi?id=45835 | PR45835 ]]
Reviewers: dmajor, reames, mkazantsev, fhahn, efriedma
Reviewed By: dmajor, mkazantsev
Subscribers: smeenai, nikic, hiraditya, javed.absar, llvm-commits, dmajor
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79787
This is split off from D80316, slightly tightening the definition of overloaded
hardwareloop intrinsic llvm.loop.decrement.reg specifying that both operands
its result have the same type.
We do not have any special handling for constant FP deopt arguments.
They are just spilled to stack or generated in register by MOVS
instruction. This is inefficient and, when we have too many such
constant arguments, may result in register allocation failure.
Instead, we can bitcast such constant FP operands to appropriately
sized integer and record as constant into statepoint and later, into
StackMap.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D80318
I have refactored the code so that we no longer need the
ScalableVecArgument descriptor - the scalable property of vectors is
now encoded using the ElementCount class in IITDescriptor. This means
that when matching intrinsics we know precisely how to match the
arguments and return values.
Differential Revision: https://reviews.llvm.org/D80107
With the two getIntrinsicInstrCosts folded into one, now fold in the
scalar/code-size orientated getIntrinsicCost. This involved sinking
cost of the TTIImpl into the base implementation, as it performs no
target checks. The opcodes remaining were memcpy, cttz and ctlz which
now have special handling in the BasicTTI implementation.
getInstructionThroughput can now directly return the result of
getUserCost.
This had required a change in the AMDGPU backend for fabs and its
always 'free'. I've also changed the X86 backend to return '1' for
any intrinsic when the CostKind isn't RecipThroughput.
Though this intended to be a non-functional change, there are many
paths being combined here so I would be very surprised if this didn't
have an effect.
Differential Revision: https://reviews.llvm.org/D80012
This has not been implemented by any backends which appear to cover
the functionality through getCastInstrCost. Sink what there is in the
default implementation into BasicTTI.
Differential Revision: https://reviews.llvm.org/D78922
The function does not need an MCStreamer per se; it was used only to get
access to the MCContext.
Differential Revision: https://reviews.llvm.org/D80205
Hide the method that allows setting probability for particular edge
and introduce a public method that sets probabilities for all
outgoing edges at once.
Setting individual edge probability is error prone. More over it is
difficult to check that the total probability is 1.0 because there is
no easy way to know when the user finished setting all
the probabilities.
Related bug is fixed in BranchProbabilityInfo::calcMetadataWeights().
Changing unreachable branch probabilities to raw(1) and distributing
the rest (oldProbability - raw(1)) over the reachable branches could
introduce total probability inaccuracy bigger than 1/numOfBranches.
Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
Will make it easier to pass the pointer info and alignment
correctly to the loads/stores.
While there also make the i32 stores independent and use a token
factor to join before the load.
Summary:
If an induction variable is frozen and used, SCEV yields imprecise result
because it doesn't say anything about frozen variables.
Due to this reason, performance degradation happened after
https://reviews.llvm.org/D76483 is merged, causing
SCEV yield imprecise result and preventing LSR to optimize a loop.
The suggested solution here is to add a pass which canonicalizes frozen variables
inside a loop. To be specific, it pushes freezes out of the loop by freezing
the initial value and step values instead & dropping nsw/nuw flags from instructions used by freeze.
This solution was also mentioned at https://reviews.llvm.org/D70623 .
Reviewers: spatel, efriedma, lebedev.ri, fhahn, jdoerfert
Reviewed By: fhahn
Subscribers: nikic, mgorny, hiraditya, javed.absar, llvm-commits, sanwou01, nlopes
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77523
The offsets were wrong. The result is now the same as what the compiler
would generate for a function that spills lr normally.
Differential Revision: https://reviews.llvm.org/D80238
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.
Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer. This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated. I
updated clang's handling of C++ references, and added a release note for
this.
Differential Revision: https://reviews.llvm.org/D80072
With the new SVE stack layout, we now need to provide a Darwin variant
for all the calling conventions based on the main AAPCS CSR save order.
This also changes APCS_SwiftError to have a Darwin and a non-Darwin
version, assuming it could be used on other platforms these days, and
restricts the AArch64_CXX_TLS calling convention to Darwin.
Differential Revision: https://reviews.llvm.org/D73805
Even though series of cmd/cndmask can produce quite a lot of
code that is still better than a loop. In case of doubles we
would even produce two loops.
Differential Revision: https://reviews.llvm.org/D80032
Previously this code just used a default constructed
MachinePointerInfo. But we know the accesses are to a fixed stack
object or at least somewhere on the stack.
While there fix the alignment passed to the full vector load/stores.
I don't think this function is currently exercised in tree so I
don't know how to test it. I just noticed it when I removed
non-constant index support in this function.
Differential Revision: https://reviews.llvm.org/D80058
Demangling Itanium symbols either consumes the whole input or fails,
but Microsoft symbols can be successfully demangled with just some
of the input.
Add an outparam that enables clients to know how much of the input was
consumed, and use this flag to give llvm-undname an opt-in warning
on partially consumed symbols.
Differential Revision: https://reviews.llvm.org/D80173
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
Summary:
Rename 'i' to 'I'.
Factor out the optional field handling to getOptionalVal().
Split out of D79951.
Reviewers: davidxl
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80230
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
Previously, the LowOverheadLoops pass couldn't handle VPT blocks
with conditions, or with multiple VCTPs. This patch improves the
LowOverheadLoops pass so it can handle those cases.
It also adds support for VCMPs before the VCTP.
Differential Revision: https://reviews.llvm.org/D78206
Combine the two API calls into one by introducing a structure to hold
the relevant data. This has the added benefit of moving the boiler
plate code for arguments and flags, into the constructors. This is
intended to be a non-functional change, but the complicated web of
logic involved here makes it very hard to guarantee.
Differential Revision: https://reviews.llvm.org/D79941
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
This patch was originally committed as b8a3c34eee, but broke the
modules build, as LoopAccessAnalysis was using the Expander.
The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
Summary:
For PowerPC, there are 3 passes has disabled the machine verification.
```
PPCTargetMachine.cpp: addPass(&LiveVariablesID, false);
PPCTargetMachine.cpp: addPass(createPPCEarlyReturnPass(), false);
PPCTargetMachine.cpp: addPass(createPPCBranchSelectionPass(), false);
```
This patch is to enable machine verification for above three passes.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D79840
Replace with forward declarations and move necessary includes down to source files.
Exposes an implicit dependency on TargetMachine.h in llvm-opt-fuzzer.cpp
This is the second attempt at landing this patch, after fixing the
KeepOneInputPHIs behaviour to also keep zero input PHIs.
Differential Revision: https://reviews.llvm.org/D80141
We have the getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression.
However, during negating the expression, the cost might change as we are changing the DAG,
and then, hit the assertion if we negated the wrong expression as the cost is not trustful anymore.
This patch is target to remove the getNegatibleCost to avoid the out of sync with getNegatedExpression,
and check the cost during negating the expression. It also reduce the duplicated code between
getNegatibleCost and getNegatedExpression. And fix the crash for the test in D76638
Reviewed By: RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D77319
Relying on any MachineFunction state in the MachineFunctionInfo
constructor is hazardous, because the construction time is unclear and
determined by the first use. The function may be only partially
constructed, which is part of why we have many of these hacky string
attributes to track what we need for ABI lowering.
For SelectionDAG, all stack objects are created up-front before
calling convention lowering so stack objects are visible at
construction time. For GlobalISel, none of the IR function has been
visited yet and the allocas haven't been added to the MachineFrameInfo
yet. This should fix failing to set flat_scratch_init in GlobalISel
when needed.
This pass really needs to be turned into some kind of analysis, but I
haven't found a nice way use one here.
This was looking for a compare condition, and copying the compare
flags. I don't think this was ever correct outside of certain min/max
patterns which aren't checked, but this probably predates select
instructions having fast math flags.
This should be directly implied from the register class, and there's
no need to special case live ins here. This was getting the wrong
answer for the queue ptr argument in callable functions, since it's
not an explicit IR argument and is always uniform.
Fixes not using scalar loads for the aperture in addrspacecast
lowering, and any other places that use implicit SGPR arguments.
Summary:
The code previously assumed the source of the bitcast in the combined
pattern was a vector type, but this is not always true. This patch
adds a check to avoid an assertion failure in that case.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80164
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80174
After D76797 the dominator tree is no longer used in LVI, so we
can remove it as a pass dependency, and also get rid of the
dominator tree enabling/disabling logic in JumpThreading.
Apart from cleaning up the code, this also clarifies LVI
cache consistency, in that the LVI cache can no longer
depend on whether the DT was or wasn't enabled due to
pending DT updates at any given time.
Differential Revision: https://reviews.llvm.org/D76985
This effectively splits the scheduling WriteVecMaskedStore(Y) classes
into four different classes (one per each variant).
The new VecMaskedStore scheduling classes are now correctly marked as
'unsupported' by the bdver2 and btver2 models.
No functional change intended.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D80201
We already check hasNoNaNs and that x is finite and strictly positive.
That only leaves the following special cases (taken from the Linux man
page for pow):
If x is +1, the result is 1.0 (even if y is a NaN).
If the absolute value of x is less than 1, and y is negative infinity, the result is positive infinity.
If the absolute value of x is greater than 1, and y is negative infinity, the result is +0.
If the absolute value of x is less than 1, and y is positive infinity, the result is +0.
If the absolute value of x is greater than 1, and y is positive infinity, the result is positive infinity.
The first case is handled elsewhere, and this transformation preserves
all the others, so there is no need to limit it to hasNoInfs.
Differential Revision: https://reviews.llvm.org/D79409
This patch adds VPValue version of the instruction operands to
VPReplicateRecipe and uses them during code-generation.
Reviewers: Ayal, gilr, rengolin
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D80114
We can remove a dynamic memory allocation, by checking the number of
operands: no operands = all true, 1 operand = mask.
Reviewers: Ayal, gilr, rengolin
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D80110
r119493 protected against PHINode::hasConstantValue returning the PHI
node itself, but a later fix in r159687 means that can never happen, so
the workarounds are no longer required.
For describing section/symbol names we can use unique suffixes,
e.g:
```
- Name: '.foo [1]`
- Name: '.foo [2]`
```
It can be a problem (see https://reviews.llvm.org/D79984#inline-734829),
because `[]` are sometimes used to describe a macros:
```
- Name: "[[a0]]"
```
Seems the better approach is to use something else, like "()".
This patch does it and refactors the code related.
Differential revision: https://reviews.llvm.org/D80123
Replace with forward declarations and move includes down to source files where required.
I also needed to move the TargetLoweringObjectFile::SectionForGlobal wrapper implementation down into TargetLoweringObjectFile.cpp
Try to avoid creating VGBMs by reusing the permutation mask if it contains a
zero. If the first byte was into (any byte of) a zero vector, then the first
byte of the mask can become zero and reused by putting the mask also as the
first operand. If there instead was a first-byte use of the other source
operand, then that zero index can be reused if the mask is placed as the
second operand.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D79925
The patch changes dumping of offsets in .debug_str_offsets sections so
that they are printed as 16-digit hex values if the contribution is in
the DWARF64 format.
Differential Revision: https://reviews.llvm.org/D79997
The patch changes dumping of unit_length, debug_info_offset, and
debug_info_length fields in headers in .debug_pubname and
.debug_pubtypes sections so that they are printed as 16-digit hex values
if the contribution is in the DWARF64 format. Dumping of offsets in the
tables is changed in the same way.
Differential Revision: https://reviews.llvm.org/D79997
The patch changes dumping of a unit_length field and offsets in headers
in .debug_loclists and .debug_rnglists sections so that they are printed
as 16-digit hex values if the contribution is in the DWARF64 format.
Differential Revision: https://reviews.llvm.org/D79997
The patch changes dumping of unit_length and header_length fields in
headers in .debug_line sections so that they are printed as 16-digit hex
values if the contribution is in the DWARF64 format.
Differential Revision: https://reviews.llvm.org/D79997
The patch changes dumping of the unit_length field in a unit header so
that it is printed as a 16-digit hex value if the unit is in the DWARF64
format.
Differential Revision: https://reviews.llvm.org/D79997
The patch changes dumping of DWARF form values which sizes depend on
the DWARF format so that they are printed as 16-digit hex values for
DWARF64.
Differential Revision: https://reviews.llvm.org/D79997
The patch changes dumping of unit_length and debug_info_offset fields in
an address range header so that they are printed as 16-digit hex values
if the contribution is in the DWARF64 format.
Differential Revision: https://reviews.llvm.org/D79997
Commit 8e8f1bd75a ("[BPF] Return fail if disassembled insn registers
out of range") tried to fix a segfault when an illegal instruction
is decoded. A test case is added to emulate such an illegal instruction.
The llvm buildbot reported an asan issue with this test case.
ERROR: AddressSanitizer: global-buffer-overflow on address ...
decodeMemoryOpValue(llvm::MCInst&, unsigned int, ...)
llvm::MCDisassembler::DecodeStatus llvm::decodeToMCInst<unsigned long>(...)
llvm::MCDisassembler::DecodeStatus llvm::decodeInstruction<unsigned long>(...)
in (anonymous namespace)::BPFDisassembler::getInstruction(...)
...
Basically, the fix in Commit 8e8f1bd75a is too later to prevent
the asan. The fix in this patch moved the register number check earlier
during decodeInstruction(). It will return fail for decodeInstruction()
if the register number is out of range.
Note that DecodeGPRRegisterClass() and DecodeGPR32RegisterClass()
already have register number checking, so here we only check
decodeMemoryOpValue().
Summary:
When a loop has multiple backedges, loop simplification attempts to
separate them out into nested loops. This results in incorrect control
flow in the presence of some functions like a GPU barrier. This change
skips the transformation when such "convergent" function calls are
present in the loop body.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D80078
Daniel reported a llvm-objdump segfault like below:
$ llvm-objdump -D bpf_xdp.o
...
0000000000000000 <.strtab>:
0: 00 63 69 6c 69 75 6d 5f <unknown>
1: 6c 62 36 5f 61 66 66 69 w2 <<= w6
...
(llvm-objdump: lib/Target/BPF/BPFGenAsmWriter.inc:1087: static const char*
llvm::BPFInstPrinter::getRegisterName(unsigned int): Assertion
`RegNo && RegNo < 25 && "Invalid register number!"' failed.
Stack dump:
0. Program arguments: llvm-objdump -D bpf_xdp.o
...
abort
...
llvm::BPFInstPrinter::getRegisterName(unsigned int)
llvm::BPFInstPrinter::printMemOperand(llvm::MCInst const*,
int, llvm::raw_ostream&, char const*)
llvm::BPFInstPrinter::printInstruction(llvm::MCInst const*,
unsigned long, llvm::raw_ostream&)
llvm::BPFInstPrinter::printInst(llvm::MCInst const*,
unsigned long, llvm::StringRef, llvm::MCSubtargetInfo const&,
llvm::raw_ostream&)
...
Basically, since -D enables disassembly for all sections, .strtab is also disassembled,
but some strings are decoded as legal instructions but with illegal register numbers.
When llvm-objdump tries to print register name for these illegal register numbers,
assertion and segfault happens.
The patch fixed the issue by returning fail for a disassembled insn if
that insn contains a reg operand with illegal reg number.
The insn will be printed as "<unknown>" instead of causing an assertion.
For a simple program like below:
-bash-4.4$ cat t.c
int test() {
asm volatile("r0 = r0" ::);
return 0;
}
compiled with
clang -target bpf -O2 -c t.c
the following llvm-objdump command will segfault.
llvm-objdump -d t.o
0: bf 00 00 00 00 00 00 00 nop
llvm-objdump: ../include/llvm/ADT/SmallVector.h:180
...
Assertion `idx < size()' failed
...
abort
...
llvm::BPFInstPrinter::printOperand
llvm::BPFInstPrinter::printInstruction
...
The reason is both NOP and MOV_rr (r0 = r0) having the same encoding.
The disassembly getInstruction() decodes to be a NOP instruciton but
during printInstruction() the same encoding is interpreted as
a MOV_rr instruction. Such a mismatcch caused the segfault.
The fix is to make NOP instruction as CodeGen only so disassembler
will skip NOP insn for disassembling.
Note that instruction "r0 = r0" should not appear in non inline
asm codes since BPF Machine Instruction Peephole optimization will
remove it.
Differential Revision: https://reviews.llvm.org/D80156
This reverts commit 525a591f0f.
Fixed an issue with pointers to members based on typedefs. In this case,
LLVM would emit a second UDT. I fixed it by not passing the class type
to getTypeIndex when the base type is not a function type. lowerType
only uses the class type for direct function types. This suggests if we
have a PMF with a function typedef, there may be an issue, but that can
be solved separately.
These should legalize like undefs and select into copies.
The ll test is copied from the x86 test, minus the half fp case because
we don't currently support that.
LV considers an internally computed MaxVF to decide if a constant trip-count is
a multiple of any subsequently chosen VF, and conclude that no scalar remainder
iterations (tail) will be left for Fold Tail to handle. If an external VF is
provided via -force-vector-width, it must be considered instead of the internal
MaxVF.
If an external UF is provided via -force-vector-interleave, it too must be
considered in addition to MaxVF or user VF.
Fixes PR45679.
Differential Revision: https://reviews.llvm.org/D80085
Summary:
This patch adds IR intrinsics for the mnemonics USDOT and SUDOT of the
8.6 extension of Armv8-a.
Reviewers: sdesmalen, efriedma, david-arm
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79876
verifyFunction/verifyModule don't assert or error internally. They
also don't print anything if you don't pass a raw_ostream to them.
So the caller needs to check the result and ideally pass a stream
to get the messages. Otherwise they're just really expensive no-ops.
I've filed PR45965 for another instance in SLPVectorizer
that causes a lit test failure.
Differential Revision: https://reviews.llvm.org/D80106
This was assumed to be a simple move, and interpreting the immediate
modifier operand as a materialized immediate. Apparently the SDWA pass
never produces these, but GlobalISel does emit these for some vector
shuffles.
If both OpA and OpB is an add with NSW/NUW and with the same LHS operand,
we can guarantee that the transformation is safe if we can prove that OpA
won't overflow when IdxDiff added to the RHS of OpA.
Review: https://reviews.llvm.org/D79817
Now that load/store have required alignment, accept Align here.
This also avoids uses of getPointerElementType(), which is
incompatible with opaque pointers.
Summary:
Currently they are not supported together. Supporting them will require
a LangRef change. See discussion in https://reviews.llvm.org/D77689.
Reviewers: rnk, efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80132
r2694 fixed a bug where removePredecessor could create IR with a use not
dominated by its def in a self loop. But this could only happen in an
unreachable loop, and since that time the rules have been relaxed so
that defs don't have to dominate uses in unreachable code, so the fix is
unnecessary. The regression test added in r2691 still stands.
Differential Revision: https://reviews.llvm.org/D80128
It's better to reuse the first source value than to use an undef second
operand, because that will make more resulting VPERMs have identical operands
and therefore MachineCSE more successful.
Review: Ulrich Weigand
Summary:
When salvaging a dead zext instruction, append a convert operation to
the DIExpressions of the debug uses of the instruction, to prevent the
salvaged value from being sign-extended.
I confirmed that lldb prints out the correct unsigned result for "f" in
the example from PR45923 with this changed applied.
rdar://63246143
Reviewers: aprantl, jmorse, chrisjackson, davide
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80034
I have changed the pass so that we ignore shuffle vectors with
scalable vector types, and replaced VectorType with FixedVectorType
in the rest of the pass. I couldn't think of an easy way to test
this change, since for scalable vectors we shouldn't be using
shufflevectors for interleaving. This change fixes up some
type size assert warnings I found in the following test:
CodeGen/AArch64/sve-intrinsics-int-arith-imm.ll
Differential Revision: https://reviews.llvm.org/D79700
> Before this patch, S_[L|G][THREAD32|DATA32] records were emitted with a simple name, not the fully qualified name (namespace + class scope).
>
> Differential Revision: https://reviews.llvm.org/D79447
This causes asserts in Chromium builds:
CodeViewDebug.cpp:2997: void llvm::CodeViewDebug::emitDebugInfoForUDTs(const std::vector<std::pair<std::string, const DIType *>> &):
Assertion `OriginalSize == UDTs.size()' failed.
I will follow up on the Phabricator issue.
for variables in nested scopes (including inlined functions) if there is a
single location which covers the entire scope and the scope is contained in a
single block.
Based on work by @jmorse.
Reviewed By: vsk, aprantl
Differential Revision: https://reviews.llvm.org/D79571
Now that load/store alignment is required, we no longer need most
of them. Also switch the getLoadStoreAlignment() helper to return
Align instead of MaybeAlign.
We can't leave undef vector element constants as-is,
it is a miscompile, so we need to sanitize them.
We have two vectors (C and ~C):
* We can't replace undef with 0 in both of them
* We can't replace undef with 0 in only one of them
* We could replace undef with -1 in both of them
* We could replace undef with -1 in only one(!) of them
* We could replace undef with -1 in one and 0 in another one of them.
Therefore, it seems best to go with the last option, since otherwise
we'd loose knowledge that C and ~C have no common bits set,
which seems more important than preserving partial undef knowledge.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45955
This is a no-op/NFC at the moment & generally makes the code /somewhat/
cleaner/less reliant on assumptions about what will produce a debug_addr
section.
It's still a bit "spooky action at a distance" - the add ranges code
pre-emptively inserts addresses into the address pool it knows will
eventually be used by the range emission code (or low/high pc).
The 'ideal' would be either to actually compute the addresses needed for
range (& loc) emission earlier - which would mean decanonicalizing the
range/loc representation earlier to account for whether it was going to
use addrx encodings or not (which would be unfortunate, but could be
refactored to be relatively unobtrusive).
Alternatively, emitting the range/loc sections earlier would cause them
to request the needed addresses sooner - but then you endup having to
split finalizeModuleInfo because some things need to be handled there
before the ranges/locs are emitted, I think...
LVI and its consumers currently have quite a bit of complexity
related to dominator tree management. However, it doesn't look
like it is actually needed...
The only use of the dominator tree is inside isValidAssumeForContext().
However, due to the way LVI queries work, it is not needed:
If we query a value for some block, we will first get the edge values
from all predecessor blocks, which also includes an intersection with
assumptions that apply to the terminator of the predecessor. As such,
we will already have processed all assumptions from predecessor blocks
(this is actually stronger than what isValidAssumeForContext() does
with a DT, because this is capable of combining non-dominating
assumptions). The only additional assumptions we need to take into
account are those in the block being queried. And we don't need a
dominator tree for that.
This patch only removes the use of DT, I will drop the machinery
around it in a followup.
Differential Revision: https://reviews.llvm.org/D76797
This build vector lowering pattern came up in D79886.
I've tried to limit the improvement to cases where it looks
clearly better to load, but we could remove the 'TODO'
predicates already if we are willing to overlook some
corner cases.
Differential Revision: https://reviews.llvm.org/D80013
This was originally in D79116.
Converting from a narrow-enough FP source value to integer and
back to FP guarantees that the conversion to FP is exact because
of UB/poison-on-overflow.
This was suggested in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19
When the callee requires a dynamic stack realignment,
it is not possible to correcty access the incoming
stack arguments using the stack pointer. We reserve a
base pointer in such cases to access the function arguments
inside the callee. The base pointer will hold the incoming
stack pointer value before any kind of delta added to it.
Reviewed By: arsenm, scott.linder
Differential Revision: https://reviews.llvm.org/D78811
Summary:
On XMEGA, I/O address space is same as data address space - there is no 0x20 offset,
because CPU General Purpose Registers are not mapped in data address space.
From https://en.wikipedia.org/wiki/AVR_microcontrollers
> In the XMEGA variant, the working register file is not mapped into the data address space; as such, it is not possible to treat any of the XMEGA's working registers as though they were SRAM. Instead, the I/O registers are mapped into the data address space starting at the very beginning of the address space.
Reviewers: dylanmckay
Reviewed By: dylanmckay
Subscribers: hiraditya, Jim, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77207
Patch by Vlastimil Labsky.
We know the pointer somewhere on the stack, we just don't know
exactly where since the index may be variable.
Differential Revision: https://reviews.llvm.org/D80060
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
This ensures we create mem operands for these instructions fixing PR45949.
Unfortunately, it increases the size of X86GenDAGISel.inc, but some dag
combine canonicalization could reduce the types of load we need to match.
If isSized is passed a SmallPtrSet, it uses that set to catch infinitely
recursive types (for example, a struct that has itself as a member).
Otherwise, it just crashes on such types.
This is split off from D79799 - where I was proposing to fully iterate
over a function until there are no more transforms. I suspect we are
still going to want to do something like that eventually.
But we can achieve the same gains much more efficiently on the current
set of regression tests just by reversing the order that we visit the
instructions.
This may also reduce the motivation for D79078, but we are still not
getting the optimal pattern for a reduction.
Given a VQMOVN(VSHR), we can fold that into a VQSHRN simply enough using
a few tablegen patterns.
Differential Revision: https://reviews.llvm.org/D77720
This is basically the same patch as D63233, but converted to
funnel shifts rather than regular shifts. I did not see a
way to effectively share code for these 2 cases though.
This follows D79718 and D79827 to re-fix PR37426 because
that gets canonicalized to funnel shift intrinsics in IR.
I did draft an alternative patch as an enhancement to
"shouldSinkOperands()", but that was awkward because
we have to key the transform from the select, but then
look at both its users and its operands.
This adds two combines for VMOVN, one to fold
VMOVN[tb](c, VQMOVNb(a, b)) => VQMOVN[tb](c, b)
The other to perform demand bits analysis on the lanes of a VMOVN. We
know that only the bottom lanes of the second operand and the top or
bottom lanes of the Qd operand are needed in the result, depending on if
the VMOVN is bottom or top.
Differential Revision: https://reviews.llvm.org/D77718
This adds some custom lowering for VQMOVN, an instruction that can be
used to perform saturating truncates from a pair of min(max(X, -0x8000),
0x7fff), providing those constants are correct. This leaves a VQMOVNBs
which saturates the value and inserts that into the bottom lanes of an
existing vector. We then need to do something with the other lanes,
extending the value using a vmovlb.
Ideally, as will often be the case, only the bottom lane of what remains
will be demanded, allowing the vmovlb to be removed. Which should mean
the instruction is either equal or a win most of the time, and allows
some extra follow-up folding to happen.
Differential Revision: https://reviews.llvm.org/D77590
computeKnownBitsFromAssume() currently asserts if m_V matches a
ptrtoint that changes the bitwidth. Because InstCombine
canonicalizes ptrtoint instructions to use explicit zext/trunc,
we never ran into the issue in practice. I'm adding unit tests,
as I don't know if this can be triggered via IR anywhere.
Fix this by calling anyextOrTrunc(BitWidth) on the computed
KnownBits. Note that we are going from the KnownBits of the
ptrtoint result to the KnownBits of the ptrtoint operand,
so we need to truncate if the ptrtoint zexted and anyext if
the ptrtoint truncated.
Differential Revision: https://reviews.llvm.org/D79234
The code was calculating an offset from a stack pointer SDValue.
This is exactly what getMemBasePlusOffset does. I also replaced
sizeof(int) with a hardcoded 4. We know the type we're operating
on is 4 bytes. But the size of int that the source code is being
compiled with isn't guaranteed to be 4 bytes.
While here replace another use of getMemBasePlusOffset that was
proceeded with a call to getConstant with the other signature
that call getConstant internally.
This bug is exposed by Test7 of ehthrow.cxx in MSVC EH suite where
a rethrow occurs in a try-catch inside a catch (i.e., a nested Catch
handlers). See the test code in
https://github.com/microsoft/compiler-tests/blob/master/eh/ehthrow.cxx#L346
When an object is rethrown in a Catch handler, the copy-ctor of this
object must be executed after the destructions of live objects, but
BEFORE the dtors of live objects in parent handlers.
Today Windows 64-bit runtime (__CxxFrameHandler3 & 4) expects nested Catch
handers
are stored in pre-order (outer first, inner next) in $tryMap$ table, so
that given a State, its Catch's beginning State can be properly
retrieved. The Catch beginning state (which is also the ending State) is
the State where rethrown object's copy-ctor must take place.
LLVM currently stores nested catch handlers in post-ordering because
it's the natural way to compute the highest State in Catch.
The fix is to simply store TryCatch handler in pre-order, but update
Catch's highest State after child Catches are all processed.
Differential Revision: https://reviews.llvm.org/D79474?id=263919
Summary:
When spilling in the entry function we should be able to borrow
StackPtrOffsetReg as a last resort. This restores behaviour
removed in D75138, and fixes failures when shaders use all
SGPRs, VGPRs and spill in the entry function.
Reviewers: scott.linder, arsenm, tpr
Reviewed By: scott.linder, arsenm
Subscribers: qcolombet, foad, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79776
Summary:
In the the given example, a stack slot pointer is merged
between a setjmp and longjmp. This pointer is spilled,
so it does not get correctly restored, addinga undefined
behaviour where it shouldn't.
Change-Id: I60ec010844f2a24ce01ceccf12eb5eba5ab94abb
Reviewers: eli.friedman, thanm, efriedma
Reviewed By: efriedma
Subscribers: MatzeB, qcolombet, tpr, rnk, efriedma, hiraditya, llvm-commits, chill
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77767
This reverts commit 454de99a6f.
The problem was that one of the ctor arguments of CallAnalyzer was left
to be const std::function<>&. A function_ref was passed for it, and then
the ctor stored the value in a function_ref field. So a std::function<>
would be created as a temporary, and not survive past the ctor
invocation, while the field would.
Tested locally by following https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
Original Differential Revision: https://reviews.llvm.org/D79917