We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.
PR48779
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D95117
The DWARF numbers of vector registers are already defined in
riscv-elf-psabi. The DWARF number for vector is start from 96.
Correct the DWARF numbers of vector registers.
Differential Revision: https://reviews.llvm.org/D94749
These instructions produce 2*SEW result so the input can't have
an LMUL=8 or the result would need a non-existant LMUL=16. So
only create pseudos for LMUL up to 4.
Differential Revision: https://reviews.llvm.org/D95189
PowerPC has its custom scheduler heuristic. It calls parent classes'
tryCandidate in override version, but the function returns void, so this
way doesn't actually help. This patch duplicates code from base scheduler
into PPC machine scheduler class, which does what we wanted.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D94464
This test case demonstrates that the Call Frame Information generation is
totally biased towards whether exceptions are enabled or not. Currently
LLVM does not generate CFI i.e. a .debug_frame for debug purpose even
if --force-dwarf-frame-section is enabled unless exceptions are enabled.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D94801
check
This patch fixes a bug in emitARCOperationAfterCall where it inserts the
fall-back call after a bitcast instruction and then replaces the
bitcast's operand with the result of the fall-back call. The generated
IR without this patch looks like this:
msgSend.call: ; preds = %entry
%call = call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend
br label %msgSend.cont
msgSend.null-receiver: ; preds = %entry
call void @llvm.objc.release(i8* %4)
br label %msgSend.cont
msgSend.cont:
%8 = phi i8* [ %call, %msgSend.call ], [ null, %msgSend.null-receiver ]
%9 = bitcast i8* %10 to %0*
%10 = call i8* @llvm.objc.retain(i8* %8)
Notice that `%9 = bitcast i8* %10` to %0* is taking operand %10 which is
defined after it.
To fix the bug, this patch modifies the insert point to point to the
bitcast instruction so that the fall-back call is inserted before the
bitcast. In addition, it teaches the function to look at phi
instructions that are generated when there is a check for a null
receiver and insert the retainRV/claimRV instruction right after the
call instead of inserting a fall-back call right after the phi
instruction.
rdar://73360225
Differential Revision: https://reviews.llvm.org/D95181
This extracts the implementation of getType, setType, and getBody from
FunctionSupport.h into the mlir::impl namespace and defines them
generically in FunctionSupport.cpp. This allows them to be used
elsewhere for any FunctionLike ops that use FunctionType for their
type signature.
Using the new helpers, FuncOpSignatureConversion is generalized to
work with all such FunctionLike ops. Convenience helpers are added to
configure the pattern for a given concrete FunctionLike op type.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95021
The fault-only-first-load instructions can reduce VL if an element
other than element 0 triggers a memory fault. This can be used to
vectorize loops with data dependent exit conditions like strcmp or
strlen.
This patch adds a VL output to these intrinsics so that the new
VL value can be captured by software. This will be expanded to
'csrr gpr, vl' after the vleff instruction during SelectionDAG.
By doing this with one intrinsic we are able to guarantee that the
csrr reads the VL value produced by the vleff instruction. Having
it as a separate intrinsic would make it impossible to guarantee
ordering without making every other vector intrinsic have side
effects.
The intrinsics are expanded during lowering into two ISD nodes
that are glued together. These ISD nodes will go
through isel separately, but should maintain the glue so that they
get emitted adjacently by InstrEmitter.
I've only ran the chain through the vleff instruction, allowing
the READ_VL to be deleted if it is unused.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94286
local __libcpp_asprintf_l() -> libc asprintf() was inspecting the pointer (with
indeterminate value) for failure, rather than the return value of -1.
Reviewed By: ldionne
Differential Revision: https://reviews.llvm.org/D94564
Don't emit a bogus error message about a bad forward reference
when it's an IMPORT of a USE-associated symbol; don't ignore
intrinsic functions when USE-associating the contents of a
module when the intrinsic has been explicitly USE'd; allow
PUBLIC or PRIVATE accessibility attribute to be specified
for an enumerator before the declaration of the enumerator.
Differential Revision: https://reviews.llvm.org/D95175
Upgrade RISC-V V extension to v1.0-08a0b46.
Indexed load/store have ordered and unordered form.
New whole vector load/store.
Differential Revision: https://reviews.llvm.org/D93614
1.
All `_URC_HANDLER_FOUND` return values need to set `landingPad`
and its value does not matter for `_URC_CONTINUE_UNWIND`. So we
can always set `landingPad` to unify code.
2.
For an exception specification (`ttypeIndex < 0`), we can check `_UA_FORCE_UNWIND` first.
3.
The so-called type 3 search (`actions & _UA_CLEANUP_PHASE && !(actions & _UA_HANDLER_FRAME)`)
is actually conceptually wrong. For a catch handler or an unmatched dynamic
exception specification, `_UA_HANDLER_FOUND` should be returned immediately. It
still appeared to work because the `ttypeIndex==0` case would return
`_UA_HANDLER_FOUND` at a later time.
This patch fixes the conceptual error and simplifies the code by handling type 3
the same way as type 2 (which is also what libsupc++ does).
The only difference between phase 1 and phase 2 is what to do with a cleanup
(`actionEntry==0`, or a `ttypeIndex==0` is found in the action record chain):
phase 1 returns `_URC_CONTINUE_UNWIND` while phase 2 returns `_URC_HANDLER_FOUND`.
Reviewed By: #libc_abi, compnerd
Differential Revision: https://reviews.llvm.org/D93190
CodeGenModule::EmitNullConstant() creates constants with their "in memory"
type, not their "in vregs" type. The one place where this difference matters is
when the type is _Bool, as that is an i1 when in vregs and an i8 in memory.
Fixes: rdar://73361264
We've been using this patch in Android so we can avoid including the
demangler in libc++.so. It comes with a rather large cost in RSS and
isn't commonly needed.
Reviewed By: #libc_abi, compnerd
Differential Revision: https://reviews.llvm.org/D88189
lldb-vsdode was communicating the list of modules to the IDE with events, which in practice ended up having some drawbacks
- when debugging large targets, the number of these events were easily 10k, which polluted the messages being transmitted, which caused the following: a harder time debugging the messages, a lag after terminated the process because of these messages being processes (this could easily take several seconds). The latter was specially bad, as users were complaining about it even when they didn't check the modules view.
- these events were rarely used, as users only check the modules view when something is wrong and they try to debug things.
After getting some feedback from users, we realized that it's better to not used events but make this simply a request and is triggered by users whenever they needed.
This diff achieves that and does some small clean up in the existing code.
Differential Revision: https://reviews.llvm.org/D94033
This adds cost modelling for the inloop vectorization added in
745bf6cf44. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to works OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:
%sa = sext <16 x i8> A to <16 x i32>
%sb = sext <16 x i8> B to <16 x i32>
%m = mul <16 x i32> %sa, %sb
%r = vecreduce.add(%m)
->
R = VMLADAV A, B
There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 are particularly interesting as there are no native i64 add/mul
instructions, leading to the i64 add and mul naturally getting very
high costs.
Also worth mentioning, under NEON there is the concept of a sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in and outer loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.
This patch attempt to improve the costmodelling of in-loop reductions
by:
- Adding some pattern matching in the loop vectorizer cost model to
match extended reduction patterns that are optionally extended and/or
MLA patterns. This marks the cost of the reduction instruction correctly
and the sext/zext/mul leading up to it as free, which is otherwise
difficult to tell and may get a very high cost. (In the long run this
can hopefully be replaced by vplan producing a single node and costing
it correctly, but that is not yet something that vplan can do).
- getExtendedAddReductionCost is added to query the cost of these
extended reduction patterns.
- Expanded the ARM costs to account for these expanded sizes, which is a
fairly simple change in itself.
- Some minor alterations to allow inloop reduction larger than the highest
vector width and i64 MVE reductions.
- An extra InLoopReductionImmediateChains map was added to the vectorizer
for it to efficiently detect which instructions are reductions in the
cost model.
- The tests have some updates to show what I believe is optimal
vectorization and where we are now.
Put together this can greatly improve performance for reduction loop
under MVE.
Differential Revision: https://reviews.llvm.org/D93476
In LoopInterchange, `findInnerReductionPhi()` looks for reduction
variables, which cannot be constants. Update it to return early in that
case.
This also addresses a blocker for removing use-lists from ConstantData,
whose users could be spread across arbitrary modules in the same
LLVMContext.
Differential Revision: https://reviews.llvm.org/D94712
This fixes the final (I think?) reference invalidation in `SmallVector`
that we need to fix to align with `std::vector`. (There is still some
left in the range insert / append / assign, but the standard calls that
UB for `std::vector` so I think we don't care?)
For POD-like types, reimplement `emplace_back()` in terms of
`push_back()`, taking a copy even for large `T` rather than lose the
realloc optimization in `grow_pod()`.
For other types, split the grow operation in three and construct the new
element in the middle.
- `mallocForGrow()` calculates the new capacity and returns the result
of `safe_malloc()`. We only need a single definition per
`SmallVectorBase` so this is defined in SmallVector.cpp to avoid code
size bloat. Moving this part of non-POD grow to the source file also
allows the logic to be easily shared with `grow_pod`, and
`report_size_overflow()` and `report_at_maximum_capacity()` can move
there too.
- `moveElementsForGrow()` moves elements from the old to the new
allocation.
- `takeAllocationForGrow()` frees the old allocation and saves the
new allocation and capacity .
`SmallVector:assign(size_type, const T&)` also uses the split-grow
operations for non-POD, but it also has a semantic change when not
growing. Previously, assign would start with `clear()`, and so the old
elements were destructed and all elements of the new vector were
copy-constructed (potentially invalidating references). The new
implementation skips destruction and uses copy-assignment for the prefix
of the new vector that fits. The new semantics match what libc++ does
for `std::vector::assign()`.
Note that the following is another possible implementation:
```
void assign(size_type NumElts, ValueParamT Elt) {
std::fill_n(this->begin(), std::min(NumElts, this->size()), Elt);
this->resize(NumElts, Elt);
}
```
The downside of this simpler implementation is that if the vector has to
grow there will be `size()` redundant copy operations.
(I had planned on splitting this patch up into three for committing
(after getting performance numbers / initial review), but I've realized
that if this does for some reason need to be reverted we'll probably
want to revert the whole package...)
Differential Revision: https://reviews.llvm.org/D94739
This recommits 71ed4b6ce5 with
the polarity of some of the pattern corrected.
Original commit message:
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.
Reviewed By: luismarques, craig.topper
Differential Revision: https://reviews.llvm.org/D93767
This is NFC-intended and removes the "OperationData"
class which had become nothing more than a recurrence
(reduction) type.
I adjusted the matching logic to distinguish
instructions from non-instructions - that's all that
the "IsLeafValue" member was keeping track of.
Fixes PR48523. When the linker errors with "output file too large",
one question that comes to mind is how the section sizes differ from
what they were previously. Unfortunately, this information is lost
when the linker exits without writing the output file. This change
makes it so that the error message includes the sizes of the largest
sections.
Reviewed By: MaskRay, grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D94560
If a function doesn't contain loops and does not call non-willreturn
functions, then it is willreturn. Loops are detected by checking
for backedges in the function. We don't attempt to handle finite
loops at this point.
Differential Revision: https://reviews.llvm.org/D94633
`X86AsmParser::ParseIntelExpression` has a while loop. In the body,
calls to MCAsmLexer::UnLex can force a reallocation in the MCAsmLexer's
`CurToken` SmallVector, invalidating saved references to
`MCAsmLexer::getTok()`.
`const MCAsmToken &Tok` is such a saved reference, and this moves it
from outside the while loop to inside the body, fixing a
use-after-realloc.
`Tok` will still be reused across calls to `Lex()`, each of which
effectively destroys and constructs the pointed-to token. I'm a bit
skeptical of this usage pattern, but it seems broadly used in the
X86AsmParser (and others) so I'm leaving it alone (for now).
Somehow this bug was exposed by https://reviews.llvm.org/D94739,
resulting in test failures in dot-operator related tests in
llvm/test/tools/llvm-ml. I suspect the exposure path is related to
optimizer changes from splitting up the grow operation, but I haven't
dug all the way in. Regardless, there are already tests in tree that
cover this; they might fail consistently if we added ASan
instrumentation to SmallVector.
Differential Revision: https://reviews.llvm.org/D95112
Summary:
Prior to D91261 the information checked the OMP_MAP_TARGET_PARAM flag, change this as it has been removed. The INFO macro was changed to accept a flag as input to make conditionally printing information easier.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95133
Defaulted destructor was treated inconsistently, compared to other
compiler-generated functions.
When Sema::IdentifyCUDATarget() got called on just-created dtor which didn't
have implicit __host__ __device__ attributes applied yet, it would treat it as a
host function. That happened to (sometimes) hide the error when dtor referred
to a host-only functions.
Even when we had identified defaulted dtor as a HD function, we still treated it
inconsistently during selection of usual deallocators, where we did not allow
referring to wrong-side functions, while it is allowed for other HD functions.
This change brings handling of defaulted dtors in line with other HD functions.
Differential Revision: https://reviews.llvm.org/D94732
The place-holding implementation of C_LOC just didn't work
when used with our more complete semantic checking, specifically
in the case of a polymorphic argument; convert it to an external
function with an implicit interface. C_ASSOCIATED needs to be
a generic interface with specific implementations for C_PTR and
C_FUNPTR.
Differential Revision: https://reviews.llvm.org/D94714
Profiling has been recently implemented in libomptarget (D93055). This patch enables time profiling support for libomptarget in libomp, to support profiling of multi-threaded execution of offloaded regions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94855