Before packing LDS globals into a sorted structure, make sure that
their alignment is properly updated based on their size. This will make
sure that the members of sorted structure are properly aligned, and
hence it will further reduce the probability of unaligned LDS access.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103261
Implements better naming for results of spv.mlir.addressof ops by making it
inherit from OpAsmOpInterface and implementing the associated
getAsmResultName(...) hook.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D103594
During code completion, lookupInDeclContext() calls
CodeCompletionDeclConsumer::FoundDecl(),which can mutate StoredDeclsMap,
over which lookupInDeclContext() iterates. This can lead to invalidation
of iterators and an assert()-crash.
Example code where this happens:
#include <list>
int main() {
std::list<int>;
std::^
}
with code completion on ^ with -std=c++20.
I do not have a repro case that does not need standard library.
This fix stores pointers to NamedDecls in a temporary vector, then
visits them outside of the main loop, when StoredDeclsMap iterators are
gone.
Differential Revision: https://reviews.llvm.org/D103472
TestTU now prints errors to llvm::errs and aborts on failures via
llvm_unreachable, rather than executing ASSERT_FALSE.
We'd like to make use of these testing libraries in different test suits that
might be compiling with a different gtest version than LLVM has.
Differential Revision: https://reviews.llvm.org/D103685
Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.
Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.
Differential Revision: https://reviews.llvm.org/D103082
* Add hasUnitStride and hasZeroOffset to OffsetSizeAndStrideOpInterface. These functions are useful for various patterns. E.g., some vectorization patterns apply only for tensor ops with zero offsets and/or unit stride.
* Add getConstantIntValue and isEqualConstantInt helper functions, which are useful for implementing the two above functions, as well as various patterns.
Differential Revision: https://reviews.llvm.org/D103763
This global struct used to hold various flags for monitoring the
initialization of hsa.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D103795
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order,
e.g. use queue or priority queue.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D103315
Also adjust a few comments, and move the DylibFile comment talking about
umbrella next to the parameter again.
Differential Revision: https://reviews.llvm.org/D103783
This diff adds testcase for the issue fixed in https://reviews.llvm.org/D77468
but regression test was not added in the diff. On Clang 9 it caused
crash in cland during code completion.
Test Plan: check-clang-unit
Differential Revision: https://reviews.llvm.org/D103722
This pass transforms loops that contain a conditional branch with induction
variable. For example, it transforms left code to right code:
newbound = min(n, c)
while (iv < n) { while(iv < newbound) {
A A
if (iv < c) B
B C
C }
} if (iv != n) {
while (iv < n) {
A
C
}
}
Differential Revision: https://reviews.llvm.org/D102234
This fixes PR49198: Wrong usage of __dso_handle in user code leads to
a compiler crash.
When Init is an address of the global itself, we need to track it
across RAUW. Otherwise the initializer can be destroyed if the global
is replaced.
Differential Revision: https://reviews.llvm.org/D101156
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.
If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.
This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.
At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.
Note that this could probably be further improved by using information
from the original IV.
Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g
Part of a set of fixes required for PR50412.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D103255
This looks like a mistake when the tests were committed in r363946.
There were two sets of tests for the f32 variant of these instructions,
instead of one set for f16 and one set for f32.
Differential Revision: https://reviews.llvm.org/D103699
This fixes the missing address space on `this` in the implicit move
assignment operator.
The function called here is an abstraction around the lines that have
been removed which also sets the address space correctly.
This is copied from CopyConstructor, CopyAssignment and MoveConstructor,
all of which use this function, and now MoveAssignment does too.
Fixes: PR50259
Reviewed By: svenvh
Differential Revision: https://reviews.llvm.org/D103252
Previous logic was to always use the first kernarg pool found to allocate
kernel args. This patch changes this to use only the kernarg pool which
has non-zero size. This logic is also reworked to not use any globals.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D103600
Summary: The patch implements the mapping of the Yaml
information to XCOFF object file to enable the yaml2obj
tool for XCOFF. Currently only 32-bit is supported.
Reviewed By: jhenderson, shchenz
Differential Revision: https://reviews.llvm.org/D95505
The flag to set it is called `-install_name`, and it's called `installName` in tbd files.
No behavior change.
Differential Revision: https://reviews.llvm.org/D103776
dfsan does not use sanitizer allocator as others. In practice,
we let it use glibc's allocator since tcmalloc needs more work
to be working with dfsan well. With glibc, we observe large
memory leakage. This could relate to two things:
1) glibc allocator has limitation: for example, tcmalloc can reduce memory footprint 2x easily
2) glibc may call unmmap directly as an internal system call by using system call number. so DFSan has no way to release shadow spaces for those unmmap.
Using sanitizer allocator addresses the above issues
1) its memory management is close to tcmalloc
2) we can register callback when sanitizer allocator calls unmmap, so dfsan can release shadow spaces correctly.
Our experiment with internal server-based application proved that with the change, in a-few-day run, memory usage leakage is close to what tcmalloc does w/o dfsan.
This change mainly follows MSan's code.
1) define allocator callbacks at dfsan_allocator.h|cpp
2) mark allocator APIs to be discard
3) intercept allocator APIs
4) make dfsan_set_label consistent with MSan's SetShadow when setting 0 labels, define dfsan_release_meta_memory when unmap is called
5) add flags about whether zeroing memory after malloc/free. dfsan works at byte-level, so bit-level oparations can cause reading undefined shadow. See D96842. zeroing memory after malloc helps this. About zeroing after free, reading after free is definitely UB, but if user code does so, it is hard to debug an overtainting caused by this w/o running MSan. So we add the flag to help debugging.
This change will be split to small changes for review. Before that, a question is
"this code shares a lot of with MSan, for example, dfsan_allocator.* and dfsan_new_delete.*.
Does it make sense to unify the code at sanitizer_common? will that introduce some
maintenance issue?"
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D101204
We should be exiting when the shift amount is greater than
the bit width regardless of whether it is a power of 2.
Reported by Simon Pilgrim here https://reviews.llvm.org/D96661
This requires getting a shift amount that is out of bounds that
wasn't already optimized by SelectionDAG. This would be pretty
trick to construct a test for.
Or it would require a non-power of 2 shift amount and a mask
that has runs of ones and zeros of the next lowest power of 2 from
that shift amount. I tried a little to produce a test for this,
but didn't get it to work.
Multiple clauses are mutually exclusive. This patch refactors the functions that check for pairs of mutually exclusive clauses into a generalized function which also also accepts a list of clause types if which at most one can appear.
NFC patch extracted out of D99459 by request.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D103666
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.
Differential Revision: https://reviews.llvm.org/D103759
Use cast<> instead which will assert that the cast is correct and not just return null - the match() should have already failed if the cast isn't valid anyhow.
Fixes static analysis warning.
The current method getSingleClause requires an instance of OMPExecutableDirective to be called. Introduce a static version taking a list of clauses as argument instead that can be used during parsing/Sema before any OMPExecutableDirective has been created.
This is the same approach as taken for getClausesOfKind for getting more more than a single clause of a type which also has a method and static version. NFC patch extracted out of D99459 by request.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D103665