The scheduler should set critical/excess register usage thresholds that
are guided by the maximum possible occupancy for the function. This
change is focused on setting proper lower bounds on register usage which
we would typically only see when a specific number of maximum waves is
requested with the "waves-per-eu" attribute, or by setting
"amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a
follow-on patch that will address issues with the scheduler not
targeting correct upper bounds on register usage which is typical with
launch bounds and min "waves-per-eu".
Changes by this patch:
Set the initial critical register usage thresholds to minimum values
that are determined by the maximum possible occupancy for the function,
or the number of allocatable registers, whichever is lower.
Avoid unisgned overflow if register limits are lower than the register
tracking "ErrorMargin", I.e. when using stress-regalloc=2.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D112373
Otherwise, ODRUniquing would map some member method/variable MDNodes
to have enum type DIScope, resulting in invalid debug info and bad
DWARF.
- Add a Verifier check that when a 'scope:' operand is an ODR type that is not an enum.
- Makes ODRUniquing apply to only ODR types with the same tag so that the debuginfo/DWARF is well-formed.
Reviewed By: probinson, aprantl
Differential Revision: https://reviews.llvm.org/D111770
The new key/value pairs that are added to each module's stats are:
"debugInfoByteSize": The size in bytes of debug info for each module.
"debugInfoIndexTime": The time in seconds that it took to index the debug info.
"debugInfoParseTime": The time in seconds that debug info had to be parsed.
At the top level we add up all of the debug info size, parse time and index time with the following keys:
"totalDebugInfoByteSize": The size in bytes of all debug info in all modules.
"totalDebugInfoIndexTime": The time in seconds that it took to index all debug info if it was indexed for all modules.
"totalDebugInfoParseTime": The time in seconds that debug info was parsed for all modules.
Differential Revision: https://reviews.llvm.org/D112501
**Context:**
This is a second attempt at introducing signature regeneration to llvm-objcopy. In this diff: https://reviews.llvm.org/D109840, a script was introduced to test
the validity of a code signature. In this diff: https://reviews.llvm.org/D109803 (now reverted), an effort was made to extract the signature generation behavior out of LLD into a common location for use in llvm-objcopy. In this diff: https://reviews.llvm.org/D109972 it was decided that there was no appropriate common location and that a small amount of duplication to bring signature generation to llvm-objcopy would be better. This diff introduces this duplication.
**Summary**
Prior to this change, if a LC_CODE_SIGNATURE load command
was included in the binary passed to llvm-objcopy, the command and
associated section were simply copied and included verbatim in the
new binary. If rest of the binary was modified at all, this results
in an invalid Mach-O file. This change regenerates the signature
rather than copying it.
The code_signature_lc.test test was modified to include the yaml
representation of a small signed MachO executable in order to
effectively test the signature generation.
Reviewed By: alexander-shaposhnikov, #lld-macho
Differential Revision: https://reviews.llvm.org/D111164
This diff adds a data formatter for libstdcpp's bitset. Besides, it unifies the tests for bitset for libcxx and libstdcpp for maintainability.
Reviewed By: wallace
Differential Revision: https://reviews.llvm.org/D112180
Rationale:
The currently used trait was demanding that all types are the same
which is not true (since the sparse part may change and the dim sizes
may be relaxed). This revision uses the correct trait and makes the
rank match test explicit in the verify method.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D112576
This refactoring adds a few "event" functions (start/end loop-seq/loop) for
readability of the core function of codegen. This also prepares sparse tensor
output codegen, where these "event" functions will provide convenient
placeholders to start or stop insertion bookkeeping.
This revision also includes a few various minor changes that kept on
pending in my local workspace.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D112506
Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx.
NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D111987
This diff does away with `addEntriesForFunctionsWithoutUnwindInfo()`,
because `addSymbol()` can now determine which functions need those
entries.
While overhauling UnwindInfoSection, I also parallelized the relocation
of the contents of the CUEs. This somewhat offsets the time regression
from creating one InputSection per CUE (which was done in D109944).
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D109945
Compact unwind entries (CUEs) contain pointers to their respective
function symbols. However, during the link process, it's far more useful
to have pointers from the function symbol to the CUE than vice versa.
This diff adds that pointer in the form of `Defined::compactUnwind`.
In particular, when doing dead-stripping, we want to mark CUEs live when
their function symbol is live; and when doing ICF, we want to dedup
sections iff the symbols in that section have identical CUEs. In both
cases, we want to be able to locate the symbols within a given section,
as well as locate the CUEs belonging to those symbols. So this diff also
adds `InputSection::symbols`.
The ultimate goal of this refactor is to have ICF support dedup'ing
functions with unwind info, but that will be handled in subsequent
diffs. This diff focuses on simplifying `-dead_strip` --
`findFunctionsWithUnwindInfo` is no longer necessary, and
`Defined::isLive()` is now a lot simpler. Moreover, UnwindInfoSection no
longer has to check for dead CUEs -- we simply avoid adding them in the
first place.
Additionally, we now support stripping of dead LSDAs, which follows
quite naturally since `markLive()` can now reach them via the CUEs.
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D109944
Substring information on slice operation has been added in D112441.
The operations fir.array_load, fir.array_coor and fir.array_merge_store can take
a slice but not with a substring. This patch add this check in their verifier.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: schweitz
Differential Revision: https://reviews.llvm.org/D112568
Original commit message: "
Original commit message: "
Original commit message: "
Original commit message:"
The current infrastructure in lib/Interpreter has a tool, clang-repl, very
similar to clang-interpreter which also allows incremental compilation.
This patch moves clang-interpreter as a test case and drops it as conditionally
built example as we already have clang-repl in place.
"
This patch also ignores ppc due to missing weak symbol for __gxx_personality_v0
which may be a feature request for the jit infrastructure. Also, adds a missing
build system dependency to the orc jit.
"
Additionally, this patch defines a custom exception type and thus avoids the
requirement to include header <exception>, making it easier to deploy across
systems without standard location of the c++ headers.
"
This patch also works around PR49692 and finds a way to use llvm::consumeError
in rtti mode.
"
This patch also checks if stl is built with rtti.
Differential revision: https://reviews.llvm.org/D107049
Replace some custom matrix diagnostic kinds with the more generic
err_builtin_invalid_arg_type introduced in D111985.
Reviewed By: aaron.ballman, erichkeane
Differential Revision: https://reviews.llvm.org/D112532
RewritePatterns.td/RewritePatterns.inc is used only by the
FIROps.cpp file. This patch move this file logically in the Dialect
folder together with FIRDialet, FIROps, FIRTypes ...
It also rename it to CanonicalizationPatterns.td.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: schweitz
Differential Revision: https://reviews.llvm.org/D112522
This is a test-only failure. The test wrongly assumes that this gets us
a tagged pointer:
```
NSObject* num1 = @7;
assert(isTaggedPtr(num1));
```
However, on newer deployment targets that have “const data support” we
get a “normal” pointer to constant object.
Radar-Id: rdar://problem/83217293
Polynomial approximation can be extented to support N-d vectors.
N-dimensional vectors are useful when vectorizing operations on N-dimensional
tiles. Before lowering to LLVM these vectors are usually unrolled or flattened
to 1-dimensional vectors.
Differential Revision: https://reviews.llvm.org/D112566
Mark LWG2731 as complete. The type alias `mutex_type` is only provided if
`scoped_lock` is given one mutex type and it has been implemented that
way since the beginning of Clang 5 it seems. There already are tests for
verifying existence (and lack thereof) for `mutex_type` type alias
depending on the number of mutex types, so there is nothing to
do for this LWG issue.
Reviewed By: Quuxplusone, Mordante, #libc
Differential Revision: https://reviews.llvm.org/D112462
Added a notification in the placeholder section. While writing things
like preciate of an attribute, we may embed certain placeholder in the C
expression. Note that the type of the placeholder is only guaranteed to
be the base type like mlir::Type, it's better not to use the derived
type which is based on the implementation.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D112396
This patch refactors the shared_ptr methods from being defined out-of-line
to being defined inline in the class, like what we do for all new code in
the library. The benefits of doing that are that code is not as scattered
around and is hence easier to understand, and it avoids a ton of duplication
due to SFINAE checks. Defining the method where it is declared also removes
the possibility for mismatched attributes.
As a fly-by change, this also:
- Adds a few _LIBCPP_HIDE_FROM_ABI attributes
- Uses __enable_if_t instead of enable_if as a function argument, to match
the style that we use everywhere else.
Differential Revision: https://reviews.llvm.org/D112478
Use RdxDesc->getOpcode instead of getUnderlingInstr()->getOpcode.
Move the code which finds Kind and IsOrdered to be outside the for loop
since neither of these change with the vector part.
Differential Revision: https://reviews.llvm.org/D112547
It seems like protobuf crashed the `std::string` checker.
Somehow it acquired `UnknownVal` as the sole `std::string` constructor
parameter, causing a crash in the `castAs<Loc>()`.
This patch addresses this.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D112551
1.Combining kind min/max of Vector reduction op has been changed to
minf/maxf, minsi/maxsi, and minui/maxui. Modify getVectorReductionOp
accordingly.
2.Add min/max to supported reductions.
Reviewed By: dcaballe, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112246
This patch implements __builtin_elementwise_max and
__builtin_elementwise_min, as specified in D111529.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D111985
We were previously always emitting the GOT into `__DATA_CONST`, even for
target platforms where it should end up in `__DATA`.
I stumbled onto this while trying to use the `class-dump` tool -- with
the wrong segment names, it fails to locate the ObjC runtime info and
therefore fails to dump any classes.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D112500
We make assumption that:
getDeclForComment(getDeclForComment(X)) == getDeclForComment(X)
but this is not true if you have a template
instantionation of a template instantiation, which is the case when, for
example, you have a <=> operator in a templated class.
This fix makes getDeclForComment() call itself recursively to ensure
this property is always true.
Fixes https://github.com/clangd/clangd/issues/901
Differential Revision: https://reviews.llvm.org/D112527
It was discovered that an extra register COPY remained when expanding a
(variable length) memory operation with a loop and there was another use of
the involved address register(s) afterwards.
A simple fix for this is to COPY the address registers before the loop and
use that new vreg instead.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D112065
The final reduction nodes should not be reordered, the order does not
matter for reductions. Also, it might be profitable to vectorize smaller
reduction trees, reduction cost may compensate small tree cost.
Part of D111574
Differential Revision: https://reviews.llvm.org/D112467