For Zvlsseg spilling, we need to convert the pseudo instructions
into multiple vector load/store instructions with appropriate offsets.
For example, for PseudoVSPILL3_M2, we need to convert it to
VS2R %v2, %base
ADDI %base, %base, (vlenb x 2)
VS2R %v4, %base
ADDI %base, %base, (vlenb x 2)
VS2R %v6, %base
We need to keep the size of the offset in the pseudo spilling instructions.
In this case, it is (vlenb x 2).
In the original implementation, we used the size of the frame object
divided by the number of vectors in the zvlsseg type. However, the size
of the frame object is not necessarily exactly the size of the spilled
data; it may be larger. So this patch changes the offset to
(VLENB x LMUL), which is more direct and easier to understand.
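As a worked check of the difference between the two formulas, here is a
standalone sketch (hypothetical helper names, not the in-tree code):

  #include <cassert>
  #include <cstdint>

  // New derivation: the stride between consecutive register-group
  // stores is the byte size of one register group, VLENB * LMUL.
  uint64_t spillStride(uint64_t VLENB, uint64_t LMUL) {
    return VLENB * LMUL;
  }

  // Old derivation: frame object size divided by the number of
  // fields (NF). Padding in the frame object skews the result.
  uint64_t spillStrideOld(uint64_t FrameObjSize, uint64_t NF) {
    return FrameObjSize / NF;
  }

  int main() {
    uint64_t VLENB = 16; // e.g. VLEN = 128 bits
    // PseudoVSPILL3_M2: NF = 3 fields, LMUL = 2, so the stride is 32.
    assert(spillStride(VLENB, 2) == 32);
    // With an exactly sized frame object the old formula agrees...
    assert(spillStrideOld(3 * 32, 3) == 32);
    // ...but padding breaks it: (96 + 16) / 3 == 37, not 32.
    assert(spillStrideOld(3 * 32 + 16, 3) != 32);
  }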
Differential Revision: https://reviews.llvm.org/D101869
Add unit test infrastructure for the ORC runtime, plus a cut-down
extensible_rtti system and extensible_rtti unit test.
Removes the placeholder.cpp source file.
Differential Revision: https://reviews.llvm.org/D102080
All glue and clutter in the linalg ops has been replaced by proper
sparse tensor type encoding. This code is no longer needed. Thanks
to ntv@ for giving us a temporary home in linalg.
So long, and thanks for all the fish.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102098
This change was originally landed in: 5000a1b4b9
It was reverted in: 061e071d8c
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker, this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello"; see the
  sketch below)
- Only support single-byte strings (p2align 0)
Like the ELF linker, merging is only performed at `-O1` and above.
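A standalone illustration of full tail merging (made-up data and names,
not the lld implementation):

  #include <algorithm>
  #include <cstddef>
  #include <cstdio>
  #include <map>
  #include <string>
  #include <vector>

  int main() {
    std::vector<std::string> strings = {"hello", "lo", "world", "rld"};
    // Place longer strings first so shorter ones can reuse their tails.
    std::sort(strings.begin(), strings.end(),
              [](const std::string &a, const std::string &b) {
                return a.size() > b.size();
              });
    std::string pool;                          // merged segment contents
    std::map<std::string, std::size_t> offset; // string -> pool offset
    for (const std::string &s : strings) {
      // A string can share storage with an earlier one if it occurs as
      // a tail ending at the same NUL terminator.
      std::size_t pos = pool.find(s + '\0');
      if (pos == std::string::npos) {
        pos = pool.size();
        pool += s;
        pool += '\0';
      }
      offset[s] = pos;
    }
    for (const auto &kv : offset)
      std::printf("%-5s -> offset %zu\n", kv.first.c_str(), kv.second);
    // "lo" lands at offset 3 inside "hello\0"; "rld" inside "world\0".
  }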
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it does not currently support debug sections
because they are not represented by data segments (they are custom
sections).
Differential Revision: https://reviews.llvm.org/D97657
This is roughly equivalent to the floating point portion of
`AArch64TargetLowering::LowerVSETCC`. The main missing piece is the v4s16 case.
This also adds helpers equivalent to `EmitVectorComparison` and
`changeVectorFPCCToAArch64CC`. This moves `changeFCMPPredToAArch64CC` out of
the selector into AArch64GlobalISelUtils for the sake of code reuse.
This is done in post-legalizer lowering with pseudos to simplify selection.
The imported patterns end up handling selection for us this way.
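For flavor, a self-contained sketch of the kind of mapping
`changeFCMPPredToAArch64CC` performs, using local enums rather than the
LLVM headers and covering only a few single-condition cases (predicates
like ONE/UEQ expand to two condition codes):

  #include <cassert>

  enum class FPred { OEQ, OGT, OGE, UNE, ORD, UNO };
  enum class CC { EQ, GT, GE, NE, VC, VS, Invalid };

  // Map an FP compare predicate to one AArch64 condition code,
  // relying on FCMP setting V exactly when an operand is NaN.
  CC changeFCMPPredToCC(FPred P) {
    switch (P) {
    case FPred::OEQ: return CC::EQ; // equal (and ordered)
    case FPred::OGT: return CC::GT;
    case FPred::OGE: return CC::GE;
    case FPred::UNE: return CC::NE; // not equal, or unordered
    case FPred::ORD: return CC::VC; // V clear => no NaN operand
    case FPred::UNO: return CC::VS; // V set   => some NaN operand
    }
    return CC::Invalid;
  }

  int main() {
    assert(changeFCMPPredToCC(FPred::OEQ) == CC::EQ);
    assert(changeFCMPPredToCC(FPred::UNO) == CC::VS);
  }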
Differential Revision: https://reviews.llvm.org/D101782
The combines in `tryCombineMemCpyFamily` have heuristics (e.g.
`TLI.getMaxStoresPerMemset`) which consider size. So, theoretically, enabling
these combines on minsize functions shouldn't be harmful.
With this enabled we save 0.9% geomean on CTMark at -Oz, and 5.1% on Bullet.
There are no code size regressions.
Differential Revision: https://reviews.llvm.org/D102198
This is a test case for D101970, which shows the optimization opportunity for
lea (reg1, reg2), reg3
sub reg3, reg4
to
sub reg1, reg4
sub reg2, reg4
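Reading the operands AT&T-style (destination last, so `sub reg3, reg4`
computes reg4 -= reg3), the rewrite is plain integer algebra; a
standalone check:

  #include <cassert>

  int main() {
    unsigned reg1 = 7, reg2 = 9, reg4 = 100;
    unsigned lea_sub = reg4 - (reg1 + reg2);  // lea + sub form
    unsigned two_subs = (reg4 - reg1) - reg2; // rewritten form
    assert(lea_sub == two_subs); // holds for all values, wrap included
  }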
Differential Revision: https://reviews.llvm.org/D102010
These types were also meant to be copy assignable, and using the
implicitly generated copy assignment operator is deprecated in the
presence of an explicit copy ctor.
Removing the explicit copy ctor provides the desired behavior - both
ctor and assignment operator are available implicitly.
Also, while I was nearby, I added some missing std::moves on shared
pointer parameters.
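A minimal sketch of the pattern being cleaned up (a hypothetical type,
not one of the ones touched here):

  #include <memory>
  #include <utility>

  struct Widget {
    std::shared_ptr<int> Data;

    // Before: an explicit copy ctor was declared here, making uses of
    // the implicitly generated copy assignment operator deprecated
    // (-Wdeprecated-copy). With it removed, both copy operations are
    // available implicitly.

    // Drive-by fix: take the shared_ptr by value and move it into
    // place rather than copying the by-value parameter again.
    explicit Widget(std::shared_ptr<int> D) : Data(std::move(D)) {}
  };

  int main() {
    Widget A{std::make_shared<int>(42)};
    Widget B = A; // implicit copy ctor
    B = A;        // implicit copy assignment, no deprecation warning
  }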
When a variable is used in an aggregate initializer to initialize a
reference-type field, this counts as aliasing.
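A minimal example of the newly recognized pattern (hypothetical code):

  struct View {
    int &Ref; // reference-type field
  };

  int main() {
    int X = 0;
    View V{X}; // X is aliased through V.Ref from here on
    V.Ref = 1; // modifies X through the alias
    return X;  // 1
  }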
Differential Revision: https://reviews.llvm.org/D101791
D96215 takes care of the situation where the variable is captured into
a nearby lambda. This patch takes care of the situation where
the current function is the lambda and the variable is one of its captures
from an enclosing scope.
The analogous problem for ^{blocks} is already handled automagically
by D96215.
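A minimal example of the situation handled here (hypothetical code):
the function under analysis is the lambda itself, and the variable
reaches it as a by-reference capture rather than a local declaration.

  void analyzeMe(int &Limit) {
    auto Body = [&Limit] {
      // Limit is a capture from the enclosing scope; it may be
      // modified through the same reference elsewhere, so it must be
      // treated as aliased inside this lambda.
      for (int I = 0; I < Limit; ++I) {
      }
    };
    Body();
  }

  int main() {
    int Limit = 3;
    analyzeMe(Limit);
  }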
Differential Revision: https://reviews.llvm.org/D101787
The utility function clang::tidy::utils::hasPtrOrReferenceInFunc() scans the
function for pointer/reference aliases to a given variable. It currently scans
for operator & over that variable and for declarations of references to that
variable.
This patch makes it also scan for C++ lambda captures by reference
and for Objective-C block captures.
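The alias forms the utility now recognizes, side by side in one
hypothetical function (the Objective-C block case is analogous to the
lambda one):

  void scanned() {
    int Var = 0;
    int *P = &Var;                // operator & over the variable
    int &R = Var;                 // declaration of a reference to it
    auto L = [&Var] { Var = 1; }; // new: lambda capture by reference
    (void)P;
    (void)R;
    L();
  }

  int main() { scanned(); }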
Differential Revision: https://reviews.llvm.org/D96215
`weak_equality` and `strong_equality` were removed from the draft
standard before C++20 was standardised, so they need to be removed
here as well.
Also adjusts `common_comparison_category`, since its test needed
updating after the equality deletions.
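For reference, with the `*_equality` types gone,
`common_comparison_category` only ever combines the three ordering
types; a quick C++20 check:

  #include <compare>
  #include <type_traits>

  // The weakest category among the arguments wins.
  static_assert(std::is_same_v<
      std::common_comparison_category_t<std::strong_ordering,
                                        std::partial_ordering>,
      std::partial_ordering>);

  int main() {}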
Differential Revision: https://reviews.llvm.org/D100283
It is an order of magnitude slower than the second slowest test
according to obj/llvm/test/.lit_test_times.txt.
The two slowest are:
2.870437e+02 CodeGen/PowerPC/aix-xcoff-huge-relocs.ll
2.850697e+01 tools/llvm-readobj/ELF/file-header-machine-types.test
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D102190
Let's say you represent (i32, i32) as an i64 from which the parts
are extracted with lshr/trunc. Then, if you compare two tuples by
parts you get something like A[0] == B[0] && A[1] == B[1], just
that the part extraction happens by lshr/trunc and not a narrow
load or similar.
The fold implemented here reduces such equality comparisons by
converting them into a comparison on a larger part of the integer
(which might be the whole integer). It handles both the "and of eq"
and the conjugated "or of ne" case.
I'm being conservative with one-use for now, though this could be
relaxed if profitable (the base pattern converts 11 instructions
into 5 instructions, but there's quite a few variations on how it
can play out).
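A scalar illustration of the base pattern (standalone C++, not the IR):

  #include <cassert>
  #include <cstdint>

  // Parts of an (i32, i32) tuple extracted from one i64 via lshr/trunc.
  bool eqByParts(uint64_t A, uint64_t B) {
    uint32_t A0 = uint32_t(A), A1 = uint32_t(A >> 32);
    uint32_t B0 = uint32_t(B), B1 = uint32_t(B >> 32);
    return A0 == B0 && A1 == B1; // the "and of eq" form
  }

  // What the fold produces: one compare on the whole integer.
  bool eqWhole(uint64_t A, uint64_t B) { return A == B; }

  int main() {
    uint64_t X = 0x0000000100000002ULL, Y = 0x0000000300000002ULL;
    assert(eqByParts(X, X) == eqWhole(X, X));
    assert(eqByParts(X, Y) == eqWhole(X, Y));
    // The conjugated "or of ne" form folds to A != B analogously.
  }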
Differential Revision: https://reviews.llvm.org/D101232
This patch adds support for Darwin's libsystem math vector functions to
TLI. Darwin's libsystem provides a range of vector functions for libm
functions.
This initial patch only adds the 2 x double and 4 x float versions,
which are available on both X86 and ARM64. On X86, wider vector versions
are supported as well.
Reviewed By: jroelofs
Differential Revision: https://reviews.llvm.org/D101856
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().
A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.
The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.
Differential Revision: https://reviews.llvm.org/D101713
Instead of using VMap, which may include instructions from the
caller as a result of simplification, iterate over the
(FirstNewBlock, Caller->end()) range, which will only include new
instructions.
Fixes https://bugs.llvm.org/show_bug.cgi?id=50270.
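The shape of the fix in a generic, standalone form (a plain container,
not the inliner code): walk exactly the half-open range of newly
inserted elements instead of a map that can also mention pre-existing
ones.

  #include <cassert>
  #include <list>

  int main() {
    std::list<int> blocks = {1, 2, 3}; // the caller's existing blocks
    // Remember the first newly inserted element, then append the rest.
    auto firstNew = blocks.insert(blocks.end(), 4);
    blocks.push_back(5);
    int visited = 0;
    for (auto it = firstNew; it != blocks.end(); ++it)
      ++visited; // visits only the new elements
    assert(visited == 2);
  }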
Differential Revision: https://reviews.llvm.org/D102110
This patch does a few cleanup things:
1. The non-standalone scudo has a problem where GWP-ASan allocations
may not meet alignment requirements where Scudo was requested to have
alignment >= 16. Use the new GWP-ASan API to fix this.
2. The standalone variant loses some debugging information inside of
GWP-ASan because we ask GWP-ASan to allocate an aligned size in the
frontend. This means reports end up with 'UaF on a 16-byte allocation'
for a 1-byte allocation with 16-byte alignment. Also use the new API to
fix this (see the sketch after this list).
3. Add post-alloc hooks for GWP-ASan intercepted allocations, and add
stats tracking for GWP-ASan allocations.
4. Add a small test that checks the alignment of the frontend
allocator, so that it can be used under GWP-ASan torture mode.
5. Add GWP-ASan torture mode as a testing configuration to catch these
regressions.
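A standalone illustration of the reporting problem in item 2
(hypothetical numbers, not the Scudo code):

  #include <cassert>
  #include <cstddef>

  // Rounding the size up to the alignment in the frontend loses the
  // user's original request size.
  std::size_t roundUp(std::size_t Size, std::size_t Align) {
    return (Size + Align - 1) & ~(Align - 1);
  }

  int main() {
    std::size_t UserSize = 1, Align = 16;
    std::size_t Forwarded = roundUp(UserSize, Align);
    assert(Forwarded == 16); // a report would say "16-byte allocation"
    // Passing size and alignment separately lets the guarded allocator
    // align the pointer itself yet still report the true 1-byte size.
  }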
Depends on D94830, D95889.
Reviewed By: cryptoad
Differential Revision: https://reviews.llvm.org/D95884
A very elaborate, but also very fun, revision because all the
puzzle pieces are finally "falling into place".
1. replaces linalg annotations + flags with proper sparse tensor types
2. adds rigorous verification of sparse tensor types and sparse primitives
3. removes glue and clutter on opaque pointers in favor of sparse tensor types
4. migrates all tests to use sparse tensor types
NOTE: next CL will remove *all* obsoleted sparse code in Linalg
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102095
We had a hardcoded check and a stale TODO, written back when we only had
support for one architecture.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D102154
In particular, we should apply the `-undefined` behavior to all
such symbols, including those that are specified via the command line
(i.e. `-e`, `-u`, and `-exported_symbol`). ld64 supports this too.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D102143
If spills are to registers instead of to the stack then a copy will be used
and frame index scavenging is not required.
This patch adds debug info to frame index scavenging and makes sure that
spilling to registers does not cause frame index scavenging.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D101360
According to the API contract, LinalgLoopDistributionOptions
expects to work on parallel iterators. When getting processor
information, only loop ranges for parallel dimensions should
be fed in. But right now, after generating scf.for loop nests,
we feed in *all* loops, including the ones materialized for
reduction iterators. This can cause unexpected distribution
of reduction dimensions. This commit fixes it.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D102079