If no interchange vector is given initialize it with the identity permutation from 0 to number of loops.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D110249
For large app, dumping disasm of the whole program can be slow and result in gianant output. Adding a switch to dump specific symbols only.
Reviewed By: wlei
Differential Revision: https://reviews.llvm.org/D110079
The pass uses different cost kinds to estimate "old" and "interleaved" costs:
default cost kind for all targets override `getInterleavedMemoryOpCost()` is
`TCK_SizeAndLatency`. Although at the moment estimated `TCK_Latency` costs are
equal to `TCK_SizeAndLatency`, (so the change is NFC) it may change in future.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110100
This doesn't add all of the document numbers, but it adds a bunch of
them. Not all of the documents are available on the committee page
(they're old enough that they come from a time when the mailing was
comprised of physical pieces of paper), so some of the documents listed
are assumed to be correct based on my reading of editor's reports.
When determining whether to fold branches to a common destination by
merging two blocks, SimplifyCFG will count the number of instructions to
be moved into the first basic block. However, there's no reason to count
free instructions like bitcasts and other similar instructions.
This resolves missed branch foldings with -fstrict-vtable-pointers in
llvm-test-suite's lambda benchmark.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D108837
Specifying .global and .weak causes a compiler warning:
warning: __sigsetjmp changed binding to STB_WEAK
Specifying only .weak should have the same effect without causing a
warning.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D110178
Initially, lli only supported lazy mode for ORC. Greedy mode was added with e1579894d2 and it's the default setting now. DebugObjectManagerPlugin tests don't rely on laziness, so we can switch them to greedy in order to avoid some unnecessary complexity.
This patch adds support for an RAII struct that will print function
traces when placed inside of a function declaration. Each successive
call will increase the indentation to make it easier to visually
inspect.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110202
This revision removes the ad-hoc MemRefs that were needed using the old
ABI (when we still passed by value) and replaces them with the shared
StridedMemRef definitions of CRunnerUtils (possible now that we pass by
pointer). This avoids code duplication and makes sure we have a consistent
view of strided memory references in all our support libraries.
Reviewed By: jsetoain
Differential Revision: https://reviews.llvm.org/D110221
We can use riscv_vse intrinsic instead of riscv_vse_mask. The code here
is based on similar code for handling masked.scatter and vp.scatter.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D110206
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110209
Note that MIRCanonicalizerID is declared in
llvm/include/llvm/CodeGen/Passes.h, which MIRCanonicalizerPass.cpp
includes.
Identified with readability-redundant-declaration.
These tests were disabled by accident after D107640. Actually, REQUIRES lines don't support `x86_64` and so these tests stopped running on all targets.
`native && target-x86_64` should be the correct term to express "x86_64 host targeting native arch".
Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110230
LWG 2447 is marked as `Complete`, but there is no `static_assert` to
reject volatile types in `std::allocator`. See the discussion at
https://reviews.llvm.org/D108856.
Add `static_assert` in `std::allocator` to disallow volatile types. Since this
is an implementation choice, mark the binding test as `libc++` only.
Remove tests that use containers backed by `std::allocator` that test
the container when used with a volatile type.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D109056
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics
We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.
This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)
In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110029
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110227
Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible.
This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up.
Differential Revision: https://reviews.llvm.org/D110242
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110226
This patch splits up sve-extract-vector.ll into
* sve-extract-fixed-vector.ll
* sve-extract-scalable-vector.ll
For testing extracts of a fixed-width or scalable sub-vector from a
scalable source vector, respectively.
Summary:
The thread ID function was reintroduced in D110195, but could
potentially be removed by the optimizer. Make the function noinline to
preserve the call sites and add it to the externalization RAII so its
definition is not removed by the attributor.
This code seems untested and is likely obsolete, because this case
should already be handled by the code that legalizes the result type
of EXTRACT_SUBVECTOR.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D110061
Like normal atomicrmw operations, at -O0 the simple register-allocator can
insert spills into the LL/SC loop if it's expanded and visible when regalloc
runs. This can cause the operation to never succeed by repeatedly clearing the
monitor. Instead expand to a cmpxchg, which has a pseudo-instruction for -O0.
Restore the checking of addresses in ICF test which was testing the
behaviour of ICF with regards to different alignments of otherwise
identical sections. Also make the test more robust to layout changes.
Differential Revision: https://reviews.llvm.org/D110090
At first, lli only supported lazy mode for ORC. Greedy mode was added with e1579894d2 and is the default settings now. JITLoaderGDB tests don't rely on laziness, so we can switch them to greedy and remove some complexity.
This is required to codegen something like:
<vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec,
<vscale x 2 x i16> %subvec,
i64 %idx)
where the output vector is legal, but the input vector needs promoting.
It implements this by performing the whole operation on the promoted type,
and then truncating the result.
Reviewed By: david-arm, craig.topper
Differential Revision: https://reviews.llvm.org/D110059
IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.
This is the last step in a series of changes to improve code-gen for
code using matrix subscript operators with the C/C++ matrix extension in
CLang, like
using matrix_t = double __attribute__((matrix_type(15, 15)));
void foo(unsigned i, matrix_t &A, matrix_t &B) {
for (unsigned j = 0; j < 4; ++j)
for (unsigned k = 0; k < i; k++)
B[k][j] -= A[k][j] * B[i][j];
}
https://clang.godbolt.org/z/6dKxK1Ed7
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D102496
This reverts commit 2f6b07316f.
This caused several bots to hit an infinite loop at stage 2,
so it needs to be reverted while figuring out how to fix that.
This reverts commit 52832cd917.
The motivating commit 2f6b07316f caused several bots to hit
an infinite loop at stage 2, so that needs to be reverted too
while figuring out how to fix that.
The matrix extension requires the indices for matrix subscript
expression to be valid and it is UB otherwise.
extract/insertelement produce poison if the index is invalid, which
limits the optimizer to not be bale to scalarize load/extract pairs for
example, which causes very suboptimal code to be generated when using
matrix subscript expressions with variable indices for large matrixes.
This patch updates IRGen to emit assumes to for index expression to
convey the information that the index must be valid.
This also adjusts the order in which operations are emitted slightly, so
indices & assumes are added before the load of the matrix value.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D102478
This allows VMOVL in tail predicated loops so long as the the vector
size the VMOVL is extending into is less than or equal to the size of
the VCTP in the tail predicated loop. These cases represent a
sign-extend-inreg (or zero-extend-inreg), which needn't block tail
predication as in https://godbolt.org/z/hdTsEbx8Y.
For this a vecsize has been added to the TSFlag bits of MVE
instructions, which stores the size of the elements that the MVE
instruction operates on. In the case of multiple size (such as a
MVE_VMOVLs8bh that extends from i8 to i16, the largest size was be
chosen). The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 =
i64, which often (but not always) comes from the instruction encoding
directly. A unit test was added, and although only a subset of the
vecsizes are currently used, the rest should be useful for other cases.
Differential Revision: https://reviews.llvm.org/D109706
This regressed in D110181 and apparently the header intentionally requires
DEBUG_TYPE to be defined by the including file. Just exclude the header from
the module to unbreak the build.
The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C,
Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar.
SELECT VecC, ScalarIdx, 0
We could splat Idx to a vector but it defeats the purpose of
optimisation. Don't apply the folding rule in this case.
This fixes a regression from commit d561b6fbdb.
Most of the code wasn't yet scalable safe, although most of the
code conceptually just works for scalable vectors. This change
makes the algorithm work on ElementCount, where appropriate,
and leaves the fixed-width only code to use `getFixedNumElements`.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D110058
As we're checking the cost debug analysis these should match the original IR line - so we shouldn't have any variable naming issues.
I'm investigating v4i32 mul -> PMADDDW costs handling (for PR47437) and these CHECK lines were proving tricky to keep track of
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.
Suggested in D100302.
The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.
Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.
Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%
http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions
Reviewed By: spatel, lebedev.ri
Differential Revision: https://reviews.llvm.org/D110171