This simplifies the API and addresses a FIXME in
TwoAddressInstructionPass::convertInstTo3Addr.
Differential Revision: https://reviews.llvm.org/D110229
Bisecting and reducing opt pipelines that includes the
ModuleInlinerWrapperPass has turned out to be a bit problematic.
This is far from perfect (it still lacks information about inline
advisor params etc.), but it should give some kind of hint to what
the wrapped pipeline looks like when using -print-pipeline-passes.
Reviewed By: aeubanks, mtrofin
Differential Revision: https://reviews.llvm.org/D109878
This patch is to support transform something like
_mm512_add_ph(acc, _mm512_fmadd_pch(a, b, _mm512_setzero_ph()))
to _mm512_fmadd_pch(a, b, acc).
Differential Revision: https://reviews.llvm.org/D109953
This patch fixes a problem when the AAKernelInfo state was invalidated,
e.g., due to `optnone` for a kernel, but not all parts indicated the
invalidation properly. We further eliminate most full state
invalidations as they should never be necessary.
Differential Revision: https://reviews.llvm.org/D109468
It seems we missed one spot to persist `SampleContextFrameVector` into the global table (CSProfileGenerator::populateFunctionBoundarySamples:340) which causes a crash.
This change tried to fix it in a centralized way i. e. where we generate the `FunctionSamples`.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110275
It happened that the LBR entry target can be the first address of text section which causes an out-of-range crash. So here add a boundary check.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110271
Without preinliner, we need to tune down the cold count cutoff to merge/trim more context to limit profile size for large components. However it doesn't make sense for cold threshold to be higher than hot threshold, so we now change to use hot threshold as merging/trimming cut off instead.
Differential Revision: https://reviews.llvm.org/D110212
By not using ADDIW we can cause both an ADDIW and ADDI to be emitted
when the add has multiple users.
These instructions needed be added to the list of instructions that
only use the lower 32 bits of input.
I've also added tests for the wu versions, but I'm having trouble
showing bad codegen from it.
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110279
Neither of these passes modify the CFG, allowing us to preserve DomTree
and LoopInfo across them by using setPreservesCFG.
Differential Revision: https://reviews.llvm.org/D110161
The return type of strlen is size_t, not just any integer.
This is a partial fix for an example based on:
https://llvm.org/PR50836
There's another bug here because we can still crash
processing a real strlen or something that looks like it.
For large app, dumping disasm of the whole program can be slow and result in gianant output. Adding a switch to dump specific symbols only.
Reviewed By: wlei
Differential Revision: https://reviews.llvm.org/D110079
The pass uses different cost kinds to estimate "old" and "interleaved" costs:
default cost kind for all targets override `getInterleavedMemoryOpCost()` is
`TCK_SizeAndLatency`. Although at the moment estimated `TCK_Latency` costs are
equal to `TCK_SizeAndLatency`, (so the change is NFC) it may change in future.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110100
When determining whether to fold branches to a common destination by
merging two blocks, SimplifyCFG will count the number of instructions to
be moved into the first basic block. However, there's no reason to count
free instructions like bitcasts and other similar instructions.
This resolves missed branch foldings with -fstrict-vtable-pointers in
llvm-test-suite's lambda benchmark.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D108837
Initially, lli only supported lazy mode for ORC. Greedy mode was added with e1579894d2 and it's the default setting now. DebugObjectManagerPlugin tests don't rely on laziness, so we can switch them to greedy in order to avoid some unnecessary complexity.
We can use riscv_vse intrinsic instead of riscv_vse_mask. The code here
is based on similar code for handling masked.scatter and vp.scatter.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D110206
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110209
Note that MIRCanonicalizerID is declared in
llvm/include/llvm/CodeGen/Passes.h, which MIRCanonicalizerPass.cpp
includes.
Identified with readability-redundant-declaration.
These tests were disabled by accident after D107640. Actually, REQUIRES lines don't support `x86_64` and so these tests stopped running on all targets.
`native && target-x86_64` should be the correct term to express "x86_64 host targeting native arch".
Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110230
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics
We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.
This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)
In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110029
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110227
Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible.
This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up.
Differential Revision: https://reviews.llvm.org/D110242
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110226
This patch splits up sve-extract-vector.ll into
* sve-extract-fixed-vector.ll
* sve-extract-scalable-vector.ll
For testing extracts of a fixed-width or scalable sub-vector from a
scalable source vector, respectively.