I disabled the widening in fa5cb4b because it ran into an assert, which was
related to replacing values with different types. I forgot that an extend could
also be a zero-extend, which I have added now. This means that the approach now
is to create and insert a trunc value of the outer loop for each user, and use
that to replace IV values.
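A minimal sketch of the approach, with hypothetical value names:
```
outer:
  ; the original i32 IV has been widened to i64
  %iv.wide = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  ; a trunc back to the original type is inserted for each user, so
  ; users keep seeing an i32 value and no type mismatch occurs
  %iv = trunc i64 %iv.wide to i32
  %use = add i32 %iv, 1
```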
Differential Revision: https://reviews.llvm.org/D91690
The function was introduced on Jan 23, 2019 in commit
73078ecd38.
Its definition was removed on Oct 27, 2020 in commit
0930763b4b, leaving the declaration
unused.
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persistent. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are persistent, i.e., optimization-resilient, in theory we could borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation, with all the atomics converted into binary data. This was our initial design and we've seen promising sample correlation quality with it. However, the atomics approach has a couple of issues:
1. IR optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they stay in the IR until the very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.
We chose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that come with an atomic operation but to block desired optimizations as little as possible. More specifically, the semantics associated with the new intrinsic enforce that a pseudo probe is virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they probe are not going to be messed up by the optimizer while most IR optimizations still work. The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect, so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to each intrinsic so that probes are uniquely identified and not merged, in order to achieve good correlation quality.
Let's now look at an example. Given the following LLVM IR:
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
br i1 %cmp, label %bb1, label %bb2
bb1:
br label %bb3
bb2:
br label %bb3
bb3:
ret void
}
```
The instrumented IR is shown below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, whose first parameter is the GUID of the probe's owner function and whose second parameter is the probe's ID.
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
%cmp = icmp eq i32 %x, 0
call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
br i1 %cmp, label %bb1, label %bb2
bb1:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
br label %bb3
bb2:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
br label %bb3
bb3:
call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
ret void
}
```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86490
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager. Enable memoryssa for loopsink with new pass manager. This
combination exposed a bug that was previously fixed for loopsink
without memoryssa. When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation. This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.
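A hypothetical shape of the problem:
```
entry:                          ; source block, not part of the loop
  %v = load i32, i32* %p        ; candidate to sink into the loop
  store i32 0, i32* %p          ; clobbers %p in the source block itself
  br label %loop
loop:
  ; sinking the load to here would observe the store's value, so the
  ; source block must also be checked for pointer invalidation
  ...
```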
Respond to review comments. Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control. Update tests accordingly.
Respond to review comments. Add options controlling whether memoryssa is
used for loop sink, defaulting to off. Expand testing based on these
options.
Respond to review comments. Properly indicate preserved analyses.
This relanding addresses a compile-time performance problem by moving
test for profile data earlier to avoid unnecessary computations.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
This reuses the existing lower-matrix-intrinsics pass rather than going
the legacy pass route of creating a new pass.
Use this new variant in the NPM -O0 pipeline.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D91811
Unused since https://reviews.llvm.org/D91762 and triggering
-Wunused-private-field
```
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:365:13: error: private field 'GetArgTLS' is not used [-Werror,-Wunused-private-field]
Constant *GetArgTLS;
^
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:366:13: error: private field 'GetRetvalTLS' is not used [-Werror,-Wunused-private-field]
Constant *GetRetvalTLS;
```
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D91820
One less instruction, and it reduces the use count of the zext.
As alive2 confirms, we're fine with all the weird combinations of
undef elts in constants, but unless the shift amount was undef
for a lane, we must sanitize the undef mask to zero, since the sign
bits are no longer zeros.
https://rise4fun.com/Alive/d7r
```
----------------------------------------
Optimization: zz
Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2))
%o0 = zext %x
%o1 = shl %o0, C1
%r = and %o1, C2
=>
%n0 = sext %x
%r = and %n0, C2
Done: 2016
Optimization is correct!
```
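For illustration, here is a concrete instance of the fold with assumed widths (i8 source, i32 result, so C1 = 24 and C2 is the i32 sign bit):
```
; before: zext, shift the value into the top bits, mask the sign bit
%o0 = zext i8 %x to i32
%o1 = shl i32 %o0, 24
%r  = and i32 %o1, -2147483648   ; 0x80000000
; after: the same sign bit comes directly from a sext, saving the shl
%n0 = sext i8 %x to i32
%r2 = and i32 %n0, -2147483648
```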
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
rGf571fe6df585127d8b045f8e8f5b4e59da9bbb73 led to a warning of an unused
variable for MaxSafeDepDist (written but not used). It seems this
variable and assignment can be safely removed.
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.
Reviewers: jdoerfert, grokos
Differential Revision: https://reviews.llvm.org/D87946
The assertion that vector widths are <= 256 elements was hard-wired in the LV code. E.g., VE allows for vectors of up to 512 elements. Test against the TTI vector register bit width instead - this is an NFC for non-asserting builds.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D91518
Some nested loops may share the same ExitingBB, so after we finish FoldExit,
we need to notify the OuterLoop and SCEV to drop any stored trip count.
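A hypothetical CFG where one block exits both loops:
```
outer.header:
  br label %inner.header
inner.header:
  ; in both loops; the edge to %exit leaves the inner AND the outer
  ; loop, so this is a shared exiting block
  br i1 %c1, label %inner.latch, label %exit
inner.latch:
  br i1 %c2, label %inner.header, label %outer.latch
outer.latch:
  br label %outer.header
exit:
  ret void
```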
Patched by: guopeilin
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D91325
This reverts commit 562addba65.
The change was reverted too quickly; the failing test cases passed on the next
build. So reverting the revert (to include the changes).
Summary:
This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D89802
This is the same fix as 23aeadb89d,
just for CloneScopedAliasMetadata rather than PropagateCallSiteMetadata.
In this case the previous outcome was that metadata was incorrectly
dropped, as it was not part of the computed metadata map.
The real change in the test is that the first load now retains
metadata, the rest of the changes are due to changes in metadata
numbering.
The VMap also contains a mapping from Argument => Instruction,
where the instruction is part of the original function, not the
inlined one. The code was assuming that all the instructions in
the VMap were inlined.
This was a pre-existing problem for the loop access metadata, but
was extended to the more common noalias metadata by
27f647d117, thus causing miscompiles.
There is a similar assumption inside CloneAliasScopeMetadata(), so
that one likely needs to be fixed as well.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager. Enable memoryssa for loopsink with new pass manager. This
combination exposed a bug that was previously fixed for loopsink
without memoryssa. When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation. This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.
Respond to review comments. Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control. Update tests accordingly.
Respond to review comments. Add options controlling whether memoryssa is
used for loop sink, defaulting to off. Expand testing based on these
options.
Respond to review comments. Properly indicate preserved analyses.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
As Wei Mi reported in post-commit review
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201116/853479.html
teaching -reassociate about add-like-or's (70472f3) results in breaking apart
load widening patterns, and reassociating them.
For now, simply exclude any such `or` that appears to be a root of
load widening idiom from the or->add transformation.
Note that the heuristic is greedy, it doesn't ensure that loads
can *actually* be widened into a single load.
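The excluded root looks roughly like this (hypothetical load-widening idiom):
```
%lo  = load i16, i16* %p0
%hi  = load i16, i16* %p1
%zlo = zext i16 %lo to i32
%zhi = zext i16 %hi to i32
%sh  = shl i32 %zhi, 16
; the operands have disjoint bits, so this 'or' behaves like an 'add',
; but turning it into one would hide the pattern from load widening
%wide = or i32 %sh, %zlo
```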
This patch generalizes the extraction of a constraint for a given
condition. It allows decompose to return a vector of c * X pairs, which
allows decomposing multiple instructions in the future.
It also adds more clarifying comments.
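As a hypothetical illustration, a condition and the coefficient/value pairs it could decompose into:
```
%s = shl i32 %y, 1          ; contributes the pair (2, %y)
%t = add i32 %x, %s         ; contributes the pair (1, %x)
%c = icmp ule i32 %t, %z    ; constraint: 1 * %x + 2 * %y <= 1 * %z
```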
In some cases we can handle an IV and iteration count of different types. It's
a typical situation after the IV has been widened. This patch adds support for
such cases, when legal.
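A hypothetical sketch of such a case, where the IV was widened to i64 while the iteration count is still i32:
```
loop:
  %iv = phi i64 [ 0, %preheader ], [ %iv.next, %loop ]
  %iv.next = add nuw nsw i64 %iv, 1
  %count.wide = zext i32 %count to i64    ; narrower iter count, extended
  %cond = icmp ult i64 %iv.next, %count.wide
  br i1 %cond, label %loop, label %exit
```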
Differential Revision: https://reviews.llvm.org/D88528
Reviewed By: skatkov
Support more instructions in SpeculativeExecution pass:
- ExtractElement
- InsertElement
- ShuffleVector
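For example, a cheap vector op like the shuffle below can now be hoisted (speculated) into the predecessor block (hypothetical sketch):
```
entry:
  br i1 %c, label %then, label %end
then:
  %s = shufflevector <4 x i32> %v, <4 x i32> %w, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
  br label %end
end:
  %r = phi <4 x i32> [ %s, %then ], [ %v, %entry ]
  ret <4 x i32> %r
```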
Differential Revision: https://reviews.llvm.org/D91633
I don't see any reason not to unconditionally retrieve TLI; it's fairly
cheap.
Fixes calls-errno.ll under NPM.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D91476
m_SpecificInt has the same 'no undef element' behaviour as m_APInt, so no change there, and anyway we have test coverage for undef elements in the fold.
Noticed while fixing a -Wshadow warning about shadowed Value *X, *Y variables.
This patch introduces a new VPDef class, which can be used to
manage VPValues defined by recipes/VPInstructions.
The idea here is to mirror VPUser for values defined by a recipe. A
VPDef can produce either zero (e.g. a store recipe), one (most recipes)
or multiple (VPInterleaveRecipe) result VPValues.
To traverse the def-use chain from a VPDef to its users, one has to
traverse the users of all values defined by a VPDef.
VPValues now contain a pointer to their corresponding VPDef, if one
exists. To traverse the def-use chain upwards from a VPValue, we first
need to check if the VPValue is defined by a VPDef. If it does not have
a VPDef, this means we have a VPValue that is not directly defined
inside the plan and we are done.
If we have a VPDef, it is defined inside the region by a recipe, which
is a VPUser, and the upwards def-use chain traversal continues by
traversing all its operands.
Note that we need to add an additional field to VPValue to link them
to their defs. The space increase is going to be offset by being able to
remove the SubclassID field in future patches.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D90558
This wasn't properly remapping the type like with the other
attributes, so this would end up hitting a verifier error after
linking different modules using byref.
For the scattered operands of load instructions it makes sense
to use a gathering load intrinsic, which can lower to a native instruction
on X86/AVX512 and ARM/SVE. This also enables building
vectorization tree with entries containing scattered operands.
The next step is to add scattered store.
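For reference, a gathering load of four scattered floats via the masked gather intrinsic (typed-pointer syntax of that era; value names are hypothetical):
```
declare <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*>, i32 immarg, <4 x i1>, <4 x float>)

; %ptrs holds four unrelated addresses; one gather replaces four
; scalar loads plus the vector build-up
%g = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> %ptrs, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef)
```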
Fixes PR47629 and PR47623
Differential Revision: https://reviews.llvm.org/D90445
When processing conditional branches, if the condition is an AND of 2 compares
and the true successor only has the current block as its predecessor, queue both
conditions for the true successor.
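For example (hypothetical values):
```
%c1  = icmp ule i32 %a, %b
%c2  = icmp ule i32 %b, %c
%and = and i1 %c1, %c2
br i1 %and, label %tbb, label %fbb
tbb:               ; single predecessor: both %a <= %b and %b <= %c
  ...              ; are known facts here
```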