InstrEmitter.h needs TargetMachine but relies on a forward declaration
of TargetMachine in MachineOperand.h. This patch adds a forward
declaration right in InstrEmitter.h.
While we are at it, this patch removes the one in MachineOperand.h,
where it is unnecessary.
D95161 removed the option `--libomptarget-nvptx-path`, which is used in
the tests for `libomptarget-nvptx`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95293
In the cloning infrastructure, only track an MDNode mapping,
without explicitly storing the Metadata mapping, same as is done
during inlining. This makes things slightly simpler.
To simplify the transition to using LTOBackend, move DisableVerify to
the LTOCodeGenerator class, like most/all other options.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D95223
a6f0221276 enabled intersection of FMF on reduction instructions,
so it is safe to ease the check here.
There is still some room to improve here - it looks like we
have nearly duplicate flags propagation logic inside of the
LoopUtils helper but it is limited targets that do not form
reduction intrinsics (they form the shuffle expansion).
In c042aff886, unused FileCheck prefixes became an error, which exposed some testing bugs in four exegesis tests. I've tried my best to either fix the testing bugs, or expand the testing to cover more scenarios.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D95287
A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope.
When that is not the case for at least one of the two, the intrinsic call can as well be removed.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95141
Similar to D92887, LoopRotation also needs duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary.
This is based on the version from the Full Restrict paches (D68511).
The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94306
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patched (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed.
Notes:
- it also includes changes and tests from D90104
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D92887
[libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics
Tested by diff of IR generated for target_impl.cu before and after. NFC. Part
of removing deviceRTL build time dependency on cuda SDK.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D95294
This is similar to D94106, but for the
isGuaranteedToTransferExecutionToSuccessor() helper. We should not
assume that readonly functions will return, as this is only true for
mustprogress functions (in which case we already infer willreturn).
As with the DCE change, for now continue assuming that readonly
intrinsics will return, as not all target intrinsics have been
annotated yet.
Differential Revision: https://reviews.llvm.org/D95288
This avoids being dependent on SimplifyDemandedBits having cleared
those bits.
It could make sense to teach SimplifyDemandedBits to keep all
lower bits 1 in an AND mask when possible. This could be
implemented with slli+srli in the general case rather than
needing to materialize the constant.
Previously FDE field names were used, but the fixup kind used for a field can
vary based on the pointer encoding.
This change will improve readability / maintainability when EH-frame support is
added to JITLink/ELF.
Address the compiler warning
OMPIRBuilder.cpp:1232:27: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
The tileLoops method implements the code generation part of the tile directive introduced in OpenMP 5.1. It takes a list of loops forming a loop nest, tiles it, and returns the CanonicalLoopInfo representing the generated loops.
The implementation takes n CanonicalLoopInfos, n tile size Values and returns 2*n new CanonicalLoopInfos. The input CanonicalLoopInfos are invalidated and BBs not reused in the new loop nest removed from the function.
In a modified version of D76342, I was able to correctly compile and execute a tiled loop nest.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D92974
Defining PATH_MAX to _XOPEN_PATH_MAX which is the closest macro available on z/OS.
Note that this value is 1024 which is 4 times smaller from same macro on Linux.
Reviewed By: #libc, ldionne, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D92110
We try to do this during DAG combine with SimplifyDemandedBits,
but it fails if there are multiple nodes using the AND. For
example, multiple shifts using the same shift amount.
* Adds a flag to MlirOperationState to enable result type inference using the InferTypeOpInterface.
* I chose this level of implementation for a couple of reasons:
a) In the creation flow is naturally where generated and custom builder code will be invoking such a thing
b) it is a bit more efficient to share the data structure and unpacking vs having a standalone entry-point
c) we can always decide to expose more of these interfaces with first-class APIs, but that doesn't preclude that we will always want to use this one in this way (and less API surface area for common things is better for API stability and evolution).
* I struggled to find an appropriate way to test it since we don't link the test dialect into anything CAPI accessible at present. I opted instead for one of the simplest ops I found in a regular dialect which implements the interface.
* This does not do any trait-based type selection. That will be left to generated tablegen wrappers.
Differential Revision: https://reviews.llvm.org/D95283
Add an intrinsic type class to represent the
llvm.experimental.noalias.scope.decl intrinsic, to make code
working with it a bit nicer by hiding the metadata extraction
from view.
[libomptarget][cuda] Call v2 functions explicitly
rtl.cpp calls functions like cuMemFree that are replaced by a macro
in cuda.h with cuMemFree_v2. This patch changes the source to use
the v2 names consistently.
See also D95104, D95155 for the idea. Alternatives are to use a mixture,
e.g. call the macro names and explictly dlopen the _v2 names, or to keep
the current status where the symbols are replaced by macros in both files
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95274
In both cases, optimization is prevented because
"br X == C || X == C2" is converted into a switch. In one case
loop rotation is blocked, in the other vectorization.
D94700 removed the static library so we no longer need to pass
`-llibomptarget-nvptx` to `nvlink`. Since the bitcode library is the only device
runtime for now, instead of emitting a warning when it is not found, an error
should be raised. We also set a new option `libomptarget-nvptx-bc-path` to let
user choose which bitcode library is being used.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95161
During the review of D91986 it has been discovered the in C++11
deprecated `throw()` exception specification has been removed in
C++20. Removed the part of the test code using this feature.
This patch adds a new InstModificationIRStrategy to mutate flags/options
for instructions. For example, it may add or remove nuw/nsw flags from
add, mul, sub, shl instructions or change the predicate for icmp
instructions.
Subtle changes such as those mentioned above should lead to a more
interesting range of inputs. The presence or absence of overflow flags
can expose subtle bugs, for example.
Reviewed By: bogner
Differential Revision: https://reviews.llvm.org/D94905
ZoneAlgorithms's computePHI relies on being provided with consistent a
schedule to compute the statement prodecessors of a statement containing
PHINodes. Otherwise unexpected results such as PHI nodes with multiple
predecessors can occur which would result in problems in the
algorithms expecting consistent data.
In the added test case, statement instances are scrubbed from the
SCoP their execution would result in undefined behavior (Due to a nsw
overflow). As already being undefined behavior in LLVM-IR, neither
AssumedContext nor InvalidContext are updated, giving computePHI no
means to avoid these cases.
Intoduce a new SCoP property, the DefinedBehaviorContext, that among
the runtime-checked conditions, also tracks the assumptions not needing
a runtime check, in particular those affecting the assumed control flow.
This replaces the manual combination of the 3 other contexts that was
already done in computePHI and setNewAccessRelation. Currently, the only
additional assumption is that loop induction variables will nsw flag for
not wrap, but potentially more can be added. Use in
hasFeasibleRuntimeContext, isl::ast_build and gisting are other
potential uses.
To limit computational complexity, the DefinedBehaviorContext is not
availabe if it grows too large (atm hardcoded to 8 disjuncts).
Possible other fixes include bailing out in computePHI when
inconsistencies are detected, choose an arbitrary value for inconsistent
cases (since it is undefined behavior anyways), or make the code
receiving the result from ComputePHI handle inconsistent data. All of
them reduce the quality of implementation having to bail out more often
and disabling the ability to assert on actually wrong results.
This fixes llvm.org/PR48783.
LoopUtils.h needs ICFLoopSafetyInfo but relies on a forward
declaration of ICFLoopSafetyInfo in IVDescriptors.h. This patch adds
a forward declaration right in LoopUtils.h.
While we are at it, this patch removes the one in IVDescriptors.h,
where it is unnecessary.
Some utilities used by InstCombine, like SimplifyLibCalls, may add new
instructions and replace the uses of a call, but return nullptr because
the inserted call produces multiple results.
Previously, the replaced library calls would get removed by
InstCombine's deleter, but after
292077072e this may not happen, if the
willreturn attribute is missing.
As a work-around, update replaceInstUsesWith to set MadeIRChange, if it
replaces any uses. This catches the cases where it is used as replacer
by utilities used by InstCombine and seems useful in general; updating
uses will modify the IR.
This fixes an expensive-check failure when replacing
@__sinpif/@__cospifi with @__sincospif_sret.
Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can
Reviewed By: ldionne, miscco, #libc
Differential Revision: https://reviews.llvm.org/D91004