Currently the following case fails:
```
template<typename Ty>
Ty foo(Ty *addr, Ty val) {
Ty v;
#pragma omp atomic compare capture
{
v = *addr;
if (*addr > val)
*addr = val;
}
return v;
}
```
The compiler complains `addr` is not a lvalue. That's because when an expression
is instantiation dependent, we cannot tell if it is lvalue or not.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D135224
In the case of non-opaque pointers, when combining consecutive loads,
need to bitcast the pointer source to the combined type size, otherwise
asserts are triggered.
Differential Revision: https://reviews.llvm.org/D135249
This supports the lowering of private and firstprivate clauses in single
construct. The alloca ops are emitted in the entry block according to
https://llvm.org/docs/Frontend/PerformanceTips.html#use-of-allocas, and
the load/store ops are emitted in the single region. The data race
problem is handled in OMPIRBuilder. That is, the barrier is emitted in
OMPIRBuilder.
Co-authored-by: Nimish Mishra <neelam.nimish@gmail.com>
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D128596
The infinite loop seen on buildbots should be fixed by
11897708c0 (assuming there are not
multiple infinite combine loops...)
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
The old approach (dedicated ExecXXX for each instruction) is not flexible and results in duplicated code when RVC kicks in.
According to the spec, every compressed instruction can be decoded to a non-compressed one. So we can lower compressed instructions to instructions we already had, which requires a decoupling between the decoder and executor.
This patch:
- use llvm::Optional and its combinators AMAP.
- use template constraints on common instruction.
- make instructions strongly-typed (no uint32_t everywhere bc it is error-prone and burdens the developer when lowering the RVC) with the help of algebraic datatype (std::variant).
Note:
(NFC) because this is more of a refactoring in preparation for RVC.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D135015
Rather than inserting a ptrtoint + inttoptr pair, directly replace
the inttoptr with the new phi node. This ensures that no other
transform can undo it before the pair gets folded away.
This avoids the infinite loop when combined with D134954.
This is NFCI in the sense that it shouldn't make a difference, but
could due to different worklist order.
The GPU transform dialect currently has restrictions and several situations where we can't use transform dialect.
This update includes a method to test a failing cases in GPU transform dialect.
Differential Revision: https://reviews.llvm.org/D135063
The new pass implements the following:
* Inserts code at the start of an arm_new_za function to
commit a lazy-save when the lazy-save mechanism is active.
* Adds a smstart intrinsic at the start of the function.
* Adds a smstop intrinsic at the end of the function.
Patch co-authored by kmclaughlin.
Differential Revision: https://reviews.llvm.org/D133896
SimpleLoopUnswitch may remove blocks from loops. Clear block and loop
dispositions in that case, to clean up invalid entries in the cache.
Fixes#58158.
Fixes#58159.
This simplifies the test case added in e399dd601 to only require indvars
and simple-loop-unswitch. This allows adding the test case for #58158 to
the same file, keeping related tests together.
This patch introduces a new AArch64 ISD node (OBSCURE_COPY) that can
be used when we want to prevent SVE object address calculations
from being rematerialised between a smstop/smstart and a call.
At the moment we use COPY to copy the frame index to a register,
which leads to problems because the "simple register coalescing"
pass understands the COPY instruction and attempts to rematerialise
an address calculation with 'addvl' between an smstop and a call.
When in streaming mode the 'addvl' instruction may have different
behaviour because the streaming SVE vector length is not guaranteed
to equal the normal SVE vector length.
The new ISD opcode OBSCURE_COPY gets lowered to a new pseudo
instruction also called OBSCURE_COPY. This ensures it cannot be
rematerialised and we expand this into a simple move very late in
the machine instruction pipeline.
A new test is added here:
CodeGen/AArch64/sme-streaming-interface.ll
Differential Revision: https://reviews.llvm.org/D134940
This patch update the fir::isUnlimitedPolymorphicType function
to reflect the chosen design. It adds also a fir::isPolymorphicType
function.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D135143
This makes sure that the instructions of the prologue matches the
SEH opcodes.
Also remove a couple redundant cases of setting HasWinCFI; it was
already set unconditionally after the conditional cases.
Differential Revision: https://reviews.llvm.org/D135101
When trying to debug some `compiler-rt` unittests, I initially had a hard
time because
- even in a `Debug` build one needs to set `COMPILER_RT_DEBUG` to get
debugging info for some of the code and
- even so the unittests used a hardcoded `-O2` which often makes debugging
impossible.
This patch addresses this by instead using `-O0` if `COMPILER_RT_DEBUG`.
Two tests in `sanitizer_type_traits_test.cpp` need to be disabled since
they have undefined references to `__sanitizer::integral_constant<bool,
true>::value`.
Tested on `sparcv9-sun-solaris2.11`, `amd64-pc-solaris2.11`, and
`x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D91620
We often have constraints for array attributes that they are sorted
non-decreasing or strictly increasing. This change adds AttrConstraint classes
that support DenseArrayAttr for integer types.
Differential Revision: https://reviews.llvm.org/D134944
These intrinsics are simply expanded to regular icmp/fcmp instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D121594
The `readability-braces-around-statements` check tries to look at the
closing parens of the if condition to determine where to insert braces,
however, "consteval if" statements don't have a condition, and always
have braces regardless, so the skip can be checked.
The `readability-simplify-boolean-expr` check looks at the condition
of the if statement to determine what could be simplified, but as
"consteval if" statements do not have a condition that could be
simplified, they can also be skipped here.
There may still be more checks that try to look at the conditions of
`if`s that aren't included here
Fixes https://github.com/llvm/llvm-project/issues/57568
Reviewed By: njames93, aaron.ballman
Differential Revision: https://reviews.llvm.org/D133413
If we have thread states, the program is going to be rather slow. If we
don't, we want to avoid wasting shared memory. This patch introduces a
slight penalty (malloc + indirection) for the slow path and reduces
resource usage for the fast path.
Differential Revision: https://reviews.llvm.org/D135037
We should use OpenMP atomics but they don't take variable orderings.
Maybe we should expose all of this in the header but that solves only
part of the problem anyway.
Differential Revision: https://reviews.llvm.org/D135036
The patch selects VSELECT_VL/VP_MERGE_VL that uses VF(N)M(ACC|SAC) as its
true operand and the adden of the true operand as its false operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D135080
This is primarily used for Arm64EC, but it could be used for other
non-standard calling conventions. The testcase is based on an Arm64EC
thunk generated by MSVC.
The name save_any_reg comes from Microsoft documentation, but the full
encoding isn't specified there; this is reverse-engineered from the
behavior of the unwinder. (Thanks to Martin Storsjö for his example of
how to write simple unwinder testcases by directly calling
RtlVirtualUnwind.)
Differential Revision: https://reviews.llvm.org/D135196
The allocation hints for copies of ACC registers assumed that we would only be
copying between VSRp and UACC registers. In reality it is also possible to copy
between UACC and ACC registers.
This patch adds a new case for the ACC copy to fix that issue.
Note that the test case added with this patch will hit an assert without the
fix.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D134501
(Re-Apply with fixes to clang MicrosoftMangle.cpp)
This is a first step towards high level representation for fp8 types
that have been built in to hardware with near term roadmaps. Like the
BFLOAT16 type, the family of fp8 types are inspired by IEEE-754 binary
floating point formats but, due to the size limits, have been tweaked in
various ways in order to maximally use the range/precision in various
scenarios. The list of variants is small/finite and bounded by real
hardware.
This patch introduces the E5M2 FP8 format as proposed by Nvidia, ARM,
and Intel in the paper: https://arxiv.org/pdf/2209.05433.pdf
As the more conformant of the two implemented datatypes, we are plumbing
it through LLVM's APFloat type and MLIR's type system first as a
template. It will be followed by the range optimized E4M3 FP8 format
described in the paper. Since that format deviates further from the
IEEE-754 norms, it may require more debate and implementation
complexity.
Given that we see two parts of the FP8 implementation space represented
by these cases, we are recommending naming of:
* `F8M<N>` : For FP8 types that can be conceived of as following the
same rules as FP16 but with a smaller number of mantissa/exponent
bits. Including the number of mantissa bits in the type name is enough
to fully specify the type. This naming scheme is used to represent
the E5M2 type described in the paper.
* `F8M<N>F` : For FP8 types such as E4M3 which only support finite
values.
The first of these (this patch) seems fairly non-controversial. The
second is previewed here to illustrate options for extending to the
other known variant (but can be discussed in detail in the patch
which implements it).
Many conversations about these types focus on the Machine-Learning
ecosystem where they are used to represent mixed-datatype computations
at a high level. At that level (which is why we also expose them in
MLIR), it is important to retain the actual type definition so that when
lowering to actual kernels or target specific code, the correct
promotions, casts and rescalings can be done as needed. We expect that
most LLVM backends will only experience these types as opaque `I8`
values that are applicable to some instructions.
MLIR does not make it particularly easy to add new floating point types
(i.e. the FloatType hierarchy is not open). Given the need to fully
model FloatTypes and make them interop with tooling, such types will
always be "heavy-weight" and it is not expected that a highly open type
system will be particularly helpful. There are also a bounded number of
floating point types in use for current and upcoming hardware, and we
can just implement them like this (perhaps looking for some cosmetic
ways to reduce the number of places that need to change). Creating a
more generic mechanism for extending floating point types seems like it
wouldn't be worth it and we should just deal with defining them one by
one on an as-needed basis when real hardware implements a new scheme.
Hopefully, with some additional production use and complete software
stacks, hardware makers will converge on a set of such types that is not
terribly divergent at the level that the compiler cares about.
(I cleaned up some old formatting and sorted some items for this case:
If we converge on landing this in some form, I will NFC commit format
only changes as a separate commit)
Differential Revision: https://reviews.llvm.org/D133823
Once we are in the `Unreachable` we want to disable type checking, but
we were unconditionally returning `true` here which means we encountered
and error. Instead we unconditionally return false to signal no error.
Fixes: https://github.com/llvm/llvm-project/issues/56935
Differential Revision: https://reviews.llvm.org/D135195
As a result of making these legal, and tweaking the combine to allow vectors,
we generate vector G_SEXT_INREG during legalization.
The reason we want to make these legal in the first place is to allow for
more combine opportunities. Once those have been done, we can just lower them
back to shifts in the post-legalizer lowering.
This needs to be one commit otherwise we start causing tests to fail due to
incomplete support for selection etc.
Add the -fpass-plugin option to flang which dynamically loads LLVM passes from the
shared object passed as the argument to the flag. The behavior of the option is
designed to replicate that of the same option in clang and thus has the same
capabilities and limitations.
- Multiple instances of -fpass-plugin=path-to-file can be specified and each of the
files will be loaded in that order.
- The flag can be passed to both flang-new and flang-new -fc1.
Differential Revision: https://reviews.llvm.org/D129156
This information is not preserved in MIR today. So this patch adds
information to RISCVMachineFunctionInfo when the vreg is created for
the argument.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D134621