Issue: https://github.com/llvm/llvm-project/issues/54431
PHINodes that need to be generated to accommodate a PHINode outside the region due to different output paths need to have their own numbering to determine the number of output schemes required to properly handle all the outlined regions. This numbering was previously only determined by the order and values of the incoming values, as well as the parent block of the PHINode. This adds the incoming blocks to the calculation of a hash value for these PHINodes as well, and the supporting infrastructure to give each block in a region a corresponding canonical numbering.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D122207
With D117142, we would now format
```
struct A {
#define A
void f() { a(); }
#endif
};
```
into
```
struct A {
#ifdef A
void f() {
a();
}
#endif
};
```
because we were looking for the record lbrace without skipping preprocess lines.
Fixes https://github.com/llvm/llvm-project/issues/54901.
Reviewed By: curdeius, owenpan
Differential Revision: https://reviews.llvm.org/D123737
Handle unsupported passthru values before lowering the gather to
target specific nodes. This is a simplification that's on the road
to moving more of MGATHER lowering into td based isel.
Differential Revision: https://reviews.llvm.org/D123683
Fortran admits a few ways to have multiple symbols with the
same name in the same scope. Two of them involve generic
interfaces (from INTERFACE or GENERIC, the syntax doesn't matter);
these are allowed to inhabit a scope with either a derived type or
a subprogram that is also a specific procedure of the generic.
(But not both a derived type and a subprogram; they could not
cohabit a scope anyway, generic or not.)
In cases of USE association, f18 needs to be capable of combining
use-associated generic interfaces with other use-associated entities.
Two generics get merged (this case was nearly correct); a generic
and a derived type can merge into a GenericDetails with a shadowed
derivedType(); and a generic can replace or ignore a use-associated
procedure of the same name so long as that procedure is already
one of its specifics.
Further, these modifications to the use-associated generic
interface must be made to a local copy of the symbol. The previous
code was messing directly with the symbol in the module's scope.
The fix is basically a reimplementation of the member function
DoAddUse() in name resolution.
Differential Revision: https://reviews.llvm.org/D123704
This is mostly handled by adding "let mayRaiseFPException = 1" before
the definition of the relevant instruction classes, but there are a
couple of complications:
* When we have a multiclass where currently some instantiations are
of instructions that can raise an exception and others aren't we
need to split that into two multiclasses, one inheriting from the
other using a multiclass parameter to enable exceptions.
* In a couple of places in the globalisel instruction selector we
need to manually set the NoFPExcept flag. There's also another
place that looks like it should need it, but that code is never hit
for those opcodes due to them being handled by the generic
instruction selector, so I've instead just removed them from the
switch.
Differential Revision: https://reviews.llvm.org/D115352
Remove the checking of the generated asm, as that's already tested
elsewhere, and adjust some tests that were expecting the wrong
intrinsic to be generated.
Differential Revision: https://reviews.llvm.org/D118259
For strict FP16 to work correctly needs some changes in lowering and
legalization:
* SelectionDAGLegalize::PromoteNode was missing handling for some
strict fp opcodes.
* Some of the custom lowering of strict fp operations needed to be
adjusted to work with FP16.
* Custom lowering needed to be added for round-to-int operations.
With this, and the previous patches for the rest of the strict fp
isel, we can set IsStrictFPEnabled = true.
Differential Revision: https://reviews.llvm.org/D115620
Currently the fsub optimizations in InstSimplify don't know how to fold
-0.0 - (-X) to X when the constrained intrinsics are used. This adds partial
support. The rest of the support will come later with work on the IR
matchers.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D123396
HLSL has a language feature called Semantics which get attached to
declarations like attributes and are used in a variety of ways.
One example of semantic use is here with the `SV_GroupIndex` semantic
which, when applied to an input for a compute shader is pre-populated
by the driver with a flattened thread index.
Differential Revision: https://reviews.llvm.org/D122699
# Conflicts:
# clang/include/clang/Basic/Attr.td
# clang/include/clang/Basic/AttrDocs.td
We need to embed certain metadata along with a binary image when we wish
to perform a device-linking job on it. Currently this metadata was
embedded in the section name of the data itself. This worked, but made
adding new metadata very difficult and didn't work if the user did any
sort of section linking.
This patch introduces a custom binary format for bundling offloading
metadata with a device object file. This binary format is fundamentally
a simple string map table with some additional data and an embedded
image. I decided to use a custom format rather than using an existing
format (ELF, JSON, etc) because of the specialty use-case of this. We
need a simple binary format that can be concatenated without requiring
other external dependencies.
This extension will make it easier to extend the linker wrapper's
capabilties with whatever data is necessary. Eventually this will allow
us to remove all the external arguments passed to the linker wrapper and
embed it directly in the host's linker so device linking behaves exactly
like host linking.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D122069
Offloading sections can be embedded in the host during codegen via a
section. This section was originally marked as metadata to prevent it
from being loaded, but these sections are completely unused at runtime
so the linker should automatically drop them from the final executable
or shard library. This flag adds support for the SHF_EXCLUDE flag in
target lowering and uses it.
Reviewed By: JonChesterfield, MaskRay
Differential Revision: https://reviews.llvm.org/D122987
Error messages can have a list of attachments; these are used to point
to related source locations, supply additional information, and to
encapsulate error messages that were *not* emitted in a given context
to explain why a warning was justified.
This patch adds a message severity ("Because") for that last case,
and extends to AttachTo() API to provide a means for overriding
the severity of an attached message.
Some existing message attachments had their severities adjusted,
now that we're printing them. And operator==() for Message was
cleaned up while debugging after I noticed that it was recursively
O(N**2) and subject to returning a false positive.
Differential Revision: https://reviews.llvm.org/D123710
The existing code was not updating the uses of loads that it recreated,
leading to incorrect chains which could break the ordering between
nodes. This moves the code to a combine instead, and makes sure we
update the chain references. This does mean it happens earlier -
potentially before the concats are simplified. This can lead to
inefficiencies in the codegen, which will be fixed in followups.
The second parameter should be a multiple of the warp size (32).
PTX ISA spec, s9.7.12.1. Parallel Synchronization and Communication
Instructions: bar, barrier
barrier.sync{.aligned} a{, b};
Operand b specifies the number of threads participating in the
barrier. If no thread count is specified, all threads in the CTA
participate in the barrier. When specifying a thread count, the value
must be a multiple of the warp size.
Differential Revision: https://reviews.llvm.org/D123470
PTX ISA spec, s5.4.8. Variable Attribute Directive: .attribute
PTX ISA Notes
Introduced in PTX ISA version 4.0.
Target ISA Notes
.managed attribute requires sm_30 or higher.
Differential Revision: https://reviews.llvm.org/D123040
PTX ISA spec, s9.7.8.6. Data Movement and Conversion Instructions:
shfl.sync
PTX ISA Notes
Introduced in PTX ISA version 6.0.
Target ISA Notes
Requires sm_30 or higher.
Differential Revision: https://reviews.llvm.org/D123039
PTX ISA spec, s9.7.12.4. Parallel Synchronization and Communication
Instructions: atom
Target ISA Notes
64-bit atom.{and,or,xor,min,max} require sm_32 or higher.
Differential Revision: https://reviews.llvm.org/D123038
The intrinsics DREAL, DIMAG, and DCONJG are from Fortran 77 extensions.
For DREAL, the type of argument is extended to any complex. For DIMAG
and DCONJG, the type of argument for them should be complex(8). For DIMAG,
the result type should be real(8). For DCONJG, the result type should be
complex(8). Fix the intrinsic interface for them and add test cases for
the semantic checks and the lowering.
Reviewed By: Jean Perier
Differential Revision: https://reviews.llvm.org/D123459
The float number is represented as (-1)^s * 1.f * 2^(-127) for 32-bit,
where s is the signed flag, f is the mantissa. When the exponent bits
are all zeros, the float number is represented as (-1)^s * 0.f *2^(-126)
for 32-bit, in which case, the intPart is '0'.
Reviewed By: Jean Perier
https://reviews.llvm.org/D123673
znver1/2 models were incorrectly modelling these as 3 cycle latency instructions on the wrong pipe and znver1 ymm variants also require double pumping.
Now matches AMD SoG, Agner and instlatx64 numbers.
Thanks to @fabian-r for the report
This patch enables shift operators on SVE vector types, as well as
supporting vector-scalar shift operations.
Shifts by a scalar that is wider than the contained type in the
vector are permitted but as in the C standard if the value is larger
than the width of the type the behavior is undefined.
Differential Revision: https://reviews.llvm.org/D123303
Undefined behaviour is just passed on to extract_element when the
index is out of bounds. Subscript on svbool_t is not allowed as
this doesn't really have meaningful semantics.
Differential Revision: https://reviews.llvm.org/D122732
This dialect provides operations that can be used to control transformation of
the IR using a different portion of the IR. It refers to the IR being
transformed as payload IR, and to the IR guiding the transformation as
transform IR.
The main use case for this dialect is orchestrating fine-grain transformations
on individual operations or sets thereof. For example, it may involve finding
loop-like operations with specific properties (e.g., large size) in the payload
IR, applying loop tiling to those and only those operations, and then applying
loop unrolling to the inner loops produced by the previous transformations. As
such, it is not intended as a replacement for the pass infrastructure, nor for
the pattern rewriting infrastructure. In the most common case, the transform IR
will be processed and applied to payload IR by a pass. Transformations
expressed by the transform dialect may be implemented using the pattern
infrastructure or any other relevant MLIR component.
This dialect is designed to be extensible, that is, clients of this dialect are
allowed to inject additional operations into this dialect using the newly
introduced in this patch `TransformDialectExtension` mechanism. This allows the
dialect to avoid a dependency on the implementation of the transformation as
well as to avoid introducing dialect-specific transform dialects.
See https://discourse.llvm.org/t/rfc-interfaces-and-dialects-for-precise-ir-transformation-control/60927.
Reviewed By: nicolasvasilache, Mogball, rriddle
Differential Revision: https://reviews.llvm.org/D123135
Move the operations that correspond to LLVM IR intrinsics in a separate .td
file. This makes it easier to maintain the intrinsics and decreases the compile
time of LLVMDialect.cpp by ~25%.
Depends On D123310
Reviewed By: wsmoses, jacquesguan
Differential Revision: https://reviews.llvm.org/D123315
LLVM IR has introduced and is moving forward with the concept of opaque
pointers, i.e. pointer types that are not carrying around the pointee type.
Instead, memory-related operations indicate the type of the data being accessed
through the opaque pointer. Introduce the initial support for opaque pointers
in the LLVM dialect:
- `LLVMPointerType` to support omitting the element type;
- alloca/load/store/gep to support opaque pointers in their operands and
results; this requires alloca and gep to store the element type as an
attribute;
- memory-related intrinsics to support opaque pointers in their operands;
- translation to LLVM IR for the ops above is no longer using methods
deprecated in LLVM API due to the introduction of opaque pointers.
Unlike LLVM IR, MLIR can afford to support both opaque and non-opaque pointers
at the same time and simplify the transition. Translation to LLVM IR of MLIR
that involves opaque pointers requires the LLVMContext to be configured to
always use opaque pointers.
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D123310
The lowering code did not use the scale operand of MGATHER/MSCATTER
nodes, but instead assumed scaled indices were always scaled based
on the element type of the memory type. This patch adds the missing
support by rewritting the nodes as unscaled variants.
Differential Revision: https://reviews.llvm.org/D123670
This addresses an existing TODO by keeping a mapping of external IR
Value * definitions wrapped in VPValues for use in a VPlan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D123700
During real range reduction to [0.5, 4) with
SQRT(2**(2a) * x) = SQRT(2**(2a)) * SQRT(x) = 2**a * SQRT(x)
we fall into inf. recursion if IsZero() == true.
Explicitly handle SQRT(0.0) instead of additional checks during folding. Also
add helpers for +0.0/-0.0 generation to clean up a bit.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D123131
The semantics of `-mmlir` are identical to `-mllvm`. The only notable
difference is that `-mmlir` options should be forwarded to MLIR rather
than LLVM.
Note that MLIR llvm::cl options are lazily constructed on demand (see
the definition of options in PassManagerOptions.cpp). This means that:
* MLIR global options are only visible when explicitly initialised and
displayed only when using `-mmlir --help`,
* Flang and LLVM global options are always visible and displayed when
using either `-mllvm -help` or `-mmlir --help`.
In other words, `-mmlir --help` is a superset of `-mllvm --help`. This is not
ideal, but we'd need to refactor all option definitions in Flang and
LLVM to improve this. I suggesting leaving this for later.
Differential Revision: https://reviews.llvm.org/D123297