Complex nested in other types is perfectly fine, just nested structs
aren't supported. Instead of checking whether there's nesting just check
whether the struct we're dealing with is a complex number.
Differential Revision: https://reviews.llvm.org/D125381
1. Call copy constructor of the base class
2. Assign value of the option directly
Reviewed By: dcaballe, rriddle
Differential Revision: https://reviews.llvm.org/D125101
D125214 split off a MLIRExecutionEngineUtils library that is used
by MLIRGPUTransforms. However, currently the entire ExecutionEngine
directory is skipped if the LLVM_NATIVE_ARCH target is not available.
Move the check for LLVM_NATIVE_ARCH, such that MLIRExecutionEngineUtils
always gets built, and only the JIT-related libraries are omitted
without native arch.
Differential Revision: https://reviews.llvm.org/D125357
This change integrates the BufferResultsToOutParamsPass into One-Shot Module Bufferization. This improves memory management (deallocation) when buffers are returned from a function.
Note: This currently only works with statically-sized tensors. The generated code is not very efficient yet and there are opportunities for improvment (fewer copies). By default, this new functionality is deactivated.
Differential Revision: https://reviews.llvm.org/D125376
Bufferization has an optional filter to exclude certain ops from analysis+bufferization. There were a few remaining places in the codebase where the filter was not checked.
Differential Revision: https://reviews.llvm.org/D125356
UNIX03 conformance requires utilities to flush stdout before exiting and raise
an error if writing fails. Flushing already happens on a call to exit
and thus automatically on a return from main. Write failure is then
detected by LLVM's default SIGPIPE handler. The handler already exits with
a non-zero code, but conformance additionally requires an error message.
When a custom operation is unknown and does not have a dialect prefix, we currently
emit an error using the name of the operation with the default dialect prefix. This
leads to a confusing error message, especially when operations get moved between dialects.
For example, `func` was recently moved out of `builtin` and to the `func` dialect. The current
error message we get is:
```
func @foo()
^ custom op 'builtin.func' is unknown
```
This could lead users to believe that there is supposed to be a `builtin.func`,
because there used to be. This commit adds a better error message that does
not assume that the operation is supposed to be in the default dialect:
```
func @foo()
^ custom op 'func' is unknown (tried 'builtin.func' as well)
```
Differential Revision: https://reviews.llvm.org/D125351
`linalg.generic` ops have canonicalizers that either remove arguments
not used in the payload, or redundant arguments. Combine these and
enhance the canonicalization to also remove results that have no use.
This is effectively dead code elimination for Linalg ops.
Differential Revision: https://reviews.llvm.org/D123632
We can simplify the code needed to implement dyn_cast/cast/isa support for MLIR operations with documented interfaces via the CastInfo structures. This will also provide an example of how to use CastInfo.
Depends on D123901
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D124963
This patch expands the expressive capability of the casting utilities in LLVM by introducing several levels of configurability. By creating modular CastInfo classes we can enable projects like MLIR that need more fine-grained control over how a cast is actually performed to retain that control, while making it easy to express the easy cases (like a checked pointer to pointer cast).
The current implementation of Casting.h doesn't make it clear where the entry points for customizing the cast behavior are, so part of the motivation for this patch is adding that documentation. Another part of the motivation is to support using LLVM RTTI with a wider set of use cases, such as nullable value to value casts, or pointer to value casts (as in MLIR).
Reviewed By: lattner, rriddle
Differential Revision: https://reviews.llvm.org/D123901
Using "replaceUsesOfWith" is incorrect because the same initializer value may appear multiple times.
For example, if the epilogue is needed when this loop is unrolled
```
%x:2 = scf.for ... iter_args(%arg1 = %c1, %arg2 = %c1) {
...
}
```
then both epilogue's arguments will be incorrectly renamed to use the same result index (note #1 in both cases):
```
%x_unrolled:2 = scf.for ... iter_args(%arg1 = %c1, %arg2 = %c1) {
...
}
%x_epilogue:2 = scf.for ... iter_args(%arg1 = %x_unrolled#1, %arg2 = %x_unrolled#1) {
...
}
```
Prior to this fix if we have a really large array or collection class, we would end up always creating all of the child variables for an array or collection class. If the number of children was very high this can cause delays when expanding variables. By adding the "indexedVariables" to variables with lots of children, we can keep good performance in the variables view at all times. This patch will add the "indexedVariables" key/value pair to any "Variable" JSON dictionairies when we have an array of synthetic child provider that will create more than 100 children.
We have to be careful to not call "uint32_t SBValue::GetNumChildren()" on any lldb::SBValue that we use because it can cause a class, struct or union to complete the type in order to be able to properly tell us how many children it has and this can be expensive if you have a lot of variables. By default LLDB won't need to complete a type if we have variables that are classes, structs or unions unless the user expands the variable in the variable view. So we try to only get the GetNumChildren() when we have an array, as this is a cheap operation, or a synthetic child provider, most of which are for showing collections that typically fall into this category. We add a variable reference, which indicates that something can be expanded, when the function "bool SBValue::MightHaveChildren()" is true as this call doesn't need to complete the type in order to return true. This way if no one ever expands class variables, we don't need to complete the type.
Differential Revision: https://reviews.llvm.org/D125347
The microMIPS instruction set is compatible with the MIPS instruction
set at the assembly level but not in terms of encodings. `nop` in
microMIPS is a special case as it has the same encoding as `nop` for
MIPS.
Fix this error by reducing the usage of NOP in the MIPS backend such
that only that ISA correct variants are produced.
Differential Revision: https://reviews.llvm.org/D124716
It turns out that the llvm buildbots run the test with
-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-scei-ps4, which would cause this
test to fail as the test assumed that the default target is Windows. To
fix this, we explicitly set -target for the Windows testcases.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D125425
When Clang generates the path prefix (i.e. the path of the directory
where the file is) when generating FILE, __builtin_FILE(), and
std::source_location, Clang uses the platform-specific path separator
character of the build environment where Clang _itself_ is built. This
leads to inconsistencies in Chrome builds where Clang running on
non-Windows environments uses the forward slash (/) path separator
while Clang running on Windows builds uses the backslash (\) path
separator. To fix this, we add a flag -ffile-reproducible (and its
inverse, -fno-file-reproducible) to have Clang use the target's
platform-specific file separator character.
Additionally, the existing flags -fmacro-prefix-map and
-ffile-prefix-map now both imply -ffile-reproducible. This can be
overriden by setting -fno-file-reproducible.
[0]: https://crbug.com/1310767
Differential revision: https://reviews.llvm.org/D122766
Summary:
We use embedded binaries to extract offloading device code from the host
fatbinary. This uses a binary format whose necessary alignment is
eight bytes. The alignment is included within the ELF section type so
the data extracted from the ELF should always be aligned at that amount.
However, if this file was extraqcted from a static archive, it was being
sent as an offset in the archive file which did not have the same
alignment guaruntees as the ELF file. This was causing errors in the
UB-sanitizer build as it would occasionally try to access a misaligned
address. To fix this, I simply copy the memory directly to a new buffer
which is guarnteed to have worst-case alignment of 16 in the case that
it's not properly aligned.
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If
there is a need to have highly optimized codegen and kernel developers have
knowledge of inter-wave runtime behavior which is unknown to the compiler this
builtin can be used to tune scheduling.
This intrinsic creates a barrier between scheduling regions. The immediate
parameter is a mask to determine the types of instructions that should be
prevented from crossing the sched_barrier. In this initial patch, there are only
two variations. A mask of 0 means that no instructions may be scheduled across
the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing
instructions may cross the sched_barrier.
Note that this intrinsic is only meant to work with the scheduling passes. Any
other transformations that may move code will not be impacted in the ways
described above.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D124700
RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.
I've added a command line that can be used to turn it off if it causes compile
time or functional issues. I used the command line to keep the old behavior
for one interesting test case that was testing register allocation.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D125108
This is a followup to D124231.
We can fold the ADDIW in this pattern if we can prove that LUI+ADDI
would have produced the same result as LUI+ADDIW.
This pattern occurs because constant materialization prefers LUI+ADDIW
for all simm32 immediates. Only immediates in the range
0x7ffff800-0x7fffffff require an ADDIW. Other simm32 immediates
work with LUI+ADDI.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D124693
This is a full audit of emitError calls, I took the opportunity
to remove extranous parens and fix a couple cases where we'd
generate multiple diagnostics for the same error.
Differential Revision: https://reviews.llvm.org/D125355