Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.
Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.
Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.
Differential Revision: https://reviews.llvm.org/D125247
Even if the CSR list is the same between functions, we could have had a
different allocation order if ignoreCSRForAllocationOrder is evaluated differently.
Hence invalidate cached register class information if
ignoreCSRForAllocationOrder changes.
Patch by Srividya Karumuri <srividya_karumuri@apple.com>
Differential Revision: https://reviews.llvm.org/D126565
We use the `OffloadBinary` to create binary images of offloading files
and their corresponding metadata. This patch changes it to inherit from
the base `Binary` class. This allows us to create and inspect these
more generically. This patch includes all the necessary glue to
implement this as a new binary format, and adds the magic bytes
we use to distinguish the offloading binary to the `file_magic`
implementation.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D126812
The option mdefault-visibility-export-mapping is created to allow
mapping default visibility to an explicit shared library export
(e.g. dllexport). Exactly how and if this is manifested is target
dependent (since it depends on how they map dllexport in the IR).
Three values are provided for the option:
* none: the default and behavior without the option, no additional export linkage information is created.
* explicit: add the export for entities with explicit default visibility from the source, including RTTI
* all: add the export for all entities with default visibility
This option is useful for targets which do not export symbols as part of
their usual default linkage behaviour (e.g. AIX). Such targets
traditionally specified such information in external files (e.g. export
lists), but this mapping allows them to use the visibility information
typically used for this purpose on other (e.g. ELF) platforms.
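As a rough illustration of the three values, here is a sketch of how source-level visibility lines up with them, assuming the translation unit is built with the usual default of `-fvisibility=default`. The function names are made up for this example, and whether an export is actually emitted remains target dependent.

```cpp
// Explicit default visibility written in the source: a candidate for export
// under both "explicit" and "all".
__attribute__((visibility("default"))) int explicitlyVisible() { return 1; }

// Default visibility only because nothing says otherwise: a candidate for
// export under "all", but not under "explicit".
int implicitlyVisible() { return 2; }

// Hidden visibility: does not have default visibility, so this mapping does
// not apply to it under any of the three values.
__attribute__((visibility("hidden"))) int notDefaultVisible() { return 3; }
```

With `none`, none of these gets additional export information.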
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D126340
Now that we have an AllocTensorOp (previously InitTensorOp) in the bufferization dialect, the InitOp in the sparse dialect is no longer needed.
Differential Revision: https://reviews.llvm.org/D126180
This is failing on an arm32 builder, and it is going to take me a while
to debug. To not block further progress, I'm disabling this test on
arm32 configurations.
The trick of using an empty token in the `FOREVERY_O` x-macro relies on preprocessor behavior which is only standard since C99 6.10.3/4 and C++11 N3290 16.3/4 (whereas it was undefined behavior up through C++03 16.3/10). Since the `ExecutionEngine/SparseTensorUtils.cpp` file is required to be compile-able under C++98 compatibility mode (unlike the C++11 used elsewhere in MLIR), we shouldn't rely on that behavior.
Also, using a non-empty suffix helps improve uniformity of the API, since all other primary/overhead suffixes are also non-empty. I'm using the suffix `0` since that's the value used by the `SparseTensorEncoding` attribute for indicating the index overhead-type.
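For readers unfamiliar with the pattern, a minimal sketch of an x-macro whose suffixes are all non-empty follows. The names `FOREVERY_O_SKETCH`, `DECL_WIDTH`, and `overheadBytes` are hypothetical, and the type paired with suffix `0` is an assumption for illustration; this is not the actual contents of `SparseTensorUtils.cpp`.

```cpp
#include <cstdint>

// Hypothetical x-macro listing (suffix, type) pairs. Every suffix is
// non-empty, so no macro invocation ever passes an empty argument.
#define FOREVERY_O_SKETCH(DO) \
  DO(64, uint64_t)            \
  DO(32, uint32_t)            \
  DO(16, uint16_t)            \
  DO(8, uint8_t)              \
  DO(0, uint64_t) /* assumed stand-in for the index type */

// Stamp out one function per entry; the suffix is pasted onto the name.
#define DECL_WIDTH(SUFFIX, TYPE)                                            \
  inline unsigned overheadBytes##SUFFIX() {                                 \
    return static_cast<unsigned>(sizeof(TYPE));                             \
  }
FOREVERY_O_SKETCH(DECL_WIDTH)
#undef DECL_WIDTH
```

This generates `overheadBytes64()` through `overheadBytes0()`; with an empty suffix the last entry would instead have required `DO(, uint64_t)`, which is the construct being avoided.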
Depends On D126720
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D126724
Recently the terminology used has been changed from Exit->Exiting in
line with common LLVM loop terminology. Update a remaining use of the
old terminology.
Replace "cache+" with "ext-tsp" in all BOLT tests
Test Plan:
```
ninja check-bolt
grep -rnw . -e "cache+"
```
no more tests containing "cache+"
"cache+" and "ext-tsp" are aliases
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D126714
Support the pattern where a test file uses multiple prefixes per run line:
one prefix that is unique to the run line, and additional prefixes that are
common with other run lines.
Decide on a per-function basis which prefix(es) to emit, based on which run
lines have the same output.
Move the renaming of vregs earlier, so that we can compare the output as it
would actually be printed in check lines.
Differential Revision: https://reviews.llvm.org/D126411
This is failing on an arm32 builder, and it is going to take me a while
to debug. To not block further progress, I'm disabling this test on
arm32 configurations.
F18 doesn't accept INTEGER operands to the intrinsic LOGICAL operations;
some compilers do. This usage is not portable, and not just because it's
non-conforming -- the bit representations of LOGICAL also vary between
compilers and options. The "MIL-STD" bit intrinsics IAND() & al. have been
available since the late 70's and should be used instead.
Differential Revision: https://reviews.llvm.org/D126798
This patch adds the first bits of support for a YAML representation
of DXContainer files.
Since the YAML representation's primary purpose is testing
infrastructure, the YAML representation supports both a verbose format
and a friendlier one by making computable sizes and offsets optional.
If provided, they are validated to be correct; otherwise they are
computed on the fly during emission.
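The validate-or-compute rule can be sketched as below. `PartSketch` and `resolveSize` are hypothetical names invented for this illustration and do not reflect the patch's actual types; the sketch assumes C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for a YAML-described part whose size may be omitted.
struct PartSketch {
  std::optional<uint32_t> Size; // value written in the YAML, if any
  uint32_t PayloadBytes;        // number of bytes that will be emitted
};

// Returns the size to emit: a provided value is validated against the data,
// a missing value is computed from the data.
inline uint32_t resolveSize(const PartSketch &P) {
  if (!P.Size)
    return P.PayloadBytes;
  if (*P.Size != P.PayloadBytes)
    throw std::invalid_argument("size does not match emitted data");
  return *P.Size;
}
```

A test can therefore omit the field for convenience, or spell out a deliberately wrong value to exercise the error path.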
As I expand the format I'll be able to make more size fields optional,
and I will continue to make the format easier to work with.
Depends on D124804
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D124944
DXContainer files are structured as parts. This patch adds support for
parsing out the file part offsets and file part headers.
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D124804
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.
Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.
Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.
Differential Revision: https://reviews.llvm.org/D125247
This reverts commit a544710cd4.
See discussion in D120540.
This breaks C++ Clang modules on Darwin and also more than a dozen
tests in the LLDB testsuite. I think we need to be more careful to
separate out the enabling of Clang C++ modules and C++20
modules. Either by having -fmodules-ts control the HaveModules flag,
or by adding a way to explicitly turn them off.
This reduces the time emitStabs() takes by about 275ms, or 3% of overall
linking time for the project I'm on. Although the parent function is run in
parallel, it's one of the slowest tasks in that concurrent batch (I have
another optimization for another slow task as well).
Differential Revision: https://reviews.llvm.org/D126785
Ctlz is an intrinsic in LLVM but does not have equivalent operations in SPIR-V.
Including a decomposition gives an alternative path for these platforms.
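To show what such a decomposition can look like, here is a sketch in plain C++ that counts leading zeros using only compares, masks, shifts, and adds. This is one common binary-search formulation chosen for illustration; the commit message does not describe the exact sequence the patch emits, and `ctlz32Sketch` is a made-up name.

```cpp
#include <cstdint>

// Counts the leading zero bits of a 32-bit value without a ctlz primitive.
// Returns 32 when the input is zero.
inline uint32_t ctlz32Sketch(uint32_t X) {
  if (X == 0)
    return 32;
  uint32_t Count = 0;
  // Binary search: whenever the upper part is empty, count it and shift it out.
  if ((X & 0xFFFF0000u) == 0) { Count += 16; X <<= 16; }
  if ((X & 0xFF000000u) == 0) { Count += 8;  X <<= 8; }
  if ((X & 0xF0000000u) == 0) { Count += 4;  X <<= 4; }
  if ((X & 0xC0000000u) == 0) { Count += 2;  X <<= 2; }
  if ((X & 0x80000000u) == 0) { Count += 1; }
  return Count;
}
```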
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D126261
This patch implements the `MaximalStaticExpansion` and its printer in NPM.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D125870
Windows builds can't receive environment variables on the command line or make use of RULE_LAUNCH_COMPILE with ccache.
Reviewed By: stella.stamenova, Ericson2314
Differential Revision: https://reviews.llvm.org/D126575
This field corresponds to the Syntax enumeration, which gained
another entry in 1fdf952dee. However,
the bit-field for storing the syntax used was not adjusted to handle
the extra entry.
This turns out to be unobservable for HLSL attributes at the moment, so
there is no test coverage. But it's not really an NFC change
either.
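The class of bug can be sketched as follows. The enumeration, struct, and helper below are hypothetical and only mimic the shape of the problem; the `static_assert` is one possible guard, not something this commit is known to add.

```cpp
// Hypothetical enumeration; the real Syntax enumeration is not reproduced here.
enum SyntaxSketch {
  SS_A, SS_B, SS_C, SS_D, SS_E, SS_F, SS_G, SS_H,
  SS_I, // a ninth entry (value 8) no longer fits in 3 bits
  SS_NumEntries
};

struct AttrInfoSketch {
  unsigned SyntaxUsed : 4; // wide enough for every value below SS_NumEntries
};

// Fails to compile if the enumeration outgrows the bit-field.
static_assert(SS_NumEntries <= (1u << 4), "SyntaxUsed bit-field too narrow");

// Shows the silent truncation that happens when the field is too narrow.
inline unsigned storeIn3Bits(unsigned V) {
  struct { unsigned F : 3; } S;
  S.F = V; // an unsigned bit-field keeps only the low 3 bits
  return S.F;
}
```

Storing the new entry's value in the too-narrow field yields a different, valid-looking entry rather than an error, which is why the problem can go unnoticed.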
This adds tests checking the behavior of const variables declared with
the weak attribute.
The tests check both that such variables cannot be used where a constant
expression is required and that a dynamic initializer is emitted when
one is used as an initializer expression.
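A small sketch of the two situations, assuming a GCC-compatible compiler that accepts `__attribute__((weak))`. The variable names are invented here and the snippet is not taken from the added tests.

```cpp
// A weak definition may be replaced by another definition at link time, so
// the initializer seen here is not necessarily the final value.
extern const int WeakLimit __attribute__((weak)) = 42;

// Case 1: a context that requires a constant expression. Left commented out
// because, per the tests described above, it is expected to be rejected:
//   int Buffer[WeakLimit];

// Case 2: used as an initializer. The value is read at program start-up
// through a dynamic initializer instead of being folded at compile time.
int CopyOfLimit = WeakLimit;
```

In a program with no overriding definition, `CopyOfLimit` ends up as 42 at run time.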
Differential Revision: https://reviews.llvm.org/D126578
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.
Differential Revision: https://reviews.llvm.org/D115462
The state is now stored on the thread's stack memory. This enables
implementing pthread API like pthread_detach which takes the pthread_t
structure argument by value.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D126716
This patch adds the arguments necessary to allocate dynamic shared
memory of the size given by the `LIBOMPTARGET_SHARED_MEMORY_SIZE`
environment variable. This patch only allocates the memory; AMDGPU has a
limitation that shared memory can only be accessed from the kernel
directly, so this will currently only work with optimizations to inline
the accessor function.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D125252
The script uses llvm-link to link LLVM bitcode files.
5426da8ffa used -DLLVM_DISABLE_ASSEMBLY_FILES=ON
to ignore object files compiled from lib/Support/BLAKE3/*.S.
A better approach (which fits Bazel better) is to ignore non-bitcode files.
Reviewed By: akyrtzi
Differential Revision: https://reviews.llvm.org/D126728