Do not break down local loads and stores so ds_read/write_b96/b128 in
ISelLowering can be selected on subtargets that support them and if align
requirements allow them.
Differential Revision: https://reviews.llvm.org/D84403
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.
Differential Revision: https://reviews.llvm.org/D81638
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine
whether hardware supports such access.
UnalignedAccessMode should be used to enable them.
hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be
now used to quickly check both.
Differential Revision: https://reviews.llvm.org/D84522
Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.
Differential Revision: https://reviews.llvm.org/D82788
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.
Differential Revision: https://reviews.llvm.org/D77152
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.
Reviewers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D81555
Before we speculatively execute a basic block, query the cost of
inserting the necessary select instructions against the phi folding
threshold. For non-trivial insertions, a more accurate decision can
probably be made during machine if-conversion. With minsize we query
the CodeSize cost, otherwise we use SizeAndLatency.
Differential Revision: https://reviews.llvm.org/D82438
1. Complete `process load` with the common disk file completion, so there is not test provided for it;
2. Complete `process unload` with the tokens of valid loaded images.
Thanks for Raphael's help on the test for `process unload`.
Reviewed By: teemperor
Differential Revision: https://reviews.llvm.org/D79887
Currently we have `checkDRI` and two `createDRIFrom` methods which
are used to create `DynRegionInfo` objects.
And we have an issue: constructions like:
`ObjF->getELFFile()->base() + P->p_offset`
that are used in `createDRIFrom` functions might overflow.
I had to revert `D85519` which triggered such UBSan failure.
This NFC, simplifies and generalizes how we create `DynRegionInfo` objects.
It will allow us to introduce more/better validation checks in a single place.
It also will allow to change `createDRI` to return `Expected<>` so
that we will be able to stop using the `reportError`, which
is used inside currently, and have a warning instead.
Differential revision: https://reviews.llvm.org/D86297
Modify the ARM getCmpSelInstrCost implementation for the code size
costs of selects. Now consider the legalization cost and increase
the cost of i1 because those values wouldn't live in a general purpose
register. We also make selects +1 more expensive to account for the IT
instruction.
Differential Revision: https://reviews.llvm.org/D82091
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.
Differential Revision: https://reviews.llvm.org/D85980
Commit dbcfbffc adds ppc.readflm and ppc.setflm intrinsics to read or
write FPSCR register. This patch adds them to Clang.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D85874
Instead of ANDing with a one hot mask representing the bit to
be tested, we were ANDing with just the bit number. This tests
multiple bits none of them the correct one.
This caused skylake-avx512, cascadelake and cooperlake to all
be misdetected. Based on experiments with the Intel SDE, it seems
that all of these CPUs are being detected as being cooperlake.
This is bad since its the newest CPU of the 3.
In e99dee82b0, the "out_of_memory_new_handler" was changed to be
explicitly initialized instead of relying on a global static
constructor.
However before this change, install_out_of_memory_new_handler could be
called multiple times while it asserts right now.
We can be more tolerant to calling multiple time InitLLVM without
reintroducing a global constructor for this handler.
Differential Revision: https://reviews.llvm.org/D86330
Use the append_if CMake function from HandleLLVMOptions. Since we
include this file in LLDBStandalone it should work in both for in-tree
and out-of-tree builds.
Refactor the way the reduction tree pass works in the MLIR Reduce tool by introducing a set of utilities that facilitate the implementation of new Reducer classes to be used in the passes.
This will allow for the fast implementation of general transformations to operate on all mlir modules as well as custom transformations for different dialects.
These utilities allow for the implementation of Reducer classes by simply defining a method that indexes the operations/blocks/regions to be transformed and a method to perform the deletion or transfomration based on the indexes.
Create the transformSpace class member in the ReductionNode class to keep track of the indexes that have already been transformed or deleted at a current level.
Delete the FunctionReducer class and replace it with the OpReducer class to reflect this new API while performing the same transformation and allowing the instantiation of a reduction pass for different types of operations at the module's highest hierarchichal level.
Modify the SinglePath Traversal method to reflect the use of the new API.
Reviewed: jpienaar
Differential Revision: https://reviews.llvm.org/D85591
If the type T is incomplete then sizeof(T) results in C++ compilation error at line:
static constexpr bool value = sizeof(T) <= (2 * sizeof(void *));
This patch allows incomplete types in parameters of function. Example:
using SomeFunc = void(SomeIncompleteType &);
llvm::unique_function<SomeFuncType> SomeFunc;
Reviewers: DaniilSuchkov, vvereschaka
Differential Revision: https://reviews.llvm.org/D81554
This patch adds support for referencing different abbrev tables. We use
'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly
assign an abbrev table to compilation units.
The syntax is:
```
debug_abbrev:
- ID: 0
Table:
...
- ID: 1
Table:
...
debug_info:
- ...
AbbrevTableID: 1 ## Reference the second abbrev table.
- ...
AbbrevTableID: 0 ## Reference the first abbrev table.
```
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D83116
This patch adds support for emitting multiple abbrev tables. Currently,
compilation units will always reference the first abbrev table.
Reviewed By: jhenderson, labath
Differential Revision: https://reviews.llvm.org/D86194
ld.lld is an ELF linker. We can switch to the new LLD for Mach-O port
when it's more complete, but for now, assume the user will have set
CMAKE_LINKER correctly themselves when targeting Darwin.
This patch adds support for emitting multiple abbrev tables. Currently,
compilation units will always reference the first abbrev table.
Reviewed By: jhenderson, labath
Differential Revision: https://reviews.llvm.org/D86194
Summary:
- HIP uses an unsized extern array `extern __shared__ T s[]` to declare
the dynamic shared memory, which size is not known at the
compile time.
Reviewers: arsenm, yaxunl, kpyzhov, b-sumner
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82496
We have two ways of using the runtimes build setup to build the
builtins. You can either have an empty LLVM_BUILTIN_TARGETS (or have it
include the "default" target), in which case builtin_default_target is
called to set up the default target, or you can have actual triples in
LLVM_BUILTIN_TARGETS, in which case builtin_register_target is called
for each triple. builtin_default_target lets you build the builtins for
Darwin (assuming your default triple is Darwin); builtin_register_target
does not.
I don't understand the reason for this distinction. The Darwin builtins
build is special in that a single CMake configure handles building the
builtins for multiple platforms (e.g. macOS, iPhoneSimulator, and iOS)
and architectures (e.g. arm64, armv7, and x86_64). Consequently, if you
specify multiple Darwin triples in LLVM_BUILTIN_TARGETS, expecting each
configure to only build for that particular triple, it won't work.
However, if you specify a *single* x86_64-apple-darwin triple in
LLVM_BUILTIN_TARGETS, that single configure will build the builtins for
all Darwin targets, exactly the same way that the default target would.
The only difference between the configuration for the default target and
the x86_64-apple-darwin triple is that the latter runs the configuration
with `-DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON`, but that makes no
difference for Apple targets (none of the CMake codepaths which have
different behavior based on that variable are run for Apple targets).
I tested this by running two builtins builds on my Mac, one with the
default target and one with the x86_64-apple-darwin19.5.0 target (which
is the default target triple for my clang). The only relevant
CMakeCache.txt difference was the following, and as discussed above, it
has no effect on the actual build for Apple targets:
```
-//Default triple for which compiler-rt runtimes will be built.
-COMPILER_RT_DEFAULT_TARGET_TRIPLE:STRING=x86_64-apple-darwin19.5.0
+//No help, variable specified on the command line.
+COMPILER_RT_DEFAULT_TARGET_ONLY:UNINITIALIZED=ON
```
Furthermore, when I add the `-D` flag to compiler-rt's libtool
invocations, the libraries produced by the two builds are *identical*.
If anything, I would expect builtin_register_target to complain if you
tried specifying a triple for a particular Apple platform triple (e.g.
macosx), since that's the scenario in which it won't work as you want.
The generic darwin triple should be fine though, as best as I can tell.
I'm happy to add the error for specific Apple platform triples, either
in this diff or in a follow-up.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D86313
explicit keyword is declared outside of class is invalid, invalid explicit declaration is handled inside DiagnoseFunctionSpecifiers() function. To avoid compiler crash in case of invalid explicit declaration, remove assertion.
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D83929
When replaying the reproducer, lldb should source the .lldbinit file
that was captured by the reproducer and not the one in the current home
directory. This requires that we store the home directory as part of the
reproducer. By returning the virtual home directory during replay, we
ensure the correct virtual path gets constructed which the VFS can then
find and remap to the correct file in the reproducer root.
This patch adds a new HomeDirectoryProvider, similar to the existing
WorkingDirectoryProvider. As the home directory is not part of the VFS,
it is stored in LLDB's FileSystem instance.
HeaderSearch was marking requested HeaderFileInfo as Resolved only based on
the presence of ExternalSource. As the result, using any module was enough
to set ExternalSource and headers unknown to this module would have
HeaderFileInfo with empty fields, including `isImport = 0`, `NumIncludes = 0`.
Such HeaderFileInfo was preserved without changes regardless of how the
header was used in other modules and caused incorrect result in
`HeaderSearch::ShouldEnterIncludeFile`.
Fix by marking HeaderFileInfo as Resolved only if ExternalSource knows
about this header.
rdar://problem/62126911
Reviewed By: bruno
Differential Revision: https://reviews.llvm.org/D80263
Known bits for G_ANYEXT was incorrectly using KnownBits::zext, causing
us to treat the high bits as zero even though they're (by definition)
unknown.
Differential Revision: https://reviews.llvm.org/D86323
No functionality change intended: there doesn't seem to be any way to
cause Clang to print such a value, but they can show up when dumping
APValues from a debugger.
We are now using a properly-substituted minimal deployment target
compiler flag (`%min_macos_deployment_target=10.11`). Enable test on
iOS and watchOS plus simulators. We are also not testing on very old
platforms anymore, so we can remove some obsolete lit infrastructure.
Now that Clang is able to constant-evaluate void-typed expressions,
disable showing hover-card values for them. It's not useful to say that
an expression cast to void has value '<no value>', even if we can
constant-evaluate it to that result!