This patch adds support for reverse loop vectorization.
It is possible to vectorize the following loop:
```
for (int i = n-1; i >= 0; --i)
a[i] = b[i] + 1.0;
```
with fixed or scalable vector.
The loop-vectorizer will use 'reverse' on the loads/stores to make
sure the lanes themselves are also handled in the right order.
This patch adds support for scalable vector on IRBuilder interface to
create a reverse vector. The IR function
CreateVectorReverse lowers to experimental.vector.reverse for scalable vector
and keedp the original behavior for fixed vector using shuffle reverse.
Differential Revision: https://reviews.llvm.org/D95363
Enhance 'ForOpIterArgsFolder' to remove unused iteration arguments in a
scf::ForOp. If the block argument corresponding to the given iterator has no
use and the yielded value equals the input, we fold it away.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98503
In BuiltinsRISCV.def, other extension 's intrinsics need to be defined by using macro BUILTIN.
So, it shouldn't undefine macro BUILTIN in the end of declaration for V intrinsics.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98682
Remove emit-llvm-bc from addClangTargetOptions as it conflicts with -E for save-temps.
AMDGCN does not yet support linking object files so backend and assemble actions are
skipped, leaving LLVM IR as the output format.
Reviewed By: JonChesterfield, ronlieb
Differential Revision: https://reviews.llvm.org/D96769
Previously we didn't support to keep the unique linkage name(-funique-internal-linkage-name) in llvm-profgen. As discussed in https://reviews.llvm.org/D96932, we choose to do canonicalization for it.
Now since "selected" is set as the default parameter of getCanonicalFnName in `D96932`, we don't need to add any attribute here for the previous usage and only fix the missing usage in the pseudo probe decoding.
Differential Revision: https://reviews.llvm.org/D98226
It seems that at some point it became necessary to pass `-thread` to
the ocaml compiler for this test.
Differential Revision: https://reviews.llvm.org/D98593
This pass runs in any situations but we skip it when it is not O0 and the
function doesn't have optnone attribute. With -O0, the def of shape to amx
intrinsics is near the amx intrinsics code. We are not able to find a
point which post-dominate all the shape and dominate all amx intrinsics.
To decouple the dependency of the shape, we transform amx intrinsics
to scalar operation, so that compiling doesn't fail. In long term, we
should improve fast register allocation to allocate amx register.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D93594
The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98470
The LINK_COMPONENTS dependency between DebugInfoCodeView and
DebugInfoMSF is unnecessary. Breaking them would allow a more
fine-controlled distribution.
Patch By: dangyi
Differential Revision: https://reviews.llvm.org/D98465
[libomptarget] Build amdgcn devicertl by default
The cmake for this looks for an llvm install and does the right thing when
building as part of enable_runtimes. It will probably do the right thing
in other settings - at least, it won't try to build this with gcc.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D98658
This patch introduces generic x86-64 edge kinds, and refactors the MachO/x86-64
backend to use these edge kinds. This simplifies the implementation of the
MachO/x86-64 backend and makes it possible to write generic x86-64 passes and
utilities.
The new edge kinds are different from the original set used in the MachO/x86-64
backend. Several edge kinds that were not meaningfully distinguished in that
backend (e.g. the PCRelMinusN edges) have been merged into single edge kinds in
the new scheme (these edge kinds can be reintroduced later if we find a use for
them). At the same time, new edge kinds have been introduced to convey extra
information about the state of the graph. E.g. The Request*AndTransformTo**
edges represent GOT/TLVP relocations prior to synthesis of the GOT/TLVP
entries, and the 'Relaxable' suffix distinguishes edges that are candidates for
optimization from edges which should be left as-is (e.g. to enable runtime
redirection).
ELF/x86-64 will be refactored to use these generic edges at some point in the
future, and I anticipate a similar refactor to create a generic arm64 support
header too.
Differential Revision: https://reviews.llvm.org/D98305
The MSVC runtime library doesn't have a definition for wmemchr,
so provide an inline implementation.
Differential Revision: https://reviews.llvm.org/D98472
Summary:
The request "evaluate" supports a "context" attribute, which is sent by VSCode. The attribute is defined here https://microsoft.github.io/debug-adapter-protocol/specification#Requests_Evaluate
The "clipboard" context is not yet supported by lldb-vscode, so we can forget about it for now. The 'repl' (i.e. Debug Console) and 'watch' (i.e. Watch Expression) contexts must use the expression parser in case the frame's variable path is not enough, as the user expects these expressions to never fail. On the other hand, the 'hover' expression is invoked whenever the user hovers on any keyword on the UI and the user is fine with the expression not being fully resolved, as they know that the 'repl' case is the fallback they can rely on.
Given that the 'hover' expression is invoked many many times without the user noticing it due to it being triggered by the mouse, I'm making it use only the frame's variable path functionality and not the expression parser. This should speed up tremendously the responsiveness of a debug session when the user only sets source breakpoints and inspect local variables, as the entire debug info is not needed to be parsed.
Regarding tests, I've tried to be as comprehensive as possible considering a multi-file project. Fortunately, the results from the "hover" case are enough most of the times.
Differential Revision: https://reviews.llvm.org/D98656
Summary:
because we enable -mignore-xcoff-visibility by default when there is no -fvisibility option in the clang in AIX OS
it will cause some test case fail at aix os. in order to let the -mignore-xcoff-visibility to be disable, we need to add the -fvisibility=default for those test case.
Reviewers: hubert.reinterpretcast daltenty
Differential Revision: https://reviews.llvm.org/D98660
Avoid making a temporary copy of byval argument if all accesses are loads and
therefore the pointer to the parameter can not escape.
This avoids excessive global memory accesses when each kernel makes its own
copy.
Differential revision: https://reviews.llvm.org/D98469
Implement INDEX in the runtime, reusing some infrastructure
(with generalization and renaming as needed) put into place
for its cousins SCAN and VERIFY.
I did not implement full Boyer-Moore substring searching
for the forward case, but did accelerate some advancement on
mismatches.
I (re)implemented unit testing for INDEX in the new gtest
framework, combining it with the tests that have recently
been ported to gtest for SCAN and VERIFY.
Differential Revision: https://reviews.llvm.org/D98553
RA can insert something like a sub1_sub2 COPY of a wide VGPR
tuple which results in the unaligned acces with v_pk_mov_b32
after the copy is expanded. This is regression after D97316.
Differential Revision: https://reviews.llvm.org/D98549
Recognize BI__builtin_isinf and BI__builtin_isfinite (and a few other opcodes
for finite) in testFPKind() and handle with TDC.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D97901
This patch adds the `__PCREL__` define when PC Relative addressing is enabled.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D98546
[libomptarget] Build amdgpu plugin by default
This will build the amdgpu plugin if cmake is able to find the hsa
runtime library, which will be the case if rocm is installed or if
the hsa library has been installed somewhere cmake looks.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D98654
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.
Additional advantage that parser will accept these flags in any order unlike
now.
Differential Revision: https://reviews.llvm.org/D96469
Visual Studios implementation of the C++ Standard Library does not use strerror to produce a message for std::error_code unlike other standard libraries such as libstdc++ or libc++ that might be used.
This patch adds a cmake script that through running a C++ program gets the error messages for the POSIX error codes and passes them onto lit through an optional config parameter.
If the config parameter is not set, or getting the messages failed, due to say a cross compiling configuration without an emulator, it will fall back to using pythons strerror functions.
Differential Revision: https://reviews.llvm.org/D98278
[libomptarget] Fix devicertl build
The target specific functions in target_interface are extern C, but the
implementations for nvptx were mostly C++ mangling. That worked out as
a quirk of DEVICE macro expanding to nothing, except for shuffle.h which
only forward declared the functions with C++ linkage.
Also implements GetWarpSize, as used by shuffle, and includes target_interface
in nvptx target_impl.cu to help catch future divergence between interface and
implementation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D98651
For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining.
For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work:
- When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee.
- In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added.
- When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile).
- Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate.
Differential Revision: https://reviews.llvm.org/D98590