Commit Graph

382953 Commits

Author SHA1 Message Date
serge-sans-paille 6e040a19db [NFC] Wisely nest dyn_cast in FunctionLoweringInfo
Take advantage of the inheritance tree to avoid a few comparison.
2021-03-16 10:22:44 +01:00
Caroline Concatto 3c03635d53 [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse
This patch adds support for reverse loop vectorization.
It is possible to vectorize the following loop:
```
  for (int i = n-1; i >= 0; --i)
    a[i] = b[i] + 1.0;
```
with fixed or scalable vector.
The loop-vectorizer will use 'reverse' on the loads/stores to make
sure the lanes themselves are also handled in the right order.
This patch adds support for scalable vector on IRBuilder interface to
create a reverse vector. The IR function
CreateVectorReverse lowers to experimental.vector.reverse for scalable vector
and keedp the original behavior for fixed vector using shuffle reverse.

Differential Revision: https://reviews.llvm.org/D95363
2021-03-16 07:51:59 +00:00
Lorenzo Chelini fd7eee64c5 scf::ForOp: Fold away iterator arguments with no use and for which the corresponding input is yielded
Enhance 'ForOpIterArgsFolder' to remove unused iteration arguments in a
scf::ForOp. If the block argument corresponding to the given iterator has no
use and the yielded value equals the input, we fold it away.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D98503
2021-03-16 07:01:25 +00:00
Jim Lin 678241795c [RISCV] Don't emit #undef BUILTIN from RISCVVEmitter.cpp
In BuiltinsRISCV.def, other extension 's intrinsics need to be defined by using macro BUILTIN.
So, it shouldn't undefine macro BUILTIN in the end of declaration for V intrinsics.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98682
2021-03-16 14:57:45 +08:00
Amara Emerson 9575c48b89 [AArch64][GlobalISel] Fix crash on lowering <1 x half> types. 2021-03-15 23:27:43 -07:00
Yvan Roux c0f224e630 [AArch64][ASAN] Disable fgets_fputs.cpp test.
This test is failing for long a time on AArch64 bots, disable it for now
to keep the bots green while investigating it.
2021-03-16 07:00:19 +01:00
Pushpinder Singh fc12a64ecc [OpenMP][AMDGPU] Skip backend and assemble phases for amdgcn
Remove emit-llvm-bc from addClangTargetOptions as it conflicts with -E for save-temps.

AMDGCN does not yet support linking object files so backend and assemble actions are
skipped, leaving LLVM IR as the output format.

Reviewed By: JonChesterfield, ronlieb

Differential Revision: https://reviews.llvm.org/D96769
2021-03-16 04:58:14 +00:00
wlei dddd590fd0 [CSSPGO][llvm-profgen] Fix getCanonicalFnName usage in llvm-profgen
Previously we didn't support to keep the unique linkage name(-funique-internal-linkage-name) in llvm-profgen. As discussed in https://reviews.llvm.org/D96932, we choose to do canonicalization for it.

Now since "selected" is set as the default parameter of getCanonicalFnName in `D96932`, we don't need to add any attribute here for the previous usage and only fix the missing usage in the pseudo probe decoding.

Differential Revision: https://reviews.llvm.org/D98226
2021-03-15 21:00:42 -07:00
Johannes Doerfert 0a954a528b [OpenMP][FIX] Repair accidental replacement of _shfl_sync with _shfl
This was broken accidentally in D95752.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D98677
2021-03-15 22:46:00 -05:00
Johannes Doerfert f40a2c3bef [NVPTX] CUDA does provide malloc/free since compute capability 2.X
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D98606
2021-03-15 22:45:56 -05:00
Josh Berdine 5bb2757e21 [OCaml][test] Fix Bindings/OCaml/executionengine.ml test
It seems that at some point it became necessary to pass `-thread` to
the ocaml compiler for this test.

Differential Revision: https://reviews.llvm.org/D98593
2021-03-16 02:48:36 +00:00
LLVM GN Syncbot 6547dcb4f3 [gn build] Port 4f198b0c27 2021-03-16 02:41:16 +00:00
Bing1 Yu 4f198b0c27 [X86] Pass to transform amx intrinsics to scalar operation.
This pass runs in any situations but we skip it when it is not O0 and the
function doesn't have optnone attribute. With -O0, the def of shape to amx
intrinsics is near the amx intrinsics code. We are not able to find a
point which post-dominate all the shape and dominate all amx intrinsics.
To decouple the dependency of the shape, we transform amx intrinsics
to scalar operation, so that compiling doesn't fail. In long term, we
 should improve fast register allocation to allocate amx register.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D93594
2021-03-16 10:40:22 +08:00
David Blaikie 9341bcbdc9 Skip path separators to make the test portable across Win/Linux 2021-03-15 18:24:40 -07:00
Aart Bik 6ad7b97e20 [mlir][amx] Add Intel AMX dialect (architectural-specific vector dialect)
The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D98470
2021-03-15 17:59:05 -07:00
Walter Erquinigo b5657d1fbf Fix 34885bffdf
It failed https://lab.llvm.org/buildbot/#/builders/17/builds/5262 and
the fix is simply to relax a regex expression in a test.
2021-03-15 16:36:32 -07:00
Petr Hosek 9466f9b434 [CMake] Clean up unnecessary dependency
The LINK_COMPONENTS dependency between DebugInfoCodeView and
DebugInfoMSF is unnecessary. Breaking them would allow a more
fine-controlled distribution.

Patch By: dangyi

Differential Revision: https://reviews.llvm.org/D98465
2021-03-15 16:29:16 -07:00
Jon Chesterfield e23f3502d9 [libomptarget] Build amdgcn devicertl by default
[libomptarget] Build amdgcn devicertl by default

The cmake for this looks for an llvm install and does the right thing when
building as part of enable_runtimes. It will probably do the right thing
in other settings - at least, it won't try to build this with gcc.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D98658
2021-03-15 23:17:50 +00:00
LLVM GN Syncbot 2ef6ee1978 [gn build] Port ecf6466f01 2021-03-15 23:01:19 +00:00
Lang Hames ecf6466f01 [JITLink][MachO][x86-64] Introduce generic x86-64 support.
This patch introduces generic x86-64 edge kinds, and refactors the MachO/x86-64
backend to use these edge kinds. This simplifies the implementation of the
MachO/x86-64 backend and makes it possible to write generic x86-64 passes and
utilities.

The new edge kinds are different from the original set used in the MachO/x86-64
backend. Several edge kinds that were not meaningfully distinguished in that
backend (e.g. the PCRelMinusN edges) have been merged into single edge kinds in
the new scheme (these edge kinds can be reintroduced later if we find a use for
them). At the same time, new edge kinds have been introduced to convey extra
information about the state of the graph. E.g. The Request*AndTransformTo**
edges represent GOT/TLVP relocations prior to synthesis of the GOT/TLVP
entries, and the 'Relaxable' suffix distinguishes edges that are candidates for
optimization from edges which should be left as-is (e.g. to enable runtime
redirection).

ELF/x86-64 will be refactored to use these generic edges at some point in the
future, and I anticipate a similar refactor to create a generic arm64 support
header too.

Differential Revision: https://reviews.llvm.org/D98305
2021-03-15 15:43:07 -07:00
Tim Keith bcf95cbb2c [flang] Create intrinsics modules directory (contd.)
Use -module-dir rather than WORKING_DIRECTORY because we are potentially
creating the working directory in this custom command.
2021-03-15 15:38:05 -07:00
Amy Huang f5352dd9da Emit inline implementation of __builtin__wmemchr on MSVCRT platforms.
The MSVC runtime library doesn't have a definition for wmemchr,
so provide an inline implementation.

Differential Revision: https://reviews.llvm.org/D98472
2021-03-15 15:30:55 -07:00
Nico Weber 264ff539f3 [gn build] merge af2796c76d a bit more
The default is fine on non-Win, but on Win this needs an explicit
setting now that lit no longer has the right default.
2021-03-15 18:20:54 -04:00
Tim Keith 566a2c18bf [flang] Create intrinsics modules directory
A clean build fails using make because the intrinsics modules directory
doesn't exist. For some reason it works fine with ninja.
2021-03-15 15:19:30 -07:00
Walter Erquinigo 34885bffdf [lldb-vscode] Handle request_evaluate's context attribute
Summary:
The request "evaluate" supports a "context" attribute, which is sent by VSCode. The attribute is defined here https://microsoft.github.io/debug-adapter-protocol/specification#Requests_Evaluate

The "clipboard" context is not yet supported by lldb-vscode, so we can forget about it for now. The 'repl' (i.e. Debug Console) and 'watch' (i.e. Watch Expression) contexts must use the expression parser in case the frame's variable path is not enough, as the user expects these expressions to never fail. On the other hand, the 'hover' expression is invoked whenever the user hovers on any keyword on the UI and the user is fine with the expression not being fully resolved, as they know that the 'repl' case is the fallback they can rely on.

Given that the 'hover' expression is invoked many many times without the user noticing it due to it being triggered by the mouse, I'm making it use only the frame's variable path functionality and not the expression parser. This should speed up tremendously the responsiveness of a debug session when the user only sets source breakpoints and inspect local variables, as the entire debug info is not needed to be parsed.

Regarding tests, I've tried to be as comprehensive as possible considering a multi-file project. Fortunately, the results from the "hover" case are enough most of the times.

Differential Revision: https://reviews.llvm.org/D98656
2021-03-15 15:09:23 -07:00
Peyton, Jonathan L 7085f04573 [OpenMP] Remove unused cpu_stackoffset member 2021-03-15 16:52:04 -05:00
Alexander Yermolovich 51504bc1d9 [DWARF] Check for AddrOffsetSectionBase to work with DWO Units.
Context: https://lists.llvm.org/pipermail/llvm-dev/2021-February/148521.html

A fix for llvm-symbolizer, and other tools like BOLT, that allows retrieving address when built with -gsplit-dwarf=single mode.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D96827
2021-03-15 14:46:09 -07:00
diggerlin d1f1bff81b [AIX][XCOFF] Fixed the test case which failed at aix OS because enable -mignore-xcoff-visibility by default.
Summary:

because we enable -mignore-xcoff-visibility by default when there is no -fvisibility option in the clang in AIX OS
it will cause some test case fail at aix os. in order to let the -mignore-xcoff-visibility to be disable, we need to add the -fvisibility=default for those test case.

Reviewers: hubert.reinterpretcast daltenty
Differential Revision: https://reviews.llvm.org/D98660
2021-03-15 17:33:02 -04:00
Artem Belevich 50c7504a93 [NVPTX] Avoid temp copy of byval kernel parameters.
Avoid making a temporary copy of byval argument if all accesses are loads and
therefore the pointer to the parameter can not escape.

This avoids excessive global memory accesses when each kernel makes its own
copy.

Differential revision: https://reviews.llvm.org/D98469
2021-03-15 14:27:22 -07:00
Nick Lewycky 483a253ae9 NFC: Formatting changes.
Run clang-format over these files.

Capitalize some variable names per clang-tidy's request.

Pulled out to simplify review of D98302.
2021-03-15 14:26:39 -07:00
peter klausler 6811b96100 [flang] Runtime: implement INDEX intrinsic function
Implement INDEX in the runtime, reusing some infrastructure
(with generalization and renaming as needed) put into place
for its cousins SCAN and VERIFY.

I did not implement full Boyer-Moore substring searching
for the forward case, but did accelerate some advancement on
mismatches.

I (re)implemented unit testing for INDEX in the new gtest
framework, combining it with the tests that have recently
been ported to gtest for SCAN and VERIFY.

Differential Revision: https://reviews.llvm.org/D98553
2021-03-15 14:19:13 -07:00
Stanislav Mekhanoshin bc27a31801 [AMDGPU] Fix copyPhysReg to not produce unalined vgpr access
RA can insert something like a sub1_sub2 COPY of a wide VGPR
tuple which results in the unaligned acces with v_pk_mov_b32
after the copy is expanded. This is regression after D97316.

Differential Revision: https://reviews.llvm.org/D98549
2021-03-15 14:14:30 -07:00
Florian Hahn bb244ea2a8
[AnnotationRemarks] Remove unneeded Function.h include (NFC). 2021-03-15 21:09:35 +00:00
Nico Weber 01d648a69b [gn build] merge 9bcf0eff99 2021-03-15 17:05:05 -04:00
Jonas Paulsson 9cfd301ec8 [SystemZ] Test for isinf and isfinite in testFPKind().
Recognize BI__builtin_isinf and BI__builtin_isfinite (and a few other opcodes
for finite) in testFPKind() and handle with TDC.

Review: Ulrich Weigand.

Differential Revision: https://reviews.llvm.org/D97901
2021-03-15 15:02:39 -06:00
Nico Weber efbaf4030b [gn build] kind of merge af2796c76d
Good enough for now. If we need more, we'll do the usual
platform-dependent hardcoding that in practice works for everything else
too.
2021-03-15 17:01:00 -04:00
Stanislav Mekhanoshin c297709ee1 [AMDGPU] Fixed msan failure with uninitialized value 2021-03-15 13:58:19 -07:00
Jon Chesterfield bb38d7ff05 [libomptarget][nfc][amdgcn] Use precise triple for devicertl build 2021-03-15 20:24:13 +00:00
Stefan Pintilie 86f2a3d178 [PowerPC] Add __PCREL__ when PC Relative is enabled.
This patch adds the `__PCREL__` define when PC Relative addressing is enabled.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D98546
2021-03-15 15:13:02 -05:00
Jon Chesterfield d0bc85f04a [libomptarget][nfc] Drop unused DEVICE macro
[libomptarget][nfc] Drop unused DEVICE macro

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D98655
2021-03-15 20:12:50 +00:00
Jon Chesterfield 7da76aaaf4 [libomptarget] Build amdgpu plugin by default
[libomptarget] Build amdgpu plugin by default

This will build the amdgpu plugin if cmake is able to find the hsa
runtime library, which will be the case if rocm is installed or if
the hsa library has been installed somewhere cmake looks.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D98654
2021-03-15 20:12:01 +00:00
Kirill Bobyrev 9bcf0eff99
[clangd] Optionally add reflection for clangd-index-server
This was originally landed without the optional part and reverted later:

8080ea4c4b

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D98404
2021-03-15 21:07:25 +01:00
Markus Böck 68e4084bf6 Revert line accidentally included in af2796c76d 2021-03-15 21:03:46 +01:00
Sanjay Patel b1b07dd071 [SLP] update stale test comments; NFC
These bugs were fixed with 0a8e7ca402
2021-03-15 16:02:46 -04:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Markus Böck af2796c76d [test] Add ability to get error messages from CMake for errc substitution
Visual Studios implementation of the C++ Standard Library does not use strerror to produce a message for std::error_code unlike other standard libraries such as libstdc++ or libc++ that might be used.

This patch adds a cmake script that through running a C++ program gets the error messages for the POSIX error codes and passes them onto lit through an optional config parameter.

If the config parameter is not set, or getting the messages failed, due to say a cross compiling configuration without an emulator, it will fall back to using pythons strerror functions.

Differential Revision: https://reviews.llvm.org/D98278
2021-03-15 20:56:08 +01:00
Jon Chesterfield bcb3f0f867 [libomptarget] Fix devicertl build
[libomptarget] Fix devicertl build

The target specific functions in target_interface are extern C, but the
implementations for nvptx were mostly C++ mangling. That worked out as
a quirk of DEVICE macro expanding to nothing, except for shuffle.h which
only forward declared the functions with C++ linkage.

Also implements GetWarpSize, as used by shuffle, and includes target_interface
in nvptx target_impl.cu to help catch future divergence between interface and
implementation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D98651
2021-03-15 19:50:22 +00:00
Michael Kruse 9c486eb348 [Polly] Fix deprecation warning. NFC.
IRBuilder::CreateLoad without type parameter was deprecated in 6312c538
to prepare for opaque pointers.
2021-03-15 14:31:16 -05:00
Wenlei He a5d30421a6 [CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list
For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining.

For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work:

 - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee.
 - In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added.
 - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile).
 - Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate.

Differential Revision: https://reviews.llvm.org/D98590
2021-03-15 12:22:15 -07:00
Jianzhou Zhao 9cf5220c5c [dfsan] Updated check_custom_wrappers.sh to dedup function names
The origin wrappers added by https://reviews.llvm.org/D98359 reuse
those __dfsw_ functions.
2021-03-15 19:12:08 +00:00