The builtins vec_xl_len_r and vec_xst_len_r actually use the
wrong side of the vector on big endian Power9 systems. We never
spotted this before because there was no such thing as a big
endian distro that supported Power9. Now we have AIX and the
elements are in the wrong part of the vector. This just fixes
it so the elements are loaded to and stored from the right
side of the vector.
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.
```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo
# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo
# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```
(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059 # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```
The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.
Differential Revision: https://reviews.llvm.org/D104556
This distributes reductions based on the relative offset of loads, if
one is found from their operands. Given chains of reductions this will
then sort them in ascending load order, which in turn can help simple
prefetches latch on to increasing strides more easily.
Differential Revision: https://reviews.llvm.org/D106569
@tcanens pointed out the current behavior of the macro breaks the usage
pattern described in http://wg21.link/SD6
```
# if __has_include(<optional>)
# include <optional>
# if __cpp_lib_optional >= 201606
# define have_optional 1
# endif
```
To support this usage pattern the hard errror is removed. Instead the
header includes nothing but the `<version>` header.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D107134
Rationale:
External file formats always store the values as doubles, so this was
hard coded in the memory resident COO scheme used to pass data into the
final sparse storage scheme during setup. However, with alternative methods
on the horizon of setting up these temporary COO schemes, it is time to
properly template this data structure.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D107001
Always codesign binaries on macOS. Apple Silicon has stricter
codesigning requirements, for example requiring macCatalyst binaries to
be signed. Ad-hoc sign everything like we do for other Darwin platforms.
When using `-DLLVM_ENABLED_RUNTIMES` instead of `-DLLVM_ENABLED_PROJECTS`
the `llvm-omp-device-info` tool is not compiled or installed.
In general, no llvm tool would be build on runtimes, because the
-DLLVM_BUILD_TOOLS flag is removed by the way runtimes compilation calls
cmake again.
This patch is simple. Just forward the value of this flag to the
runtime cmake command.
I'm also removing an unnecessary comment in the compilation of the tool
Differential Revision: https://reviews.llvm.org/D107177
Don't try to run the non-integrated assembler; just verify that the
invocations look like what we expect. Do verify that the integrated
assembler handles warnings as expected.
In mixed mode builds, we should not be including errno as part of
LLVM libc - errno from another library (or the system library) should be
used. But, other entrypoints which use errno list LLVM libc's errno as a
dep ta satisfy the full build mode. So, we add a dummy errno
implementation with empty files to make both mixed mode and full build
mode happy.
This could be smarter by picking an ideal type, or at least splitting
the vector in half first. Also handles lower for non-power-of-2,
non-extending vector loads.
Currently this just avoids failing to legalize some odd vector AMDGPU
tests, but is a step towards removing the split logic from the
NarrowScalar logic.
Replace insertelement instructions for splats with just single
insertelement + broadcast shuffle. Also, try to merge these instructions
if they come from the same/shuffled gather node.
Differential Revision: https://reviews.llvm.org/D107104
If a reduction Phi has a single user which `AND`s the Phi with a type mask,
`lookThroughAnd` will return the user of the Phi and the narrower type represented
by the mask. Currently this is only used for arithmetic reductions, whereas loops
containing logical reductions will create a reduction intrinsic using the widened
type, for example:
for.body:
%phi = phi i32 [ %and, %for.body ], [ 255, %entry ]
%mask = and i32 %phi, 255
%gep = getelementptr inbounds i8, i8* %ptr, i32 %iv
%load = load i8, i8* %gep
%ext = zext i8 %load to i32
%and = and i32 %mask, %ext
...
^ this will generate an and reduction intrinsic such as the following:
call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...)
The same example for an add instruction would create an intrinsic of type i8:
call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...)
This patch changes AddReductionVar to call lookThroughAnd for other integer
reductions, allowing loops similar to the example above with reductions such
as and, or & xor to vectorize.
Reviewed By: david-arm, dmgreen
Differential Revision: https://reviews.llvm.org/D105632
The code for splitting an unaligned access into 2 pieces is
essentially the same as for splitting a non-power-of-2 load for
scalars. It would be better to pick an optimal memory access size and
directly use it, but splitting in half is what the DAG does.
As-is this fixes handling of some unaligned sextload/zextloads for
AMDGPU. In the future this will help drop the ugly abuse of
narrowScalar to handle splitting unaligned accesses.
FixIt, and add support for initialization check of scoped enum
In C++, the enumeration is never Integer, and the enumeration condition judgment is added to avoid compiling errors when it is initialized to an integer.
Add support for initialization check of scope enum.
As the following case show, clang-tidy will give a wrong automatic fix:
enum Color {Red, Green, Blue};
enum class Gender {Male, Female};
void func() {
Color color; // Color color = 0; <--- fix bug
Gender gender; // <--- no warning
}
Reviewd By: aaron.ballman, whisperity
Differential Revision: http://reviews.llvm.org/D106431
Port external-io test to use GTest. Remove Runtime tests directory.
Rename RuntimeGTest directory to Runtime.
This is the last in a series of patches which ported tests from the old
flang/unittests/Runtime test directory to use GTest in a temporary
unittest directory under flang/unittests/RuntimeGTest. Now that all the
tests in the old directory have been ported to use GTest, the old
directory has been removed and the GTest directory has been renamed to
flang/unittests/Runtime.
Differential Revision: https://reviews.llvm.org/D105315
Reviewed by: Meinersbur, awarzynski
The effect name is used by tablegen when generating the getEffects method of the SideEffectInterfaces. It is currently unqualified even though the class is contained within the mlir namespace, leading to compiler errors when using namespace mlir; isn't used before including the generated cpp file.
This patch fixes that by simply fully qualifying the class name.
Differential Revision: https://reviews.llvm.org/D107171
This patch will re-enable the patch posted under https://reviews.llvm.org/D106688 originally which was reverted due to buildbreak that was caused by mismatched diagnostic message arguments.
Reviewed By: Zarko Todorovski
Differential Revision: https://reviews.llvm.org/D107105
All `nowait` series of interfaces in `libomptarget` accept four more arguments (`int32_t depNum, void *depList, int32_t noAliasDepNum, void *noAliasDepList`) compared with their counterparts w/o `nowait`. These extra arguments were expected for dependence resolution, potentially lowered to device side. Current implementation calls `libomp` function `__kmpc_omp_taskwait`. However, the front end simply ignores them, that these four arguments are not emitted at all. As a consequence, the `depNum` and `noAliasDepNum` are garbage, which could lead to unnecessary task wait.
Reviewed By: grokos
Differential Revision: https://reviews.llvm.org/D107164
'pipe' keyword is introduced in OpenCL C 2.0: so do checks for OpenCL C version while
parsing and then later on check for language options to construct actual pipe. This feature
requires support of __opencl_c_generic_address_space, so diagnostics for that is provided as well.
This is the same patch as in D106748 but with a tiny fix in checking of diagnostic messages.
Also added tests when program scope global variables are not supported.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D107154
Commit 83df122 (r368334) added 'REQUIRES: linux' to this test, but
because triples are not respected by REQUIRES, that meant it was
invariably Unsupported. The correct keyword would be 'system-linux'
(checking the host rather than the target).
Because the test was always skipped, commit 0cfd9e5 (r375439) did not
notice that the test modification was incorrect.
This patch corrects the REQUIRES clause and fixes the incorrect
previous patch.
Found after implementing https://reviews.llvm.org/D107162
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
Under the -faltivec-src-compat=gcc option, AltiVec vector initialization should
be treated as if they were compiled with gcc - which is, to emit an error when
the vectors are initialized in the parenthesized or non-parenthesized manner.
This patch implements this behaviour.
Differential Revision: https://reviews.llvm.org/D106410
Put declarations/definitions of unused variables under corresponding macros
to silence clang build warnings.
Differential Revision: https://reviews.llvm.org/D106608
In a post-commit message to https://reviews.llvm.org/D102343
@MaskRay pointed out syntax errors in one of the test cases. This
patch fixes those problems, I had forgotten the colon after the CHECK- strings.
Math libraries are linked only when -lm is specified. This is because
host system could be missing rocm-device-libs.
Reviewed By: JonChesterfield, yaxunl
Differential Revision: https://reviews.llvm.org/D105981