This patch adds the necessary AMDGPU calling convention to the ctor /
dtor kernels. These are fundamentally device kenels called by the host
on image load. Without this calling convention information the AMDGPU
plugin is unable to identify them.
Depends on D122504
Fixes#54091
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122515
The default construction of constructor functions by LLVM tends to make
them have internal linkage. When we call a ctor / dtor function in the
target region we are actually creating a kernel that is called at
registration. Because the ctor is a kernel we need to make sure it's
externally visible so we can actually call it. This prevented AMDGPU
from correctly using constructors while NVPTX could use them simply
because it ignored internal visibility.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D122504
Currently the device kernels all have weak linkage to prevent linkage
errors on multiple defintions. However, this prevents some optimizations
from adequately analyzing them because of the nature of weak linkage.
This patch replaces the weak linkage with weak_odr linkage so we can
statically assert that multiple declarations of the same kernel will
have the same definition.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122443
Current clang generates extra set of simd variant function attribute
with extra 'v' encoding.
For example:
_ZGVbN2v__Z5add_1Pf vs _ZGVbN2vv__Z5add_1Pf
The problem is due to declaration of ParamAttrs following:
llvm::SmallVector<ParamAttrTy, 8> ParamAttrs(ParamPositions.size());
where ParamPositions.size() is grown after following assignment:
Pos = ParamPositions[PVD];
So the PVD is not find in ParamPositions.
The problem is ParamPositions need to set for each FD decl. To fix this
Move ParamPositions's init inside while loop for each FD.
Differential Revision: https://reviews.llvm.org/D122338
Currently we create offloading entries to register device variables with
the host. When we register a variable we will look up the symbol in the
device image and map the device address to the host address. This is a
problem when the symbol is declared with hidden visibility or internal
linkage. This means the symbol is not accessible externally and we
cannot get its address. We should still allow static variables to be
declared on the device, but ew should not create an offloading entry for
them so they exist independently on the host and device.
Fixes#54309
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122352
This requires some adjustment in caller code, because there was
a confusion regarding the meaning of the PtrTy argument: This
argument is the type of the pointer being loaded, not the addresses
being loaded from.
Reapply after fixing the specified pointer type for one call in
47eb4f7dcd, where the used type is
important for determining alignment.
This requires some adjustment in caller code, because there was
a confusion regarding the meaning of the PtrTy argument: This
argument is the type of the pointer being loaded, not the addresses
being loaded from.
Rather than using a dummy void pointer type, we should specify the
correct private type and perform the bitcast beforehand rather than
afterwards. This way, the Address will have correct alignment
information.
Rather than specifying a dummy type in EmitLoadOfPointer() and
then casting it to the correct one, we should instead specify the
correct type and cast beforehand. Otherwise the computed alignment
will be incorrect.
In AMD GPU device code the globals are in AS(1). Before, we crashed if
the global was a structure. Now we simply cast away the AS before we
generate the code to initialize the global.
Differential Revision: https://reviews.llvm.org/D121837
* Use default ref capture for non-escaping lambdas (this makes
maintenance easier by allowing new uses, removing uses, having
conditional uses (such as in assertions) not require updates to an
explicit capture list)
* Simplify addPrivate API not to take a lambda, since it calls it
unconditionally/immediately anyway - most callers are simply passing
in a named value or short expression anyway and the lambda syntax just
adds noise/overhead
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D121077
Changed the we handle llvm::Constants in sizes arrays. ConstExprs and
GlobalValues cannot be used as initializers, need to put them at the
runtime, otherwise there wight be the compilation errors.
Differential Revision: https://reviews.llvm.org/D105297
Changed the we handle llvm::Constants in sizes arrays. ConstExprs and
GlobalValues cannot be used as initializers, need to put them at the
runtime, otherwise there wight be the compilation errors.
Differential Revision: https://reviews.llvm.org/D105297
Currently when we generate OpenMP offloading code we always make
fallback code for the CPU. This is necessary for implementing features
like conditional offloading and ensuring that unhandled pragmas don't
result in missing symbols. However, this is problematic for a few cases.
For offloading tests we can silently fail to the host without realizing
that offloading failed. Additionally, this makes it impossible to
provide interoperabiility to other offloading schemes like HIP or CUDA
because those methods do not provide any such host fallback guaruntee.
this patch adds the `-fopenmp-offload-mandatory` flag to prevent
generating the fallback symbol on the CPU and instead replaces the
function with a dummy global and the failed branch with 'unreachable'.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D120353
To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().
While doing the rename, I've filled in element types in cases
where it was relatively obvious, but we're still left with 135
calls to the deprecated constructor.
Done in manner similar to mutexinoutset
(see https://reviews.llvm.org/D57576)
Runtime support already exists in LLVM OpenMP runtime (see
https://reviews.llvm.org/D97085).
The value used to identify an inoutset dependency type in the LLVM
OpenMP runtime is 8.
Some tests updated due to change in dependency type error messages that
now include new dependency type. Also updated
test/OpenMP/task_codegen.cpp to verify we emit the right code.
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
One of the unused ident_t fields now holds the size of the string
(=const char *) field so we have an easier time dealing with those
in the future.
Differential Revision: https://reviews.llvm.org/D113126
The reduction initialization code creates a "naturally aligned null
pointer to void lvalue", which I found somewhat odd, even though it
works out in the end because it is not actually used. It doesn't
look like this code actually needs an LValue for anything though,
and we can use an invalid Address to represent this case instead.
Differential Revision: https://reviews.llvm.org/D116214
Add an overload for an Address and a single non-constant offset.
This makes it easier to preserve the element type and adjust the
alignment appropriately.
The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.
In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D113623