Windows doesn't allow common with alignment >32 bits, so these tests
were broken in windows mode. This patch makes 'common' optional in
these cases.
Change-Id: I4d5fdd07ecdafc3570ef9b09cd816c2e5e4ed15e
llvm-svn: 350645
Summary:
If a function argument is byval and RV is located in default or alloca address space
an optimization of creating addrspacecast instead of memcpy is performed. That is
not correct for OpenCL, where that can lead to a situation of address space casting
from __private * to __global *. See an example below:
```
typedef struct {
int x;
} MyStruct;
void foo(MyStruct val) {}
kernel void KernelOneMember(__global MyStruct* x) {
foo (*x);
}
```
for this code clang generated following IR:
...
%0 = load %struct.MyStruct addrspace(1)*, %struct.MyStruct addrspace(1)**
%x.addr, align 4
%1 = addrspacecast %struct.MyStruct addrspace(1)* %0 to %struct.MyStruct*
...
So the optimization was disallowed for OpenCL if RV is located in an address space
different than that of the argument (0).
Reviewers: yaxunl, Anastasia
Reviewed By: Anastasia
Subscribers: cfe-commits, asavonic
Differential Revision: https://reviews.llvm.org/D54947
llvm-svn: 348752
The spec is ambiguous on whether vector types are allowed to be
implicitly converted. The only legal context I think this can
be used for OpenCL is printf, where it seems necessary.
llvm-svn: 348083
Summary:
Prior to this patch, OpenCL code such as the following would attempt to create
a BranchInst with a non-bool argument:
if (enqueue_kernel(get_default_queue(), 0, nd, ^(void){})) /* ... */
This patch is a follow up on a similar issue with pipe builtin
operations. See commit r280800 and https://bugs.llvm.org/show_bug.cgi?id=30219.
This change, while being conservative on non-builtin functions,
should set the type of expressions invoking builtins to the
proper type, instead of defaulting to `bool` and requiring
manual overrides in Sema::CheckBuiltinFunctionCall.
In addition to tests for enqueue_kernel, the tests are extended to
check other OpenCL builtins.
Reviewers: Anastasia, spatel, rsmith
Reviewed By: Anastasia
Subscribers: kristina, cfe-commits, svenvh
Differential Revision: https://reviews.llvm.org/D52879
llvm-svn: 347658
Summary: The name of the synthesized constants for constant initialization was using mangling for statics, which isn't generally correct and (in a yet-uncommitted patch) causes the mangler to assert out because the static ends up trying to mangle function parameters and this makes no sense. Instead, mangle to `"__const." + FunctionName + "." + DeclName`.
Reviewers: rjmccall
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D54055
llvm-svn: 346915
This patch breaks Index/opencl-types.cl LIT test:
Script:
--
: 'RUN: at line 1'; stage1/bin/c-index-test -test-print-type llvm/tools/clang/test/Index/opencl-types.cl -cl-std=CL2.0 | stage1/bin/FileCheck llvm/tools/clang/test/Index/opencl-types.cl
--
Command Output (stderr):
--
llvm/tools/clang/test/Index/opencl-types.cl:3:26: warning: unsupported OpenCL extension 'cl_khr_fp16' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:4:26: warning: unsupported OpenCL extension 'cl_khr_fp64' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:8:9: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:11:8: error: declaring variable of type 'half' is not allowed
llvm/tools/clang/test/Index/opencl-types.cl:15:3: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:16:3: error: use of type 'double4' (vector of 4 'double' values) requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:26:26: warning: unsupported OpenCL extension 'cl_khr_gl_msaa_sharing' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:35:44: error: use of type '__read_only image2d_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:36:49: error: use of type '__read_only image2d_array_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:37:49: error: use of type '__read_only image2d_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:38:54: error: use of type '__read_only image2d_array_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm-svn: 346338
This is a continuation of my patches to inform the X86 backend about what the largest IR types are in the function so that we can restrict the backend type legalizer to prevent 512-bit vectors on SKX when -mprefer-vector-width=256 is specified if no explicit 512 bit vectors were specified by the user.
This patch updates the vector width based on the argument and return types of the current function and from the types of any functions it calls. This is intended to make sure the backend type legalizer doesn't disturb any types that are required for ABI.
Differential Revision: https://reviews.llvm.org/D52441
llvm-svn: 345168
Emit llvm.amdgcn.update.dpp for both __builtin_amdgcn_mov_dpp and
__builtin_amdgcn_update_dpp. The first argument to
llvm.amdgcn.update.dpp will be undef for __builtin_amdgcn_mov_dpp.
Differential Revision: https://reviews.llvm.org/D52320
llvm-svn: 344665
r326937 ("[OpenCL] Remove block invoke function from emitted block
literal struct", 2018-03-07) broke block argument handling. In
particular the commit was causing a crash during code generation, see
the discussion in https://reviews.llvm.org/D43783 .
The offending commit has just been reverted; add a test to avoid
breaking this again in the future.
llvm-svn: 343583
This reverts r326937 as it broke block argument handling in OpenCL.
See the discussion on https://reviews.llvm.org/D43783 .
The next commit will add a test case that revealed the issue.
llvm-svn: 343582
Fast FMAF is not a sufficient condition to enable denormals.
Before VI, enabling denormals caused F32 instructions to
run at F64 speeds.
llvm-svn: 339278
Always emit alloca in entry block for enqueue_kernel builtin.
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
llvm-svn: 339150
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
Differential Revision: https://reviews.llvm.org/D50104
llvm-svn: 338899
The way address space declarations for builtins currently work
is nearly useless. The code assumes the address spaces used for
builtins is a confusingly named "target address space" from user
code using __attribute__((address_space(N))) that matches
the builtin declaration. There's no way to use this to declare
a builtin that returns a language specific address space.
The terminology used is highly cofusing since it has nothing
to do with the the address space selected by the target to use
for a language address space.
This feature is essentially unused as-is. AMDGPU and NVPTX
are the only in-tree targets attempting to use this. The AMDGPU
builtins certainly do not behave as intended (i.e. all of the
builtins returning pointers can never compile because the numbered
address space never matches the expected named address space).
The NVPTX builtins are missing tests for some, and the others
seem to rely on an implicit addrspacecast.
Change the used address space for builtins based on a target
hook to allow using a language address space for a builtin.
This allows the same builtin declaration to be used for multiple
languages with similarly purposed address spaces (e.g. the same
AMDGPU builtin can be used in OpenCL and CUDA even though the
constant address spaces are arbitarily different).
This breaks the possibility of using arbitrary numbered
address spaces alongside the named address spaces for builtins.
If this is an issue we probably need to introduce another builtin
declaration character to distinguish language address spaces from
so-called "target address spaces".
llvm-svn: 338707
OpenCL block literal structs have different fields which are now correctly
identified in the debug info.
Differential Revision: https://reviews.llvm.org/D49930
llvm-svn: 338299
Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D49209
llvm-svn: 337041
1. added restrictions to memory scope, order and volatile parameters
2. added custom processing for these builtins - currently is not used code,
needed to switch off GCCBuiltin link to the builtins (ongoing change to llvm
tree)
3. builtins renamed as requested
Differential Revision: https://reviews.llvm.org/D43281
llvm-svn: 332848
There shouldn't be any tests that run the entire optimizer here,
but the last test in this file is definitely going to break with
a change in LLVM IR canonicalization. Change that part to check
the unoptimized IR because that's the real intent of this file.
llvm-svn: 332473
Added string literal helper function to obtain the type
attributed by a constant address space.
Also fixed predefind __func__ expr to use the helper
to constract the string literal correctly.
Differential Revision: https://reviews.llvm.org/D46049
llvm-svn: 331877
Half-type mangling is accomplished following the method introduced by Erich
Keane for mangling _Float16. Updated the half.cl LIT test to cover this
particular case.
Patch By: vbridgers
Differential Revision: https://reviews.llvm.org/D46131
llvm-svn: 331263
SPIR-V encodes the read_only and write_only access qualifiers of pipes,
so separate LLVM IR types are required to target SPIR-V. Other backends
may also find this useful.
These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which
replace `opencl.pipe_t`.
This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...)
which took a read_only pipe with separate versions for read_only and
write_only pipes, namely:
* __get_pipe_num_packets_ro(...)
* __get_pipe_num_packets_wo(...)
* __get_pipe_max_packets_ro(...)
* __get_pipe_max_packets_wo(...)
These separate versions exist to avoid needing a bitcast to one of the
two qualified pipe types.
Patch by Stuart Brady.
Differential Revision: https://reviews.llvm.org/D46015
llvm-svn: 331026
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:
archtype
cas
classs
checkk
compres
definit
frome
iff
inteval
ith
lod
methode
nd
optin
ot
pres
statics
te
thru
Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)
Differential revision: https://reviews.llvm.org/D44188
llvm-svn: 329399
Need to override convertConstraint to recognise amdgpu specific register names.
Differential Revision: https://reviews.llvm.org/D44533
llvm-svn: 328359
Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue.
Differential Revision: https://reviews.llvm.org/D44696
llvm-svn: 328350
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use a function attribute to communicate to the AMDGPU backend.
Differential Revision: https://reviews.llvm.org/D43735
llvm-svn: 328347
The indirect function argument is in alloca address space in LLVM IR. However,
during Clang codegen for C++, the address space of indirect function argument
should match its address space in the source code, i.e., default addr space, even
for indirect argument. This is because destructor of the indirect argument may
be called in the caller function, and address of the indirect argument may be
taken, in either case the indirect function argument is expected to be in default
addr space, not the alloca address space.
Therefore, the indirect function argument should be mapped to the temp var
casted to default address space. The caller will cast it to alloca addr space
when passing it to the callee. In the callee, the argument is also casted to the
default address space and used.
CallArg is refactored to facilitate this fix.
Differential Revision: https://reviews.llvm.org/D34367
llvm-svn: 326946
OpenCL runtime tracks the invoke function emitted for
any block expression. Due to restrictions on blocks in
OpenCL (v2.0 s6.12.5), it is always possible to know the
block invoke function when emitting call of block expression
or __enqueue_kernel builtin functions. Since __enqueu_kernel
already has an argument for the invoke function, it is redundant
to have invoke function member in the llvm block literal structure.
This patch removes invoke function from the llvm block literal
structure. It also removes the bitcast of block invoke function
to the generic block literal type which is useless for OpenCL.
This will save some space for the kernel argument, and also
eliminate some store instructions.
Differential Revision: https://reviews.llvm.org/D43783
llvm-svn: 326937
The tests that failed on a windows host have been fixed.
Original message:
Start setting dso_local for COFF.
With this there are still some GVs where we don't set dso_local
because setGVProperties is never called. I intend to fix that in
followup commits. This is just the bare minimum to teach
shouldAssumeDSOLocal what it should do for COFF.
llvm-svn: 325940
Summary:
OpenCL 2.0 specification defines '-cl-uniform-work-group-size' option,
which requires that the global work-size be a multiple of the work-group
size specified to clEnqueueNDRangeKernel and allows optimizations that
are made possible by this restriction.
The patch introduces the support of this option.
To keep information about whether an OpenCL kernel has uniform work
group size or not, clang generates 'uniform-work-group-size' function
attribute for every kernel:
- "uniform-work-group-size"="true" for OpenCL 1.2 and lower,
- "uniform-work-group-size"="true" for OpenCL 2.0 and higher if
'-cl-uniform-work-group-size' option was specified,
- "uniform-work-group-size"="false" for OpenCL 2.0 and higher if no
'-cl-uniform-work-group-size' options was specified.
If the function is not an OpenCL kernel, 'uniform-work-group-size'
attribute isn't generated.
Patch by: krisb
Reviewers: yaxunl, Anastasia, b-sumner
Reviewed By: yaxunl, Anastasia
Subscribers: nhaehnle, yaxunl, Anastasia, cfe-commits
Differential Revision: https://reviews.llvm.org/D43570
llvm-svn: 325771
The following test case causes issue with codegen of __enqueue_block
void (^block)(void) = ^{ callee(id, out); };
enqueue_kernel(queue, 0, ndrange, block);
Clang first does codegen for block expression in the first line and deletes its block info.
Clang then tries to do codegen for the same block expression again for the second line,
and fails because the block info is gone.
The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to
record llvm block invoke function and llvm block literal emitted for each AST block
expression, and use the recorded information for generating the wrapper kernel.
The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen
of blocks.
Another minor issue is that some clean up AST expression is generated for block
with captures, which can be stripped by IgnoreImplicit.
Differential Revision: https://reviews.llvm.org/D43240
llvm-svn: 325264
Summary:
Upstream LLVM is changing the the prototypes of the @llvm.memcpy/memmove/memset
intrinsics. This change updates the Clang tests for this change.
The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.
This change removes the alignment argument in favour of placing the alignment
attribute on the source and destination pointers of the memory intrinsic call.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
At this time the source and destination alignments must be the same (Step 1).
Step 2 of the change, to be landed shortly, will relax that contraint and allow
the source and destination to have different alignments.
llvm-svn: 322964
CreateCoercedLoad/CreateCoercedStore assumes pointer argument of
memcpy is in addr space 0, which is not correct and causes invalid
bitcasts for triple amdgcn---amdgiz.
It is fixed by using alloca addr space instead.
Differential Revision: https://reviews.llvm.org/D40806
llvm-svn: 320000
Summary:
Constant samplers are handled as static variables and clang's code generation
library, which leads to llvm::unreachable. We bypass emitting sampler variable
as static since it's translated to a function call later.
Reviewers: yaxunl, Anastasia
Reviewed By: yaxunl, Anastasia
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D34342
llvm-svn: 318290
Builder save/restores insertion pointer when emitting addr space cast
for alloca, but does not save/restore debug loc, which causes verifier
failure for certain call instructions.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39069
llvm-svn: 316484
Currently clang assumes the temporary variables emitted during
codegen of atomic builtins have address space 0, which
is not true for target triple amdgcn---amdgiz and causes invalid
bitcasts.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D38966
llvm-svn: 316000
In OpenCL the kernel function and non-kernel function has different calling conventions.
For certain targets they have different argument ABIs. Also kernels have special function
attributes and metadata for runtime to launch them.
The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such,
the block invoke function should be emitted as kernel with proper calling convention and
argument ABI.
This patch emits enqueued block as kernel. If a block is both called directly and passed
to enqueue_kernel, separate functions will be generated.
Differential Revision: https://reviews.llvm.org/D38134
llvm-svn: 315804
Currently Clang uses default address space (0) to represent private address space for OpenCL
in AST. There are two issues with this:
Multiple address spaces including private address space cannot be diagnosed.
There is no mangling for default address space. For example, if private int* is emitted as
i32 addrspace(5)* in IR. It is supposed to be mangled as PUAS5i but it is mangled as
Pi instead.
This patch attempts to represent OpenCL private address space explicitly in AST. It adds
a new enum LangAS::opencl_private and adds it to the variable types which are implicitly
private:
automatic variables without address space qualifier
function parameter
pointee type without address space qualifier (OpenCL 1.2 and below)
Differential Revision: https://reviews.llvm.org/D35082
llvm-svn: 315668
This was done for CUDA functions in r261779, and for the same
reason this also needs to be done for OpenCL. An arbitrary
function could have a barrier() call in it, which in turn
requires the calling function to be convergent.
llvm-svn: 315094
Currently block is translated to a structure equivalent to
struct Block {
void *isa;
int flags;
int reserved;
void *invoke;
void *descriptor;
};
Except invoke, which is the pointer to the block invoke function,
all other fields are useless for OpenCL, which clutter the IR and
also waste memory since the block struct is passed to the block
invoke function as argument.
On the other hand, the size and alignment of the block struct is
not stored in the struct, which causes difficulty to implement
__enqueue_kernel as library function, since the library function
needs to know the size and alignment of the argument which needs
to be passed to the kernel.
This patch removes the useless fields from the block struct and adds
size and align fields. The equivalent block struct will become
struct Block {
int size;
int align;
generic void *invoke;
/* custom fields */
};
It also changes the pointer to the invoke function to be
a generic pointer since the address space of a function
may not be private on certain targets.
Differential Revision: https://reviews.llvm.org/D37822
llvm-svn: 314932
Added missing addrspacecast case in alignment computation
logic of pointer type emission in IR generation.
Differential Revision: https://reviews.llvm.org/D37804
llvm-svn: 314304
Add tests for different address spaces and insert some blank lines to make them more readable.
Differential Revision: https://reviews.llvm.org/D37742
llvm-svn: 313172
Not all targets support vararg (e.g. amdgpu). Instead of using vararg in the emitted functions for enqueue_kernel,
this patch creates a temporary array of size_t, stores the size arguments in the temporary array
and passes it to the emitted functions for enqueue_kernel.
Differential Revision: https://reviews.llvm.org/D36678
llvm-svn: 312441
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.
This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.
See also PR22780, for making DIExpression not be an MDNode.
Reviewers: aprantl, dexonsmith, dblaikie
Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37075
llvm-svn: 311594
Generalize getOpenCLImageAddrSpace into getOpenCLTypeAddrSpace, such
that targets can select the address space per type.
No functional changes intended.
Initial patch by Simon Perretta.
Differential Revision: https://reviews.llvm.org/D33989
llvm-svn: 310911
This is an improvement over always using byval for
structs.
This will use registers until ~16 are used, and then
switch back to byval. This needs more work, since I'm
not sure it ever really makes sense to use byval. If
the register limit is exceeded, the arguments still
end up passed on the stack, but with a different ABI.
It also may make sense to base this on number of
registers used for non-struct arguments, rather than
just arguments that appear first in the argument list.
llvm-svn: 310527
OpenCL 2.0 atomic builtin functions have a scope argument which is ideally
represented as synchronization scope argument in LLVM atomic instructions.
Clang supports translating Clang atomic builtin functions to LLVM atomic
instructions. However it currently does not support synchronization scope
of LLVM atomic instructions. Without this, users have to use LLVM assembly
code to implement OpenCL atomic builtin functions.
This patch adds OpenCL 2.0 atomic builtin functions as Clang builtin
functions, which supports generating LLVM atomic instructions with
synchronization scope operand.
Currently only constant memory scope argument is supported. Support of
non-constant memory scope argument will be added later.
Differential Revision: https://reviews.llvm.org/D28691
llvm-svn: 310082
Currently AMDGPUTargetInfo does not initialize AddrSpaceMap in constructor, which causes regressions in mesa/clover with libclc.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D34987
llvm-svn: 307105
Clang assumes coerced function argument is in address space 0, which is not always true and results in invalid bitcasts.
This patch fixes failure in OpenCL conformance test api/get_kernel_arg_info with amdgcn---amdgizcl triple, where non-zero alloca address space is used.
Differential Revision: https://reviews.llvm.org/D34777
llvm-svn: 306721
Summary: OpenCL and SPIR version metadata must be generated once per module instead of once per mangled global value.
Reviewers: Anastasia, yaxunl
Reviewed By: Anastasia
Subscribers: ahatanak, cfe-commits
Differential Revision: https://reviews.llvm.org/D34235
llvm-svn: 305796
Rationale: OpenCL kernels are called via an explicit runtime API
with arguments set with clSetKernelArg(), not as normal sub-functions.
Return SPIR_KERNEL by default as the kernel calling convention to ensure
the fingerprint is fixed such way that each OpenCL argument gets one
matching argument in the produced kernel function argument list to enable
feasible implementation of clSetKernelArg() with aggregates etc. In case
we would use the default C calling conv here, clSetKernelArg() might
break depending on the target-specific conventions; different targets
might split structs passed as values to multiple function arguments etc.
https://reviews.llvm.org/D33639
llvm-svn: 304389
Amongst other, this will help LTO to correctly handle/honor files
compiled with O0, helping debugging failures.
It also seems in line with how we handle other options, like how
-fnoinline adds the appropriate attribute as well.
Differential Revision: https://reviews.llvm.org/D28404
llvm-svn: 304127
A recent change requires opencl triple environment for compiling OpenCL
program, which causes regressions in libclc.
This patch fixes that. Instead of deducing language based on triple
environment, it checks LangOptions.
Differential Revision: https://reviews.llvm.org/D33445
llvm-svn: 303644
Alloca always returns a pointer in alloca address space, which may
be different from the type defined by the language. For example,
in C++ the auto variables are in the default address space. Therefore
cast alloca to the expected address space when necessary.
Differential Revision: https://reviews.llvm.org/D32248
llvm-svn: 303370
LLVM has changed the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.
https://bugs.llvm.org/show_bug.cgi?id=32382
rdar://problem/31205000
llvm-svn: 300523
For OpenCL, the private address space qualifier is 0 in AST. Before this change, 0 address space qualifier
is always mapped to target address space 0. As now target private address space is specified by
alloca address space in data layout, address space qualifier 0 needs to be mapped to alloca addr space specified by the data layout.
This change has no impact on targets whose alloca addr space is 0.
With contributions from Matt Arsenault, Tony Tye and Wen-Heng (Jack) Chung
Differential Revision: https://reviews.llvm.org/D31404
llvm-svn: 299965
Change constant address space from 4 to 2 for the new address space mapping in Clang.
Differential Revision: https://reviews.llvm.org/D31771
llvm-svn: 299691
These two attributes specify the same info in a different way.
AMGPU BE only checks the latter as a target specific attribute
as opposed to language specific reqd_work_group_size.
This change produces amdgpu_flat_work_group_size out of
reqd_work_group_size if specified.
Differential Revision: https://reviews.llvm.org/D31728
llvm-svn: 299678
Summary:
"kernel_arg_type_qual" metadata should contain const/volatile/restrict
tags only for pointer types to match the corresponding requirement of
the OpenCL specification.
OpenCL 2.0 spec 5.9.3 Kernel Object Queries:
CL_KERNEL_ARG_TYPE_VOLATILE is returned if the argument is a pointer
and the referenced type is declared with the volatile qualifier.
[...]
Similarly, CL_KERNEL_ARG_TYPE_CONST is returned if the argument is a
pointer and the referenced type is declared with the restrict or const
qualifier.
[...]
CL_KERNEL_ARG_TYPE_RESTRICT will be returned if the pointer type is
marked restrict.
Reviewers: Anastasia, cfe-commits
Reviewed By: Anastasia
Subscribers: bader, yaxunl
Differential Revision: https://reviews.llvm.org/D31321
llvm-svn: 299192
For target environment amdgiz and amdgizcl (giz means Generic Is Zero), AMDGPU will use new address space mapping where generic address space is 0 and private address space is 5. The data layout is also changed correspondingly.
Differential Revision: https://reviews.llvm.org/D31210
llvm-svn: 298767
For variables in generic address spaces, for example:
```
unsigned char V[6442450944];
...
```
the address space is not yet known when we get into
*getConstantArrayType*, it is 0. AMDGCN target's
address space 0 has 32 bits pointers, so when we
call *getPointerWidth* with 0, the array size is
trimmed to 32 bits, which is not right.
Differential Revision: https://reviews.llvm.org/D30845
llvm-svn: 298420
Summary: I added a new rank to ImplicitConversionRank enum to resolve the function overload ambiguity with vector types. Rank of scalar types conversion is lower than vector splat. So, we can choose which function should we call. See test for more details.
Reviewers: Anastasia, cfe-commits
Reviewed By: Anastasia
Subscribers: bader, yaxunl
Differential Revision: https://reviews.llvm.org/D30816
llvm-svn: 298366
We can't actually pretend that 0 is valid for address space 0.
r295877 added a workaround to stop allocating user objects
there, so we can use 0 as the invalid pointer.
Some of the tests seemed to be using private as the non-0 null
test address space, so add copies using local to make sure
this is still stressed.
llvm-svn: 297659
Removed ndrange_t as Clang builtin type and added
as a struct type in the OpenCL header.
Use type name to do the Sema checking in enqueue_kernel
and modify IR generation accordingly.
Review: D28058
Patch by Dmitry Borisenkov!
llvm-svn: 295311
Modify ObjC blocks impl wrt address spaces as follows:
- keep default private address space for blocks generated
as local variables (with captures);
- add global address space for global block literals (no captures);
- make the block invoke function and enqueue_kernel prototype with
the generic AS block pointer parameter to accommodate both
private and global AS cases from above;
- add block handling into default AS because it's implemented as
a special pointer type (BlockPointer) in the frontend and therefore
it is used as a pointer everywhere. This is also needed to accommodate
both private and global AS blocks for the two cases above.
- removes ObjC RT specific symbols (NSConcreteStackBlock and
NSConcreteGlobalBlock) in the OpenCL mode.
Review: https://reviews.llvm.org/D28814
llvm-svn: 293286
Summary:
We compile user opencl kernel code with spir triple. But built-ins are written in OpenCL and we compile it with triple x86_64 to be able to use x86 intrinsics. And we need address spaces to match in both cases. So, we change fake address space map in OpenCL for matching with spir.
On CPU address spaces are not really important but we'd like to preserve address space information in order to perform optimizations relying on this info like enhanced alias analysis.
Reviewers: pekka.jaaskelainen, Anastasia
Subscribers: pekka.jaaskelainen, yaxunl, bader, cfe-commits
Differential Revision: https://reviews.llvm.org/D28048
llvm-svn: 290436
-fno-inline-functions, -O0, and optnone.
These were really, really tangled together:
- We used the noinline LLVM attribute for -fno-inline
- But not for -fno-inline-functions (breaking LTO)
- But we did use it for -finline-hint-functions (yay, LTO is happy!)
- But we didn't for -O0 (LTO is sad yet again...)
- We had weird structuring of CodeGenOpts with both an inlining
enumeration and a boolean. They interacted in weird ways and
needlessly.
- A *lot* of set smashing went on with setting these, and then got worse
when we considered optnone and other inlining-effecting attributes.
- A bunch of inline affecting attributes were managed in a completely
different place from -fno-inline.
- Even with -fno-inline we failed to put the LLVM noinline attribute
onto many generated function definitions because they didn't show up
as AST-level functions.
- If you passed -O0 but -finline-functions we would run the normal
inliner pass in LLVM despite it being in the O0 pipeline, which really
doesn't make much sense.
- Lastly, we used things like '-fno-inline' to manipulate the pass
pipeline which forced the pass pipeline to be much more
parameterizable than it really needs to be. Instead we can *just* use
the optimization level to select a pipeline and control the rest via
attributes.
Sadly, this causes a bunch of churn in tests because we don't run the
optimizer in the tests and check the contents of attribute sets. It
would be awesome if attribute sets were a bit more FileCheck friendly,
but oh well.
I think this is a significant improvement and should remove the semantic
need to change what inliner pass we run in order to comply with the
requested inlining semantics by relying completely on attributes. It
also cleans up tho optnone and related handling a bit.
One unfortunate aspect of this is that for generating alwaysinline
routines like those in OpenMP we end up removing noinline and then
adding alwaysinline. I tried a bunch of other approaches, but because we
recompute function attributes from scratch and don't have a declaration
here I couldn't find anything substantially cleaner than this.
Differential Revision: https://reviews.llvm.org/D28053
llvm-svn: 290398
This is a recommit of r290149, which was reverted in r290169 due to msan
failures. msan was failing because we were calling
`isMostDerivedAnUnsizedArray` on an invalid designator, which caused us
to read uninitialized memory. To fix this, the logic of the caller of
said function was simplified, and we now have a `!Invalid` assert in
`isMostDerivedAnUnsizedArray`, so we can catch this particular bug more
easily in the future.
Fingers crossed that this patch sticks this time. :)
Original commit message:
This patch does three things:
- Gives us the alloc_size attribute in clang, which lets us infer the
number of bytes handed back to us by malloc/realloc/calloc/any user
functions that act in a similar manner.
- Teaches our constexpr evaluator that evaluating some `const` variables
is OK sometimes. This is why we have a change in
test/SemaCXX/constant-expression-cxx11.cpp and other seemingly
unrelated tests. Richard Smith okay'ed this idea some time ago in
person.
- Uniques some Blocks in CodeGen, which was reviewed separately at
D26410. Lack of uniquing only really shows up as a problem when
combined with our new eagerness in the face of const.
llvm-svn: 290297
This reverts commit r290171. It triggers a bunch of warnings, because
the new enumerator isn't handled in all switches. We want a warning-free
build.
Replied on the commit with more details.
llvm-svn: 290173
Summary: Enabling the compression of CLK_NULL_QUEUE to variable of type queue_t.
Reviewers: Anastasia
Subscribers: cfe-commits, yaxunl, bader
Differential Revision: https://reviews.llvm.org/D27569
llvm-svn: 290171
This commit fails MSan when running test/CodeGen/object-size.c in
a confusing way. After some discussion with George, it isn't really
clear what is going on here. We can make the MSan failure go away by
testing for the invalid bit, but *why* things are invalid isn't clear.
And yet, other code in the surrounding area is doing precisely this and
testing for invalid.
George is going to take a closer look at this to better understand the
nature of the failure and recommit it, for now backing it out to clean
up MSan builds.
llvm-svn: 290169
This patch does three things:
- Gives us the alloc_size attribute in clang, which lets us infer the
number of bytes handed back to us by malloc/realloc/calloc/any user
functions that act in a similar manner.
- Teaches our constexpr evaluator that evaluating some `const` variables
is OK sometimes. This is why we have a change in
test/SemaCXX/constant-expression-cxx11.cpp and other seemingly
unrelated tests. Richard Smith okay'ed this idea some time ago in
person.
- Uniques some Blocks in CodeGen, which was reviewed separately at
D26410. Lack of uniquing only really shows up as a problem when
combined with our new eagerness in the face of const.
Differential Revision: https://reviews.llvm.org/D14274
llvm-svn: 290149
Added a map to associate types and declarations with extensions.
Refactored existing diagnostic for disabled types associated with extensions and extended it to declarations for generic situation.
Fixed some bugs for types associated with extensions.
Allow users to use pragma to declare types and functions for supported extensions, e.g.
#pragma OPENCL EXTENSION the_new_extension_name : begin
// declare types and functions associated with the extension here
#pragma OPENCL EXTENSION the_new_extension_name : end
Differential Revision: https://reviews.llvm.org/D21698
llvm-svn: 289979
This change makes sure single-precision floating point types are used if the
cl_fp64 extension is not supported by the target.
Also removed the check to see whether the OpenCL version is >= 1.2, as this has
been incorporated into the extension setting code.
Differential Revision: https://reviews.llvm.org/D24235
llvm-svn: 289544
Summary: Although the feature was introduced only in OpenCL C v2.0 spec., it's useful for OpenCL 1.x too and doesn't require HW support.
Reviewers: Anastasia
Subscribers: yaxunl, cfe-commits, bader
Differential Revision: https://reviews.llvm.org/D27453
llvm-svn: 289535
In amdgcn target, null pointers in global, constant, and generic address space take value 0 but null pointers in private and local address space take value -1. Currently LLVM assumes all null pointers take value 0, which results in incorrectly translated IR. To workaround this issue, instead of emit null pointers in local and private address space, a null pointer in generic address space is emitted and casted to local and private address space.
Tentative definition of global variables with non-zero initializer will have weak linkage instead of common linkage since common linkage requires zero initializer and does not have explicit section to hold the non-zero value.
Virtual member functions getNullPointer and performAddrSpaceCast are added to TargetCodeGenInfo which by default returns ConstantPointerNull and emitting addrspacecast instruction. A virtual member function getNullPointerValue is added to TargetInfo which by default returns 0. Each target can override these virtual functions to get target specific null pointer and the null pointer value for specific address space, and perform specific translations for addrspacecast.
Wrapper functions getNullPointer is added to CodegenModule and getTargetNullPointerValue is added to ASTContext to facilitate getting the target specific null pointers and their values.
This change has no effect on other targets except amdgcn target. Other targets can provide support of non-zero null pointer in a similar way.
This change only provides support for non-zero null pointer for C and OpenCL. Supporting for other languages will be added later incrementally.
Differential Revision: https://reviews.llvm.org/D26196
llvm-svn: 289252
Avoid using shortcut for const qualified non-constant address space
aggregate variables while generating them on the stack such that
the alloca object is used instead of a global variable containing
initializer.
Review: https://reviews.llvm.org/D27109
llvm-svn: 288163
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.
Differential Revision: https://reviews.llvm.org/D26584
llvm-svn: 287006
Make handling integer parameters more flexible:
- For the number of events argument allow to pass larger
integers than 32 bits as soon as compiler can prove that
the range fits in 32 bits. If not, the diagnostic will be given.
- Change type of the arguments specifying the sizes of
the corresponding block arguments to be size_t.
Review: https://reviews.llvm.org/D26509
llvm-svn: 286849
- Accept NULL pointer as a valid parameter value for clk_event.
- Generate clk_event_t arguments of internal
__enqueue_kernel_XXX function as pointers in generic address space.
Review: https://reviews.llvm.org/D26507
llvm-svn: 286836
It doesn't make sense to use the target's address space ids in this context as
this is metadata that should be referring to the "logical" OpenCL address spaces.
For flat AS machines like all "CPUs" in general, the logical AS info gets lost as
there's only one address space (0).
This commit changes the logic such that we always use the SPIR address space
ids for the argument metadata. It thus allows implementing the clGetKernelArgInfo()
and the other detection needs.
https://reviews.llvm.org/D26157
llvm-svn: 286819
This change makes sure single-precision floating point types are used if the
cl_fp64 extension is not supported by the target.
Also removed the check to see whether the OpenCL version is >= 1.2, as this has
been incorporated into the extension setting code.
Differential Revision: https://reviews.llvm.org/D24235
llvm-svn: 286815
Certain OpenCL builtin functions are supposed to be executed by all threads in a work group or sub group. Such functions should not be made divergent during transformation. It makes sense to mark them with convergent attribute.
The adding of convergent attribute is based on Ettore Speziale's work and the original proposal and patch can be found at https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22271.html.
Differential Revision: https://reviews.llvm.org/D25343
llvm-svn: 285725
Summary: Setting constant address space for global constants used for memcpy-initialization of arrays.
Patch by Alexey Sotkin.
Reviewers: bader, yaxunl, Anastasia
Subscribers: cfe-commits, AlexeySotkin
Differential Revision: https://reviews.llvm.org/D25305
llvm-svn: 285557
Currently Clang allows partial initializer for C99 but not for OpenCL, e.g.
float a[16][16] = {1.0f, 2.0f};
is allowed in C99 but not allowed in OpenCL.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D25335
llvm-svn: 283891
__builtin_astype is used to cast OpenCL opaque types to other types, as such, it needs to be able to handle casting from and to pointer types correctly.
Current it cannot handle 1) casting between pointers of different addr spaces 2) casting between pointer type and non-pointer types.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D25123
llvm-svn: 283114
__attribute__((amdgpu_flat_work_group_size(<min>, <max>))) - request minimum and maximum flat work group size
__attribute__((amdgpu_waves_per_eu(<min>[, <max>]))) - request minimum and/or maximum waves per execution unit
Differential Revision: https://reviews.llvm.org/D24513
llvm-svn: 282371
Fix target options for fp32/64-denormals so that
+fp64-denormals is set if fp64 is supported
-fp32-denormals if fp32 denormals is not supported, or -cl-denorms-are-zero is set
+fp32-denormals if fp32 denormals is supported and -cl-denorms-are-zero is not set
If target feature fp32/64-denormals is explicitly set, they will override default options and options deduced from -cl-denorms-are-zero.
Differential Revision: https://reviews.llvm.org/D24512
llvm-svn: 281357
Summary:
Remove access qualifiers on images in arg info metadata:
* kernel_arg_type
* kernel_arg_base_type
Image access qualifiers are inseparable from type in clang implementation,
but OpenCL spec provides a special query to get access qualifier
via clGetKernelArgInfo with CL_KERNEL_ARG_ACCESS_QUALIFIER.
Besides that OpenCL conformance test_api get_kernel_arg_info expects
image types without access qualifier.
Patch by Evgeniy Tyurin.
Reviewers: bader, yaxunl, Anastasia
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23915
llvm-svn: 280699
Structs are currently handled as pointer + byval, which makes AMDGPU
LLVM backend generate incorrect code when structs are used. This patch
changes struct argument to be handled directly and without flattening,
which Clover (Mesa 3D Gallium OpenCL state tracker) will be able to
handle. Flattening would expand the struct to individual elements and
pass each as a separate argument, which Clover can not
handle. Furthermore, such expansion does not fit the OpenCL
programming model which requires to explicitely specify each argument
index, size and memory location.
Patch by Vedran Miletić
llvm-svn: 279463
Summary:
int __builtin_amdgcn_ds_swizzle (int a, int imm);
while imm is a constant.
Differential Revision:
http://reviews.llvm.org/D23682
llvm-svn: 279165
Pointers of certain GPUs in AMDGCN target in private address space is 32 bit but pointers in other address spaces are 64 bit. size_t type should be defined as 64 bit for these GPUs so that it could hold pointers in all address spaces. Also fixed issues in pointer arithmetic codegen by using pointer specific intptr type.
Differential Revision: https://reviews.llvm.org/D23361
llvm-svn: 279121
Let the driver pass the option to frontend. Do not set precision metadata for division instructions when this option is set. Set function attribute "correctly-rounded-divide-sqrt-fp-math" based on this option.
Differential Revision: https://reviews.llvm.org/D22940
llvm-svn: 278155
Adjust target features for amdgcn target when -cl-denorms-are-zero is set.
Denormal support is controlled by feature strings fp32-denormals fp64-denormals in amdgcn target. If -cl-denorms-are-zero is not set and the command line does not set fp32/64-denormals feature string, +fp32-denormals +fp64-denormals will be on for GPU's supporting them.
A new virtual function virtual void TargetInfo::adjustTargetOptions(const CodeGenOptions &CGOpts, TargetOptions &TargetOpts) const is introduced to allow adjusting target option by codegen option.
Differential Revision: https://reviews.llvm.org/D22815
llvm-svn: 278151
Summary:
In order to re-define OpenCL built-in functions
'to_{private,local,global}' in OpenCL run-time library LLVM names must
be different from the clang built-in function names.
Reviewers: yaxunl, Anastasia
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23120
llvm-svn: 277743
Currently Clang use int32 to represent sampler_t, which have been a source of issue for some backends, because in some backends sampler_t cannot be represented by int32. They have to depend on kernel argument metadata and use IPA to find the sampler arguments and global variables and transform them to target specific sampler type.
This patch uses opaque pointer type opencl.sampler_t* for sampler_t. For each use of file-scope sampler variable, it generates a function call of __translate_sampler_initializer. For each initialization of function-scope sampler variable, it generates a function call of __translate_sampler_initializer.
Each builtin library can implement its own __translate_sampler_initializer(). Since the real sampler type tends to be architecture dependent, allowing it to be initialized by a library function simplifies backend design. A typical implementation of __translate_sampler_initializer could be a table lookup of real sampler literal values. Since its argument is always a literal, the returned pointer is known at compile time and easily optimized to finally become some literal values directly put into image read instructions.
This patch is partially based on Alexey Sotkin's work in Khronos Clang (3d4eec6162).
Differential Revision: https://reviews.llvm.org/D21567
llvm-svn: 277024
Allows AMDGCN target to generate images (such as %opencl.image2d_t) in constant address space.
Images will still be generated in global address space by default.
Added tests to existing opencl-types.cl in test\CodeGenOpenCL.
Patch by Aaron En Ye Shi.
Differential Revision: https://reviews.llvm.org/D22523
llvm-svn: 276161
Added the opencl.ocl.version metadata to be emitted with amdgcn. Created a static function emitOCLVerMD which is shared between triple spir and target amdgcn.
Also added new testcases to existing test file, spir_version.cl inside test/CodeGenOpenCL.
Patch by Aaron En Ye Shi.
Differential Revision: https://reviews.llvm.org/D22424
llvm-svn: 276010
This reverts also r275029, "Update Clang tests after adding inference for the returned argument attribute"
It broke LTO build. Seems miscompilation.
llvm-svn: 275756
Improved test with user define structure pipe type case.
Reviewers: Anastasia, pxli168
Subscribers: yaxunl, cfe-commits
Differential revision: http://reviews.llvm.org/D21744
llvm-svn: 275259
Add OCL option -cl-no-signed-zeros to driver options.
Also added to opencl.cl testcases.
Patch by Aaron En Ye Shi.
Differential Revision: http://reviews.llvm.org/D22067
llvm-svn: 274923
OpenCL s6.6: "Access qualifier must be used with image object arguments
of kernels and of user-defined functions [...] If no qualifier is
provided, read_only is assumed".
This does not define the behavior for image types used in typedef
declaration, but following the spec logic, we should allow access
qualifiers specification in typedefs, e.g.:
typedef write_only image1d_t img1d_wo;
Unlike cv-qualifiers, user cannot add access qualifier to a typedef
type, i.e. this is not allowed:
typedef image1d_t img1d; // note: previously declared 'read_only' here
void foo(write_only img1d im) {} // error: multiple access qualifier
Patch by Andrew Savonichev.
Reviewers: Anastasia Stulova.
Differential revision: http://reviews.llvm.org/D20948
llvm-svn: 274858
- Added new Builtins: enqueue_kernel, get_kernel_work_group_size
and get_kernel_preferred_work_group_size_multiple.
These Builtins use custom check to diagnose parameters of the passed Blocks
i. e. variable number of 'local void*' type params, and check different
overloads specified in Table 6.31 of OpenCL v2.0.
- IR is generated as an internal library call for each OpenCL Builtin,
reusing ObjC Block implementation.
Review: http://reviews.llvm.org/D20249
llvm-svn: 274540
Summary:
Summary:
Change Clang calling convention SpirKernel to OpenCLKernel.
Set calling convention OpenCLKernel for amdgcn as well.
Add virtual method .getOpenCLKernelCallingConv() to TargetCodeGenInfo
and use it to set target calling convention for AMDGPU and SPIR.
Update tests.
Reviewers: rsmith, tstellarAMD, Anastasia, yaxunl
Subscribers: kzhuravl, cfe-commits
Differential Revision: http://reviews.llvm.org/D21367
llvm-svn: 274220
This patch uses function metadata to represent reqd_work_group_size, work_group_size_hint and vector_type_hint kernel attributes and kernel argument info.
Differential Revision: http://reviews.llvm.org/D20979
llvm-svn: 273425
__builtin_astype does not generate correct LLVM IR for vec3 types. This patch inserts bitcasts to/from vec4 when necessary in addition to generating vector shuffle. Sema and codegen tests are added.
Differential Revision: http://reviews.llvm.org/D20133
llvm-svn: 272153
Summary:
OpenCL should support array with const value size length, those const
varibale in global and constant address space and variable in constant
address space.
Fixed test case error.
Reviewers: Anastasia, yaxunl, bader
Subscribers: bader, cfe-commits
Differential Revision: http://reviews.llvm.org/D20090
llvm-svn: 271978
Summary:
OpenCL should support array with const value size length, those const varibale in global and constant address space and variable in constant address space.
Reviewers: Anastasia, yaxunl, bader
Subscribers: bader, cfe-commits
Differential Revision: http://reviews.llvm.org/D20090
llvm-svn: 271971
OpenCL builtin functions to_{global|local|private} accepts argument of pointer type to arbitrary pointee type, and return a pointer to the same pointee type in different addr space, i.e.
global gentype *to_global(gentype *p);
It is not desirable to declare it as
global void *to_global(void *);
in opencl header file since it misses diagnostics.
This patch implements these builtin functions as Clang builtin functions. In the builtin def file they are defined to have signature void*(void*). When handling call expressions, their declarations are re-written to have correct parameter type and return type corresponding to the call argument.
In codegen call to addr void *to_addr(void*) is generated with addrcasts or bitcasts to facilitate implementation in builtin library.
Differential Revision: http://reviews.llvm.org/D19932
llvm-svn: 270261
Add supported OpenCL extensions to target info. It serves as default values to save the users of the burden setting each supported extensions and optional core features in command line.
Re-commit after fixing build error due to missing override attribute.
Differential Revision: http://reviews.llvm.org/D19484
llvm-svn: 269670
Revert r269431 due to build failure caused by warning msg:
llvm/tools/clang/lib/Basic/Targets.cpp:2090:9: error: 'setSupportedOpenCLOpts' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
void setSupportedOpenCLOpts() {
llvm-svn: 269435
Add supported OpenCL extensions to target info. It serves as default values to save the users of the burden setting each supported extensions and optional core features in command line.
Differential Revision: http://reviews.llvm.org/D19484
llvm-svn: 269431
When comparing unqualified types, canonical types should be used, otherwise equivalent types may be treated as different type.
Differential Revision: http://reviews.llvm.org/D19662
llvm-svn: 267906
In codegen different address spaces may be mapped to the same address
space for a target, e.g. in x86/x86-64 all address spaces are mapped
to 0. Therefore AddressSpaceConversion should be translated by
CreatePointerBitCastOrAddrSpaceCast instead of CreateAddrSpaceCast.
Differential Revision: http://reviews.llvm.org/D18713
llvm-svn: 266107
I. Current implementation of images is not conformant to spec in the following points:
1. It makes no distinction with respect to access qualifiers and therefore allows to use images with different access type interchangeably. The following code would compile just fine:
void write_image(write_only image2d_t img);
kernel void foo(read_only image2d_t img) { write_image(img); } // Accepted code
which is disallowed according to s6.13.14.
2. It discards access qualifier on generated code, which leads to generated code for the above example:
call void @write_image(%opencl.image2d_t* %img);
In OpenCL2.0 however we can have different calls into write_image with read_only and wite_only images.
Also generally following compiler steps have no easy way to take different path depending on the image access: linking to the right implementation of image types, performing IR opts and backend codegen differently.
3. Image types are language keywords and can't be redeclared s6.1.9, which can happen currently as they are just typedef names.
4. Default access qualifier read_only is to be added if not provided explicitly.
II. This patch corrects the above points as follows:
1. All images are encapsulated into a separate .def file that is inserted in different points where image handling is required. This avoid a lot of code repetition as all images are handled the same way in the code with no distinction of their exact type.
2. The Cartesian product of image types and image access qualifiers is added to the builtin types. This simplifies a lot handling of access type mismatch as no operations are allowed by default on distinct Builtin types. Also spec intended access qualifier as special type qualifier that are combined with an image type to form a distinct type (see statement above - images can't be created w/o access qualifiers).
3. Improves testing of images in Clang.
Author: Anastasia Stulova
Reviewers: bader, mgrang.
Subscribers: pxli168, pekka.jaaskelainen, yaxunl.
Differential Revision: http://reviews.llvm.org/D17821
llvm-svn: 265783
Add support for opencl_unroll_hint attribute from OpenCL v2.0 s6.11.5.
Reusing most of metadata generation from CGLoopInfo helper class.
The code is based on Khronos OpenCL compiler:
https://github.com/KhronosGroup/SPIR/tree/spirv-1.0
Patch by Liu Yaxun (Sam)!
Differential Revision: http://reviews.llvm.org/D16686
llvm-svn: 261350
The test is failing on SystemZ since different IR is being
generated due to platform ABI differences. Add a target triple.
Fix suggested by Anastasia Stulova.
llvm-svn: 259183
Fix arc patch fuzz error.
Summary:
Support for the pipe built-in functions for OpenCL 2.0.
The pipe builtin functions may have infinite kinds of element types, one approach
would be to just generate calls that would always use generic types such as void*.
This patch is based on bader's opencl support patch on SPIR-V branch.
Reviewers: Anastasia, pekka.jaaskelainen
Subscribers: keryell, bader, cfe-commits
Differential Revision: http://reviews.llvm.org/D15914
llvm-svn: 258782
Summary:
Support for the pipe built-in functions for OpenCL 2.0.
The pipe builtin functions may have infinite kinds of element types, one approach
would be to just generate calls that would always use generic types such as void*.
This patch is based on bader's opencl support patch on SPIR-V branch.
Reviewers: Anastasia, pekka.jaaskelainen
Subscribers: keryell, bader, cfe-commits
Differential Revision: http://reviews.llvm.org/D15914
llvm-svn: 258773