Commit Graph

332 Commits

Author SHA1 Message Date
Yaxun Liu f2127d1728 [AMDGPU] Fix bug in enqueued block codegen due to an extra line
llvm-svn: 316165
2017-10-19 15:56:13 +00:00
Yaxun Liu 8ab5ab066a CodeGen: Fix invalid bitcasts for atomic builtins
Currently clang assumes the temporary variables emitted during
codegen of atomic builtins have address space 0, which
is not true for target triple amdgcn---amdgiz and causes invalid
bitcasts.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D38966

llvm-svn: 316000
2017-10-17 14:19:29 +00:00
Yaxun Liu c2a87a05f1 [OpenCL] Emit enqueued block as kernel
In OpenCL the kernel function and non-kernel function has different calling conventions.
For certain targets they have different argument ABIs. Also kernels have special function
attributes and metadata for runtime to launch them.

The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such,
the block invoke function should be emitted as kernel with proper calling convention and
argument ABI.

This patch emits enqueued block as kernel. If a block is both called directly and passed
to enqueue_kernel, separate functions will be generated.

Differential Revision: https://reviews.llvm.org/D38134

llvm-svn: 315804
2017-10-14 12:23:50 +00:00
Yaxun Liu d029234fc6 Fix regression of test/CodeGenOpenCL/address-spaces.cl on ppc
llvm-svn: 315678
2017-10-13 13:53:06 +00:00
Yaxun Liu b7318e02c1 [OpenCL] Add LangAS::opencl_private to represent private address space in AST
Currently Clang uses default address space (0) to represent private address space for OpenCL
in AST. There are two issues with this:

Multiple address spaces including private address space cannot be diagnosed.
There is no mangling for default address space. For example, if private int* is emitted as
i32 addrspace(5)* in IR. It is supposed to be mangled as PUAS5i but it is mangled as
Pi instead.

This patch attempts to represent OpenCL private address space explicitly in AST. It adds
a new enum LangAS::opencl_private and adds it to the variable types which are implicitly
private:

automatic variables without address space qualifier

function parameter

pointee type without address space qualifier (OpenCL 1.2 and below)

Differential Revision: https://reviews.llvm.org/D35082

llvm-svn: 315668
2017-10-13 03:37:48 +00:00
Matt Arsenault f12e3b848a AMDGPU: Add read_exec_lo/hi builtins
llvm-svn: 315238
2017-10-09 20:06:37 +00:00
Matt Arsenault cbe0dd13d2 AMDGPU: Fix missing declaration for __builtin_amdgcn_dispatch_ptr
llvm-svn: 315219
2017-10-09 17:44:18 +00:00
Matt Arsenault a1cf61b6fc OpenCL: Assume functions are convergent
This was done for CUDA functions in r261779, and for the same
reason this also needs to be done for OpenCL. An arbitrary
function could have a barrier() call in it, which in turn
requires the calling function to be convergent.

llvm-svn: 315094
2017-10-06 19:34:40 +00:00
Yaxun Liu 10712d9203 [OpenCL] Clean up and add missing fields for block struct
Currently block is translated to a structure equivalent to

struct Block {
  void *isa;
  int flags;
  int reserved;
  void *invoke;
  void *descriptor;
};
Except invoke, which is the pointer to the block invoke function,
all other fields are useless for OpenCL, which clutter the IR and
also waste memory since the block struct is passed to the block
invoke function as argument.

On the other hand, the size and alignment of the block struct is
not stored in the struct, which causes difficulty to implement
__enqueue_kernel as library function, since the library function
needs to know the size and alignment of the argument which needs
to be passed to the kernel.

This patch removes the useless fields from the block struct and adds
size and align fields. The equivalent block struct will become

struct Block {
  int size;
  int align;
  generic void *invoke;
 /* custom fields */
};
It also changes the pointer to the invoke function to be
a generic pointer since the address space of a function
may not be private on certain targets.

Differential Revision: https://reviews.llvm.org/D37822

llvm-svn: 314932
2017-10-04 20:32:17 +00:00
Anastasia Stulova 89a947da72 [OpenCL] Fixed CL version in failing test.
llvm-svn: 314317
2017-09-27 17:03:35 +00:00
Anastasia Stulova 0a72ed40d3 [OpenCL] Handle address space conversion while setting type alignment.
Added missing addrspacecast case in alignment computation
logic of pointer type emission in IR generation.

Differential Revision: https://reviews.llvm.org/D37804

llvm-svn: 314304
2017-09-27 14:37:00 +00:00
Yaxun Liu 1f33d8eb19 Add more tests for OpenCL atomic builtin functions
Add tests for different address spaces and insert some blank lines to make them more readable.

Differential Revision: https://reviews.llvm.org/D37742

llvm-svn: 313172
2017-09-13 18:56:25 +00:00
Yaxun Liu e37793545e [AMDGPU] Change addr space of clk_event_t, queue_t and reserve_id_t to global
Differential Revision: https://reviews.llvm.org/D37703

llvm-svn: 313171
2017-09-13 18:50:42 +00:00
Jan Vesely 31ecb4bf60 [OpenCL] Add half load and store builtins
This enables load/stores of half type, without half being a legal type.

Differential Revision: https://reviews.llvm.org/D37231

llvm-svn: 312742
2017-09-07 19:39:10 +00:00
Yaxun Liu 29a5ee358e [OpenCL] Do not use vararg in emitted functions for enqueue_kernel
Not all targets support vararg (e.g. amdgpu). Instead of using vararg in the emitted functions for enqueue_kernel,
this patch creates a temporary array of size_t, stores the size arguments in the temporary array
and passes it to the emitted functions for enqueue_kernel.

Differential Revision: https://reviews.llvm.org/D36678

llvm-svn: 312441
2017-09-03 13:52:24 +00:00
Adrian Prantl 9e83fb0838 Adapt testcases to LLVM change r312144 in DIGlobalVariableExpression
llvm-svn: 312148
2017-08-30 18:22:23 +00:00
Reid Kleckner 6d353348e5 Parse and print DIExpressions inline to ease IR and MIR testing
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.

This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.

See also PR22780, for making DIExpression not be an MDNode.

Reviewers: aprantl, dexonsmith, dblaikie

Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37075

llvm-svn: 311594
2017-08-23 20:31:27 +00:00
Yaxun Liu 504d4e2403 Attempt to fix failure in CodeGenOpenCL/atomic-ops.cl again
llvm-svn: 310937
2017-08-15 17:59:26 +00:00
Yaxun Liu adcfe5d881 Attempt to fix failure in CodeGenOpenCL/atomic-ops.cl
llvm-svn: 310932
2017-08-15 17:16:44 +00:00
Yaxun Liu 99d56d291f Remove -finclude-default-header in OpenCL atomic tests
Differential Revision: https://reviews.llvm.org/D36676

llvm-svn: 310927
2017-08-15 16:30:31 +00:00
Yaxun Liu 30d652a447 [OpenCL] Support variable memory scope in atomic builtins
Differential Revision: https://reviews.llvm.org/D36580

llvm-svn: 310924
2017-08-15 16:02:49 +00:00
Sven van Haastregt efb4d4c78c [OpenCL] Allow targets to select address space per type
Generalize getOpenCLImageAddrSpace into getOpenCLTypeAddrSpace, such
that targets can select the address space per type.

No functional changes intended.

Initial patch by Simon Perretta.

Differential Revision: https://reviews.llvm.org/D33989

llvm-svn: 310911
2017-08-15 09:38:18 +00:00
Matt Arsenault 3fe7395fbc AMDGPU: Use direct struct returns and arguments
This is an improvement over always using byval for
structs.

This will use registers until ~16 are used, and then
switch back to byval. This needs more work, since I'm
not sure it ever really makes sense to use byval. If
the register limit is exceeded, the arguments still
end up passed on the stack, but with a different ABI.
It also may make sense to base this on number of
registers used for non-struct arguments, rather than
just arguments that appear first in the argument list.

llvm-svn: 310527
2017-08-09 21:44:58 +00:00
Yaxun Liu 39195062c2 Add OpenCL 2.0 atomic builtin functions as Clang builtin
OpenCL 2.0 atomic builtin functions have a scope argument which is ideally
represented as synchronization scope argument in LLVM atomic instructions.

Clang supports translating Clang atomic builtin functions to LLVM atomic
instructions. However it currently does not support synchronization scope
of LLVM atomic instructions. Without this, users have to use LLVM assembly
code to implement OpenCL atomic builtin functions.

This patch adds OpenCL 2.0 atomic builtin functions as Clang builtin
functions, which supports generating LLVM atomic instructions with
synchronization scope operand.

Currently only constant memory scope argument is supported. Support of
non-constant memory scope argument will be added later.

Differential Revision: https://reviews.llvm.org/D28691

llvm-svn: 310082
2017-08-04 18:16:31 +00:00
Joey Gouly fa76b49cef [OpenCL] Add missing subgroup builtins
This adds get_kernel_max_sub_group_size_for_ndrange and
get_kernel_sub_group_count_for_ndrange.

llvm-svn: 309678
2017-08-01 13:27:09 +00:00
Joey Gouly 53160cdc45 [OpenCL] Enable subgroup extension in tests
This fixes the test, so that it can be run on different hosts that may have
different OpenCL extensions enabled.

llvm-svn: 309571
2017-07-31 15:50:27 +00:00
Joey Gouly 84ae3364df [OpenCL] Add extension Sema check for subgroup builtins
Check the subgroup extension is enabled, before doing other Sema checks.

llvm-svn: 309567
2017-07-31 15:15:59 +00:00
Alexey Sotkin 7d7f0dc08b [OpenCL] Fix access qualifiers metadata for kernel arguments with typedef
Subscribers: cfe-commits, yaxunl, Anastasia

Differential Revision: https://reviews.llvm.org/D35420

llvm-svn: 309155
2017-07-26 18:49:54 +00:00
Egor Churaev 53f9a30543 [OpenCL] Added extended tests on metadata generation for half data type and arrays.
Reviewers: Anastasia

Reviewed By: Anastasia

Subscribers: bader, cfe-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D35000

llvm-svn: 308266
2017-07-18 06:04:01 +00:00
Yaxun Liu 25d1b4341f [AMDGPU] Fix size and alignment of size_t and pointer types
Differential Revision: https://reviews.llvm.org/D34995

llvm-svn: 307121
2017-07-05 04:58:24 +00:00
Yaxun Liu 3ba4a720ad [AMDGPU] Fix regressions on mesa/clover with libclc due to address space
Currently AMDGPUTargetInfo does not initialize AddrSpaceMap in constructor, which causes regressions in mesa/clover with libclc.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D34987

llvm-svn: 307105
2017-07-04 19:57:18 +00:00
Yaxun Liu e9e5c4f975 CodeGen: Fix invalid bitcast for coerced function argument
Clang assumes coerced function argument is in address space 0, which is not always true and results in invalid bitcasts.

This patch fixes failure in OpenCL conformance test api/get_kernel_arg_info with amdgcn---amdgizcl triple, where non-zero alloca address space is used.

Differential Revision: https://reviews.llvm.org/D34777

llvm-svn: 306721
2017-06-29 18:47:45 +00:00
Alexey Bader 364a11651e [OpenCL] Fix OpenCL and SPIR version metadata generation.
Summary: OpenCL and SPIR version metadata must be generated once per module instead of once per mangled global value.

Reviewers: Anastasia, yaxunl

Reviewed By: Anastasia

Subscribers: ahatanak, cfe-commits

Differential Revision: https://reviews.llvm.org/D34235

llvm-svn: 305796
2017-06-20 14:30:18 +00:00
Pekka Jaaskelainen 2eb0bcc9e6 [OpenCL] spir_kern by defaul: fix old test cases
llvm-svn: 304396
2017-06-01 08:19:43 +00:00
Pekka Jaaskelainen fc2629a65a [OpenCL] Makes kernels use the SPIR_KERNEL CC by default.
Rationale: OpenCL kernels are called via an explicit runtime API
with arguments set with clSetKernelArg(), not as normal sub-functions.
Return SPIR_KERNEL by default as the kernel calling convention to ensure
the fingerprint is fixed such way that each OpenCL argument gets one
matching argument in the produced kernel function argument list to enable
feasible implementation of clSetKernelArg() with aggregates etc. In case
we would use the default C calling conv here, clSetKernelArg() might
break depending on the target-specific conventions; different targets
might split structs passed as values to multiple function arguments etc.

https://reviews.llvm.org/D33639

llvm-svn: 304389
2017-06-01 07:18:49 +00:00
Javed Absar 0841d620c5 Fix issue with test that caused bildbot failure
These tests did not specify the target. 
The failure was triggered by change - 
https://reviews.llvm.org/D33205
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-full/builds/7314

which sets vector alignment to 8-byte for arm-targets (except for Android).
So, fixing the test to make it target specific. 

llvm-svn: 304210
2017-05-30 13:34:26 +00:00
Egor Churaev dd7d82c408 [OpenCL] Test on half immediate support.
Reviewers: Anastasia

Reviewed By: Anastasia

Subscribers: yaxunl, cfe-commits, bader

Differential Revision: https://reviews.llvm.org/D33592

llvm-svn: 304134
2017-05-29 07:44:22 +00:00
Mehdi Amini 6aa9e9b41a IRGen: Add optnone attribute on function during O0
Amongst other, this will help LTO to correctly handle/honor files
compiled with O0, helping debugging failures.
It also seems in line with how we handle other options, like how
-fnoinline adds the appropriate attribute as well.

Differential Revision: https://reviews.llvm.org/D28404

llvm-svn: 304127
2017-05-29 05:38:20 +00:00
Konstantin Zhuravlyov 1f144a18ff Resubmit r303861.
[AMDGPU] add __builtin_amdgcn_s_getpc

Patch by Tim Corringham

llvm-svn: 304033
2017-05-26 21:08:20 +00:00
Reid Kleckner 581a6c5d56 Revert "[AMDGPU] add __builtin_amdgcn_s_getpc"
This reverts commit r303861, the LLVM intrinsic was reverted.

llvm-svn: 303908
2017-05-25 20:28:26 +00:00
Tim Corringham 702fe45bcd [AMDGPU] add __builtin_amdgcn_s_getpc
Summary: Added the builtin corresponding to the s_getpc intrinsic added in llvm D32862

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D33276

llvm-svn: 303861
2017-05-25 14:16:11 +00:00
Yaxun Liu af3d4db64b [AMDGPU] Do not require opencl triple environment for OpenCL
A recent change requires opencl triple environment for compiling OpenCL
program, which causes regressions in libclc.

This patch fixes that. Instead of deducing language based on triple
environment, it checks LangOptions.

Differential Revision: https://reviews.llvm.org/D33445

llvm-svn: 303644
2017-05-23 16:15:53 +00:00
Yaxun Liu 6d96f16347 CodeGen: Cast alloca to expected address space
Alloca always returns a pointer in alloca address space, which may
be different from the type defined by the language. For example,
in C++ the auto variables are in the default address space. Therefore
cast alloca to the expected address space when necessary.

Differential Revision: https://reviews.llvm.org/D32248

llvm-svn: 303370
2017-05-18 18:51:09 +00:00
Yaxun Liu 4f33b3d396 [OpenCL] Emit function-scope variable in constant address space as static variable
Differential Revision: https://reviews.llvm.org/D32977

llvm-svn: 303072
2017-05-15 14:47:47 +00:00
Xiuli Pan be6da4bbdb [OpenCL] Add intel_reqd_sub_group_size attribute support
Summary:
Add intel_reqd_sub_group_size attribute support as intel extension  cl_intel_required_subgroup_size from
https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_required_subgroup_size.txt

Reviewers: Anastasia, bader, hfinkel, pxli168

Reviewed By: Anastasia, bader, pxli168

Subscribers: cfe-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D30805

llvm-svn: 302125
2017-05-04 07:31:20 +00:00
Adrian Prantl c3782a1a6f Debug Info: Remove special-casing of indirect function argument handling.
LLVM has changed the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.

https://bugs.llvm.org/show_bug.cgi?id=32382
rdar://problem/31205000

llvm-svn: 300523
2017-04-18 01:22:01 +00:00
Yaxun Liu d7523283a7 CodeGen: Let byval parameter use alloca address space
Differential Revision: https://reviews.llvm.org/D32133

llvm-svn: 300487
2017-04-17 20:10:44 +00:00
Yaxun Liu 7f7f323e4f CodeGen: Let lifetime intrinsic use alloca address space
Differential Revision: https://reviews.llvm.org/D31717

llvm-svn: 300485
2017-04-17 20:03:11 +00:00
Konstantin Zhuravlyov e668b1cd1e [AMDGPU][GFX9] Set +fp32-denormals for >=gfx900 unless -cl-denorms-are-zero is set
Differential Revision: https://reviews.llvm.org/D31482

llvm-svn: 300306
2017-04-14 05:33:57 +00:00
Yaxun Liu b34ec829be [OpenCL] Map default address space to alloca address space
For OpenCL, the private address space qualifier is 0 in AST. Before this change, 0 address space qualifier
is always mapped to target address space 0. As now target private address space is specified by
alloca address space in data layout, address space qualifier 0 needs to be mapped to alloca addr space specified by the data layout.

This change has no impact on targets whose alloca addr space is 0.

With contributions from Matt Arsenault, Tony Tye and Wen-Heng (Jack) Chung

Differential Revision: https://reviews.llvm.org/D31404

llvm-svn: 299965
2017-04-11 17:24:23 +00:00
Yaxun Liu b122ed9181 [AMDGPU] Temporarily change constant address space from 4 to 2 for the new address space mapping
Change constant address space from 4 to 2 for the new address space mapping in Clang.

Differential Revision: https://reviews.llvm.org/D31771

llvm-svn: 299691
2017-04-06 19:18:36 +00:00
Stanislav Mekhanoshin 921a42314b [AMDGPU] Translate reqd_work_group_size into amdgpu_flat_work_group_size
These two attributes specify the same info in a different way.
AMGPU BE only checks the latter as a target specific attribute
as opposed to language specific reqd_work_group_size.

This change produces amdgpu_flat_work_group_size out of
reqd_work_group_size if specified.

Differential Revision: https://reviews.llvm.org/D31728

llvm-svn: 299678
2017-04-06 18:15:44 +00:00
Egor Churaev a8d2451533 [OpenCL] Enables passing sampler initializer to function argument
Reviewers: Anastasia, cfe-commits

Reviewed By: Anastasia

Subscribers: yaxunl, bader

Differential Revision: https://reviews.llvm.org/D31594

llvm-svn: 299524
2017-04-05 09:02:56 +00:00
Jin-Gu Kang e7cdcdea73 Preserve vec3 type.
Summary: Preserve vec3 type with CodeGen option.

Reviewers: Anastasia, bruno

Reviewed By: Anastasia

Subscribers: bruno, ahatanak, cfe-commits

Differential Revision: https://reviews.llvm.org/D30810

llvm-svn: 299445
2017-04-04 16:40:25 +00:00
Egor Churaev ba8b84d7fb [OpenCL] Do not generate "kernel_arg_type_qual" metadata for non-pointer args
Summary:
"kernel_arg_type_qual" metadata should contain const/volatile/restrict
tags only for pointer types to match the corresponding requirement of
the OpenCL specification.

OpenCL 2.0 spec 5.9.3 Kernel Object Queries:

CL_KERNEL_ARG_TYPE_VOLATILE is returned if the argument is a pointer
and the referenced type is declared with the volatile qualifier.
[...]
Similarly, CL_KERNEL_ARG_TYPE_CONST is returned if the argument is a
pointer and the referenced type is declared with the restrict or const
qualifier.
[...]
CL_KERNEL_ARG_TYPE_RESTRICT will be returned if the pointer type is
marked restrict.

Reviewers: Anastasia, cfe-commits

Reviewed By: Anastasia

Subscribers: bader, yaxunl

Differential Revision: https://reviews.llvm.org/D31321

llvm-svn: 299192
2017-03-31 10:14:52 +00:00
Egor Churaev 45c26ee0bf [OpenCL] Extended mapping of parcing CodeGen arguments
Summary: Enable cl_mad_enamle and cl_no_signed_zeros options when user turns on cl_unsafe_math_optimizations or cl_fast_relaxed_math options.

Reviewers: Anastasia, cfe-commits

Reviewed By: Anastasia

Subscribers: bader, yaxunl

Differential Revision: https://reviews.llvm.org/D31324

llvm-svn: 298838
2017-03-27 10:38:01 +00:00
Yaxun Liu 3464f92e23 [AMDGPU] Switch address space mapping by triple environment amdgiz
For target environment amdgiz and amdgizcl (giz means Generic Is Zero), AMDGPU will use new address space mapping where generic address space is 0 and private address space is 5. The data layout is also changed correspondingly.

Differential Revision: https://reviews.llvm.org/D31210

llvm-svn: 298767
2017-03-25 03:46:25 +00:00
Konstantin Zhuravlyov 9c1e310c16 Fix array sizes where address space is not yet known
For variables in generic address spaces, for example:

```
unsigned char V[6442450944];
...
```

the address space is not yet known when we get into
*getConstantArrayType*, it is 0. AMDGCN target's
address space 0 has 32 bits pointers, so when we
call *getPointerWidth* with 0, the array size is
trimmed to 32 bits, which is not right.

Differential Revision: https://reviews.llvm.org/D30845

llvm-svn: 298420
2017-03-21 18:55:39 +00:00
Egor Churaev c217f37cb6 [OpenCL] Added implicit conversion rank for overloading functions with vector data type in OpenCL
Summary: I added a new rank to ImplicitConversionRank enum to resolve the function overload ambiguity with vector types. Rank of scalar types conversion is lower than vector splat. So, we can choose which function should we call. See test for more details.

Reviewers: Anastasia, cfe-commits

Reviewed By: Anastasia

Subscribers: bader, yaxunl

Differential Revision: https://reviews.llvm.org/D30816

llvm-svn: 298366
2017-03-21 12:55:55 +00:00
Matt Arsenault bf5e3e4391 AMDGPU: Make 0 the private nullptr value
We can't actually pretend that 0 is valid for address space 0.
r295877 added a workaround to stop allocating user objects
there, so we can use 0 as the invalid pointer.

Some of the tests seemed to be using private as the non-0 null
test address space, so add copies using local to make sure
this is still stressed.

llvm-svn: 297659
2017-03-13 19:47:53 +00:00
Yaxun Liu 4d86799219 [AMDGPU] Add builtin functions readlane ds_permute mov_dpp
Differential Revision: https://reviews.llvm.org/D30551

llvm-svn: 297436
2017-03-10 01:30:46 +00:00
Konstantin Zhuravlyov 2b4917fcc9 [DebugInfo] Append extended dereferencing mechanism to variables' DIExpression for targets that support more than one address space
Differential Revision: https://reviews.llvm.org/D29673

llvm-svn: 297397
2017-03-09 18:06:23 +00:00
Konstantin Zhuravlyov d1ba16e762 [DebugInfo] Add address space when creating DIDerivedTypes
Differential Revision: https://reviews.llvm.org/D29671

llvm-svn: 297321
2017-03-08 23:56:48 +00:00
Jan Vesely 9488560bb8 AMDGPU: export s_sendmsg{halt} instrinsics
Differential Revision: https://reviews.llvm.org/D30366

llvm-svn: 296241
2017-02-25 04:20:24 +00:00
Jan Vesely c255097517 AMDGPU: export l1 cache invalidation intrinsics
Differential Revision: https://reviews.llvm.org/D30360

llvm-svn: 296240
2017-02-25 04:20:22 +00:00
Jan Vesely d26dbb389f AMDGPU: export s_waitcnt builtin
Differential Revision: https://reviews.llvm.org/D30359

llvm-svn: 296239
2017-02-25 04:20:20 +00:00
Matt Arsenault a0c6dca15b AMDGPU: Add fmed3 half builtin
llvm-svn: 295874
2017-02-22 20:55:59 +00:00
Jan Vesely a6f369c727 [OpenCL] r600 needs OpenCL kernel calling convention
Differential Revision: https://reviews.llvm.org/D30236

llvm-svn: 295843
2017-02-22 15:01:42 +00:00
Anastasia Stulova 58984e7087 [OpenCL] Correct ndrange_t implementation
Removed ndrange_t as Clang builtin type and added
as a struct type in the OpenCL header.

Use type name to do the Sema checking in enqueue_kernel
and modify IR generation accordingly.

Review: D28058

Patch by Dmitry Borisenkov!  
 

llvm-svn: 295311
2017-02-16 12:27:47 +00:00
Matt Arsenault 77ce553891 AMDGPU: Add a test checking alignments of emitted globals/allocas
Make sure the spec required type alignments are used in preparation
for a possible change which may break this.

llvm-svn: 294278
2017-02-07 04:28:02 +00:00
Matt Arsenault a274b209f5 AMDGPU: Add builtin for fmed3 intrinsic
llvm-svn: 293600
2017-01-31 03:42:07 +00:00
Anastasia Stulova af0a7bbbe2 [OpenCL] Add missing address spaces in IR generation of blocks
Modify ObjC blocks impl wrt address spaces as follows:

- keep default private address space for blocks generated
as local variables (with captures);

- add global address space for global block literals (no captures);

- make the block invoke function and enqueue_kernel prototype with
the generic AS block pointer parameter to accommodate both 
private and global AS cases from above;

- add block handling into default AS because it's implemented as
a special pointer type (BlockPointer) in the frontend and therefore
it is used as a pointer everywhere. This is also needed to accommodate
both private and global AS blocks for the two cases above.

- removes ObjC RT specific symbols (NSConcreteStackBlock and
NSConcreteGlobalBlock) in the OpenCL mode.

Review: https://reviews.llvm.org/D28814
llvm-svn: 293286
2017-01-27 15:11:34 +00:00
Matt Arsenault 09cca093a3 AMDGPU: Update for changed subtarget feature name
llvm-svn: 292838
2017-01-23 22:31:14 +00:00
Matt Arsenault 24b5ae4497 AMDGPU: Add builtin for getreg intrinsic
llvm-svn: 292636
2017-01-20 19:24:22 +00:00
Egor Churaev 28f00aab73 [OpenCL] Align fake address space map with the SPIR target maps.
Summary:
We compile user opencl kernel code with spir triple. But built-ins are written in OpenCL and we compile it with triple x86_64 to be able to use x86 intrinsics. And we need address spaces to match in both cases. So, we change fake address space map in OpenCL for matching with spir.

On CPU address spaces are not really important but we'd like to preserve address space information in order to perform optimizations relying on this info like enhanced alias analysis.

Reviewers: pekka.jaaskelainen, Anastasia

Subscribers: pekka.jaaskelainen, yaxunl, bader, cfe-commits

Differential Revision: https://reviews.llvm.org/D28048

llvm-svn: 290436
2016-12-23 16:11:25 +00:00
Egor Churaev 89831421af Fix problems in "[OpenCL] Enabling the usage of CLK_NULL_QUEUE as compare operand."
Summary: Fixed warnings in commit: https://reviews.llvm.org/rL290171

Reviewers: djasper, Anastasia

Subscribers: yaxunl, cfe-commits, bader

Differential Revision: https://reviews.llvm.org/D27981

llvm-svn: 290431
2016-12-23 14:55:49 +00:00
Chandler Carruth fcd33149b4 Cleanup the handling of noinline function attributes, -fno-inline,
-fno-inline-functions, -O0, and optnone.

These were really, really tangled together:
- We used the noinline LLVM attribute for -fno-inline
  - But not for -fno-inline-functions (breaking LTO)
  - But we did use it for -finline-hint-functions (yay, LTO is happy!)
  - But we didn't for -O0 (LTO is sad yet again...)
- We had weird structuring of CodeGenOpts with both an inlining
  enumeration and a boolean. They interacted in weird ways and
  needlessly.
- A *lot* of set smashing went on with setting these, and then got worse
  when we considered optnone and other inlining-effecting attributes.
- A bunch of inline affecting attributes were managed in a completely
  different place from -fno-inline.
- Even with -fno-inline we failed to put the LLVM noinline attribute
  onto many generated function definitions because they didn't show up
  as AST-level functions.
- If you passed -O0 but -finline-functions we would run the normal
  inliner pass in LLVM despite it being in the O0 pipeline, which really
  doesn't make much sense.
- Lastly, we used things like '-fno-inline' to manipulate the pass
  pipeline which forced the pass pipeline to be much more
  parameterizable than it really needs to be. Instead we can *just* use
  the optimization level to select a pipeline and control the rest via
  attributes.

Sadly, this causes a bunch of churn in tests because we don't run the
optimizer in the tests and check the contents of attribute sets. It
would be awesome if attribute sets were a bit more FileCheck friendly,
but oh well.

I think this is a significant improvement and should remove the semantic
need to change what inliner pass we run in order to comply with the
requested inlining semantics by relying completely on attributes. It
also cleans up tho optnone and related handling a bit.

One unfortunate aspect of this is that for generating alwaysinline
routines like those in OpenMP we end up removing noinline and then
adding alwaysinline. I tried a bunch of other approaches, but because we
recompute function attributes from scratch and don't have a declaration
here I couldn't find anything substantially cleaner than this.

Differential Revision: https://reviews.llvm.org/D28053

llvm-svn: 290398
2016-12-23 01:24:49 +00:00
George Burgess IV e37633713d Add the alloc_size attribute to clang, attempt 2.
This is a recommit of r290149, which was reverted in r290169 due to msan
failures. msan was failing because we were calling
`isMostDerivedAnUnsizedArray` on an invalid designator, which caused us
to read uninitialized memory. To fix this, the logic of the caller of
said function was simplified, and we now have a `!Invalid` assert in
`isMostDerivedAnUnsizedArray`, so we can catch this particular bug more
easily in the future.

Fingers crossed that this patch sticks this time. :)

Original commit message:

This patch does three things:
- Gives us the alloc_size attribute in clang, which lets us infer the
  number of bytes handed back to us by malloc/realloc/calloc/any user
  functions that act in a similar manner.
- Teaches our constexpr evaluator that evaluating some `const` variables
  is OK sometimes. This is why we have a change in
  test/SemaCXX/constant-expression-cxx11.cpp and other seemingly
  unrelated tests. Richard Smith okay'ed this idea some time ago in
  person.
- Uniques some Blocks in CodeGen, which was reviewed separately at
  D26410. Lack of uniquing only really shows up as a problem when
  combined with our new eagerness in the face of const.

llvm-svn: 290297
2016-12-22 02:50:20 +00:00
Daniel Jasper 9068938eb0 Revert "[OpenCL] Enabling the usage of CLK_NULL_QUEUE as compare operand."
This reverts commit r290171. It triggers a bunch of warnings, because
the new enumerator isn't handled in all switches. We want a warning-free
build.

Replied on the commit with more details.

llvm-svn: 290173
2016-12-20 10:05:04 +00:00
Egor Churaev 67c3f3ec68 [OpenCL] Enabling the usage of CLK_NULL_QUEUE as compare operand.
Summary: Enabling the compression of CLK_NULL_QUEUE to variable of type queue_t.

Reviewers: Anastasia

Subscribers: cfe-commits, yaxunl, bader

Differential Revision: https://reviews.llvm.org/D27569

llvm-svn: 290171
2016-12-20 09:15:21 +00:00
Chandler Carruth d7738fe6ad Revert r290149: Add the alloc_size attribute to clang.
This commit fails MSan when running test/CodeGen/object-size.c in
a confusing way. After some discussion with George, it isn't really
clear what is going on here. We can make the MSan failure go away by
testing for the invalid bit, but *why* things are invalid isn't clear.
And yet, other code in the surrounding area is doing precisely this and
testing for invalid.

George is going to take a closer look at this to better understand the
nature of the failure and recommit it, for now backing it out to clean
up MSan builds.

llvm-svn: 290169
2016-12-20 08:28:19 +00:00
George Burgess IV a747027bc6 Add the alloc_size attribute to clang.
This patch does three things:

- Gives us the alloc_size attribute in clang, which lets us infer the
  number of bytes handed back to us by malloc/realloc/calloc/any user
  functions that act in a similar manner.
- Teaches our constexpr evaluator that evaluating some `const` variables
  is OK sometimes. This is why we have a change in
  test/SemaCXX/constant-expression-cxx11.cpp and other seemingly
  unrelated tests. Richard Smith okay'ed this idea some time ago in
  person.
- Uniques some Blocks in CodeGen, which was reviewed separately at
  D26410. Lack of uniquing only really shows up as a problem when
  combined with our new eagerness in the face of const.

Differential Revision: https://reviews.llvm.org/D14274

llvm-svn: 290149
2016-12-20 01:05:42 +00:00
Yaxun Liu 5b74665a41 Recommit r289979 [OpenCL] Allow disabling types and declarations associated with extensions
Fixed undefined behavior due to cast integer to bool in initializer list.

llvm-svn: 290056
2016-12-18 05:18:55 +00:00
Yaxun Liu 35f6d66b0d Revert r289979 due to regressions
llvm-svn: 289991
2016-12-16 21:23:55 +00:00
Yaxun Liu 2e8331cab6 [OpenCL] Allow disabling types and declarations associated with extensions
Added a map to associate types and declarations with extensions.

Refactored existing diagnostic for disabled types associated with extensions and extended it to declarations for generic situation.

Fixed some bugs for types associated with extensions.

Allow users to use pragma to declare types and functions for supported extensions, e.g.

#pragma OPENCL EXTENSION the_new_extension_name : begin
// declare types and functions associated with the extension here
#pragma OPENCL EXTENSION the_new_extension_name : end

Differential Revision: https://reviews.llvm.org/D21698

llvm-svn: 289979
2016-12-16 19:22:08 +00:00
Yaxun Liu 402804b6d6 Re-commit r289252 and r289285, and fix PR31374
llvm-svn: 289787
2016-12-15 08:09:08 +00:00
Nico Weber 7849eeb035 Revert 289252 (and follow-up 289285), it caused PR31374
llvm-svn: 289713
2016-12-14 21:38:18 +00:00
Neil Hickey c881be1c23 Fixing build failure by adding triple option to new test condition.
Adding -triple option to ensure target supports double for fpmath test.

llvm-svn: 289552
2016-12-13 17:04:33 +00:00
Neil Hickey 88c0fac534 Improve handling of floating point literals in OpenCL to only use double precision if the target supports fp64.
This change makes sure single-precision floating point types are used if the 
cl_fp64 extension is not supported by the target.

Also removed the check to see whether the OpenCL version is >= 1.2, as this has
been incorporated into the extension setting code.

Differential Revision: https://reviews.llvm.org/D24235

llvm-svn: 289544
2016-12-13 16:22:50 +00:00
Egor Churaev 24939d479e [OpenCL] Enable unroll hint for OpenCL 1.x.
Summary: Although the feature was introduced only in OpenCL C v2.0 spec., it's useful for OpenCL 1.x too and doesn't require HW support.

Reviewers: Anastasia

Subscribers: yaxunl, cfe-commits, bader

Differential Revision: https://reviews.llvm.org/D27453

llvm-svn: 289535
2016-12-13 14:02:35 +00:00
Yaxun Liu 8f66b4b44a Add support for non-zero null pointer for C and OpenCL
In amdgcn target, null pointers in global, constant, and generic address space take value 0 but null pointers in private and local address space take value -1. Currently LLVM assumes all null pointers take value 0, which results in incorrectly translated IR. To workaround this issue, instead of emit null pointers in local and private address space, a null pointer in generic address space is emitted and casted to local and private address space.

Tentative definition of global variables with non-zero initializer will have weak linkage instead of common linkage since common linkage requires zero initializer and does not have explicit section to hold the non-zero value.

Virtual member functions getNullPointer and performAddrSpaceCast are added to TargetCodeGenInfo which by default returns ConstantPointerNull and emitting addrspacecast instruction. A virtual member function getNullPointerValue is added to TargetInfo which by default returns 0. Each target can override these virtual functions to get target specific null pointer and the null pointer value for specific address space, and perform specific translations for addrspacecast.

Wrapper functions getNullPointer is added to CodegenModule and getTargetNullPointerValue is added to ASTContext to facilitate getting the target specific null pointers and their values.

This change has no effect on other targets except amdgcn target. Other targets can provide support of non-zero null pointer in a similar way.

This change only provides support for non-zero null pointer for C and OpenCL. Supporting for other languages will be added later incrementally.

Differential Revision: https://reviews.llvm.org/D26196

llvm-svn: 289252
2016-12-09 19:01:11 +00:00
Alexey Bader a60db59d6f [OpenCL] Added a LIT test for ensuring address space mangling is done the same both in OpenCL1.2 and OpenCL2.0.
Patch by Egor Churaev (echuraev).

Reviewers: Anastasia

Subscribers: yaxunl, cfe-commits, bader

Differential Revision: https://reviews.llvm.org/D27403

llvm-svn: 288891
2016-12-07 08:43:49 +00:00
Alexey Bader b3190829e5 [OpenCL] Fix SPIR version generation.
Patch by Egor Churaev (echuraev).

Reviewers: Anastasia

Subscribers: bader, yaxunl, cfe-commits

Differential Revision: https://reviews.llvm.org/D27300

llvm-svn: 288890
2016-12-07 08:38:24 +00:00
Anastasia Stulova e4a1c38109 [OpenCL] Prevent generation of globals in non-constant AS for OpenCL.
Avoid using shortcut for const qualified non-constant address space
aggregate variables while generating them on the stack such that
the alloca object is used instead of a global variable containing
initializer.

Review: https://reviews.llvm.org/D27109
llvm-svn: 288163
2016-11-29 17:01:19 +00:00
Konstantin Zhuravlyov 62ae8f671c [AMDGPU] Change frexp.exp builtin to return i16 for f16 input
Differential Revision: https://reviews.llvm.org/D26863

llvm-svn: 287390
2016-11-18 22:31:51 +00:00
Stanislav Mekhanoshin cd433d2811 [AMDGPU] Add wave barrier builtin
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.

Differential Revision: https://reviews.llvm.org/D26584

llvm-svn: 287006
2016-11-15 18:58:03 +00:00
Anastasia Stulova 0df4ac3f94 [OpenCL] Fix for integer parameters of enqueue_kernel
Make handling integer parameters more flexible:

- For the number of events argument allow to pass larger
integers than 32 bits as soon as compiler can prove that
the range fits in 32 bits. If not, the diagnostic will be given.

- Change type of the arguments specifying the sizes of
the corresponding block arguments to be size_t.

Review: https://reviews.llvm.org/D26509
llvm-svn: 286849
2016-11-14 17:39:58 +00:00
Anastasia Stulova 2b46120a09 [OpenCL] Change to clk_event parameter in enqueue_kernel.
- Accept NULL pointer as a valid parameter value for clk_event.
- Generate clk_event_t arguments of internal
__enqueue_kernel_XXX function as pointers in generic address space.

Review: https://reviews.llvm.org/D26507
llvm-svn: 286836
2016-11-14 15:34:01 +00:00
Pekka Jaaskelainen 5136dd81ad Fix r286819 (accidentally patched multiple times.
llvm-svn: 286821
2016-11-14 13:14:38 +00:00
Pekka Jaaskelainen 2a1cc587bf [OpenCL] always use SPIR address spaces for kernel_arg_addr_space MD
It doesn't make sense to use the target's address space ids in this context as
this is metadata that should be referring to the "logical" OpenCL address spaces.
For flat AS machines like all "CPUs" in general, the logical AS info gets lost as
there's only one address space (0).

This commit changes the logic such that we always use the SPIR address space
ids for the argument metadata. It thus allows implementing the clGetKernelArgInfo()
and the other detection needs.

https://reviews.llvm.org/D26157

llvm-svn: 286819
2016-11-14 13:08:30 +00:00