The IR hasn't switched the default yet, so explicitly add the ieee
attributes.
I'm still not really sure how the target default denormal mode should
interact with -fno-unsafe-math-optimizations. The target may have
selected the default mode to be non-IEEE based on the flags or based
on its true behavior, but we don't know which is the case. Since the
only users of a non-IEEE mode without a flag still support IEEE mode,
just reset to IEEE.
This reverts commit 0a9fc9233e.
Going to look at the asan failures.
I find the failures in the test suite weird, because they look
like compile time test and I don't understand how that can be
failing, but will have a brief look at that too.
This makes -fno-common the default for all targets because this has performance
and code-size benefits and is more language conforming for C code.
Additionally, GCC10 also defaults to -fno-common and so we get consistent
behaviour with GCC.
With this change, C code that uses tentative definitions as definitions of a
variable in multiple translation units will trigger multiple-definition linker
errors. Generally, this occurs when the use of the extern keyword is neglected
in the declaration of a variable in a header file. In some cases, no specific
translation unit provides a definition of the variable. The previous behavior
can be restored by specifying -fcommon.
As GCC has switched already, we benefit from applications already being ported
and existing documentation how to do this. For example:
- https://gcc.gnu.org/gcc-10/porting_to.html
- https://wiki.gentoo.org/wiki/Gcc_10_porting_notes/fno_common
Differential revision: https://reviews.llvm.org/D75056
norecurse function attr indicates the function is not called recursively
directly or indirectly.
Add norecurse to OpenCL functions, SYCL functions in device compilation
and CUDA/HIP kernels.
Although there is LLVM pass adding norecurse to functions, it only works
for whole-program compilation. Also FE adding norecurse can make that
pass run faster since functions with norecurse do not need to be checked
again.
Differential Revision: https://reviews.llvm.org/D73651
This CL adds clang declarations of built-in functions for AMDGPU MFMA intrinsics and instructions.
OpenCL tests for new built-ins are included.
Differential Revision: https://reviews.llvm.org/D72723
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.
- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).
Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.
AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
Commit 9a8d477a0e ("[OpenCL] Add builtin function attribute
handling", 2019-11-05) stopped Clang from mangling single-overload
builtins, which is incorrect.
Add handling for the "pure", "const" and "convergent" function
attributes for OpenCL builtin functions.
Patch by Pierre Gondois and Sven van Haastregt.
Differential Revision: https://reviews.llvm.org/D64319
The implementation of the OpenCL builtin currently library uses 2
different hacks to get to the corresponding IR intrinsics from the
source. This will allow removal of those.
This is the set that is currently used (minus a few vector ones).
llvm-svn: 367973
For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definitions.
Also modifies the parser to accept IR in that form for obvious reasons.
llvm-svn: 367755
Rename lang mode flag to -cl-std=clc++/-cl-std=CLC++
or -std=clc++/-std=CLC++.
This aligns with OpenCL C conversion and removes ambiguity
with OpenCL C++.
Differential Revision: https://reviews.llvm.org/D65102
llvm-svn: 367008
Modified the intrinsics
int_addressofreturnaddress,
int_frameaddress & int_sponentry.
This commit depends on the changes in rL366679
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D64563
llvm-svn: 366683
To enable a new implicit kernel argument,
increased the number of argument bytes from 48 to 56.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D63756
llvm-svn: 365643
This patch ensures built-in functions are rewritten using the proper
parent declaration.
Existing tests are modified to run in C++ mode to ensure the
functionality works also with C++ for OpenCL while not increasing the
testing runtime.
llvm-svn: 365499
For CodeGenOpenCL/convergent.cl, the new PM produced a slightly different for
loop, but this still checks for no loop unrolling as intended. This is
committed separately from D63174.
llvm-svn: 364202
LLVM IR recently added a Type parameter to the byval Attribute, so that
when pointers become opaque and no longer have an element type the
information will still be present in IR.
For now the Type parameter is optional (which is why Clang didn't need
this change at the time), but it will become mandatory soon.
llvm-svn: 362652
Since byval is now a typed attribute it gets sorted slightly differently by
LLVM when the order of attributes is being canonicalized. This updates the few
Clang tests that depend on the old order.
Clang patch is unchanged.
llvm-svn: 362129
Support logical operators on vectors in C++ for OpenCL mode, to
preserve backwards compatibility with OpenCL C.
Differential Revision: https://reviews.llvm.org/D62588
llvm-svn: 362087
Since byval is now a typed attribute it gets sorted slightly differently by
LLVM when the order of attributes is being canonicalized. This updates the few
Clang tests that depend on the old order.
llvm-svn: 362013
OpenCL spec v2.0 s6.13.14:
Samplers can also be declared as global constants in the program
source using the following syntax.
const sampler_t <sampler name> = <value>
This works fine for OpenCL 1.2 but fails for 2.0, because clang duduces
address space of file-scope const sampler variable to be in global address
space whereas spec v2.0 s6.9.b forbids file-scope sampler variable to be
in global address space.
The fix is not to deduce address space for file-scope sampler variables.
Differential Revision: https://reviews.llvm.org/D62197
llvm-svn: 361757
The semantics for converting nested pointers between address
spaces are not very well defined. Some conversions which do not
really carry any meaning only produce warnings, and in some cases
warnings hide invalid conversions, such as 'global int*' to
'local float*'!
This patch changes the logic in checkPointerTypesForAssignment
and checkAddressSpaceCast to fail properly on implicit conversions
that should definitely not be permitted. We also dig deeper into the
pointer types and warn on explicit conversions where the address
space in a nested pointer changes, regardless of whether the address
space is compatible with the corresponding pointer nesting level
on the destination type.
Fixes PR39674!
Patch by ebevhan (Bevin Hansson)!
Differential Revision: https://reviews.llvm.org/D58236
llvm-svn: 360258
AMDGPU currently relies on global properties being set before
setTargetProperties is called. Existing targets like MIPS which rely on
setTargetProperties do not rely on the current behavior, so this patch
moves the call later in SetFunctionAttributes.
Differential Revision: https://reviews.llvm.org/D60967
llvm-svn: 359039
Summary:
https://reviews.llvm.org/D53809 fixed wrong address space(assert in debug build)
generated for event_ret argument. But exactly the same problem exists for
event_wait_list argument. This patch should fix both.
Reviewers: Anastasia, yaxunl
Reviewed By: Anastasia
Subscribers: kristina, ebevhan, cfe-commits
Differential Revision: https://reviews.llvm.org/D59985
llvm-svn: 358151
Summary:
[OpenCL] Generate 'unroll.enable' metadata for __attribute__((opencl_unroll_hint))
For both !{!"llvm.loop.unroll.enable"} and !{!"llvm.loop.unroll.full"} the unroller
will try to fully unroll a loop unless the trip count is not known at compile time.
In that case for '.full' metadata no unrolling will be processed, while for '.enable'
the loop will be partially unrolled with a heuristically chosen unroll factor.
See: docs/LanguageExtensions.rst
From https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/attributes-loopUnroll.html
__attribute__((opencl_unroll_hint))
for (int i=0; i<2; i++)
{
...
}
In the example above, the compiler will determine how much to unroll the loop.
Before the patch for __attribute__((opencl_unroll_hint)) was generated metadata
!{!"llvm.loop.unroll.full"}, which limits ability of loop unroller to decide, how
much to unroll the loop.
Reviewers: Anastasia, yaxunl
Reviewed By: Anastasia
Subscribers: zzheng, dmgreen, jdoerfert, cfe-commits, asavonic, AlexeySotkin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59493
llvm-svn: 356571
A recent change caused assertion in CodeGenFunction::EmitBlockCallExpr when a block is called.
There is code
Func = CGM.getOpenCLRuntime().getInvokeFunction(E->getCallee());
getCalleeDecl calls Expr::getReferencedDeclOfCallee, which does not handle
BlockExpr and returns nullptr, which causes isa to assert.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D58658
llvm-svn: 354893
Summary:
Emit direct call of block invoke functions when possible, i.e. in case the
block is not passed as a function argument.
Also doing some refactoring of `CodeGenFunction::EmitBlockCallExpr()`
Reviewers: Anastasia, yaxunl, svenvh
Reviewed By: Anastasia
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58388
llvm-svn: 354568
Summary:
For some reason OpenCL blocks in LLVM IR are represented as function pointers.
These pointers do not point to any real function and never get called. Actually
they point to some structure, which in turn contains pointer to the real block
invoke function.
This patch changes represntation of OpenCL blocks in LLVM IR from function
pointers to pointers to `%struct.__block_literal_generic`.
Such representation allows to avoid unnecessary bitcasts and simplifies
further processing (e.g. translation to SPIR-V ) of the module for targets
which do not support function pointers.
Patch by: Alexey Sotkin.
Reviewers: Anastasia, yaxunl, svenvh
Reviewed By: Anastasia
Subscribers: alexbatashev, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58277
llvm-svn: 354337
This allows the global visibility controls to be restrictive while still
populating the dynamic symbol table where required.
Differential Revision: https://reviews.llvm.org/D56871
llvm-svn: 353870
Some of these functions take some extraneous arguments, e.g. EltSize,
Offset, which are computable from the Type and DataLayout.
Add some asserts to ensure that the computed values are consistent
with the passed-in values, in preparation for eliminating the
extraneous arguments. This also asserts that the Type is an Array for
the calls named "Array" and a Struct for the calls named "Struct".
Then, correct a couple of errors:
1. Using CreateStructGEP on an array type. (this causes the majority
of the test differences, as struct GEPs are created with i32
indices, while array GEPs are created with i64 indices)
2. Passing the wrong Offset to CreateStructGEP in TargetInfo.cpp on
x86-64 NACL (which uses 32-bit pointers).
Differential Revision: https://reviews.llvm.org/D57766
llvm-svn: 353529
This reverts r348083. This was based on a misreading of the spec
for printf specifiers.
Also revert r343653, as without a subsequent patch, a correctly
specified format for a vector will incorrectly warn.
Fixes bug 40491.
llvm-svn: 352539
Windows doesn't allow common with alignment >32 bits, so these tests
were broken in windows mode. This patch makes 'common' optional in
these cases.
Change-Id: I4d5fdd07ecdafc3570ef9b09cd816c2e5e4ed15e
llvm-svn: 350645
Summary:
If a function argument is byval and RV is located in default or alloca address space
an optimization of creating addrspacecast instead of memcpy is performed. That is
not correct for OpenCL, where that can lead to a situation of address space casting
from __private * to __global *. See an example below:
```
typedef struct {
int x;
} MyStruct;
void foo(MyStruct val) {}
kernel void KernelOneMember(__global MyStruct* x) {
foo (*x);
}
```
for this code clang generated following IR:
...
%0 = load %struct.MyStruct addrspace(1)*, %struct.MyStruct addrspace(1)**
%x.addr, align 4
%1 = addrspacecast %struct.MyStruct addrspace(1)* %0 to %struct.MyStruct*
...
So the optimization was disallowed for OpenCL if RV is located in an address space
different than that of the argument (0).
Reviewers: yaxunl, Anastasia
Reviewed By: Anastasia
Subscribers: cfe-commits, asavonic
Differential Revision: https://reviews.llvm.org/D54947
llvm-svn: 348752
The spec is ambiguous on whether vector types are allowed to be
implicitly converted. The only legal context I think this can
be used for OpenCL is printf, where it seems necessary.
llvm-svn: 348083
Summary:
Prior to this patch, OpenCL code such as the following would attempt to create
a BranchInst with a non-bool argument:
if (enqueue_kernel(get_default_queue(), 0, nd, ^(void){})) /* ... */
This patch is a follow up on a similar issue with pipe builtin
operations. See commit r280800 and https://bugs.llvm.org/show_bug.cgi?id=30219.
This change, while being conservative on non-builtin functions,
should set the type of expressions invoking builtins to the
proper type, instead of defaulting to `bool` and requiring
manual overrides in Sema::CheckBuiltinFunctionCall.
In addition to tests for enqueue_kernel, the tests are extended to
check other OpenCL builtins.
Reviewers: Anastasia, spatel, rsmith
Reviewed By: Anastasia
Subscribers: kristina, cfe-commits, svenvh
Differential Revision: https://reviews.llvm.org/D52879
llvm-svn: 347658
Summary: The name of the synthesized constants for constant initialization was using mangling for statics, which isn't generally correct and (in a yet-uncommitted patch) causes the mangler to assert out because the static ends up trying to mangle function parameters and this makes no sense. Instead, mangle to `"__const." + FunctionName + "." + DeclName`.
Reviewers: rjmccall
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D54055
llvm-svn: 346915
This patch breaks Index/opencl-types.cl LIT test:
Script:
--
: 'RUN: at line 1'; stage1/bin/c-index-test -test-print-type llvm/tools/clang/test/Index/opencl-types.cl -cl-std=CL2.0 | stage1/bin/FileCheck llvm/tools/clang/test/Index/opencl-types.cl
--
Command Output (stderr):
--
llvm/tools/clang/test/Index/opencl-types.cl:3:26: warning: unsupported OpenCL extension 'cl_khr_fp16' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:4:26: warning: unsupported OpenCL extension 'cl_khr_fp64' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:8:9: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:11:8: error: declaring variable of type 'half' is not allowed
llvm/tools/clang/test/Index/opencl-types.cl:15:3: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:16:3: error: use of type 'double4' (vector of 4 'double' values) requires cl_khr_fp64 extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:26:26: warning: unsupported OpenCL extension 'cl_khr_gl_msaa_sharing' - ignoring [-Wignored-pragmas]
llvm/tools/clang/test/Index/opencl-types.cl:35:44: error: use of type '__read_only image2d_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:36:49: error: use of type '__read_only image2d_array_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:37:49: error: use of type '__read_only image2d_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm/tools/clang/test/Index/opencl-types.cl:38:54: error: use of type '__read_only image2d_array_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled
llvm-svn: 346338
This is a continuation of my patches to inform the X86 backend about what the largest IR types are in the function so that we can restrict the backend type legalizer to prevent 512-bit vectors on SKX when -mprefer-vector-width=256 is specified if no explicit 512 bit vectors were specified by the user.
This patch updates the vector width based on the argument and return types of the current function and from the types of any functions it calls. This is intended to make sure the backend type legalizer doesn't disturb any types that are required for ABI.
Differential Revision: https://reviews.llvm.org/D52441
llvm-svn: 345168
Emit llvm.amdgcn.update.dpp for both __builtin_amdgcn_mov_dpp and
__builtin_amdgcn_update_dpp. The first argument to
llvm.amdgcn.update.dpp will be undef for __builtin_amdgcn_mov_dpp.
Differential Revision: https://reviews.llvm.org/D52320
llvm-svn: 344665
r326937 ("[OpenCL] Remove block invoke function from emitted block
literal struct", 2018-03-07) broke block argument handling. In
particular the commit was causing a crash during code generation, see
the discussion in https://reviews.llvm.org/D43783 .
The offending commit has just been reverted; add a test to avoid
breaking this again in the future.
llvm-svn: 343583
This reverts r326937 as it broke block argument handling in OpenCL.
See the discussion on https://reviews.llvm.org/D43783 .
The next commit will add a test case that revealed the issue.
llvm-svn: 343582
Fast FMAF is not a sufficient condition to enable denormals.
Before VI, enabling denormals caused F32 instructions to
run at F64 speeds.
llvm-svn: 339278
Always emit alloca in entry block for enqueue_kernel builtin.
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
llvm-svn: 339150
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.
Differential Revision: https://reviews.llvm.org/D50104
llvm-svn: 338899
The way address space declarations for builtins currently work
is nearly useless. The code assumes the address spaces used for
builtins is a confusingly named "target address space" from user
code using __attribute__((address_space(N))) that matches
the builtin declaration. There's no way to use this to declare
a builtin that returns a language specific address space.
The terminology used is highly cofusing since it has nothing
to do with the the address space selected by the target to use
for a language address space.
This feature is essentially unused as-is. AMDGPU and NVPTX
are the only in-tree targets attempting to use this. The AMDGPU
builtins certainly do not behave as intended (i.e. all of the
builtins returning pointers can never compile because the numbered
address space never matches the expected named address space).
The NVPTX builtins are missing tests for some, and the others
seem to rely on an implicit addrspacecast.
Change the used address space for builtins based on a target
hook to allow using a language address space for a builtin.
This allows the same builtin declaration to be used for multiple
languages with similarly purposed address spaces (e.g. the same
AMDGPU builtin can be used in OpenCL and CUDA even though the
constant address spaces are arbitarily different).
This breaks the possibility of using arbitrary numbered
address spaces alongside the named address spaces for builtins.
If this is an issue we probably need to introduce another builtin
declaration character to distinguish language address spaces from
so-called "target address spaces".
llvm-svn: 338707
OpenCL block literal structs have different fields which are now correctly
identified in the debug info.
Differential Revision: https://reviews.llvm.org/D49930
llvm-svn: 338299
Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D49209
llvm-svn: 337041
1. added restrictions to memory scope, order and volatile parameters
2. added custom processing for these builtins - currently is not used code,
needed to switch off GCCBuiltin link to the builtins (ongoing change to llvm
tree)
3. builtins renamed as requested
Differential Revision: https://reviews.llvm.org/D43281
llvm-svn: 332848
There shouldn't be any tests that run the entire optimizer here,
but the last test in this file is definitely going to break with
a change in LLVM IR canonicalization. Change that part to check
the unoptimized IR because that's the real intent of this file.
llvm-svn: 332473
Added string literal helper function to obtain the type
attributed by a constant address space.
Also fixed predefind __func__ expr to use the helper
to constract the string literal correctly.
Differential Revision: https://reviews.llvm.org/D46049
llvm-svn: 331877