Produce remarks when atomic instructions are expanded into hardware instructions
in SIISelLowering.cpp. Currently, these remarks are only emitted for atomic fadd
instructions.
Differential Revision: https://reviews.llvm.org/D108150
With -fpreserve-vec3-type enabled, a cast was not created when
converting from a vec3 type to a non-vec3 type, even though a
conversion to vec4 was performed. This resulted in creation of
invalid store instructions.
Differential Revision: https://reviews.llvm.org/D107963
Implement target builtins for gfx90a including fadd64, fadd32, add2h,
max and min on various global, flat and ds address spaces for which
intrinsics are implemented.
Differential Revision: https://reviews.llvm.org/D106909
'pipe' keyword is introduced in OpenCL C 2.0: so do checks for OpenCL C version while
parsing and then later on check for language options to construct actual pipe. This feature
requires support of __opencl_c_generic_address_space, so diagnostics for that is provided as well.
This is the same patch as in D106748 but with a tiny fix in checking of diagnostic messages.
Also added tests when program scope global variables are not supported.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D107154
'pipe' keyword is introduced in OpenCL C 2.0: so do checks for OpenCL C version while
parsing and then later on check for language options to construct actual pipe. This feature
requires support of __opencl_c_generic_address_space, so diagnostics for that is provided as well.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D106748
Set default version for OpenCL C to 1.2. This means that the
absence of any standard flag will be equivalent to passing
'-cl-std=CL1.2'.
Note that this patch also fixes incorrect version check for
the pointer to pointer kernel arguments diagnostic and
atomic test.
Differential Revision: https://reviews.llvm.org/D106504
Reapply with fixes for clang tests.
-----
This is a simple enum attribute. Test changes are because enum
attributes are sorted before type attributes, so mustprogress is
now in a different position.
This reverts commit 52aeacfbf5.
There isn't full agreement on a path forward yet, but there is agreement that
this shouldn't land as-is. See discussion on https://reviews.llvm.org/D105338
Also reverts unreviewed "[clang] Improve `-Wnull-dereference` diag to be more in-line with reality"
This reverts commit f4877c78c0.
And all the related changes to tests:
This reverts commit 9a0152799f.
This reverts commit 3f7c9cc274.
This reverts commit 329f8197ef.
This reverts commit aa9f58cc2c.
This reverts commit 2df37d5ddd.
This reverts commit a72a441812.
Note regarding C++ for OpenCL:
When compiling C++ for OpenCL, DW_LANG_C_plus_plus* is emitted.
There is no DWARF language code defined for C++ for OpenCL as of yet,
but DWARF issue 210514.1 has been raised to request one.
In the mean time, continuing to emit DW_LANG_C_plus_plus* for C++ for
OpenCL allows the potential to distinguish between C++ for OpenCL and
OpenCL C in !DICompileUnit nodes, whereas using DW_LANG_OpenCL for
C++ for OpenCL would prevent this.
This change therefore leaves C++ for OpenCL as-is.
Reviewed By: shchenz, Anastasia
Differential Revision: https://reviews.llvm.org/D104118
Extend debug info handling by adding DWARF address space mapping for
SPIR, with corresponding test case.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D103097
There already exists cl_khr_fp64 extension. So OpenCL C 3.0
and higher should use the feature, earlier versions still
use the extension. OpenCL C 3.0 API spec states that extension
will be not described in the option string if corresponding
optional functionality is not supported (see 4.2. Querying Devices).
Due to that fact the usage of features for OpenCL C 3.0 must
be as follows:
```
$ clang -Xclang -cl-ext=+cl_khr_fp64,+__opencl_c_fp64 ...
$ clang -Xclang -cl-ext=-cl_khr_fp64,-__opencl_c_fp64 ...
```
e.g. the feature and the equivalent extension (if exists)
must be set to the same values
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D96524
Drop non-conformant extension pragma implementation as
it does not properly disable anything and therefore
enabling non-disabled logic has no meaning.
This simplifies clang code and user interface to the extension
functionality. With this patch extension pragma 'begin'/'end'
and 'enable'/'disable' are only accepted for backward
compatibility and no longer have any default behavior.
Differential Revision: https://reviews.llvm.org/D101043
This removed the pointless need for extension pragma since
it doesn't disable anything properly and it doesn't need to
enable anything that is not possible to disable.
The change doesn't break existing kernels since it allows to
compile more cases i.e. without pragma statements but the
pragma continues to be accepted.
Differential Revision: https://reviews.llvm.org/D100985
Follow up on 431e3138a and complete the other possible combinations.
Besides enforcing the new behavior, it also mitigates TSAN false positives when
combining orders that used to be stronger.
This is a patch that disables the poison-unsafe select -> and/or i1 folding.
It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.
Note that a few test functions that has `_logical` suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.
I can see that most of these are poison-unsafe; they can be revived by introducing
freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for freeze fix.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101191
AMDGPU backend need to know whether floating point opcodes that support exception
flag gathering quiet and propagate signaling NaN inputs per IEEE754-2008, which is
conveyed by a function attribute "amdgpu-ieee". "amdgpu-ieee"="false" turns this off.
Without this function attribute backend assumes it is on for compute functions.
-mamdgpu-ieee and -mno-amdgpu-ieee are added to Clang to control this function attribute.
By default it is on. -mno-amdgpu-ieee requires -fno-honor-nans or equivalent.
Reviewed by: Matt Arsenault
Differential Revision: https://reviews.llvm.org/D77013
Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions not declarations this time.
Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked presence of attributes (there are some), will see a stronger result now.
The old behavior can end up quite confusing for two reasons:
* Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but that consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceeding error prone and confusing. (I'll note that I wasted several hours recently because of this.)
* We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent.
I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in it's own review as a) it's purely stylistic, and b) I already know there's some disagreement.
Once this lands, I'm also going to do a cleanup change which will delete some now redundant duplicate predicates in the inference code, but again, that deserves to be a change of it's own.
Differential Revision: https://reviews.llvm.org/D100226
This reverts commit ab98f2c712 and 98eea392cd.
It includes a fix for the clang test which triggered the revert. I failed to notice this one because there was another AMDGPU llvm test with a similiar name and the exact same text in the error message. Odd. Since only one build bot reported the clang test, I didn't notice that one.
Recently atomicrmw started to support fadd/fsub:
https://reviews.llvm.org/D53965
However clang atomic builtins fetch add/sub still does not support
emitting atomicrmw fadd/fsub.
This patch adds that.
Reviewed by: John McCall, Artem Belevich, Matt Arsenault, JF Bastien,
James Y Knight, Louis Dionne, Olivier Giroux
Differential Revision: https://reviews.llvm.org/D71726
Clang test CodeGenOpenCL/fpmath.cl uses a variable defined in an earlier
CHECK-NOT directive. However, by definition the pattern in that
directive is not supposed to occur so no variable will be defined. This
commit solves the issue by using a regex match with the same regex as in
the definition. It also changes the definition into a regex match since
no variable is going to be defined.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D99857
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.
Differential Revision: https://reviews.llvm.org/D98717
`__translate_sampler_initializer` has a calling convention of
`spir_func`, but clang generated calls to it using the default CC.
Instruction Combining was lowering these mismatching calling conventions
to `store i1* undef` which itself was subsequently lowered to a trap
instruction by simplifyCFG resulting in runtime `SIGILL`
There are arguably two bugs here: but whether there's any wisdom in
converting an obviously invalid call into a runtime crash over aborting
with a sensible error message will require further discussion. So for
now it's enough to set the right calling convention on the runtime
helper.
Reviewed By: svenh, bader
Differential Revision: https://reviews.llvm.org/D98411
IR produced using TableGen builtin function declarations
(`fdeclare-opencl-builtins.cl`) did not have the target's calling
convention applied to builtin calls.
Fix this, and update the codegen test to check that IR produced using
opencl-c.h and `-fdeclare-opencl-builtins` is identical with respect
to the builtin calls.
Differential Revision: https://reviews.llvm.org/D98039
gfx1030 added a new way to implement readcyclecounter using the
SHADER_CYCLES hardware register, but the s_memtime instruction still
exists, so the MC layer should still accept it and the
llvm.amdgcn.s.memtime intrinsic should still work.
Differential Revision: https://reviews.llvm.org/D97928
`mix` is subtly different from `clamp`: in the overloads where the
last argument is a scalar, the second argument should be a gentype for
`mix`.
As scalars can be implicitly converted to vectors, this cannot be
caught in the Sema test. Hence adding a CodeGen test, where we can
verify the types using the mangled name.