The named and generic address space overloads for atomic_init added
by 50f8abb9f4 ("[OpenCL] Add OpenCL 3.0 atomics to
-fdeclare-opencl-builtins", 2022-02-11) were not guarded by the
corresponding extensions.
This patch tries to implement RVO for coroutine's return object got from
get_return_object.
From [dcl.fct.def.coroutine]/p7 we could know that the return value of
get_return_object is either a reference or a prvalue. So it makes sense
to do copy elision for the return value. The return object should be
constructed directly into the storage where they would otherwise be
copied/moved to.
Test Plan: folly, check-all
Reviewed By: junparser
Differential revision: https://reviews.llvm.org/D117087
constexpr var may be initialized with address of non-const variable.
In this case the initializer is not constant in device compilation.
This has been handled for const vars but not for constexpr vars.
This patch makes handling of const var and constexpr var
consistent.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D119615
Fixes: https://github.com/llvm/llvm-project/issues/53780
The `__builtin_pdepd` and `__builtin_pextd` are P10 builtins that are meant to
be used under 64-bit only. For instance, when the builtins are compiled under
32-bit mode:
```
$ cat t.c
unsigned long long foo(unsigned long long a, unsigned long long b) {
return __builtin_pextd(a,b);
}
$ clang -c t.c -mcpu=pwr10 -m32
ExpandIntegerResult #0: t31: i64 = llvm.ppc.pextd TargetConstant:i32<6928>, t28, t29
fatal error: error in backend: Do not know how to expand the result of this operator!
```
This patch adds sema checking for these builtins to compile under 64-bit
mode only and on P10. The builtins will emit a diagnostic when they are compiled on
non-P10 compilations and on 32-bit mode.
Differential Revision: https://reviews.llvm.org/D118753
Post-commit review feedback suggested dropping the deprecated
diagnostic for the 'noreturn' macro (the diagnostic from the header
file suffices and the macro diagnostic could be confusing) and to only
issue the deprecated diagnostic for [[_Noreturn]] when the attribute
identifier is either directly written or not from a system macro.
Amends the commit made in 5029dce492.
when the function declaration's return type is already invalid for
some reason. This is relevant to https://github.com/llvm/llvm-project/issues/49188
because another way that the declaration's return type could become
invalid is that it might be `C auto` where `C<void>` is false.
Differential Revision: https://reviews.llvm.org/D119094
This adds support for http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2764.pdf,
which was adopted at the Feb 2022 WG14 meeting. That paper adds
[[noreturn]] and [[_Noreturn]] to the list of supported attributes in
C2x. These attributes have the same semantics as the [[noreturn]]
attribute in C++.
The [[_Noreturn]] attribute was added as a deprecated feature so that
translation units which include <stdnoreturn.h> do not get an error on
use of [[noreturn]] because the macro expands to _Noreturn. Users can
use -Wno-deprecated-attributes to silence the diagnostic.
Use of <stdnotreturn.h> or the noreturn macro were both deprecated.
Users can define the _CLANG_DISABLE_CRT_DEPRECATION_WARNINGS macro to
suppress the deprecation diagnostics coming from the header file.
When forming the function type from a declarator, we look for an
overloadable attribute before issuing a diagnostic in C about a
function signature containing only .... When the attribute is present,
we allow such a declaration for compatibility with the overloading
rules in C++. However, we were not looking for the attribute in all of
the places it is legal to write it on a declarator and so we only
accepted the signature in some forms and incorrectly rejected the
signature in others.
We now check for the attribute preceding the declarator instead of only
being applied to the declarator directly.
OpenCL C 3.0 __opencl_c_subgroups feature is slightly different
then other equivalent features and extensions (fp64 and 3d image writes):
OpenCL C 3.0 device can support the extension but not the feature.
cl_khr_subgroups requires subgroup independent forward progress.
This patch adjusts the check which is used when translating language
builtins to check either the extension or feature is supported.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D118999
Add the atomic overloads for the `global` and `local` address spaces,
which are new in OpenCL 3.0. Ensure the preexisting `generic`
overloads are guarded by the generic address space feature macro.
Ensure a subset of the atomic builtins are guarded by the
`__opencl_c_atomic_order_seq_cst` and `__opencl_c_atomic_scope_device`
feature macros, and enable those macros for SPIR/SPIR-V targets in
`opencl-c-base.h`.
Also guard the `cl_ext_float_atomics` builtins with the atomic order
and scope feature macros.
Differential Revision: https://reviews.llvm.org/D119420
Reduce the amount of repetition in the declarations by leveraging more
TableGen constructs. This is in preparation for adding the OpenCL 3.0
atomics feature optionality.
An error in the tablegen description affects the declarations
provided by `-fdeclare-opencl-builtins` for `atomic_fetch_add` and
`atomic_fetch_sub`.
The atomic argument should be an atomic_half, not an atomic_float.
The "-fzero-call-used-regs" option tells the compiler to zero out
certain registers before the function returns. It's also available as a
function attribute: zero_call_used_regs.
The two upper categories are:
- "used": Zero out used registers.
- "all": Zero out all registers, whether used or not.
The individual options are:
- "skip": Don't zero out any registers. This is the default.
- "used": Zero out all used registers.
- "used-arg": Zero out used registers that are used for arguments.
- "used-gpr": Zero out used registers that are GPRs.
- "used-gpr-arg": Zero out used GPRs that are used as arguments.
- "all": Zero out all registers.
- "all-arg": Zero out all registers used for arguments.
- "all-gpr": Zero out all GPRs.
- "all-gpr-arg": Zero out all GPRs used for arguments.
This is used to help mitigate Return-Oriented Programming exploits.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110869
The above change assumed that malloc (and friends) would always
allocate memory to getNewAlign(), even for allocations which have a
smaller size. This is not actually required by spec (a 1-byte
allocation may validly have 1-byte alignment).
Some real-world malloc implementations do not provide this guarantee,
and thus this optimization is breaking programs.
Fixes#53540
This reverts commit c2297544c0.
Differential Revision: https://reviews.llvm.org/D118804
These changes make the Clang parser recognize expression parameter pack
expansion and initializer lists in attribute arguments. Because
expression parameter pack expansion requires additional handling while
creating and instantiating templates, the support for them must be
explicitly supported through the AcceptsExprPack flag.
Handling expression pack expansions may require a delay to when the
arguments of an attribute are correctly populated. To this end,
attributes that are set to accept these - through setting the
AcceptsExprPack flag - will automatically have an additional variadic
expression argument member named DelayedArgs. This member is not
exposed the same way other arguments are but is set through the new
CreateWithDelayedArgs creator function generated for applicable
attributes.
To illustrate how to implement support for expression pack expansion
support, clang::annotate is made to support pack expansions. This is
done by making handleAnnotationAttr delay setting the actual attribute
arguments until after template instantiation if it was unable to
populate the arguments due to dependencies in the parsed expressions.
Implement P2128R6 in C++23 mode.
Unlike GCC's implementation, this doesn't try to recover when a user
meant to use a comma expression.
Because the syntax changes meaning in C++23, the patch is *NOT*
implemented as an extension. Instead, declaring an array with not
exactly 1 parameter is an error in older languages modes. There is an
off-by-default extension warning in C++23 mode.
Unlike the standard, we supports default arguments;
Ie, we assume, based on conversations in WG21, that the proposed
resolution to CWG2507 will be accepted.
We allow arrays OpenMP sections and C++23 multidimensional array to
coexist:
[a , b] multi dimensional array
[a : b] open mp section
[a, b: c] // error
The rest of the patch is relatively straight forward: we take care to
support an arbitrary number of arguments everywhere.
Done in manner similar to mutexinoutset
(see https://reviews.llvm.org/D57576)
Runtime support already exists in LLVM OpenMP runtime (see
https://reviews.llvm.org/D97085).
The value used to identify an inoutset dependency type in the LLVM
OpenMP runtime is 8.
Some tests updated due to change in dependency type error messages that
now include new dependency type. Also updated
test/OpenMP/task_codegen.cpp to verify we emit the right code.
This patch implements `__builtin_elementwise_add_sat` and `__builtin_elementwise_sub_sat` builtins.
These map to the add/sub saturated math intrinsics described here:
https://llvm.org/docs/LangRef.html#saturation-arithmetic-intrinsics
With this in place we should then be able to replace the x86 SSE adds/subs intrinsics with these generic variants - it looks like other targets should be able to use these as well (arm/aarch64/webassembly all have similar examples in cgbuiltin).
Differential Revision: https://reviews.llvm.org/D117898
Since the serialization code would recognize modules by names and the
name of all global module fragment is <global>, so that the
serialization code would complain for the same module.
This patch fixes this by using a unique global module fragment in Sema.
Before this patch, the compiler would fail on an assertion complaining
the duplicated modules.
Reviewed By: urnathan, rsmith
Differential Revision: https://reviews.llvm.org/D115610
See the discussion in https://reviews.llvm.org/D100282. The coroutine
marked always inline might not be inlined properly in current compiler
support. Since the coroutine would be splitted into pieces. And the call
to resume() and destroy() functions might be indirect call. Also the
ramp function wouldn't get inlined under O0 due to pipeline ordering
problems. It might be different to what users expects to. Emit a warning
to tell it.
This is what GCC does too: https://godbolt.org/z/7eajb1Gf8
Reviewed By: Quuxplusone
Differential Revision: https://reviews.llvm.org/D115867
This reverts commit 852afed5e0.
Changes since D114732:
On PS4, we reverse the expectation that classes whose constructor is deleted are not trivially relocatable. Because, at the moment, only classes which are passed in registers are trivially relocatable, and PS4 allows passing in registers if the copy constructor is deleted, the original assertions were broken on PS4.
(This is kinda similar to DR1734.)
Reviewed By: gribozavr2
Differential Revision: https://reviews.llvm.org/D119017
OpenMP Spec 5.0 [2.12.5, Restrictions]: If a device clause in which the
ancestor device-modifier appears is present on the target construct,
then a requires directive with the reverse_offload clause must be
specified.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D118887
This will simplify future conditionalization for OpenCL 3.0
optionality of atomic features.
The only set of atomic functions not using the multiclass is
atomic_compare_exchange_strong/weak, as these don't fit the common
pattern due to having 2 MemoryOrder arguments.
This change enables library code to skip paired move-construction and destruction for `trivial_abi` types, as if they were trivially-movable and trivially-destructible. This offers an extension to the performance fix offered by `trivial_abi`: rather than only offering trivial-type-like performance for pass-by-value, it also offers it for library code that moves values but not as arguments.
For example, if we use `memcpy` for trivially relocatable types inside of vector reallocation, and mark `unique_ptr` as `trivial_abi` (via `_LIBCPP_ABI_ENABLE_UNIQUE_PTR_TRIVIAL_ABI` / `_LIBCPP_ABI_UNSTABLE` / etc.), this would speed up `vector<unique_ptr>::push_back` by 40% on my benchmarks. (Though note that in this case, the compiler could have done this anyway, but happens not to due to the inlining horizon.)
If accepted, I intend to follow up with exactly such changes to library code, including and especially `std::vector`, making them use a trivial relocation operation on trivially relocatable types.
**D50119 and P1144:**
This change is very similar to D50119, which was rejected from Clang. (That change was an implementation of P1144, which is not yet part of the C++ standard.)
The intent of this change, rather than trying to pick a winning proposal for trivial relocation operations, is to extend the behavior of `trivial_abi` in a way that could be made compatible with any such proposal. If P1144 or any similar proposal were accepted, then `trivial_abi`, `__is_trivially_relocatable`, and everything else in this change would be redefined in terms of that.
**Safety:**
It's worth pointing out, specifically, that `trivial_abi` already implies trivial relocatability in a narrow sense: a `trivial_abi` type, when passed by value, has its constructor run in one location, and its destructor run in another, after the type has been trivially relocated (through registers).
Trivial relocatability optimizations could change the number of paired constructor/destructor calls, but this seems unlikely to matter for `trivial_abi` types.
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D114732
If this is a SFINAE context, then continuing to look up names
(in particular, to treat a non-function as a function, and then
do ADL) might too-eagerly complete a type that it's not safe to
complete right now. We should just say "okay, that's a substitution
failure" and not do any more work than absolutely required.
Fixes#52970.
Differential Revision: https://reviews.llvm.org/D117603
Bug #52905 was originally papered over in a different way, but
I believe this is the actually proper fix, or at least closer to
it. We need to detect placeholder types as close to the front-end
as possible, and cause them to fail constraints, rather than letting
them persist into later stages.
Fixes#52905.
Fixes#52909.
Fixes#53075.
Differential Revision: https://reviews.llvm.org/D118552
Some functions can end up non-externally visible despite not being
declared "static" or in an unnamed namespace in C++ - such as by having
parameters that are of non-external types.
Such functions aren't mistakenly intended to be defining some function
that needs a declaration. They could be maybe more legible (except for
the `operator new` example) with an explicit static, but that's a
stylistic thing outside what should be addressed by a warning.
When the stepsize does not evenly divide the range's end, round-up to ensure that that last multiple of the stepsize before the reaching the upper boud is reached. For instance, the trip count of
for (int i = 0; i < 7; i+=5)
is two (i=0 and i=5), not (7-0)/5 == 1.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D118542
Currently, -fdeclare-opencl-builtins always adds the generic address
space overloads of e.g. the vload builtin functions in OpenCL 3.0
mode, even when the generic address space feature is disabled.
Guard the generic address space overloads by the
`__opencl_c_generic_address_space` feature instead of by OpenCL
version.
Guard the private, global, and local overloads using the internal
`__opencl_c_named_address_space_builtins` feature.
Differential Revision: https://reviews.llvm.org/D107769
Do not warn on reserved identifiers resulting from expansion of system macros.
Also properly test -Wreserved-identifier wrt. system headers.
Should fix#49592
Differential Revision: https://reviews.llvm.org/D118532