Commit Graph

14592 Commits

Author SHA1 Message Date
Michael Kruse b1de32d6dd [OMPIRBuilder] Clarify CanonicalLoopInfo. NFC.
Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify what parts of a CanonicalLoopInfo is considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives.

CanonicalLoopInfo are now invalidated after it does not describe canonical loop anymore and asserts when trying to use it afterwards.

In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location to avoid that the impression that they insert something from scratch at that location where in reality its InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0

Also see discussion in D105706.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D107540
2021-08-12 21:02:19 -05:00
Arnold Schwaighofer 9eb99d2e73 CodeGen: No need to check for isExternC if HasStrictReturn is already false
NFC intended.

Differential Revision: https://reviews.llvm.org/D107841
2021-08-11 07:42:48 -07:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Michael Liao 6ec36d18ec [cuda] Mark builtin texture/surface reference variable as 'externally_initialized'.
- They need to be preserved even if there's no reference within the
  device code as the host code may need to initialize them based on the
  application logic.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107718
2021-08-09 13:27:40 -04:00
Roger Ferrer Ibanez bfb77364d0 [OpenMP] Fix accidental reuse of VLA size
We were using an OpaqueValueExpr allocated on the stack to store
the size of a VLA. Because the VLASizeMap in CodegenFunction
uses the address of the expression to avoid recomputing VLAs,
we were accidentally reusing an earlier llvm::Value. This led to
invalid LLVM IR.

This is a temporary solution until VLASizeMap can be pushed and popped
based on the context.

Differential Revision: https://reviews.llvm.org/D107666
2021-08-07 05:55:27 +00:00
Joseph Huber 41a6b50c25 [OpenMP]Fix PR51349: Remove AlwaysInline for if regions.
After D94315 we add the `NoInline` attribute to the outlined function to handle
data environments in the OpenMP if clause. This conflicted with the `AlwaysInline`
attribute added to the outlined function. for better performance in D106799.
The data environments should ideally not require NoInline, but for now this
fixes PR51349.

Reviewed By: mikerice

Differential Revision: https://reviews.llvm.org/D107649
2021-08-06 17:53:04 -04:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Fangrui Song c38efb4899 [clang] Implement -falign-loops=N (N is a power of 2) for non-LTO
GCC supports multiple forms of -falign-loops=.
-falign-loops= is currently ignored in Clang.

This patch implements the simplest but the most useful form where N is a
power of 2.

The underlying implementation uses a `llvm::TargetOptions` option for now.
Bitcode generation ignores this option.

Differential Revision: https://reviews.llvm.org/D106701
2021-08-05 12:17:50 -07:00
Anshil Gandhi 39dac1f7f6 [clang] Add clang builtins support for gfx90a
Implement target builtins for gfx90a including fadd64, fadd32, add2h,
max and min on various global, flat and ds address spaces for which
intrinsics are implemented.

Differential Revision: https://reviews.llvm.org/D106909
2021-08-05 02:08:06 -06:00
Pavel Asyutchenko 7df405e079 Apply -fmacro-prefix-map to __builtin_FILE()
This matches the behavior of GCC.
Patch does not change remapping logic itself, so adding one simple smoke test should be enough.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107393
2021-08-04 16:42:14 -07:00
Bradley Smith e57e1e4e00 [clang][AArch64][SVE] Avoid going through memory for fixed/scalable predicate casts
For fixed SVE types, predicates are represented using vectors of i8,
where as for scalable types they are represented using vectors of i1. We
can avoid going through memory for casts between these by bitcasting the
i1 scalable vectors to/from a scalable i8 vector of matching size, which
can then use the existing vector insert/extract logic.

Differential Revision: https://reviews.llvm.org/D106860
2021-08-04 16:10:37 +00:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Alexandros Lamprineas 29b263a34f [Clang][AArch64] Inline assembly support for the ACLE type 'data512_t'
In LLVM IR terms the ACLE type 'data512_t' is essentially an aggregate
type { [8 x i64] }. When emitting code for inline assembly operands,
clang tries to scalarize aggregate types to an integer of the equivalent
length, otherwise it passes them by-reference. This patch adds a target
hook to tell whether a given inline assembly operand is scalarizable
so that clang can emit code to pass/return it by-value.

Differential Revision: https://reviews.llvm.org/D94098
2021-07-31 09:51:28 +01:00
Tarindu Jayatilaka 7a797b2902 Take OptimizationLevel class out of Pass Builder
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D107025
2021-07-29 21:57:23 -07:00
Melanie Blower bc5b5ea037 [clang][patch][FPEnv] Make initialization of C++ globals strictfp aware
@kpn pointed out that the global variable initialization functions didn't
have the "strictfp" metadata set correctly, and @rjmccall said that there
was buggy code in SetFPModel and StartFunction, this patch is to solve
those problems. When Sema creates a FunctionDecl, it sets the
FunctionDeclBits.UsesFPIntrin to "true" if the lexical FP settings
(i.e. a combination of command line options and #pragma float_control
settings) correspond to ConstrainedFP mode. That bit is used when CodeGen
starts codegen for a llvm function, and it translates into the
"strictfp" function attribute. See bugs.llvm.org/show_bug.cgi?id=44571

Reviewed By: Aaron Ballman

Differential Revision: https://reviews.llvm.org/D102343
2021-07-29 12:02:37 -04:00
Kai Luo e4902e69e9 [PowerPC] Fix return type of XL compat CAS
`__compare_and_swap*` should return `i32` rather than `i1`.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D107077
2021-07-29 14:49:26 +00:00
Fangrui Song 828767f325 COFF/ELF: Place llvm.global_ctors elements in llvm.used if comdat is used
On ELF, an SHT_INIT_ARRAY outside a section group is a GC root. The current
codegen abuses SHT_INIT_ARRAY in a section group to mean a GC root.

On PE/COFF, the dynamic initialization for `__declspec(selectany)` in a comdat
can be garbage collected by `-opt:ref`.

Call `addUsedGlobal` for the two cases to fix the abuse/bug.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106925
2021-07-28 11:44:19 -07:00
Matheus Izvekov 4819b751bd [clang] NFC: change uses of `Expr->getValueKind` into `is?Value`
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D100733
2021-07-28 03:09:31 +02:00
Jose M Monsalve Diaz 0276db1416 [OpenMP] Creating the `omp_target_num_teams` and `omp_target_thread_limit` attributes to outlined functions
The device runtime contains several calls to __kmpc_get_hardware_num_threads_in_block
and __kmpc_get_hardware_num_blocks. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In commit D106033 we have the optimization phase. This commit adds the attributes to
the outlined function for the grid size. the two attributes are `omp_target_num_teams` and
`omp_target_thread_limit`. These values are added as long as they are constant.

Two functions are created `getNumThreadsExprForTargetDirective` and
`getNumTeamsExprForTargetDirective`. The original functions `emitNumTeamsForTargetDirective`
 and `emitNumThreadsForTargetDirective` identify the expresion and emit the code.
However, for the Device version of the outlined function, we cannot emit anything.
Therefore, this is a first attempt to separate emision of code from deduction of the
values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106298
2021-07-27 17:21:04 -04:00
Thomas Lively 33786576fd [WebAssembly] Codegen for extmul SIMD instructions
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.

Differential Revision: https://reviews.llvm.org/D106724
2021-07-27 08:41:30 -07:00
Reid Kleckner f9f56488e0 [DebugInfo] Use per-enumerator signedness for DIEnumerator
Allegedly the DWARF backend ignores this field of DIEnumerator, but we
set it nonetheless in case we decide to use it in the future.
Alternatively, we could remove it, but it is simpler to pass down the
signed bit as it is in the AST for now.

Implemented to address comments on D106585
2021-07-26 16:14:28 -07:00
Joseph Huber af000197c4 [OpenMP] Always inline the OpenMP outlined function
This patch adds the always inline attribute to the outlined functions generated
by OpenMP regions. Because there is only a single instance of this function and
it always has internal linkage it is safe to inline in every instance it is
created. This could potentially lead to performance degredation due to
inflated register counts in the parallel region.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106799
2021-07-26 17:27:59 -04:00
Reid Kleckner 3230493299 Fix clang debug info irgen of i128 enums
DIEnumerator stores an APInt as of April 2020, so now we don't need to
truncate the enumerator value to 64 bits. Fixes assertions during IRGen.

Split from D105320, thanks to Matheus Izvekov for the test case and
report.

Differential Revision: https://reviews.llvm.org/D106585
2021-07-26 12:25:29 -07:00
Nemanja Ivanovic 1c50a5da36 [PowerPC] Implement partial vector ld/st builtins for XL compatibility
XL provides functions __vec_ldrmb/__vec_strmb for loading/storing a
sequence of 1 to 16 bytes in big endian order, right justified in the
vector register (regardless of target endianness).
This is equivalent to vec_xl_len_r/vec_xst_len_r which are only
available on Power9.

This patch simply uses the Power9 functions when compiled for Power9,
but provides a more general implementation for Power8.

Differential revision: https://reviews.llvm.org/D106757
2021-07-26 13:19:52 -05:00
Shilei Tian 3274cdc83e [Clang][OpenMP] Remove the mandatory flush for capture for OpenMP 5.1
In OpenMP 5.1:
> If the `write` or `update` clause is specifieded, the atomic operation is not an atomic conditional update for which the comparison fails, and the effective memory ordering is `release`, `acq_rel`, or `seq_cst`, the strong flush on entry to the atomic operation is also a release flush. If the `read` or `update` clause is specified and the effective memory ordering is `acquire`, `acq_rel`, or `seq_cst` then the strong flush on exit from the atomic operation is also an acquire flush.

In OpenMP 5.0:
> If the `write`, `update`, or **`capture`** clause is specified and the `release`, `acq_rel`, or `seq_cst` clause is specified then the strong flush on entry to the atomic operation is also a release flush. If the `read` or `capture` clause is specified and the `acquire`, `acq_rel`, or `seq_cst` clause is specified then the strong flush on exit from the atomic operation is also an acquire flush.

From my understanding, in OpenMP 5.1, `capture` is removed from the requirement for flush, therefore we don't have to enforce it.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D100768
2021-07-26 11:00:44 -04:00
Thomas Lively 85157c0079 [WebAssembly] Codegen for pmin and pmax
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.

Differential Revision: https://reviews.llvm.org/D106612
2021-07-23 14:49:21 -07:00
Fangrui Song 7290ddd6b1 Revert "[clang] -falign-loops="
This reverts commit 42896eeed9.

Unfinished. Accidentally pushed when reverting a clangd commit.
2021-07-23 09:58:35 -07:00
Fangrui Song 42896eeed9 [clang] -falign-loops= 2021-07-23 09:50:43 -07:00
Yaxun (Sam) Liu 44dbbe6106 [HIP] Preserve ASAN bitcode library functions
Address sanitizer passes may generate call of ASAN bitcode library
functions after bitcode linking in lld, therefore lld cannot add
those symbols since it does not know they will be used later.

To solve this issue, clang emits a reference to a bicode library
function which calls all ASAN functions which need to be
preserved. This basically force all ASAN functions to be
linked in.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D106315
2021-07-23 10:35:52 -04:00
Kai Luo e4ed93cb25 [PowerPC] Implement XL compatible behavior of __compare_and_swap
According to https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=functions-compare-swap-compare-swaplp
XL's `__compare_and_swap` has a weird behavior that

> In either case, the contents of the memory location specified by addr are copied into the memory location specified by old_val_addr.

(unlike c11 `atomic_compare_exchange` specified in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf)

This patch let clang's implementation follow this behavior.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106344
2021-07-23 01:16:02 +00:00
Florian Mayer 96c63492cb [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-22 16:20:27 -07:00
Alexey Bataev b88a68c45e [OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments.
Added missed arguments in
__tgt_target_teams_nowait_mapper/__tgt_target_nowait_mapper runtime
functions calls.

Differential Revision: https://reviews.llvm.org/D106542
2021-07-22 08:44:37 -07:00
Alexey Bataev f828f0a90f Revert "[OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments."
This reverts commit b455f7f225 to fix
buildbots.
2021-07-22 08:06:29 -07:00
Alexey Bataev b455f7f225 [OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments.
Added missed arguments in
__tgt_target_teams_nowait_mapper/__tgt_target_nowait_mapper runtime
functions calls.

Differential Revision: https://reviews.llvm.org/D106542
2021-07-22 07:53:37 -07:00
Florian Mayer 789a4a2e5c Revert "[hwasan] Use stack safety analysis."
This reverts commit bde9415fef.
2021-07-22 12:16:16 +01:00
Florian Mayer bde9415fef [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-22 12:04:54 +01:00
Simon Tatham bd41136746 [clang] Use i64 for the !srcloc metadata on asm IR nodes.
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.

!srcloc is generated in clang codegen, and pulled back out by llvm
functions like AsmPrinter::emitInlineAsm that need to report errors in
the inline asm. From there it goes to LLVMContext::emitError, is
stored in DiagnosticInfoInlineAsm, and ends up back in clang, at
BackendConsumer::InlineAsmDiagHandler(), which reconstitutes a true
clang::SourceLocation from the integer cookie.

Throughout this code path, it's now 64-bit rather than 32, which means
that if SourceLocation is expanded to a 64-bit type, this error report
won't lose half of the data.

The compiler will tolerate both of i32 and i64 !srcloc metadata in
input IR without faulting. Test added in llvm/MC. (The semantic
accuracy of the metadata is another matter, but I don't know of any
situation where that matters: if you're reading an IR file written by
a previous run of clang, you don't have the SourceManager that can
relate those source locations back to the original source files.)

Original version of the patch by Mikhail Maltsev.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D105491
2021-07-22 10:24:52 +01:00
Joseph Huber 754eb1c210 [OpenMP] Change `__kmpc_free_shared` to include the paired allocation size
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106496
2021-07-21 20:56:21 -04:00
Thomas Lively 8af333cf1a [WebAssembly] Replace @llvm.wasm.popcnt with @llvm.ctpop.v16i8
Use the standard target-independent intrinsic to take advantage of standard
optimizations.

Differential Revision: https://reviews.llvm.org/D106506
2021-07-21 16:45:54 -07:00
Thomas Lively db7efcab7d [WebAssembly] Remove clang builtins for extract_lane and replace_lane
These builtins were added to capture the fact that the underlying Wasm
instructions return i32s and implicitly sign or zero extend the extracted lanes
in the case of the i8x16 and i16x8 variants. But we do sufficient optimizations
during code gen that these low-level details do not need to be exposed to users.

This commit replaces the use of the builtins in wasm_simd128.h with normal
target-independent vector code. As a result, we can switch the relevant
intrinsics to use functions rather than macros and can use more user-friendly
return types rather than trying to precisely expose the underlying Wasm types.
Note, however, that the generated LLVM IR is no different after this change.

Differential Revision: https://reviews.llvm.org/D106500
2021-07-21 16:11:00 -07:00
Thomas Lively 1a57ee1276 [WebAssembly] Codegen for v128.load{32,64}_zero
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal instruction selection patterns. The wasm_simd128.h
intrinsics header was already using portable code for the corresponding
intrinsics, so now it produces the correct instructions.

Differential Revision: https://reviews.llvm.org/D106400
2021-07-21 09:02:12 -07:00
Quinn Pham e002d251dd [PowerPC] Floating Point Builtins for XL Compat.
This patch is in a series of patches to provide
builtins for compatibility with the XL compiler.
This patch adds builtins related to floating point
operations

Reviewed By: #powerpc, nemanjai, amyk, NeHuang

Differential Revision: https://reviews.llvm.org/D103986
2021-07-21 08:33:39 -05:00
Simon Tatham 21401a7262 [clang] Introduce SourceLocation::[U]IntTy typedefs.
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.

NFC: this patch introduces typedefs for the integer type used by
SourceLocation and makes all the boring changes to use the typedefs
everywhere, but for the moment, they are unconditionally defined to
uint32_t.

Patch originally by Mikhail Maltsev.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D105492
2021-07-21 10:45:46 +01:00
Albion Fung 2fd1520247 [PowerPC] Implemented mtmsr, mfspr, mtspr Builtins
Implemented builtins for mtmsr, mfspr, mtspr on PowerPC;
the patch is intended for XL Compatibility.

Differential revision: https://reviews.llvm.org/D106130
2021-07-20 17:51:00 -05:00
Albion Fung 3434ac9e39 [PowerPC] Store, load, move from and to registers related builtins
This patch implements store, load, move from and to registers related
builtins, as well as the builtin for stfiw. The patch aims to provide
feature parady with xlC on AIX.

Differential revision: https://reviews.llvm.org/D105946
2021-07-20 15:46:14 -05:00
Victor Huang 1a762f93f8 [PowerPC] Add PowerPC cmpb builtin and emit target indepedent code for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch add the builtin and emit target independent
code for __cmpb.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D105194
2021-07-20 13:06:22 -05:00
Melanie Blower ea864c9933 [clang][patch][NFC] Refactor calculation of FunctionDecl to avoid duplicate code 2021-07-20 11:01:22 -04:00
Quinn Pham fd855c24c7 [PowerPC] Restore FastMathFlags of Builder for Vector FDiv Builtins
This patch fixes `__builtin_ppc_recipdivf`, `__builtin_ppc_recipdivd`,
`__builtin_ppc_rsqrtf`, and `__builtin_ppc_rsqrtd`. FastMathFlags are
set to fast immediately before emitting these builtins. Now the flags
are restored to their previous values after the builtins are emitted.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D105984
2021-07-20 09:41:00 -05:00
Stefan Pintilie 02cd937945 [PowerPC][Builtins] Added a number of builtins for compatibility with XL.
Added a number of different builtins that exist in the XL compiler. Most of
these builtins already exist in clang under a different name.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D104386
2021-07-20 08:57:55 -05:00
Florian Mayer 5f08219322 Revert "[hwasan] Use stack safety analysis."
This reverts commit e9c63ed10b.
2021-07-20 10:36:46 +01:00
Florian Mayer e9c63ed10b [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-20 10:06:35 +01:00
Quinn Pham 0268e123be [PowerPC] swdiv_nochk Builtins for XL Compat
This patch is in a series of patches to provide builtins for
compatibility with the XL compiler. This patch adds software divide
builtins with no checking. These builtins are each emitted as a fast
fdiv.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D106150
2021-07-19 16:51:10 -05:00
Giorgis Georgakoudis fb0cf01795 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit e9c7291cb2.

Fix failing tests
2021-07-19 07:54:26 -07:00
Jamie Schmeiser 73840f9f81 thread_local support for AIX
Summary:
The AIX linker will produce errors on unresolved weak symbols.  Change the
generated code to not check for the initialization function but just call
it and ensure that it always exists.  Also, the AIX atexit routine has a
different name (and signature) so call it correctly.  Update the lit tests
to test on AIX appropriately.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: hubert.reinterpretcast (Hubert Tong)
Differential Revision: https://reviews.llvm.org/D104420
2021-07-19 10:03:22 -04:00
Florian Mayer 807d50100c Revert "[hwasan] Use stack safety analysis."
This reverts commit 12268fe14a.
2021-07-19 12:08:32 +01:00
Florian Mayer 12268fe14a [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-19 11:54:44 +01:00
David Blaikie dac582ad3a DebugInfo: Name class templates with default arguments consistently (both direct naming, and as a template argument for a function template)
It's noteworthy that GCC has the same bug here, which is a bit
surprising. Both Clang and GCC's bug is only for function template
arguments that are themselves templates with default template arguments
(f1<t1<int[, missing_default_here]>>). Probably because function name
matching isn't generally necessary - whereas type matching is necessary
for DWARF consumers to associate declarations and definitions across
translation units, so the bug's been addressed there already - but
continued to exist for function templates since it's fairly benign
there.

I came across this while working on a change that could reconstitute
these pretty printed names based on the rest of the DWARF, reducing the
size of the DWARF by not having to encode all the template parameters in
the name string. That reconstitution code can't tell the difference
between a defaulted argument or not, so couldn't create the current
buggy-ish output.

Making the names more consistent between direct and indirect references,
and between function and class templates seems all to the good.

(I fixed the function template version of this a few years back in
9fdd09a4cc - clearly I should've looked
more closely and generalized the code better so it only had to be fixed
once - well, doing that here now)
2021-07-17 23:58:15 -07:00
Nikita Popov 2c68ecccc9 [OpaquePtr] Remove uses of CreateGEP() without element type
Remove uses of to-be-deprecated API. In cases where the correct
element type was not immediately obvious to me, fall back to
explicit getPointerElementType().
2021-07-17 22:56:27 +02:00
Nikita Popov 6225d0cc6e [OpaquePtr] Remove uses of CreateInBoundsGEP() without element type
Remove uses of to-be-deprecated API.

Unfortunately this one mostly just makes the use of
getPointerElementType() explicit, as the correct type to use
wasn't immediately available (deriving it from QualType is left
as an excercise to the reader).
2021-07-17 21:27:16 +02:00
Nikita Popov 4ace6008f2 [OpaquePtr] Remove uses of CreateStructGEP() without element type
Remove uses of to-be-deprecated API.
2021-07-17 18:48:21 +02:00
Nikita Popov 6d3e7c783b [OpaquePtr] Remove uses of CreateConstGEP1_32() without element type
Remove uses of to-be-deprecated API. I've fallen back to calling
getPointerElementType() in some cases where the correct type wasn't
immediately obvious to me.
2021-07-17 18:32:36 +02:00
Nikita Popov 5071360eb1 [OpaquePtr] Remove uses of CGF.Builder.CreateConstInBoundsGEP1_64() without type
Remove uses of to-be-deprecated API.
2021-07-17 17:07:46 +02:00
Nikita Popov 357756ecf6 [OpaquePtr] Remove uses of CreateConstGEP1_64() without element type
Remove uses of to-be-deprecated API.
2021-07-17 16:43:20 +02:00
Nikita Popov 4737eebc0d [OpaquePtr] Remove uses of CreateConstInBoundsGEP2_64() without type
Remove uses of to-be-deprecated API.
2021-07-17 16:42:10 +02:00
Giorgis Georgakoudis e9c7291cb2 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102107
2021-07-16 23:27:44 -07:00
Nemanja Ivanovic 35a18a981f [PowerPC] Implement intrinsics for mtfsf[i]
This provides intrinsics for emitting instructions that set the FPSCR (`mtfsf/mtfsfi`).

The patch also conservatively marks the rounding mode as an implicit def for both since they both may set the rounding mode depending on the operands.

Reviewed By: #powerpc, qiucf

Differential Revision: https://reviews.llvm.org/D105957
2021-07-16 16:26:11 -05:00
Victor Huang 4eb107ccba [PowerPC] Add PowerPC population count, reversed load and store related builtins and instrinsics for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and instrisics for population
count, reversed load and store related operations.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D106021
2021-07-15 17:23:56 -05:00
Artem Belevich d774b4aa5e [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.
That should allow clang to compile mma.h from CUDA-11.3.

Differential Revision: https://reviews.llvm.org/D105384
2021-07-15 12:02:09 -07:00
Quinn Pham de3956605a [PowerPC] Fix popcntb XL Compat Builtin for 32bit
This patch implements the `__popcntb` XL compatibility builtin for 32bit in the frontend and backend. This patch also updates tests for `__popcntb` and other XL Compat sync related builtins.

Reviewed By: #powerpc, nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D105360
2021-07-15 13:19:47 -05:00
Victor Huang d40e8091bd [PowerPC] Add PowerPC rotate related builtins and emit target independent code for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and emit target independent
code for rotate related operations.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D104744
2021-07-15 10:23:54 -05:00
Chuanqi Xu 8a1727ba51 [Coroutines] Run coroutine passes by default
This patch make coroutine passes run by default in LLVM pipeline. Now
the clang and opt could handle IR inputs containing coroutine intrinsics
without special options.
It should be fine. On the one hand, the coroutine passes seems to be stable
since there are already many projects using coroutine feature.
On the other hand, the coroutine passes should do nothing for IR who doesn't
contain coroutine intrinsic.

Test Plan: check-llvm

Reviewed by: lxfind, aeubanks

Differential Revision: https://reviews.llvm.org/D105877
2021-07-15 14:33:40 +08:00
Thomas Lively 4a4229f70f [WebAssembly] Codegen for v128.storeX_lane instructions
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.

Differential Revision: https://reviews.llvm.org/D106019
2021-07-14 16:15:25 -07:00
Thomas Lively 970e090010 [WebAssembly] Codegen for v128.loadX_lane instructions
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.

Differential Revision: https://reviews.llvm.org/D105950
2021-07-14 11:31:53 -07:00
Albion Fung f1aca5ac96 [PowerPC] Fix L[D|W]ARX Implementation
LDARX and LWARX sometimes gets optimized out by the compiler
when it is critical to the correctness of the code. This inline asm generation
ensures that it preserved.

Differential Revision: https://reviews.llvm.org/D105754
2021-07-13 11:02:07 -05:00
Dave MacLachlan 45ffe6341d [clang/objc] Optimize getters for non-atomic, copied properties
Properties that were declared `@property(copy, nonatomic) id foo` make an
unnecessary call to objc_get_property().  This call can be replaced with a
direct access to the backing variable identical to how a `@property(nonatomic)
id foo` would do it.

This reduces codegen by 4 bytes (x86_64/arm64) and removes a cross linkage unit
function call per property declared as copy/nonatomic.

Differential Revision: https://reviews.llvm.org/D105311
2021-07-13 09:22:13 -04:00
Thomas Lively cbabfc63b1 [WebAssembly] Custom combines for f32x4.demote_zero_f64x2
Replace the clang builtin function and LLVM intrinsic for
f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing
combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern.

Differential Revision: https://reviews.llvm.org/D105755
2021-07-12 10:32:18 -07:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Nico Weber d3e7491333 Revert Attributor patch series
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
2021-07-10 16:15:55 -04:00
Johannes Doerfert 1d5711c3ee [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 12:32:50 -05:00
Thomas Lively e5220104d0 [WebAssembly] Custom combines for f64x2.promote_low_f32x4
Replace the clang builtin function and LLVM intrinsic previously used to select
the f64x2.promote_low_f32x4 instruction with custom combines from standard
SelectionDAG nodes. Implement the new combines to share code with the similar
combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232.

Differential Revision: https://reviews.llvm.org/D105675
2021-07-09 18:59:29 -07:00
Alexey Bataev ab8989ab87 [OPENMP]Fix overlapped mapping for dereferenced pointer members.
If the base is used in a map clause and later we have a memberexpr with
this base, and the member is a pointer, and this pointer is dereferenced
anyhow (subscript, array section, dereference, etc.), such components
should be considered as overlapped, otherwise it may lead to incorrect
size computations, since we try to map a pointee as a part of the whole
struct, which is not true for the pointer members.

Differential Revision: https://reviews.llvm.org/D105562
2021-07-09 12:51:26 -07:00
David Blaikie 768e3af634 PR51034: Debug Info: Remove 'prototyped' from K&R function declarations
Regression caused by 6c9559b67b.
2021-07-09 12:07:36 -07:00
Varun Gandhi 92dcb1d2db [Clang] Introduce Swift async calling convention.
This change is intended as initial setup. The plan is to add
more semantic checks later. I plan to update the documentation
as more semantic checks are added (instead of documenting the
details up front). Most of the code closely mirrors that for
the Swift calling convention. Three places are marked as
[FIXME: swiftasynccc]; those will be addressed once the
corresponding convention is introduced in LLVM.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D95561
2021-07-09 11:50:10 -07:00
David Blaikie 1def2579e1 PR51018: Remove explicit conversions from SmallString to StringRef to future-proof against C++23
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
2021-07-08 13:37:57 -07:00
Nikita Popov a0ea367562 [CodeGen] Avoid nullptr arg to CreateStructGEP (NFC)
For now just make the getPointerElementType() explicit.
2021-07-08 21:21:43 +02:00
Alexey Bataev f57d396dca [OPENMP]Do no privatize const firstprivates in target regions.
No need to emit private copyfor firstprivate constants in target
regions, we can use the original copy instead.

Differential Revision: https://reviews.llvm.org/D105647
2021-07-08 11:55:37 -07:00
Nikita Popov 693251fb2f [CodeGen] Avoid CreateGEP with nullptr type (NFC)
In preparation for dropping support for it. I've replaced it with
a proper type where the correct type was obvious and left an
explicit getPointerElementType() where it wasn't.
2021-07-08 20:38:54 +02:00
Alexey Bataev b3c80dd894 [OPENMP]Remove const firstprivate allocation as a variable in a constant space.
Current implementation is not compatible with asynchronous target
regions, need to remove it.

Differential Revision: https://reviews.llvm.org/D105375
2021-07-07 05:56:48 -07:00
Hsiangkai Wang 593bf9b4de [Clang][RISCV] Implement vlseg and vlsegff.
Differential Revision: https://reviews.llvm.org/D103527
2021-07-07 13:44:40 +08:00
David Blaikie 6c9559b67b DebugInfo: Mangle K&R declarations for debug info linkage names
This fixes a gap in the `overloadable` attribute support (K&R declared
functions would get mangled symbol names, but that name wouldn't be
represented in the debug info linkage name field for the function) and
in -funique-internal-linkage-names (this came up in review discussion on
D98799) where K&R static declarations would not get the uniqued linkage
names.
2021-07-06 16:28:02 -07:00
Xiang1 Zhang a39bb960fc [X86] Refine code of generating BB labels in Keylocker
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105336
2021-07-05 09:29:51 +08:00
Nikita Popov fabc17192e [IRBuilder] Add type argument to CreateMaskedLoad/Gather
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.

Differential Revision: https://reviews.llvm.org/D105395
2021-07-04 12:17:59 +02:00
Jun Ma 3afbf89804 [clang][AArch64][SVE] Handle PRValue under VLAT <-> VLST cast
This change fixes the crash that PRValue cannot be handled by
EmitLValue.

Differential Revision: https://reviews.llvm.org/D105097
2021-07-01 10:09:47 +08:00
Yaxun (Sam) Liu 434bd5bf54 [AMDGPU] Add builtin functions image_bvh_intersect_ray
Reviewed by: Stanislav Mekhanoshin, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D104946
2021-06-30 13:10:47 -04:00
Melanie Blower e773216f46 [clang][patch] Add builtin __arithmetic_fence and option fprotect-parens
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.

Reviewed By: aaron.ballman, kpn

Differential Revision: https://reviews.llvm.org/D100118
2021-06-30 09:58:06 -04:00
Akira Hatanaka 6cda73e3c4 [CodeGen] Add ParmVarDecls to FunctionDecls that are created to generate
ObjC property getter/setter functions

This is needed to prevent clang from crashing when we make the changes
proposed in https://reviews.llvm.org/D98799.

Differential Revision: https://reviews.llvm.org/D104883
2021-06-29 16:27:24 -07:00
Steffen Larsen 3644726a78 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0.

PTX ISA description of

  - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld
  - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st
  - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma
  - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma

Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>
Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D104847
2021-06-29 15:44:07 -07:00
Akira Hatanaka 8d21d54725 [CodeGen] Stop creating fake FunctionDecls when generating IR for
functions implicitly generated by the compiler

These fake functions would cause clang to crash if the changes proposed
in https://reviews.llvm.org/D98799 were made.
2021-06-29 14:22:33 -07:00
Akira Hatanaka 952944c12c [ObjC][ARC] Don't add operand bundle clang.arc.attachedcall to a call if
the call already has the operand bundle

This bug was causing the call to `replaceAllUsesWith` to crash because
the old call instruction and the new call instruction were the same.

rdar://74957948

Differential Revision: https://reviews.llvm.org/D97824
2021-06-29 10:23:01 -07:00
Bruno De Fraine 4d8871a898 PR50767: clear non-distinct debuginfo for function with nodebug definition after undecorated declaration
Fix suggested by Yuanfang Chen:

Non-distinct debuginfo is attached to the function due to the undecorated declaration. Later, when seeing the function definition and `nodebug` attribute, the non-distinct debuginfo should be cleared.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D104777
2021-06-29 10:26:45 +02:00
Xiang1 Zhang 6d234a6908 [X86] Zero some outputs of Kelocker intrinsics in error case
Reviewed By: WangPengfei

Differential Revision: https://reviews.llvm.org/D104766
2021-06-29 13:35:40 +08:00
Ben Shi c94c8d8b5d [AVR][clang] Fix wrong calling convention in functions return struct type
According to AVR ABI (https://gcc.gnu.org/wiki/avr-gcc), returned struct value
within size 1-8 bytes should be returned directly (via register r18-r25), while
larger ones should be returned via an implicit struct pointer argument.

Reviewed By: dylanmckay

Differential Revision: https://reviews.llvm.org/D99237
2021-06-29 11:32:39 +08:00
Michael Liao 948308ef34 Fix `-Wunused-variable` warning. NFC. 2021-06-28 22:50:36 -04:00
Hongtao Yu 633ca3ff2f [UniqueLinkageName] Use exsiting GlobalDecl object instead of reconstructing one.
C++ constructors/destructors need to go through a different constructor to construct a GlobalDecl object in order to retrieve their linkage type. This causes an assert failure in the default constructor of GlobalDecl. I'm chaning it to using the exsiting GlobalDecl object.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102356
2021-06-28 14:50:41 -07:00
Melanie Blower c27e5a2a8e Revert "[clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens"
This reverts commit 4f1238e44d.
Buildbot fails on predecessor patch
2021-06-28 12:42:59 -04:00
Melanie Blower 4f1238e44d [clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.

Reviewed By: aaron.ballman, kpn

Differential Revision: https://reviews.llvm.org/D100118
2021-06-28 12:26:53 -04:00
Jinsong Ji eb237ffca8 [PowerPC] Add XL Compat fetch builtins
Prototype
```
unsigned int __fetch_and_add (volatile unsigned int* addr, unsigned int
val);
unsigned long __fetch_and_addlp (volatile unsigned long* addr, unsigned
long val);
```
Ref:
https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-fetch

Reviewed By: #powerpc, w2yehia, lkail

Differential Revision: https://reviews.llvm.org/D104991
2021-06-28 02:52:32 +00:00
Craig Topper 7a112356e4 [X86] Correct the conversion of VALIGND/Q intrinsics to shufflevector.
We need to mask the immediate to the width of a single vector
rather than 2 vectors. If we use the width of 2 vectors then
any shift larger than the length of 1 vector is going to overflow
the shuffle indices.

Fixes PR50895.
2021-06-26 19:06:00 -07:00
Joseph Huber 9ce02ea8c9 [OpenMP] Add Module metadata for OpenMP compilation
This patch adds a module level metadata flag indicating that the module
was compiled with the `-fopenmp` flag. This will make it easier for
passes like OpenMPOpt to determine if it should be run.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102361
2021-06-25 16:34:19 -04:00
Stuart Brady e47027d091 [OpenCL] Use DW_LANG_OpenCL language tag for OpenCL C
Note regarding C++ for OpenCL:

   When compiling C++ for OpenCL, DW_LANG_C_plus_plus* is emitted.

   There is no DWARF language code defined for C++ for OpenCL as of yet,
   but DWARF issue 210514.1 has been raised to request one.

   In the mean time, continuing to emit DW_LANG_C_plus_plus* for C++ for
   OpenCL allows the potential to distinguish between C++ for OpenCL and
   OpenCL C in !DICompileUnit nodes, whereas using DW_LANG_OpenCL for
   C++ for OpenCL would prevent this.

   This change therefore leaves C++ for OpenCL as-is.

Reviewed By: shchenz, Anastasia

Differential Revision: https://reviews.llvm.org/D104118
2021-06-25 11:48:42 +01:00
Jinsong Ji f3ef4f5bff [PowerPC] Add XL compat __compare_and_swap builtins
Prototype
int __compare_and_swap (volatile int* addr, int* old_val_addr, int
new_val);

int __compare_and_swaplp (volatile long* addr, long* old_val_addr, long
new_val);

Refer to
https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=functions-compare-swap-compare-swaplp

Reviewed By: w2yehia

Differential Revision: https://reviews.llvm.org/D104837
2021-06-25 01:08:48 +00:00
Martin Storsjö e5c7c171e5 [clang] Rename StringRef _lower() method calls to _insensitive()
This is mostly a mechanical change, but a testcase that contains
parts of the StringRef class (clang/test/Analysis/llvm-conventions.cpp)
isn't touched.
2021-06-25 00:22:01 +03:00
Akira Hatanaka 8db0dbbe2c [CodeGen] Don't create fake FunctionDecls when generating block/byref
copy/dispose helper functions

We found out that these fake functions would cause clang to crash if the
changes proposed in https://reviews.llvm.org/D98799 were made.

The original patch was reverted in f681fd927e
because debug locations were missing in the body of the block byref
helper functions. This patch fixes the bug by calling CreateArtificial
after the calls to StartFunction.

Differential Revision: https://reviews.llvm.org/D104082
2021-06-24 11:45:52 -07:00
Aakanksha Patil 3453f3dd46 [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Zequan Wu f681fd927e Revert "[CodeGen] Don't create fake FunctionDecls when generating block/byref"
That commit causes crash with error "!dbg attachment points at wrong subprogram for function" on iOS platforms.

This reverts commit f4c06bcb67.
2021-06-22 21:48:00 -07:00
Akira Hatanaka f4c06bcb67 [CodeGen] Don't create fake FunctionDecls when generating block/byref
copy/dispose helper functions

We found out that these fake functions would cause clang to crash if the
changes proposed in https://reviews.llvm.org/D98799 were made.

Differential Revision: https://reviews.llvm.org/D104082
2021-06-22 11:42:53 -07:00
Fangrui Song 948016228f Improve clang -Wframe-larger-than= diagnostic
Match the style in D104667.

This commit is for non-LTO diagnostics, while D104667 is for LTO and llc diagnostics.
2021-06-22 11:20:49 -07:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Graham Hunter a83ce95b09 [clang] Remove unused capture in closure
c6a91ee6aa removed uses of IsMonotonic from OpenMP SIMD codegen,
but that left a capture of the variable unused which upset buildbots
using -Werror.
2021-06-22 15:09:39 +01:00
Graham Hunter c6a91ee6aa [Clang][OpenMP] Monotonic does not apply to SIMD
The codegen for simd constructs was affected by the presence (or
absence) of the 'monotonic' schedule modifier for worksharing
loops. The modifier is only intended to apply to the scheduling of
chunks for a thread, not iterations of a loop inside a chunk.

In addition, the monotonic modifier was applied to worksharing loops
by default if no schedule clause was present; the referenced part of
the OpenMP 4.5 spec in the code (section 2.7.1) only applies if the
user specified a schedule clause with a static kind but no modifier.
Without a user-specified schedule clause we should default to
nonmonotonic scheduling.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103793
2021-06-22 10:24:11 +01:00
Nick Desaulniers 8ace121305 [IR] convert warn-stack-size from module flag to fn attr
Otherwise, this causes issues when building with LTO for object files
that use different values.

Link: https://github.com/ClangBuiltLinux/linux/issues/1395

Reviewed By: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D104342
2021-06-21 15:09:25 -07:00
Nick Desaulniers 193e41c987 [Clang][Codegen] Add GNU function attribute 'no_profile' and lower it to noprofile
noprofile IR attribute already exists to prevent profiling with PGO;
emit that when a function uses the newly added no_profile function
attribute.

The Linux kernel would like to avoid compiler generated code in
functions annotated with such attribute. We already respect this for
libcalls to fentry() and mcount().

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223
Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/

Reviewed By: MaskRay, void, phosek, aaron.ballman

Differential Revision: https://reviews.llvm.org/D104475
2021-06-18 13:42:32 -07:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Chen Zheng 4590b406c0 [Debug-Info] guard DW_LANG_C_plus_plus_14 under strict dwarf
Reviewed By: stuart

Differential Revision: https://reviews.llvm.org/D104291
2021-06-16 03:17:56 +00:00
Kevin Athey e0b469ffa1 [clang-cl][sanitizer] Add -fsanitize-address-use-after-return to clang.
Also:
  - add driver test (fsanitize-use-after-return.c)
  - add basic IR test (asan-use-after-return.cpp)
  - (NFC) cleaned up logic for generating table of __asan_stack_malloc
    depending on flag.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104076
2021-06-11 12:07:35 -07:00
Simon Pilgrim 61cdaf66fe [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Nick Desaulniers fc018ebb60 [IR] make -warn-frame-size into a module attr
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI); very late in the compilation
pipeline.

-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.

Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.

Reviewed By: rsmith, qcolombet

Differential Revision: https://reviews.llvm.org/D103928
2021-06-10 16:15:27 -07:00
Michael Kruse a22236120f [OpenMP] Implement '#pragma omp unroll'.
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99459
2021-06-10 14:30:17 -05:00
Joseph Huber 0c32ffceed [OpenMP] Add type to firstprivate symbol for const firstprivate values
Clang will create a global value put in constant memory if an aggregate value
is declared firstprivate in the target device. The symbol name only uses the
name of the firstprivate variable, so symbol name conflicts will occur if the
variable is allowed to have different types through templates. An example of
this behvaiour is shown in https://godbolt.org/z/EsMjYh47n. This patch adds the
mangled type name to the symbol to avoid such naming conflicts. This fixes
https://bugs.llvm.org/show_bug.cgi?id=50642.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103995
2021-06-10 09:02:20 -04:00
Hongtao Yu 64b2fb7967 [CSSPGO] Emit mangled dwarf names for line tables debug option under -fpseudo-probe-for-profiling
Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D103909
2021-06-09 10:46:03 -07:00
AndreyChurbanov 9ce2e5e700 Revert "[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type"
This reverts commit a1f550e052.

Revert in order to fix backwards compatibility breakage
caused by type size change for task dependence flag.
2021-06-09 17:38:38 +03:00
Matheus Izvekov aef5d8fdc7 [clang] NFC: Rename rvalue to prvalue
This renames the expression value categories from rvalue to prvalue,
keeping nomenclature consistent with C++11 onwards.

C++ has the most complicated taxonomy here, and every other language
only uses a subset of it, so it's less confusing to use the C++ names
consistently, and mentally remap to the C names when working on that
context (prvalue -> rvalue, no xvalues, etc).

Renames:
* VK_RValue -> VK_PRValue
* Expr::isRValue -> Expr::isPRValue
* SK_QualificationConversionRValue -> SK_QualificationConversionPRValue
* JSON AST Dumper Expression nodes value category: "rvalue" -> "prvalue"

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D103720
2021-06-09 12:27:10 +02:00
Brendon Cahoon 294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Brendon Cahoon 211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Nathan Sidwell b2d0c16e91 [clang] p1099 using enum part 2
This implements the 'using enum maybe-qualified-enum-tag ;' part of
1099. It introduces a new 'UsingEnumDecl', subclassed from
'BaseUsingDecl'. Much of the diff is the boilerplate needed to get the
new class set up.

There is one case where we accept ill-formed, but I believe this is
merely an extended case of an existing bug, so consider it
orthogonal. AFAICT in class-scope the c++20 rule is that no 2 using
decls can bring in the same target decl ([namespace.udecl]/8). But we
already accept:

struct A { enum { a }; };
struct B : A { using A::a; };
struct C : B { using A::a;
using B::a; }; // same enumerator

this patch permits mixtures of 'using enum Bob;' and 'using Bob::member;' in the same way.

Differential Revision: https://reviews.llvm.org/D102241
2021-06-08 11:11:46 -07:00
Nick Desaulniers 3787ee4571 reland [IR] make -stack-alignment= into a module attr
Relands commit 433c8d950c with fixes for
MIPS.

Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 10:59:46 -07:00
Brendon Cahoon ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Nick Desaulniers a596b54d47 Revert "[IR] make -stack-alignment= into a module attr"
This reverts commit 433c8d950c.

Breaks the MIPS build.
2021-06-08 08:55:50 -07:00
Nick Desaulniers 433c8d950c [IR] make -stack-alignment= into a module attr
Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 08:31:04 -07:00
Yaxun (Sam) Liu 054cc3b1b4 [CUDA][HIP] Fix store of vtbl in ctor
vtbl itself is in default global address space. When clang emits
ctor, it gets a pointer to the vtbl field based on the this pointer,
then stores vtbl to the pointer.

Since this pointer can point to any address space (e.g. an object
created in stack), this pointer points to default address space, therefore
the pointer to vtbl field in this object should also be in default
address space.

Currently, clang incorrectly casts the pointer to vtbl field in this object
to global address space. This caused assertions in backend.

This patch fixes that by removing the incorrect addr space cast.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D103835
2021-06-08 10:24:44 -04:00
Martin Storsjö b34da6ff9c [clang] Apply MS ABI details on __builtin_ms_va_list on non-windows platforms on x86_64
This fixes inconsistencies in the ms_abi.c testcase.

Also add a couple cases of missing double pointers in the windows part
of the testcase; the outcome of building that testcase on windows hasn't
changed, but the previous form of the test was imprecise (checking
for "%[[STRUCT_FOO]]*" when clang actually generates "%[[STRUCT_FOO]]**"),
which still used to match.

Ideally this would share code with the native Windows case, but
X86_64ABIInfo and WinX86_64ABIInfo aren't superclasses/subclasses of
each other so it's impractical, and the code to share currently only
consists of a couple lines.

Differential Revision: https://reviews.llvm.org/D103837
2021-06-08 12:14:12 +03:00
Martin Storsjö 6de45b9e6a [clang] Fix reading long doubles with va_arg on x86_64 mingw
On x86_64 mingw, long doubles are always passed indirectly as
arguments (see an existing case in WinX86_64ABIInfo::classify);
generalize the existing code for reading varargs - any non-aggregate
type that is larger than 64 bits (which would be both long double
in mingw, and __int128) are passed indirectly too.

This makes reading varargs consistent with how they're passed,
fixing interop with both gcc and clang callers, for long double
and __int128.

Differential Revision: https://reviews.llvm.org/D103452
2021-06-07 22:34:10 +03:00
AndreyChurbanov a1f550e052 [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type
Refactored code of dependence processing and added new inoutset dependence type.
Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps.
Size of type of the dependence flag changed from 1 to 4 bytes in clang.
All dependence flags library gets so far and corresponding dependence types:
1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET.

Differential Revision: https://reviews.llvm.org/D97085
2021-06-07 21:42:51 +03:00
Hsiangkai Wang 2b13ff6979 [Clang][CodeGen] Set the size of llvm.lifetime to unknown for scalable types.
If the memory object is scalable type, we do not know the exact size of
it at compile time. Set the size of lifetime marker to unknown if the
object is scalable one.

Differential Revision: https://reviews.llvm.org/D102822
2021-06-07 23:30:13 +08:00
Nathan Sidwell ddda05add5 [clang][NFC] Break out BaseUsingDecl from UsingDecl
This is a pre-patch for adding using-enum support.  It breaks out
the shadow decl handling of UsingDecl to a new intermediate base
class, BaseUsingDecl, altering the decl hierarchy to

def BaseUsing : DeclNode<Named, "", 1>;
  def Using : DeclNode<BaseUsing>;
def UsingPack : DeclNode<Named>;
def UsingShadow : DeclNode<Named>;
  def ConstructorUsingShadow : DeclNode<UsingShadow>;

Differential Revision: https://reviews.llvm.org/D101777
2021-06-07 06:29:28 -07:00
Bradley Smith 60c9b5f35c [AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics
Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.

Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.

Differential Revision: https://reviews.llvm.org/D103082
2021-06-07 12:21:38 +01:00
Andrew Savonichev b31f41e78b [Clang] Support a user-defined __dso_handle
This fixes PR49198: Wrong usage of __dso_handle in user code leads to
a compiler crash.

When Init is an address of the global itself, we need to track it
across RAUW. Otherwise the initializer can be destroyed if the global
is replaced.

Differential Revision: https://reviews.llvm.org/D101156
2021-06-07 12:54:08 +03:00
Ten Tzen 33ba8bd2c9 [Windows SEH]: Fix -O2 crash for Windows -EHa
This patch fixes a Windows -EHa crash induced by previous commit 797ad70152.
The crash was caused by "LifetimeMarker" scope (with option -O2) that should not be considered as SEH Scope.

This change also turns off -fasync-exceptions by default under -EHa option for now.

Differential Revision: https://reviews.llvm.org/D103664#2799944
2021-06-04 14:07:44 -07:00
Alexey Bataev c84a5448b5 [OPENMP]Fix PR50129: omp cancel parallel not working as expected.
Need to emit a call for __kmpc_cancel_barrier in the exit block for
__kmpc_cancel function call if cancellation of the parallel block is
requested.

Differential Revision: https://reviews.llvm.org/D103646
2021-06-04 08:24:55 -07:00