Commit Graph

15165 Commits

Author SHA1 Message Date
David Blaikie c0a6433f2b Simplify OpenMP Lambda use
* Use default ref capture for non-escaping lambdas (this makes
  maintenance easier by allowing new uses, removing uses, having
  conditional uses (such as in assertions) not require updates to an
  explicit capture list)
* Simplify addPrivate API not to take a lambda, since it calls it
  unconditionally/immediately anyway - most callers are simply passing
  in a named value or short expression anyway and the lambda syntax just
  adds noise/overhead

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D121077
2022-03-07 18:23:20 +00:00
Qiu Chaofan b2497e5435 [PowerPC] Add generic fnmsub intrinsic
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.

But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.

This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D116015
2022-03-07 13:00:06 +08:00
Shao-Ce SUN fa9c8bab0c [RISCV] Support k-ext clang intrinsics
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112774
2022-03-05 13:57:18 +08:00
Akira Hatanaka 3717b9661f [NFC][Clang][OpaquePtr] Remove calls to Address::deprecated in
CGBlocks.cpp

Differential Revision: https://reviews.llvm.org/D120856
2022-03-03 08:54:46 -08:00
Aakanksha 840695814a [AMDGPU] Add gfx1036 target
Differential Revision: https://reviews.llvm.org/D120846
2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin 2e2e64df4a [AMDGPU] Add gfx940 target
This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688
2022-03-02 13:54:48 -08:00
Tong Zhang f76d3b800f [clang][CGStmt] fix crash on invalid asm statement
Clang is crashing on the following statement

  char var[9];
  __asm__ ("" : "=r" (var) : "0" (var));

This is similar to existing test: crbug_999160_regtest

The issue happens when EmitAsmStmt is trying to convert input to match
output type length. However, that is not guaranteed to be successful all the
time and if the statement itself is invalid like having an array type in
the example, we should give a regular error message here instead of
using assert().

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D120596
2022-03-02 11:18:55 -08:00
Akira Hatanaka d112cc2756 [NFC][Clang][OpaquePtr] Remove the call to Address::deprecated in
CreatePointerBitCastOrAddrSpaceCast

Differential Revision: https://reviews.llvm.org/D120757
2022-03-02 08:58:00 -08:00
Tong Zhang 17ce89fa80 [SanitizerBounds] Add support for NoSanitizeBounds function
Currently adding attribute no_sanitize("bounds") isn't disabling
-fsanitize=local-bounds (also enabled in -fsanitize=bounds). The Clang
frontend handles fsanitize=array-bounds which can already be disabled by
no_sanitize("bounds"). However, instrumentation added by the
BoundsChecking pass in the middle-end cannot be disabled by the
attribute.

The fix is very similar to D102772 that added the ability to selectively
disable sanitizer pass on certain functions.

In this patch, if no_sanitize("bounds") is provided, an additional
function attribute (NoSanitizeBounds) is attached to IR to let the
BoundsChecking pass know we want to disable local-bounds checking. In
order to support this feature, the IR is extended (similar to D102772)
to make Clang able to preserve the information and let BoundsChecking
pass know bounds checking is disabled for certain function.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D119816
2022-03-01 18:47:02 +01:00
Michael Kruse a66f7769a3 [OpenMPIRBuilder] Implement static-chunked workshare-loop schedules.
Add applyStaticChunkedWorkshareLoop method implementing static schedule when chunk-size is specified. Unlike a static schedule without chunk-size (where chunk-size is chosen by the runtime such that each thread receives one chunk), we need two nested loops: one for looping over the iterations of a chunk, and a second for looping over all chunks assigned to the threads.

This patch includes the following related changes:
 * Adapt applyWorkshareLoop to triage between the schedule types, now possible since all schedules have been implemented. The default schedule is assumed to be non-chunked static, as without OpenMPIRBuilder.
 * Remove the chunk parameter from applyStaticWorkshareLoop, it is ignored by the runtime. Change the value for the value passed to the init function to 0, as without OpenMPIRBuilder.
 * Refactor CanonicalLoopInfo::setTripCount and CanonicalLoopInfo::mapIndVar as used by both, applyStaticWorkshareLoop and applyStaticChunkedWorkshareLoop.
 * Enable Clang to use the OpenMPIRBuilder in the presence of the schedule clause.

Differential Revision: https://reviews.llvm.org/D114413
2022-02-28 18:18:33 -06:00
Dávid Bolvanský 223b824022 [Clang] noinline call site attribute
Motivation:

```
int foo(int x, int y) { // any compiler will happily inline this function
    return x / y;
}

int test(int x, int y) {
    int r = 0;
    [[clang::noinline]] r += foo(x, y); // for some reason we don't want any inlining here
    return r;
}

```

In 2018, @kuhar proposed "Introduce per-callsite inline intrinsics"  in https://reviews.llvm.org/D51200 to solve this motivation case (and many others).

This patch solves this problem with call site attribute. The implementation is "smaller" wrt approach which uses new intrinsics and thanks to https://reviews.llvm.org/D79121 (Add nomerge statement attribute to clang), we have got some basic infrastructure to deal with attrs on statements with call expressions.

GCC devs are more inclined to call attribute solution as well, as builtins are problematic for them - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187. But they have no patch proposal yet so..  We have free hands here.

If this approach makes sense, next future steps would be support for call site attributes for always_inline / flatten.

Reviewed By: aaron.ballman, kuhar

Differential Revision: https://reviews.llvm.org/D119061
2022-02-28 21:21:17 +01:00
Itay Bookstein f3480390be [clang][CodeGen] Avoid emitting ifuncs with undefined resolvers
The purpose of this change is to fix the following codegen bug:

```
// main.c
__attribute__((cpu_specific(generic)))
int *foo(void) { static int z; return &z;}
int main() { return *foo() = 5; }

// other.c
__attribute__((cpu_dispatch(generic))) int *foo(void);

// run:
clang main.c other.c -o main; ./main
```

This will segfault prior to the change, and return the correct
exit code 5 after the change.

The underlying cause is that when a translation unit contains
a cpu_specific function without the corresponding cpu_dispatch
the generated code binds the reference to foo() against a
GlobalIFunc whose resolver is undefined. This is invalid: the
resolver must be defined in the same translation unit as the
ifunc, but historically the LLVM bitcode verifier did not check
that. The generated code then binds against the resolver rather
than the ifunc, so it ends up calling the resolver rather than
the resolvee. In the example above it treats its return value as
an int *, therefore trying to write to program text.

The root issue at the representation level is that GlobalIFunc,
like GlobalAlias, does not support a "declaration" state. The
object which provides the correct semantics in these cases
is a Function declaration, but unlike Functions, changing a
declaration to a definition in the GlobalIFunc case constitutes
a change of the object type, as opposed to simply emitting code
into a Function.

I think this limitation is unlikely to change, so I implemented
the fix by returning a function declaration rather than an ifunc
when encountering cpu_specific, and upgrading it to an ifunc
when emitting cpu_dispatch.
This uses `takeName` + `replaceAllUsesWith` in similar vein to
other places where the correct IR object type cannot be known
locally/up-front, like in `CodeGenModule::EmitAliasDefinition`.

Previous discussion in: https://reviews.llvm.org/D112349

Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D120266
2022-02-26 11:17:49 +02:00
Adrian Prantl bc7aeea854 Revert "Don't append the working directory to absolute paths"
This reverts commit 2cd9a86da5.
2022-02-25 17:00:10 -08:00
Adrian Prantl 2cd9a86da5 Don't append the working directory to absolute paths
This fixes a bug that happens when using -fdebug-prefix-map to remap
an absolute path to a relative path. Since the path was absolute
before remapping, it is safe to assume that concatenating the remapped
working directory would be wrong.

Differential Revision: https://reviews.llvm.org/D113718
2022-02-25 13:03:59 -08:00
Alexey Bataev d04d9220e1 [OPENMP]Fix PR50347: Mapping of global scope deep object fails.
Changed the we handle llvm::Constants in sizes arrays. ConstExprs and
GlobalValues cannot be used as initializers, need to put them at the
runtime, otherwise there wight be the compilation errors.

Differential Revision: https://reviews.llvm.org/D105297
2022-02-25 10:54:24 -08:00
Shangwu Yao c2f501f395
[CUDA][SPIRV] Assign global address space to CUDA kernel arguments
(resubmit https://reviews.llvm.org/D119207 after fixing the test for
some build settings)

This patch converts CUDA pointer kernel arguments with default address
space to CrossWorkGroup address space (__global in OpenCL). This is
because Generic or Function (OpenCL's private) is not supported as
storage class for kernel pointer types.

Differential revision: https://reviews.llvm.org/D120366
2022-02-24 20:51:43 -08:00
Alexey Bataev ca6fa71b7e Revert "[OPENMP]Fix PR50347: Mapping of global scope deep object fails."
This reverts commit 638938117a. Need to
fix reported fail https://lab.llvm.org/buildbot/#/builders/193/builds/7496
2022-02-24 12:04:39 -08:00
Alexey Bataev 638938117a [OPENMP]Fix PR50347: Mapping of global scope deep object fails.
Changed the we handle llvm::Constants in sizes arrays. ConstExprs and
GlobalValues cannot be used as initializers, need to put them at the
runtime, otherwise there wight be the compilation errors.

Differential Revision: https://reviews.llvm.org/D105297
2022-02-24 11:49:14 -08:00
Joseph Huber 7aef8b3754 [OpenMP] Make section variable external to prevent collisions
Summary:
We use a section to embed offloading code into the host for later
linking. This is normally unique to the translation unit as it is thrown
away during linking. However, if the user performs a relocatable link
the sections will be merged and we won't be able to access the files
stored inside. This patch changes the section variables to have external
linkage and a name defined by the section name, so if two sections are
combined during linking we get an error.
2022-02-24 10:57:09 -05:00
Yaxun (Sam) Liu 9d899d8f01 [HIP] Support `-fgpu-default-stream`
Introduce -fgpu-default-stream={legacy|per-thread} option to
support per-thread default stream for HIP runtime.

When -fgpu-default-stream=per-thread, HIP kernels are
launched through hipLaunchKernel_spt instead of
hipLaunchKernel. Also HIP_API_PER_THREAD_DEFAULT_STREAM=1
is defined by the preprocessor to enable other per-thread stream
API's.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D120298
2022-02-23 22:28:29 -05:00
Fangrui Song 0477cac332 [asan] Allow -fsanitize-address-globals-dead-stripping with -fno-data-sections for ELF
-fdata-sections decides whether global variables go into different sections.
This is orthogonal to whether we place their metadata (`.data` or `asan_globals`) into different sections.

With -fno-data-sections, `-fsanitize-address-globals-dead-stripping` can still:

* deduplicate COMDAT `asan.module_ctor` and `asan.module_dtor`
* (with ld --gc-sections): for a data section (e.g. `.data`), if all global variables defined relative to it are unreferenced, discard them and associated `asan_globals` sections (rare but no need to exclude this case)

Similar to c7b90947bd for PE/COFF.

Reviewed By: #sanitizers, kstoimenov, vitalybuka

Differential Revision: https://reviews.llvm.org/D120394
2022-02-23 16:08:25 -08:00
Joseph Huber 119d71cb73 [OpenMP][NFC] Address warnings and lint messages in CGOpenMPRuntime
Summary:
This patch addressed the warnings and linting messages for the
CGOpenMPRuntime.cpp file. This was causing some -Werror builds to fail.
2022-02-23 18:07:25 -05:00
Reid Kleckner 1d1b089c5d Fix more unused lambda capture warnings, NFC 2022-02-23 14:07:04 -08:00
Reid Kleckner cd37594c03 Fix unused lambda capture warning, NFC 2022-02-23 14:01:01 -08:00
Joseph Huber 2b97b16f29 [OpenMP] Add option to make offloading mandatory
Currently when we generate OpenMP offloading code we always make
fallback code for the CPU. This is necessary for implementing features
like conditional offloading and ensuring that unhandled pragmas don't
result in missing symbols. However, this is problematic for a few cases.
For offloading tests we can silently fail to the host without realizing
that offloading failed. Additionally, this makes it impossible to
provide interoperabiility to other offloading schemes like HIP or CUDA
because those methods do not provide any such host fallback guaruntee.
this patch adds the `-fopenmp-offload-mandatory` flag to prevent
generating the fallback symbol on the CPU and instead replaces the
function with a dummy global and the failed branch with 'unreachable'.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D120353
2022-02-23 16:45:36 -05:00
Arthur Eubanks 4cb24ef90a [clang] Remove Address::deprecated() from CGClass.cpp 2022-02-23 13:31:56 -08:00
Arthur Eubanks 6eec483584 [clang] Remove getPointerElementType() in EmitVTableTypeCheckedLoad() 2022-02-23 09:38:33 -08:00
Nikita Popov b1863d8245 [Clang][OpenMP] Remove use of getPointerElementType()
This new pointer element type use snuck in via D118632.
2022-02-23 16:14:24 +01:00
Arthur Eubanks 36e335eeb5 [clang] Remove Address::deprecated() calls in CodeGenFunction.cpp 2022-02-22 18:28:49 -08:00
Arthur Eubanks cde658fa1f [clang] Remove Address::deprecated() calls in CGVTables.cpp 2022-02-22 16:54:28 -08:00
Arthur Eubanks 3ef7e6c53c [clang] Remove an Address::deprecated() call in CGClass.cpp 2022-02-22 16:19:06 -08:00
Shilei Tian 104d9a6743 [Clang][OpenMP] Add the codegen support for `atomic compare`
This patch adds the codegen support for `atomic compare` in clang.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D118632
2022-02-22 13:01:39 -05:00
Shilei Tian ccebf8ac8c [Clang][OpenMP] Add support for compare capture in parser
This patch adds the support for `atomic compare capture` in parser and part of
sema. We don't create an AST node for this because the spec doesn't say `compare`
and `capture` clauses should be used tightly, so we cannot look one more token
ahead in the parser.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D116261
2022-02-18 10:23:59 -05:00
Joseph Huber 0870a4f59a [OpenMP] Add flag for disabling thread state in runtime
The runtime uses thread state values to indicate when we use an ICV or
are in nested parallelism. This is done for OpenMP correctness, but it
not needed in the majority of cases. The new flag added is
`-fopenmp-assume-no-thread-state`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120106
2022-02-18 08:35:05 -05:00
Alexander Potapenko c85a26454d [asan] Add support for disable_sanitizer_instrumentation attribute
For ASan this will effectively serve as a synonym for
__attribute__((no_sanitize("address"))).

Adding the disable_sanitizer_instrumentation to functions will drop the
sanitize_XXX attributes on the IR level.

This is the third reland of https://reviews.llvm.org/D114421.
Now that TSan test is fixed (https://reviews.llvm.org/D120050) there
should be no deadlocks.

Differential Revision: https://reviews.llvm.org/D120055
2022-02-18 09:51:54 +01:00
hyeongyukim b529744c29 [Clang] Rename `disable-noundef-analysis` flag to `-[no-]enable-noundef-analysis`
This flag was previously renamed `enable_noundef_analysis` to
`disable-noundef-analysis,` which is not a conventional name. (Driver and
CC1's boolean options are using [no-] prefix)
As discussed at https://reviews.llvm.org/D105169, this patch reverts its
name to `[no-]enable_noundef_analysis` and enables noundef-analysis as
default.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D119998
2022-02-18 17:02:41 +09:00
Arthur Eubanks 0b5fe2c9f2 [clang] Remove Address::deprecated() in emitVoidPtrDirectVAArg() 2022-02-17 15:05:50 -08:00
Matthew Voss 9ce09099bb Revert "[CUDA][SPIRV] Assign global address space to CUDA kernel arguments"
This reverts commit 9de4fc0f2d.

Reverting due to test failure: https://lab.llvm.org/buildbot/#/builders/139/builds/17199
2022-02-17 14:32:10 -08:00
Arthur Eubanks ba9944ea1d [clang] Remove Address::deprecated() in CGCXXABI.h 2022-02-17 14:23:02 -08:00
Arthur Eubanks 0e219af475 [clang] Remove Address::deprecated() call in CGExprCXX.cpp 2022-02-17 13:58:26 -08:00
Shafik Yaghmour f56cb520d8 [DEBUGINFO] [LLDB] Add support for generating debug-info for structured bindings of structs and arrays
Currently we are not emitting debug-info for all cases of structured bindings a
C++17 feature which allows us to bind names to subobjects in an initializer.

A structured binding is represented by a DecompositionDecl AST node and the
binding are represented by a BindingDecl. It looks the original implementation
only covered the tuple like case which be represented by a DeclRefExpr which
contains a VarDecl.

If the binding is to a subobject of the struct the binding will contain a
MemberExpr and in the case of arrays it will contain an ArraySubscriptExpr.
This PR adds support emitting debug-info for the MemberExpr and ArraySubscriptExpr
cases as well as llvm and lldb tests for these cases as well as the tuple case.

Differential Revision: https://reviews.llvm.org/D119178
2022-02-17 11:14:14 -08:00
Shangwu Yao 9de4fc0f2d
[CUDA][SPIRV] Assign global address space to CUDA kernel arguments
This patch converts CUDA pointer kernel arguments with default address space to
CrossWorkGroup address space (__global in OpenCL). This is because Generic or
Function (OpenCL's private) is not supported as storage class for kernel pointer types.

Differential Revision: https://reviews.llvm.org/D119207
2022-02-17 09:38:06 -08:00
Simon Pilgrim 57fc9798d7 [clang] CGDebugInfo::getOrCreateMethodType - use castAs<> instead of getAs<> to avoid dereference of nullptr
The pointer is always dereferenced, so assert the cast is correct instead of returning nullptr
2022-02-17 13:18:23 +00:00
Simon Pilgrim 2614de8202 [clang] CGCXXABI::EmitLoadOfMemberFunctionPointer - use castAs<> instead of getAs<> to avoid dereference of nullptr
The pointer is always dereferenced by arrangeCXXMethodType, so assert the cast is correct instead of returning nullptr
2022-02-17 13:18:23 +00:00
Nikita Popov 5065076698 [CodeGen] Rename deprecated Address constructor
To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().

While doing the rename, I've filled in element types in cases
where it was relatively obvious, but we're still left with 135
calls to the deprecated constructor.
2022-02-17 11:26:42 +01:00
Nikita Popov fe3407a91b [CGBuilder] Assert that CreateAddrSpaceCast does not change element type
Address space casts in general may change the element type, but
don't allow it in the method working on Address, so we can
preserve the element type.

CreatePointerBitCastOrAddrSpaceCast() still needs to be addressed.
2022-02-16 15:17:08 +01:00
Chuanqi Xu d30ca5e2e2 [C++20] [Coroutines] Implement return value optimization for get_return_object
This patch tries to implement RVO for coroutine's return object got from
get_return_object.
From [dcl.fct.def.coroutine]/p7 we could know that the return value of
get_return_object is either a reference or a prvalue. So it makes sense
to do copy elision for the return value. The return object should be
constructed directly into the storage where they would otherwise be
copied/moved to.

Test Plan: folly, check-all

Reviewed By: junparser

Differential revision: https://reviews.llvm.org/D117087
2022-02-16 13:38:00 +08:00
David Blaikie 9980a3f831 DebugInfo: Disable simplified template names for -gmlt and below
Since -gmlt doesn't carry any type information necessary to rebuild
template names.
2022-02-15 11:58:40 -08:00
David Blaikie 1ea326634b DebugInfo: Don't simplify template names using _BitInt(N)
_BitInt(N) only encodes the byte size in DWARF, not the bit size, so
can't be reconstituted.
2022-02-15 11:58:40 -08:00
Alexander Potapenko 05ee1f4af8 Revert "[asan] Add support for disable_sanitizer_instrumentation attribute"
This reverts commit dd145f953d.

https://reviews.llvm.org/D119726, like https://reviews.llvm.org/D114421,
still causes TSan to fail, see https://lab.llvm.org/buildbot/#/builders/70/builds/18020

Differential Revision: https://reviews.llvm.org/D119838
2022-02-15 15:04:53 +01:00
Alexander Potapenko dd145f953d [asan] Add support for disable_sanitizer_instrumentation attribute
For ASan this will effectively serve as a synonym for
__attribute__((no_sanitize("address")))

This is a reland of https://reviews.llvm.org/D114421

Reviewed By: melver, eugenis

Differential Revision: https://reviews.llvm.org/D119726
2022-02-15 14:06:12 +01:00
Momchil Velikov 6398903ac8 Extend the `uwtable` attribute with unwind table kind
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e.  we lose the information whether to generate want just
some unwind tables or asynchronous unwind tables.

Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE) in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.

That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.

This patch extends the `uwtable` attribute with an optional
value:
      -  `uwtable` (default to `async`)
      -  `uwtable(sync)`, synchronous unwind tables
      -  `uwtable(async)`, asynchronous (instruction precise) unwind tables

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114543
2022-02-14 14:35:02 +00:00
Nikita Popov 1aeb4c6b50 [ItaniumCXXABI] Avoid pointer element type accesses 2022-02-14 15:17:14 +01:00
Nikita Popov f208644ed3 [CGBuilder] Remove CreateBitCast() method
Use CreateElementBitCast() instead, or don't work on Address
where not necessary.
2022-02-14 15:06:04 +01:00
phyBrackets de4e855204 Refactor nested if else with ternary operator in CGExprScalar.cpp
Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D119364
2022-02-13 00:15:35 +05:30
Evgenii Stepanov a730b6a41a [NFC] clang-format one function.
fix code formatting

Differential Revision: https://reviews.llvm.org/D119299
2022-02-11 15:00:29 -08:00
Weverything d5c314cdf4 [Clang][OpaquePtr] Remove deprecated Address constructor calls
Remove most calls to deprcated Address constructor in CGExpr.cpp

Differential Revision: https://reviews.llvm.org/D119496
2022-02-11 13:02:09 -08:00
Arthur Eubanks 87dd3d350c [clang][OpaquePtr] Remove call to getPointerElementType() in CodeGenModule::GetAddrOfGlobalTemporary() 2022-02-11 10:39:49 -08:00
Sameer Sahasrabuddhe d8f99bb6e0 [AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Reviewed By: jdoerfert, arsenm, kpyzhov

Differential Revision: https://reviews.llvm.org/D119216
2022-02-11 22:51:56 +05:30
Simon Pilgrim 9ece72c159 [clang] VisitCastExpr - use cast<> instead of dyn_cast<> to avoid dereference of nullptr
The pointer is always dereferenced, so assert the cast is correct (which it should be as we just created that ScalableVectorType) instead of returning nullptr
2022-02-11 10:51:34 +00:00
Sander de Smalen 0b41238ae7 [AArch64] Emit TBAA metadata for SVE load/store intrinsics
In Clang we can attach TBAA metadata based on the load/store intrinsics
based on the operation's element type.

This also contains changes to InstCombine where the AArch64-specific
intrinsics are transformed into generic LLVM load/store operations,
to ensure that all metadata is transferred to the new instruction.

There will be some further work after this patch to also emit TBAA
metadata for SVE's gather/scatter- and struct load/store intrinsics.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D119319
2022-02-11 09:00:29 +00:00
Arthur Eubanks e487ddc5c6 [clang][OpaquePtr] Use proper Address constructor in AtomicInfo::getAtomicAddress() 2022-02-10 18:29:51 -08:00
David Blaikie 389f67b35b DebugInfo: Don't simplify names referencing local enums
Due to the way type units work, this would lead to a declaration in a
type unit of a local type in a CU - which is ambiguous. Rather than
trying to resolve that relative to the CU that references the type unit,
let's just not try to simplify these names.

Longer term this should be fixed by not putting the template
instantiation in a type unit to begin with - since it references an
internal linkage type, it can't legitimately be duplicated/in more than
one translation unit, so skip the type unit overhead. (but the right fix
for that is to move type unit management into a DICompositeType flag
(dropping the "identifier" field is not a perfect solution since it
breaks LLVM IR linking decl/def merging during IR linking))
2022-02-10 15:51:47 -08:00
David Blaikie 26c5cf8fa0 Fix Windows build that fails if a class has a member with the same naem 2022-02-10 15:27:31 -08:00
Yuanfang Chen f927021410 Reland "[clang-cl] Support the /JMC flag"
This relands commit b380a31de0.

Restrict the tests to Windows only since the flag symbol hash depends on
system-dependent path normalization.
2022-02-10 15:16:17 -08:00
David Blaikie f3a2cfc103 DebugInfo: Don't simplify any template referencing a lambda
Lambda names aren't entirely canonical (as demonstrated by the
cross-project-test added here) at the moment (we should fix that for a
bunch of reasons) - even if the template referencing them is
non-simplified, other names referencing /that/ template can't be
simplified either because type units might cause a different template to
be picked up that would conflict with the expected name.

(other than for roundtripping precision, it'd be OK to simplify types
that reference types that reference lambdas - but best be consistent
between the roundtrip/verify mode and the actual simplified template
names mode)
2022-02-10 14:56:54 -08:00
Yuanfang Chen b380a31de0 Revert "[clang-cl] Support the /JMC flag"
This reverts commit bd3a1de683.

Break bots:
https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8822587673277278177/overview
2022-02-10 14:17:37 -08:00
Yuanfang Chen bd3a1de683 [clang-cl] Support the /JMC flag
The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/

The `/JMC` flag enables these instrumentations:
- Insert at the beginning of every function immediately after the prologue with
  a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`.
  The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean
  global variable (the global variable is initialized to 1) with the name
  convention `__<hash>_<filename>`. All such global variables are placed in
  the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping
  with a directory path. MSVC uses some unknown hashing function. Here I
  used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
  option via ".drectve" section. This is to prevent failure in
  case `__CheckForDebuggerJustMyCode` is not provided during linking.

Implementation:
All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch).

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D118428
2022-02-10 10:26:30 -08:00
Yaxun (Sam) Liu 1d97cb1f6e [HIP] Emit amdgpu_code_object_version module flag
code object version determines ABI, therefore should not be mixed.

This patch emits amdgpu_code_object_version module flag in LLVM IR
based on code object version (default 4).

The amdgpu_code_object_version value is code object version times 100.

LLVM IR with different amdgpu_code_object_version module flag cannot
be linked.

The -cc1 option -mcode-object-version=none is for ROCm device library use
only, which supports multiple ABI.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D119026
2022-02-08 21:58:40 -05:00
Bill Wendling deaf22bc0e [X86] Implement -fzero-call-used-regs option
The "-fzero-call-used-regs" option tells the compiler to zero out
certain registers before the function returns. It's also available as a
function attribute: zero_call_used_regs.

The two upper categories are:

  - "used": Zero out used registers.
  - "all": Zero out all registers, whether used or not.

The individual options are:

  - "skip": Don't zero out any registers. This is the default.
  - "used": Zero out all used registers.
  - "used-arg": Zero out used registers that are used for arguments.
  - "used-gpr": Zero out used registers that are GPRs.
  - "used-gpr-arg": Zero out used GPRs that are used as arguments.
  - "all": Zero out all registers.
  - "all-arg": Zero out all registers used for arguments.
  - "all-gpr": Zero out all GPRs.
  - "all-gpr-arg": Zero out all GPRs used for arguments.

This is used to help mitigate Return-Oriented Programming exploits.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110869
2022-02-08 17:42:54 -08:00
Arthur Eubanks f05a63f9a0 [clang] Properly cache member pointer LLVM types
When not going through the main Clang->LLVM type cache, we'd
accidentally create multiple different opaque types for a member pointer
type.

This allows us to remove the -verify-type-cache flag now that
check-clang passes with it on. We can do the verification in expensive
builds. Previously microsoft-abi-member-pointers.cpp was failing with
-verify-type-cache.

I suspect that there may be more issues when we have multiple member
pointer types and we clear the cache, but we can leave that for later.

Followup to D118744.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D119215
2022-02-08 13:22:24 -08:00
Dawid Jurczak 5d8d3a11c4 [NFC] Increase initial size of FoldingSets used in ASTContext and CodeGenTypes
Among many FoldingSet users most notable seem to be ASTContext and CodeGenTypes.
The reasons that we spend not-so-tiny amount of time in FoldingSet calls from there, are following:

  1. Default FoldingSet capacity for 2^6 items very often is not enough.
     For PointerTypes/ElaboratedTypes/ParenTypes it's not unlikely to observe growing it to 256 or 512 items.
     FunctionProtoTypes can easily exceed 1k items capacity growing up to 4k or even 8k size.

  2. FoldingSetBase::GrowBucketCount cost itself is not very bad (pure reallocations are rather cheap thanks to BumpPtrAllocator).
     What matters is high collision rate when lot of items end up in same bucket slowing down FoldingSetBase::FindNodeOrInsertPos and trashing CPU cache
     (as items with same hash are organized in intrusive linked list which need to be traversed).

This change address both issues by increasing initial size of FoldingSets used in ASTContext and CodeGenTypes.

Extracted from: https://reviews.llvm.org/D118385

Differential Revision: https://reviews.llvm.org/D118608
2022-02-08 17:54:04 +01:00
Nikita Popov 18834dca2d [OpenCL] Mark kernel arguments as ABI aligned
Following the discussion on D118229, this marks all pointer-typed
kernel arguments as having ABI alignment, per section 6.3.5 of
the OpenCL spec:

> For arguments to a __kernel function declared to be a pointer to
> a data type, the OpenCL compiler can assume that the pointee is
> always appropriately aligned as required by the data type.

Differential Revision: https://reviews.llvm.org/D118894
2022-02-08 16:12:51 +01:00
Simon Pilgrim 09857a4bd1 [X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat
D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions

This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes:

__m256i test_mm256_adds_epi8(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_adds_epi8
  // CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.*}}, <32 x i8> %{{.*}})
  return _mm256_adds_epi8(a, b);
}
2022-02-08 15:00:10 +00:00
Simon Pilgrim a59faf272e Revert rG6c174ab2ad0676b295f11f6c3913eff9289fa6b9 "[X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat"
Missed some legacy builtin tests that need cleaning up first
2022-02-08 14:45:28 +00:00
Simon Pilgrim 6c174ab2ad [X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat
D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions

This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes:

__m256i test_mm256_adds_epi8(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_adds_epi8
  // CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.*}}, <32 x i8> %{{.*}})
  return _mm256_adds_epi8(a, b);
}
2022-02-08 14:21:20 +00:00
David Pagan 0a7cc078ac Enable inoutset dependency-type in depend clause.
Done in manner similar to mutexinoutset
(see https://reviews.llvm.org/D57576)

Runtime support already exists in LLVM OpenMP runtime (see
https://reviews.llvm.org/D97085).

The value used to identify an inoutset dependency type in the LLVM
OpenMP runtime is 8.

Some tests updated due to change in dependency type error messages that
now include new dependency type. Also updated
test/OpenMP/task_codegen.cpp to verify we emit the right code.
2022-02-08 08:35:36 -05:00
Simon Pilgrim c00db97159 [Clang] Add elementwise saturated add/sub builtins
This patch implements `__builtin_elementwise_add_sat` and `__builtin_elementwise_sub_sat` builtins.

These map to the add/sub saturated math intrinsics described here:
https://llvm.org/docs/LangRef.html#saturation-arithmetic-intrinsics

With this in place we should then be able to replace the x86 SSE adds/subs intrinsics with these generic variants - it looks like other targets should be able to use these as well (arm/aarch64/webassembly all have similar examples in cgbuiltin).

Differential Revision: https://reviews.llvm.org/D117898
2022-02-08 11:22:01 +00:00
Arthur Eubanks 45084eab5e [clang] Fix some clang->llvm type cache invalidation issues
Take the following as an example

  struct z {
    z (*p)();
  };

  z f();

When we attempt to get the LLVM type of f, we recurse into z. z itself
has a function pointer with the same type as f. Given the recursion,
Clang simply treats z::p as a pointer to an empty struct `{}*`. The
LLVM type of f is as expected. So we have two different potential
LLVM types for a given Clang type. If we store one of those into the
cache, when we access the cache with a different context (e.g. we
are/aren't recursing on z) we may get an incorrect result. There is some
attempt to clear the cache in these cases, but it doesn't seem to handle
all cases.

This change makes it so we only use the cache when we are not in any
sort of function context, i.e. `noRecordsBeingLaidOut() &&
FunctionsBeingProcessed.empty()`, which are the cases where we may
decide to choose a different LLVM type for a given Clang type. LLVM
types for builtin types are never recursive so they're always ok.

This allows us to clear the type cache less often (as seen with the
removal of one of the calls to `TypeCache.clear()`). We
still need to clear it when we use a placeholder type then replace it
later with the final type and other dependent types need to be
recalculated.

I've added a check that the cached type matches what we compute. It
triggered in this test case without the fix. It's currently not
check-clang clean so it's not on by default for something like expensive
checks builds.

This change uncovered another issue where the LLVM types for an argument
and its local temporary don't match. For example in type-cache-3, when
expanding z::dc's argument into a temporary alloca, we ConvertType() the
type of z::p which is `void ({}*)*`, which doesn't match the alloca GEP
type of `{}*`.

No noticeable compile time changes:
https://llvm-compile-time-tracker.com/compare.php?from=3918dd6b8acf8c5886b9921138312d1c638b2937&to=50bdec9836ed40e38ece0657f3058e730adffc4c&stat=instructions

Fixes #53465.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D118744
2022-02-07 18:59:09 -08:00
Arthur Eubanks 2724c153f9 [clang] Cache OpenCL types
If we call CGOpenCLRuntime::convertOpenCLSpecificType() multiple times
we should get the same type back.

Reviewed By: svenvh

Differential Revision: https://reviews.llvm.org/D119011
2022-02-07 09:23:04 -08:00
Nikita Popov c45a99f36b [MatrixBuilder] Require explicit element type in CreateColumnMajorLoad()
This makes the method compatible with opaque pointers.
2022-02-07 16:57:33 +01:00
Nikita Popov cdc0573f75 [MatrixBuilder] Remove unnecessary IRBuilder template (NFC)
IRBuilderBase exists specifically to avoid the need for this.
2022-02-07 16:42:38 +01:00
Yaxun (Sam) Liu 171da443d5 [HIPSPV] Fix literals are mapped to Generic address space
This issue is an oversight in D108621.

Literals in HIP are emitted as global constant variables with default
address space which maps to Generic address space for HIPSPV. In
SPIR-V such variables translate to OpVariable instructions with
Generic storage class which are not legal. Fix by mapping literals
to CrossWorkGroup address space.

The literals are not mapped to UniformConstant because the “flat”
pointers in HIP may reference them and “flat” pointers are modeled
as Generic pointers in SPIR-V. In SPIR-V/OpenCL UniformConstant
pointers may not be casted to Generic.

Patch by: Henry Linjamäki

Reviewed by: Yaxun Liu

Differential Revision: https://reviews.llvm.org/D118876
2022-02-05 17:26:52 -05:00
James Y Knight caa1ebde70 Don't assume that a new cleanup was added to InnermostEHScope.
After fa87fa97fb, this was no longer guaranteed to be the cleanup
just added by this code, if IsEHCleanup got disabled. Instead, use
stable_begin(), which _is_ guaranteed to be the cleanup just added.

This caused a crash when a object that is callee destroyed (e.g. with the MS ABI) was passed in a call from a noexcept function.

Added a test to verify.

Fixes: fa87fa97fb
2022-02-04 23:39:42 -05:00
Joseph Huber 034adaf5be [OpenMP] Completely remove old device runtime
This patch completely removes the old OpenMP device runtime. Previously,
the old runtime had the prefix `libomptarget-new-` and the old runtime
was simply called `libomptarget-`. This patch makes the formerly new
runtime the only runtime available. The entire project has been deleted,
and all references to the `libomptarget-new` runtime has been replaced
with `libomptarget-`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D118934
2022-02-04 15:31:33 -05:00
Shilei Tian b35be6fe98 [Clang][Sema][OpenMP] Sema support for `atomic compare`
This patch adds the Sema support for `atomic compare`.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D116637
2022-02-04 12:30:56 -05:00
Hans Wennborg 853e0aa424 Don't dllexport reference temporaries
Even if the reference itself is dllexport, the temporary should not be.
In fact, we're already giving it internal linkage, so dllexporting it
is not just wasteful, but will fail to link, as in the example below:

  $ cat /tmp/a.cc
  void _DllMainCRTStartup() {}
  const int __declspec(dllexport) &foo = 42;

  $ clang-cl -fuse-ld=lld /tmp/a.cc /Zl /link /dll /out:a.dll
  lld-link: error: <root>: undefined symbol: int const &foo::$RT1

Differential revision: https://reviews.llvm.org/D118980
2022-02-04 16:31:51 +01:00
John Brawn bca998ed3c [AArch64] Generate fcmps when appropriate for neon intrinsics
Differential Revision: https://reviews.llvm.org/D118257
2022-02-04 12:55:38 +00:00
Jan Svoboda 42afaf7f47 [clang][CodeGen] Use memory type representation in `va_arg`
Some types (e.g. `_Bool`) have different scalar and memory representations. CodeGen for `va_arg` didn't take this into account, leading to an assertion failures with different types.

This patch makes sure we use memory representation for `va_arg`.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D118904
2022-02-04 12:10:57 +01:00
James Y Knight fa87fa97fb Skip exception cleanups when the innermost scope is EHTerminateScope.
EHTerminateScope is used to implement C++ noexcept semantics. Per C++
[except.terminate], it is implemented-defined whether no, some, or all
cleanups are run prior to terminatation.

Therefore, the code to run cleanups on the way towards termination is
unnecessary, and may be omitted.

After this change, we will still run some cleanups: any cleanups in a
function called from the noexcept function will continue to run, while
those in the noexcept function itself will not.

(Commit attempt 2: check InnermostEHScope != stable_end() before accessing it.)

Differential Revision: https://reviews.llvm.org/D113620
2022-02-02 17:50:18 -05:00
Rainer Orth efdd0a29b7 [clang][Sparc] Fix __builtin_extract_return_addr etc.
While investigating the failures of `symbolize_pc.cpp` and
`symbolize_pc_inline.cpp` on SPARC (both Solaris and Linux), I noticed that
`__builtin_extract_return_addr` is a no-op in `clang` on all targets, while
`gcc` has non-default implementations for arm, mips, s390, and sparc.

This patch provides the SPARC implementation.  For background see
`SparcISelLowering.cpp` (`SparcTargetLowering::LowerReturn_32`), the SPARC
psABI p.3-12, `%i7` and p.3-16/17, and SCD 2.4.1, p.3P-10, `%i7` and
p.3P-15.

Tested (after enabling the `sanitizer_common` tests on SPARC) on
`sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D91607
2022-02-02 19:20:02 +01:00
Alex Lorenz 116c1bea65 [clang][macho] add clang frontend support for emitting macho files with two build version load commands
This patch extends clang frontend to add metadata that can be used to emit macho files with two build version load commands.
It utilizes "darwin.target_variant.triple" and "darwin.target_variant.SDK Version" metadata names for that.

MachO uses two build version load commands to represent an object file / binary that is targeting both the macOS target,
and the Mac Catalyst target. At runtime, a dynamic library that supports both targets can be loaded from either a native
macOS or a Mac Catalyst app on a macOS system. We want to add support to this to upstream to LLVM to be able to build
compiler-rt for both targets, to finish the complete support for the Mac Catalyst platform, which is right now targetable
by upstream clang, but the compiler-rt bits aren't supported because of the lack of this multiple build version support.

Differential Revision: https://reviews.llvm.org/D115415
2022-02-02 08:30:39 -08:00
serge-sans-paille e188aae406 Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/

I've tried to summarize the biggest change below:

- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h

And the usual count of preprocessed lines:
$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after:  6189948

200k lines less to process is no that bad ;-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup

Differential Revision: https://reviews.llvm.org/D118652
2022-02-02 06:54:20 +01:00
Joseph Huber 53d5757ea2 [OpenMP] Add kernel string attribute to kernel function
This patch adds a function attribute to the kernel function generated in
OpenMP offloading. We already create a `nvvm.annotations` metadata node
indicating the kernels present in the program. However, this created
some indirection when trying to identify if a specific function was an
entry. We add a single function attribute for each function now to
simplify this.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118708
2022-02-01 13:49:31 -05:00
Fangrui Song 7aaf024dac [BitcodeWriter] Fix cases of some functions
`WriteIndexToFile` is used by external projects so I do not touch it.
2022-01-31 16:46:11 -08:00
Fangrui Song 85dfe19b36 [ModuleUtils] Move EmbedBufferInModule to LLVMTransformsUtils
D116542 adds EmbedBufferInModule which introduces a layer violation
(https://llvm.org/docs/CodingStandards.html#library-layering).
See 2d5f857a1e for detail.

EmbedBufferInModule does not use BitcodeWriter functionality and should be moved
LLVMTransformsUtils. While here, change the function case to the prevailing
convention.

It seems that EmbedBufferInModule just follows the steps of
EmbedBitcodeInModule. EmbedBitcodeInModule calls WriteBitcodeToFile but has IR
update operations which ideally should be refactored to another library.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D118666
2022-01-31 16:33:57 -08:00
Itay Bookstein 2a868802a3 [clang][CodeGen][NFC] Remove unused CodeGenModule fields
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D118619
2022-01-31 23:45:53 +02:00
Joseph Huber 551b177452 [OpenMP] Add a flag for embedding a file into the module
This patch adds support for a flag `-fembed-offload-binary` to embed a
file as an ELF section in the output by placing it in a global variable.
This can be used to bundle offloading files with the host binary so it
can be accessed by the linker. The section is named using the
`-fembed-offload-section` option.

Depends on D116541

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D116542
2022-01-31 15:56:00 -05:00
tyb0807 51e188d079 [AArch64] Support for memset tagged intrinsic
This introduces a new ACLE intrinsic for memset tagged
(https://github.com/ARM-software/acle/blob/next-release/main/acle.md#memcpy-family-of-operations-intrinsics---mops).

  void *__builtin_arm_mops_memset_tag(void *, int, size_t)

A corresponding LLVM intrinsic is introduced:

  i8* llvm.aarch64.mops.memset.tag(i8*, i8, i64)

The types match llvm.memset but the return type is not void.

This is part 1/4 of a series of patches split from
https://reviews.llvm.org/D117405 to facilitate reviewing.

Patch by Tomas Matheson

Differential Revision: https://reviews.llvm.org/D117753
2022-01-31 20:49:34 +00:00
Ben Shi 653836251a [clang][AVR] Set '-fno-use-cxa-atexit' to default
AVR is baremetal environment, so the avr-libc does not support
'__cxa_atexit()'.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D118445
2022-01-30 02:26:19 +00:00
Weverything be2147db05 Remove reference type when checking const structs
ConstStructBuilder::Finalize in CGExprConstant.ccp assumes that the
passed in QualType is a RecordType.  In some instances, the type is a
reference to a RecordType and the reference needs to be removed first.

Differential Revision: https://reviews.llvm.org/D117376
2022-01-28 13:08:58 -08:00
Amilendra Kodithuwakku 1f08b08674 [clang][ARM] Emit warnings when PACBTI-M is used with unsupported architectures
Branch protection in M-class is supported by
 - Armv8.1-M.Main
 - Armv8-M.Main
 - Armv7-M

Attempting to enable this for other architectures, either by
command-line (e.g -mbranch-protection=bti) or by target attribute
in source code (e.g.  __attribute__((target("branch-protection=..."))) )
will generate a warning.

In both cases function attributes related to branch protection will not
be emitted. Regardless of the warning, module level attributes related to
branch protection will be emitted when it is enabled via the command-line.

The following people also contributed to this patch:
- Victor Campos

Reviewed By: chill

Differential Revision: https://reviews.llvm.org/D115501
2022-01-28 09:59:58 +00:00
Joseph Huber 2945f11c60 [OpenMP] Only generate runtime flags with host input
This patch changes the code generation of runtime flags to only occur if
a host bitcode file was passed in. This is a cheap way to determine if
we are compiling the OpenMP device runtime itself or user code. This is
needed because the global flags we generate for the device runtime e.g.
__omp_rtl_debug_kind were being generated with default values when we
compiled the runtime library. This would then invalidate the ones we
want to be able to add in when the user defines it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118399
2022-01-27 18:43:41 -05:00
Arthur Eubanks 662ef6d177 [NFC][Clang][OpaquePtr] Move away from deprecated Address constructor in VisitArrayInitLoopExpr
With this we can bootstrap an `-O0 -g0` clang with `-mllvm -opaque-pointers`!
2022-01-27 14:44:53 -08:00
Arthur Eubanks 6e8a66bdad [NFC][Clang][OpaquePtr] Move away from deprecated Address constructor in EmitCXXMemberDataPointerAddress() 2022-01-27 14:44:53 -08:00
Arthur Eubanks f17123831e [NFC][Clang][OpaquePtr] Move away from deprecated Address constructor in CreateTempAlloca()
Specify the Address element type, which is the bitcast destination type.
(the whole bitcast won't be needed after opaque pointers)
2022-01-27 14:18:54 -08:00
Arthur Eubanks 63cf2063a2 [NFC][Clang][OpaquePtr] Move away from deprecated Address constructor in EmitNewArrayInitializer()
Specify the Address element type, which is the same for all pointers in the array.
2022-01-27 14:00:16 -08:00
Sri Hari Krishna Narayanan 5aa24558cf OMPIRBuilder for Interop directive
Implements the OMPIRBuilder portion for the
Interop directive.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105876
2022-01-27 14:53:18 -05:00
David Green 82973edfb7 [ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics
Since it's introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.

Fixes #53120 and #50905 and #51761.

Differential Revision: https://reviews.llvm.org/D117592
2022-01-27 19:19:46 +00:00
Dawid Jurczak b88ca619d3 [NFC][CodeGen] Use llvm::DenseMap for DeferredDecls
CodeGenModule::DeferredDecls std::map::operator[] seem to be hot especially while code generating huge compilation units.
In such cases using DenseMap instead gives observable compile time improvement. Patch was tested on Linux build with default config acting as benchmark.
Build was performed on isolated CPU cores in silent x86-64 Linux environment following: https://llvm.org/docs/Benchmarking.html#linux rules.
Compile time statistics diff produced by perf and time before and after change are following:
instructions -0.15%, cycles -0.7%, max-rss +0.65%.
Using StringMap instead DenseMap doesn't bring any visible gains.

Differential Revision: https://reviews.llvm.org/D118169
2022-01-27 10:57:48 +01:00
Ahmed Bougacha ecb502342c [ObjC] Emit selector load right before msgSend call.
We currently emit the selector load early, but only because we need
it to compute the signature (so that we know which msgSend variant to
call).  We can prepare the signature with a plain undef, and replace
it with the materialized selector value if (and only if) needed, later.

Concretely, this usually doesn't have an effect, but tests need updating
because we reordered the receiver bitcast and the selector load, which
is always fine.

There is one notable change: with this, when a msgSend needs a
receiver null check, the selector is now loaded in the non-null
block, instead of before the null check.  That should be a mild
improvement.
2022-01-26 20:52:54 -08:00
Arthur Eubanks eee97f1617 [clang] Use proper type to left shift after D117262
Causing warnings like
warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits
as reported in D117262.
2022-01-26 17:54:37 -08:00
Arthur Eubanks 6a953d931c [clang] Fix -Wsubobject-linkage after D117262
/home/buildbot/llvm-avr-linux/llvm-avr-linux/llvm/clang/lib/CodeGen/Address.h:76:7: warning: 'clang::CodeGen::Address' has a field 'clang::CodeGen::Address::A' whose type uses the anonymous namespace [-Wsubobject-linkage]

https://lab.llvm.org/buildbot/#/builders/112/builds/12047
2022-01-26 11:43:44 -08:00
Arthur Eubanks b1613f05ae [NFC] Store Address's alignment into PointerIntPairs
This mitigates the extra memory caused by D115725.

On 32-bit arches where we only have 2 bits per PointerIntPair we fall
back to simply storing alignment separately.

Reviewed By: rnk, nikic

Differential Revision: https://reviews.llvm.org/D117262
2022-01-26 10:35:28 -08:00
Benjamin Kramer f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef82063207.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a conquence move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00
JackAKirk 0ad19a8331 [CUDA,NVPTX] Corrected fragment size for tf32 LD B matrix.
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D118023
2022-01-25 11:29:19 -08:00
Nikita Popov 30d4a7e295 [IRBuilder] Require explicit element type in CreatePtrDiff()
For opaque pointer compatibility, we cannot derive the element
type from the pointer type.
2022-01-25 12:43:57 +01:00
Nikita Popov caff8591ef [OpenMP] Simplify pointer comparison
Rather than checking ptrdiff(a, b) != 0, directly check a != b.
2022-01-25 12:38:37 +01:00
Nikita Popov 99adacbcb7 [clang] Remove some getPointerElementType() uses
Same cases where the call can be removed in a straightforward way.
2022-01-25 12:09:06 +01:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Simon Pilgrim e4074432d5 [X86] Remove avx512f integer and/or/xor/min/max reduction intrinsics and use generic equivalents
None of these have any reordering issues, and they still emit the same reduction intrinsics without any change in the existing test coverage:

llvm-project\clang\test\CodeGen\X86\avx512-reduceIntrin.c
llvm-project\clang\test\CodeGen\X86\avx512-reduceMinMaxIntrin.c

Differential Revision: https://reviews.llvm.org/D117881
2022-01-24 11:57:53 +00:00
Simon Pilgrim 3e50593b18 [X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions

This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_max_epu32
  // CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
  return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Sibling patch to D117791

Differential Revision: https://reviews.llvm.org/D117798
2022-01-24 11:40:29 +00:00
Simon Pilgrim e5147f82e1 [X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)

This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
  // CHECK-LABEL: test_mm256_abs_epi8
  // CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
  return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Differential Revision: https://reviews.llvm.org/D117791
2022-01-24 11:25:21 +00:00
Wei Wang 55d887b833 [time-trace] Add optimizer and codegen regions to NPM
Optimizer and codegen regions were only added to legacy PM. Add
them to NPM as well.

Differential Revision: https://reviews.llvm.org/D117605
2022-01-21 19:17:57 -08:00
Simon Pilgrim 0abaf64580 Revert rG4727d29d908f9dd608dd97a58c0af1ad579fd3ca "[X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs"
Some build bots are referencing the `__builtin_ia32_pabs` intrinsics via alternative headers
2022-01-21 12:35:36 +00:00
Simon Pilgrim 3ef88b3184 Revert rG8ee135dcf8ff060656ad481c3e980fe8763576f5 "[X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`"
Some build bots are referencing the `__builtin_ia32_pmax/min` intrinsics via alternative headers
2022-01-21 12:34:19 +00:00
Simon Pilgrim 8ee135dcf8 [X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions

This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_max_epu32
  // CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
  return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Sibling patch to D117791

Differential Revision: https://reviews.llvm.org/D117798
2022-01-21 12:24:58 +00:00
Simon Pilgrim 4727d29d90 [X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)

This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
  // CHECK-LABEL: test_mm256_abs_epi8
  // CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
  return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Differential Revision: https://reviews.llvm.org/D117791
2022-01-21 11:59:08 +00:00
Joao Moreira 82af95029e [X86] Enable ibt-seal optimization when LTO is used in Kernel
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instruction on function's prologues. Because this is a security feature, it is desirable that only actual indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end being reachable through PLT entries, that will use an indirect branch for such. Because this cannot be determined during compilation-time, the compiler currently emits ENDBRs to every non-local-linkage function.

Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1].

This diff brings the enablement of the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement in when the code-model is set to "kernel". In this scenario, the compiler will only emit ENDBRs to address taken functions, ignoring non-address taken functions that are don't have local linkage.

A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.

The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.

[1] - https://lkml.org/lkml/2021/11/22/591

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D116070
2022-01-21 10:55:34 +08:00
Alexandre Ganea 5af2433e17 [clang-cl] Support the /HOTPATCH flag
This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019

The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targetting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching).

Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: d703b92296/lld/COFF/Writer.cpp (L1298)

The outcome is that we can finally use Live++ or Recode along with clang-cl.

NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result.

Depends on D43002, D80833 and D81301 for the full feature.

Differential Revision: https://reviews.llvm.org/D116511
2022-01-20 12:57:19 -05:00
Florian Hahn 67aa314bce
[IRGen] Do not overwrite existing attributes in CGCall.
When adding new attributes, existing attributes are dropped. While
this appears to be a longstanding issue, this was highlighted by D105169
which dropped a lot of attributes due to adding the new noundef
attribute.

Ahmed Bougacha (@ab) tracked down the issue and provided the fix in
CGCall.cpp. I bundled it up and updated the tests.
2022-01-20 13:45:19 +00:00
Chenbing.Zheng 0be3da1fab [RISCV] Add intrinsic for Zbt extension
RV32: fsl, fsr, fsri
RV64: fsl, fsr, fsri, fslw, fsrw, fsriw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117468
2022-01-20 08:27:05 +00:00
Yaxun (Sam) Liu 85c2bd2a0e Prevent adding module flag amdgpu_hostcall multiple times
HIP program with printf call fails to compile with -fsanitize=address
option, because of appending module flag - amdgpu_hostcall twice, one
for printf and one for sanitize option. This patch fixes that issue.

Patch by: Praveen Velliengiri

Reviewed by: Yaxun Liu, Roman Lebedev

Differential Revision: https://reviews.llvm.org/D116216
2022-01-19 12:52:33 -05:00
Arnamoy Bhattacharyya 9fbd33ad62 [OMPIRBuilder] Add support for simd (loop) directive.
This patch adds OMPIRBuilder support for the simd directive (without any clause).  This will be a first step towards lowering simd directive in LLVM_Flang.  The patch uses existing CanonicalLoop infrastructure of IRBuilder to add the support.  Also adds necessary code to add llvm.access.group and llvm.loop metadata wherever needed.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D114379
2022-01-19 11:32:17 -05:00
Ben Shi a2f488c6a5 [clang][AVR] Implement '__flashN' for variables on different flash banks
Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D115982
2022-01-19 11:24:01 +00:00
hyeongyu kim 1b1c8d83d3 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2022-01-16 18:54:17 +09:00
Nikita Popov c63a3175c2 [AttrBuilder] Remove ctor accepting AttributeList and Index
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should exact the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
2022-01-15 22:39:31 +01:00
James Y Knight 0d3f2fd269 Revert "Skip exception cleanups when the innermost scope is EHTerminateScope."
Breaks tests on some platforms. Reverting while investigating.

This reverts commit a4e255f9c6.
2022-01-14 18:59:24 -05:00
Matt Arsenault 33315ef321 clang/AMDGPU: Don't set implicit arg attribute to default size
Since 2959e082e1, we conservatively
assume all inputs are enabled by default. This isn't the best
interface for controlling these anyway, since it's not granular and
only allows trimming the last fields.
2022-01-14 18:43:30 -05:00
James Y Knight a4e255f9c6 Skip exception cleanups when the innermost scope is EHTerminateScope.
EHTerminateScope is used to implement C++ noexcept semantics. Per C++
[except.terminate], it is implemented-defined whether no, some, or all
cleanups are run prior to terminatation.

Therefore, the code to run cleanups on the way towards termination is
unnecessary, and may be omitted.

After this change, we will still run some cleanups: any cleanups in a
function called from the noexcept function will continue to run, while
those in the noexcept function itself will not.

Differential Revision: https://reviews.llvm.org/D113620
2022-01-14 18:01:29 -05:00
Erich Keane 2bcba21c8b [CPU-Dispatch] Make sure Dispatch names get updated if previously mangled
Cases where there is a mangling of a cpu-dispatch/cpu-specific function
before the function becomes 'multiversion' (such as a member function)
causes the wrong name to be emitted for one of the variants/resolver,
since the name is cached.  Make sure we invalidate the cache in
cpu-dispatch/cpu-specific modes, like we previously did for just target
multiversioning.
2022-01-14 10:45:55 -08:00
Benjamin Kramer 765dd8b8a4 [CGBuiltin] Simplify code. NFCI. 2022-01-14 16:02:02 +01:00
Jun Zhang 8de0c1feca
[Clang] Add __builtin_reduce_or and __builtin_reduce_and
This patch implements two builtins specified in D111529.
The last __builtin_reduce_add will be seperated into another one.

Differential Revision: https://reviews.llvm.org/D116736
2022-01-14 22:05:26 +08:00
Kevin Athey a0458b531c Add -fsanitize-address-param-retval to clang.
With the introduction of this flag, it is no longer necessary to enable noundef analysis with 4 separate flags.
(-Xclang -enable-noundef-analysis -mllvm -msan-eager-checks=1).
This change only covers the introduction into the compiler.

This is a follow up to: https://reviews.llvm.org/D116855

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D116633
2022-01-14 00:41:28 -08:00
Maurice Heumann 072e2a7c67 [MS] Implement on-demand TLS initialization for Microsoft CXX ABI
TLS initializers, for example constructors of thread-local variables, don't necessarily get called. If a thread was created before a module is loaded, the module's TLS initializers are not executed for this particular thread.

This is why Microsoft added support for dynamic TLS initialization. Before every use of thread-local variables, a check is added that runs the module's TLS initializers on-demand.

To do this, the method `__dyn_tls_on_demand_init` gets called. Internally, it simply calls `__dyn_tls_init`.

No additional TLS initializer that sets the guard needs to be emitted, as the guard always gets set by `__dyn_tls_init`.
The guard is also checked again within `__dyn_tls_init`. This makes our check redundant, however, as Microsoft's compiler also emits this check, the behaviour is adopted here.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D115456
2022-01-13 21:23:23 -08:00
Elizabeth Andrews 4eaf5846d0 [clang] Fix function pointer address space
Functions pointers should be created with program address space. This
patch introduces program address space in TargetInfo. Targets with
non-default (default is 0) address space for functions should explicitly
set this value. This patch fixes a crash on lvalue reference to function
pointer (in device code) when using oneAPI DPC++ compiler.

Differential Revision: https://reviews.llvm.org/D111566
2022-01-13 08:06:19 -08:00
Erich Keane b699e8b11a Add another assert to cpu-dispatch emission to help track down a tough
to repro error.

As mentioned yesterday, I've got a problem that I can only reproduce on
Godbolt (none of the build configs on my local machine!), so this is at
least somewhat usable until I figure out a cause.
2022-01-13 06:54:08 -08:00
Lian Wang 16877c5d2c [RISCV] Add bfp and bfpw intrinsic in zbf extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116994
2022-01-13 02:53:00 +00:00
Kevin Athey a141e47138 [NFC] Minimize noundef analysis when disabled
Minor adjustment in order of noundef analysis to be a bit more optimal (when disabled).

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D117078
2022-01-12 17:21:19 -08:00
Erich Keane 6e77ad11ff Add an assert in cpudispatch emit to try to track down an error.
I'm attempting to debug an issue that I can only get to happen on
godbolt, where the cpu-dispatch resolver for an out of line member
function is generated with the wrong name, causing a link failure.
2022-01-12 10:31:28 -08:00
Simon Pilgrim 497a4b26c4 CGBuiltin - Use castAs<> instead of getAs<> to avoid dereference of nullptr
The pointer is always dereferenced immediately below, so assert the cast is correct instead of returning nullptr
2022-01-12 15:35:37 +00:00
Marco Elver 732ad8ea62 [clang][auto-init] Provide __builtin_alloca*_uninitialized variants
When `-ftrivial-auto-var-init=` is enabled, allocas unconditionally
receive auto-initialization since [1].

In certain cases, it turns out, this is causing problems. For example,
when using alloca to add a random stack offset, as the Linux kernel does
on syscall entry [2]. In this case, none of the alloca'd stack memory is
ever used, and initializing it should be controllable; furthermore, it
is not always possible to safely call memset (see [2]).

Introduce `__builtin_alloca_uninitialized()` (and
`__builtin_alloca_with_align_uninitialized`), which never performs
initialization when `-ftrivial-auto-var-init=` is enabled.

[1] https://reviews.llvm.org/D60548
[2] https://lkml.kernel.org/r/YbHTKUjEejZCLyhX@elver.google.com

Reviewed By: glider

Differential Revision: https://reviews.llvm.org/D115440
2022-01-12 15:13:10 +01:00
Sven van Haastregt 4b85800bfd [OpenCL] Set external linkage for block enqueue kernels
All kernels can be called from the host as per the SPIR_KERNEL calling
convention.  As such, all kernels should have external linkage, but
block enqueue kernels were created with internal linkage.

Reported-by: Pedro Olsen Ferreira

Differential Revision: https://reviews.llvm.org/D115523
2022-01-12 13:30:09 +00:00
Adam Magier b2715660ed [clang][CodeGen][UBSan] VLA size checking for unsigned integer parameter
The code generation for the UBSan VLA size check was qualified by a con-
dition that the parameter must be a signed integer, however the C spec
does not make any distinction that only signed integer parameters can be
used to declare a VLA, only qualifying that it must be greater than zero
if it is not a constant.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D116048
2022-01-12 01:11:52 +01:00
Nick Desaulniers 5c562f62a4 [clang] number labels in asm goto strings after tied inputs
I noticed that the following case would compile in Clang but not GCC:
    void *x(void) {
      void *p = &&foo;
      asm goto ("# %0\n\t# %l1":"+r"(p):::foo);
      foo:;
      return p;
    }

Changing the output template above from %l2 would compile in GCC but not
Clang.

This demonstrates that when using tied outputs (say via the "+r" output
constraint), the hidden inputs occur or are numbered BEFORE the labels,
at least with GCC.

In fact, GCC does denote this in its documentation:
https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Extended-Asm.html#Goto-Labels

> Output operand with constraint modifier ‘+’ is counted as two operands
> because it is considered as one output and one input operand.

For the sake of compatibility, I think it's worthwhile to just make this
change.

It's better to use symbolic names for compatibility (especially now
between released version of Clang that support asm goto with outputs).
ie. %l1 from the above would be %l[foo]. The GCC docs also make this
recommendation.

Also, I cleaned up some cruft in GCCAsmStmt::getNamedOperand. AFAICT,
NumPlusOperands was no longer used, though I couldn't find which commit
didn't clean that up correctly.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103640
Link: https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Extended-Asm.html#Goto-Labels

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D115471
2022-01-11 12:09:24 -08:00
Nick Desaulniers c8463fd22b [clang][CGStmt] emit i constraint rather than X for asm goto indirect dests
As suggested in:
https://reviews.llvm.org/D114895#3177794
X will be converted to i by SelectionDAGISEL anyways.

Reviewed By: void, jyknight

Differential Revision: https://reviews.llvm.org/D115311
2022-01-11 11:48:40 -08:00
Akira Hatanaka e5df9cc098 [CodeGen] Treat ObjC `__unsafe_unretained` and class types as trivial
when generating copy/dispose helper functions

Analyze the block captures just once before generating copy/dispose
block helper functions and honor the inert `__unsafe_unretained`
qualifier. This refactor fixes a bug where captures of ObjC
`__unsafe_unretained` and class types were needlessly retained/released
by the copy/dispose helper functions.

Differential Revision: https://reviews.llvm.org/D116948
2022-01-11 11:18:24 -08:00
Nikita Popov acc39873b7 [CodeGen] Avoid deprecated Address constructor 2022-01-11 13:07:02 +01:00
Hans Wennborg 0b48d0fe12 [ADT] Add an in-place version of toHex()
and use that to simplify MD5's hex string code which was previously
using a string stream, as well as Clang's
CGDebugInfo::computeChecksum().

Differential revision: https://reviews.llvm.org/D116960
2022-01-11 11:51:04 +01:00
Nikita Popov 2d1b55ebea [CodeGen] Make element type in emitArrayDestroy() predictable
When calling emitArrayDestroy(), the pointer will usually have
ConvertTypeForMem(EltType) as the element type, as one would expect.
However, globals with initializers sometimes don't use the same
types as values normally would, e.g. here the global uses
{ double, i32 } rather than %struct.T as element type.

Add an early cast to the global destruction path to avoid this
special case. The cast would happen lateron anyway, it only gets
moved to an earlier point.

Differential Revision: https://reviews.llvm.org/D116219
2022-01-11 09:25:29 +01:00
Jennifer Yu 140a6b1e5c [clang][OpenMP5.1] Initial parsing/sema for 'indirect' clause
Differential Revision: https://reviews.llvm.org/D116764
2022-01-10 16:58:56 -08:00
Adrian Prantl eb200e584e Emit the C++ dialect in -gmodules .pcm files.
Because of commit: https://reviews.llvm.org/D104291 the -gmodules .pcm
files do not have the same DW_AT_language dialect as the .o file. This
was a simple matter of passing the DebugStrictDwarf flag to the
PCHContainerGenerator object's CodeGenOpts from the CompilerInstance
passed in to it.

Before this change if you ran dwarfdump on the gmodule cache folder
you would get DW_AT_language (DW_LANG_C_plus_plus) even when using
-std=c++14 with clang

Patch by Shubham Rastogi!

Differential Revision: https://reviews.llvm.org/D116790
2022-01-10 16:13:40 -08:00
Nikita Popov 7725331ccd [CodeGen] Avoid some pointer element type accesses
Possibly this is sufficient to fix PR53089.
2022-01-10 15:02:55 +01:00
Serge Guelton d2cc6c2d0c Use a sorted array instead of a map to store AttrBuilder string attributes
Using and std::map<SmallString, SmallString> for target dependent attributes is
inefficient: it makes its constructor slightly heavier, and involves extra
allocation for each new string attribute. Storing the attribute key/value as
strings implies extra allocation/copy step.

Use a sorted vector instead. Given the low number of attributes generally
involved, this is cheaper, as showcased by

https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions

Differential Revision: https://reviews.llvm.org/D116599
2022-01-10 14:49:53 +01:00
Kazu Hirata 17d4bd3d78 [clang] Fix bugprone argument comments (NFC)
Identified with bugprone-argument-comment.
2022-01-09 00:19:49 -08:00
Kazu Hirata 40446663c7 [clang] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
2022-01-09 00:19:47 -08:00
Johannes Doerfert 37639b72a1 [OpenMP][FIX] Emit debug declares only if debug info is available
The `EmitDeclareOfAutoVariable` introduced in D114504 and D115510 has a
precondition that cannot be violated. It is unclear if we should call it
directly given the sparse usage in clang but for now we should at least
not crash if the debug info kind is too low.

Fixes #52938.

Differential Revision: https://reviews.llvm.org/D116865
2022-01-08 17:01:19 -06:00
Kazu Hirata d1b127b5b7 [clang] Remove unused forward declarations (NFC) 2022-01-08 11:56:40 -08:00
Simon Pilgrim 6ee589e2f5 [CGObjCMac] Use castAs<> instead of getAs<> to avoid dereference of nullptr inside BuildRCBlockVarRecordLayout
This will assert the cast is correct instead of returning nullptr (UnionType is a subtype of RecordType so this should be clean).
2022-01-08 16:18:55 +00:00
Simon Pilgrim 06e9733fec [CGExpr] Use castAs<> instead of getAs<> to avoid dereference of nullptr
This will assert the cast is correct instead of returning nullptr
2022-01-08 14:26:09 +00:00
Jun Zhang 5be131922c
[NFC] Test commit.
This is just a test commit to check whether the permission I got is
correct or not.
2022-01-08 10:36:09 +08:00
Jun Zhang b2ed9f3f44
[Clang] Implement the rest of __builtin_elementwise_* functions.
The patch implement the rest of __builtin_elementwise_* functions
specified in D111529, including:
* __builtin_elementwise_floor
* __builtin_elementwise_roundeven
* __builtin_elementwise_trunc

Signed-off-by: Jun <jun@junz.org>

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D115429
2022-01-07 15:11:36 +00:00
Nikita Popov e8b98a5216 [CodeGen] Emit elementtype attributes for indirect inline asm constraints
This implements the clang side of D116531. The elementtype
attribute is added for all indirect constraints (*) and tests are
updated accordingly.

Differential Revision: https://reviews.llvm.org/D116666
2022-01-06 09:29:22 +01:00
David Pagan 7df2371bc6 Add codegen for allocate directive's 'align' clause 2022-01-05 12:40:58 -05:00
Chuanqi Xu c75cedc237 [Coroutines] Set presplit attribute in Clang and mlir
This fixes bug49264.

Simply, coroutine shouldn't be inlined before CoroSplit. And the marker
for pre-splited coroutine is created in CoroEarly pass, which ran after
AlwaysInliner Pass in O0 pipeline. So that the AlwaysInliner couldn't
detect it shouldn't inline a coroutine. So here is the error.

This patch set the presplit attribute in clang and mlir. So the inliner
would always detect the attribute before splitting.

Reviewed By: rjmccall, ezhulenev

Differential Revision: https://reviews.llvm.org/D115790
2022-01-05 10:25:02 +08:00
serge-sans-paille 9290ccc3c1 Introduce the AttributeMask class
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situation like creation of
Attribute with dummy value because the only relevant part was the attribute
kind.

Differential Revision: https://reviews.llvm.org/D116110
2022-01-04 15:37:46 +01:00
Jun Zhang 82020de532
Recommit "[Clang] Extend emitUnaryBuiltin to avoid duplicate logic.""
This reverts the revert commit f552ba6e84.

Recommit with fixed author name.
2022-01-04 13:46:41 +00:00
Florian Hahn f552ba6e84
Revert "[Clang] Extend emitUnaryBuiltin to avoid duplicate logic."
This reverts commit 5c57e6aa57.

Reverted due to a typo in the authors name. Will recommit soon with
fixed authorship.
2022-01-04 13:45:28 +00:00
Jun Zhan 5c57e6aa57
[Clang] Extend emitUnaryBuiltin to avoid duplicate logic.
This patch extends `emitUnaryBuiltin` so that we can better emitting IR when
implement builtins specified in D111529.

Also contains some NFC, applying it to existing code.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D116161
2022-01-04 11:47:41 +00:00
Kazu Hirata d677a7cb05 [clang] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-02 10:20:23 -08:00
Kazu Hirata 683e6ee7d0 [CodeGen] Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2022-01-01 09:14:23 -08:00
Kazu Hirata 298367ee6e [clang] Use nullptr instead of 0 or NULL (NFC)
Identified with modernize-use-nullptr.
2021-12-29 08:34:20 -08:00
Johannes Doerfert 944aa0421c Reapply "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 73ece231ee and
reapplies 7bfcdbcbf3 with mlir changes.
Also reverts commit 423ba12971 and
includes the unit test changes of
16da214004.
2021-12-29 01:10:38 -06:00
Mehdi Amini 73ece231ee Revert "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 7bfcdbcbf3.
Broke MLIR build
2021-12-29 06:57:36 +00:00
Johannes Doerfert 7bfcdbcbf3 [OpenMP][NFCI] Embed the source location string size in the ident_t
One of the unused ident_t fields now holds the size of the string
(=const char *) field so we have an easier time dealing with those
in the future.

Differential Revision: https://reviews.llvm.org/D113126
2021-12-28 23:53:29 -06:00
Joseph Huber 7cdaa5a94e [OpenMP][FIX] Change globalization alignment to 16
This patch changes the default aligntment from 8 to 16, and encodes this
information in the `__kmpc_alloc_shared` runtime call to communicate it
to the HeapToStack pass. The previous alignment of 8 was not sufficient
for the maximum size of primitive types on 64-bit systems, and needs to
be increaesd. This reduces the amount of space availible in the data
sharing stack, so this implementation will need to be improved later to
include the alignment requirements in the allocation call, and use it
properly in the data sharing stack in the runtime.

Depends on D115888

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115971
2021-12-27 16:58:25 -05:00
Nikita Popov 3e65861131 [CodeGen] Avoid one more pointer element type access
The number of elements is always a SizeTy here.
2021-12-27 12:58:22 +01:00
Nikita Popov 1f07a4a569 [CodeGen] Avoid more pointer element type accesses 2021-12-27 12:00:22 +01:00
Shao-Ce SUN ec501f15a8 [clang][CodeGen] Remove the signed version of createExpression
Fix a TODO. Remove the callers of this signed version and delete.

Reviewed By: CodaFi

Differential Revision: https://reviews.llvm.org/D116014
2021-12-27 14:16:08 +08:00
Kazu Hirata 31cfb3f4f6 [clang] Remove redundant calls to c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-12-26 13:31:40 -08:00
Kazu Hirata 0542d15211 Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2021-12-26 09:39:26 -08:00
Kazu Hirata 2d303e6781 Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-12-24 23:17:54 -08:00
Kazu Hirata 76f0f1cc5c Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC) 2021-12-24 21:43:06 -08:00
Kazu Hirata 9c0a4227a9 Use Optional::getValueOr (NFC) 2021-12-24 20:57:40 -08:00
Shilei Tian c7a589a2c4 [Clang][OpenMP] Add the support for atomic compare in parser
This patch adds the support for `atomic compare` in parser. The support
in Sema and CodeGen will come soon. For now, it simply eimits an error when it
is encountered.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D115561
2021-12-24 08:16:51 -05:00
Nikita Popov dd903173c0 [OpenMP] Avoid creating null pointer lvalue (NFC)
The reduction initialization code creates a "naturally aligned null
pointer to void lvalue", which I found somewhat odd, even though it
works out in the end because it is not actually used. It doesn't
look like this code actually needs an LValue for anything though,
and we can use an invalid Address to represent this case instead.

Differential Revision: https://reviews.llvm.org/D116214
2021-12-24 09:01:56 +01:00
Chuanqi Xu f3d4e168db [C++20] Conform coroutine's comments in clang (NFC-ish)
The comments for coroutine in clang wrote for coroutine-TS. Now
coroutine is merged into standard. Try to conform the comments.
2021-12-24 12:41:44 +08:00
Nikita Popov 7977fd7cfc [OpenMP] Remove no-op cast (NFC)
This was casting the address to its own element type, which is
a no-op.
2021-12-23 15:15:26 +01:00
Nikita Popov bf2b5551f9 [CodeGen] Use CreateConstInBoundsGEP() in one more place
This does exactly what this code manually implemented.
2021-12-23 14:58:47 +01:00