Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
* Only the worksharing-loop directive.
* Recognizes only the nowait clause.
* No loop nests with more than one loop.
* Untested with templates, exceptions.
* Semantic checking left to the existing infrastructure.
This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
* The distance function: How many loop iterations there will be before entering the loop nest.
* The loop variable function: Conversion from a logical iteration number to the loop variable.
These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.
The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.
For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.
Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.
Reviewed By: jdenny
Differential Revision: https://reviews.llvm.org/D94973
Background:
Wasm EH, while using Windows EH (catchpad/cleanuppad based) IR, uses
Itanium-based libraries and ABIs with some modifications.
`__clang_call_terminate` is a wrapper generated in Clang's Itanium C++
ABI implementation. It contains this code, in C-style pseudocode:
```
void __clang_call_terminate(void *exn) {
__cxa_begin_catch(exn);
std::terminate();
}
```
So this function is a wrapper to call `__cxa_begin_catch` on the
exception pointer before termination.
In Itanium ABI, this function is called when another exception is thrown
while processing an exception. The pointer for this second, violating
exception is passed as the argument of this `__clang_call_terminate`,
which calls `__cxa_begin_catch` with that pointer and calls
`std::terminate` to terminate the program.
The spec (https://libcxxabi.llvm.org/spec.html) for `__cxa_begin_catch`
says,
```
When the personality routine encounters a termination condition, it
will call __cxa_begin_catch() to mark the exception as handled and then
call terminate(), which shall not return to its caller.
```
In wasm EH's Clang implementation, this function is called from
cleanuppads that terminates the program, which we also call terminate
pads. Cleanuppads normally don't access the thrown exception and the
wasm backend converts them to `catch_all` blocks. But because we need
the exception pointer in this cleanuppad, we generate
`wasm.get.exception` intrinsic (which will eventually be lowered to
`catch` instruction) as we do in the catchpads. But because terminate
pads are cleanup pads and should run even when a foreign exception is
thrown, so what we have been doing is:
1. In `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()`, we make sure
terminate pads are in this simple shape:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
```
2. In `WebAssemblyHandleEHTerminatePads` pass at the end of the
pipeline, we attach a `catch_all` to terminate pads, so they will be in
this form:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
catch_all
call @std::terminate()
unreachable
```
In `catch_all` part, we don't have the exception pointer, so we call
`std::terminate()` directly. The reason we ran HandleEHTerminatePads at
the end of the pipeline, separate from LateEHPrepare, was it was
convenient to assume there was only a single `catch` part per `try`
during CFGSort and CFGStackify.
---
Problem:
While it thinks terminate pads could have been possibly split or calls
to `__clang_call_terminate` could have been duplicated,
`WebAssemblyLateEHPrepare::ensureSingleBBTermPads()` assumes terminate
pads contain no more than calls to `__clang_call_terminate` and
`unreachable` instruction. I assumed that because in LLVM very limited
forms of transformations are done to catchpads and cleanuppads to
maintain the scoping structure. But it turned out to be incorrect;
passes can merge cleanuppads into one, including terminate pads, as long
as the new code has a correct scoping structure. One pass that does this
I observed was `SimplifyCFG`, but there can be more. After this
transformation, a single cleanuppad can contain any number of other
instructions with the call to `__clang_call_terminate` and can span many
BBs. It wouldn't be practical to duplicate all these BBs within the
cleanuppad to generate the equivalent `catch_all` blocks, only with
calls to `__clang_call_terminate` replaced by calls to `std::terminate`.
Unless we do more complicated transformation to split those calls to
`__clang_call_terminate` into a separate cleanuppad, it is tricky to
solve.
---
Solution (?):
This CL just disables the generation and use of `__clang_call_terminate`
and calls `std::terminate()` directly in its place.
The possible downside of this approach can be, because the Itanium ABI
intended to "mark" the violating exception handled, we don't do that
anymore. What `__cxa_begin_catch` actually does is increment the
exception's handler count and decrement the uncaught exception count,
which in my opinion do not matter much given that we are about to
terminate the program anyway. Also it does not affect info like stack
traces that can be possibly shown to developers.
And while we use a variant of Itanium EH ABI, we can make some
deviations if we choose to; we are already different in that in the
current version of the EH spec we don't support two-phase unwinding. We
can possibly consider a more complicated transformation later to
reenable this, but I don't think that has high priority.
Changes in this CL contains:
- In Clang, we don't generate a call to `wasm.get.exception()` intrinsic
and `__clang_call_terminate` function in terminate pads anymore; we
simply generate calls to `std::terminate()`, which is the default
implementation of `CGCXXABI::emitTerminateForUnexpectedException`.
- Remove `WebAssembly::ensureSingleBBTermPads() function and
`WebAssemblyHandleEHTerminatePads` pass, because terminate pads are
already `catch_all` now (because they don't need the exception
pointer) and we don't need these transformations anymore.
- Change tests to use `std::terminate` directly. Also removes tests that
tested `LateEHPrepare::ensureSingleBBTermPads` and
`HandleEHTerminatePads` pass.
- Drive-by fix: Add some function attributes to EH intrinsic
declarations
Fixes https://github.com/emscripten-core/emscripten/issues/13582.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97834
Use a WeakTrackingVH to cope with the stmt emission logic that cleans up
unreachable blocks. This invalidates the reference to the deferred
replacement placeholder. Cope with it.
Fixes PR25102 (from 2015!)
This change adds a new IR noundef attribute, which denotes when a function call argument or return val may never contain uninitialized bits.
In MemorySanitizer, this attribute enables optimizations which decrease instrumented code size by up to 17% (measured with an instrumented build of clang) . I'll introduce the change allowing msan to take advantage of this information in a separate patch.
Differential Revision: https://reviews.llvm.org/D81678
explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
... unless it's a literal
D94640 was a bit too aggressive in its analysis, considering integers
representing valid addresses as invalid. This change rolls back some of
the check, so that only the most obvious case is still flagged.
Before:
```cpp
free((void*)1000); // literal converted to `void*`: warning good
free((void*)an_int); // `int` object converted to `void*`: warning might
// be a false positive
```
After
```cpp
free((void*)1000); // literal converted to `void*`: warning good
free((void*)an_int); // doesn't warn
```
Differential Revision: https://reviews.llvm.org/D97512
If the file in line directive does not exist on the system we need, to
use the original file to get its file id.
Differential Revision: https://reviews.llvm.org/D97945
The SwitchStmt::FirstCase member is not initialized when the AST is
built by the ASTStmtReader. See the below code of
ASTStmtReader::VisitSwitchStmt in the case where the for loop does not
have any iterations:
```
// ... more code ...
SwitchCase *PrevSC = nullptr;
for (auto E = Record.size(); Record.getIdx() != E; ) {
SwitchCase *SC = Record.getSwitchCaseWithID(Record.readInt());
if (PrevSC)
PrevSC->setNextSwitchCase(SC);
else
S->setSwitchCaseList(SC); // Sets FirstCase !!!
PrevSC = SC;
}
} // return
```
Later, in ASTNodeImporter::VisitSwitchStmt,
we have a condition that depends on this uninited value:
```
for (SwitchCase *SC = S->getSwitchCaseList(); SC != nullptr;
SC = SC->getNextSwitchCase()) {
// ... more code ...
}
```
This is clearly an UB. This causes non-deterministic crashes when
ClangSA analyzes some code with CTU. See the below report by valgrind
(the whole valgrind output is attached):
```
==31019== Conditional jump or move depends on uninitialised value(s)
==31019== at 0x12ED1983: clang::ASTNodeImporter::VisitSwitchStmt(clang::SwitchStmt*) (ASTImporter.cpp:6195)
==31019== by 0x12F1D509: clang::StmtVisitorBase<std::add_pointer, clang::ASTNodeImporter, llvm::Expected<clang::Stmt*>>::Visit(clang::Stmt*) (StmtNodes.inc:591)
==31019== by 0x12EE4FDF: clang::ASTImporter::Import(clang::Stmt*) (ASTImporter.cpp:8484)
==31019== by 0x12F09498: llvm::Expected<clang::Stmt*> clang::ASTNodeImporter::import<clang::Stmt>(clang::Stmt*) (ASTImporter.cpp:164)
==31019== by 0x12F3A1F5: llvm::Error clang::ASTNodeImporter::ImportArrayChecked<clang::Stmt**, clang::Stmt**>(clang::Stmt**, clang::Stmt**, clang::Stmt**) (ASTImporter.cpp:653)
==31019== by 0x12F13152: llvm::Error clang::ASTNodeImporter::ImportContainerChecked<llvm::iterator_range<clang::Stmt**>, llvm::SmallVector<clang::Stmt*, 8u> >(llvm::iterator_range<clang::Stmt**> const&, llvm::SmallVector<clang::Stmt*, 8u>&) (ASTImporter.cpp:669)
==31019== by 0x12ED099F: clang::ASTNodeImporter::VisitCompoundStmt(clang::CompoundStmt*) (ASTImporter.cpp:6077)
==31019== by 0x12F1CC2D: clang::StmtVisitorBase<std::add_pointer, clang::ASTNodeImporter, llvm::Expected<clang::Stmt*>>::Visit(clang::Stmt*) (StmtNodes.inc:73)
==31019== by 0x12EE4FDF: clang::ASTImporter::Import(clang::Stmt*) (ASTImporter.cpp:8484)
==31019== by 0x12F09498: llvm::Expected<clang::Stmt*> clang::ASTNodeImporter::import<clang::Stmt>(clang::Stmt*) (ASTImporter.cpp:164)
==31019== by 0x12F13275: clang::Stmt* clang::ASTNodeImporter::importChecked<clang::Stmt*>(llvm::Error&, clang::Stmt* const&) (ASTImporter.cpp:197)
==31019== by 0x12ED0CE6: clang::ASTNodeImporter::VisitCaseStmt(clang::CaseStmt*) (ASTImporter.cpp:6098)
```
Differential Revision: https://reviews.llvm.org/D97849
IR symbol table does not parse inline asm. A symbol only referenced by inline
asm is not in the IR symbol table, so LTO does not know that the definition (in
another translation unit) is referenced and may internalize it, even if that
definition has `__attribute__((used))` (which lowers to `llvm.compiler.used` on
ELF targets since D97446).
```
// cabac.c
__attribute__((used)) const uint8_t ff_h264_cabac_tables[...] = {...};
// h264_cabac.c
asm("lea ff_h264_cabac_tables(%rip), %0" : ...);
```
`__attribute__((used))` is the recommended way to tell the compiler there may
be inline asm references, so the usage is perfectly fine. This patch
conservatively sets the `FB_used` bit on `llvm.compiler.used` symbols to work
around the IR symbol table limitation. Note: before D97446, Clang never emitted
symbols in the `llvm.compiler.used` list, so this change does not punish any
Clang emitted global object.
Without the patch, `ff_h264_cabac_tables` may be assigned to a non-external
partition and get internalized. Then we will get a linker error because the
`cabac.c` definition is not exposed.
Differential Revision: https://reviews.llvm.org/D97755
Adding it to the general filepaths results in it being added to the
linker arguments. The AIX linker always looks in this path anyway
and adds it as a default library path component. Adding this duplicate
explicitly results in duplicate entries in path in the loader section
of executables and messes up tools like CMake that parse the default
library flags.
Reviewed By: ZarkoCA
Differential Revision: https://reviews.llvm.org/D97574
Currently the test on line 3 is identical to the test on line 1.
Looking at the rest of the file (particularily the use of FOVERRIDE
as the check-prefix), I think it's pretty clear that this line
was supposed to use `-ftrigraphs` instead of `-trigraphs`.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D97796
https://wg21.link/P2173 is making its way through WG21 currently and
has not been formally adopted yet. This feature provides very useful
functionality in that you can specify attributes on the various
function *declarations* generated by a lambda expression, where the
current C++ grammar only allows attributes which apply to the various
function *types* so generated.
This patch implements P2173 on the assumption that it will be adopted
by WG21 with this syntax for C++23.
This commit refactors extension support to allow
specifying whether pragma is needed or not explicitly.
For backward compatibility pragmas are set to required
for all extensions that were added prior to this but
not for OpenCL 3.0 features.
Differential Revision: https://reviews.llvm.org/D97052
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
Our diagnostics relating to static assertions were a bit confused. For
instance, when in MS compatibility mode in C (where we accept
static_assert even without including <assert.h>), we would fail
to warn the user that they were using the wrong spelling (even in
pedantic mode), we were missing a compatibility warning about using
_Static_assert in earlier standards modes, diagnostics for the optional
message were not reflected in C as they were in C++, etc.
Fix regression where we aren't passing `-platform_version` to new ld64.lld after {D95204}.
Most of the changes were originally in D95204, but I backed them out due to
test failures on builds which have `CLANG_DEFAULT_LINKER=lld`. The tests are
properly updated in this diff.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D97741
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: mibintc
Differential Revision: https://reviews.llvm.org/D97125
If the mapped structure has data members, which have 'default' mappers,
need to map these members individually using their 'default' mappers.
Differential Revision: https://reviews.llvm.org/D92195
ccb4124a41 fixed translating -gz=zlib to --compress-debug-sections for
linker invocation for several ToolChains, but omitted FreeBSD.
Differential Revision: https://reviews.llvm.org/D97752
`GetAddrOfGlobalTemporary` previously tried to emit the initializer of
a global temporary before updating the global temporary map. Emitting the
initializer could recurse back into `GetAddrOfGlobalTemporary` for the same
temporary, resulting in an infinite recursion.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97733
The VSX-only overloads (for 8-byte element vectors) are missing.
Add the missing overloads and convert element numbering to
modulo arithmetic to match GCC and XLC.
Currently clang uses stub function to launch kernel. This is inconvenient
to interop with C++ programs since the stub function has different name
as kernel, which is required by ROCm debugger.
This patch emits a variable symbol which has the same name as the kernel
and uses it to register and launch the kernel. This allows C++ program to
launch a kernel by using the original kernel name.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D86376
This makes CC1 and driver defaults consistent.
In addition, for more common cases (-g is specified without -gsplit-dwarf), users will not see -fno-split-dwarf-inlining in CC1 options.
Verified that the below is still true:
* `clang -g` => `splitDebugInlining: false` in DICompileUnit
* `clang -g -gsplit-dwarf` => `splitDebugInlining: false` in DICompileUnit
* `clang -g -gsplit-dwarf -fsplit-dwarf-inlining` => no `splitDebugInlining: false`
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D97706
Lorenz Bauer reported that the following code will have
compilation error for bpf target:
enum e { TWO };
bpf_core_enum_value_exists(enum e, TWO);
The clang emitted the following error message:
__builtin_preserve_enum_value argument 1 invalid
In SemaChecking, an expression like "*(enum NAME)1" will have
cast kind CK_IntegralToPointer, but "*(enum NAME)0" will have
cast kind CK_NullToPointer. Current implementation only permits
CK_IntegralToPointer, missing enum value 0 case.
This patch permits CK_NullToPointer cast kind and
the above test case can pass now.
Differential Revision: https://reviews.llvm.org/D97659
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.
Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.
Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D97608
The new Darwin backend for LLD is now able to link reasonably large
real-world programs on x86_64. For instance, we have achieved
self-hosting for the X86_64 target, where all LLD tests pass when
building lld with itself on macOS. As such, we would like to make it the
default back-end.
The new port is now named `ld64.lld`, and the old port remains
accessible as `ld64.lld.darwinold`
This [annoucement email][1] has some context. (But note that, unlike
what the email says, we are no longer doing this as part of the LLVM 12
branch cut -- instead we will go into LLVM 13.)
Numerous mechanical test changes were required to make this change; in
the interest of creating something that's reviewable on Phabricator,
I've split out the boring changes into a separate diff (D95905). I plan to
merge its contents with those in this diff before landing.
(@gkm made the original draft of this diff, and he has agreed to let me
take over.)
[1]: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147665.html
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D95204
On clang emits the compiler version string into debug information
by default for both dwarf and codeview. That makes compiler output
needlessly compiler-version-dependent which makes e.g. comparing
object file outputs during a bisect hard. So it's nice if there's
an easy way to turn this off.
(On ELF, this flag also controls the .comment section, but that
part is ELF-only. The debug-info bit isn't.)
Differential Revision: https://reviews.llvm.org/D97695
Simply make sure that the CodeGenFunction::CXXThisValue and CXXABIThisValue
are correctly initialized to the recovered value.
For lambda capture, we also need to make sure to fill the LambdaCaptureFields
Differential Revision: https://reviews.llvm.org/D97534
For ELF targets, GCC 11 will set SHF_GNU_RETAIN on the section of a
`__attribute__((retain))` function/variable to prevent linker garbage
collection. (See AttrDocs.td for the linker support).
This patch adds `retain` functions/variables to the `llvm.used` list, which has
the desired linker GC semantics. Note: `retain` does not imply `used`,
so an unused function/variable can be dropped by Sema.
Before 'retain' was introduced, previous ELF solutions require inline asm or
linker tricks, e.g. `asm volatile(".reloc 0, R_X86_64_NONE, target");`
(architecture dependent) or define a non-local symbol in the section and use
`ld -u`. There was no elegant source-level solution.
With D97448, `__attribute__((retain))` will set `SHF_GNU_RETAIN` on ELF targets.
Differential Revision: https://reviews.llvm.org/D97447