We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e. we lose the information whether to generate want just
some unwind tables or asynchronous unwind tables.
Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE) in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.
That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.
This patch extends the `uwtable` attribute with an optional
value:
- `uwtable` (default to `async`)
- `uwtable(sync)`, synchronous unwind tables
- `uwtable(async)`, asynchronous (instruction precise) unwind tables
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114543
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.
The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
The pointer is always dereferenced, so assert the cast is correct (which it should be as we just created that ScalableVectorType) instead of returning nullptr
In Clang we can attach TBAA metadata based on the load/store intrinsics
based on the operation's element type.
This also contains changes to InstCombine where the AArch64-specific
intrinsics are transformed into generic LLVM load/store operations,
to ensure that all metadata is transferred to the new instruction.
There will be some further work after this patch to also emit TBAA
metadata for SVE's gather/scatter- and struct load/store intrinsics.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D119319
Due to the way type units work, this would lead to a declaration in a
type unit of a local type in a CU - which is ambiguous. Rather than
trying to resolve that relative to the CU that references the type unit,
let's just not try to simplify these names.
Longer term this should be fixed by not putting the template
instantiation in a type unit to begin with - since it references an
internal linkage type, it can't legitimately be duplicated/in more than
one translation unit, so skip the type unit overhead. (but the right fix
for that is to move type unit management into a DICompositeType flag
(dropping the "identifier" field is not a perfect solution since it
breaks LLVM IR linking decl/def merging during IR linking))
Lambda names aren't entirely canonical (as demonstrated by the
cross-project-test added here) at the moment (we should fix that for a
bunch of reasons) - even if the template referencing them is
non-simplified, other names referencing /that/ template can't be
simplified either because type units might cause a different template to
be picked up that would conflict with the expected name.
(other than for roundtripping precision, it'd be OK to simplify types
that reference types that reference lambdas - but best be consistent
between the roundtrip/verify mode and the actual simplified template
names mode)
The introduction and some examples are on this page:
https://devblogs.microsoft.com/cppblog/announcing-jmc-stepping-in-visual-studio/
The `/JMC` flag enables these instrumentations:
- Insert at the beginning of every function immediately after the prologue with
a call to `void __fastcall __CheckForDebuggerJustMyCode(unsigned char *JMC_flag)`.
The argument for `__CheckForDebuggerJustMyCode` is the address of a boolean
global variable (the global variable is initialized to 1) with the name
convention `__<hash>_<filename>`. All such global variables are placed in
the `.msvcjmc` section.
- The `<hash>` part of `__<hash>_<filename>` has a one-to-one mapping
with a directory path. MSVC uses some unknown hashing function. Here I
used DJB.
- Add a dummy/empty COMDAT function `__JustMyCode_Default`.
- Add `/alternatename:__CheckForDebuggerJustMyCode=__JustMyCode_Default` link
option via ".drectve" section. This is to prevent failure in
case `__CheckForDebuggerJustMyCode` is not provided during linking.
Implementation:
All the instrumentations are implemented in an IR codegen pass. The pass is placed immediately before CodeGenPrepare pass. This is to not interfere with mid-end optimizations and make the instrumentation target-independent (I'm still working on an ELF port in a separate patch).
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D118428
code object version determines ABI, therefore should not be mixed.
This patch emits amdgpu_code_object_version module flag in LLVM IR
based on code object version (default 4).
The amdgpu_code_object_version value is code object version times 100.
LLVM IR with different amdgpu_code_object_version module flag cannot
be linked.
The -cc1 option -mcode-object-version=none is for ROCm device library use
only, which supports multiple ABI.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D119026
The "-fzero-call-used-regs" option tells the compiler to zero out
certain registers before the function returns. It's also available as a
function attribute: zero_call_used_regs.
The two upper categories are:
- "used": Zero out used registers.
- "all": Zero out all registers, whether used or not.
The individual options are:
- "skip": Don't zero out any registers. This is the default.
- "used": Zero out all used registers.
- "used-arg": Zero out used registers that are used for arguments.
- "used-gpr": Zero out used registers that are GPRs.
- "used-gpr-arg": Zero out used GPRs that are used as arguments.
- "all": Zero out all registers.
- "all-arg": Zero out all registers used for arguments.
- "all-gpr": Zero out all GPRs.
- "all-gpr-arg": Zero out all GPRs used for arguments.
This is used to help mitigate Return-Oriented Programming exploits.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110869
When not going through the main Clang->LLVM type cache, we'd
accidentally create multiple different opaque types for a member pointer
type.
This allows us to remove the -verify-type-cache flag now that
check-clang passes with it on. We can do the verification in expensive
builds. Previously microsoft-abi-member-pointers.cpp was failing with
-verify-type-cache.
I suspect that there may be more issues when we have multiple member
pointer types and we clear the cache, but we can leave that for later.
Followup to D118744.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D119215
Among many FoldingSet users most notable seem to be ASTContext and CodeGenTypes.
The reasons that we spend not-so-tiny amount of time in FoldingSet calls from there, are following:
1. Default FoldingSet capacity for 2^6 items very often is not enough.
For PointerTypes/ElaboratedTypes/ParenTypes it's not unlikely to observe growing it to 256 or 512 items.
FunctionProtoTypes can easily exceed 1k items capacity growing up to 4k or even 8k size.
2. FoldingSetBase::GrowBucketCount cost itself is not very bad (pure reallocations are rather cheap thanks to BumpPtrAllocator).
What matters is high collision rate when lot of items end up in same bucket slowing down FoldingSetBase::FindNodeOrInsertPos and trashing CPU cache
(as items with same hash are organized in intrusive linked list which need to be traversed).
This change address both issues by increasing initial size of FoldingSets used in ASTContext and CodeGenTypes.
Extracted from: https://reviews.llvm.org/D118385
Differential Revision: https://reviews.llvm.org/D118608
Following the discussion on D118229, this marks all pointer-typed
kernel arguments as having ABI alignment, per section 6.3.5 of
the OpenCL spec:
> For arguments to a __kernel function declared to be a pointer to
> a data type, the OpenCL compiler can assume that the pointee is
> always appropriately aligned as required by the data type.
Differential Revision: https://reviews.llvm.org/D118894
D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions
This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes:
__m256i test_mm256_adds_epi8(__m256i a, __m256i b) {
// CHECK-LABEL: test_mm256_adds_epi8
// CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.*}}, <32 x i8> %{{.*}})
return _mm256_adds_epi8(a, b);
}
D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions
This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes:
__m256i test_mm256_adds_epi8(__m256i a, __m256i b) {
// CHECK-LABEL: test_mm256_adds_epi8
// CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.*}}, <32 x i8> %{{.*}})
return _mm256_adds_epi8(a, b);
}
Done in manner similar to mutexinoutset
(see https://reviews.llvm.org/D57576)
Runtime support already exists in LLVM OpenMP runtime (see
https://reviews.llvm.org/D97085).
The value used to identify an inoutset dependency type in the LLVM
OpenMP runtime is 8.
Some tests updated due to change in dependency type error messages that
now include new dependency type. Also updated
test/OpenMP/task_codegen.cpp to verify we emit the right code.
This patch implements `__builtin_elementwise_add_sat` and `__builtin_elementwise_sub_sat` builtins.
These map to the add/sub saturated math intrinsics described here:
https://llvm.org/docs/LangRef.html#saturation-arithmetic-intrinsics
With this in place we should then be able to replace the x86 SSE adds/subs intrinsics with these generic variants - it looks like other targets should be able to use these as well (arm/aarch64/webassembly all have similar examples in cgbuiltin).
Differential Revision: https://reviews.llvm.org/D117898
Take the following as an example
struct z {
z (*p)();
};
z f();
When we attempt to get the LLVM type of f, we recurse into z. z itself
has a function pointer with the same type as f. Given the recursion,
Clang simply treats z::p as a pointer to an empty struct `{}*`. The
LLVM type of f is as expected. So we have two different potential
LLVM types for a given Clang type. If we store one of those into the
cache, when we access the cache with a different context (e.g. we
are/aren't recursing on z) we may get an incorrect result. There is some
attempt to clear the cache in these cases, but it doesn't seem to handle
all cases.
This change makes it so we only use the cache when we are not in any
sort of function context, i.e. `noRecordsBeingLaidOut() &&
FunctionsBeingProcessed.empty()`, which are the cases where we may
decide to choose a different LLVM type for a given Clang type. LLVM
types for builtin types are never recursive so they're always ok.
This allows us to clear the type cache less often (as seen with the
removal of one of the calls to `TypeCache.clear()`). We
still need to clear it when we use a placeholder type then replace it
later with the final type and other dependent types need to be
recalculated.
I've added a check that the cached type matches what we compute. It
triggered in this test case without the fix. It's currently not
check-clang clean so it's not on by default for something like expensive
checks builds.
This change uncovered another issue where the LLVM types for an argument
and its local temporary don't match. For example in type-cache-3, when
expanding z::dc's argument into a temporary alloca, we ConvertType() the
type of z::p which is `void ({}*)*`, which doesn't match the alloca GEP
type of `{}*`.
No noticeable compile time changes:
https://llvm-compile-time-tracker.com/compare.php?from=3918dd6b8acf8c5886b9921138312d1c638b2937&to=50bdec9836ed40e38ece0657f3058e730adffc4c&stat=instructionsFixes#53465.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D118744
If we call CGOpenCLRuntime::convertOpenCLSpecificType() multiple times
we should get the same type back.
Reviewed By: svenvh
Differential Revision: https://reviews.llvm.org/D119011
This issue is an oversight in D108621.
Literals in HIP are emitted as global constant variables with default
address space which maps to Generic address space for HIPSPV. In
SPIR-V such variables translate to OpVariable instructions with
Generic storage class which are not legal. Fix by mapping literals
to CrossWorkGroup address space.
The literals are not mapped to UniformConstant because the “flat”
pointers in HIP may reference them and “flat” pointers are modeled
as Generic pointers in SPIR-V. In SPIR-V/OpenCL UniformConstant
pointers may not be casted to Generic.
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D118876
After fa87fa97fb, this was no longer guaranteed to be the cleanup
just added by this code, if IsEHCleanup got disabled. Instead, use
stable_begin(), which _is_ guaranteed to be the cleanup just added.
This caused a crash when a object that is callee destroyed (e.g. with the MS ABI) was passed in a call from a noexcept function.
Added a test to verify.
Fixes: fa87fa97fb
This patch completely removes the old OpenMP device runtime. Previously,
the old runtime had the prefix `libomptarget-new-` and the old runtime
was simply called `libomptarget-`. This patch makes the formerly new
runtime the only runtime available. The entire project has been deleted,
and all references to the `libomptarget-new` runtime has been replaced
with `libomptarget-`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D118934
Even if the reference itself is dllexport, the temporary should not be.
In fact, we're already giving it internal linkage, so dllexporting it
is not just wasteful, but will fail to link, as in the example below:
$ cat /tmp/a.cc
void _DllMainCRTStartup() {}
const int __declspec(dllexport) &foo = 42;
$ clang-cl -fuse-ld=lld /tmp/a.cc /Zl /link /dll /out:a.dll
lld-link: error: <root>: undefined symbol: int const &foo::$RT1
Differential revision: https://reviews.llvm.org/D118980
Some types (e.g. `_Bool`) have different scalar and memory representations. CodeGen for `va_arg` didn't take this into account, leading to an assertion failures with different types.
This patch makes sure we use memory representation for `va_arg`.
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D118904
EHTerminateScope is used to implement C++ noexcept semantics. Per C++
[except.terminate], it is implemented-defined whether no, some, or all
cleanups are run prior to terminatation.
Therefore, the code to run cleanups on the way towards termination is
unnecessary, and may be omitted.
After this change, we will still run some cleanups: any cleanups in a
function called from the noexcept function will continue to run, while
those in the noexcept function itself will not.
(Commit attempt 2: check InnermostEHScope != stable_end() before accessing it.)
Differential Revision: https://reviews.llvm.org/D113620
While investigating the failures of `symbolize_pc.cpp` and
`symbolize_pc_inline.cpp` on SPARC (both Solaris and Linux), I noticed that
`__builtin_extract_return_addr` is a no-op in `clang` on all targets, while
`gcc` has non-default implementations for arm, mips, s390, and sparc.
This patch provides the SPARC implementation. For background see
`SparcISelLowering.cpp` (`SparcTargetLowering::LowerReturn_32`), the SPARC
psABI p.3-12, `%i7` and p.3-16/17, and SCD 2.4.1, p.3P-10, `%i7` and
p.3P-15.
Tested (after enabling the `sanitizer_common` tests on SPARC) on
`sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D91607
This patch extends clang frontend to add metadata that can be used to emit macho files with two build version load commands.
It utilizes "darwin.target_variant.triple" and "darwin.target_variant.SDK Version" metadata names for that.
MachO uses two build version load commands to represent an object file / binary that is targeting both the macOS target,
and the Mac Catalyst target. At runtime, a dynamic library that supports both targets can be loaded from either a native
macOS or a Mac Catalyst app on a macOS system. We want to add support to this to upstream to LLVM to be able to build
compiler-rt for both targets, to finish the complete support for the Mac Catalyst platform, which is right now targetable
by upstream clang, but the compiler-rt bits aren't supported because of the lack of this multiple build version support.
Differential Revision: https://reviews.llvm.org/D115415
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/
I've tried to summarize the biggest change below:
- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h
And the usual count of preprocessed lines:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after: 6189948
200k lines less to process is no that bad ;-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118652
This patch adds a function attribute to the kernel function generated in
OpenMP offloading. We already create a `nvvm.annotations` metadata node
indicating the kernels present in the program. However, this created
some indirection when trying to identify if a specific function was an
entry. We add a single function attribute for each function now to
simplify this.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D118708
D116542 adds EmbedBufferInModule which introduces a layer violation
(https://llvm.org/docs/CodingStandards.html#library-layering).
See 2d5f857a1e for detail.
EmbedBufferInModule does not use BitcodeWriter functionality and should be moved
LLVMTransformsUtils. While here, change the function case to the prevailing
convention.
It seems that EmbedBufferInModule just follows the steps of
EmbedBitcodeInModule. EmbedBitcodeInModule calls WriteBitcodeToFile but has IR
update operations which ideally should be refactored to another library.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D118666
This patch adds support for a flag `-fembed-offload-binary` to embed a
file as an ELF section in the output by placing it in a global variable.
This can be used to bundle offloading files with the host binary so it
can be accessed by the linker. The section is named using the
`-fembed-offload-section` option.
Depends on D116541
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D116542
AVR is baremetal environment, so the avr-libc does not support
'__cxa_atexit()'.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D118445
ConstStructBuilder::Finalize in CGExprConstant.ccp assumes that the
passed in QualType is a RecordType. In some instances, the type is a
reference to a RecordType and the reference needs to be removed first.
Differential Revision: https://reviews.llvm.org/D117376
Branch protection in M-class is supported by
- Armv8.1-M.Main
- Armv8-M.Main
- Armv7-M
Attempting to enable this for other architectures, either by
command-line (e.g -mbranch-protection=bti) or by target attribute
in source code (e.g. __attribute__((target("branch-protection=..."))) )
will generate a warning.
In both cases function attributes related to branch protection will not
be emitted. Regardless of the warning, module level attributes related to
branch protection will be emitted when it is enabled via the command-line.
The following people also contributed to this patch:
- Victor Campos
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D115501
This patch changes the code generation of runtime flags to only occur if
a host bitcode file was passed in. This is a cheap way to determine if
we are compiling the OpenMP device runtime itself or user code. This is
needed because the global flags we generate for the device runtime e.g.
__omp_rtl_debug_kind were being generated with default values when we
compiled the runtime library. This would then invalidate the ones we
want to be able to add in when the user defines it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D118399
Since it's introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.
Fixes#53120 and #50905 and #51761.
Differential Revision: https://reviews.llvm.org/D117592
CodeGenModule::DeferredDecls std::map::operator[] seem to be hot especially while code generating huge compilation units.
In such cases using DenseMap instead gives observable compile time improvement. Patch was tested on Linux build with default config acting as benchmark.
Build was performed on isolated CPU cores in silent x86-64 Linux environment following: https://llvm.org/docs/Benchmarking.html#linux rules.
Compile time statistics diff produced by perf and time before and after change are following:
instructions -0.15%, cycles -0.7%, max-rss +0.65%.
Using StringMap instead DenseMap doesn't bring any visible gains.
Differential Revision: https://reviews.llvm.org/D118169
We currently emit the selector load early, but only because we need
it to compute the signature (so that we know which msgSend variant to
call). We can prepare the signature with a plain undef, and replace
it with the materialized selector value if (and only if) needed, later.
Concretely, this usually doesn't have an effect, but tests need updating
because we reordered the receiver bitcast and the selector load, which
is always fine.
There is one notable change: with this, when a msgSend needs a
receiver null check, the selector is now loaded in the non-null
block, instead of before the null check. That should be a mild
improvement.
/home/buildbot/llvm-avr-linux/llvm-avr-linux/llvm/clang/lib/CodeGen/Address.h:76:7: warning: 'clang::CodeGen::Address' has a field 'clang::CodeGen::Address::A' whose type uses the anonymous namespace [-Wsubobject-linkage]
https://lab.llvm.org/buildbot/#/builders/112/builds/12047
This mitigates the extra memory caused by D115725.
On 32-bit arches where we only have 2 bits per PointerIntPair we fall
back to simply storing alignment separately.
Reviewed By: rnk, nikic
Differential Revision: https://reviews.llvm.org/D117262
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
None of these have any reordering issues, and they still emit the same reduction intrinsics without any change in the existing test coverage:
llvm-project\clang\test\CodeGen\X86\avx512-reduceIntrin.c
llvm-project\clang\test\CodeGen\X86\avx512-reduceMinMaxIntrin.c
Differential Revision: https://reviews.llvm.org/D117881
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions
This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
// CHECK-LABEL: test_mm256_max_epu32
// CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).
Sibling patch to D117791
Differential Revision: https://reviews.llvm.org/D117798
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)
This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
// CHECK-LABEL: test_mm256_abs_epi8
// CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).
Differential Revision: https://reviews.llvm.org/D117791
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions
This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
// CHECK-LABEL: test_mm256_max_epu32
// CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).
Sibling patch to D117791
Differential Revision: https://reviews.llvm.org/D117798
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)
This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
// CHECK-LABEL: test_mm256_abs_epi8
// CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).
Differential Revision: https://reviews.llvm.org/D117791
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instruction on function's prologues. Because this is a security feature, it is desirable that only actual indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end being reachable through PLT entries, that will use an indirect branch for such. Because this cannot be determined during compilation-time, the compiler currently emits ENDBRs to every non-local-linkage function.
Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1].
This diff brings the enablement of the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement in when the code-model is set to "kernel". In this scenario, the compiler will only emit ENDBRs to address taken functions, ignoring non-address taken functions that are don't have local linkage.
A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.
The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.
[1] - https://lkml.org/lkml/2021/11/22/591
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D116070
This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019
The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targetting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching).
Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: d703b92296/lld/COFF/Writer.cpp (L1298)
The outcome is that we can finally use Live++ or Recode along with clang-cl.
NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result.
Depends on D43002, D80833 and D81301 for the full feature.
Differential Revision: https://reviews.llvm.org/D116511
When adding new attributes, existing attributes are dropped. While
this appears to be a longstanding issue, this was highlighted by D105169
which dropped a lot of attributes due to adding the new noundef
attribute.
Ahmed Bougacha (@ab) tracked down the issue and provided the fix in
CGCall.cpp. I bundled it up and updated the tests.
HIP program with printf call fails to compile with -fsanitize=address
option, because of appending module flag - amdgpu_hostcall twice, one
for printf and one for sanitize option. This patch fixes that issue.
Patch by: Praveen Velliengiri
Reviewed by: Yaxun Liu, Roman Lebedev
Differential Revision: https://reviews.llvm.org/D116216
This patch adds OMPIRBuilder support for the simd directive (without any clause). This will be a first step towards lowering simd directive in LLVM_Flang. The patch uses existing CanonicalLoop infrastructure of IRBuilder to add the support. Also adds necessary code to add llvm.access.group and llvm.loop metadata wherever needed.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D114379
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should exact the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
Since 2959e082e1, we conservatively
assume all inputs are enabled by default. This isn't the best
interface for controlling these anyway, since it's not granular and
only allows trimming the last fields.
EHTerminateScope is used to implement C++ noexcept semantics. Per C++
[except.terminate], it is implemented-defined whether no, some, or all
cleanups are run prior to terminatation.
Therefore, the code to run cleanups on the way towards termination is
unnecessary, and may be omitted.
After this change, we will still run some cleanups: any cleanups in a
function called from the noexcept function will continue to run, while
those in the noexcept function itself will not.
Differential Revision: https://reviews.llvm.org/D113620
Cases where there is a mangling of a cpu-dispatch/cpu-specific function
before the function becomes 'multiversion' (such as a member function)
causes the wrong name to be emitted for one of the variants/resolver,
since the name is cached. Make sure we invalidate the cache in
cpu-dispatch/cpu-specific modes, like we previously did for just target
multiversioning.
This patch implements two builtins specified in D111529.
The last __builtin_reduce_add will be seperated into another one.
Differential Revision: https://reviews.llvm.org/D116736
With the introduction of this flag, it is no longer necessary to enable noundef analysis with 4 separate flags.
(-Xclang -enable-noundef-analysis -mllvm -msan-eager-checks=1).
This change only covers the introduction into the compiler.
This is a follow up to: https://reviews.llvm.org/D116855
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D116633
TLS initializers, for example constructors of thread-local variables, don't necessarily get called. If a thread was created before a module is loaded, the module's TLS initializers are not executed for this particular thread.
This is why Microsoft added support for dynamic TLS initialization. Before every use of thread-local variables, a check is added that runs the module's TLS initializers on-demand.
To do this, the method `__dyn_tls_on_demand_init` gets called. Internally, it simply calls `__dyn_tls_init`.
No additional TLS initializer that sets the guard needs to be emitted, as the guard always gets set by `__dyn_tls_init`.
The guard is also checked again within `__dyn_tls_init`. This makes our check redundant, however, as Microsoft's compiler also emits this check, the behaviour is adopted here.
Reviewed By: majnemer
Differential Revision: https://reviews.llvm.org/D115456
Functions pointers should be created with program address space. This
patch introduces program address space in TargetInfo. Targets with
non-default (default is 0) address space for functions should explicitly
set this value. This patch fixes a crash on lvalue reference to function
pointer (in device code) when using oneAPI DPC++ compiler.
Differential Revision: https://reviews.llvm.org/D111566
to repro error.
As mentioned yesterday, I've got a problem that I can only reproduce on
Godbolt (none of the build configs on my local machine!), so this is at
least somewhat usable until I figure out a cause.
Minor adjustment in order of noundef analysis to be a bit more optimal (when disabled).
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D117078
I'm attempting to debug an issue that I can only get to happen on
godbolt, where the cpu-dispatch resolver for an out of line member
function is generated with the wrong name, causing a link failure.
When `-ftrivial-auto-var-init=` is enabled, allocas unconditionally
receive auto-initialization since [1].
In certain cases, it turns out, this is causing problems. For example,
when using alloca to add a random stack offset, as the Linux kernel does
on syscall entry [2]. In this case, none of the alloca'd stack memory is
ever used, and initializing it should be controllable; furthermore, it
is not always possible to safely call memset (see [2]).
Introduce `__builtin_alloca_uninitialized()` (and
`__builtin_alloca_with_align_uninitialized`), which never performs
initialization when `-ftrivial-auto-var-init=` is enabled.
[1] https://reviews.llvm.org/D60548
[2] https://lkml.kernel.org/r/YbHTKUjEejZCLyhX@elver.google.com
Reviewed By: glider
Differential Revision: https://reviews.llvm.org/D115440
All kernels can be called from the host as per the SPIR_KERNEL calling
convention. As such, all kernels should have external linkage, but
block enqueue kernels were created with internal linkage.
Reported-by: Pedro Olsen Ferreira
Differential Revision: https://reviews.llvm.org/D115523
The code generation for the UBSan VLA size check was qualified by a con-
dition that the parameter must be a signed integer, however the C spec
does not make any distinction that only signed integer parameters can be
used to declare a VLA, only qualifying that it must be greater than zero
if it is not a constant.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D116048
I noticed that the following case would compile in Clang but not GCC:
void *x(void) {
void *p = &&foo;
asm goto ("# %0\n\t# %l1":"+r"(p):::foo);
foo:;
return p;
}
Changing the output template above from %l2 would compile in GCC but not
Clang.
This demonstrates that when using tied outputs (say via the "+r" output
constraint), the hidden inputs occur or are numbered BEFORE the labels,
at least with GCC.
In fact, GCC does denote this in its documentation:
https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Extended-Asm.html#Goto-Labels
> Output operand with constraint modifier ‘+’ is counted as two operands
> because it is considered as one output and one input operand.
For the sake of compatibility, I think it's worthwhile to just make this
change.
It's better to use symbolic names for compatibility (especially now
between released version of Clang that support asm goto with outputs).
ie. %l1 from the above would be %l[foo]. The GCC docs also make this
recommendation.
Also, I cleaned up some cruft in GCCAsmStmt::getNamedOperand. AFAICT,
NumPlusOperands was no longer used, though I couldn't find which commit
didn't clean that up correctly.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103640
Link: https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Extended-Asm.html#Goto-Labels
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D115471
when generating copy/dispose helper functions
Analyze the block captures just once before generating copy/dispose
block helper functions and honor the inert `__unsafe_unretained`
qualifier. This refactor fixes a bug where captures of ObjC
`__unsafe_unretained` and class types were needlessly retained/released
by the copy/dispose helper functions.
Differential Revision: https://reviews.llvm.org/D116948
and use that to simplify MD5's hex string code which was previously
using a string stream, as well as Clang's
CGDebugInfo::computeChecksum().
Differential revision: https://reviews.llvm.org/D116960
When calling emitArrayDestroy(), the pointer will usually have
ConvertTypeForMem(EltType) as the element type, as one would expect.
However, globals with initializers sometimes don't use the same
types as values normally would, e.g. here the global uses
{ double, i32 } rather than %struct.T as element type.
Add an early cast to the global destruction path to avoid this
special case. The cast would happen lateron anyway, it only gets
moved to an earlier point.
Differential Revision: https://reviews.llvm.org/D116219
Because of commit: https://reviews.llvm.org/D104291 the -gmodules .pcm
files do not have the same DW_AT_language dialect as the .o file. This
was a simple matter of passing the DebugStrictDwarf flag to the
PCHContainerGenerator object's CodeGenOpts from the CompilerInstance
passed in to it.
Before this change if you ran dwarfdump on the gmodule cache folder
you would get DW_AT_language (DW_LANG_C_plus_plus) even when using
-std=c++14 with clang
Patch by Shubham Rastogi!
Differential Revision: https://reviews.llvm.org/D116790
The `EmitDeclareOfAutoVariable` introduced in D114504 and D115510 has a
precondition that cannot be violated. It is unclear if we should call it
directly given the sparse usage in clang but for now we should at least
not crash if the debug info kind is too low.
Fixes#52938.
Differential Revision: https://reviews.llvm.org/D116865
This implements the clang side of D116531. The elementtype
attribute is added for all indirect constraints (*) and tests are
updated accordingly.
Differential Revision: https://reviews.llvm.org/D116666
This fixes bug49264.
Simply, coroutine shouldn't be inlined before CoroSplit. And the marker
for pre-splited coroutine is created in CoroEarly pass, which ran after
AlwaysInliner Pass in O0 pipeline. So that the AlwaysInliner couldn't
detect it shouldn't inline a coroutine. So here is the error.
This patch set the presplit attribute in clang and mlir. So the inliner
would always detect the attribute before splitting.
Reviewed By: rjmccall, ezhulenev
Differential Revision: https://reviews.llvm.org/D115790
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situation like creation of
Attribute with dummy value because the only relevant part was the attribute
kind.
Differential Revision: https://reviews.llvm.org/D116110
This patch extends `emitUnaryBuiltin` so that we can better emitting IR when
implement builtins specified in D111529.
Also contains some NFC, applying it to existing code.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D116161
One of the unused ident_t fields now holds the size of the string
(=const char *) field so we have an easier time dealing with those
in the future.
Differential Revision: https://reviews.llvm.org/D113126
This patch changes the default aligntment from 8 to 16, and encodes this
information in the `__kmpc_alloc_shared` runtime call to communicate it
to the HeapToStack pass. The previous alignment of 8 was not sufficient
for the maximum size of primitive types on 64-bit systems, and needs to
be increaesd. This reduces the amount of space availible in the data
sharing stack, so this implementation will need to be improved later to
include the alignment requirements in the allocation call, and use it
properly in the data sharing stack in the runtime.
Depends on D115888
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D115971
This patch adds the support for `atomic compare` in parser. The support
in Sema and CodeGen will come soon. For now, it simply eimits an error when it
is encountered.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D115561
The reduction initialization code creates a "naturally aligned null
pointer to void lvalue", which I found somewhat odd, even though it
works out in the end because it is not actually used. It doesn't
look like this code actually needs an LValue for anything though,
and we can use an invalid Address to represent this case instead.
Differential Revision: https://reviews.llvm.org/D116214
Add an overload for an Address and a single non-constant offset.
This makes it easier to preserve the element type and adjust the
alignment appropriately.
sret is special in that it does not use the memory type
representation. Manually construct the LValue using ConvertType
instead of ConvertTypeForMem here.
This fixes matrix-lowering-opt-levels.c on s390x.
MakeNaturalAlignAddrLValue() expects the pointee type, but the
pointer type was passed. As a result, the natural alignment of
the pointer (usually 8) was always used in place of the natural
alignment of the value type.
Differential Revision: https://reviews.llvm.org/D116171
HVX does not have load/store instructions for vector predicates (i.e. bool
vectors). Because of that, vector predicates need to be converted to another
type before being stored, and the most convenient representation is an HVX
vector.
As a consequence, in C/C++, source-level builtins that either take or
produce vector predicates take or return regular vectors instead. On the
other hand, the corresponding LLVM intrinsics do have boolean types that,
and so a conversion of the operand or the return value was necessary.
This conversion would happen inside clang's codegen, but was somewhat
fragile.
This patch changes the strategy: a builtin that takes a vector predicate
now really expects a vector predicate. Since such a predicate cannot be
provided via a variable, this builtin must be composed with other builtins
that either convert vector to a predicate (V6_vandvrt) or predicate to a
vector (V6_vandqrt).
For users using builtins defined in hvx_hexagon_protos.h there is no impact:
the conversions were added to that file. Other users will need to insert
- __builtin_HEXAGON_V6_vandvrt[_128B](V, -1) to convert vector V to a
vector predicate, or
- __builtin_HEXAGON_V6_vandqrt[_128B](Q, -1) to convert vector predicate Q
to a vector.
Builtins __builtin_HEXAGON_V6_vmaskedstore.* are a temporary exception to
that, but they are deprecated and should not be used anyway. In the future
they will either follow the same rule, or be removed.
Over in D114631 I turned this debug-info feature on by default, for x86_64
only. I'd previously stripped out the clang cc1 option that controlled it
in 651122fc4a, unfortunately that turned out to not be completely
effective, and the two things deleted in this patch continued to keep it
off-by-default. Oooff.
As a follow-up, this patch removes the last few things to do with
ValueTrackingVariableLocations from clang, which was the original purpose
of D114631. In an ideal world, if this patch causes you trouble you'd
revert 3c04507088 instead, which was where this behaviour was supposed
to start being the default, although that might not be practical any more.
This patch implements __builtin_reduce_xor as specified in D111529.
Reviewed By: fhahn, aaron.ballman
Differential Revision: https://reviews.llvm.org/D115231
Reland integrates build fixes & further review suggestions.
Thanks to @zturner for the initial S_OBJNAME patch!
Differential Revision: https://reviews.llvm.org/D43002
Also revert all subsequent fixes:
- abd1cbf5e5 [Clang] Disable debug-info-objname.cpp test on Unix until I sort out the issue.
- 00ec441253 [Clang] debug-info-objname.cpp test: explictly encode a x86 target when using %clang_cl to avoid falling back to a native CPU triple.
- cd407f6e52 [Clang] Fix build by restricting debug-info-objname.cpp test to x86.
Add an overload that accepts and returns an Address, as we
generally just want to replace the pointer with a laundered one,
while retaining remaining information.
Control-Flow Integrity (CFI) replaces references to address-taken
functions with pointers to the CFI jump table. This is a problem
for low-level code, such as operating system kernels, which may
need the address of an actual function body without the jump table
indirection.
This change adds the __builtin_function_start() builtin, which
accepts an argument that can be constant-evaluated to a function,
and returns the address of the function body.
Link: https://github.com/ClangBuiltLinux/linux/issues/1353
Depends on D108478
Reviewed By: pcc, rjmccall
Differential Revision: https://reviews.llvm.org/D108479
Profile merging is not supported when using debug info profile
correlation because the data section won't be in the binary at runtime.
Change the default profile name in this mode to `default_%p.proflite` so
we don't use profile merging.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115979
This reverts commit cc56c66f27.
Fixed a bad assertion, the target of a UsingShadowDecl must not have
*local* qualifiers, but it can be a typedef whose underlying type is qualified.
Currently there's no way to find the UsingDecl that a typeloc found its
underlying type through. Compare to DeclRefExpr::getFoundDecl().
Design decisions:
- a sugar type, as there are many contexts this type of use may appear in
- UsingType is a leaf like TypedefType, the underlying type has no TypeLoc
- not unified with UnresolvedUsingType: a single name is appealing,
but being sometimes-sugar is often fiddly.
- not unified with TypedefType: the UsingShadowDecl is not a TypedefNameDecl or
even a TypeDecl, and users think of these differently.
- does not cover other rarer aliases like objc @compatibility_alias,
in order to be have a concrete API that's easy to understand.
- implicitly desugared by the hasDeclaration ASTMatcher, to avoid
breaking existing patterns and following the precedent of ElaboratedType.
Scope:
- This does not cover types associated with template names introduced by
using declarations. A future patch should introduce a sugar TemplateName
variant for this. (CTAD deduced types fall under this)
- There are enough AST matchers to fix the in-tree clang-tidy tests and
probably any other matchers, though more may be useful later.
Caveats:
- This changes a fairly common pattern in the AST people may depend on matching.
Previously, typeLoc(loc(recordType())) matched whether a struct was
referred to by its original scope or introduced via using-decl.
Now, the using-decl case is not matched, and needs a separate matcher.
This is similar to the case of typedefs but nevertheless both adds
complexity and breaks existing code.
Differential Revision: https://reviews.llvm.org/D114251
This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.
‘--offload’ option, which is envisioned in [1], is added for specifying
offload targets. This option is used to override default device target
(amdgcn-amd-amdhsa) for HIP compilation for emitting device code as
SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().
getOffloadingDeviceToolChain() function (based on the design in the
SYCL repository) is added to select HIPSPVToolChain when HIP offload
target is ‘spirv64’.
The HIPActionBuilder is modified to produce LLVM IR at the backend
phase. HIPSPV tool chain expects to receive HIP device code as LLVM
IR so it can run external LLVM passes over them. HIPSPV TC is also
responsible for emitting the SPIR-V binary.
A Cuda GPU architecture ‘generic’ is added. The name is picked from
the LLVM SPIR-V Backend. In the HIPSPV code path the architecture
name is inserted to the bundle entry ID as target ID. Target ID is
expected to be always present so a component in the target triple
is not mistaken as target ID.
Tests are added for checking the HIPSPV tool chain.
[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader
Differential Revision: https://reviews.llvm.org/D110622
The UpperBound of RVV type in debug info should be elements count minus one,
as the LowerBound start from zero.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D115430
Summary: This patch records the access flag for
class/struct/union types in the clang part.
The summary of binary size change and debug info size change due to the DW_AT_accessibility attribute are as the following table. They are built with flags of `clang -O0 -g` (no -gz).
| section | before | after | change | % |
| .debug_loc | 929821 | 929821 |0|0|
|.debug_abbrev | 5885289 | 5971547 |+86258|+1.466%|
|.debug_info | 497613455 | 498122074 |+508619|+0.102%|
|.debug_ranges | 45731664 | 45731664 |0|0|
|.debug_str | 233842595 | 233839388 |-3207| -0.001%|
|.debug_line | 149773166 | 149764583 |-8583|-0.006%|
|total (debug) |933775990 |934359077|+583087 |+0.062%|
|total (binary) |1394617288 | 1395200024| +582736|+0.042%|
Reviewed By: dblaikie, shchenz
Differential Revision: https://reviews.llvm.org/D115503
This is the last cleanup step resulting from D115804 .
Now that clang uses intrinsics when we're in the special FP mode,
we don't need a function attribute as an indicator to the backend.
The LLVM part of the change is in D115885.
Differential Revision: https://reviews.llvm.org/D115886
Fix a mistake in 9bf917394eba3ba4df77cc17690c6d04f4e9d57f: sret
arguments use ConvertType, not ConvertTypeForMem, see the handling
in CodeGenTypes::GetFunctionType().
This fixes fp-matrix-pragma.c on s390x.
For aggregates, we need to store the element type to be able to
reconstruct the aggregate Address. This increases the size of this
packed structure (as the second value is already used for alignment
in this case), but I did not observe any compile-time or memory
usage regression from this change.
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115693
With opaque pointers the pointer cast may be a no-op, such that
var and castedAddr are the same. However, we still need to update
the map entry as the underlying global changed. We could explicitly
check whether the global was replaced, but we may as well just
always update the entry.
We got an unintended consequence of the optimizer getting smarter when
compiling in a non-standard mode, and there's no good way to inhibit
those optimizations at a later stage. The test is based on an example
linked from D92270.
We allow the "no-strict-float-cast-overflow" exception to normal C
cast rules to preserve legacy code that does not expect overflowing
casts from FP to int to produce UB. See D46236 for details.
Differential Revision: https://reviews.llvm.org/D115804
Store the pointer element type inside LValue so that we can
preserve it when converting it back into an Address. Storing the
pointer element type might not be strictly required here in that
we could probably re-derive it from the QualType (which would
require CGF access though), but storing it seems like the simpler
solution.
The global register case is special and does not store an element
type, as the value is not a pointer type in that case and it's not
possible to create an Address from it.
This is the main remaining part from D103465.
Differential Revision: https://reviews.llvm.org/D115791
CreateElementBitCast() can preserve the pointer element type in
the presence of opaque pointers, so use it in place of CreateBitCast()
in some places. This also sometimes simplifies the code a bit.
Change all uses of the deprecated constructor to pass the
element type explicitly and drop it.
For cases where the correct element type was not immediately
obvious to me or would require a slightly larger change I'm
falling back to explicitly calling getPointerElementType() for now.
Explicitly track the pointer element type in Address, rather than
deriving it from the pointer type, which will no longer be possible
with opaque pointers. This just adds the basic facility, for now
everything is still going through the deprecated constructors.
I had to adjust one place in the LValue implementation to satisfy
the new assertions: Global registers are represented as a
MetadataAsValue, which does not have a pointer type. We should
avoid using Address in this case.
This implements a part of D103465.
Differential Revision: https://reviews.llvm.org/D115725
In 32bit mode, attaching TBAA metadata to the store following the call
to inline assembler results in describing the wrong type by making a
fake lvalue(i.e., whatever the inline assembler happens to leave in
EAX:EDX.) Even if inline assembler somehow describes the correct type,
setting TBAA information on return type of call to inline assembler is
likely not correct, since TBAA rules need not apply to inline assembler.
Differential Revision: https://reviews.llvm.org/D115320
This no longer allows creating an invalid Address through the regular
constructor. There were only two places that did this (AggValueSlot
and EHCleanupScope) which did this by converting a potential nullptr
into an Address. I've fixed both of these by directly storing an
Address instead.
This is intended as a bit of preliminary cleanup for D103465.
Differential Revision: https://reviews.llvm.org/D115630
This reverts commit 800bf8ed29.
The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was
failing because I forgot the `llc` commands are architecture specific.
I'll follow up with a fix.
Differential Revision: https://reviews.llvm.org/D115689
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D114565
There are instances where clang codegen creates stores to
address space 4 in ctors, which causes a crash in llc.
This store was being optimized out at opt levels > 0.
For example:
pragma omp declare target
static const double log_smallx = log2(smallx);
pragma omp end declare target
This patch ensures that any global const that does not
have constant initialization stays in address space 1.
Note - a second patch is in the works where all global
constants are placed in address space 1 during
codegen and then the opt pass InferAdressSpaces
will promote to address space 4 where necessary.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115661
This reverts commit 2b554920f1.
This change causes tsan test timeout on x86_64-linux-autoconf.
The timeout can be reproduced by:
git clone https://github.com/llvm/llvm-zorg.git
BUILDBOT_CLOBBER= BUILDBOT_REVISION=eef8f3f85679c5b1ae725bade1c23ab7bb6b924f llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_standard.sh
The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.
In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D113623
I found that the coroutine intrinsic llvm.coro.param in documentation
(https://llvm.org/docs/Coroutines.html#id101) didn't get used actually
since there isn't lowering codes in LLVM. I also checked the
implementation of libstdc++ and libc++. Both of them didn't use
llvm.coro.param. So I am pretty sure that the llvm.coro.param intrinsic
is unused. I think it would be better t to remove it to avoid possible
misleading understandings.
Note: according to [class.copy.elision]/p1.3, this optimization is
allowed by the C++ language specification. Let's make it someday.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D115222
Update `OpenMPIRBuilder::collapseLoops()` to call `resize()` instead of
`set_size()`. The latter asserts on capacity limits and cannot grow,
which seems likely to be unintentional here (if it is, I think a local
assertion would be good for clarity).
Also update `CodeGenFunction::EmitOMPCollapsedCanonicalLoopNest()` to
use `pop_back_n()` instead of `set_size()`.
Differential Revision: https://reviews.llvm.org/D115378
This patch translates HIP kernels to SPIR-V kernels when the HIP
compilation mode is targeting SPIR-S. This involves:
* Setting Cuda calling convention to CC_OpenCLKernel (which maps to
SPIR_KERNEL in LLVM IR later on).
* Coercing pointer arguments with default address space (AS) qualifier
to CrossWorkGroup AS (__global in OpenCL). HIPSPV's device code is
ultimately SPIR-V for OpenCL execution environment (as
starter/default) where Generic or Function (OpenCL's private) is not
supported as storage class for kernel pointer types. This leaves the
CrossWorkGroup to be the only reasonable choice for HIP buffers.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D109818
This patch fixes issues for -fgpu-rdc for Windows MSVC
toolchain:
Fix COFF specific section flags and remove section types
in llvm-mc input file for Windows.
Escape fatbin path in llvm-mc input file.
Add -triple option to llvm-mc.
Put __hip_gpubin_handle in comdat when it has linkonce_odr
linkage.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D115039
WG14 adopted the _ExtInt feature from Clang for C23, but renamed the
type to be _BitInt. This patch does the vast majority of the work to
rename _ExtInt to _BitInt, which accounts for most of its size. The new
type is exposed in older C modes and all C++ modes as a conforming
extension. However, there are functional changes worth calling out:
* Deprecates _ExtInt with a fix-it to help users migrate to _BitInt.
* Updates the mangling for the type.
* Updates the documentation and adds a release note to warn users what
is going on.
* Adds new diagnostics for use of _BitInt to call out when it's used as
a Clang extension or as a pre-C23 compatibility concern.
* Adds new tests for the new diagnostic behaviors.
I want to call out the ABI break specifically. We do not believe that
this break will cause a significant imposition for early adopters of
the feature, and so this is being done as a full break. If it turns out
there are critical uses where recompilation is not an option for some
reason, we can consider using ABI tags to ease the transition.
The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to
match how the hardware instruction works.
Don't change the API of the corresponding OpenCL builtins.
Differential Revision: https://reviews.llvm.org/D115032
With C++17 the exception specification has been made part of the
function type, and therefore part of mangled type names.
However, it's valid to convert function pointers with an exception
specification to function pointers with the same argument and return
types but without an exception specification, which means that e.g. a
function of type "void () noexcept" can be called through a pointer
of type "void ()". We must therefore consider the two types to be
compatible for CFI purposes.
We can do this by stripping the exception specification before mangling
the type name, which is what this patch does.
Differential Revision: https://reviews.llvm.org/D115015
This does similar thing to 6b1341e, but fixes single element 128-bit
float type: `struct { long double x; }`.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D114937
Glibc 2.32 and newer uses these symbol names to support IEEE-754 128-bit
float. GCC transforms name of these builtins to align with Glibc header
behavior.
Since Clang doesn't have all GCC-compatible builtins implemented, this
patch only mutates the implemented part.
Note nexttoward is a special case (no nexttowardf128) so it's also
handled here.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D112401
If clang's output is set to bitcode and LTO is enabled, clang would
unconditionally add the flag to the module. Unfortunately, if the input were a
bitcode or IR file and had the flag set, this would result in two copies of the
flag, which is illegal IR. Guard the setting of the flag by checking whether it
already exists. This follows existing practice for the related "ThinLTO" module
flag.
Differential Revision: https://reviews.llvm.org/D112177
Handle branch protection option on the commandline as well as a function
attribute. One patch for both mechanisms, as they use the same underlying
parsing mechanism.
These are recorded in a set of LLVM IR module-level attributes like we do for
AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):
- command-line options are "translated" to module-level LLVM IR
attributes (metadata).
- functions have PAC/BTI specific attributes iff the
__attribute__((target("branch-protection=...))) was used in the function
declaration.
- command-line option -mbranch-protection to armclang targeting Arm,
following this grammar:
branch-protection ::= "-mbranch-protection=" <protection>
protection ::= "none" | "standard" | "bti" [ "+" <pac-ret-clause> ]
| <pac-ret-clause> [ "+" "bti"]
pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]
b-key is simply a placeholder to make it consistent with AArch64's
version. In Arm, however, it triggers a warning informing that b-key is
unsupported and a-key will be selected instead.
- Handle _attribute_((target(("branch-protection=..."))) for AArch32 with the
same grammer as the commandline options.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Victor Campos
- Ties Stuij
Reviewed By: vhscampos
Differential Revision: https://reviews.llvm.org/D112421
See discussion in D51650, this change was a little aggressive in an
error while doing a 'while we were here', so this removes that error
condition, as it is apparently useful.
This reverts commit bb4934601d.
Currently variables appearing inside private/firstprivate/lastprivate
clause of openmp task construct are not visible inside lldb debugger.
This is because compiler does not generate debug info for it.
Please consider the testcase debug_private.c attached with patch.
```
28 #pragma omp task shared(res) private(priv1, priv2) firstprivate(fpriv)
29 {
30 priv1 = n;
31 priv2 = n + 2;
32 printf("Task n=%d,priv1=%d,priv2=%d,fpriv=%d\n",n,priv1,priv2,fpriv);
33
-> 34 res = priv1 + priv2 + fpriv + foo(n - 1);
35 }
36 #pragma omp taskwait
37 return res;
(lldb) p priv1
error: <user expression 0>:1:1: use of undeclared identifier 'priv1'
priv1
^
(lldb) p priv2
error: <user expression 1>:1:1: use of undeclared identifier 'priv2'
priv2
^
(lldb) p fpriv
error: <user expression 2>:1:1: use of undeclared identifier 'fpriv'
fpriv
^
```
After the current patch, lldb is able to show the variables
```
(lldb) p priv1
(int) $0 = 10
(lldb) p priv2
(int) $1 = 12
(lldb) p fpriv
(int) $2 = 14
```
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D114504
Add an AtomicScopeModel for HIP and support for OpenCL builtins
that are missing in HIP.
Patch by: Michael Liao
Revised by: Anshil Ghandi
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D113925
Eachempati.
This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D113540
AMD64 ABI mandates caller to specify the number of used SSE registers
when passing variable arguments.
GCC also provides option -mskip-rax-setup to skip the setup of rax when
SSE is disabled. This helps to reduce the code size, see pr23258.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112413
With this,
void f() { __asm__("mov eax, ebx"); }
now compiles with clang with -masm=intel.
This matches gcc.
The flag is not accepted in clang-cl mode. It has no effect on
MSVC-style `__asm {}` blocks, which are unconditionally in intel
mode both before and after this change.
One difference to gcc is that in clang, inline asm strings are
"local" while they're "global" in gcc. Building the following with
-masm=intel works with clang, but not with gcc where the ".att_syntax"
from the 2nd __asm__() is in effect until file end (or until a
".intel_syntax" somewhere later in the file):
__asm__("mov eax, ebx");
__asm__(".att_syntax\nmovl %ebx, %eax");
__asm__("mov eax, ebx");
This also updates clang's intrinsic headers to work both in
-masm=att (the default) and -masm=intel modes.
The official solution for this according to "Multiple assembler dialects in asm
templates" in gcc docs->Extensions->Inline Assembly->Extended Asm
is to write every inline asm snippet twice:
bt{l %[Offset],%[Base] | %[Base],%[Offset]}
This works in LLVM after D113932 and D113894, so use that.
(Just putting `.att_syntax` at the start of the snippet works in some but not
all cases: When LLVM interpolates in parameters like `%0`, it uses at&t or
intel syntax according to the inline asm snippet's flavor, so the `.att_syntax`
within the snippet happens to late: The interpolated-in parameter is already
in intel style, and then won't parse in the switched `.att_syntax`.)
It might be nice to invent a `#pragma clang asm_dialect push "att"` /
`#pragma clang asm_dialect pop` to be able to force asm style per snippet,
so that the inline asm string doesn't contain the same code in two variants,
but let's leave that for a follow-up.
Fixes PR21401 and PR20241.
Differential Revision: https://reviews.llvm.org/D113707
Calls to MMA builtins that take pointer to void
do not accept other pointers/arrays whereas normal
functions with the same parameter do. This patch
allows MMA built-ins to accept non-void pointers
and arrays.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D113306
category is empty
Currently, if we create a category in ObjC that is empty, we still emit
runtime metadata for that category. This is a scenario that could
commonly be run into when using __attribute__((objc_direct_members)),
which elides the need for much of the category metadata. This is
slightly wasteful and can be easily skipped by checking the category
metadata contents during CodeGen.
rdar://66177182
Differential Revision: https://reviews.llvm.org/D113455
As discussed here: https://lwn.net/Articles/691932/
GCC6.0 adds target_clones multiversioning. This functionality is
an odd cross between the cpu_dispatch and 'target' MV, but is compatible
with neither.
This attribute allows you to list all options, then emits a separately
optimized version of each function per-option (similar to the
cpu_specific attribute). It automatically generates a resolver, just
like the other two.
The mangling however, is... ODD to say the least. The mangling format
is:
<normal_mangling>.<option string>.<option ordinal>.
Differential Revision:https://reviews.llvm.org/D51650
Per C++17 [except.spec], 'throw()' has become equivalent to
'noexcept', and should therefore call std::terminate, not
std::unexpected.
Differential Revision: https://reviews.llvm.org/D113517
this patch - https://reviews.llvm.org/D110337 changes the way how hostcall
hidden argument is emitted for printf, but the sanitized kernels also use
hostcall buffer to report a error for invalid memory access, which is not
handled by the above patch and it leads to vdi runtime error:
Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT:
Agent attempted to access an inaccessible address. code: 0x2b
Patch by: Praveen Velliengiri
Reviewed by: Yaxun Liu, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D112820
Two identical instantiations of a template function can be emitted by two TU's
with linkonce_odr linkage without causing duplicate symbols in linker. MSVC
also requires these symbols be in comdat sections. Linux does not require
the symbols in comdat sections to be merged by linker but by default
clang puts them in comdat sections.
If a template kernel is instantiated identically in two TU's. MSVC requires
that them to be in comdat sections, otherwise MSVC linker will diagnose them as
duplicate symbols. However, currently clang does not put instantiated template
kernels in comdat sections, which causes link error for MSVC.
This patch allows putting instantiated template kernels into comdat sections.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D112492
After the changes introduced by D106799 it is possible to tag
outlined function with both AlwaysInline and NoInline attributes using
-fno-inline command line options.
This issue is similiar to D107649.
Differential Revision: https://reviews.llvm.org/D112645
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.
This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112680
at the start of the entry block, which in turn would aid better code transformation/optimization.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110257
This patch removes the assumption propagation that was added in D110655
primarily to get assumption informatino on opaque call sites for
optimizations. The analysis done in D111445 allows us to do this more
intelligently in the back-end.
Depends on D111445
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111463
add tracing for loads and stores.
The primary goal is to have more options for data-flow-guided fuzzing,
i.e. use data flow insights to perform better mutations or more agressive corpus expansion.
But the feature is general puspose, could be used for other things too.
Pipe the flag though clang and clang driver, same as for the other SanitizerCoverage flags.
While at it, change some plain arrays into std::array.
Tests: clang flags test, LLVM IR test, compiler-rt executable test.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D113447
Previously, the Backend_Emit{Nothing,BC,LL} modes did
not run the LLVM verifier since it is usually added via
the TargetMachine::addPassesToEmitFile method according
to the DisableVerify parameter. This is called from
EmitAssemblyHelper::AddEmitPasses, which is only relevant
for BackendAction-s that require CodeGen.
Note:
* In these particular situations the verifier is added
to the optimization pipeline rather than the codegen
pipeline so that it runs prior to the BC/LL emission
pass.
* This change applies to both the old and the new PMs.
* Because the clang tests use -emit-llvm ubiquitously,
this change will enable the verifier for them.
* A small bug is fixed in emitIFuncDefinition so that
the clang/test/CodeGen/ifunc.c test would pass:
the emitIFuncDefinition incorrectly passed the
GlobalDecl of the IFunc itself to the call to
GetOrCreateLLVMFunction for creating the resolver.
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D113352