https://reviews.llvm.org/D23944 implemented the #pragma intrinsic from
MSVC. This causes the statement #pragma intrinsic(cpuid) to fail [0]
on Clang because cpuid is currently implemented in intrin.h instead
of a Clang builtin. Reimplementing cpuid (as well as it's releated
function, cpuidex) should resolve this.
[0]: https://crbug.com/1279344
Differential revision: https://reviews.llvm.org/D121653
PowerPC is lacking tests checking `_Atomic` alignment in cfe. Adding these tests since we're going to make change to align with gcc on Linux.
Reviewed By: hubert.reinterpretcast, jsji
Differential Revision: https://reviews.llvm.org/D121441
FLT_EVAL_METHOD tells the user the precision at which, temporary results
are evaluated but when fast-math is enabled, the numeric values are not
guaranteed to match the source semantics, so the eval-method is
meaningless.
For example, the expression `x + y + z` has as source semantics `(x + y)
+ z`. FLT_EVAL_METHOD is telling the user at which precision `(x + y)`
is evaluated. With fast-math enable the compiler can choose to
evaluate the expression as `(y + z) + x`.
The correct behavior is to set the FLT_EVAL_METHOD to `-1` to tell the
user that the precision of the intermediate values is unknow. This
patch is doing that.
Differential Revision: https://reviews.llvm.org/D121122
Currently we allow half types in vectors if the scalar Zfh extension
is enabled. This behavior is not inline with the vector spec. For f32
and f64 types, the Zve32f, Zve64f, Zve64d, and V explicitly control
the availablity of floating point types in vectors.
In order to make our compiler compliant, we either need to remove all support
for half in vectors or we need an extension to control it.
Draft spec here https://github.com/riscv/riscv-v-spec/pull/780
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D121345
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.
Basically the same as D120527.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D121847
Fix the instruction names to match the WebAssembly spec:
- `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero`
- `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero`
Also rename related things like intrinsics, builtins, and test functions to
match.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D121661
Current ASTContext.getAttributedType() takes attribute kind,
ModifiedType and EquivType as the hash to decide whether an AST node
has been generated or note. But this is not enough for btf_type_tag
as the attribute might have the same ModifiedType and EquivType, but
still have different string associated with attribute.
For example, for a data structure like below,
struct map_value {
int __attribute__((btf_type_tag("tag1"))) __attribute__((btf_type_tag("tag3"))) *a;
int __attribute__((btf_type_tag("tag2"))) __attribute__((btf_type_tag("tag4"))) *b;
};
The current ASTContext.getAttributedType() will produce
an AST similar to below:
struct map_value {
int __attribute__((btf_type_tag("tag1"))) __attribute__((btf_type_tag("tag3"))) *a;
int __attribute__((btf_type_tag("tag1"))) __attribute__((btf_type_tag("tag3"))) *b;
};
and this is incorrect.
It is very difficult to use the current AttributedType as it is hard to
get the tag information. To fix the problem, this patch introduced
BTFTagAttributedType which is similar to AttributedType
in many ways but with an additional BTFTypeTagAttr. The tag itself can
be retrieved with BTFTypeTagAttr.
With the new BTFTagAttributed type, the debuginfo code can be greatly
simplified compared to previous TypeLoc based approach.
Differential Revision: https://reviews.llvm.org/D120296
Add the rest of intrinsics to clang except intrinsics using vector mask
registers.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D121586
This is the `ext_vector_type` alternative to D81083.
This patch extends Clang to allow 'bool' as a valid vector element type
(attribute ext_vector_type) in C/C++.
This is intended as the canonical type for SIMD masks and facilitates
clean vector intrinsic declarations. Vectors of i1 are supported on IR
level and below down to many SIMD ISAs, such as AVX512, ARM SVE (fixed
vector length) and the VE target (NEC SX-Aurora TSUBASA).
The RFC on cfe-dev: https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D88905
This fixes a bug that happens when using -fdebug-prefix-map to remap an
absolute path to a relative path. Since the path was absolute before
remapping, it is safe to assume that concatenating the remapped working
directory would be wrong.
This was originally submitted as https://reviews.llvm.org/D113718, but
reverted because when testing with dwarf 5 enabled, the tests were too
strict.
Differential Revision: https://reviews.llvm.org/D121663
On unix systems this logic would not separate the file and directory of
the DIFile unless they shared more components at the start than just the
root path character. The logic to do this was unix specific so it didn't
work on Windows. Now we check if the entire root_path is the same as
what you were going to set as the Dir and use the full filepath in that
case.
Differential Revision: https://reviews.llvm.org/D111579
Motivation:
```
int test(int x, int y) {
int r = 0;
[[clang::always_inline]] r += foo(x, y); // force compiler to inline this function here
return r;
}
```
In 2018, @kuhar proposed "Introduce per-callsite inline intrinsics" in https://reviews.llvm.org/D51200 to solve this motivation case (and many others).
This patch solves this problem with call site attribute. "noinline" statement attribute already landed in D119061. Also, some LLVM Inliner fixes landed so call site attribute is stronger than function attribute.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D120717
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D120527
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless specifically invalidated) because
it's self-updating via ValueHandles, but can be imprecise during the
self-updates.
Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.
Fixes#53131.
Differential Revision: https://reviews.llvm.org/D121167
Due to various implementation constraints, despite the programmer
choosing a 'processor' cpu_dispatch/cpu_specific needs to use the
'feature' list of a processor to identify it. This results in the
identified processor in source-code not being propogated to the
optimizer, and thus, not able to be tuned for.
This patch changes to use the actual cpu as written for tune-cpu so that
opt can make decisions based on the cpu-as-spelled, which should better
match the behavior expected by the programmer.
Note that the 'valid' list of processors for x86 is in
llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list
contains only Intel processors, but other vendors may wish to add their
own entries as 'alias'es (or with different feature lists!).
If this is not done, there is two potential performance issues with the
patch, but I believe them to be worth it in light of the improvements to
behavior and performance.
1- In the event that the user spelled "ProcessorB", but we only have the
features available to test for "ProcessorA" (where A is B minus
features),
AND there is an optimization opportunity for "B" that negatively affects
"A", the optimizer will likely choose to do so.
2- In the event that the user spelled VendorI's processor, and the
feature
list allows it to run on VendorA's processor of similar features, AND
there
is an optimization opportunity for VendorIs that negatively affects
"A"s,
the optimizer will likely choose to do so. This can be fixed by adding
an
alias to X86TargetParser.def.
Differential Revision: https://reviews.llvm.org/D121410
This patch implements support for the +, -, *, / and % operators on sizeless SVE
types. Support for these operators on svbool_t is excluded.
Differential Revision: https://reviews.llvm.org/D120323
Giving an int parameter to SVE intrinsics svptrue and svcnt caused Clang
to crash on compilation. Changing their parameter types to void instead of
omitting args results in a diagnostic error message instead.
Differential Revision: https://reviews.llvm.org/D121294
The llvm pre-merge test got timeout due to large test files, this commit
split up the files that have over 10k lines under clang/test/CodeGen/RISCV
into even smaller ones.
Differential Revision: https://reviews.llvm.org/D121431
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.
But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.
This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015
This commit divides the large test files(over 30k lines) under clang/test/CodeGen/RISCV including:
rvv-intrinsics/vloxseg.c
rvv-intrinsics/vluxseg.c
rvv-intrinsics-overloaded/vloxseg.c
rvv-intrinsics-overloaded/vluxseg.c
into "non-masked" version and "masked" version which can reduce the test cases by 50% in a single file.
Differential Revision: https://reviews.llvm.org/D120967
Clang is crashing on the following statement
char var[9];
__asm__ ("" : "=r" (var) : "0" (var));
This is similar to existing test: crbug_999160_regtest
The issue happens when EmitAsmStmt is trying to convert input to match
output type length. However, that is not guaranteed to be successful all the
time and if the statement itself is invalid like having an array type in
the example, we should give a regular error message here instead of
using assert().
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D120596
NOTE: this is a follow-up commit with the missing clang-side changes.
This patch adds builtins and intrinsics for the f16 and f16x2 variants of the ex2
instruction.
These two variants were added in PTX7.0, and are supported by sm_75 and above.
Note that this isn't wired with the exp2 llvm intrinsic because the ex2
instruction is only available in its approx variant.
Running ptxas on the assembly generated by the test f16-ex2.ll works as
expected.
Differential Revision: https://reviews.llvm.org/D119157
Currently adding attribute no_sanitize("bounds") isn't disabling
-fsanitize=local-bounds (also enabled in -fsanitize=bounds). The Clang
frontend handles fsanitize=array-bounds which can already be disabled by
no_sanitize("bounds"). However, instrumentation added by the
BoundsChecking pass in the middle-end cannot be disabled by the
attribute.
The fix is very similar to D102772 that added the ability to selectively
disable sanitizer pass on certain functions.
In this patch, if no_sanitize("bounds") is provided, an additional
function attribute (NoSanitizeBounds) is attached to IR to let the
BoundsChecking pass know we want to disable local-bounds checking. In
order to support this feature, the IR is extended (similar to D102772)
to make Clang able to preserve the information and let BoundsChecking
pass know bounds checking is disabled for certain function.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D119816
NVVM IR specification defines them with i32 return type:
declare i32 @llvm.nvvm.match.any.sync.i64(i32 %membermask, i64 %value)
declare {i32, i1} @llvm.nvvm.match.all.sync.i64(i32 %membermask, i64 %value)
...
The i32 return value is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.
as well as PTX ISA:
9.7.12.8. Parallel Synchronization and Communication Instructions: match.sync
match.any.sync.type d, a, membermask;
match.all.sync.type d[|p], a, membermask;
...
Destination d is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.
Additionally, ptxas doesn't accept intructions, produced by NVPTX backend.
After this patch, it compiles with no issues.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D120499
Motivation:
```
int foo(int x, int y) { // any compiler will happily inline this function
return x / y;
}
int test(int x, int y) {
int r = 0;
[[clang::noinline]] r += foo(x, y); // for some reason we don't want any inlining here
return r;
}
```
In 2018, @kuhar proposed "Introduce per-callsite inline intrinsics" in https://reviews.llvm.org/D51200 to solve this motivation case (and many others).
This patch solves this problem with call site attribute. The implementation is "smaller" wrt approach which uses new intrinsics and thanks to https://reviews.llvm.org/D79121 (Add nomerge statement attribute to clang), we have got some basic infrastructure to deal with attrs on statements with call expressions.
GCC devs are more inclined to call attribute solution as well, as builtins are problematic for them - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187. But they have no patch proposal yet so.. We have free hands here.
If this approach makes sense, next future steps would be support for call site attributes for always_inline / flatten.
Reviewed By: aaron.ballman, kuhar
Differential Revision: https://reviews.llvm.org/D119061
The purpose of this change is to fix the following codegen bug:
```
// main.c
__attribute__((cpu_specific(generic)))
int *foo(void) { static int z; return &z;}
int main() { return *foo() = 5; }
// other.c
__attribute__((cpu_dispatch(generic))) int *foo(void);
// run:
clang main.c other.c -o main; ./main
```
This will segfault prior to the change, and return the correct
exit code 5 after the change.
The underlying cause is that when a translation unit contains
a cpu_specific function without the corresponding cpu_dispatch
the generated code binds the reference to foo() against a
GlobalIFunc whose resolver is undefined. This is invalid: the
resolver must be defined in the same translation unit as the
ifunc, but historically the LLVM bitcode verifier did not check
that. The generated code then binds against the resolver rather
than the ifunc, so it ends up calling the resolver rather than
the resolvee. In the example above it treats its return value as
an int *, therefore trying to write to program text.
The root issue at the representation level is that GlobalIFunc,
like GlobalAlias, does not support a "declaration" state. The
object which provides the correct semantics in these cases
is a Function declaration, but unlike Functions, changing a
declaration to a definition in the GlobalIFunc case constitutes
a change of the object type, as opposed to simply emitting code
into a Function.
I think this limitation is unlikely to change, so I implemented
the fix by returning a function declaration rather than an ifunc
when encountering cpu_specific, and upgrading it to an ifunc
when emitting cpu_dispatch.
This uses `takeName` + `replaceAllUsesWith` in similar vein to
other places where the correct IR object type cannot be known
locally/up-front, like in `CodeGenModule::EmitAliasDefinition`.
Previous discussion in: https://reviews.llvm.org/D112349
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D120266
This fixes a bug that happens when using -fdebug-prefix-map to remap
an absolute path to a relative path. Since the path was absolute
before remapping, it is safe to assume that concatenating the remapped
working directory would be wrong.
Differential Revision: https://reviews.llvm.org/D113718
This patch adds -Wno-strict-prototypes to all of the test cases that
use functions without prototypes, but not as the primary concern of the
test. e.g., attributes testing whether they can/cannot be applied to a
function without a prototype, etc.
This is done in preparation for enabling -Wstrict-prototypes by
default.
A significant number of our tests in C accidentally use functions
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,
void func();
becomes
void func(void);
This is the final batch of tests being updated to add prototypes,
hopefully.
Now that integer min/max intrinsics have good support in both
InstCombine and other passes, start canonicalizing SPF min/max
to intrinsic min/max.
Once this sticks, we can stop matching SPF min/max in various
places, and can remove hacks we have for preventing infinite loops
and breaking of SPF canonicalization.
Differential Revision: https://reviews.llvm.org/D98152
-fdata-sections decides whether global variables go into different sections.
This is orthogonal to whether we place their metadata (`.data` or `asan_globals`) into different sections.
With -fno-data-sections, `-fsanitize-address-globals-dead-stripping` can still:
* deduplicate COMDAT `asan.module_ctor` and `asan.module_dtor`
* (with ld --gc-sections): for a data section (e.g. `.data`), if all global variables defined relative to it are unreferenced, discard them and associated `asan_globals` sections (rare but no need to exclude this case)
Similar to c7b90947bd for PE/COFF.
Reviewed By: #sanitizers, kstoimenov, vitalybuka
Differential Revision: https://reviews.llvm.org/D120394
For ASan this will effectively serve as a synonym for
__attribute__((no_sanitize("address"))).
Adding the disable_sanitizer_instrumentation to functions will drop the
sanitize_XXX attributes on the IR level.
This is the third reland of https://reviews.llvm.org/D114421.
Now that TSan test is fixed (https://reviews.llvm.org/D120050) there
should be no deadlocks.
Differential Revision: https://reviews.llvm.org/D120055
This flag was previously renamed `enable_noundef_analysis` to
`disable-noundef-analysis,` which is not a conventional name. (Driver and
CC1's boolean options are using [no-] prefix)
As discussed at https://reviews.llvm.org/D105169, this patch reverts its
name to `[no-]enable_noundef_analysis` and enables noundef-analysis as
default.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D119998
In post-commit feedback on D104830 Jessica Clarke pointed out that
unconditionally adding __va_list to the std namespace caused namespace
debug info to be emitted in C, which is not only inappropriate but
turned out to confuse the dtrace tool. Therefore, move __va_list back
to std only in C++ so that the correct debug info is generated. We
also considered moving __va_list to the top level unconditionally
but this would contradict the specification and be visible to AST
matchers and such, so make it conditional on the language mode.
To avoid breaking name mangling for __va_list, teach the Itanium
name mangler to always mangle it as if it were in the std namespace
when targeting ARM architectures. This logic is not needed for the
Microsoft name mangler because Microsoft platforms define va_list as
a typedef of char *.
Depends on D116773
Differential Revision: https://reviews.llvm.org/D116774
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
The nomask vector Multiply-Add need a policy operand
because merge value could not be undef.
Reviewed By: monkchiang
Differential Revision: https://reviews.llvm.org/D119727
Add the passthru operand for
VMV_V_X_VL, VFMV_V_F_VL and SPLAT_VECTOR_SPLIT_I64_VL also.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D119688
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D119686
A significant number of our tests in C accidentally use functions
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,
void func();
becomes
void func(void);
This is the twelfth batch of tests being updated (the end may be in
sight soon though).
These tests are dumped without optimization, which makes them too
lengthy and contain meaningless load/stores. Clean them up to prepare
for future headers update.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
My plan is to handle more complex operations in follow-up patches.
Reviewers: frasercrmck
Differential Revision: https://reviews.llvm.org/D118253
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Add passthru operand for VSLIDE1UP_VL and VSLIDE1DOWN_VL to support
i64 scalar in rv32.
The masked VSLIDE1 would only emit mask undisturbed policy regardless
of giving mask agnostic policy until InsertVSETVLI supports mask agnostic.
Reviewed by: craig.topper, rogfer01
Differential Revision: https://reviews.llvm.org/D117989
A significant number of our tests in C accidentally use functions
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,
void func();
becomes
void func(void);
This is the eleventh batch of tests being updated (there are a
significant number of other tests left to be updated).
The `__builtin_pdepd` and `__builtin_pextd` are P10 builtins that are meant to
be used under 64-bit only. For instance, when the builtins are compiled under
32-bit mode:
```
$ cat t.c
unsigned long long foo(unsigned long long a, unsigned long long b) {
return __builtin_pextd(a,b);
}
$ clang -c t.c -mcpu=pwr10 -m32
ExpandIntegerResult #0: t31: i64 = llvm.ppc.pextd TargetConstant:i32<6928>, t28, t29
fatal error: error in backend: Do not know how to expand the result of this operator!
```
This patch adds sema checking for these builtins to compile under 64-bit
mode only and on P10. The builtins will emit a diagnostic when they are compiled on
non-P10 compilations and on 32-bit mode.
Differential Revision: https://reviews.llvm.org/D118753
A significant number of our tests in C accidentally use functions
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,
void func();
becomes
void func(void);
This is the tenth batch of tests being updated (there are a
significant number of other tests left to be updated).
Depending on toolchain and ABI, a target might not output DWARF unwind tables by default.
Run the test for a target with a known behaviour, test coverage is not reduced.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D119724
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e. we lose the information whether to generate want just
some unwind tables or asynchronous unwind tables.
Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE) in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.
That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.
This patch extends the `uwtable` attribute with an optional
value:
- `uwtable` (default to `async`)
- `uwtable(sync)`, synchronous unwind tables
- `uwtable(async)`, asynchronous (instruction precise) unwind tables
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114543
MSVC currently doesn't support 80 bits long double. But ICC does support
it on Windows. Besides, there're also some users asked for this feature.
We can find the discussions from stackoverflow, msdn etc.
Given Clang has already support `-mlong-double-80`, extending it to
support for Windows seems worthwhile.
Reviewed By: rnk, erichkeane
Differential Revision: https://reviews.llvm.org/D115441
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,
void func();
becomes
void func(void);
This is the ninth batch of tests being updated (there are a
significant number of other tests left to be updated).
Masked reduction intrinsics are specical cases which don't need to have policy
operand. The mask only affects which elements are read. It doesn't effect the
destination register.
The reduction intrinsics have a dedicated destination operand. If it
is undef, we use tail agnostic. If it not undef we use tail
undisturbed.
Co-Authored-by: Craig Topper <craig.topper@sifive.com>
Differential Revision: https://reviews.llvm.org/D117681
In Clang we can attach TBAA metadata based on the load/store intrinsics
based on the operation's element type.
This also contains changes to InstCombine where the AArch64-specific
intrinsics are transformed into generic LLVM load/store operations,
to ensure that all metadata is transferred to the new instruction.
There will be some further work after this patch to also emit TBAA
metadata for SVE's gather/scatter- and struct load/store intrinsics.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D119319
This introduces the new __ARM_FEATURE_MOPS ACLE feature test macro,
which signals the availability of the new Armv8.8-A/Armv9.3-A
instructions for standardising memcpy, memset and memmove operations.
This patch supersedes the one from https://reviews.llvm.org/D116160.
Differential Revision: https://reviews.llvm.org/D118199
Neither LLDB nor GDB seem to work with DWARF 5 debug information on
Windows right now.
This applies the same change as in
9c62728610 (Default to DWARFv4 on Windows)
to the MinGW driver too.
Differential Revision: https://reviews.llvm.org/D119326
Previous test in ppc-pmmintrin.c did not check IR of intrinsic function
definition. Add them and simplify.
These tests shouldn't be auto-generated, because we don't want to check
wrapper functions.
The "-fzero-call-used-regs" option tells the compiler to zero out
certain registers before the function returns. It's also available as a
function attribute: zero_call_used_regs.
The two upper categories are:
- "used": Zero out used registers.
- "all": Zero out all registers, whether used or not.
The individual options are:
- "skip": Don't zero out any registers. This is the default.
- "used": Zero out all used registers.
- "used-arg": Zero out used registers that are used for arguments.
- "used-gpr": Zero out used registers that are GPRs.
- "used-gpr-arg": Zero out used GPRs that are used as arguments.
- "all": Zero out all registers.
- "all-arg": Zero out all registers used for arguments.
- "all-gpr": Zero out all GPRs.
- "all-gpr-arg": Zero out all GPRs used for arguments.
This is used to help mitigate Return-Oriented Programming exploits.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110869
This introduces the new __ARM_FEATURE_MOPS ACLE feature test macro,
which signals the availability of the new Armv8.8-A/Armv9.3-A
instructions for standardising memcpy, memset and memmove operations.
This patch supersedes the one from https://reviews.llvm.org/D116160.
Differential Revision: https://reviews.llvm.org/D118199
The above change assumed that malloc (and friends) would always
allocate memory to getNewAlign(), even for allocations which have a
smaller size. This is not actually required by spec (a 1-byte
allocation may validly have 1-byte alignment).
Some real-world malloc implementations do not provide this guarantee,
and thus this optimization is breaking programs.
Fixes#53540
This reverts commit c2297544c0.
Differential Revision: https://reviews.llvm.org/D118804
D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions
This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes:
__m256i test_mm256_adds_epi8(__m256i a, __m256i b) {
// CHECK-LABEL: test_mm256_adds_epi8
// CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.*}}, <32 x i8> %{{.*}})
return _mm256_adds_epi8(a, b);
}
This patch implements `__builtin_elementwise_add_sat` and `__builtin_elementwise_sub_sat` builtins.
These map to the add/sub saturated math intrinsics described here:
https://llvm.org/docs/LangRef.html#saturation-arithmetic-intrinsics
With this in place we should then be able to replace the x86 SSE adds/subs intrinsics with these generic variants - it looks like other targets should be able to use these as well (arm/aarch64/webassembly all have similar examples in cgbuiltin).
Differential Revision: https://reviews.llvm.org/D117898
This patch fixes a bug introduced in commit 4eaf5846d0. Commit
4eaf5846d0 sets address space of function type as program
address space unconditionally. This breaks types which have
address space qualifiers. E.g. __ptr32.
This patch fixes the bug by using address space qualifiers if
present.
Differential Revision: https://reviews.llvm.org/D119045
Some types (e.g. `_Bool`) have different scalar and memory representations. CodeGen for `va_arg` didn't take this into account, leading to an assertion failures with different types.
This patch makes sure we use memory representation for `va_arg`.
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D118904
While investigating the failures of `symbolize_pc.cpp` and
`symbolize_pc_inline.cpp` on SPARC (both Solaris and Linux), I noticed that
`__builtin_extract_return_addr` is a no-op in `clang` on all targets, while
`gcc` has non-default implementations for arm, mips, s390, and sparc.
This patch provides the SPARC implementation. For background see
`SparcISelLowering.cpp` (`SparcTargetLowering::LowerReturn_32`), the SPARC
psABI p.3-12, `%i7` and p.3-16/17, and SCD 2.4.1, p.3P-10, `%i7` and
p.3P-15.
Tested (after enabling the `sanitizer_common` tests on SPARC) on
`sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D91607
This patch extends clang frontend to add metadata that can be used to emit macho files with two build version load commands.
It utilizes "darwin.target_variant.triple" and "darwin.target_variant.SDK Version" metadata names for that.
MachO uses two build version load commands to represent an object file / binary that is targeting both the macOS target,
and the Mac Catalyst target. At runtime, a dynamic library that supports both targets can be loaded from either a native
macOS or a Mac Catalyst app on a macOS system. We want to add support to this to upstream to LLVM to be able to build
compiler-rt for both targets, to finish the complete support for the Mac Catalyst platform, which is right now targetable
by upstream clang, but the compiler-rt bits aren't supported because of the lack of this multiple build version support.
Differential Revision: https://reviews.llvm.org/D115415
Part of the _BitInt feature in C2x
(http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf) is a new
macro in limits.h named BITINT_MAXWIDTH that can be used to determine
the maximum width of a bit-precise integer type. This macro must expand
to a value that is at least as large as ULLONG_WIDTH.
This adds an implementation-defined macro named __BITINT_MAXWIDTH__ to
specify that value, which is used by limits.h for the standard macro.
This also limits the maximum bit width to 128 bits because backends do
not currently support all mathematical operations (such as division) on
wider types yet. This maximum is expected to be increased in the future.
Branch protection in M-class is supported by
- Armv8.1-M.Main
- Armv8-M.Main
- Armv7-M
Attempting to enable this for other architectures, either by
command-line (e.g -mbranch-protection=bti) or by target attribute
in source code (e.g. __attribute__((target("branch-protection=..."))) )
will generate a warning.
In both cases function attributes related to branch protection will not
be emitted. Regardless of the warning, module level attributes related to
branch protection will be emitted when it is enabled via the command-line.
The following people also contributed to this patch:
- Victor Campos
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D115501
Since it's introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.
Fixes#53120 and #50905 and #51761.
Differential Revision: https://reviews.llvm.org/D117592