This corrects the leader for the swift names. The encoding for 4.2 and
5.0 differ by a single bit on the second character and were swapped.
llvm-svn: 345360
Adds support for -mno-stack-arg-probe and -mstack-probe-size.
(Not really happy copy-pasting code, but that's what we do for all the
other Windows targets.)
Differential Revision: https://reviews.llvm.org/D53617
llvm-svn: 345354
Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now).
Add two new type modifiers to NeonEmitter to handle the new prototypes.
Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the
intrinsics with the macro in arm_neon.h.
Based on a patch by Gao Yiling.
Differential Revision: https://reviews.llvm.org/D53633
llvm-svn: 345344
storage class.
To be more in line with what GCC does, switch the condition to be based
on the Static Storage duration instead of the storage class.
Change-Id: I8e959d762433cda48855099353bf3c950b9d54b8
llvm-svn: 345302
Similar to how ICC handles CPU-Dispatch on Windows, this patch uses the
resolver function directly to forward the call to the proper function.
This is not nearly as efficient as IFuncs of course, but is still quite
useful for large functions specifically developed for certain
processors.
This is unfortunately still limited to x86, since it depends on
__builtin_cpu_supports and __builtin_cpu_is, which are x86 builtins.
The naming for the resolver/forwarding function for cpu-dispatch was
taken from ICC's implementation, which uses the unmodified name for this
(no mangling additions). This is possible, since cpu-dispatch uses '.A'
for the 'default' version.
In 'target' multiversioning, this function keeps the '.resolver'
extension in order to keep the default function keeping the default
mangling.
Change-Id: I4731555a39be26c7ad59a2d8fda6fa1a50f73284
Differential Revision: https://reviews.llvm.org/D53586
llvm-svn: 345298
The X86 backend will need to see the attribute to make decisions. If it isn't present the backend will have to assume large vectors may be present.
llvm-svn: 345237
Add a new driver level flag `-fcf-runtime-abi=` that allows one to specify the
runtime ABI for CoreFoundation. This controls the language interoperability.
In particular, this is relevant for generating the CFConstantString classes
(primarily through the `__builtin___CFStringMakeConstantString` builtin) which
construct a reference to the "CFObject"'s `isa` field. This type differs
between swift 4.1 and 4.2+.
Valid values for the new option include:
- objc [default behaviour] - enable ObjectiveC interoperability
- swift-4.1 - enable interoperability with swift 4.1
- swift-4.2 - enable interoperability with swift 4.2
- swift-5.0 - enable interoperability with swift 5.0
- swift [alias] - target the latest swift ABI
Furthermore, swift 4.2+ changed the layout for the CFString when building
CoreFoundation *without* ObjectiveC interoperability. In such a case, a field
was added to the CFObject base type changing it from: <{ const int*, int }> to
<{ uintptr_t, uintptr_t, uint64_t }>.
In swift 5.0, the CFString type will be further adjusted to change the length
from a uint32_t on everything but BE LP64 targets to uint64_t.
Note that the default behaviour for clang remains unchanged and the new layout
must be explicitly opted into via `-fcf-runtime-abi=swift*`.
llvm-svn: 345222
This is a continuation of my patches to inform the X86 backend about what the largest IR types are in the function so that we can restrict the backend type legalizer to prevent 512-bit vectors on SKX when -mprefer-vector-width=256 is specified if no explicit 512 bit vectors were specified by the user.
This patch updates the vector width based on the argument and return types of the current function and from the types of any functions it calls. This is intended to make sure the backend type legalizer doesn't disturb any types that are required for ABI.
Differential Revision: https://reviews.llvm.org/D52441
llvm-svn: 345168
This broke the Chromium build. See
https://bugs.chromium.org/p/chromium/issues/detail?id=898152#c1 for the
reproducer.
> Generate DILabel metadata and call llvm.dbg.label after label
> statement to associate the metadata with the label.
>
> After fixing PR37395.
> After fixing problems in LiveDebugVariables.
> After fixing NULL symbol problems in AddressPool when enabling
> split-dwarf-file.
> After fixing PR39094.
>
> Differential Revision: https://reviews.llvm.org/D45045
llvm-svn: 345026
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
After fixing problems in LiveDebugVariables.
After fixing NULL symbol problems in AddressPool when enabling
split-dwarf-file.
After fixing PR39094.
Differential Revision: https://reviews.llvm.org/D45045
llvm-svn: 345009
Since multiversion variant functions can be inline, in C they become
available-externally linkage. This ends up causing the variants to not
be emitted, and not available to the linker.
The solution is to make sure that multiversion functions are always
emitted by marking them linkonce.
Change-Id: I897aa37c7cbba0c1eb2c57ee881d5000a2113b75
llvm-svn: 344957
libgcc supports more than 32 features by adding a new 32-bit variable __cpu_features2.
This adds the clang support for checking these feature bits.
Patches for compiler-rt and llvm to support this are coming as well.
Probably still need an additional patch for target multiversioning in clang.
Differential Revision: https://reviews.llvm.org/D53458
llvm-svn: 344832
Summary:
The multiversioning code repurposed the code from __builtin_cpu_supports for checking if a single feature is enabled. That code essentially performed (_cpu_features & (1 << C)) != 0. But with the multiversioning path, the mask is no longer guaranteed to be a power of 2. So we return true anytime any one of the bits in the mask is set not just all of the bits.
The correct check is (_cpu_features & mask) == mask
Reviewers: erichkeane, echristo
Reviewed By: echristo
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D53460
llvm-svn: 344824
This patch exposes functionality added in rL344723 to the Clang driver/frontend
as a flag and adds appropriate metadata.
Driver tests pass:
```
ninja check-clang-driver
-snip-
Expected Passes : 472
Expected Failures : 3
Unsupported Tests : 65
```
Odd failure in CodeGen tests but unrelated to this:
```
ninja check-clang-codegen
-snip-
/SourceCache/llvm-trunk-8.0/tools/clang/test/CodeGen/builtins-wasm.c:87:10:
error: cannot compile this builtin function yet
-snip-
Failing Tests (1):
Clang :: CodeGen/builtins-wasm.c
Expected Passes : 1250
Expected Failures : 2
Unsupported Tests : 120
Unexpected Failures: 1
```
Original commit:
[X86] Support for the mno-tls-direct-seg-refs flag
Allows to disable direct TLS segment access (%fs or %gs). GCC supports a
similar flag, it can be useful in some circumstances, e.g. when a thread
context block needs to be updated directly from user space. More info and
specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145
Patch by nruslan (Ruslan Nikolaev).
Differential Revision: https://reviews.llvm.org/D53102
llvm-svn: 344739
This change adds support for the following MIPS target triples:
mipsisa32r6-linux-gnu
mipsisa32r6el-linux-gnu
mipsisa64r6-linux-gnuabi64
mipsisa64r6el-linux-gnuabi64
mipsisa64r6-linux-gnuabin32
mipsisa64r6el-linux-gnuabin32
Patch by Yun Qiang Su.
Differential revision: https://reviews.llvm.org/D50850
llvm-svn: 344608
The `GNUABIN32` environment in a target triple implies using the N32
ABI. This patch adds support for this environment and switches on N32
ABI if necessary.
Patch by Patch by YunQiang Su.
Differential revision: https://reviews.llvm.org/D51464
llvm-svn: 344570
This reverts commit https://reviews.llvm.org/rL344150 which causes
MachineOutliner related failures on the ppc64le multistage buildbot.
llvm-svn: 344526
Summary:
As per IRC disscussion, it seems we really want to have more fine-grained `-fsanitize=implicit-integer-truncation`:
* A check when both of the types are unsigned.
* Another check for the other cases (either one of the types is signed, or both of the types is signed).
This is clang part.
Compiler-rt part is D50902.
Reviewers: rsmith, vsk, Sanitizers
Reviewed by: rsmith
Differential Revision: https://reviews.llvm.org/D50901
llvm-svn: 344230
This is currently a clang extension and a resolution
of the defect report in the C++ Standard.
Differential Revision: https://reviews.llvm.org/D46441
llvm-svn: 344150
This matches the output of binutils' nm and ensures that any scripts
or tools that use nm and expect empty output in case there no symbols
don't break.
Differential Revision: https://reviews.llvm.org/D52943
llvm-svn: 343887
Fix code for constant evaluation of __builtin_memcpy() and
__builtin_memmove() that would attempt to divide by zero when given two
pointers to an incomplete array.
Differential Revision: https://reviews.llvm.org/D51855
llvm-svn: 343761
from those that aren't.
This patch changes the way __block variables that aren't captured by
escaping blocks are handled:
- Since non-escaping blocks on the stack never get copied to the heap
(see https://reviews.llvm.org/D49303), Sema shouldn't error out when
the type of a non-escaping __block variable doesn't have an accessible
copy constructor.
- IRGen doesn't have to use the specialized byref structure (see
https://clang.llvm.org/docs/Block-ABI-Apple.html#id8) for a
non-escaping __block variable anymore. Instead IRGen can emit the
variable as a normal variable and copy the reference to the block
literal. Byref copy/dispose helpers aren't needed either.
This reapplies r343518 after fixing a use-after-free bug in function
Sema::ActOnBlockStmtExpr where the BlockScopeInfo was dereferenced after
it was popped and deleted.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D51564
llvm-svn: 343542
from those that aren't.
This patch changes the way __block variables that aren't captured by
escaping blocks are handled:
- Since non-escaping blocks on the stack never get copied to the heap
(see https://reviews.llvm.org/D49303), Sema shouldn't error out when
the type of a non-escaping __block variable doesn't have an accessible
copy constructor.
- IRGen doesn't have to use the specialized byref structure (see
https://clang.llvm.org/docs/Block-ABI-Apple.html#id8) for a
non-escaping __block variable anymore. Instead IRGen can emit the
variable as a normal variable and copy the reference to the block
literal. Byref copy/dispose helpers aren't needed either.
This reapplies r341754, which was reverted in r341757 because it broke a
couple of bots. r341754 was calling markEscapingByrefs after the call to
PopFunctionScopeInfo, which caused the popped function scope to be
cleared out when the following code was compiled, for example:
$ cat test.m
struct A {
id data[10];
};
void foo() {
__block A v;
^{ (void)v; };
}
This commit calls markEscapingByrefs before calling PopFunctionScopeInfo
to prevent that from happening.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D51564
llvm-svn: 343518
These intrinsics exist in icc. They can be found on the Intel Intrinsics Guide website.
All the backend support is in place to pattern match a load+bswap or a bswap+store pattern to the MOVBE instructions. So we just need to get the frontend to emit the correct IR. The pointer arguments in icc are declared as void so I had to jump through a packed struct to forcing a specific alignment on the load/store. Same trick we use in the unaligned vector load/store intrinsics
Differential Revision: https://reviews.llvm.org/D52586
llvm-svn: 343343
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
After fixing problems in LiveDebugVariables.
After fixing NULL symbol problems in AddressPool when enabling
split-dwarf-file.
Differential Revision: https://reviews.llvm.org/D45045
llvm-svn: 343148
Previously we used a select and the zero_undef=true intrinsic. In -O2 this pattern will get optimized to zero_undef=false. But in -O0 this optimization won't happen. This results in a compare and cmov being wrapped around a tzcnt/lzcnt instruction.
By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction.
Differential Revision: https://reviews.llvm.org/D52392
llvm-svn: 343126
PPC64BE bots use % instead of @ for directives like progbits. Since CFString tests also
check asm output, they fail on the following:
cfstring3.c:44:19: error: CHECK-ASM-ELF: expected string not found in input
// CHECK-ASM-ELF: .section cfstring,"aw",@progbits
<stdin>:30:2: note: possible intended match here
.section cfstring,"aw",%progbits
Updating that check with a {{[@%]}}progbits regex to make those bots happy.
llvm-svn: 343044
Relanding rL342883 with more fragmented tests to test ELF-specific
section emission separately from broad-scope CFString tests. Now this
tests the following separately
1). CoreFoundation builds and linkage for ELF while building it.
2). CFString ELF section emission outside CF in assembly output.
3). Broad scope `cfstring3.c` tests which cover all object formats at
bitcode level and assembly level (including ELF).
This fixes non-bridged CoreFoundation builds on ELF targets
that use -fconstant-cfstrings. The original changes from differential
for a similar patch to PE/COFF (https://reviews.llvm.org/D44491) did not
check for an edge case where the global could be a constant which surfaced
as an issue when building for ELF because of different linkage semantics.
This patch addresses several issues with crashes related to CF builds on ELF
as well as improves data layout by ensuring string literals that back
the actual CFConstStrings end up in .rodata in line with Mach-O.
Change itself tested with CoreFoundation on Linux x86_64 but should be valid
for BSD-like systems as well that use ELF as the native object format.
Differential Revision: https://reviews.llvm.org/D52344
llvm-svn: 343038
Added
__builtin_vsx_scalar_extract_expq
__builtin_vsx_scalar_insert_exp_qp
Builtins should behave the same way as in GCC.
Differential Revision: https://reviews.llvm.org/D48184
llvm-svn: 342911
[Clang][CodeGen][ObjC]: Fix non-bridged CoreFoundation builds on ELF targets
that use `-fconstant-cfstrings`. The original changes from differential
for a similar patch to PE/COFF (https://reviews.llvm.org/D44491) did not
check for an edge case where the global could be a constant which surfaced
as an issue when building for ELF because of different linkage semantics.
This patch addresses several issues with crashes related to CF builds on ELF
as well as improves data layout by ensuring string literals that back
the actual CFConstStrings end up in .rodata in line with Mach-O.
Change itself tested with CoreFoundation on Linux x86_64 but should be valid
for BSD-like systems as well that use ELF as the native object format.
Differential Revision: https://reviews.llvm.org/D52344
llvm-svn: 342883
aarch64 testing is broken because "medium" is not a valid
code-model on aarch64, and codemodels.c tests that. This fixes
that problem by adding "-triple x86_64-unknown-linux-gnu" to the
test with "-mcode-model moedium".
llvm-svn: 342812
A recent commit I made broke aarch64 testing, because "kernel"
apparently is not a valid code-model on aarch64, and one of my tests
tested that. This fixes the problem (hopefully) by adding "-triple
x86_64-unknown-linux-gnu" to the test build with "-mcodel-model
kernel".
Differential Revision: https://reviews.llvm.org/D52383
llvm-svn: 342789
Currently the code-model does not get saved in the module IR, so if a
code model is specified when compiling with LTO, it gets lost and is
not propagated properly to LTO. This patch does what is necessary in
the front end to pass the code-model to the module, so that the back
end can store it in the Module .
Differential Revision: https://reviews.llvm.org/D52323
llvm-svn: 342758
Summary:
Some lines have a hit counter where they should not have one.
Cleanup stuff is located to the last line of the body which is most of the time a '}'.
And Exception stuff is added at the beginning of a function and at the end (represented by '{' and '}').
So in such cases, the DebugLoc used in GCOVProfiling.cpp must be marked as not covered.
This patch is a followup of https://reviews.llvm.org/D49915.
Tests in projects/compiler_rt are fixed by: https://reviews.llvm.org/D49917
Reviewers: marco-c, davidxl
Reviewed By: marco-c
Subscribers: dblaikie, cfe-commits, sylvestre.ledru
Differential Revision: https://reviews.llvm.org/D49916
llvm-svn: 342717
unsigned long long builtin_unpack_vector_int128 (vector int128_t, int);
vector int128_t builtin_pack_vector_int128 (unsigned long long, unsigned long long);
Builtins should behave the same way as in GCC.
Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52074
llvm-svn: 342614
The code in ASTContext::DeclMustBeEmitted was supposed to handle this,
but didn't take into account that synthesized members such as operator=
might not get marked as template specializations, because they're
synthesized on the instantiation directly when handling the class-level
dllexport attribute.
llvm-svn: 342240
Summary:
Before this change, we only emit the XRay attributes in LLVM IR when the
-fxray-instrument flag is provided. This may cause issues with thinlto
when the final binary is being built/linked with -fxray-instrument, and
the constitutent LLVM IR gets re-lowered with xray instrumentation.
With this change, we can honour the "never-instrument "attributes
provided in the source code and preserve those in the IR. This way, even
in thinlto builds, we retain the attributes which say whether functions
should never be XRay instrumented.
This change addresses llvm.org/PR38922.
Reviewers: mboerger, eizan
Subscribers: mehdi_amini, dexonsmith, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D52015
llvm-svn: 342200
Summary:
On targets that do not support FP16 natively LLVM currently legalizes
vectors of FP16 values by scalarizing them and promoting to FP32. This
causes problems for the following code:
void foo(int, ...);
typedef __attribute__((neon_vector_type(4))) __fp16 float16x4_t;
void bar(float16x4_t x) {
foo(42, x);
}
According to the AAPCS (appendix A.2) float16x4_t is a containerized
vector fundamental type, so 'foo' expects that the 4 16-bit FP values
are packed into 2 32-bit registers, but instead bar promotes them to
4 single precision values.
Since we already handle scalar FP16 values in the frontend by
bitcasting them to/from integers, this patch adds similar handling for
vector types and homogeneous FP16 vector aggregates.
One existing test required some adjustments because we now generate
more bitcasts (so the patch changes the test to target a machine with
native FP16 support).
Reviewers: eli.friedman, olista01, SjoerdMeijer, javed.absar, efriedma
Reviewed By: javed.absar, efriedma
Subscribers: efriedma, kristof.beyls, cfe-commits, chrib
Differential Revision: https://reviews.llvm.org/D50507
llvm-svn: 342034
from those that aren't.
This patch changes the way __block variables that aren't captured by
escaping blocks are handled:
- Since non-escaping blocks on the stack never get copied to the heap
(see https://reviews.llvm.org/D49303), Sema shouldn't error out when
the type of a non-escaping __block variable doesn't have an accessible
copy constructor.
- IRGen doesn't have to use the specialized byref structure (see
https://clang.llvm.org/docs/Block-ABI-Apple.html#id8) for a
non-escaping __block variable anymore. Instead IRGen can emit the
variable as a normal variable and copy the reference to the block
literal. Byref copy/dispose helpers aren't needed either.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D51564
llvm-svn: 341754
Summary:
The optimized (__atomic_foo_<n>) libcalls assume that the atomic object
is properly aligned, so should never be called on an underaligned
object.
This addresses one of several problems identified in PR38846.
Reviewers: jyknight, t.p.northover
Subscribers: jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D51817
llvm-svn: 341734
This is the clang side of D51803. The llvm intrinsic now returns two results. So we need to emit an explicit store in IR for the out parameter. This is similar to addcarry/subborrow/rdrand/rdseed.
Differential Revision: https://reviews.llvm.org/D51805
llvm-svn: 341699
This is the clang side of D51769. The llvm intrinsics now return two results instead of using an out parameter.
Differential Revision: https://reviews.llvm.org/D51771
llvm-svn: 341678
Boilerplate code for using KMSAN instrumentation in Clang.
We add a new command line flag, -fsanitize=kernel-memory, with a
corresponding SanitizerKind::KernelMemory, which, along with
SanitizerKind::Memory, maps to the memory_sanitizer feature.
KMSAN is only supported on x86_64 Linux.
It's incompatible with other sanitizers, but supports code coverage
instrumentation.
llvm-svn: 341641
This reverts commit r341519, which generates debug info that causes
backend crashes. (with -split-dwarf-file)
Details in https://reviews.llvm.org/D50495
llvm-svn: 341549
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
After fixing problems in LiveDebugVariables.
Differential Revision: https://reviews.llvm.org/D45045
llvm-svn: 341519
Load Hardening.
Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.
Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.
While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.
This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.
Differential Revision: https://reviews.llvm.org/D51157
llvm-svn: 341363
These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc.
Includes these intrinsics:
_ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
_ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
_ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
_ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8
llvm-svn: 341265
This adds:
_cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
_cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
_load_mask8, _load_mask16, _load_mask32, _load_mask64
_store_mask8, _store_mask16, _store_mask32, _store_mask64
These are currently missing from the Intel Intrinsics Guide webpage.
llvm-svn: 341251
This adds the following intrinsics:
_kshiftli_mask8
_kshiftli_mask16
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask8
_kshiftri_mask16
_kshiftri_mask32
_kshiftri_mask64
llvm-svn: 341234
Summary:
Added option -gline-directives-only to support emission of the debug directives
only. It behaves very similar to -gline-tables-only, except that it sets
llvm debug info emission kind to
llvm::DICompileUnit::DebugDirectivesOnly.
Reviewers: echristo
Subscribers: aprantl, fedor.sergeev, JDevlieghere, cfe-commits
Differential Revision: https://reviews.llvm.org/D51177
llvm-svn: 341212
Since MinGW supports automatically importing external variables from
DLLs even without the DLLImport attribute, we shouldn't mark them
as DSO local unless we actually know them to be local for sure.
Keep marking thread local variables as DSO local.
Differential Revision: https://reviews.llvm.org/D51382
llvm-svn: 340941
This adds the following intrinsics:
_kadd_mask64
_kadd_mask32
_kadd_mask16
_kadd_mask8
These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.
llvm-svn: 340879
This patch removes uses of the Darwin ABI for PowerPC related test cases. This
is the first step in removing Darwin support from the POWER backend.
clang/test/CodeGen/darwin-ppc-varargs.c was deleted because it was a darwin/ppc
specific test case.
All other tests were updated to remove the darwin/ppc specific invocation.
Phabricator Review: https://reviews.llvm.org/D50989.
llvm-svn: 340770
This also adds a second intrinsic name for the 16-bit mask versions.
These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed.
llvm-svn: 340719
If all LLVM passes are disabled, we can't emit a summary because there
could be unnamed globals in the IR.
Differential Revision: https://reviews.llvm.org/D51198
llvm-svn: 340640
After this commit there is an addrspace(1) before the attribute #. Since
these tests are only checking the value of the attribute add a {{.*}} to
make the test resilient to future output changes.
llvm-svn: 340522
When complaining that the triple is incompatible with all targets, print out the triple not just a generic error about triples not matching.
llvm-svn: 340510
constants by default when there is no optimization.
GCC's option -fno-keep-static-consts can be used to not emit
unused static constants.
In Clang, since default behavior does not keep unused static constants,
-fkeep-static-consts can be used to emit these if required. This could be
useful for producing identification strings like SVN identifiers
inside the object file even though the string isn't used by the program.
Differential Revision: https://reviews.llvm.org/D40925
llvm-svn: 340439
EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that
work again.
This changes behavior: If e.g. ++a was passed as an arg, we incremented a twice
previously. This change fixes that bug.
https://reviews.llvm.org/D50979
llvm-svn: 340348
If using a custom stack alignment, one is expected to make sure
that all callers provide such alignment, or realign the stack in
all entry points (and callbacks).
Despite this, the compiler can assume that the main function will
need realignment in these cases, since the startup routines calling
the main function most probably won't provide the custom alignment.
This matches what GCC does in similar cases; if compiling with
-mincoming-stack-boundary=X -mpreferred-stack-boundary=X, GCC normally
assumes such alignment on entry to a function, but specifically for
the main function still does realignment.
Differential Revision: https://reviews.llvm.org/D51026
llvm-svn: 340334
Summary:
We decided to revert this from i64 to i32 in Nov 28 CG meeting. Fixes
PR38632.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D51013
llvm-svn: 340235
This changes the current default behavior (from emitting pubnames by
default, to not emitting them by default) & moves to matching GCC's
behavior* with one significant difference: -gno(-gnu)-pubnames disables
pubnames even in the presence of -gsplit-dwarf (though -gsplit-dwarf
still by default enables -ggnu-pubnames). This allows users to disable
pubnames (& the new DWARF5 accelerated access tables) when they might
not be worth the size overhead.
* GCC's behavior is that -ggnu-pubnames and -gpubnames override each
other, and that -gno-gnu-pubnames and -gno-pubnames act as synonyms and
disable either kind of pubnames if they come last. (eg: -gpubnames
-gno-gnu-pubnames causes no pubnames (neither gnu or standard) to be
emitted)
llvm-svn: 340206
This patch fixes definitions of vld and vst NEON intrinsics so
that we only define them if half-precision arithmetic is
supported on the target platform, as prescribed in ACLE 2.0.
Differential Revision: https://reviews.llvm.org/D49075
llvm-svn: 340140
This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing)
with 2 changes:
1. Move the code into a helper to reduce code duplication (and hopefully work-around the crash).
2. The original commit had a formatting bug in the docs (missing an underscore).
Original commit message:
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340137
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
D49242
With improved codegen in:
rL337966
rL339359
And basic IR optimization added in:
rL338218
rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340135
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans on using these functions, and they're using intrin0.h
which just has declarations of built-ins to not pull in the huge intrin.h
header in the standard library headers. That requires that these functions are
real built-ins.
https://reviews.llvm.org/D50907
llvm-svn: 340048
- Add a command line options -msign-return-address to enable return address
signing
- Armv8.3a added instructions to sign the return address to help mitigate
against ROP attacks
- This patch adds command line options to generate function attributes that
signal to the back whether return address signing instructions should be
added
Differential revision: https://reviews.llvm.org/D49793
llvm-svn: 340019
Thread sanitizer instrumentation fails to skip all loads and stores to
profile counters. This can happen if profile counter updates are merged:
%.sink = phi i64* ...
%pgocount5 = load i64, i64* %.sink
%27 = add i64 %pgocount5, 1
%28 = bitcast i64* %.sink to i8*
call void @__tsan_write8(i8* %28)
store i64 %27, i64* %.sink
To suppress TSan diagnostics about racy counter updates, make the
counter updates atomic when TSan is enabled. If there's general interest
in this mode it can be surfaced as a clang/swift driver option.
Testing: check-{llvm,clang,profile}
rdar://40477803
Differential Revision: https://reviews.llvm.org/D50867
llvm-svn: 339955
The tests `CodeGen/aapcs[64]-align.cc` are not run since files with a `.cc`
suffix aren't recognisze as tests. This patch renames the above two files to
`.cpp`.
Differential Revision: https://reviews.llvm.org/D46013
Comitting as obvious.
llvm-svn: 339766
Summary:
Another piece of my ongoing to work for prefer-vector-width.
min-legal-vector-width will eventually be used by the X86 backend to know whether it needs to make 512 bits type legal when prefer-vector-width=256. If the user used inline assembly that passed in/out a 512-bit register, we need to make sure 512 bits are considered legal. Otherwise we'll get an assert failure when we try to wire up the inline assembly to the rest of the code.
This patch just checks the LLVM IR types to see if they are vectors and then updates the attribute based on their total width. I'm not sure if this is the best way to do this or if there's any subtlety I might have missed. So if anyone has other opinions on how to do this I'm open to suggestions.
Reviewers: chandlerc, rsmith, rnk
Reviewed By: rnk
Subscribers: eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D50678
llvm-svn: 339721
Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.
Reviewers: craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D46892
llvm-svn: 339651
Clang generates copy and dispose helper functions for each block literal
on the stack. Often these functions are equivalent for different blocks.
This commit makes changes to merge equivalent copy and dispose helper
functions and reduce code size.
To enable merging equivalent copy/dispose functions, the captured object
infomation is encoded into the helper function name. This allows IRGen
to check whether an equivalent helper function has already been emitted
and reuse the function instead of generating a new helper function
whenever a block is defined. In addition, the helper functions are
marked as linkonce_odr to enable merging helper functions that have the
same name across translation units and marked as unnamed_addr to enable
the linker's deduplication pass to merge functions that have different
names but the same content.
rdar://problem/42640608
Differential Revision: https://reviews.llvm.org/D50152
llvm-svn: 339438
This extension emits the guard cf table without inserting the
instrumentation. Currently that's what clang-cl does with /guard:cf
anyway, but this allows a user to request that explicitly.
Differential Revision: https://reviews.llvm.org/D50513
llvm-svn: 339420
Summary:
Windows does not allow globals to be initialised to point to globals in
another DLL. Exported globals may be referenced only from code. Work
around this by creating an initialiser that runs in early library
initialisation and sets the isa pointer.
Reviewers: rjmccall
Reviewed By: rjmccall
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D50436
llvm-svn: 339317
Changes the default Windows target triple returned by
GetHostTriple.cmake from the old environment names (which we wanted to
move away from) to newer, normalized ones. This also requires updating
all tests to use the new systems names in constraints.
Differential Revision: https://reviews.llvm.org/D47381
llvm-svn: 339307
LLVM triple normalization is handling "unknown" and empty components
differently; for example given "x86_64-unknown-linux-gnu" and
"x86_64-linux-gnu" which should be equivalent, triple normalization
returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's
config.sub returns "x86_64-unknown-linux-gnu" for both
"x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the
triple normalization to behave the same way, replacing empty triple
components with "unknown".
This addresses PR37129.
Differential Revision: https://reviews.llvm.org/D50219
llvm-svn: 339294
gcc defines an intrinsic called __builtin_clrsb which counts the number of extra sign bits on a number. This is equivalent to counting the number of leading zeros on a positive number or the number of leading ones on a negative number and subtracting one from the result. Since we can't count leading ones we need to invert negative numbers to count zeros.
This patch will cause the builtin to be expanded inline while gcc uses a call to a function like clrsbdi2 that is implemented in libgcc. But this is similar to what we already do for popcnt. And I don't think compiler-rt supports clrsbdi2.
Differential Revision: https://reviews.llvm.org/D50168
llvm-svn: 339282
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
Differential Revision: https://reviews.llvm.org/D45045
llvm-svn: 338989
Summary:
Optimization remark format is slightly changed by LLVM patch D49412.
Two tests are fixed with expected messages changed.
Frankly speaking I have not tested this change yet. I will test when manage to setup the project.
Reviewers: xbolva00
Reviewed By: xbolva00
Subscribers: mehdi_amini, eraman, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D50241
llvm-svn: 338971
__builtin_memmove (in non-type-punning cases).
This is intended to permit libc++ to make std::copy etc constexpr
without sacrificing the optimization that uses memcpy on
trivially-copyable types.
__builtin_strcpy and __builtin_wcscpy are not handled by this change.
They'd be straightforward to add, but we haven't encountered a need for
them just yet.
This reinstates r338455, reverted in r338602, with a fix to avoid trying
to constant-evaluate a memcpy call if either pointer operand has an
invalid designator.
llvm-svn: 338941
- Testcase attempts to (not) grep 'g0' in output to ensure asm symbol is
properly renamed, but g0 is too generic and can be part of the
module's path in LLVM IR output.
- Changed to grep '@g0', which is what the proper global symbol name
would be if not using asm.
llvm-svn: 338895
It caused asserts during Chromium builds, see reply on the cfe-commits thread.
> This is intended to permit libc++ to make std::copy etc constexpr
> without sacrificing the optimization that uses memcpy on
> trivially-copyable types.
>
> __builtin_strcpy and __builtin_wcscpy are not handled by this change.
> They'd be straightforward to add, but we haven't encountered a need for
> them just yet.
llvm-svn: 338602
__builtin_memmove (in non-type-punning cases).
This is intended to permit libc++ to make std::copy etc constexpr
without sacrificing the optimization that uses memcpy on
trivially-copyable types.
__builtin_strcpy and __builtin_wcscpy are not handled by this change.
They'd be straightforward to add, but we haven't encountered a need for
them just yet.
llvm-svn: 338455
Summary:
C and C++ are interesting languages. They are statically typed, but weakly.
The implicit conversions are allowed. This is nice, allows to write code
while balancing between getting drowned in everything being convertible,
and nothing being convertible. As usual, this comes with a price:
```
unsigned char store = 0;
bool consume(unsigned int val);
void test(unsigned long val) {
if (consume(val)) {
// the 'val' is `unsigned long`, but `consume()` takes `unsigned int`.
// If their bit widths are different on this platform, the implicit
// truncation happens. And if that `unsigned long` had a value bigger
// than UINT_MAX, then you may or may not have a bug.
// Similarly, integer addition happens on `int`s, so `store` will
// be promoted to an `int`, the sum calculated (0+768=768),
// and the result demoted to `unsigned char`, and stored to `store`.
// In this case, the `store` will still be 0. Again, not always intended.
store = store + 768; // before addition, 'store' was promoted to int.
}
// But yes, sometimes this is intentional.
// You can either make the conversion explicit
(void)consume((unsigned int)val);
// or mask the value so no bits will be *implicitly* lost.
(void)consume((~((unsigned int)0)) & val);
}
```
Yes, there is a `-Wconversion`` diagnostic group, but first, it is kinda
noisy, since it warns on everything (unlike sanitizers, warning on an
actual issues), and second, there are cases where it does **not** warn.
So a Sanitizer is needed. I don't have any motivational numbers, but i know
i had this kind of problem 10-20 times, and it was never easy to track down.
The logic to detect whether an truncation has happened is pretty simple
if you think about it - https://godbolt.org/g/NEzXbb - basically, just
extend (using the new, not original!, signedness) the 'truncated' value
back to it's original width, and equality-compare it with the original value.
The most non-trivial thing here is the logic to detect whether this
`ImplicitCastExpr` AST node is **actually** an implicit conversion, //or//
part of an explicit cast. Because the explicit casts are modeled as an outer
`ExplicitCastExpr` with some `ImplicitCastExpr`'s as **direct** children.
https://godbolt.org/g/eE1GkJ
Nowadays, we can just use the new `part_of_explicit_cast` flag, which is set
on all the implicitly-added `ImplicitCastExpr`'s of an `ExplicitCastExpr`.
So if that flag is **not** set, then it is an actual implicit conversion.
As you may have noted, this isn't just named `-fsanitize=implicit-integer-truncation`.
There are potentially some more implicit conversions to be warned about.
Namely, implicit conversions that result in sign change; implicit conversion
between different floating point types, or between fp and an integer,
when again, that conversion is lossy.
One thing i know isn't handled is bitfields.
This is a clang part.
The compiler-rt part is D48959.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=21530 | PR21530 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=37552 | PR37552 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=35409 | PR35409 ]].
Partially fixes [[ https://bugs.llvm.org/show_bug.cgi?id=9821 | PR9821 ]].
Fixes https://github.com/google/sanitizers/issues/940. (other than sign-changing implicit conversions)
Reviewers: rjmccall, rsmith, samsonov, pcc, vsk, eugenis, efriedma, kcc, erichkeane
Reviewed By: rsmith, vsk, erichkeane
Subscribers: erichkeane, klimek, #sanitizers, aaron.ballman, RKSimon, dtzWill, filcab, danielaustin, ygribov, dvyukov, milianw, mclow.lists, cfe-commits, regehr
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D48958
llvm-svn: 338288
The "Procedure Call Procedure Call Standard for the ARM® Architecture"
(https://static.docs.arm.com/ihi0042/f/IHI0042F_aapcs.pdf), specifies that
composite types are passed according to their "natural alignment", i.e. the
alignment before alignment adjustment on the entire composite is applied.
The same applies for AArch64 ABI.
Clang, however, used the adjusted alignment.
GCC already implements the ABI correctly. With this patch Clang becomes
compatible with GCC and passes such arguments in accordance with AAPCS.
Differential Revision: https://reviews.llvm.org/D46013
llvm-svn: 338279
Summary:
Right now automatic variables are either initialized with bzero followed by a few stores, or memcpy'd from a synthesized global. We end up encountering a fair amount of code where memcpy of non-zero byte patterns would be better than memcpy from a global because it touches less memory and generates a smaller binary. The optimizer could reason about this, but it's not really worth it when clang already knows.
This code could definitely be more clever but I'm not sure it's worth it. In particular we could track a histogram of bytes seen and figure out (as we do with bzero) if a memset could be followed by a handful of stores. Similarly, we could tune the heuristics for GlobalSize, but using the same as for bzero seems conservatively OK for now.
<rdar://problem/42563091>
Reviewers: dexonsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D49771
llvm-svn: 337887
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
Differential Revision: https://reviews.llvm.org/D45045
Patch by Hsiangkai Wang.
llvm-svn: 337800
This patch adds support for vrndi_f32() and vrndiq_f32()
intrinsics in AArch32 mode and for vrndns_f32() intrinsic in
AArch64 mode.
Differential Revision: https://reviews.llvm.org/D48829
llvm-svn: 337690
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.
This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.
This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.
The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.
Differential Revision: https://reviews.llvm.org/D47474
llvm-svn: 337552
constant, don't convert the rest into a packed struct.
If an array constant has a large non-zero portion and a large zero
portion, we want to emit the first part as an array and the rest as a
zeroinitializer if possible. This fixes a memory usage regression from
r333141 when compiling PHP.
llvm-svn: 337498
The codegen for this builtin was initially implemented to match GCC.
However, due to interest from users GCC changed behaviour to account for the
big endian bias of the instruction and correct it. This patch brings the
handling inline with GCC.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38192
Differential Revision: https://reviews.llvm.org/D49424
llvm-svn: 337449
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in as the function attribute
"null-pointer-is-valid"="true".
This CL only adds the attribute on the function.
It also strips "nonnull" attributes from function arguments but
keeps the related warnings unchanged.
Corresponding LLVM change rL336613 already updated the
optimizations to not treat null pointer dereferencing
as undefined if the attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: jyknight
Subscribers: drinkcat, xbolva00, cfe-commits
Differential Revision: https://reviews.llvm.org/D47894
llvm-svn: 337433
Summary:
Using _Atomic to do implicit load / store is just a seq_cst atomic_load / atomic_store. Stores currently assert in Sema::ImpCastExprToType with 'can't implicitly cast lvalue to rvalue with this cast kind', but that's erroneous. The codegen is fine as the test shows.
While investigating I found that Richard had found the problem here: https://reviews.llvm.org/D46112#1113557
<rdar://problem/40347123>
Reviewers: dexonsmith
Subscribers: cfe-commits, efriedma, rsmith, aaron.ballman
Differential Revision: https://reviews.llvm.org/D49458
llvm-svn: 337410
which was reverted in r337336.
The problem that required a revert was fixed in r337338.
Also added a missing "REQUIRES: x86-registered-target" to one of
the tests.
Original commit message:
> Teach Clang to emit address-significance tables.
>
> By default, we emit an address-significance table on all ELF
> targets when the integrated assembler is enabled. The emission of an
> address-significance table can be controlled with the -faddrsig and
> -fno-addrsig flags.
>
> Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337339
Causing multiple failures on sanitizer bots due to TLS symbol errors,
e.g.
/usr/bin/ld: __msan_origin_tls: TLS definition in /home/buildbots/ppc64be-clang-test/clang-ppc64be/stage1/lib/clang/7.0.0/lib/linux/libclang_rt.msan-powerpc64.a(msan.cc.o) section .tbss.__msan_origin_tls mismatches non-TLS reference in /tmp/lit_tmp_0a71tA/mallinfo-3ca75e.o
llvm-svn: 337336
By default, we emit an address-significance table on all ELF
targets when the integrated assembler is enabled. The emission of an
address-significance table can be controlled with the -faddrsig and
-fno-addrsig flags.
Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337333
Summary:
If the build path is short, `Line` field can end up fitting on the same line as `File`,
but the `{{.*}}` would consume it. Keeping in mind rL293149, i think we can fix it,
while keeping it working when there are and there are not any quotations.
At least this fixes this test for me.
Reviewers: anemet, aaron.ballman, hfinkel
Reviewed By: anemet
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D49348
llvm-svn: 337249
Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D49209
llvm-svn: 337041
string, choose the strictest one instead of the last.
Also fix an undefined behavior. Move the pointer update to a later point to
avoid adding StringRef::npos to the pointer.
rdar://problem/40706280
llvm-svn: 336863
Code in `CodeGenModule::SetFunctionAttributes()` could set an empty
attribute `implicit-section-name` on a function that is affected by
`#pragma clang text="section"`. This is incorrect because the attribute
should contain a valid section name. If the function additionally also
used `__attribute__((section("section")))` then this could result in
emitting the function in a section with an empty name.
The patch fixes the issue by removing the problematic code that sets
empty `implicit-section-name` from
`CodeGenModule::SetFunctionAttributes()` because it is sufficient to set
this attribute only from a similar code in `setNonAliasAttributes()`
when the function is emitted.
Differential Revision: https://reviews.llvm.org/D48916
llvm-svn: 336842
Summary:
Make sure that loop metadata only is put on the backedge
when expanding a do-while loop.
Previously we added the loop metadata also on the branch
in the pre-header. That could confuse optimization passes
and result in the loop metadata being associated with the
wrong loop.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38011
Committing on behalf of deepak2427 (Deepak Panickal)
Reviewers: #clang, ABataev, hfinkel, aaron.ballman, bjope
Reviewed By: bjope
Subscribers: bjope, rsmith, shenhan, zzheng, xbolva00, lebedev.ri, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D48721
llvm-svn: 336717
This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.
The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for
128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps)
instructions are generated instead
Differential Revision: https://reviews.llvm.org/D48712
llvm-svn: 336643
The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call.
Making this switch switchs the masking to use the i8 bitcast to <8 x i1> and extract i1 version of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp.
llvm-svn: 336637
Privacy annotations shouldn't have to appear in the first
comma-delimited string in order to be recognized. Also, they should be
ignored if they are preceded or followed by non-whitespace characters.
rdar://problem/40706280
llvm-svn: 336629
This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.
llvm-svn: 336622
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
Add a number of builtins for __float128 Round To Odd.
This is the Clang portion of the builtins work.
Differential Revision: https://reviews.llvm.org/D47548
llvm-svn: 336579
I believe these have been broken since their introduction into clang.
I've enhanced the tests for these intrinsics to using a real rounding mode and checking all the intrinsic arguments instead of just the name.
llvm-svn: 336498
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.
Factor the code into a helper function to make this easy.
llvm-svn: 336472
We had the mask versions of the rounding intrinsics, but not one without masking.
Also change the rounding tests to not use the CUR_DIRECTION rounding mode.
llvm-svn: 336470
A Chromium developer reported a bug which turned out to be a mangling
collision between these two literals:
char s[] = "foo";
char t[32] = "foo";
They may look the same, but for the initialization of t we will (under
some circumstances) use a literal that's extended with zeros, and
both the length and those zeros should be accounted for by the mangling.
This actually makes the mangling code simpler: where it previously had
special logic for null terminators, which are not part of the
StringLiteral, that is now covered by the general algorithm.
(The problem was reported at https://crbug.com/857442)
Differential Revision: https://reviews.llvm.org/D48928
llvm-svn: 336415
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
llvm-svn: 336388
This patch removes on optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D48715
llvm-svn: 336355
Add test cases with each predicate using the following
intrinsics:
_mm_cmp_pd
_mm_cmp_ps
_mm256_cmp_pd
_mm256_cmp_ps
_mm_cmp_pd_mask
_mm_cmp_ps_mask
_mm256_cmp_pd_mask
_mm256_cmp_ps_mask
_mm512_cmp_pd_mask
_mm512_cmp_ps_mask
_mm_mask_cmp_pd_mask
_mm_mask_cmp_ps_mask
_mm256_mask_cmp_pd_mask
_mm256_mask_cmp_ps_mask
_mm512_mask_cmp_pd_mask
_mm512_mask_cmp_ps_mask
Some of these are marked with FIXME, as there is bug in lowering
e.g. _mm512_mask_cmp_ps_mask.
llvm-svn: 336346
Update clang to treat fp128 as a valid base type for homogeneous aggregate
passing and returning.
Differential Revision: https://reviews.llvm.org/D48044
llvm-svn: 336308
All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.
Some of these were real bugs where we lost bits from the user input:
_mm512_mask_broadcast_f32x8
_mm512_maskz_broadcast_f32x8
_mm512_mask_broadcast_i32x8
_mm512_maskz_broadcast_i32x8
_mm256_mask_cvtusepi16_storeu_epi8
llvm-svn: 336042
Summary:
Changes to some clang side tests to go with the summary parsing patch.
Depends on D47905.
Reviewers: pcc, dexonsmith, mehdi_amini
Subscribers: inglorion, eraman, cfe-commits, steven_wu
Differential Revision: https://reviews.llvm.org/D47906
llvm-svn: 335618
The WebAssembly backend in particular benefits from being
able to distinguish between varargs functions (...) and prototype-less
C functions.
Differential Revision: https://reviews.llvm.org/D48443
llvm-svn: 335510
With MSVC, PCH files are created along with an object file that needs to
be linked into the final library or executable. That object file
contains the code generated when building the headers. In particular, it
will include definitions of inline dllexport functions, and because they
are emitted in this object file, other files using the PCH do not need
to emit them. See the bug for an example.
This patch makes clang-cl match MSVC's behaviour in this regard, causing
significant compile-time savings when building dlls using precompiled
headers.
For example, in a 64-bit optimized shared library build of Chromium with
PCH, it reduces the binary size and compile time of
stroke_opacity_custom.obj from 9315564 bytes to 3659629 bytes and 14.6
to 6.63 s. The wall-clock time of building blink_core.dll goes from
38m41s to 22m33s. ("user" time goes from 1979m to 1142m).
Differential Revision: https://reviews.llvm.org/D48426
llvm-svn: 335466