When complaining that the triple is incompatible with all targets, print out the triple not just a generic error about triples not matching.
llvm-svn: 340510
constants by default when there is no optimization.
GCC's option -fno-keep-static-consts can be used to not emit
unused static constants.
In Clang, since default behavior does not keep unused static constants,
-fkeep-static-consts can be used to emit these if required. This could be
useful for producing identification strings like SVN identifiers
inside the object file even though the string isn't used by the program.
Differential Revision: https://reviews.llvm.org/D40925
llvm-svn: 340439
EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that
work again.
This changes behavior: If e.g. ++a was passed as an arg, we incremented a twice
previously. This change fixes that bug.
https://reviews.llvm.org/D50979
llvm-svn: 340348
If using a custom stack alignment, one is expected to make sure
that all callers provide such alignment, or realign the stack in
all entry points (and callbacks).
Despite this, the compiler can assume that the main function will
need realignment in these cases, since the startup routines calling
the main function most probably won't provide the custom alignment.
This matches what GCC does in similar cases; if compiling with
-mincoming-stack-boundary=X -mpreferred-stack-boundary=X, GCC normally
assumes such alignment on entry to a function, but specifically for
the main function still does realignment.
Differential Revision: https://reviews.llvm.org/D51026
llvm-svn: 340334
Summary:
We decided to revert this from i64 to i32 in Nov 28 CG meeting. Fixes
PR38632.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, jfb, cfe-commits
Differential Revision: https://reviews.llvm.org/D51013
llvm-svn: 340235
This changes the current default behavior (from emitting pubnames by
default, to not emitting them by default) & moves to matching GCC's
behavior* with one significant difference: -gno(-gnu)-pubnames disables
pubnames even in the presence of -gsplit-dwarf (though -gsplit-dwarf
still by default enables -ggnu-pubnames). This allows users to disable
pubnames (& the new DWARF5 accelerated access tables) when they might
not be worth the size overhead.
* GCC's behavior is that -ggnu-pubnames and -gpubnames override each
other, and that -gno-gnu-pubnames and -gno-pubnames act as synonyms and
disable either kind of pubnames if they come last. (eg: -gpubnames
-gno-gnu-pubnames causes no pubnames (neither gnu or standard) to be
emitted)
llvm-svn: 340206
This patch fixes definitions of vld and vst NEON intrinsics so
that we only define them if half-precision arithmetic is
supported on the target platform, as prescribed in ACLE 2.0.
Differential Revision: https://reviews.llvm.org/D49075
llvm-svn: 340140
This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing)
with 2 changes:
1. Move the code into a helper to reduce code duplication (and hopefully work-around the crash).
2. The original commit had a formatting bug in the docs (missing an underscore).
Original commit message:
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340137
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
D49242
With improved codegen in:
rL337966
rL339359
And basic IR optimization added in:
rL338218
rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340135
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans on using these functions, and they're using intrin0.h
which just has declarations of built-ins to not pull in the huge intrin.h
header in the standard library headers. That requires that these functions are
real built-ins.
https://reviews.llvm.org/D50907
llvm-svn: 340048
- Add a command line options -msign-return-address to enable return address
signing
- Armv8.3a added instructions to sign the return address to help mitigate
against ROP attacks
- This patch adds command line options to generate function attributes that
signal to the back whether return address signing instructions should be
added
Differential revision: https://reviews.llvm.org/D49793
llvm-svn: 340019
Thread sanitizer instrumentation fails to skip all loads and stores to
profile counters. This can happen if profile counter updates are merged:
%.sink = phi i64* ...
%pgocount5 = load i64, i64* %.sink
%27 = add i64 %pgocount5, 1
%28 = bitcast i64* %.sink to i8*
call void @__tsan_write8(i8* %28)
store i64 %27, i64* %.sink
To suppress TSan diagnostics about racy counter updates, make the
counter updates atomic when TSan is enabled. If there's general interest
in this mode it can be surfaced as a clang/swift driver option.
Testing: check-{llvm,clang,profile}
rdar://40477803
Differential Revision: https://reviews.llvm.org/D50867
llvm-svn: 339955
The tests `CodeGen/aapcs[64]-align.cc` are not run since files with a `.cc`
suffix aren't recognisze as tests. This patch renames the above two files to
`.cpp`.
Differential Revision: https://reviews.llvm.org/D46013
Comitting as obvious.
llvm-svn: 339766
Summary:
Another piece of my ongoing to work for prefer-vector-width.
min-legal-vector-width will eventually be used by the X86 backend to know whether it needs to make 512 bits type legal when prefer-vector-width=256. If the user used inline assembly that passed in/out a 512-bit register, we need to make sure 512 bits are considered legal. Otherwise we'll get an assert failure when we try to wire up the inline assembly to the rest of the code.
This patch just checks the LLVM IR types to see if they are vectors and then updates the attribute based on their total width. I'm not sure if this is the best way to do this or if there's any subtlety I might have missed. So if anyone has other opinions on how to do this I'm open to suggestions.
Reviewers: chandlerc, rsmith, rnk
Reviewed By: rnk
Subscribers: eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D50678
llvm-svn: 339721
Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.
Reviewers: craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D46892
llvm-svn: 339651
Clang generates copy and dispose helper functions for each block literal
on the stack. Often these functions are equivalent for different blocks.
This commit makes changes to merge equivalent copy and dispose helper
functions and reduce code size.
To enable merging equivalent copy/dispose functions, the captured object
infomation is encoded into the helper function name. This allows IRGen
to check whether an equivalent helper function has already been emitted
and reuse the function instead of generating a new helper function
whenever a block is defined. In addition, the helper functions are
marked as linkonce_odr to enable merging helper functions that have the
same name across translation units and marked as unnamed_addr to enable
the linker's deduplication pass to merge functions that have different
names but the same content.
rdar://problem/42640608
Differential Revision: https://reviews.llvm.org/D50152
llvm-svn: 339438
This extension emits the guard cf table without inserting the
instrumentation. Currently that's what clang-cl does with /guard:cf
anyway, but this allows a user to request that explicitly.
Differential Revision: https://reviews.llvm.org/D50513
llvm-svn: 339420
Summary:
Windows does not allow globals to be initialised to point to globals in
another DLL. Exported globals may be referenced only from code. Work
around this by creating an initialiser that runs in early library
initialisation and sets the isa pointer.
Reviewers: rjmccall
Reviewed By: rjmccall
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D50436
llvm-svn: 339317
Changes the default Windows target triple returned by
GetHostTriple.cmake from the old environment names (which we wanted to
move away from) to newer, normalized ones. This also requires updating
all tests to use the new systems names in constraints.
Differential Revision: https://reviews.llvm.org/D47381
llvm-svn: 339307
LLVM triple normalization is handling "unknown" and empty components
differently; for example given "x86_64-unknown-linux-gnu" and
"x86_64-linux-gnu" which should be equivalent, triple normalization
returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's
config.sub returns "x86_64-unknown-linux-gnu" for both
"x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the
triple normalization to behave the same way, replacing empty triple
components with "unknown".
This addresses PR37129.
Differential Revision: https://reviews.llvm.org/D50219
llvm-svn: 339294
gcc defines an intrinsic called __builtin_clrsb which counts the number of extra sign bits on a number. This is equivalent to counting the number of leading zeros on a positive number or the number of leading ones on a negative number and subtracting one from the result. Since we can't count leading ones we need to invert negative numbers to count zeros.
This patch will cause the builtin to be expanded inline while gcc uses a call to a function like clrsbdi2 that is implemented in libgcc. But this is similar to what we already do for popcnt. And I don't think compiler-rt supports clrsbdi2.
Differential Revision: https://reviews.llvm.org/D50168
llvm-svn: 339282
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
Differential Revision: https://reviews.llvm.org/D45045
llvm-svn: 338989
Summary:
Optimization remark format is slightly changed by LLVM patch D49412.
Two tests are fixed with expected messages changed.
Frankly speaking I have not tested this change yet. I will test when manage to setup the project.
Reviewers: xbolva00
Reviewed By: xbolva00
Subscribers: mehdi_amini, eraman, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D50241
llvm-svn: 338971
__builtin_memmove (in non-type-punning cases).
This is intended to permit libc++ to make std::copy etc constexpr
without sacrificing the optimization that uses memcpy on
trivially-copyable types.
__builtin_strcpy and __builtin_wcscpy are not handled by this change.
They'd be straightforward to add, but we haven't encountered a need for
them just yet.
This reinstates r338455, reverted in r338602, with a fix to avoid trying
to constant-evaluate a memcpy call if either pointer operand has an
invalid designator.
llvm-svn: 338941
- Testcase attempts to (not) grep 'g0' in output to ensure asm symbol is
properly renamed, but g0 is too generic and can be part of the
module's path in LLVM IR output.
- Changed to grep '@g0', which is what the proper global symbol name
would be if not using asm.
llvm-svn: 338895
It caused asserts during Chromium builds, see reply on the cfe-commits thread.
> This is intended to permit libc++ to make std::copy etc constexpr
> without sacrificing the optimization that uses memcpy on
> trivially-copyable types.
>
> __builtin_strcpy and __builtin_wcscpy are not handled by this change.
> They'd be straightforward to add, but we haven't encountered a need for
> them just yet.
llvm-svn: 338602
__builtin_memmove (in non-type-punning cases).
This is intended to permit libc++ to make std::copy etc constexpr
without sacrificing the optimization that uses memcpy on
trivially-copyable types.
__builtin_strcpy and __builtin_wcscpy are not handled by this change.
They'd be straightforward to add, but we haven't encountered a need for
them just yet.
llvm-svn: 338455
Summary:
C and C++ are interesting languages. They are statically typed, but weakly.
The implicit conversions are allowed. This is nice, allows to write code
while balancing between getting drowned in everything being convertible,
and nothing being convertible. As usual, this comes with a price:
```
unsigned char store = 0;
bool consume(unsigned int val);
void test(unsigned long val) {
if (consume(val)) {
// the 'val' is `unsigned long`, but `consume()` takes `unsigned int`.
// If their bit widths are different on this platform, the implicit
// truncation happens. And if that `unsigned long` had a value bigger
// than UINT_MAX, then you may or may not have a bug.
// Similarly, integer addition happens on `int`s, so `store` will
// be promoted to an `int`, the sum calculated (0+768=768),
// and the result demoted to `unsigned char`, and stored to `store`.
// In this case, the `store` will still be 0. Again, not always intended.
store = store + 768; // before addition, 'store' was promoted to int.
}
// But yes, sometimes this is intentional.
// You can either make the conversion explicit
(void)consume((unsigned int)val);
// or mask the value so no bits will be *implicitly* lost.
(void)consume((~((unsigned int)0)) & val);
}
```
Yes, there is a `-Wconversion`` diagnostic group, but first, it is kinda
noisy, since it warns on everything (unlike sanitizers, warning on an
actual issues), and second, there are cases where it does **not** warn.
So a Sanitizer is needed. I don't have any motivational numbers, but i know
i had this kind of problem 10-20 times, and it was never easy to track down.
The logic to detect whether an truncation has happened is pretty simple
if you think about it - https://godbolt.org/g/NEzXbb - basically, just
extend (using the new, not original!, signedness) the 'truncated' value
back to it's original width, and equality-compare it with the original value.
The most non-trivial thing here is the logic to detect whether this
`ImplicitCastExpr` AST node is **actually** an implicit conversion, //or//
part of an explicit cast. Because the explicit casts are modeled as an outer
`ExplicitCastExpr` with some `ImplicitCastExpr`'s as **direct** children.
https://godbolt.org/g/eE1GkJ
Nowadays, we can just use the new `part_of_explicit_cast` flag, which is set
on all the implicitly-added `ImplicitCastExpr`'s of an `ExplicitCastExpr`.
So if that flag is **not** set, then it is an actual implicit conversion.
As you may have noted, this isn't just named `-fsanitize=implicit-integer-truncation`.
There are potentially some more implicit conversions to be warned about.
Namely, implicit conversions that result in sign change; implicit conversion
between different floating point types, or between fp and an integer,
when again, that conversion is lossy.
One thing i know isn't handled is bitfields.
This is a clang part.
The compiler-rt part is D48959.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=21530 | PR21530 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=37552 | PR37552 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=35409 | PR35409 ]].
Partially fixes [[ https://bugs.llvm.org/show_bug.cgi?id=9821 | PR9821 ]].
Fixes https://github.com/google/sanitizers/issues/940. (other than sign-changing implicit conversions)
Reviewers: rjmccall, rsmith, samsonov, pcc, vsk, eugenis, efriedma, kcc, erichkeane
Reviewed By: rsmith, vsk, erichkeane
Subscribers: erichkeane, klimek, #sanitizers, aaron.ballman, RKSimon, dtzWill, filcab, danielaustin, ygribov, dvyukov, milianw, mclow.lists, cfe-commits, regehr
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D48958
llvm-svn: 338288
The "Procedure Call Procedure Call Standard for the ARM® Architecture"
(https://static.docs.arm.com/ihi0042/f/IHI0042F_aapcs.pdf), specifies that
composite types are passed according to their "natural alignment", i.e. the
alignment before alignment adjustment on the entire composite is applied.
The same applies for AArch64 ABI.
Clang, however, used the adjusted alignment.
GCC already implements the ABI correctly. With this patch Clang becomes
compatible with GCC and passes such arguments in accordance with AAPCS.
Differential Revision: https://reviews.llvm.org/D46013
llvm-svn: 338279
Summary:
Right now automatic variables are either initialized with bzero followed by a few stores, or memcpy'd from a synthesized global. We end up encountering a fair amount of code where memcpy of non-zero byte patterns would be better than memcpy from a global because it touches less memory and generates a smaller binary. The optimizer could reason about this, but it's not really worth it when clang already knows.
This code could definitely be more clever but I'm not sure it's worth it. In particular we could track a histogram of bytes seen and figure out (as we do with bzero) if a memset could be followed by a handful of stores. Similarly, we could tune the heuristics for GlobalSize, but using the same as for bzero seems conservatively OK for now.
<rdar://problem/42563091>
Reviewers: dexonsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D49771
llvm-svn: 337887
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
After fixing PR37395.
Differential Revision: https://reviews.llvm.org/D45045
Patch by Hsiangkai Wang.
llvm-svn: 337800
This patch adds support for vrndi_f32() and vrndiq_f32()
intrinsics in AArch32 mode and for vrndns_f32() intrinsic in
AArch64 mode.
Differential Revision: https://reviews.llvm.org/D48829
llvm-svn: 337690
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.
This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.
This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.
The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.
Differential Revision: https://reviews.llvm.org/D47474
llvm-svn: 337552
constant, don't convert the rest into a packed struct.
If an array constant has a large non-zero portion and a large zero
portion, we want to emit the first part as an array and the rest as a
zeroinitializer if possible. This fixes a memory usage regression from
r333141 when compiling PHP.
llvm-svn: 337498
The codegen for this builtin was initially implemented to match GCC.
However, due to interest from users GCC changed behaviour to account for the
big endian bias of the instruction and correct it. This patch brings the
handling inline with GCC.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38192
Differential Revision: https://reviews.llvm.org/D49424
llvm-svn: 337449
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in as the function attribute
"null-pointer-is-valid"="true".
This CL only adds the attribute on the function.
It also strips "nonnull" attributes from function arguments but
keeps the related warnings unchanged.
Corresponding LLVM change rL336613 already updated the
optimizations to not treat null pointer dereferencing
as undefined if the attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: jyknight
Subscribers: drinkcat, xbolva00, cfe-commits
Differential Revision: https://reviews.llvm.org/D47894
llvm-svn: 337433
Summary:
Using _Atomic to do implicit load / store is just a seq_cst atomic_load / atomic_store. Stores currently assert in Sema::ImpCastExprToType with 'can't implicitly cast lvalue to rvalue with this cast kind', but that's erroneous. The codegen is fine as the test shows.
While investigating I found that Richard had found the problem here: https://reviews.llvm.org/D46112#1113557
<rdar://problem/40347123>
Reviewers: dexonsmith
Subscribers: cfe-commits, efriedma, rsmith, aaron.ballman
Differential Revision: https://reviews.llvm.org/D49458
llvm-svn: 337410
which was reverted in r337336.
The problem that required a revert was fixed in r337338.
Also added a missing "REQUIRES: x86-registered-target" to one of
the tests.
Original commit message:
> Teach Clang to emit address-significance tables.
>
> By default, we emit an address-significance table on all ELF
> targets when the integrated assembler is enabled. The emission of an
> address-significance table can be controlled with the -faddrsig and
> -fno-addrsig flags.
>
> Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337339
Causing multiple failures on sanitizer bots due to TLS symbol errors,
e.g.
/usr/bin/ld: __msan_origin_tls: TLS definition in /home/buildbots/ppc64be-clang-test/clang-ppc64be/stage1/lib/clang/7.0.0/lib/linux/libclang_rt.msan-powerpc64.a(msan.cc.o) section .tbss.__msan_origin_tls mismatches non-TLS reference in /tmp/lit_tmp_0a71tA/mallinfo-3ca75e.o
llvm-svn: 337336
By default, we emit an address-significance table on all ELF
targets when the integrated assembler is enabled. The emission of an
address-significance table can be controlled with the -faddrsig and
-fno-addrsig flags.
Differential Revision: https://reviews.llvm.org/D48155
llvm-svn: 337333
Summary:
If the build path is short, `Line` field can end up fitting on the same line as `File`,
but the `{{.*}}` would consume it. Keeping in mind rL293149, i think we can fix it,
while keeping it working when there are and there are not any quotations.
At least this fixes this test for me.
Reviewers: anemet, aaron.ballman, hfinkel
Reviewed By: anemet
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D49348
llvm-svn: 337249
Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.
Subscribers: dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D49209
llvm-svn: 337041
string, choose the strictest one instead of the last.
Also fix an undefined behavior. Move the pointer update to a later point to
avoid adding StringRef::npos to the pointer.
rdar://problem/40706280
llvm-svn: 336863
Code in `CodeGenModule::SetFunctionAttributes()` could set an empty
attribute `implicit-section-name` on a function that is affected by
`#pragma clang text="section"`. This is incorrect because the attribute
should contain a valid section name. If the function additionally also
used `__attribute__((section("section")))` then this could result in
emitting the function in a section with an empty name.
The patch fixes the issue by removing the problematic code that sets
empty `implicit-section-name` from
`CodeGenModule::SetFunctionAttributes()` because it is sufficient to set
this attribute only from a similar code in `setNonAliasAttributes()`
when the function is emitted.
Differential Revision: https://reviews.llvm.org/D48916
llvm-svn: 336842
Summary:
Make sure that loop metadata only is put on the backedge
when expanding a do-while loop.
Previously we added the loop metadata also on the branch
in the pre-header. That could confuse optimization passes
and result in the loop metadata being associated with the
wrong loop.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38011
Committing on behalf of deepak2427 (Deepak Panickal)
Reviewers: #clang, ABataev, hfinkel, aaron.ballman, bjope
Reviewed By: bjope
Subscribers: bjope, rsmith, shenhan, zzheng, xbolva00, lebedev.ri, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D48721
llvm-svn: 336717
This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.
The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for
128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps)
instructions are generated instead
Differential Revision: https://reviews.llvm.org/D48712
llvm-svn: 336643
The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call.
Making this switch switchs the masking to use the i8 bitcast to <8 x i1> and extract i1 version of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp.
llvm-svn: 336637
Privacy annotations shouldn't have to appear in the first
comma-delimited string in order to be recognized. Also, they should be
ignored if they are preceded or followed by non-whitespace characters.
rdar://problem/40706280
llvm-svn: 336629
This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.
I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.
llvm-svn: 336622
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
Add a number of builtins for __float128 Round To Odd.
This is the Clang portion of the builtins work.
Differential Revision: https://reviews.llvm.org/D47548
llvm-svn: 336579
I believe these have been broken since their introduction into clang.
I've enhanced the tests for these intrinsics to using a real rounding mode and checking all the intrinsic arguments instead of just the name.
llvm-svn: 336498
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.
Factor the code into a helper function to make this easy.
llvm-svn: 336472
We had the mask versions of the rounding intrinsics, but not one without masking.
Also change the rounding tests to not use the CUR_DIRECTION rounding mode.
llvm-svn: 336470
A Chromium developer reported a bug which turned out to be a mangling
collision between these two literals:
char s[] = "foo";
char t[32] = "foo";
They may look the same, but for the initialization of t we will (under
some circumstances) use a literal that's extended with zeros, and
both the length and those zeros should be accounted for by the mangling.
This actually makes the mangling code simpler: where it previously had
special logic for null terminators, which are not part of the
StringLiteral, that is now covered by the general algorithm.
(The problem was reported at https://crbug.com/857442)
Differential Revision: https://reviews.llvm.org/D48928
llvm-svn: 336415
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.
While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.
llvm-svn: 336388
This patch removes on optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.
Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D48715
llvm-svn: 336355
Add test cases with each predicate using the following
intrinsics:
_mm_cmp_pd
_mm_cmp_ps
_mm256_cmp_pd
_mm256_cmp_ps
_mm_cmp_pd_mask
_mm_cmp_ps_mask
_mm256_cmp_pd_mask
_mm256_cmp_ps_mask
_mm512_cmp_pd_mask
_mm512_cmp_ps_mask
_mm_mask_cmp_pd_mask
_mm_mask_cmp_ps_mask
_mm256_mask_cmp_pd_mask
_mm256_mask_cmp_ps_mask
_mm512_mask_cmp_pd_mask
_mm512_mask_cmp_ps_mask
Some of these are marked with FIXME, as there is bug in lowering
e.g. _mm512_mask_cmp_ps_mask.
llvm-svn: 336346
Update clang to treat fp128 as a valid base type for homogeneous aggregate
passing and returning.
Differential Revision: https://reviews.llvm.org/D48044
llvm-svn: 336308
All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.
Some of these were real bugs where we lost bits from the user input:
_mm512_mask_broadcast_f32x8
_mm512_maskz_broadcast_f32x8
_mm512_mask_broadcast_i32x8
_mm512_maskz_broadcast_i32x8
_mm256_mask_cvtusepi16_storeu_epi8
llvm-svn: 336042
Summary:
Changes to some clang side tests to go with the summary parsing patch.
Depends on D47905.
Reviewers: pcc, dexonsmith, mehdi_amini
Subscribers: inglorion, eraman, cfe-commits, steven_wu
Differential Revision: https://reviews.llvm.org/D47906
llvm-svn: 335618
The WebAssembly backend in particular benefits from being
able to distinguish between varargs functions (...) and prototype-less
C functions.
Differential Revision: https://reviews.llvm.org/D48443
llvm-svn: 335510
With MSVC, PCH files are created along with an object file that needs to
be linked into the final library or executable. That object file
contains the code generated when building the headers. In particular, it
will include definitions of inline dllexport functions, and because they
are emitted in this object file, other files using the PCH do not need
to emit them. See the bug for an example.
This patch makes clang-cl match MSVC's behaviour in this regard, causing
significant compile-time savings when building dlls using precompiled
headers.
For example, in a 64-bit optimized shared library build of Chromium with
PCH, it reduces the binary size and compile time of
stroke_opacity_custom.obj from 9315564 bytes to 3659629 bytes and 14.6
to 6.63 s. The wall-clock time of building blink_core.dll goes from
38m41s to 22m33s. ("user" time goes from 1979m to 1142m).
Differential Revision: https://reviews.llvm.org/D48426
llvm-svn: 335466
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).
See https://reviews.llvm.org/D34156 for details on the original change.
This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.
llvm-svn: 335385
dead code.
This is important for C++ templates that essentially compute the valid
input in a way that is constant and will cause all the invalid cases to
be dead code that is deleted. Code in the wild actually does this and
GCC also accepts these kinds of patterns so it is important to support
it.
To make this work, we provide a non-error path to diagnose these issues,
and use a default-error warning instead. This keeps the relatively
strict handling but prevents nastiness like SFINAE on these errors. It
also allows us to safely use the system to diagnose this only when it
occurs at runtime (in emitted code).
Entertainingly, this required fixing the syntax in various other ways
for the x86 test because we never bothered to diagnose that the returns
were invalid.
Since debugging these compile failures was super confusing, I've also
improved the diagnostic to actually say what the value was. Most of the
checks I've made ignore this to simplify maintenance, but I've checked
it in a few places to make sure the diagnsotic is working.
Depends on D48462. Without that, we might actually crash some part of
the compiler after bypassing the error here.
Thanks to Richard, Ben Kramer, and especially Craig Topper for all the
help here.
Differential Revision: https://reviews.llvm.org/D48464
llvm-svn: 335309
D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.
This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.
llvm-svn: 335308
This is breaking a couple of buildbots. We need to run the
NameAnonGlobal pass for regular LTO now as well (since we're producing a
summary). I'll post a separate patch for review to make this happen and
then re-commit.
This reverts commit c0759b7b1f4a81ff9021b952aa38a222d5fa4dfd.
llvm-svn: 335291
Summary:
With D33921, we gained the ability to have module summaries in regular
LTO modules without triggering ThinLTO compilation. Module summaries in
regular LTO allow garbage collection (dead stripping) before LTO
compilation and thus open up additional optimization opportunities.
This patch enables summary emission in regular LTO for all targets
except ld64-based ones (which use the legacy LTO API).
Reviewers: pcc, tejohnson, mehdi_amini
Subscribers: inglorion, eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D34156
llvm-svn: 335284
Summary:
This test is a strip down version of a function inside the
amalgamated sqlite source. When converted to IR clang produces
a phi instruction without debug location.
This patch fixes the above issue.
Differential Revision: https://reviews.llvm.org/D47720
llvm-svn: 335255
Similar to what was done to max/min recently.
These already reduced the vector width to 256 and 128 bit as we go unlike the original max/min code.
Differential Revision: https://reviews.llvm.org/D48346
llvm-svn: 335253
The SPIR target currently allows for half precision floating point types to be
emitted using the LLVM intrinsic functions which convert half types to floats
and doubles. However, this is illegal in SPIR as the only intrinsic allowed by
SPIR is memcpy, as per section 3 of the SPIR specification. Currently this is
leading to an assert being hit in the Clang CodeGen when attempting to emit a
constant or literal _Float16 type in a comparison operation on a SPIR or SPIR64
target. This assert stems from the CodeGen attempting to emit a constant half
value as an integer because the backend has specified that it is using these
half conversion intrinsics (which represents half as i16). This patch prevents
SPIR targets from using these intrinsics by overloading the responsible target
info method, marks SPIR targets as having a legal half type and provides
additional regression testing for the _Float16 type on SPIR targets.
Patch by: Stephen McGroarty
Differential Revision: https://reviews.llvm.org/D48188
llvm-svn: 335111
We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX.
For v16i32 and floating point we have legacy 128/256 bit instructions we can use.
I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.
Differential Revision: https://reviews.llvm.org/D47401
llvm-svn: 335070
Clang/LLVM doesn't have a way to pass an HLE hint through to the X86 backend to emit HLE prefixed instructions. So this is a good short term fix.
Differential Revision: https://reviews.llvm.org/D47672
llvm-svn: 334751
Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.
Reviewers: mstorsjo, compnerd, javed.absar
Reviewed By: mstorsjo
Subscribers: kristof.beyls, chrib, cfe-commits
Differential Revision: https://reviews.llvm.org/D48132
llvm-svn: 334639
https://bugs.llvm.org/show_bug.cgi?id=37778
...shows a miscompile resulting from marking nan builtins as 'const'.
The nan libcalls/builtins take a pointer argument:
http://www.cplusplus.com/reference/cmath/nan-function/
...and the chars dereferenced by that arg are used to fill in the NaN constant payload bits.
"const" means that the pointer argument isn't dereferenced. That's translated to "readnone" in LLVM.
"pure" means that the pointer argument may be dereferenced. That's translated to "readonly" in LLVM.
This change prevents the IR optimizer from killing the lead-up to the nan call here:
double a() {
char buf[4];
buf[0] = buf[1] = buf[2] = '9';
buf[3] = '\0';
return __builtin_nan(buf);
}
...the optimizer isn't currently able to simplify this to a constant as we might hope,
but this patch should solve the miscompile.
Differential Revision: https://reviews.llvm.org/D48134
llvm-svn: 334628
Summary:
This fixes the ranges for the vcvth family of FP16 intrinsics in the clang front end. Previously it was accepting incorrect ranges
-Changed builtin range checking in SemaChecking
-added tests SemaCheck changes - included in their own file since no similar one exists
-modified existing tests to reflect new ranges
Reviewers: SjoerdMeijer, javed.absar
Reviewed By: SjoerdMeijer
Subscribers: kristof.beyls, cfe-commits
Differential Revision: https://reviews.llvm.org/D47592
llvm-svn: 334489
This patch adds aliases for -Qn (-fno-ident) and -Qy (-fident) which
look less cryptic than -Qn/-Qy. The aliases are compatible with GCC.
Differential Revision: https://reviews.llvm.org/D48021
llvm-svn: 334414
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.
Reviewers: RKSimon, delena, spatel, GBuella
Reviewed By: RKSimon
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D47693
llvm-svn: 334366
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47446
llvm-svn: 334362
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
llvm-svn: 334261
The windows-msvc target is used for MSVC ABI compatibility, including
the exceptions model. It doesn't make sense to pair a windows-msvc
target with a non-MSVC exception model. This would previously cause an
assertion failure; explicitly error out for it in the frontend instead.
This also allows us to reduce the matrix of target/exception models a
bit (see the modified tests), and we can possibly simplify some of the
personality code in a follow-up.
Differential Revision: https://reviews.llvm.org/D47853
llvm-svn: 334243
Adds support for these intrinsics, which are ARM and ARM64 only:
_interlockedbittestandreset_acq
_interlockedbittestandreset_rel
_interlockedbittestandreset_nf
_interlockedbittestandset_acq
_interlockedbittestandset_rel
_interlockedbittestandset_nf
Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.
llvm-svn: 334239
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.
This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
llvm-svn: 334208
Summary:
When requirement imposed by __target__ attributes on functions
are not satisfied, prefer printing those requirements, which
are explicitly mentioned in the attributes.
This makes such messages more useful, e.g. printing avx512f instead of avx2
in the following scenario:
```
$ cat foo.c
static inline void __attribute__((__always_inline__, __target__("avx512f")))
x(void)
{
}
int main(void)
{
x();
}
$ clang foo.c
foo.c:7:2: error: always_inline function 'x' requires target feature 'avx2', but would be inlined into function 'main' that is compiled without support for 'avx2'
x();
^
1 error generated.
```
bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37338
Reviewers: craig.topper, echristo, dblaikie
Reviewed By: craig.topper, echristo
Differential Revision: https://reviews.llvm.org/D46541
llvm-svn: 334174
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.
I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.
The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.
Reviewers: tkrupa, RKSimon, spatel, GBuella
Reviewed By: tkrupa
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47724
llvm-svn: 334159
Factor out the common setjmp call emission code.
Based on a patch by Chris January
Differential Revision: https://reviews.llvm.org/D47784
llvm-svn: 334112
I tested these locally on an x86 machine by disabling the inline asm
codepath and confirming that it does the same bitflips as we do with the
inline asm.
Addresses code review feedback.
llvm-svn: 334059
Previously we were just using extended vector operations in the header file.
This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.
By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count.
User code still has the option to use extended vector operations themselves if they need non-constant indexing.
llvm-svn: 334057
We need to implement _interlockedbittestandset as a builtin for
windows.h, so we might as well do the whole family. It reduces code
duplication anyway.
Fixes PR33188, a long standing bug in our bittest implementation
encountered by Chakra.
llvm-svn: 333978
Summary:
Adds testing of combined index summary entries in disassembly format
to CFI tests that were already testing the bitcode format.
Depends on D46699.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D46700
llvm-svn: 333966
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.
We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.
llvm-svn: 333958
This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.
llvm-svn: 333944
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47121
llvm-svn: 333829
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
Intel, so we did not have a consistent ABI.
This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).
Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects. The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.
llvm-svn: 333791
The WebAssembly committee has decided on the names `memory.size` and
`memory.grow` for the memory intrinsics, so update the clang builtin
functions to follow those names, keeping both sets of old names in place
for compatibility.
llvm-svn: 333712
I think this is a holdover from when we used to declare variables inside the macros. And then its been copy and pasted forward for years every time a new macro intrinsic gets added.
Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.
It also removed a bogus error message on another test.
llvm-svn: 333613
We don't need the insertion back into the original vector at the end. The builtin already understands that.
This is different than _mm_sqrt_sd which takes two arguments and we do need to insert.
llvm-svn: 333572
We had quite a few for different element sizes of integers sometimes with strange target features attached to them.
We only need a single version for each of _m128i, _m256i, and _m512i with the target feature that first introduced those types.
llvm-svn: 333568
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47444
llvm-svn: 333555
These intrinsics are used by MSVC's header files on AArch64 Windows as
well as AArch32, so we should support them for both targets. I've
factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate
functions that EmitAArch64BuiltinExpr can call as well.
Reviewers: javed.absar, mstorsjo
Reviewed By: mstorsjo
Subscribers: kristof.beyls, cfe-commits
Differential Revision: https://reviews.llvm.org/D47476
llvm-svn: 333513
Mostly this fixes the names of all the 128-bit intrinsics to start with _mm_ instead of _mm128_ as is the convention and what the Intel docs say.
This also fixes the name of the bitshuffle intrinsics to say epi64 for 128 and 256 bit versions.
llvm-svn: 333497
Summary:
We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX.
For v16i32 and floating point we have legacy 128/256 bit instructions we can use.
I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.
Reviewers: RKSimon, spatel, GBuella
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47401
llvm-svn: 333347
The checksum will not reflect the real source, so there's no clear
reason to include them in the debug info. Also this was causing a
crash on the DWARF side.
Differential Revision: https://reviews.llvm.org/D47260
llvm-svn: 333311
An intrinsic for an old instruction, as described in the Intel SDM.
Reviewers: craig.topper, rnk
Reviewed By: craig.topper, rnk
Differential Revision: https://reviews.llvm.org/D47142
llvm-svn: 333256
to checking for attributes on the call site - and fix up builtin
functions that we were testing for but not ensuring wouldn't be
optimized by the backend.
Leave one set of asm tests to make sure that we're also communicating
builtin-ness to TLI.
llvm-svn: 333154
It caused asserts, see PR37560.
> Use zeroinitializer for (trailing zero portion of) large array initializers
> more reliably.
>
> Clang has two different ways it emits array constants (from InitListExprs and
> from APValues), and both had some ability to emit zeroinitializer, but neither
> was able to catch all cases where we could use zeroinitializer reliably. In
> particular, emitting from an APValue would fail to notice if all the explicit
> array elements happened to be zero. In addition, for large arrays where only an
> initial portion has an explicit initializer, we would emit the complete
> initializer (which could be huge) rather than emitting only the non-zero
> portion. With this change, when the element would have a suffix of more than 8
> zero elements, we emit the array constant as a packed struct of its initial
> portion followed by a zeroinitializer constant for the trailing zero portion.
>
> In passing, I found a bug where SemaInit would sometimes walk the entire array
> when checking an initializer that only covers the first few elements; that's
> fixed here to unblock testing of the rest.
>
> Differential Revision: https://reviews.llvm.org/D47166
llvm-svn: 333067
Previously we negated the whole vector after splatting infinity. But its better to negate the infinity before splatting. This generates IR with the negate already folded with the infinity constant.
llvm-svn: 333062
more reliably.
Clang has two different ways it emits array constants (from InitListExprs and
from APValues), and both had some ability to emit zeroinitializer, but neither
was able to catch all cases where we could use zeroinitializer reliably. In
particular, emitting from an APValue would fail to notice if all the explicit
array elements happened to be zero. In addition, for large arrays where only an
initial portion has an explicit initializer, we would emit the complete
initializer (which could be huge) rather than emitting only the non-zero
portion. With this change, when the element would have a suffix of more than 8
zero elements, we emit the array constant as a packed struct of its initial
portion followed by a zeroinitializer constant for the trailing zero portion.
In passing, I found a bug where SemaInit would sometimes walk the entire array
when checking an initializer that only covers the first few elements; that's
fixed here to unblock testing of the rest.
Differential Revision: https://reviews.llvm.org/D47166
llvm-svn: 333044
The clang builtins have the same semantics as the stdlib functions.
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."
That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.
Differential Revision: https://reviews.llvm.org/D47202
llvm-svn: 333038
This change will help Visual Studio resolve forward references to C++ lambda
routines used by captured variables.
Differential Revision: https://reviews.llvm.org/D45438
llvm-svn: 332975
Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions.
To avoid that just generate IR directly in CGBuiltin.cpp.
Differential Revision: https://reviews.llvm.org/D47125
llvm-svn: 332891
I believe this is safe assuming default default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions.
We already do something similar for scalar conversions.
Differential Revision: https://reviews.llvm.org/D46863
llvm-svn: 332882
If a variable has an initializer, codegen tries to build its value. If
the variable is large in size, building its value requires substantial
resources. It causes strange behavior from user viewpoint: compilation
of huge zero initialized arrays like:
char data_1[2147483648u] = { 0 };
consumes enormous amount of time and memory.
With this change codegen tries to determine if variable initializer is
equivalent to zero initializer. In this case variable value is not
constructed.
This change fixes PR18978.
Differential Revision: https://reviews.llvm.org/D46241
llvm-svn: 332847
in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html.
The -mibt feature flag is being removed, and the -fcf-protection
option now also defines a CET macro and causes errors when used
on non-X86 targets, while X86 targets no longer check for -mibt
and -mshstk to determine if -fcf-protection is supported. -mshstk
is now used only to determine availability of shadow stack intrinsics.
Comes with an LLVM patch (D46882).
Patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D46881
llvm-svn: 332704
The existing test for the AArch64 inline assembly constraint S uses the
A and L modifiers. These modifiers were implemented in the original
AArch64 backend but were not carried forward to the merged backend. The
A is associated with ADRP and does nothing, the L is associated with
:lo12: . Given that A and L are not supported by GCC and not supported
by the new implementation of constraint S in LLVM (see D46745) I've
altered the test to put :lo12: directly in the string so that A and L
are not needed.
Differential Revision: https://reviews.llvm.org/D46932
llvm-svn: 332606
As long as the destination type is a 256 or 128 bit vector with the same number of elements we can use __builtin_convertvector to directly generate trunc IR instruction which will be handled natively by the backend.
Differential Revision: https://reviews.llvm.org/D46742
llvm-svn: 332266
If we're using default rounding mode we can let __builtin_convertvector to generate an fpextend. This matches 128 and 256 bit.
If we're using the version that takes an explicit rounding mode argument we would need to look at the immediate to see if its CUR_DIRECTION.
llvm-svn: 332210
We can use direct C code for these that will use uitofp and insertelement instructions.
For the versions that take an explicit rounding mode we can't do this.
llvm-svn: 332203
These intrinsics work exactly as all other atomic_fetch_* intrinsics and allow to create *atomicrmw* with ordering.
Updated the clang-extensions document.
Differential Revision: https://reviews.llvm.org/D46386
llvm-svn: 332193
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.
Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.
llvm-svn: 331958
Previously we emitted something like
rotl(x, n) {
n &= bitwidth-1;
return n != 0 ? ((x << n) | (x >> (bitwidth - n)) : x;
}
We use a select to avoid the undefined behavior on the (bitwidth - n) shift.
The middle and backend don't really recognize this as a rotate and end up emitting a cmov or control flow because of the select.
A better pattern is (x << (n & mask)) | (x << (-n & mask)) where mask is bitwidth - 1.
Fixes the main complaint in PR37387. There's still some work to be done if the user writes that sequence directly on a short or char where type promotion rules can prevent it from being recognized. The builtin is emitting direct IR with unpromoted types so that isn't a problem for it.
Differential Revision: https://reviews.llvm.org/D46656
llvm-svn: 331943
Summary:
This attribute tells clang to skip this function from stack protector
when -stack-protector option is passed.
GCC option for this is:
__attribute__((__optimize__("no-stack-protector"))) and the
equivalent clang syntax would be: __attribute__((no_stack_protector))
This is used in Linux kernel to selectively disable stack protector
in certain functions.
Reviewers: aaron.ballman, rsmith, rnk, probinson
Reviewed By: aaron.ballman
Subscribers: probinson, srhines, cfe-commits
Differential Revision: https://reviews.llvm.org/D46300
llvm-svn: 331925
It broke the Chromium build (see reply on the review).
> Generate DILabel metadata and call llvm.dbg.label after label
> statement to associate the metadata with the label.
>
> Differential Revision: https://reviews.llvm.org/D45045
>
> Patch by Hsiangkai Wang.
This doesn't revert the change to backend-unsupported-error.ll
that seems to correspond to an llvm-side change.
llvm-svn: 331861
Summary:
An _Atomic of an empty struct is pretty silly. In general we just widen empty
structs to hold a byte's worth of storage, and we represent size and alignment
as 0 internally and let LLVM figure out what to do. For _Atomic it's a bit
different: the memory model mandates concrete effects occur when atomic
operations occur, so in most cases actual instructions need to get emitted. It's
really not worth trying to optimize empty struct atomics by figuring out e.g.
that a fence would do, even though sane compilers should do optimize atomics.
Further, wg21.link/p0528 will fix C++20 atomics with padding bits so that
cmpxchg on them works, which means that we'll likely need to do the zero-init
song and dance for empty atomic structs anyways (and I think we shouldn't
special-case this behavior to C++20 because prior standards are just broken).
This patch therefore makes a minor change to r176658 "Promote atomic type sizes
up to a power of two": if the width of the atomic's value type is 0, just use 1
byte for width and leave alignment as-is (since it should never be zero, and
over-aligned zero-width structs are weird but fine).
This fixes an assertion:
(NumBits >= MIN_INT_BITS && "bitwidth too small"), function get, file ../lib/IR/Type.cpp, line 241.
It seems like this has run into other assertions before (namely the unreachable
Kind check in ImpCastExprToType), but I haven't reproduced that issue with
tip-of-tree.
<rdar://problem/39678063>
Reviewers: arphaman, rjmccall
Subscribers: aheejin, cfe-commits
Differential Revision: https://reviews.llvm.org/D46613
llvm-svn: 331845
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.
Differential Revision: https://reviews.llvm.org/D45045
Patch by Hsiangkai Wang.
llvm-svn: 331843
The error turns out to be:
Assertion failed: (Target.isCompatibleDataLayout(getDataLayout()) && "Can't create a MachineFunction using a Module with a " "Target-incompatible DataLayout attached\n"), function init, file /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/lib/CodeGen/MachineFunction.cpp, line 180.
Add -target to address this. Also re-enable the test I had temporarily
commented, and move it further down in case there is still a failure
(since it pipes stderr to FileCheck).
llvm-svn: 331597
Summary:
Passes down the necessary code ge options to the LTO Config to enable
-fdiagnostics-show-hotness and -fsave-optimization-record in the ThinLTO
backend for a distributed build.
Also, remove warning about not having PGO when the input is IR.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D46464
llvm-svn: 331592
instrumentation codegeneration strategy of using a data structure and
a loop. Required some finesse to get the critical things being tested to
surface in a nice way for FileCheck but I think this preserves the
original intent of the test.
llvm-svn: 331411
Both sides of this #if #include the same file. Drop the #if, leaving only the #include.
Patch by Matt Glazar.
Differential Revision: https://reviews.llvm.org/D45779
llvm-svn: 331305
Teach AsmParser to check with Assembler for when evaluating constant
expressions. This improves the handing of preprocessor expressions
that must be resolved at parse time. This idiom can be found as
assembling-time assertion checks in source-level assemblers. Note that
this relies on the MCStreamer to keep sufficient tabs on Section /
Fragment information which the MCAsmStreamer does not. As a result the
textual output may fail where the equivalent object generation would
pass. This can most easily be resolved by folding the MCAsmStreamer
and MCObjectStreamer together which is planned for in a separate
patch.
Currently, this feature is only enabled for assembly input, keeping IR
compilation consistent between assembly and object generation.
Reviewers: echristo, rnk, probinson, espindola, peter.smith
Reviewed By: peter.smith
Subscribers: eraman, peter.smith, arichardson, jyknight, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45164
llvm-svn: 331218
As suggested in the post-commit thread for rL331056, we should match these
clang options with the established vocabulary of the corresponding sanitizer
option. Also, the use of 'strict' is well-known for these kinds of knobs,
and we can improve the descriptive text in the docs.
So this intends to match the logic of D46135 but only change the words.
Matching LLVM commit to match this spelling of the attribute to follow shortly.
Differential Revision: https://reviews.llvm.org/D46236
llvm-svn: 331209
As discussed in the post-commit thread for:
rL330437 ( http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180423/545906.html )
We need a way to opt-out of a float-to-int-to-float cast optimization because too much
existing code relies on the platform-specific undefined result of those casts when the
float-to-int overflows.
The LLVM changes associated with adding this function attribute are here:
rL330947
rL330950
rL330951
Also as suggested, I changed the LLVM doc to mention the specific sanitizer flag that
catches this problem:
rL330958
Differential Revision: https://reviews.llvm.org/D46135
llvm-svn: 331041
The ACLE spec which describes these intrinsics hasn't been published yet, but
this is based on the final draft which will be published soon, and these have
already been implemented by GCC.
Differential revision: https://reviews.llvm.org/D46109
llvm-svn: 331039
On AVX512F targets we'll produce an emulated sequence using 3 pmuludqs with shifts and adds. On AVX512DQ we'll use vpmulld.
Fixes PR37140.
llvm-svn: 330923
Passing the features in random order will lead to unpredictable results
when some of the features are related (like the architecture-version
features on ARM).
It might be possible to fix this particular case in the ARM target code,
to avoid adding overlapping target features. But we should probably be
sorting in any case: the behavior shouldn't depend on StringMap's
hashing algorithm.
Differential Revision: https://reviews.llvm.org/D46030
llvm-svn: 330861
The unmasked versions already didn't have this restrction. I don't think gcc or icc limit these to 64-bit mode so we shouldn't either.
llvm-svn: 330681
If an atomic variable is misaligned (and that suspicion is why Clang emits
libcalls at all) the runtime support library will have to use a lock to safely
access it, with potentially very bad performance consequences. There's a very
good chance this is unintentional so it makes sense to issue a warning.
Also give it a named group so people can promote it to an error, or disable it
if they really don't care.
llvm-svn: 330566
Summary:
By default Clang outputs its version (including git commit hash, in
case of trunk builds) into object and assembly files. It might be
useful to have an option to disable this, especially for debugging
purposes.
This patch implements new command line flags -Qn and -Qy (the names
are chosen for compatibility with GCC). -Qn disables output of
the 'llvm.ident' metadata string and the 'producer' debug info. -Qy
(enabled by default) does the opposite.
Reviewers: faisalv, echristo, aprantl
Reviewed By: aprantl
Subscribers: aprantl, cfe-commits, JDevlieghere, rogfer01
Differential Revision: https://reviews.llvm.org/D45255
llvm-svn: 330442
This implements support for the previously ignored flag
`-falign-functions`. This allows the frontend to request alignment on
function definitions in the translation unit where they are not
explicitly requested in code. This is compatible with the GCC behaviour
and the ICC behaviour.
The scalar value passed to `-falign-functions` aligns functions to a
power-of-two boundary. If flag is used, the functions are aligned to
16-byte boundaries. If the scalar is specified, it must be an integer
less than or equal to 4096. If the value is not a power-of-two, the
driver will round it up to the nearest power of two.
llvm-svn: 330378
The force_align_arg_pointer attribute was using a hardcoded 16-byte
alignment value which in combination with -mstack-alignment=32 (or
larger) would produce a misaligned stack which could result in crashes
when accessing stack buffers using aligned AVX load/store instructions.
Fix the issue by using the "stackrealign" function attribute instead
of using a hardcoded 16-byte alignment.
Patch By: Gramner
Differential Revision: https://reviews.llvm.org/D45812
llvm-svn: 330331
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations.
Patch by tkrupa
Differential Revision: https://reviews.llvm.org/D44786
llvm-svn: 330323
Summary:
A clang builtin for xray typed events. Differs from
__xray_customevent(...) by the presence of a type tag that is vended by
compiler-rt in typical usage. This allows xray handlers to expand logged
events with their type description and plugins to process traced events
based on type.
This change depends on D45633 for the intrinsic definition.
Reviewers: dberris, pelikan, rnk, eizan
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D45716
llvm-svn: 330220
register destructor functions annotated with __attribute__((destructor))
using __cxa_atexit or atexit.
Register destructor functions annotated with __attribute__((destructor))
calling __cxa_atexit in a synthesized constructor function instead of
emitting references to the functions in a special section.
The primary reason for adding this option is that we are planning to
deprecate the __mod_term_funcs section on Darwin in the future. This
feature is enabled by default only on Darwin. Users who do not want this
can use command line option 'fno_register_global_dtors_with_atexit' to
disable it.
rdar://problem/33887655
Differential Revision: https://reviews.llvm.org/D45578
llvm-svn: 330199
Summary:
The clang driver option -save-temps was not passed to the LTO config,
so when invoking the ThinLTO backends via clang during distributed
builds there was no way to get LTO to save temp files.
Getting this to work with ThinLTO distributed builds also required
changing the driver to avoid a separate compile step to emit unoptimized
bitcode when the input was already bitcode under -save-temps. Not only is
this unnecessary in general, it is problematic for ThinLTO backends since
the temporary bitcode file to the backend would not match the module path
in the combined index, leading to incorrect ThinLTO backend index-based
optimizations.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D45217
llvm-svn: 330194
Currently, the interaction between the triple, the CPU, and the
supported features is a mess: the driver edits the triple to indicate
the supported architecture version, and the LLVM backend uses this to
figure out what instructions are legal. This makes it difficult to
understand what's happening, and makes it impossible to LTO together two
modules with different computed architectures.
Instead of relying on triple rewriting to get the correct target
features, we should add the right target features explicitly.
Differential Revision: https://reviews.llvm.org/D45240
llvm-svn: 330169
Summary:
This change addresses http://llvm.org/PR36926 by allowing users to pick
which instrumentation bundles to use, when instrumenting with XRay. In
particular, the flag `-fxray-instrumentation-bundle=` has four valid
values:
- `all`: the default, emits all instrumentation kinds
- `none`: equivalent to -fnoxray-instrument
- `function`: emits the entry/exit instrumentation
- `custom`: emits the custom event instrumentation
These can be combined either as comma-separated values, or as
repeated flag values.
Reviewers: echristo, kpw, eizan, pelikan
Reviewed By: pelikan
Subscribers: mgorny, cfe-commits
Differential Revision: https://reviews.llvm.org/D44970
llvm-svn: 329985
It means the same thing as -mllvm; there isn't any reason to have two
options which do the same thing.
Differential Revision: https://reviews.llvm.org/D45109
llvm-svn: 329965
A previously missing intrinsic for an old instruction.
Reviewers: craig.topper, echristo
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D45311
llvm-svn: 329937
The WBNOINVD instruction writes back all modified
cache lines in the processor’s internal cache to main memory
but does not invalidate (flush) the internal caches.
Reviewers: craig.topper, zvi, ashlykov
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D43817
llvm-svn: 329848
When we enter a __finally block, the CGF's CurCodeDecl will be null
(because CodeGenFunction::StartFunction is given an empty GlobalDecl for
a __finally block), and so the dyn_cast here will result in an assertion
failure. Change it to dyn_cast_or_null to handle this case.
Differential Revision: https://reviews.llvm.org/D45523
llvm-svn: 329836
When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.
Differential Revision: https://reviews.llvm.org/D45061
llvm-svn: 329829
This test ensures the popfd instruction in MS inline assembly can properly find a clobber name for the dirflag register. Previously the register was named 'DF', but it needs to be named 'dirflag' to match the name in the GCC register name list.
llvm-svn: 329738
Summary:
Right now to disable -fsanitize=kernel-address instrumentation, one needs to use no_sanitize("kernel-address"). Make either no_sanitize("address") or no_sanitize("kernel-address") disable both ASan and KASan instrumentation. Also remove redundant test.
Patch by Andrey Konovalov
Reviewers: eugenis, kcc, glider, dvyukov, vitalybuka
Reviewed By: eugenis, vitalybuka
Differential Revision: https://reviews.llvm.org/D44981
llvm-svn: 329612
I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludg or shl/ashr for pmuldq and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ and then SimplifyDemandedBits will remove the and/shifts.
Differential Revision: https://reviews.llvm.org/D45421
llvm-svn: 329605
Summary:
This change consolidates the always/never lists that may be provided to
clang to externally control which functions should be XRay instrumented
by imbuing attributes. The files follow the same format as defined in
https://clang.llvm.org/docs/SanitizerSpecialCaseList.html for the
sanitizer blacklist.
We also deprecate the existing `-fxray-instrument-always=` and
`-fxray-instrument-never=` flags, in favour of `-fxray-attr-list=`.
This fixes http://llvm.org/PR34721.
Reviewers: echristo, vlad.tsyrklevich, eugenis
Reviewed By: vlad.tsyrklevich
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D45357
llvm-svn: 329543
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:
archtype
cas
classs
checkk
compres
definit
frome
iff
inteval
ith
lod
methode
nd
optin
ot
pres
statics
te
thru
Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)
Differential revision: https://reviews.llvm.org/D44188
llvm-svn: 329399
Summary:
"-fmerge-all-constants" is a non-conforming optimization and should not
be the default. It is also causing miscompiles when building Linux
Kernel (https://lkml.org/lkml/2018/3/20/872).
Fixes PR18538.
Reviewers: rjmccall, rsmith, chandlerc
Reviewed By: rsmith, chandlerc
Subscribers: srhines, cfe-commits
Differential Revision: https://reviews.llvm.org/D45289
llvm-svn: 329300
Summary:
Add support for the -fsanitize=shadow-call-stack flag which causes clang
to add ShadowCallStack attribute to functions compiled with that flag
enabled.
Reviewers: pcc, kcc
Reviewed By: pcc, kcc
Subscribers: cryptoad, cfe-commits, kcc
Differential Revision: https://reviews.llvm.org/D44801
llvm-svn: 329122
Summary:
Allow rN registers to be simply parsed as correspoing xN registers.
The "register ... asm("rN")" is an command to the
compiler's register allocator, not an operand to any individual assembly
instruction. GCC documents this syntax as "...the name of the register
that should be used."
This is needed to support the changes in Linux kernel (see
https://lkml.org/lkml/2018/3/1/268 )
Note: This will add support only for the limited use case of
register ... asm("rN"). Any other uses that make rN leak into assembly
are not supported.
Reviewers: kristof.beyls, rengolin, peter.smith, t.p.northover
Reviewed By: peter.smith
Subscribers: javed.absar, eraman, cfe-commits, srhines
Differential Revision: https://reviews.llvm.org/D44815
llvm-svn: 328829
This commit generalizes NRVO to cover C structs (both trivial and
non-trivial structs).
rdar://problem/33599681
Differential Revision: https://reviews.llvm.org/D44968
llvm-svn: 328809
The conversion of operatios to bitcode helps to eliminate an additional
store in certain cases. We used to lower these load intrinsics in DAG to
DAG conversion by which time, the "Dead Store Elimination" pass is
already run. There is an associated LLVM patch.
Patch by Sumanth Gundapaneni.
llvm-svn: 328776
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" vesrions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.
There is a related llvm patch.
Patch by Brendon Cahoon.
llvm-svn: 328725
Summary:
r318093 sets fma, fmaf, fmal as const for Gnu and MSVC. Android also
does not set errno for these functions. So mark these const for
Android.
Reviewers: spatel, efriedma, srhines, chh, enh
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D44852
llvm-svn: 328552
Putting back the code in commit r327189 that was reverted in r322737. The code is being committed in three stages and this one is the last stage: 1) r327455 fp16 feature flags, 2) r327836 pass half type or i16 based on FullFP16, and 3) the code here which the front-end fp16 vector intrinsic for ARM.
Differential revision https://reviews.llvm.org/D43650
llvm-svn: 328277
The difference between CreateRuntimeFunction and CreateBuiltinFunction
is that CreateBuiltinFunction would not set dllimport or dso_local.
To keep the current semantics, just forward to CreateRuntimeFunction
with Local=true so it doesn't add dllimport.
llvm-svn: 328224
Now that LLVM has support for emitting calling conventions in DWARF (see
r328191) have clang emit them.
Patch by: Adrien Guinet
Differential revision: https://reviews.llvm.org/D42351
llvm-svn: 328196
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.
Differential Revision: https://reviews.llvm.org/D44719
llvm-svn: 328158
The patch adds nocf_check target independent attribute for disabling checks that were enabled by cf-protection flag.
The attribute can be appertained to functions and function pointers.
Attribute name follows GCC's similar attribute name.
Differential Revision: https://reviews.llvm.org/D41880
llvm-svn: 327768
In C, we'll wait until the end of the scope to clean up aggregate
temporaries used for returns from calls. This means in cases like:
{
// Assuming that `Bar` is large enough to warrant indirect returns
struct Bar b = {};
b = foo(&b);
b = foo(&b);
b = foo(&b);
b = foo(&b);
}
...We'll allocate space for 5 Bars on the stack (`b`, and 4
temporaries). This becomes painful in things like large switch
statements.
If cleaning up sooner is trivial, we should do it.
llvm-svn: 327229
Simplify the dispatching for the personality routines. This really had
no test coverage previously, so add test coverage for the various cases.
This turns out to be pretty complicated as the various languages and
models interact to change personalities around.
You really should feel bad for the compiler if you are using exceptions.
There is no reason for this type of cruelty.
llvm-svn: 327105
EmitLifetimeStart returns a non-null `size` pointer if it actually
emits a lifetime.start. Later in this function, we use `tempSize`'s
nullness to determine whether or not we should emit a lifetime.end.
llvm-svn: 326844
Summary:
The _get_ssp intrinsic can be used to retrieve the
shadow stack pointer, independent of the current arch -- in
contract with the rdsspd and the rdsspq intrinsics.
Also, this intrinsic returns zero on CPUs which don't
support CET. The rdssp[d|q] instruction is decoded as nop,
essentially just returning the input operand, which is zero.
Example result of compilation:
```
xorl %eax, %eax
movl %eax, %ecx
rdsspq %rcx # NOP when CET is not supported
movq %rcx, %rax # return zero
```
Reviewers: craig.topper
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D43814
llvm-svn: 326689
Summary:
Currently only calls to mcount were suppressed with
no_instrument_function attribute.
Linux kernel requires that calls to fentry should also not be
generated.
This is an extended fix for PR PR33515.
Reviewers: hfinkel, rengolin, srhines, rnk, rsmith, rjmccall, hans
Reviewed By: rjmccall
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D43995
llvm-svn: 326639
Make types with sizes that aren't a power of two an error (that can
be disabled) in structs with ms_struct layout, except on mingw where
the situation is quite likely to occur and GCC handles it silently.
Differential Revision: https://reviews.llvm.org/D43908
llvm-svn: 326476
When targeting GNU/MinGW for i386, the size of the "long double" data
type is 12 bytes (while it is 8 bytes in MSVC). When building
with -mms-bitfields to have struct layouts match MSVC, data types
are laid out in a struct with alignment according to their size.
However, this doesn't make sense for the long double type, since
it doesn't match MSVC at all, and aligning to a non-power-of-2
size triggers other asserts later.
This matches what GCC does, aligning a long double to 4 bytes
in structs on i386 even when -mms-bitfields is specified.
This fixes asserts when using the max_align_t data type when
building for MinGW/i386 with the -mms-bitfields flag.
Differential Revision: https://reviews.llvm.org/D43734
llvm-svn: 326173