Commit Graph

4964 Commits

Author SHA1 Message Date
Hsiangkai Wang 3bec3abf38 [DebugInfo] Generate debug information for labels. (Fix PR37395)
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.

After fixing PR37395.

Differential Revision: https://reviews.llvm.org/D45045

llvm-svn: 338989
2018-08-06 05:58:59 +00:00
David Bolvansky eb3aad9935 Fix tests for changed opt remarks format
Summary:
Optimization remark format is slightly changed by LLVM patch D49412.
Two tests are fixed with expected messages changed.
Frankly speaking I have not tested this change yet. I will test when manage to setup the project.

Reviewers: xbolva00

Reviewed By: xbolva00

Subscribers: mehdi_amini, eraman, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D50241

llvm-svn: 338971
2018-08-05 14:53:34 +00:00
Richard Smith 06f71b5bd8 [constexpr] Support for constant evaluation of __builtin_memcpy and
__builtin_memmove (in non-type-punning cases).

This is intended to permit libc++ to make std::copy etc constexpr
without sacrificing the optimization that uses memcpy on
trivially-copyable types.

__builtin_strcpy and __builtin_wcscpy are not handled by this change.
They'd be straightforward to add, but we haven't encountered a need for
them just yet.

This reinstates r338455, reverted in r338602, with a fix to avoid trying
to constant-evaluate a memcpy call if either pointer operand has an
invalid designator.

llvm-svn: 338941
2018-08-04 00:57:17 +00:00
Graham Yiu 668e894ccb Fix asm label testcase flaw
- Testcase attempts to (not) grep 'g0' in output to ensure asm symbol is
  properly renamed, but g0 is too generic and can be part of the
  module's path in LLVM IR output.
- Changed to grep '@g0', which is what the proper global symbol name
  would be if not using asm.

llvm-svn: 338895
2018-08-03 14:36:44 +00:00
Heejin Ahn 00aa81b4df [WebAssembly] Support for atomic.wait / atomic.wake builtins
Summary:
Add support for atomic.wait / atomic.wake builtins based on the Wasm
thread proposal.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits

Differential Revision: https://reviews.llvm.org/D49396

llvm-svn: 338771
2018-08-02 21:44:40 +00:00
Hans Wennborg 6bd4f924e7 Revert r338455 "[constexpr] Support for constant evaluation of __builtin_memcpy and __builtin_memmove (in non-type-punning cases)."
It caused asserts during Chromium builds, see reply on the cfe-commits thread.

> This is intended to permit libc++ to make std::copy etc constexpr
> without sacrificing the optimization that uses memcpy on
> trivially-copyable types.
>
> __builtin_strcpy and __builtin_wcscpy are not handled by this change.
> They'd be straightforward to add, but we haven't encountered a need for
> them just yet.

llvm-svn: 338602
2018-08-01 17:51:23 +00:00
Richard Smith 96beffba15 [constexpr] Support for constant evaluation of __builtin_memcpy and
__builtin_memmove (in non-type-punning cases).

This is intended to permit libc++ to make std::copy etc constexpr
without sacrificing the optimization that uses memcpy on
trivially-copyable types.

__builtin_strcpy and __builtin_wcscpy are not handled by this change.
They'd be straightforward to add, but we haven't encountered a need for
them just yet.

llvm-svn: 338455
2018-07-31 23:35:09 +00:00
Mandeep Singh Grang 1f6fb8d063 [COFF, ARM64] Enable SEH for ARM64 Windows
Reviewers: rnk, mstorsjo, ssijaric, haripul, TomTan

Reviewed By: rnk

Subscribers: kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D50029

llvm-svn: 338405
2018-07-31 17:42:05 +00:00
Roman Lebedev b69ba22773 [clang][ubsan] Implicit Conversion Sanitizer - integer truncation - clang part
Summary:
C and C++ are interesting languages. They are statically typed, but weakly.
The implicit conversions are allowed. This is nice, allows to write code
while balancing between getting drowned in everything being convertible,
and nothing being convertible. As usual, this comes with a price:

```
unsigned char store = 0;

bool consume(unsigned int val);

void test(unsigned long val) {
  if (consume(val)) {
    // the 'val' is `unsigned long`, but `consume()` takes `unsigned int`.
    // If their bit widths are different on this platform, the implicit
    // truncation happens. And if that `unsigned long` had a value bigger
    // than UINT_MAX, then you may or may not have a bug.

    // Similarly, integer addition happens on `int`s, so `store` will
    // be promoted to an `int`, the sum calculated (0+768=768),
    // and the result demoted to `unsigned char`, and stored to `store`.
    // In this case, the `store` will still be 0. Again, not always intended.
    store = store + 768; // before addition, 'store' was promoted to int.
  }

  // But yes, sometimes this is intentional.
  // You can either make the conversion explicit
  (void)consume((unsigned int)val);
  // or mask the value so no bits will be *implicitly* lost.
  (void)consume((~((unsigned int)0)) & val);
}
```

Yes, there is a `-Wconversion`` diagnostic group, but first, it is kinda
noisy, since it warns on everything (unlike sanitizers, warning on an
actual issues), and second, there are cases where it does **not** warn.
So a Sanitizer is needed. I don't have any motivational numbers, but i know
i had this kind of problem 10-20 times, and it was never easy to track down.

The logic to detect whether an truncation has happened is pretty simple
if you think about it - https://godbolt.org/g/NEzXbb - basically, just
extend (using the new, not original!, signedness) the 'truncated' value
back to it's original width, and equality-compare it with the original value.

The most non-trivial thing here is the logic to detect whether this
`ImplicitCastExpr` AST node is **actually** an implicit conversion, //or//
part of an explicit cast. Because the explicit casts are modeled as an outer
`ExplicitCastExpr` with some `ImplicitCastExpr`'s as **direct** children.
https://godbolt.org/g/eE1GkJ

Nowadays, we can just use the new `part_of_explicit_cast` flag, which is set
on all the implicitly-added `ImplicitCastExpr`'s of an `ExplicitCastExpr`.
So if that flag is **not** set, then it is an actual implicit conversion.

As you may have noted, this isn't just named `-fsanitize=implicit-integer-truncation`.
There are potentially some more implicit conversions to be warned about.
Namely, implicit conversions that result in sign change; implicit conversion
between different floating point types, or between fp and an integer,
when again, that conversion is lossy.

One thing i know isn't handled is bitfields.

This is a clang part.
The compiler-rt part is D48959.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=21530 | PR21530 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=37552 | PR37552 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=35409 | PR35409 ]].
Partially fixes [[ https://bugs.llvm.org/show_bug.cgi?id=9821 | PR9821 ]].
Fixes https://github.com/google/sanitizers/issues/940. (other than sign-changing implicit conversions)

Reviewers: rjmccall, rsmith, samsonov, pcc, vsk, eugenis, efriedma, kcc, erichkeane

Reviewed By: rsmith, vsk, erichkeane

Subscribers: erichkeane, klimek, #sanitizers, aaron.ballman, RKSimon, dtzWill, filcab, danielaustin, ygribov, dvyukov, milianw, mclow.lists, cfe-commits, regehr

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D48958

llvm-svn: 338288
2018-07-30 18:58:30 +00:00
Momchil Velikov 20208cc046 [ARM, AArch64]: Use unadjusted alignment when passing composites as arguments
The "Procedure Call Procedure Call Standard for the ARM® Architecture"
(https://static.docs.arm.com/ihi0042/f/IHI0042F_aapcs.pdf), specifies that
composite types are passed according to their "natural alignment", i.e. the
alignment before alignment adjustment on the entire composite is applied.

The same applies for AArch64 ABI.

Clang, however, used the adjusted alignment.

GCC already implements the ABI correctly. With this patch Clang becomes
compatible with GCC and passes such arguments in accordance with AAPCS.

Differential Revision: https://reviews.llvm.org/D46013

llvm-svn: 338279
2018-07-30 17:48:23 +00:00
Stefan Maksimovic 6e50be1e97 [mips64][clang] Adjust tests to account for changes in r338239
llvm-svn: 338246
2018-07-30 12:27:40 +00:00
Mandeep Singh Grang 2a153101bf [COFF, ARM64] Decide when to mark struct returns as SRet
Summary:
Refer the MS ARM64 ABI Convention for the behavior for struct returns:
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions#return-values

Reviewers: mstorsjo, compnerd, rnk, javed.absar, yinma, efriedma

Reviewed By: rnk, efriedma

Subscribers: haripul, TomTan, yinma, efriedma, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D49464

llvm-svn: 338050
2018-07-26 18:07:59 +00:00
JF Bastien 6508929da9 CodeGen: use non-zero memset when possible for automatic variables
Summary:
Right now automatic variables are either initialized with bzero followed by a few stores, or memcpy'd from a synthesized global. We end up encountering a fair amount of code where memcpy of non-zero byte patterns would be better than memcpy from a global because it touches less memory and generates a smaller binary. The optimizer could reason about this, but it's not really worth it when clang already knows.

This code could definitely be more clever but I'm not sure it's worth it. In particular we could track a histogram of bytes seen and figure out (as we do with bzero) if a memset could be followed by a handful of stores. Similarly, we could tune the heuristics for GlobalSize, but using the same as for bzero seems conservatively OK for now.

<rdar://problem/42563091>

Reviewers: dexonsmith

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D49771

llvm-svn: 337887
2018-07-25 04:29:03 +00:00
Shiva Chen 0ed11a9792 Revert "[DebugInfo] Generate debug information for labels. (Fix PR37395)"
This reverts commit 4288dd3bf082482e02c8a044c611c18168cb0180.

llvm-svn: 337803
2018-07-24 02:57:11 +00:00
Shiva Chen c50fbb9da7 [DebugInfo] Generate debug information for labels. (Fix PR37395)
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.

After fixing PR37395.

Differential Revision: https://reviews.llvm.org/D45045

Patch by Hsiangkai Wang.

llvm-svn: 337800
2018-07-24 02:23:59 +00:00
Ivan A. Kosarev 8264bb8d34 [NEON] Fix support for vrndi_f32(), vrndiq_f32() and vrndns_f32() intrinsics
This patch adds support for vrndi_f32() and vrndiq_f32()
intrinsics in AArch32 mode and for vrndns_f32() intrinsic in
AArch64 mode.

Differential Revision: https://reviews.llvm.org/D48829

llvm-svn: 337690
2018-07-23 13:26:37 +00:00
Erich Keane 3efe00206f Implement cpu_dispatch/cpu_specific Multiversioning
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.

This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.

This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.

The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.

Differential Revision: https://reviews.llvm.org/D47474

llvm-svn: 337552
2018-07-20 14:13:28 +00:00
Richard Smith 4c6568869e Fix typo causing assert in self-host.
llvm-svn: 337508
2018-07-19 23:24:41 +00:00
Richard Smith 83497d9ead When we choose to use zeroinitializer for a trailing portion of an array
constant, don't convert the rest into a packed struct.

If an array constant has a large non-zero portion and a large zero
portion, we want to emit the first part as an array and the rest as a
zeroinitializer if possible. This fixes a memory usage regression from
r333141 when compiling PHP.

llvm-svn: 337498
2018-07-19 21:38:56 +00:00
Nemanja Ivanovic 1ac56bd33f [PowerPC] Handle __builtin_xxpermdi the same way as GCC does
The codegen for this builtin was initially implemented to match GCC.
However, due to interest from users GCC changed behaviour to account for the
big endian bias of the instruction and correct it. This patch brings the
handling inline with GCC.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38192

Differential Revision: https://reviews.llvm.org/D49424

llvm-svn: 337449
2018-07-19 12:44:15 +00:00
Manoj Gupta da08f6ac16 [clang]: Add support for "-fno-delete-null-pointer-checks"
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.

More details : https://lkml.org/lkml/2018/4/4/601

GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.

-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.

This feature is implemented in as the function attribute
"null-pointer-is-valid"="true".
This CL only adds the attribute on the function.
It also strips "nonnull" attributes from function arguments but
keeps the related warnings unchanged.

Corresponding LLVM change rL336613 already updated the
optimizations to not treat null pointer dereferencing
as undefined if the attribute is present.

Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv

Reviewed By: jyknight

Subscribers: drinkcat, xbolva00, cfe-commits

Differential Revision: https://reviews.llvm.org/D47894

llvm-svn: 337433
2018-07-19 00:44:52 +00:00
JF Bastien 7d60a0f118 Support implicit _Atomic struct load / store
Summary:
Using _Atomic to do implicit load / store is just a seq_cst atomic_load / atomic_store. Stores currently assert in Sema::ImpCastExprToType with 'can't implicitly cast lvalue to rvalue with this cast kind', but that's erroneous. The codegen is fine as the test shows.

While investigating I found that Richard had found the problem here: https://reviews.llvm.org/D46112#1113557

<rdar://problem/40347123>

Reviewers: dexonsmith

Subscribers: cfe-commits, efriedma, rsmith, aaron.ballman

Differential Revision: https://reviews.llvm.org/D49458

llvm-svn: 337410
2018-07-18 18:01:41 +00:00
Peter Collingbourne 14b468bab6 Re-land r337333, "Teach Clang to emit address-significance tables.",
which was reverted in r337336.

The problem that required a revert was fixed in r337338.

Also added a missing "REQUIRES: x86-registered-target" to one of
the tests.

Original commit message:
> Teach Clang to emit address-significance tables.
>
> By default, we emit an address-significance table on all ELF
> targets when the integrated assembler is enabled. The emission of an
> address-significance table can be controlled with the -faddrsig and
> -fno-addrsig flags.
>
> Differential Revision: https://reviews.llvm.org/D48155

llvm-svn: 337339
2018-07-18 00:27:07 +00:00
Peter Collingbourne 35c6996b68 Revert r337333, "Teach Clang to emit address-significance tables."
Causing multiple failures on sanitizer bots due to TLS symbol errors,
e.g.

/usr/bin/ld: __msan_origin_tls: TLS definition in /home/buildbots/ppc64be-clang-test/clang-ppc64be/stage1/lib/clang/7.0.0/lib/linux/libclang_rt.msan-powerpc64.a(msan.cc.o) section .tbss.__msan_origin_tls mismatches non-TLS reference in /tmp/lit_tmp_0a71tA/mallinfo-3ca75e.o

llvm-svn: 337336
2018-07-17 23:56:30 +00:00
Peter Collingbourne 27242c0402 Teach Clang to emit address-significance tables.
By default, we emit an address-significance table on all ELF
targets when the integrated assembler is enabled. The emission of an
address-significance table can be controlled with the -faddrsig and
-fno-addrsig flags.

Differential Revision: https://reviews.llvm.org/D48155

llvm-svn: 337333
2018-07-17 23:17:16 +00:00
Mandeep Singh Grang 0054f48b44 [COFF] Add more missing MSVC ARM64 intrinsics
Summary:
Added the following intrinsics:
_BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64
_InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64,
_InterlockedExchangeAdd64, _InterlockedExchangeSub64,
_InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64.

Reviewers: compnerd, mstorsjo, rnk, javed.absar

Reviewed By: mstorsjo

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D49445

llvm-svn: 337327
2018-07-17 22:03:24 +00:00
Joerg Sonnenberger b79e61f8b3 Always use __mcount on NetBSD. Some platforms don't provide _mcount.
llvm-svn: 337277
2018-07-17 13:13:34 +00:00
Roman Lebedev a67e0d0989 Harden/relax clang/test/CodeGen/opt-record-MIR.c test
Summary:
If the build path is short, `Line` field can end up fitting on the same line as `File`,
but the `{{.*}}` would consume it. Keeping in mind rL293149, i think we can fix it,
while keeping it working when there are and there are not any quotations.
At least this fixes this test for me.

Reviewers: anemet, aaron.ballman, hfinkel

Reviewed By: anemet

Subscribers: cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D49348

llvm-svn: 337249
2018-07-17 07:12:08 +00:00
Krzysztof Parzyszek d66941fab9 [Hexagon] Fix hvx-length feature name in testcases
llvm-svn: 337049
2018-07-13 21:32:33 +00:00
JF Bastien 9aab85a6a0 CodeGen: specify alignment + inbounds for automatic variable initialization
Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.

Subscribers: dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D49209

llvm-svn: 337041
2018-07-13 20:33:23 +00:00
Craig Topper 2f7de23bea [X86] Change the rounding mode used when testing the sqrt_round intrinsics.
Using CUR_DIRECTION is not a realistic scenario. That is equivalent to the intrinsic without rounding.

llvm-svn: 337040
2018-07-13 20:16:38 +00:00
Krzysztof Parzyszek d5cd7ce743 [Hexagon] Fix testcases failing after r336933
llvm-svn: 336936
2018-07-12 19:44:39 +00:00
Akira Hatanaka e18c2d2ce3 os_log: When there are multiple privacy annotations in the format
string, choose the strictest one instead of the last.

Also fix an undefined behavior. Move the pointer update to a later point to
avoid adding StringRef::npos to the pointer.

rdar://problem/40706280

llvm-svn: 336863
2018-07-11 22:19:14 +00:00
Joel E. Denny 72c2783012 [FileCheck] Add -allow-deprecated-dag-overlap to failing clang tests
See https://reviews.llvm.org/D47106 for details.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D47172

llvm-svn: 336844
2018-07-11 20:26:20 +00:00
Petr Pavlu a934f9da41 Fix setting of empty implicit-section-name attribute
Code in `CodeGenModule::SetFunctionAttributes()` could set an empty
attribute `implicit-section-name` on a function that is affected by
`#pragma clang text="section"`. This is incorrect because the attribute
should contain a valid section name. If the function additionally also
used `__attribute__((section("section")))` then this could result in
emitting the function in a section with an empty name.

The patch fixes the issue by removing the problematic code that sets
empty `implicit-section-name` from
`CodeGenModule::SetFunctionAttributes()` because it is sufficient to set
this attribute only from a similar code in `setNonAliasAttributes()`
when the function is emitted.

Differential Revision: https://reviews.llvm.org/D48916

llvm-svn: 336842
2018-07-11 20:17:54 +00:00
Craig Topper db5c9aa366 [X86] Also fix the test for _mm512_mullo_epi64 to test the intrinsic instead of a copy of the intrinsic implementation.
This had the same issue I just fixed in r336739. Apparently I copy pasted _mm512_mullo_epi64 when I added _mm512_mullox_epi64.

llvm-svn: 336740
2018-07-10 23:28:05 +00:00
Craig Topper 4ddb2f3a18 [X86] Fix the test for _mm512_mullox_epi64 to test the intrinsic instead of a copy of the intrinsic implementation.
llvm-svn: 336739
2018-07-10 23:13:01 +00:00
Bjorn Pettersson 404f414ee1 Patch to fix pragma metadata for do-while loops
Summary:
Make sure that loop metadata only is put on the backedge
when expanding a do-while loop.
Previously we added the loop metadata also on the branch
in the pre-header. That could confuse optimization passes
and result in the loop metadata being associated with the
wrong loop.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38011

Committing on behalf of deepak2427 (Deepak Panickal)

Reviewers: #clang, ABataev, hfinkel, aaron.ballman, bjope

Reviewed By: bjope

Subscribers: bjope, rsmith, shenhan, zzheng, xbolva00, lebedev.ri, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D48721

llvm-svn: 336717
2018-07-10 19:55:02 +00:00
Matt Arsenault 1dae500c4e AMDGPU: Try to fix test again
llvm-svn: 336681
2018-07-10 14:47:31 +00:00
Matt Arsenault 3ae7d63c80 Update test for backend error message change
llvm-svn: 336676
2018-07-10 14:03:50 +00:00
Mikhail Dvoretckii d1bf9ef0c7 [X86] Lowering integer truncation intrinsics to native IR
This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.

The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for
128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps)
instructions are generated instead

Differential Revision: https://reviews.llvm.org/D48712

llvm-svn: 336643
2018-07-10 08:22:44 +00:00
Craig Topper 36ab775cc1 [X86] Use masked the masked scalar fma builtins to implement the default rounding version of the fma intrinsics.
The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call.

Making this switch switchs the masking to use the i8 bitcast to <8 x i1> and extract i1 version of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp.

llvm-svn: 336637
2018-07-10 04:38:29 +00:00
Akira Hatanaka 189359d1ff Fix parsing of privacy annotations in os_log format strings.
Privacy annotations shouldn't have to appear in the first
comma-delimited string in order to be recognized. Also, they should be
ignored if they are preceded or followed by non-whitespace characters.

rdar://problem/40706280

llvm-svn: 336629
2018-07-10 00:50:25 +00:00
Craig Topper 638426fc36 [X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics.
This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.

This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.

I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.

llvm-svn: 336622
2018-07-10 00:37:25 +00:00
Stefan Pintilie c172604dba [Power9] [CLANG] Add __float128 support for trunc to double round to odd
Add support for this builtin:
double builtin_truncf128_round_to_odd(float128)

Differential Revision: https://reviews.llvm.org/D48482

llvm-svn: 336596
2018-07-09 20:09:52 +00:00
Craig Topper 74c10e3236 [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.

Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.

To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.

To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.

To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.

To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.

There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.

Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.

Differential Revision: https://reviews.llvm.org/D48617

llvm-svn: 336583
2018-07-09 19:00:16 +00:00
Stefan Pintilie 3dbde8a778 [Power9] Add __float128 builtins for Round To Odd
Add a number of builtins for __float128 Round To Odd.
This is the Clang portion of the builtins work.

Differential Revision: https://reviews.llvm.org/D47548

llvm-svn: 336579
2018-07-09 18:50:40 +00:00
Craig Topper 8a8d72794f [X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types.
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma

llvm-svn: 336507
2018-07-08 01:10:47 +00:00
Craig Topper 3e720a302c [X86] Fix a few intrinsics that were ignoring their rounding mode argument and hardcoded _MM_FROUND_CUR_DIRECTION internally.
I believe these have been broken since their introduction into clang.

I've enhanced the tests for these intrinsics to using a real rounding mode and checking all the intrinsic arguments instead of just the name.

llvm-svn: 336498
2018-07-07 22:03:16 +00:00
Craig Topper 5cbeeedd27 [X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR.
Found via imprecise grepping of the -O0 IR. There could still be more bugs out there.

llvm-svn: 336487
2018-07-07 17:03:32 +00:00
Craig Topper f89f62a680 [X86] When creating a select for scalar masked sqrt and div builtins make sure we optimize the all ones mask case.
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.

Factor the code into a helper function to make this easy.

llvm-svn: 336472
2018-07-06 22:46:52 +00:00
Craig Topper 10f20fc42b [X86] Add missing scalar fma intrinsics with rounding, but no mask.
We had the mask versions of the rounding intrinsics, but not one without masking.

Also change the rounding tests to not use the CUR_DIRECTION rounding mode.

llvm-svn: 336470
2018-07-06 22:08:43 +00:00
Craig Topper be4c2933a2 [X86] Implement _builtin_ia32_vfmaddss and _builtin_ia32_vfmaddsd with native IR using llvm.fma intrinsic.
This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns.

llvm-svn: 336417
2018-07-06 07:14:47 +00:00
Hans Wennborg 7525edc890 [ms] Fix mangling of string literals used to initialize arrays larger or smaller than the literal
A Chromium developer reported a bug which turned out to be a mangling
collision between these two literals:

  char s[] = "foo";
  char t[32] = "foo";

They may look the same, but for the initialization of t we will (under
some circumstances) use a literal that's extended with zeros, and
both the length and those zeros should be accounted for by the mangling.

This actually makes the mangling code simpler: where it previously had
special logic for null terminators, which are not part of the
StringLiteral, that is now covered by the general algorithm.

(The problem was reported at https://crbug.com/857442)

Differential Revision: https://reviews.llvm.org/D48928

llvm-svn: 336415
2018-07-06 06:54:16 +00:00
Craig Topper 284c5f342c [X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission.
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.

While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.

llvm-svn: 336388
2018-07-05 20:38:31 +00:00
Gabor Buella 9679eb6527 [X86] Fix some vector cmp builtins - TRUE/FALSE predicates
This patch removes on optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.

Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D48715

llvm-svn: 336355
2018-07-05 14:26:56 +00:00
Gabor Buella 38a7ce76ae [X86] NFC - add more test cases for vector cmp intrinsics
Add test cases with each predicate using the following
intrinsics:
_mm_cmp_pd
_mm_cmp_ps
_mm256_cmp_pd
_mm256_cmp_ps
_mm_cmp_pd_mask
_mm_cmp_ps_mask
_mm256_cmp_pd_mask
_mm256_cmp_ps_mask
_mm512_cmp_pd_mask
_mm512_cmp_ps_mask
_mm_mask_cmp_pd_mask
_mm_mask_cmp_ps_mask
_mm256_mask_cmp_pd_mask
_mm256_mask_cmp_ps_mask
_mm512_mask_cmp_pd_mask
_mm512_mask_cmp_ps_mask

Some of these are marked with FIXME, as there is bug in lowering
e.g. _mm512_mask_cmp_ps_mask.

llvm-svn: 336346
2018-07-05 12:57:47 +00:00
Lei Huang 449252d2ad [Power9] Update fp128 as a valid homogenous aggregate base type
Update clang to treat fp128 as a valid base type for homogeneous aggregate
passing and returning.

Differential Revision: https://reviews.llvm.org/D48044

llvm-svn: 336308
2018-07-05 04:32:01 +00:00
Gabor Buella bf591794f4 NFC - Fix type in builtins-ppc-p9vector.c test
llvm-svn: 336264
2018-07-04 11:29:21 +00:00
Gabor Buella ab38eb29cf NFC - typo fix in test/CodeGen/avx512f-builtins.c
llvm-svn: 336243
2018-07-04 08:32:02 +00:00
Yaron Keren 14cd1997f1 Add expected fail triple x86_64-pc-windows-gnu to test as x86_64-w64-mingw32 is already there
llvm-svn: 336047
2018-06-30 11:18:44 +00:00
Craig Topper 0029470dde [X86] Correct the width of mask arguments in intrinsic headers and tests.
All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.

Some of these were real bugs where we lost bits from the user input:
_mm512_mask_broadcast_f32x8
_mm512_maskz_broadcast_f32x8
_mm512_mask_broadcast_i32x8
_mm512_maskz_broadcast_i32x8
_mm256_mask_cvtusepi16_storeu_epi8

llvm-svn: 336042
2018-06-30 06:05:17 +00:00
Craig Topper 0e9de769a0 [X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead.
llvm-svn: 336036
2018-06-30 01:32:14 +00:00
Craig Topper 8bf793fb35 [X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead.
llvm-svn: 335945
2018-06-29 05:43:33 +00:00
Francis Visoiu Mistrih 7b7b5eb60d [NEON] Remove empty test file from r335734
Fails on Green Dragon:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50174/consoleFull

UNRESOLVED: Clang :: CodeGen/vld_dup.c (5546 of 38947)
******************** TEST 'Clang :: CodeGen/vld_dup.c' FAILED ********************
Test has no run line!

llvm-svn: 335750
2018-06-27 16:17:32 +00:00
Craig Topper 851f363691 [X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm.
llvm-svn: 335745
2018-06-27 15:57:57 +00:00
Ivan A. Kosarev a9f484ac4a [NEON] Support vldNq intrinsics in AArch32 (Clang part)
This patch reworks the support for dup NEON intrinsics as
described in D48439.

Differential Revision: https://reviews.llvm.org/D48440

llvm-svn: 335734
2018-06-27 13:58:43 +00:00
Teresa Johnson 25becb35e9 [ThinLTO] Add testing of summary index parsing to a couple CFI tests
Summary:
Changes to some clang side tests to go with the summary parsing patch.

Depends on D47905.

Reviewers: pcc, dexonsmith, mehdi_amini

Subscribers: inglorion, eraman, cfe-commits, steven_wu

Differential Revision: https://reviews.llvm.org/D47906

llvm-svn: 335618
2018-06-26 15:50:34 +00:00
Sam Clegg 6fd7d680b0 [WebAssembly] Add no-prototype attribute to prototype-less C functions
The WebAssembly backend in particular benefits from being
able to distinguish between varargs functions (...) and prototype-less
C functions.

Differential Revision: https://reviews.llvm.org/D48443

llvm-svn: 335510
2018-06-25 18:47:32 +00:00
Hans Wennborg 08c5a7b8fd [clang-cl] Don't emit dllexport inline functions etc. from pch files (PR37801)
With MSVC, PCH files are created along with an object file that needs to
be linked into the final library or executable. That object file
contains the code generated when building the headers. In particular, it
will include definitions of inline dllexport functions, and because they
are emitted in this object file, other files using the PCH do not need
to emit them. See the bug for an example.

This patch makes clang-cl match MSVC's behaviour in this regard, causing
significant compile-time savings when building dlls using precompiled
headers.

For example, in a 64-bit optimized shared library build of Chromium with
PCH, it reduces the binary size and compile time of
stroke_opacity_custom.obj from 9315564 bytes to 3659629 bytes and 14.6
to 6.63 s. The wall-clock time of building blink_core.dll goes from
38m41s to 22m33s. ("user" time goes from 1979m to 1142m).

Differential Revision: https://reviews.llvm.org/D48426

llvm-svn: 335466
2018-06-25 13:23:49 +00:00
Tobias Edler von Koch 7609cb83e6 Re-land "[LTO] Enable module summary emission by default for regular LTO"
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).

See https://reviews.llvm.org/D34156 for details on the original change.

This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.

llvm-svn: 335385
2018-06-22 20:23:21 +00:00
Gabor Buella 716863c820 [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR
Summary:
Lowering some vector comparision builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.

Affected AVX512 builtins:

__builtin_ia32_cmpps128_mask
__builtin_ia32_cmpps256_mask
__builtin_ia32_cmpps512_mask
__builtin_ia32_cmppd128_mask
__builtin_ia32_cmppd256_mask
__builtin_ia32_cmppd512_mask

Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma

Reviewed By: craig.topper, spatel, efriedma

Differential Revision: https://reviews.llvm.org/D45616

llvm-svn: 335339
2018-06-22 11:59:16 +00:00
Chandler Carruth 16e6bc23a1 [x86] Teach the builtin argument range check to allow invalid ranges in
dead code.

This is important for C++ templates that essentially compute the valid
input in a way that is constant and will cause all the invalid cases to
be dead code that is deleted. Code in the wild actually does this and
GCC also accepts these kinds of patterns so it is important to support
it.

To make this work, we provide a non-error path to diagnose these issues,
and use a default-error warning instead. This keeps the relatively
strict handling but prevents nastiness like SFINAE on these errors. It
also allows us to safely use the system to diagnose this only when it
occurs at runtime (in emitted code).

Entertainingly, this required fixing the syntax in various other ways
for the x86 test because we never bothered to diagnose that the returns
were invalid.

Since debugging these compile failures was super confusing, I've also
improved the diagnostic to actually say what the value was. Most of the
checks I've made ignore this to simplify maintenance, but I've checked
it in a few places to make sure the diagnsotic is working.

Depends on D48462. Without that, we might actually crash some part of
the compiler after bypassing the error here.

Thanks to Richard, Ben Kramer, and especially Craig Topper for all the
help here.

Differential Revision: https://reviews.llvm.org/D48464

llvm-svn: 335309
2018-06-21 23:46:09 +00:00
Craig Topper 342b095689 [X86] Update handling in CGBuiltin to be tolerant of out of range immediates.
D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.

This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.

llvm-svn: 335308
2018-06-21 23:39:47 +00:00
Evgeniy Stepanov fb762b27f2 Ignore blacklist when generating __cfi_check_fail.
Summary: Fixes PR37898.

Reviewers: pcc, vlad.tsyrklevich

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D48454

llvm-svn: 335305
2018-06-21 23:22:37 +00:00
Tobias Edler von Koch e597a2cf81 Revert "[LTO] Enable module summary emission by default for regular LTO"
This is breaking a couple of buildbots. We need to run the
NameAnonGlobal pass for regular LTO now as well (since we're producing a
summary). I'll post a separate patch for review to make this happen and
then re-commit.

This reverts commit c0759b7b1f4a81ff9021b952aa38a222d5fa4dfd.

llvm-svn: 335291
2018-06-21 21:24:30 +00:00
Tobias Edler von Koch 9a8be606f3 [LTO] Enable module summary emission by default for regular LTO
Summary:
With D33921, we gained the ability to have module summaries in regular
LTO modules without triggering ThinLTO compilation. Module summaries in
regular LTO allow garbage collection (dead stripping) before LTO
compilation and thus open up additional optimization opportunities.

This patch enables summary emission in regular LTO for all targets
except ld64-based ones (which use the legacy LTO API).

Reviewers: pcc, tejohnson, mehdi_amini

Subscribers: inglorion, eraman, cfe-commits

Differential Revision: https://reviews.llvm.org/D34156

llvm-svn: 335284
2018-06-21 20:20:41 +00:00
Craig Topper 1763dbb278 [X86] Correct the inline assembly implementations of __movsb/w/d/q and __stosw/d/q to mark registers/memory as modified
The inline assembly for these didn't mark that edi, esi, ecx are modified by movs/stos instruction. It also didn't mark that memory is modified.

This issue was reported to llvm-dev last year http://lists.llvm.org/pipermail/cfe-dev/2017-November/055863.html but no bug was ever filed.

Differential Revision: https://reviews.llvm.org/D48448

llvm-svn: 335270
2018-06-21 18:56:30 +00:00
Anastasis Grammenos dfe8fe503c [DebugInfo] Inline for without DebugLocation
Summary:
This test is a strip down version of a function inside the
amalgamated sqlite source. When converted to IR clang produces
a phi instruction without debug location.

This patch fixes the above issue.

Differential Revision: https://reviews.llvm.org/D47720

llvm-svn: 335255
2018-06-21 16:53:48 +00:00
Craig Topper ddfe69cc99 [X86] Rewrite the add/mul/or/and reduction intrinsics to make better use of other intrinsics and remove undef shuffle indices.
Similar to what was done to max/min recently.

These already reduced the vector width to 256 and 128 bit as we go unlike the original max/min code.

Differential Revision: https://reviews.llvm.org/D48346

llvm-svn: 335253
2018-06-21 16:41:28 +00:00
Craig Topper 2da60bc231 [X86] Remove masking from the 512-bit floating point max/min builtins. Use select in IR instead.
llvm-svn: 335200
2018-06-21 05:01:01 +00:00
Sjoerd Meijer 5c4c998a54 [SPIR] Prevent SPIR targets from using half conversion intrinsics
The SPIR target currently allows for half precision floating point types to be
emitted using the LLVM intrinsic functions which convert half types to floats
and doubles. However, this is illegal in SPIR as the only intrinsic allowed by
SPIR is memcpy, as per section 3 of the SPIR specification. Currently this is
leading to an assert being hit in the Clang CodeGen when attempting to emit a
constant or literal _Float16 type in a comparison operation on a SPIR or SPIR64
target. This assert stems from the CodeGen attempting to emit a constant half
value as an integer because the backend has specified that it is using these
half conversion intrinsics (which represents half as i16). This patch prevents
SPIR targets from using these intrinsics by overloading the responsible target
info method, marks SPIR targets as having a legal half type and provides
additional regression testing for the _Float16 type on SPIR targets.

Patch by: Stephen McGroarty

Differential Revision: https://reviews.llvm.org/D48188

llvm-svn: 335111
2018-06-20 09:49:40 +00:00
Craig Topper 61495b3650 Recommit r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible.""
Test has been updated to reflect the IRGen.

llvm-svn: 335075
2018-06-19 21:00:30 +00:00
Craig Topper 79b13a0348 Revert r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible."
The test changes are failing the buildbot and its going to take me some time to fix it.

llvm-svn: 335072
2018-06-19 19:37:07 +00:00
Craig Topper 873afd0262 [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible.
We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX.

For v16i32 and floating point we have legacy 128/256 bit instructions we can use.

I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.

Differential Revision: https://reviews.llvm.org/D47401

llvm-svn: 335070
2018-06-19 19:13:54 +00:00
Tomasz Krupa f1792bb3d6 [X86] Lowering sqrt intrinsics to native IR
Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, cfe-commits

Differential Revision: https://reviews.llvm.org/D41168

llvm-svn: 334850
2018-06-15 18:05:59 +00:00
Luke Geeson da2b2e8c26 [AArch64] Reverted rC334696 with Clang VCVTA test fix
llvm-svn: 334820
2018-06-15 10:10:45 +00:00
Craig Topper b521dc3acf [X86] Add inline assembly versions of _InterlockedExchange_HLEAcquire/Release and _InterlockedCompareExchange_HLEAcquire/Release for MSVC compatibility.
Clang/LLVM doesn't have a way to pass an HLE hint through to the X86 backend to emit HLE prefixed instructions. So this is a good short term fix.

Differential Revision: https://reviews.llvm.org/D47672

llvm-svn: 334751
2018-06-14 18:43:52 +00:00
Tomasz Krupa 82aa42af49 [X86] Lowering Mask Scalar intrinsics to native IR (Clang part)
Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls
to native IR.

Reviewers: craig.topper, RKSimon, spatel, sroland

Reviewed By: craig.topper

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47979

llvm-svn: 334741
2018-06-14 17:36:23 +00:00
Luke Geeson bb399f8013 [AArch64] reverting rC334693 due to build failures
llvm-svn: 334696
2018-06-14 08:59:33 +00:00
Luke Geeson 010bbbf390 [AArch64] Added support for the vcvta_u16_f16 instrinsic for FP16 Armv8.2-A
llvm-svn: 334693
2018-06-14 08:28:56 +00:00
Mandeep Singh Grang 2d28383097 [COFF] Add ARM64 intrinsics: __yield, __wfe, __wfi, __sev, __sevl
Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.

Reviewers: mstorsjo, compnerd, javed.absar

Reviewed By: mstorsjo

Subscribers: kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D48132

llvm-svn: 334639
2018-06-13 18:49:35 +00:00
Sanjay Patel 1d7ed94439 [CodeGen] make nan builtins pure rather than const (PR37778)
https://bugs.llvm.org/show_bug.cgi?id=37778
...shows a miscompile resulting from marking nan builtins as 'const'.

The nan libcalls/builtins take a pointer argument:
http://www.cplusplus.com/reference/cmath/nan-function/
...and the chars dereferenced by that arg are used to fill in the NaN constant payload bits.

"const" means that the pointer argument isn't dereferenced. That's translated to "readnone" in LLVM.
"pure" means that the pointer argument may be dereferenced. That's translated to "readonly" in LLVM.

This change prevents the IR optimizer from killing the lead-up to the nan call here:

double a() {
  char buf[4];
  buf[0] = buf[1] = buf[2] = '9';
  buf[3] = '\0';
  return __builtin_nan(buf);
}

...the optimizer isn't currently able to simplify this to a constant as we might hope, 
but this patch should solve the miscompile.

Differential Revision: https://reviews.llvm.org/D48134

llvm-svn: 334628
2018-06-13 17:54:52 +00:00
Craig Topper 2527c378c6 [X86] Remove masking from avx512vbmi2 concat and shift by immediate builtins. Use select builtins instead.
llvm-svn: 334577
2018-06-13 07:19:28 +00:00
Luke Geeson dc54b37414 [AArch64] Corrected FP16 Intrinsic range checks in Clang + added Sema tests
Summary:
This fixes the ranges for the vcvth family of FP16 intrinsics in the clang front end. Previously it was accepting incorrect ranges
-Changed builtin range checking in SemaChecking
-added tests SemaCheck changes - included in  their own file since no similar one exists
-modified existing tests to reflect new ranges

Reviewers: SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, cfe-commits

Differential Revision: https://reviews.llvm.org/D47592

llvm-svn: 334489
2018-06-12 09:54:27 +00:00
Mikhail Maltsev 5434047862 [Driver] Add aliases for -Qn/-Qy
This patch adds aliases for -Qn (-fno-ident) and -Qy (-fident) which
look less cryptic than -Qn/-Qy. The aliases are compatible with GCC.

Differential Revision: https://reviews.llvm.org/D48021

llvm-svn: 334414
2018-06-11 16:10:06 +00:00
Craig Topper 91bbe98757 [X86] Remove masking from dbpsadbw builtins, use select builtin instead.
llvm-svn: 334385
2018-06-11 06:18:29 +00:00
Craig Topper 3cce6a7ed9 [X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins.
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.

Reviewers: RKSimon, delena, spatel, GBuella

Reviewed By: RKSimon

Subscribers: cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D47693

llvm-svn: 334366
2018-06-10 17:27:05 +00:00
Ivan A. Kosarev 73c76c35a5 [NEON] Support VST1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47446

llvm-svn: 334362
2018-06-10 09:28:10 +00:00
Craig Topper 3614b41a8e [X86] Remove masking from the 512-bit packed floating point add/sub/mul/div builtins. Use select in IR instead.
llvm-svn: 334359
2018-06-10 06:01:42 +00:00
Craig Topper 03f4f04b91 [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking.
llvm-svn: 334311
2018-06-08 18:00:25 +00:00
Craig Topper 03de166ccd [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
llvm-svn: 334265
2018-06-08 06:13:16 +00:00
Craig Topper 3428beeb2f [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.

llvm-svn: 334261
2018-06-08 03:24:47 +00:00
Craig Topper acf5601961 [X86] Add builtins for vpermilps/pd instructions to enable target feature checking.
llvm-svn: 334256
2018-06-08 00:59:27 +00:00
Craig Topper 7d17d7278b [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range.
llvm-svn: 334249
2018-06-08 00:00:21 +00:00
Craig Topper 9392136414 [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking.
llvm-svn: 334244
2018-06-07 23:03:08 +00:00
Shoaib Meenai d8d1547387 [Frontend] Disallow non-MSVC exception models for windows-msvc targets
The windows-msvc target is used for MSVC ABI compatibility, including
the exceptions model. It doesn't make sense to pair a windows-msvc
target with a non-MSVC exception model. This would previously cause an
assertion failure; explicitly error out for it in the frontend instead.
This also allows us to reduce the matrix of target/exception models a
bit (see the modified tests), and we can possibly simplify some of the
personality code in a follow-up.

Differential Revision: https://reviews.llvm.org/D47853

llvm-svn: 334243
2018-06-07 22:54:54 +00:00
Reid Kleckner aa46ed9278 [MS] Re-add support for the ARM interlocked bittest intrinscs
Adds support for these intrinsics, which are ARM and ARM64 only:
  _interlockedbittestandreset_acq
  _interlockedbittestandreset_rel
  _interlockedbittestandreset_nf
  _interlockedbittestandset_acq
  _interlockedbittestandset_rel
  _interlockedbittestandset_nf

Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.

llvm-svn: 334239
2018-06-07 21:39:04 +00:00
Craig Topper d3623155a2 [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.

This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.

llvm-svn: 334208
2018-06-07 17:28:03 +00:00
Gabor Buella 1a83d06768 [CodeGen] Improve diagnostics related to target attributes
Summary:
When requirement imposed by __target__ attributes on functions
are not satisfied, prefer printing those requirements, which
are explicitly mentioned in the attributes.

This makes such messages more useful, e.g. printing avx512f instead of avx2
in the following scenario:

```
$ cat foo.c
static inline void __attribute__((__always_inline__, __target__("avx512f")))
x(void)
{
}

int main(void)
{
            x();
}
$ clang foo.c
foo.c:7:2: error: always_inline function 'x' requires target feature 'avx2', but would be inlined into function 'main' that is compiled without support for 'avx2'
        x();
    ^
1 error generated.
```

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37338

Reviewers: craig.topper, echristo, dblaikie

Reviewed By: craig.topper, echristo

Differential Revision: https://reviews.llvm.org/D46541

llvm-svn: 334174
2018-06-07 08:48:36 +00:00
Craig Topper b92c77d176 [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins.
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.

I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.

The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.

I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.

Reviewers: tkrupa, RKSimon, spatel, GBuella

Reviewed By: tkrupa

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47724

llvm-svn: 334159
2018-06-07 02:46:02 +00:00
Evandro Menezes 0804523bd5 [PATCH 2/2] [test] Add support for Samsung Exynos M4 (NFC)
Add test cases for Exynos M4.

llvm-svn: 334116
2018-06-06 18:58:01 +00:00
Reid Kleckner 11c99ed05f [MS][ARM64]: Promote _setjmp to_setjmpex as there is no _setjmp in the ARM64 libvcruntime.lib
Factor out the common setjmp call emission code.

Based on a patch by Chris January

Differential Revision: https://reviews.llvm.org/D47784

llvm-svn: 334112
2018-06-06 18:39:47 +00:00
Reid Kleckner 368d52b7e0 Implement bittest intrinsics generically for non-x86 platforms
I tested these locally on an x86 machine by disabling the inline asm
codepath and confirming that it does the same bitflips as we do with the
inline asm.

Addresses code review feedback.

llvm-svn: 334059
2018-06-06 01:35:08 +00:00
Craig Topper f3914b74c1 [X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics.
Previously we were just using extended vector operations in the header file.

This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.

By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count.

User code still has the option to use extended vector operations themselves if they need non-constant indexing.

llvm-svn: 334057
2018-06-06 00:24:55 +00:00
Reid Kleckner 1d9c249db5 Reimplement the bittest intrinsic family as builtins with inline asm
We need to implement _interlockedbittestandset as a builtin for
windows.h, so we might as well do the whole family. It reduces code
duplication anyway.

Fixes PR33188, a long standing bug in our bittest implementation
encountered by Chakra.

llvm-svn: 333978
2018-06-05 01:33:40 +00:00
Teresa Johnson e2e630b01b [ThinLTO] Add testing of new summary index format to a couple CFI tests
Summary:
Adds testing of combined index summary entries in disassembly format
to CFI tests that were already testing the bitcode format.

Depends on D46699.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, cfe-commits

Differential Revision: https://reviews.llvm.org/D46700

llvm-svn: 333966
2018-06-04 23:05:24 +00:00
Reid Kleckner 89fbd55145 Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms."
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.

We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.

llvm-svn: 333958
2018-06-04 21:39:20 +00:00
Craig Topper 95ed88a1a9 [X86] Avoid passing _mm_undefined* to builtin_shufflevector if we are able to pass the first input a second time.
This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.

llvm-svn: 333944
2018-06-04 19:28:09 +00:00
Craig Topper 6fb26f93ef [X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsics and a __builtin_shufflevector call.
llvm-svn: 333853
2018-06-03 19:42:59 +00:00
Ivan A. Kosarev 9c40c0ad0c [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333829
2018-06-02 17:42:59 +00:00
John McCall 280c656031 Cap "voluntary" vector alignment at 16 for all Darwin platforms.
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
  Intel, so we did not have a consistent ABI.

This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).

Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects.  The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.

llvm-svn: 333791
2018-06-01 21:34:26 +00:00
Dan Gohman 9f8ee03772 [WebAssembly] Update to the new names for the memory builtin functions.
The WebAssembly committee has decided on the names `memory.size` and
`memory.grow` for the memory intrinsics, so update the clang builtin
functions to follow those names, keeping both sets of old names in place
for compatibility.

llvm-svn: 333712
2018-06-01 00:05:51 +00:00
Peter Collingbourne 3aa30e8062 IRGen: Write .dwo files when -split-dwarf-file is used together with -fthinlto-index.
Differential Revision: https://reviews.llvm.org/D47597

llvm-svn: 333677
2018-05-31 18:25:59 +00:00
Craig Topper a6dd2faaea [X86] Make 512-bit unmasked load/store builtins more like their 128/256-bit equivalents.
Previously we were just passing -1 mask to the masked builtin. This changes it to the more generic way that the 128/256 bit use.

llvm-svn: 333626
2018-05-31 05:02:08 +00:00
Craig Topper c633867944 [X86] Remove __extension__ from macro intrinsics when its not needed.
I think this is a holdover from when we used to declare variables inside the macros. And then its been copy and pasted forward for years every time a new macro intrinsic gets added.

Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.

It also removed a bogus error message on another test.

llvm-svn: 333613
2018-05-31 00:51:20 +00:00
Eric Christopher 5b91350b4a Add fopen to the list of builtins that we check and whitelist.
llvm-svn: 333594
2018-05-30 21:11:45 +00:00
Craig Topper c5ec55e921 [X86] Simplify the implementation of _mm_sqrt_ss, _mm_rcp_ss, and _mm_rsqrt_ss.
We don't need the insertion back into the original vector at the end. The builtin already understands that.

This is different than _mm_sqrt_sd which takes two arguments and we do need to insert.

llvm-svn: 333572
2018-05-30 18:27:07 +00:00
Craig Topper dff5b311af [X86] Reduce the number of setzero intrinsics to just the set defined by the Intel Intrinsics Guide.
We had quite a few for different element sizes of integers sometimes with strange target features attached to them.

We only need a single version for each of _m128i, _m256i, and _m512i with the target feature that first introduced those types.

llvm-svn: 333568
2018-05-30 18:02:11 +00:00
Gabor Buella 70d8d51073 [X86] Lowering FMA intrinsics to native IR (Clang part)
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.

Patch by tkrupa

Reviewers: craig.topper, sroland, spatel, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47444

llvm-svn: 333555
2018-05-30 15:27:49 +00:00
Simon Tatham 89e31fa7fc Support __iso_volatile_load8 etc on aarch64-win32.
These intrinsics are used by MSVC's header files on AArch64 Windows as
well as AArch32, so we should support them for both targets. I've
factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate
functions that EmitAArch64BuiltinExpr can call as well.

Reviewers: javed.absar, mstorsjo

Reviewed By: mstorsjo

Subscribers: kristof.beyls, cfe-commits

Differential Revision: https://reviews.llvm.org/D47476

llvm-svn: 333513
2018-05-30 07:54:05 +00:00
Daniel Cederman 8cc53aecad [Sparc] Add floating-point register names
Reviewers: jyknight

Reviewed By: jyknight

Subscribers: eraman, fedor.sergeev, jrtc27, cfe-commits

Differential Revision: https://reviews.llvm.org/D47137

llvm-svn: 333510
2018-05-30 06:02:18 +00:00
Craig Topper f6e79c6d3f [X86] Remove masking from the AVX512VNNI builtins. Use a select in IR instead.
llvm-svn: 333509
2018-05-30 05:26:04 +00:00
Craig Topper 3d9305f28b [X86] Fix the names of a bunch of icelake intrinsics.
Mostly this fixes the names of all the 128-bit intrinsics to start with _mm_ instead of _mm128_ as is the convention and what the Intel docs say.

This also fixes the name of the bitshuffle intrinsics to say epi64 for 128 and 256 bit versions.

llvm-svn: 333497
2018-05-30 03:38:15 +00:00
Craig Topper 68a272d501 [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead.
llvm-svn: 333387
2018-05-29 03:26:38 +00:00
Craig Topper f99532faee Revert r333347 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible."
This wasn't supposed to be commited yet.

llvm-svn: 333349
2018-05-26 18:57:41 +00:00
Craig Topper 387b1423db [X86] Remove mask from avx512ifma builtins. Use a select instruction instead.
This reduces from 12 builtins to 6 since we no longer need a mask and maskz version.

llvm-svn: 333348
2018-05-26 18:55:26 +00:00
Craig Topper e091523c82 [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible.
Summary:
We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX.

For v16i32 and floating point we have legacy 128/256 bit instructions we can use.

I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization.

Reviewers: RKSimon, spatel, GBuella

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47401

llvm-svn: 333347
2018-05-26 18:55:24 +00:00
Paul Robinson 76178632a2 Revert "[DebugInfo] Don't bother with MD5 checksums of preprocessed files."
This reverts commit d734f2aa3f76fbf355ecd2bbe081d0c1f49867ab.
Also known as r333311.  A very small but nonzero number of bots fail.

llvm-svn: 333319
2018-05-25 22:35:59 +00:00
Paul Robinson 638d606f83 [DebugInfo] Don't bother with MD5 checksums of preprocessed files.
The checksum will not reflect the real source, so there's no clear
reason to include them in the debug info.  Also this was causing a
crash on the DWARF side.

Differential Revision: https://reviews.llvm.org/D47260

llvm-svn: 333311
2018-05-25 20:59:29 +00:00
Gabor Buella 078bb99a90 [x86] invpcid intrinsic
An intrinsic for an old instruction, as described in the Intel SDM.

Reviewers: craig.topper, rnk

Reviewed By: craig.topper, rnk

Differential Revision: https://reviews.llvm.org/D47142

llvm-svn: 333256
2018-05-25 06:34:42 +00:00
Gabor Buella 5219ed89be [X86] NFC Include immintrin.h in CodeGen tests
Following r333110:
"Move all Intel defined intrinsic includes into immintrin.h"

llvm-svn: 333160
2018-05-24 07:09:08 +00:00
Eric Christopher 85f0e505e7 Add Builtins.def support for fread and fwrite to ensure that -fno-builtin-
works with them and test accordingly.

llvm-svn: 333156
2018-05-24 06:09:28 +00:00
Eric Christopher c27ad9bbc8 Migrate libcalls-fno-builtin.c test from checking optimized assembly
to checking for attributes on the call site - and fix up builtin
functions that we were testing for but not ensuring wouldn't be
optimized by the backend.

Leave one set of asm tests to make sure that we're also communicating
builtin-ness to TLI.

llvm-svn: 333154
2018-05-24 06:00:50 +00:00
Richard Smith 3e268632cf Use zeroinitializer for (trailing zero portion of) large array initializers
more reliably.

This re-commits r333044 with a fix for PR37560.

llvm-svn: 333141
2018-05-23 23:41:38 +00:00
Hans Wennborg 156349fa10 Revert r333044 "Use zeroinitializer for (trailing zero portion of) large array initializers"
It caused asserts, see PR37560.

> Use zeroinitializer for (trailing zero portion of) large array initializers
> more reliably.
>
> Clang has two different ways it emits array constants (from InitListExprs and
> from APValues), and both had some ability to emit zeroinitializer, but neither
> was able to catch all cases where we could use zeroinitializer reliably. In
> particular, emitting from an APValue would fail to notice if all the explicit
> array elements happened to be zero. In addition, for large arrays where only an
> initial portion has an explicit initializer, we would emit the complete
> initializer (which could be huge) rather than emitting only the non-zero
> portion. With this change, when the element would have a suffix of more than 8
> zero elements, we emit the array constant as a packed struct of its initial
> portion followed by a zeroinitializer constant for the trailing zero portion.
>
> In passing, I found a bug where SemaInit would sometimes walk the entire array
> when checking an initializer that only covers the first few elements; that's
> fixed here to unblock testing of the rest.
>
> Differential Revision: https://reviews.llvm.org/D47166

llvm-svn: 333067
2018-05-23 08:24:01 +00:00
Craig Topper 39e0347e6a [X86] In the floating point max reduction intrinsics, negate infinity before feeding it to set1.
Previously we negated the whole vector after splatting infinity. But its better to negate the infinity before splatting. This generates IR with the negate already folded with the infinity constant.

llvm-svn: 333062
2018-05-23 05:51:52 +00:00
Craig Topper f2043b08b4 [X86] Remove mask argument from more builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead.
llvm-svn: 333061
2018-05-23 04:51:54 +00:00
Richard Smith 9062bbf419 Use zeroinitializer for (trailing zero portion of) large array initializers
more reliably.

Clang has two different ways it emits array constants (from InitListExprs and
from APValues), and both had some ability to emit zeroinitializer, but neither
was able to catch all cases where we could use zeroinitializer reliably. In
particular, emitting from an APValue would fail to notice if all the explicit
array elements happened to be zero. In addition, for large arrays where only an
initial portion has an explicit initializer, we would emit the complete
initializer (which could be huge) rather than emitting only the non-zero
portion. With this change, when the element would have a suffix of more than 8
zero elements, we emit the array constant as a packed struct of its initial
portion followed by a zeroinitializer constant for the trailing zero portion.

In passing, I found a bug where SemaInit would sometimes walk the entire array
when checking an initializer that only covers the first few elements; that's
fixed here to unblock testing of the rest.

Differential Revision: https://reviews.llvm.org/D47166

llvm-svn: 333044
2018-05-23 00:09:29 +00:00
Sanjay Patel 74c7fb002f [CodeGen] use nsw negation for builtin abs
The clang builtins have the same semantics as the stdlib functions.
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."

That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.

Differential Revision: https://reviews.llvm.org/D47202

llvm-svn: 333038
2018-05-22 23:02:13 +00:00
Peter Collingbourne 91d02844a3 Reland r332885, "CodeGen, Driver: Start using direct split dwarf emission in clang."
As well as two follow-on commits r332906, r332911 with a fix for
test clang/test/CodeGen/split-debug-filename.c.

llvm-svn: 333013
2018-05-22 18:52:37 +00:00
Sanjay Patel 1ff6b27940 [CodeGen] produce the LLVM canonical form of abs
We chose the 'slt' form as canonical in IR with:
rL332819
...so we should generate that form directly for efficiency.

llvm-svn: 332989
2018-05-22 15:36:50 +00:00
Sanjay Patel b6e5d4ead1 [CodeGen] add tests for abs builtins; NFC
llvm-svn: 332988
2018-05-22 15:11:59 +00:00
Brock Wyma 8557ec5d64 [CodeView] Enable debugging of captured variables within C++ lambdas
This change will help Visual Studio resolve forward references to C++ lambda
routines used by captured variables.

Differential Revision: https://reviews.llvm.org/D45438

llvm-svn: 332975
2018-05-22 12:41:19 +00:00
Amara Emerson f528bcc32a Revert "CodeGen, Driver: Start using direct split dwarf emission in clang."
This reverts commit r332885 as it broke several greendragon buildbots.

llvm-svn: 332973
2018-05-22 11:18:58 +00:00
Craig Topper 9efb77e25f [X86] Remove a builtin that should have been removed in r332882.
llvm-svn: 332909
2018-05-21 22:10:02 +00:00
Craig Topper 288bd2e5a0 [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead.
Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions.

To avoid that just generate IR directly in CGBuiltin.cpp.

Differential Revision: https://reviews.llvm.org/D47125

llvm-svn: 332891
2018-05-21 20:58:23 +00:00
Richard Smith 3f1d6de4f7 Revert r332847; it caused us to miscompile certain forms of reference initialization.
llvm-svn: 332886
2018-05-21 20:36:58 +00:00
Peter Collingbourne 47bc01786d CodeGen, Driver: Start using direct split dwarf emission in clang.
Fixes PR37466.

Differential Revision: https://reviews.llvm.org/D47093

llvm-svn: 332885
2018-05-21 20:31:59 +00:00
Craig Topper 842171de36 [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
I believe this is safe assuming default default FP environment. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions.

We already do something similar for scalar conversions.

Differential Revision: https://reviews.llvm.org/D46863

llvm-svn: 332882
2018-05-21 20:19:17 +00:00
Serge Pavlov 9f8068420a [CodeGen] Recognize more cases of zero initialization
If a variable has an initializer, codegen tries to build its value. If
the variable is large in size, building its value requires substantial
resources. It causes strange behavior from user viewpoint: compilation
of huge zero initialized arrays like:

    char data_1[2147483648u] = { 0 };

consumes enormous amount of time and memory.

With this change codegen tries to determine if variable initializer is
equivalent to zero initializer. In this case variable value is not
constructed.

This change fixes PR18978.

Differential Revision: https://reviews.llvm.org/D46241

llvm-svn: 332847
2018-05-21 16:09:54 +00:00
Craig Topper 55b4067350 [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all the builtins.

llvm-svn: 332825
2018-05-20 23:34:10 +00:00
Alexander Ivchenko 0fb8c877c4 This patch aims to match the changes introduced
in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html.
The -mibt feature flag is being removed, and the -fcf-protection
option now also defines a CET macro and causes errors when used
on non-X86 targets, while X86 targets no longer check for -mibt
and -mshstk to determine if -fcf-protection is supported. -mshstk
is now used only to determine availability of shadow stack intrinsics.

Comes with an LLVM patch (D46882).

Patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D46881

llvm-svn: 332704
2018-05-18 11:56:21 +00:00
Peter Smith 84a9c481f5 [AArch64] Correct inline assembly test case for S modifier [NFC]
The existing test for the AArch64 inline assembly constraint S uses the
A and L modifiers. These modifiers were implemented in the original
AArch64 backend but were not carried forward to the merged backend. The
A is associated with ADRP and does nothing, the L is associated with
:lo12: . Given that A and L are not supported by GCC and not supported
by the new implementation of constraint S in LLVM (see D46745) I've
altered the test to put :lo12: directly in the string so that A and L
are not needed.

Differential Revision: https://reviews.llvm.org/D46932

llvm-svn: 332606
2018-05-17 13:17:33 +00:00
Craig Topper 9d146bbaf7 [X86] Revert part of r332266: Use __builtin_convertvector to replace some of the avx512 truncate builtins.
The masking doesn't work right in the backend for the ones that produce byte or word elements without avx512bw.

llvm-svn: 332322
2018-05-15 03:17:52 +00:00
Craig Topper 25de41cfbc [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.
As long as the destination type is a 256 or 128 bit vector with the same number of elements we can use __builtin_convertvector to directly generate trunc IR instruction which will be handled natively by the backend.

Differential Revision: https://reviews.llvm.org/D46742

llvm-svn: 332266
2018-05-14 17:50:40 +00:00
Craig Topper 8cb261e353 [X86] Use select instrution and fpextend in the implementation of _mm512_mask_cvtps_pd and _mm512_maskz_cvtps_pd.
llvm-svn: 332213
2018-05-14 04:57:46 +00:00
Craig Topper daaf105f86 [X86] Use __builtin_convertvector to implement _mm512_cvtps_pd.
If we're using default rounding mode we can let __builtin_convertvector to generate an fpextend. This matches 128 and 256 bit.

If we're using the version that takes an explicit rounding mode argument we would need to look at the immediate to see if its CUR_DIRECTION.

llvm-svn: 332210
2018-05-14 04:05:06 +00:00
Craig Topper 6fa91254e4 [X86] Emit better code for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss.
We can use direct C code for these that will use uitofp and insertelement instructions.

For the versions that take an explicit rounding mode we can't do this.

llvm-svn: 332203
2018-05-13 23:03:30 +00:00
Elena Demikhovsky d31327d505 Added atomic_fetch_min, max, umin, umax intrinsics to clang.
These intrinsics work exactly as all other atomic_fetch_* intrinsics and allow to create *atomicrmw* with ordering.
Updated the clang-extensions document.

Differential Revision: https://reviews.llvm.org/D46386

llvm-svn: 332193
2018-05-13 07:45:58 +00:00
Krzysztof Parzyszek 458506871a [Hexagon] Implement checking arguments of builtin calls
llvm-svn: 332105
2018-05-11 16:41:51 +00:00
Gabor Buella 9cd4f16601 [X86] Assume alignment of movdir64b dst argument
Reviewers: craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46683

llvm-svn: 332091
2018-05-11 14:22:04 +00:00
Gabor Buella 3a7571259e [X86] ptwrite intrinsic
Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46540

llvm-svn: 331962
2018-05-10 07:28:54 +00:00
Craig Topper 74ac0eda68 [X86] Change the implementation of scalar masked load/store intrinsics to not use a 512-bit intermediate vector.
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.

Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.

llvm-svn: 331958
2018-05-10 05:43:43 +00:00
Craig Topper 2b248849ae [Builtins] Improve the IR emitted for MSVC compatible rotr/rotl builtins to match what the middle and backends understand
Previously we emitted something like

rotl(x, n) {
  n &= bitwidth-1;
  return n != 0 ? ((x << n) | (x >> (bitwidth - n)) : x;
}

We use a select to avoid the undefined behavior on the (bitwidth - n) shift.

The middle and backend don't really recognize this as a rotate and end up emitting a cmov or control flow because of the select.

A better pattern is (x << (n & mask)) | (x << (-n & mask)) where mask is bitwidth - 1.

Fixes the main complaint in PR37387. There's still some work to be done if the user writes that sequence directly on a short or char where type promotion rules can prevent it from being recognized. The builtin is emitting direct IR with unpromoted types so that isn't a problem for it.

Differential Revision: https://reviews.llvm.org/D46656

llvm-svn: 331943
2018-05-10 00:05:13 +00:00
Manoj Gupta 4fbf84c173 [Clang] Implement function attribute no_stack_protector.
Summary:
This attribute tells clang to skip this function from stack protector
when -stack-protector option is passed.
GCC option for this is:
__attribute__((__optimize__("no-stack-protector"))) and the
equivalent clang syntax would be: __attribute__((no_stack_protector))

This is used in Linux kernel to selectively disable stack protector
in certain functions.

Reviewers: aaron.ballman, rsmith, rnk, probinson

Reviewed By: aaron.ballman

Subscribers: probinson, srhines, cfe-commits

Differential Revision: https://reviews.llvm.org/D46300

llvm-svn: 331925
2018-05-09 21:41:18 +00:00
Hans Wennborg ef2f6948be Revert r331843 "[DebugInfo] Generate debug information for labels."
It broke the Chromium build (see reply on the review).

> Generate DILabel metadata and call llvm.dbg.label after label
> statement to associate the metadata with the label.
>
> Differential Revision: https://reviews.llvm.org/D45045
>
> Patch by Hsiangkai Wang.

This doesn't revert the change to backend-unsupported-error.ll
that seems to correspond to an llvm-side change.

llvm-svn: 331861
2018-05-09 09:29:58 +00:00
JF Bastien 801fca259e _Atomic of empty struct shouldn't assert
Summary:

An _Atomic of an empty struct is pretty silly. In general we just widen empty
structs to hold a byte's worth of storage, and we represent size and alignment
as 0 internally and let LLVM figure out what to do. For _Atomic it's a bit
different: the memory model mandates concrete effects occur when atomic
operations occur, so in most cases actual instructions need to get emitted. It's
really not worth trying to optimize empty struct atomics by figuring out e.g.
that a fence would do, even though sane compilers should do optimize atomics.
Further, wg21.link/p0528 will fix C++20 atomics with padding bits so that
cmpxchg on them works, which means that we'll likely need to do the zero-init
song and dance for empty atomic structs anyways (and I think we shouldn't
special-case this behavior to C++20 because prior standards are just broken).

This patch therefore makes a minor change to r176658 "Promote atomic type sizes
up to a power of two": if the width of the atomic's value type is 0, just use 1
byte for width and leave alignment as-is (since it should never be zero, and
over-aligned zero-width structs are weird but fine).

This fixes an assertion:
   (NumBits >= MIN_INT_BITS && "bitwidth too small"), function get, file ../lib/IR/Type.cpp, line 241.

It seems like this has run into other assertions before (namely the unreachable
Kind check in ImpCastExprToType), but I haven't reproduced that issue with
tip-of-tree.

<rdar://problem/39678063>

Reviewers: arphaman, rjmccall

Subscribers: aheejin, cfe-commits

Differential Revision: https://reviews.llvm.org/D46613

llvm-svn: 331845
2018-05-09 03:51:12 +00:00
Shiva Chen 667fbe2cb0 [DebugInfo] Generate debug information for labels.
Generate DILabel metadata and call llvm.dbg.label after label
statement to associate the metadata with the label.

Differential Revision: https://reviews.llvm.org/D45045

Patch by Hsiangkai Wang.

llvm-svn: 331843
2018-05-09 02:41:56 +00:00
Craig Topper 45fc2c83e6 [X86] Use target feature defines in tests instead of defining our own flag on the command line. NFCI
llvm-svn: 331683
2018-05-07 21:47:13 +00:00
Teresa Johnson fedd39045f Add -target to address errors in test from r331592
The error turns out to be:
Assertion failed: (Target.isCompatibleDataLayout(getDataLayout()) && "Can't create a MachineFunction using a Module with a " "Target-incompatible DataLayout attached\n"), function init, file /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/lib/CodeGen/MachineFunction.cpp, line 180.

Add -target to address this. Also re-enable the test I had temporarily
commented, and move it further down in case there is still a failure
(since it pipes stderr to FileCheck).

llvm-svn: 331597
2018-05-05 16:37:31 +00:00
Teresa Johnson 1237b3acc9 Skip part of test added in r331592 to help debug bot failures
Trying to debug why/where a few bots getting exit code 256 e.g.
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/48471/testReport/Clang/CodeGen/thinlto_diagnostic_handler_remarks_with_hotness_ll/

and a few windows bots getting no output from that RUN line e.g.
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/11865/steps/ninja%20check%201/logs/FAIL%3A%20Clang%3A%3Athinlto-diagnostic-handler-remarks-with-hotness.ll

llvm-svn: 331596
2018-05-05 15:54:57 +00:00
Teresa Johnson 259f8ddff5 Add required target to address bot failures from r331592
Failing on non-x86 bots, needs x86 target for code gen.

llvm-svn: 331593
2018-05-05 15:15:04 +00:00
Teresa Johnson 66744f8137 [ThinLTO] Support opt remarks options with distributed ThinLTO backends
Summary:
Passes down the necessary code ge options to the LTO Config to enable
-fdiagnostics-show-hotness and -fsave-optimization-record in the ThinLTO
backend for a distributed build.

Also, remove warning about not having PGO when the input is IR.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, cfe-commits

Differential Revision: https://reviews.llvm.org/D46464

llvm-svn: 331592
2018-05-05 14:37:29 +00:00
Chandler Carruth 9325c38fdb [gcov] Make the CLang side coverage test work with the new
instrumentation codegeneration strategy of using a data structure and
a loop. Required some finesse to get the critical things being tested to
surface in a nice way for FileCheck but I think this preserves the
original intent of the test.

llvm-svn: 331411
2018-05-02 22:57:20 +00:00
Shoaib Meenai c4cf3daad8 [ARM] Remove redundant #if in test. NFC
Both sides of this #if #include the same file. Drop the #if, leaving only the #include.

Patch by Matt Glazar.

Differential Revision: https://reviews.llvm.org/D45779

llvm-svn: 331305
2018-05-01 20:38:05 +00:00
Danil Malyshev cd3fd82da3 Update existed CodeGen TBAA tests
Reviewers: hfinkel, kosarev, rjmccall

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D44616

llvm-svn: 331292
2018-05-01 18:14:36 +00:00
Gabor Buella a51e0c2243 [X86] directstore and movdir64b intrinsics
Reviewers: spatel, craig.topper, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45984

llvm-svn: 331249
2018-05-01 10:05:42 +00:00
Nirav Dave 6c0665e221 [MC] Change AsmParser to leverage Assembler during evaluation
Teach AsmParser to check with Assembler for when evaluating constant
expressions.  This improves the handing of preprocessor expressions
that must be resolved at parse time. This idiom can be found as
assembling-time assertion checks in source-level assemblers. Note that
this relies on the MCStreamer to keep sufficient tabs on Section /
Fragment information which the MCAsmStreamer does not. As a result the
textual output may fail where the equivalent object generation would
pass. This can most easily be resolved by folding the MCAsmStreamer
and MCObjectStreamer together which is planned for in a separate
patch.

Currently, this feature is only enabled for assembly input, keeping IR
compilation consistent between assembly and object generation.

Reviewers: echristo, rnk, probinson, espindola, peter.smith

Reviewed By: peter.smith

Subscribers: eraman, peter.smith, arichardson, jyknight, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45164

llvm-svn: 331218
2018-04-30 19:22:40 +00:00
Sanjay Patel c81450e29b [Driver, CodeGen] rename options to disable an FP cast optimization
As suggested in the post-commit thread for rL331056, we should match these 
clang options with the established vocabulary of the corresponding sanitizer
option. Also, the use of 'strict' is well-known for these kinds of knobs, 
and we can improve the descriptive text in the docs.

So this intends to match the logic of D46135 but only change the words.
Matching LLVM commit to match this spelling of the attribute to follow shortly.

Differential Revision: https://reviews.llvm.org/D46236

llvm-svn: 331209
2018-04-30 18:19:03 +00:00
Sanjay Patel d175476566 [Driver, CodeGen] add options to enable/disable an FP cast optimization
As discussed in the post-commit thread for:
rL330437 ( http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180423/545906.html )

We need a way to opt-out of a float-to-int-to-float cast optimization because too much 
existing code relies on the platform-specific undefined result of those casts when the 
float-to-int overflows.

The LLVM changes associated with adding this function attribute are here:
rL330947
rL330950
rL330951

Also as suggested, I changed the LLVM doc to mention the specific sanitizer flag that 
catches this problem:
rL330958

Differential Revision: https://reviews.llvm.org/D46135

llvm-svn: 331041
2018-04-27 14:22:48 +00:00
Oliver Stannard 2fcee8bd52 [ARM,AArch64] Add intrinsics for dot product instructions
The ACLE spec which describes these intrinsics hasn't been published yet, but
this is based on the final draft which will be published soon, and these have
already been implemented by GCC.

Differential revision: https://reviews.llvm.org/D46109

llvm-svn: 331039
2018-04-27 14:03:32 +00:00
Chandler Carruth 16429acacb [x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics
The LLVM commit introduces a crash in LLVM's instruction selection.

I filed http://llvm.org/PR37260 with the test case.

llvm-svn: 330997
2018-04-26 21:46:01 +00:00
Craig Topper e95bde33df [X86] Add support for _mm512_mullox_epi64 and _mm512_mask_mullox_epi64 intrinsics to match icc.
On AVX512F targets we'll produce an emulated sequence using 3 pmuludqs with shifts and adds. On AVX512DQ we'll use vpmulld.

Fixes PR37140.

llvm-svn: 330923
2018-04-26 05:38:39 +00:00
Eli Friedman e54d0ff400 [TargetInfo] Sort target features before passing them to the backend
Passing the features in random order will lead to unpredictable results
when some of the features are related (like the architecture-version
features on ARM).

It might be possible to fix this particular case in the ARM target code,
to avoid adding overlapping target features. But we should probably be
sorting in any case: the behavior shouldn't depend on StringMap's
hashing algorithm.

Differential Revision: https://reviews.llvm.org/D46030

llvm-svn: 330861
2018-04-25 19:14:05 +00:00
Paul Semel 80daae2736 add check for long double for __builtin_dump_struct
llvm-svn: 330808
2018-04-25 10:09:20 +00:00
Craig Topper ce281a41b5 [X86] Remove '#ifdef __x86_64__' around mask_set1_epi64 intrinsics.
The unmasked versions already didn't have this restrction. I don't think gcc or icc limit these to 64-bit mode so we shouldn't either.

llvm-svn: 330681
2018-04-24 03:36:08 +00:00
Mikhail Maltsev 4a4e7a31ad [CodeGen] Reland r330442: Add an option to suppress output of llvm.ident
The test case in the original patch was overly contrained and
failed on PPC targets.

llvm-svn: 330575
2018-04-23 10:08:46 +00:00
Tim Northover 9dc1d0c74e [Atomics] warn about atomic accesses using libcalls
If an atomic variable is misaligned (and that suspicion is why Clang emits
libcalls at all) the runtime support library will have to use a lock to safely
access it, with potentially very bad performance consequences. There's a very
good chance this is unintentional so it makes sense to issue a warning.

Also give it a named group so people can promote it to an error, or disable it
if they really don't care.

llvm-svn: 330566
2018-04-23 08:16:24 +00:00
Gabor Buella eba6c42e66 [X86] WaitPKG intrinsics
Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45254

llvm-svn: 330463
2018-04-20 18:44:33 +00:00