Commit Graph

4728 Commits

Author SHA1 Message Date
Luke Geeson da2b2e8c26 [AArch64] Reverted rC334696 with Clang VCVTA test fix
llvm-svn: 334820
2018-06-15 10:10:45 +00:00
Craig Topper b521dc3acf [X86] Add inline assembly versions of _InterlockedExchange_HLEAcquire/Release and _InterlockedCompareExchange_HLEAcquire/Release for MSVC compatibility.
Clang/LLVM doesn't have a way to pass an HLE hint through to the X86 backend to emit HLE prefixed instructions. So this is a good short term fix.

Differential Revision: https://reviews.llvm.org/D47672

llvm-svn: 334751
2018-06-14 18:43:52 +00:00
Tomasz Krupa 82aa42af49 [X86] Lowering Mask Scalar intrinsics to native IR (Clang part)
Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls
to native IR.

Reviewers: craig.topper, RKSimon, spatel, sroland

Reviewed By: craig.topper

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47979

llvm-svn: 334741
2018-06-14 17:36:23 +00:00
Luke Geeson bb399f8013 [AArch64] reverting rC334693 due to build failures
llvm-svn: 334696
2018-06-14 08:59:33 +00:00
Luke Geeson 010bbbf390 [AArch64] Added support for the vcvta_u16_f16 instrinsic for FP16 Armv8.2-A
llvm-svn: 334693
2018-06-14 08:28:56 +00:00
Mandeep Singh Grang 2d28383097 [COFF] Add ARM64 intrinsics: __yield, __wfe, __wfi, __sev, __sevl
Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.

Reviewers: mstorsjo, compnerd, javed.absar

Reviewed By: mstorsjo

Subscribers: kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D48132

llvm-svn: 334639
2018-06-13 18:49:35 +00:00
Sanjay Patel 1d7ed94439 [CodeGen] make nan builtins pure rather than const (PR37778)
https://bugs.llvm.org/show_bug.cgi?id=37778
...shows a miscompile resulting from marking nan builtins as 'const'.

The nan libcalls/builtins take a pointer argument:
http://www.cplusplus.com/reference/cmath/nan-function/
...and the chars dereferenced by that arg are used to fill in the NaN constant payload bits.

"const" means that the pointer argument isn't dereferenced. That's translated to "readnone" in LLVM.
"pure" means that the pointer argument may be dereferenced. That's translated to "readonly" in LLVM.

This change prevents the IR optimizer from killing the lead-up to the nan call here:

double a() {
  char buf[4];
  buf[0] = buf[1] = buf[2] = '9';
  buf[3] = '\0';
  return __builtin_nan(buf);
}

...the optimizer isn't currently able to simplify this to a constant as we might hope, 
but this patch should solve the miscompile.

Differential Revision: https://reviews.llvm.org/D48134

llvm-svn: 334628
2018-06-13 17:54:52 +00:00
Craig Topper 2527c378c6 [X86] Remove masking from avx512vbmi2 concat and shift by immediate builtins. Use select builtins instead.
llvm-svn: 334577
2018-06-13 07:19:28 +00:00
Luke Geeson dc54b37414 [AArch64] Corrected FP16 Intrinsic range checks in Clang + added Sema tests
Summary:
This fixes the ranges for the vcvth family of FP16 intrinsics in the clang front end. Previously it was accepting incorrect ranges
-Changed builtin range checking in SemaChecking
-added tests SemaCheck changes - included in  their own file since no similar one exists
-modified existing tests to reflect new ranges

Reviewers: SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, cfe-commits

Differential Revision: https://reviews.llvm.org/D47592

llvm-svn: 334489
2018-06-12 09:54:27 +00:00
Mikhail Maltsev 5434047862 [Driver] Add aliases for -Qn/-Qy
This patch adds aliases for -Qn (-fno-ident) and -Qy (-fident) which
look less cryptic than -Qn/-Qy. The aliases are compatible with GCC.

Differential Revision: https://reviews.llvm.org/D48021

llvm-svn: 334414
2018-06-11 16:10:06 +00:00
Craig Topper 91bbe98757 [X86] Remove masking from dbpsadbw builtins, use select builtin instead.
llvm-svn: 334385
2018-06-11 06:18:29 +00:00
Craig Topper 3cce6a7ed9 [X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins.
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.

Reviewers: RKSimon, delena, spatel, GBuella

Reviewed By: RKSimon

Subscribers: cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D47693

llvm-svn: 334366
2018-06-10 17:27:05 +00:00
Ivan A. Kosarev 73c76c35a5 [NEON] Support VST1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47446

llvm-svn: 334362
2018-06-10 09:28:10 +00:00
Craig Topper 3614b41a8e [X86] Remove masking from the 512-bit packed floating point add/sub/mul/div builtins. Use select in IR instead.
llvm-svn: 334359
2018-06-10 06:01:42 +00:00
Craig Topper 03f4f04b91 [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking.
llvm-svn: 334311
2018-06-08 18:00:25 +00:00
Craig Topper 03de166ccd [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
llvm-svn: 334265
2018-06-08 06:13:16 +00:00
Craig Topper 3428beeb2f [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.

llvm-svn: 334261
2018-06-08 03:24:47 +00:00
Craig Topper acf5601961 [X86] Add builtins for vpermilps/pd instructions to enable target feature checking.
llvm-svn: 334256
2018-06-08 00:59:27 +00:00
Craig Topper 7d17d7278b [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range.
llvm-svn: 334249
2018-06-08 00:00:21 +00:00
Craig Topper 9392136414 [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking.
llvm-svn: 334244
2018-06-07 23:03:08 +00:00
Shoaib Meenai d8d1547387 [Frontend] Disallow non-MSVC exception models for windows-msvc targets
The windows-msvc target is used for MSVC ABI compatibility, including
the exceptions model. It doesn't make sense to pair a windows-msvc
target with a non-MSVC exception model. This would previously cause an
assertion failure; explicitly error out for it in the frontend instead.
This also allows us to reduce the matrix of target/exception models a
bit (see the modified tests), and we can possibly simplify some of the
personality code in a follow-up.

Differential Revision: https://reviews.llvm.org/D47853

llvm-svn: 334243
2018-06-07 22:54:54 +00:00
Reid Kleckner aa46ed9278 [MS] Re-add support for the ARM interlocked bittest intrinscs
Adds support for these intrinsics, which are ARM and ARM64 only:
  _interlockedbittestandreset_acq
  _interlockedbittestandreset_rel
  _interlockedbittestandreset_nf
  _interlockedbittestandset_acq
  _interlockedbittestandset_rel
  _interlockedbittestandset_nf

Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.

llvm-svn: 334239
2018-06-07 21:39:04 +00:00
Craig Topper d3623155a2 [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.

This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.

llvm-svn: 334208
2018-06-07 17:28:03 +00:00
Gabor Buella 1a83d06768 [CodeGen] Improve diagnostics related to target attributes
Summary:
When requirement imposed by __target__ attributes on functions
are not satisfied, prefer printing those requirements, which
are explicitly mentioned in the attributes.

This makes such messages more useful, e.g. printing avx512f instead of avx2
in the following scenario:

```
$ cat foo.c
static inline void __attribute__((__always_inline__, __target__("avx512f")))
x(void)
{
}

int main(void)
{
            x();
}
$ clang foo.c
foo.c:7:2: error: always_inline function 'x' requires target feature 'avx2', but would be inlined into function 'main' that is compiled without support for 'avx2'
        x();
    ^
1 error generated.
```

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37338

Reviewers: craig.topper, echristo, dblaikie

Reviewed By: craig.topper, echristo

Differential Revision: https://reviews.llvm.org/D46541

llvm-svn: 334174
2018-06-07 08:48:36 +00:00
Craig Topper b92c77d176 [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins.
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.

I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.

The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.

I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.

Reviewers: tkrupa, RKSimon, spatel, GBuella

Reviewed By: tkrupa

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47724

llvm-svn: 334159
2018-06-07 02:46:02 +00:00
Evandro Menezes 0804523bd5 [PATCH 2/2] [test] Add support for Samsung Exynos M4 (NFC)
Add test cases for Exynos M4.

llvm-svn: 334116
2018-06-06 18:58:01 +00:00
Reid Kleckner 11c99ed05f [MS][ARM64]: Promote _setjmp to_setjmpex as there is no _setjmp in the ARM64 libvcruntime.lib
Factor out the common setjmp call emission code.

Based on a patch by Chris January

Differential Revision: https://reviews.llvm.org/D47784

llvm-svn: 334112
2018-06-06 18:39:47 +00:00
Reid Kleckner 368d52b7e0 Implement bittest intrinsics generically for non-x86 platforms
I tested these locally on an x86 machine by disabling the inline asm
codepath and confirming that it does the same bitflips as we do with the
inline asm.

Addresses code review feedback.

llvm-svn: 334059
2018-06-06 01:35:08 +00:00
Craig Topper f3914b74c1 [X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics.
Previously we were just using extended vector operations in the header file.

This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.

By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count.

User code still has the option to use extended vector operations themselves if they need non-constant indexing.

llvm-svn: 334057
2018-06-06 00:24:55 +00:00
Reid Kleckner 1d9c249db5 Reimplement the bittest intrinsic family as builtins with inline asm
We need to implement _interlockedbittestandset as a builtin for
windows.h, so we might as well do the whole family. It reduces code
duplication anyway.

Fixes PR33188, a long standing bug in our bittest implementation
encountered by Chakra.

llvm-svn: 333978
2018-06-05 01:33:40 +00:00
Teresa Johnson e2e630b01b [ThinLTO] Add testing of new summary index format to a couple CFI tests
Summary:
Adds testing of combined index summary entries in disassembly format
to CFI tests that were already testing the bitcode format.

Depends on D46699.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, cfe-commits

Differential Revision: https://reviews.llvm.org/D46700

llvm-svn: 333966
2018-06-04 23:05:24 +00:00
Reid Kleckner 89fbd55145 Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms."
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.

We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.

llvm-svn: 333958
2018-06-04 21:39:20 +00:00
Craig Topper 95ed88a1a9 [X86] Avoid passing _mm_undefined* to builtin_shufflevector if we are able to pass the first input a second time.
This is more consistent with other usages of builtin_shufflevector. Later optimization passes or codegen will detect the duplicate vector and replace it with undef. Using _mm_undefined just puts a zeroinitializer that still needs to be optimized out later.

llvm-svn: 333944
2018-06-04 19:28:09 +00:00
Craig Topper 6fb26f93ef [X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsics and a __builtin_shufflevector call.
llvm-svn: 333853
2018-06-03 19:42:59 +00:00
Ivan A. Kosarev 9c40c0ad0c [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333829
2018-06-02 17:42:59 +00:00
John McCall 280c656031 Cap "voluntary" vector alignment at 16 for all Darwin platforms.
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
  Intel, so we did not have a consistent ABI.

This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).

Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects.  The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.

llvm-svn: 333791
2018-06-01 21:34:26 +00:00
Dan Gohman 9f8ee03772 [WebAssembly] Update to the new names for the memory builtin functions.
The WebAssembly committee has decided on the names `memory.size` and
`memory.grow` for the memory intrinsics, so update the clang builtin
functions to follow those names, keeping both sets of old names in place
for compatibility.

llvm-svn: 333712
2018-06-01 00:05:51 +00:00
Peter Collingbourne 3aa30e8062 IRGen: Write .dwo files when -split-dwarf-file is used together with -fthinlto-index.
Differential Revision: https://reviews.llvm.org/D47597

llvm-svn: 333677
2018-05-31 18:25:59 +00:00
Craig Topper a6dd2faaea [X86] Make 512-bit unmasked load/store builtins more like their 128/256-bit equivalents.
Previously we were just passing -1 mask to the masked builtin. This changes it to the more generic way that the 128/256 bit use.

llvm-svn: 333626
2018-05-31 05:02:08 +00:00
Craig Topper c633867944 [X86] Remove __extension__ from macro intrinsics when its not needed.
I think this is a holdover from when we used to declare variables inside the macros. And then its been copy and pasted forward for years every time a new macro intrinsic gets added.

Interestingly this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.

It also removed a bogus error message on another test.

llvm-svn: 333613
2018-05-31 00:51:20 +00:00
Eric Christopher 5b91350b4a Add fopen to the list of builtins that we check and whitelist.
llvm-svn: 333594
2018-05-30 21:11:45 +00:00
Craig Topper c5ec55e921 [X86] Simplify the implementation of _mm_sqrt_ss, _mm_rcp_ss, and _mm_rsqrt_ss.
We don't need the insertion back into the original vector at the end. The builtin already understands that.

This is different than _mm_sqrt_sd which takes two arguments and we do need to insert.

llvm-svn: 333572
2018-05-30 18:27:07 +00:00
Craig Topper dff5b311af [X86] Reduce the number of setzero intrinsics to just the set defined by the Intel Intrinsics Guide.
We had quite a few for different element sizes of integers sometimes with strange target features attached to them.

We only need a single version for each of _m128i, _m256i, and _m512i with the target feature that first introduced those types.

llvm-svn: 333568
2018-05-30 18:02:11 +00:00
Gabor Buella 70d8d51073 [X86] Lowering FMA intrinsics to native IR (Clang part)
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.

Patch by tkrupa

Reviewers: craig.topper, sroland, spatel, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47444

llvm-svn: 333555
2018-05-30 15:27:49 +00:00
Simon Tatham 89e31fa7fc Support __iso_volatile_load8 etc on aarch64-win32.
These intrinsics are used by MSVC's header files on AArch64 Windows as
well as AArch32, so we should support them for both targets. I've
factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate
functions that EmitAArch64BuiltinExpr can call as well.

Reviewers: javed.absar, mstorsjo

Reviewed By: mstorsjo

Subscribers: kristof.beyls, cfe-commits

Differential Revision: https://reviews.llvm.org/D47476

llvm-svn: 333513
2018-05-30 07:54:05 +00:00
Daniel Cederman 8cc53aecad [Sparc] Add floating-point register names
Reviewers: jyknight

Reviewed By: jyknight

Subscribers: eraman, fedor.sergeev, jrtc27, cfe-commits

Differential Revision: https://reviews.llvm.org/D47137

llvm-svn: 333510
2018-05-30 06:02:18 +00:00
Craig Topper f6e79c6d3f [X86] Remove masking from the AVX512VNNI builtins. Use a select in IR instead.
llvm-svn: 333509
2018-05-30 05:26:04 +00:00
Craig Topper 3d9305f28b [X86] Fix the names of a bunch of icelake intrinsics.
Mostly this fixes the names of all the 128-bit intrinsics to start with _mm_ instead of _mm128_ as is the convention and what the Intel docs say.

This also fixes the name of the bitshuffle intrinsics to say epi64 for 128 and 256 bit versions.

llvm-svn: 333497
2018-05-30 03:38:15 +00:00
Craig Topper 68a272d501 [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead.
llvm-svn: 333387
2018-05-29 03:26:38 +00:00
Craig Topper f99532faee Revert r333347 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible."
This wasn't supposed to be commited yet.

llvm-svn: 333349
2018-05-26 18:57:41 +00:00