This brings the compile time of Function.cpp from ~40s down to ~4s for
me locally. It also shaves off about 400KB of object file size in a
release+asserts build.
I also realized that the AMDGPU backend does not have any GCC builtin
names to match, so the extra lookup was a no-op. I removed it to silence
a zero-length string table array warning. There should be no functional
change here.
This change really ends the story of PR11951.
llvm-svn: 258897
The AMDGPU backend was the last user of the old StringMatcher
recognition code. Move it over to the new lookupLLVMIntrinsicName
function, which is now improved to handle all of the interesting edge
cases exposed by AMDGPU intrinsic names.
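A rough sketch of the shape of such a lookup (illustrative only, not the in-tree implementation): binary-search a sorted name table, then fall back to the longest entry that is a prefix of the query, which is what overloaded intrinsics with mangled suffixes need.
#include <algorithm>
#include <string>
#include <vector>
// Returns the index of Name in Sorted, or of its longest '.'-separated
// prefix, or -1. Sorted must be lexicographically sorted.
int lookupIntrinsicName(const std::vector<std::string> &Sorted,
                        const std::string &Name) {
  auto It = std::lower_bound(Sorted.begin(), Sorted.end(), Name);
  if (It != Sorted.end() && *It == Name)
    return int(It - Sorted.begin());   // exact match
  while (It != Sorted.begin()) {       // prefixes of Name sort before it
    --It;
    if (Name.compare(0, It->size(), *It) == 0 && Name[It->size()] == '.')
      return int(It - Sorted.begin()); // longest prefix wins
  }
  return -1;
}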
llvm-svn: 258875
There is no need to generate a separate table for intrinsics' mod/ref behaviour.
It can now be determined purely from function attributes.
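A minimal sketch of the idea (not the exact in-tree code): the mod/ref behaviour of an intrinsic can be read straight off its function attributes rather than out of a generated side table.
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/IR/Function.h"
using namespace llvm;
FunctionModRefBehavior getIntrinsicModRef(const Function *F) {
  if (F->doesNotAccessMemory())
    return FMRB_DoesNotAccessMemory;          // readnone
  if (F->onlyReadsMemory())
    return FMRB_OnlyReadsMemory;              // readonly
  if (F->onlyAccessesArgMemory())
    return FMRB_OnlyAccessesArgumentPointees; // argmemonly
  return FMRB_UnknownModRefBehavior;
}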
Differential Revision: http://reviews.llvm.org/D13917
llvm-svn: 251040
Summary:
Add the necessary plumbing so that llvm_token_ty can be used as an
argument/return type in intrinsic definitions and correspondingly require
TokenTy in function types. TokenTy is an opaque type that has no target
lowering, but can be used in machine-independent intrinsics. It is
required for the upcoming llvm.eh.padparam intrinsic.
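As a hedged illustration (the intrinsic named in the comment is hypothetical), the net effect is that a signature such as token @llvm.some.intrinsic(i32) becomes expressible:
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;
// Token is opaque and has no target lowering, so it may only appear in
// the signatures of machine-independent intrinsics.
FunctionType *makeTokenReturningTy(LLVMContext &Ctx) {
  // e.g. declare token @llvm.some.intrinsic(i32)
  return FunctionType::get(Type::getTokenTy(Ctx),
                           {Type::getInt32Ty(Ctx)}, /*isVarArg=*/false);
}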
Reviewers: majnemer, rnk
Subscribers: stoklund, llvm-commits
Differential Revision: http://reviews.llvm.org/D12532
llvm-svn: 246651
Extract the ModRef enums from the AliasAnalysis class in
preparation for de-coupling the AA implementations.
In order to do this, they had to become fake-scoped using the
traditional LLVM pattern of a leading initialism. These can't be actual
scoped enumerations because they're used as bitfields and are thus
inherently treated as integers.
I've also renamed the behavior enums that are specific to reasoning
about the mod/ref behavior of functions when called. This makes it more
clear that they have a very narrow domain of applicability.
I think there is a significantly cleaner API for all of this, but
I don't want to attempt really substantive changes for now; I just
want to refactor these away from analysis groups, so I'm preserving
the exact original design and just cleaning up the names, style, and
lifting out of the class.
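The pattern in miniature:
// A plain (unscoped) enum keeps the implicit conversion to integer that
// the bitfield arithmetic relies on; the leading initialism fakes a scope.
enum ModRefInfo {
  MRI_NoModRef = 0,
  MRI_Ref = 1,
  MRI_Mod = 2,
  MRI_ModRef = MRI_Ref | MRI_Mod
};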
Differential Revision: http://reviews.llvm.org/D10564
llvm-svn: 242963
If the type isn't trivially moveable, emplace can skip a potentially
expensive move. It also saves a couple of characters.
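For instance (an illustrative call site, not one from the tree):
#include <string>
#include <utility>
#include <vector>
void example(std::vector<std::pair<int, std::string>> &V) {
  V.push_back(std::make_pair(1, std::string("x"))); // builds a temporary, then moves it
  V.emplace_back(1, "x");                           // constructs the element in place
}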
Call sites were found with the ASTMatcher + some semi-automated cleanup.
memberCallExpr(
    argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
    on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
    hasArgument(0, bindTemporaryExpr(
                       hasType(recordDecl(hasNonTrivialDestructor())),
                       has(constructExpr()))),
    unless(isInTemplateInstantiation()))
No functional change intended.
llvm-svn: 238602
Add support for the following vector quadword add/sub instructions introduced
in POWER8:
vadduqm
vaddeuqm
vaddcuq
vaddecuq
vsubuqm
vsubeuqm
vsubcuq
vsubecuq
In addition to adding the instructions themselves, it also adds support for the
v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and
IntrinsicEmitter.cpp).
http://reviews.llvm.org/D9081
llvm-svn: 238144
Gather and Scatter are newly introduced intrinsics, coming after the recently implemented masked load and store.
This is the first patch for the Gather and Scatter intrinsics; it includes only the syntax, parsing, and verification.
Gather and Scatter intrinsics allow performing multiple memory accesses (reads/writes) in one vector instruction.
The intrinsics are not target specific and will have the following syntax:
Gather:
declare <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> <vector of ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x i32> <passthru>)
declare <8 x float> @llvm.masked.gather.v8f32(<8 x float*> <vector of ptrs>, i32 <alignment>, <8 x i1> <mask>, <8 x float> <passthru>)
Scatter:
declare void @llvm.masked.scatter.v8i32(<8 x i32> <vector value to be stored>, <8 x i32*> <vector of ptrs>, i32 <alignment>, <8 x i1> <mask>)
declare void @llvm.masked.scatter.v16i32(<16 x i32> <vector value to be stored>, <16 x i32*> <vector of ptrs>, i32 <alignment>, <16 x i1> <mask>)
Vector of ptrs - the set of source/destination addresses to load from or store to.
Mask - switches vector lanes on/off, preventing memory accesses on switched-off lanes.
The vector of ptrs, the value, and the mask must all have the same vector width.
These are code examples where gather/scatter should be used and will allow vectorization:
;void foo1(int * restrict A, int * restrict B, int * restrict C) {
; for (int i=0; i<SIZE; i++) {
; A[i] = B[C[i]];
; }
;}
;void foo3(int * restrict A, int * restrict B) {
; for (int i=0; i<SIZE; i++) {
; A[B[i]] = i+5;
; }
;}
Tests will come in the following patches, with CodeGen and Vectorizer.
http://reviews.llvm.org/D7433
llvm-svn: 228521
Specifically, gc.result benefits from this greatly. Instead of:
gc.result.int.*
gc.result.float.*
gc.result.ptr.*
...
We now have a gc.result.* that can specialize to literally any type.
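A sketch of what the overload buys (getGCResultFor is a hypothetical helper name): a single intrinsic ID, specialized to the result type needed at each call site, with the mangled suffix (.i32, .f32, ...) derived from the type.
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"
using namespace llvm;
Function *getGCResultFor(Module &M, Type *RetTy) {
  // The overload type supplies the suffix of the specialized name.
  return Intrinsic::getDeclaration(&M, Intrinsic::experimental_gc_result,
                                   {RetTy});
}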
Differential Revision: http://reviews.llvm.org/D7020
llvm-svn: 226857
I'm recommitting the codegen part of the patch.
The vectorizer part will be sent for review again.
Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for targets with AVX2 and AVX-512. The vectorizer asks the target about the availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
declare <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
A scalarizer for other targets (without AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 223348
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
llvm-svn: 222936
Introduced new target-independent intrinsics to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for targets with AVX2 and AVX-512. The vectorizer asks the target about the availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
declare <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
A scalarizer for other targets (without AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 222632
Add MSBuiltin, which is similar in spirit to GCCBuiltin. This allows adding
intrinsics for Microsoft compatibility to individual instructions. This is
needed to permit the creation of ARM-specific MSVC extensions.
This is not currently in use, and requires an associated change in clang to
enable use of the intrinsics defined by this new class. This merely sets the
LLVM portion of the infrastructure in place to permit the use of this
functionality. A separate set of changes will enable the new intrinsics.
llvm-svn: 212350
It's not entirely clear whether this should be valid with modules enabled, but the fixed
code is cleaner regardless.
Also fix a TU-local type that accidentally had external linkage.
llvm-svn: 206714
This is like LLVMMatchType, except that the verifier checks that the
second argument is a vector with the same base type and half the
number of elements.
This will be used by the ARM64 backend.
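The constraint the verifier enforces, in miniature (a sketch, not the in-tree check):
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;
// True if Half is a vector with the same element type and half as many
// elements as Full.
bool isHalfElementsVectorOf(VectorType *Half, VectorType *Full) {
  return Half->getElementType() == Full->getElementType() &&
         Half->getNumElements() * 2 == Full->getNumElements();
}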
llvm-svn: 205079
These are used in the ARM backends to aid type-checking on patterns
involving intrinsics, by making sure one argument is an extended/truncated
version of another.
However, there's no reason to limit them to just vector types. For example,
AArch64 has the instruction "uqshrn sD, dN, #imm" which would naturally use an
intrinsic taking an i64 and returning an i32.
llvm-svn: 205003
The "noduplicate" function attribute exists to prevent certain optimizations
from duplicating calls to the function. This is important on platforms where
certain function call duplications are unsafe (for example execution barriers
for CUDA and OpenCL).
This patch makes it possible to specify intrinsics as "noduplicate" and
translates that to the appropriate function attribute.
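How the property surfaces to passes, sketched (assumes an existing call instruction CI):
#include "llvm/IR/Instructions.h"
using namespace llvm;
bool mayDuplicate(const CallInst *CI) {
  return !CI->cannotDuplicate(); // true unless the call is marked "noduplicate"
}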
llvm-svn: 204200
The stackmap intrinsics should not be marked nounwind.
Marking them nounwind caused crashes in the WebKit FTL JIT, because if we enable
sufficient optimizations, LLVM starts eliding compact_unwind sections (or any unwind
data for that matter), making deoptimization via stackmaps impossible.
This changes the stackmap intrinsic to be may-throw, adds a test for exactly the
symptom that WebKit saw, and fixes TableGen to handle unattributed intrinsics.
Thanks to atrick and philipreames for reviewing this.
llvm-svn: 201826
Patch by Ana Pazos.
1. Added support for v1ix and v1fx types.
2. Added Scalar Pairwise Reduce instructions.
3. Added an initial implementation of Scalar Arithmetic instructions.
llvm-svn: 191263
For two intrinsics 'llvm.nvvm.texsurf.handle' and 'llvm.nvvm.texsurf.handle.internal',
TableGen was emitting matching code like:
if (Name.startswith("llvm.nvvm.texsurf.handle")) ...
if (Name.startswith("llvm.nvvm.texsurf.handle.internal")) ...
We can never match "llvm.nvvm.texsurf.handle.internal" here because it will
always be erroneously matched by the first condition.
The fix is to sort the intrinsic names and emit them in reverse order.
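A sketch of the fix (emitMatch stands in for the emitter's real output routine):
#include <algorithm>
#include <string>
#include <vector>
void emitMatch(const std::string &Name); // hypothetical: emits one startswith check
void emitMatchers(std::vector<std::string> Names) {
  std::sort(Names.begin(), Names.end());
  // Reverse order emits "llvm.nvvm.texsurf.handle.internal" before its
  // prefix "llvm.nvvm.texsurf.handle".
  for (auto I = Names.rbegin(), E = Names.rend(); I != E; ++I)
    emitMatch(*I);
}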
llvm-svn: 187119
In the future, AttributeWithIndex won't be used anymore. Besides, it exposes the
internals of the AttributeSet to outside users, which isn't good.
llvm-svn: 173606
When code deletes the context, the AttributeImpls that the AttrListPtr points to
become invalid. Therefore, instead of keeping a separate, reference-counted
managed static for the AttrListPtrs, move it into the LLVMContext and delete
it when deleting the AttributeImpls.
llvm-svn: 168354
Most places can use PrintFatalError, as the unwinding mechanism was not
used for anything other than printing the error. The single exception
was CodeGenDAGPatterns.cpp, where intermediate errors during type
resolution were ignored to simplify incremental platform development.
This use is replaced by an error flag in TreePattern, with earlier bailouts
in various places when it is set.
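The replacement pattern in most callers, sketched:
#include "llvm/TableGen/Error.h"
using namespace llvm;
void checkHasName(bool HasName) {
  if (!HasName)
    PrintFatalError("record is missing a name"); // reports and exits; no unwinding
}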
llvm-svn: 166712
Convert the internal representation of the Attributes class into a pointer to an
opaque object that's uniqued by and stored in the LLVMContext object. The
Attributes class then becomes a thin wrapper around this opaque
object. Eventually, the internal representation will be expanded to include
attributes that represent code generation options, etc.
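The uniquing pattern in miniature (illustrative, not the in-tree code):
class AttributesImpl; // uniqued in and owned by the LLVMContext
class Attributes {
  AttributesImpl *Impl; // equality is pointer equality; storage is shared
public:
  explicit Attributes(AttributesImpl *I = nullptr) : Impl(I) {}
  bool operator==(Attributes A) const { return Impl == A.Impl; }
};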
llvm-svn: 165917