Flag -fast-isel-abort is required in order to verify that X86FastISel
never fails to select FPExt (float-to-double) and FPTrunc (double-to-float).
No Functional change intended.
llvm-svn: 229489
GCC 4.8 reported two new warnings due to comparisons
between signed and unsigned integer expressions. The new warnings were
accidentally introduced by revision 229480.
Added explicit casts to silence the warnings. No functional change intended.
llvm-svn: 229488
The atoms may be processed in different orders on different systems
based on allocated addresses. This is a bit unfortunate as it would
be nice to have error messages emitted in order of file contents.
However we are emitting errors inside a parallel_for_each so even if
we stabilize the order of atom processing we would need to do some
further work in order to ensure that thread scheduling doesn't perturb
the order of errors. For now switch to using CHECK-DAG instead of CHECK.
llvm-svn: 229487
This added API to the InstrProfWriter to write to a string so I could
write unittests without using temp files. This doesn't really work,
since the format has tighter alignment requirements than a char.
This reverts r229478 and its follow-up, r229481.
llvm-svn: 229483
- added mask types v8i1 and v16i1 to possible function parameters
- enabled passing 512-bit vectors in standard CC
- added a test for KNL intel_ocl_bi conventions
llvm-svn: 229482
Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements.
Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND.
This combine only covers a subset of the cases, but it's a start.
Differential Revision: http://reviews.llvm.org/D7666
llvm-svn: 229480
This is just a single commit that includes a performance optimization that
should improve dependence analysis time. Our performance bots should measure
this difference.
llvm-svn: 229476
initialization. Initialize the subtarget once per function and
migrate EmitStartOfAsmFile to either use attributes on the
TargetMachine or get information from all of the various
subtargets.
llvm-svn: 229475
The version of the tutorial uses the new compile callbacks API to inject stubs
that trigger IRGen & Codegen of their respective function bodies when they are
first called.
llvm-svn: 229466
This allows it to match still more places where previously we would have
to fall back on floating point shuffles or other more complex lowering
strategies.
I'm hoping to replace some of the hand-rolled unpack matching with this
routine is it gets more and more clever.
llvm-svn: 229463
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
This patch replaces most of the Orc indirection utils API with a new class:
JITCompileCallbackManager, which creates and manages JIT callbacks.
Exposing this functionality directly allows the user to create callbacks that
are associated with user supplied compilation actions. For example, you can
create a callback to lazyily IR-gen something from an AST. (A kaleidoscope
example demonstrating this will be committed shortly).
This patch also refactors the CompileOnDemand layer to use the
JITCompileCallbackManager API.
llvm-svn: 229461
This test was failing on non-x86 hosts because it specified a cpu of x86_64,
but not an architecture. x86_64 is obviously not a valid cpu on all
architectures.
llvm-svn: 229460
While looking at a heap profile of a clang LTO bootstrap with -g, I
noticed that 2.2% of memory in an `llvm-lto` of clang is from calling
`DebugLoc::get()` in `collectVariableInfo()` (accounting for ~40% of
memory used for `MDLocation`s).
I suspect this was introduced by r226736, whose goal was to prevent
uniquing of `DebugLoc`s (goal achieved, if so).
There's no reason we need a `DebugLoc` here at all -- it was just being
used for (in)convenient API -- so the fix is to pass the scope and
inlined-at directly to `LexicalScopes::findInlinedScope()`.
llvm-svn: 229459
Our register allocation has become better recently, it seems, and is now
starting to generate cross-block copies into inflated register classes. These
copies are not transformed into subregister insertions/extractions by the
PPCVSXCopy class, and so need to be handled directly by
PPCInstrInfo::copyPhysReg. The code to do this was *almost* there, but not
quite (it was unnecessarily restricting itself to only the direct
sub/super-register-class case (not copying between, for example, something in
VRRC and the lower-half of VSRC which are super-registers of F8RC).
Triggering this behavior manually is difficult; I'm including two
bugpoint-reduced test cases from the test suite.
llvm-svn: 229457
This required some minor API to be added to these types to avoid
needing temp files.
Also, I've used initializer lists in the tests, as MSVC 2013 claims to
support them. I'll redo this without them if the bots complain.
llvm-svn: 229455
and LazyEmittingLayer of Orc.
This method allows you to immediately emit and finalize a module. It is required
by an upcoming refactor of the indirection utils and the compile-on-demand
layer.
I've filed http://llvm.org/PR22608 to write unit tests for this and other Orc
APIs.
llvm-svn: 229451
ParsePostfixExpressionSuffix() for '->' (or '.') postfixes first calls
ActOnStartCXXMemberReference() to inform sema that a member reference is about
to start, and that function lets the parser know if sema thinks that the
base expression's type could allow a pseudo destructor from a semantic point of
view (for example, if the the base expression has a dependent type).
ParsePostfixExpressionSuffix() then calls ParseOptionalCXXScopeSpecifier() and
passes MayBePseudoDestructor on to that function, expecting the function to
set it to false if a pseudo destructor is impossible from a syntactic point of
view (due to a lack of '~' sigil). However, ParseOptionalCXXScopeSpecifier()
had early-outs for ::new and __super, so MayBePseudoDestructor stayed true,
so we tried to parse a pseudo dtor, and then became confused since we couldn't
find a '~'. Move the snippet in ParseOptionalCXXScopeSpecifier() that sets
MayBePseudoDestructor to false above the early exits.
Parts of this found by SLi's bot.
llvm-svn: 229449
The deprecated attribute was adopted as part of the C++14, however, there is a
GNU version available in C++11. When using C++ earlier than C++14, diagnose the
use of the attribute without the GNU scope, but only when using the generalised
attribute syntax.
llvm-svn: 229447
In the case that we diagnosed an invalid attribute due to missing or present
arguments, we would return false, indicating to the caller that the parsing
failed. However, we would have added the attribute in ParseAttributeArgsCommon
(which may have been called indirectly through ParseGNUAttributeArgs).
Returning true in this case ensures that a second copy of the attribute is not
added.
I haven't added a test case for this as the existing test will cover this with
the next commit which diagnoses a C++14 attribute applied in C++11 mode. Rather
than duplicating the existing test case, allow the tree to remain without a test
between this and the next change. We would see double warnings in the
[[deprecated()]] applied to a declaration in C++11 mode, which will cause an
error in the cxx0x-attributes test.
llvm-svn: 229446
We cannot simply rematerialize instructions which only defining a
subregister, as the final value also depends on the previous
instructions.
This fixes test/CodeGen/R600/subreg-coalescer-bug.ll with subreg
liveness enabled.
llvm-svn: 229444