test file.
This old test had a bunch of functions that were never even checked. =/
The only thing it really did was to make sure that we did something
reasonable in 32-bit mode with SSE4.1. Adding another run line to the
main vector-sext.ll test seems a better way to do that.
llvm-svn: 218810
of architectures: SSE2, SSSE3, SSE4.1, AVX, and AVX2.
Unfortunately, this exposses the absolute horror of the code we generate
for many of these patterns. Anyone wanting to familiarize themselves
with the x86 backend and improve performance could do a lot of good
sitting down and making these test cases not look so terrible. While the
new vector shuffle code I'm working on well help some, it won't fix all
of the crimes here.
llvm-svn: 218807
As discussed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html
And again here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html
The sqrt of a negative number when using the llvm intrinsic is undefined.
We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref.
This change should not affect any code that isn't using "no-nans-fp-math";
ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call.
Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and
possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either.
We knowingly approve of this difference with the other compilers in an attempt to flag code
that is invoking undefined behavior.
A front-end warning should also try to convince the user that the program will fail:
http://llvm.org/bugs/show_bug.cgi?id=21093
Differential Revision: http://reviews.llvm.org/D5527
llvm-svn: 218803
These tests are far and away the best sext and zext tests we have for
vectors. I'm going to merge the other similar tests into them and expand
the ISA coverage.
llvm-svn: 218800
`DIExpression`'s elements are 64-bit integers that are stored as
`ConstantInt`. The accessors already encapsulate the storage. This
commit updates the `DIBuilder` API to also encapsulate that.
llvm-svn: 218797
No functionality change. This removes a down-cast from LinkingContext to
MachOLinkingContext.
Also, remove const from LinkingContext::createImplicitFiles() to remove
the need for another const cast. Seems reasonable for createImplicitFiles()
to need to modify the context (MachOLinkingContext does).
llvm-svn: 218796
script to make them nice and predictable. This will ease updating them
for the new vector shuffle lowering and seeing the delta if any.
llvm-svn: 218795
avx-sext.ll using my new script.
Also add an AVX2 mode to this test.
Part of cleaning up the test suite before enabling the new vector
shuffle lowering. This also highlights some of the abysmal failures of
the old shuffle lowering. Check out those 'pinsrw' and 'pextrw'
sequences!
llvm-svn: 218794
This change allows to annotate all parallel loops with loop id metadata.
Furthermore, it will annotate memory instructions with
llvm.mem.parallel_loop_access metadata for all surrounding parallel loops.
This is especially usefull if an external paralleliser is used.
This also removes the PollyLoopInfo class and comments the
LoopAnnotator.
A test case for multiple parallel loops is attached.
llvm-svn: 218793
- Problem
One program takes ~3min to compile under -O2. This happens after a certain
function A is inlined ~700 times in a function B, inserting thousands of new
BBs. This leads to 80% of the compilation time spent in
GVN::processNonLocalLoad and
MemoryDependenceAnalysis::getNonLocalPointerDependency, while searching for
nonlocal information for basic blocks.
Usually, to avoid spending a long time to process nonlocal loads, GVN bails out
if it gets more than 100 deps as a result from
MD->getNonLocalPointerDependency. However this only happens *after* all
nonlocal information for BBs have been computed, which is the bottleneck in
this scenario. For instance, there are 8280 times where
getNonLocalPointerDependency returns deps with more than 100 bbs and from
those, 600 times it returns more than 1000 blocks.
- Solution
Bail out early during the nonlocal info computation whenever we reach a
specified threshold. This patch proposes a 100 BBs threshold, it also
reduces the compile time from 3min to 23s.
- Testing
The test-suite presented no compile nor execution time regressions.
Some numbers from my machine (x86_64 darwin):
- 17s under -Oz (which avoids inlining).
- 1.3s under -O1.
- 2m51s under -O2 ToT
*** 23s under -O2 w/ Result.size() > 100
- 1m54s under -O2 w/ Result.size() > 500
With NumResultsLimit = 100, GVN yields the same outcome as in the
unlimited 3min version.
http://reviews.llvm.org/D5532
rdar://problem/18188041
llvm-svn: 218792
No functionality change intended.
This implements Elena's idea to put the new additionalOperand outside the
switch to cover all cases
(http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140929/237763.html).
Note only nontrivial change is in MRMSrcMemFrm. This requires an inclusive
interval of [2, 4] because we have prefix-dependent *optional* immediate
operand.
llvm-svn: 218790
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last ditch effort, copying is
better than crashing.
rdar://problem/18011155
llvm-svn: 218789
Complex address expressions are no longer part of DIVariable, but
rather an extra argument to the debug intrinsics.
http://reviews.llvm.org/D4919
rdar://problem/17994491
llvm-svn: 218788
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel
Test Plan:
fptrunc.ll
checked also with 4 internal mips build bot flavors mip32r1/miprs32r2 and at -O0 and -O2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: rfuhler
Differential Revision: http://reviews.llvm.org/D5553
llvm-svn: 218785
r206400 and r209442 added remarks that are disabled by default.
However, if a diagnostic handler is registered, the remarks are sent
unfiltered to the handler. This is the right behaviour for clang, since
it has its own filters.
However, the diagnostic handler exposed in the LTO API receives only the
severity and message. It doesn't have the information to filter by pass
name. For LTO, disabled remarks should be filtered by the producer.
I've changed `LLVMContext::setDiagnosticHandler()` to take a `bool`
argument indicating whether to respect the built-in filters. This
defaults to `false`, so other consumers don't have a behaviour change,
but `LTOCodeGenerator::setDiagnosticHandler()` sets it to `true`.
To make this behaviour testable, I added a `-use-diagnostic-handler`
command-line option to `llvm-lto`.
This fixes PR21108.
llvm-svn: 218784
to recover from parse error parsing the default
argument. Patch prevents crash after spewing 100s
of errors caused by someone who forgot to compile in c++11
mode. So no test. rdar://18508589
llvm-svn: 218780
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
llvm-svn: 218778
Complex address expressions are no longer part of DIVariable, but
rather an extra argument to the debug intrinsics.
http://reviews.llvm.org/D4919
rdar://problem/17994491
llvm-svn: 218777
ThreadIDFunc => ThreadIDFunction
LogFunc => LogIDFunction
We try to avoid abbreviations/shortened names. Adjusted function parameter names
as well to replace _func with _function.
llvm-svn: 218773
Summary:
This change introduces DynMatcherInterface and changes the internal
representation of DynTypedMatcher and Matcher<T> to use a generic
interface instead.
It removes unnecessary indirections and virtual function calls when
converting matchers by implicit and dynamic casts.
DynTypedMatcher now remembers the stricter type in the chain of casts
and checks it before calling into DynMatcherInterface.
This change improves our clang-tidy related benchmark by ~14%.
Also, it opens the door for more optimizations of this kind that are
coming in future changes.
As a side effect of removing these template instantiations, it also
speeds up compilation of Dynamic/Registry.cpp by ~17% and reduces the
number of
symbols generated by ~30%.
Reviewers: klimek
Subscribers: klimek, cfe-commits
Differential Revision: http://reviews.llvm.org/D5542
llvm-svn: 218769
Summary: It's better if we have a consistent name for .cpload-related functions.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5437
llvm-svn: 218768
There is some strange interaction between mmap limit and unlimited stack
(ulimit -s unlimited), which results in this test failing when run with
"make".
llvm-svn: 218764
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.
llvm-svn: 218763
r218568 added an explicit #include of the Linux ProcessMonitor.h to
POSIXThread.cpp, rather than including just "ProcessMonitor.h" and
relying on the build infrastructure for the appropriate paths.
For now add #ifdefs in the source to use the FreeBSD or Linux header
as appropriate; a cleaner fix (and perhaps some refactoring of the
POSIX classes) should still be done later.
llvm-svn: 218762