analysis of the alloca. We don't need to visit all the users twice for
this. We build up a kill list during the analysis and then just process
it afterward. This recovers the tiny bit of performance lost by moving
to the visitor based analysis system as it removes one entire use-list
walk from mem2reg. In some cases, this is now faster than mem2reg was
previously.
llvm-svn: 187296
Also always add DIType, DISubprogram and DIGlobalVariable to the list
in DebugInfoFinder without checking them, so we can verify them later
on.
llvm-svn: 187285
Adds unit tests for it too.
Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.
llvm-svn: 187283
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
This change makes test with RUN lines like
RUN: opt ... | FileCheck
fail if opt fails, even if it prints what FileCheck wants. Enabling this
found some interesting cases of broken tests that were not being noticed
because opt (or some other tool) was crashing late.
Pipefail is used when the shell supports it or when using the internal
python based tester.
llvm-svn: 187261
Both GCC and LLVM will implicitly define __ppc__ and __powerpc__ for
all PowerPC targets, whether 32- or 64-bit. They will both implicitly
define __ppc64__ and __powerpc64__ for 64-bit PowerPC targets, and not
for 32-bit targets. We cannot be sure that all other possible
compilers used to compile Clang/LLVM define both __ppc__ and
__powerpc__, for example, so it is best to check for both when relying
on either inside the Clang/LLVM code base.
This patch makes sure we always check for both variants. In addition,
it fixes one unnecessary check in lib/Target/PowerPC/PPCJITInfo.cpp.
(At least one of __ppc__ and __powerpc__ should always be defined when
compiling for a PowerPC target, no matter which compiler is used, so
testing for them is unnecessary.)
There are some places in the compiler that check for other variants,
like __POWERPC__ and _POWER, and I have left those in place. There is
no need to add them elsewhere. This seems to be in Apple-specific
code, and I won't take a chance on breaking it.
There is no intended change in behavior; thus, no test cases are
added.
llvm-svn: 187248
We used to call Verify before adding DICompileUnit to the list, and now we
remove the check and always add DICompileUnit to the list in DebugInfoFinder,
so we can verify them later on.
llvm-svn: 187237
These were reverted in r167222 along with the rest
of the last different address space pointer size attempt.
These will be used in later commits.
llvm-svn: 187223
type units.
Initially this support is used in the computation of an ODR checker
for C++. For now we're attaching it to the DIE, but in the future
it will be attached to the type unit.
This also starts breaking out types into the separation for type
units, but without actually splitting the DIEs.
In preparation for hashing the DIEs this adds a DIEString type
that contains a StringRef with the string contained at the label.
llvm-svn: 187213
On Windows, this improves clean cmake configuration time on my
workstation from 1m58s to 1m32s, which is pretty significant. There's
probably more that can be done here, but this is the low hanging fruit.
Eric volunteered to regenerate ./configure for me.
llvm-svn: 187209
* Remove LLVM_ENABLE_CRT_REPORT. LLVM_DISABLE_CRASH_REPORT made it redundant.
* set Return to 1, so that we get a stack trace on failure.
* don't call _exit, so that we get a negative exit value and "not --crash"
correctly differentiates crashes and regular errors.
This is a bit experimental since the documentation on this interface is sparse.
It doesn't bring up a dialog on my windows setup, but feel free to revert
if it causes problem for your setup (and let me know what it is so that I
can try to fix this patch).
llvm-svn: 187206
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
Attempt to fix the buildbots by making the X86 test I just added platform independent
llvm-svn: 187202
This reverts commit 187198. It broke the bots.
The soft float test probably needs a -triple because of name differences.
On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of
"vroundss $1, %xmm0, %xmm0, %xmm0".
llvm-svn: 187201
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
llvm-svn: 187198
robust. It now uses an InstVisitor and worklist to actually walk the
uses of the Alloca transitively and detect the pattern which we can
directly promote: loads & stores of the whole alloca and instructions we
can completely ignore.
Also, with this new implementation teach both the predicate for testing
whether we can promote and the promotion engine itself to use the same
code so we no longer have strange divergence between the two code paths.
I've added some silly test cases to demonstrate that we can handle
slightly more degenerate code patterns now. See the below for why this
is even interesting.
Performance impact: roughly 1% regression in the performance of SROA or
ScalarRepl on a large C++-ish test case where most of the allocas are
basically ready for promotion. The reason is because of silly redundant
work that I've left FIXMEs for and which I'll address in the next
commit. I wanted to separate this commit as it changes the behavior.
Once the redundant work in removing the dead uses of the alloca is
fixed, this code appears to be faster than the old version. =]
So why is this useful? Because the previous requirement for promotion
required a *specific* visit pattern of the uses of the alloca to verify:
we *had* to look for no more than 1 intervening use. The end goal is to
have SROA automatically detect when an alloca is already promotable and
directly hand it to the mem2reg machinery rather than trying to
partition and rewrite it. This is a 25% or more performance improvement
for SROA, and a significant chunk of the delta between it and
ScalarRepl. To get there, we need to make mem2reg actually capable of
promoting allocas which *look* promotable to SROA without have SROA do
tons of work to massage the code into just the right form.
This is actually the tip of the iceberg. There are tremendous potential
savings we can realize here by de-duplicating work between mem2reg and
SROA.
llvm-svn: 187191
The bitcode representation attribute kinds are encoded into / decoded from
should be independent of the current set of LLVM attributes and their position
in the AttrKind enum. This patch explicitly encodes attributes to fixed bitcode
values.
With this patch applied, LLVM does not silently misread attributes written by
LLVM 3.3. We also enhance the decoding slightly such that an error message is
printed if an unknown AttrKind encoding was dected.
Bonus: Dropping bitcode attributes from AttrKind is now easy, as old AttrKinds
do not need to be kept to support the Bitcode reader.
llvm-svn: 187186
This patch provides basic support for powerpc64le as an LLVM target.
However, use of this target will not actually generate little-endian
code. Instead, use of the target will cause the correct little-endian
built-in defines to be generated, so that code that tests for
__LITTLE_ENDIAN__, for example, will be correctly parsed for
syntax-only testing. Code generation will otherwise be the same as
powerpc64 (big-endian), for now.
The patch leaves open the possibility of creating a little-endian
PowerPC64 back end, but there is no immediate intent to create such a
thing.
The LLVM portions of this patch simply add ppc64le coverage everywhere
that ppc64 coverage currently exists. There is nothing of any import
worth testing until such time as little-endian code generation is
implemented. In the corresponding Clang patch, there is a new test
case variant to ensure that correct built-in defines for little-endian
code are generated.
llvm-svn: 187179
Back in r140220 we removed the autoconf code that would set LLVMCC_OPTION
since it was only used by the test-suite. This patch now removes code
that would only be used if LLVMCC_OPTION was set.
llvm-svn: 187154
The previous change to local live range allocation also suppressed
eviction of local ranges. In rare cases, this could result in more
expensive register choices. This commit actually revives a feature
that I added long ago: check if live ranges can be reassigned before
eviction. But now it only happens in rare cases of evicting a local
live range because another local live range wants a cheaper register.
The benefit is improved code size for some benchmarks on x86 and armv7.
I measured no significant compile time increase and performance
changes are noise.
llvm-svn: 187140
Also avoid locals evicting locals just because they want a cheaper register.
Problem: MI Sched knows exactly how many registers we have and assumes
they can be colored. In cases where we have large blocks, usually from
unrolled loops, greedy coloring fails. This is a source of
"regressions" from the MI Scheduler on x86. I noticed this issue on
x86 where we have long chains of two-address defs in the same live
range. It's easy to see this in matrix multiplication benchmarks like
IRSmk and even the unit test misched-matmul.ll.
A fundamental difference between the LLVM register allocator and
conventional graph coloring is that in our model a live range can't
discover its neighbors, it can only verify its neighbors. That's why
we initially went for greedy coloring and added eviction to deal with
the hard cases. However, for singly defined and two-address live
ranges, we can optimally color without visiting neighbors simply by
processing the live ranges in instruction order.
Other beneficial side effects:
It is much easier to understand and debug regalloc for large blocks
when the live ranges are allocated in order. Yes, global allocation is
still very confusing, but it's nice to be able to comprehend what
happened locally.
Heuristics could be added to bias register assignment based on
instruction locality (think late register pairing, banks...).
Intuituvely this will make some test cases that are on the threshold
of register pressure more stable.
llvm-svn: 187139
For two intrinsics 'llvm.nvvm.texsurf.handle' and 'llvm.nvvm.texsurf.handle.internal',
TableGen was emitting matching code like:
if (Name.startswith("llvm.nvvm.texsurf.handle")) ...
if (Name.startswith("llvm.nvvm.texsurf.handle.internal")) ...
We can never match "llvm.nvvm.texsurf.handle.internal" here because it will
always be erroneously matched by the first condition.
The fix is to sort the intrinsic names and emit them in reverse order.
llvm-svn: 187119
Before the patch we took advantage of the fact that the compare and
branch are glued together in the selection DAG and fused them together
(where possible) while emitting them. This seemed to work well in practice.
However, fusing the compare so early makes it harder to remove redundant
compares in cases where CC already has a suitable value. This patch
therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of
functions instead.
No behavioral change intended, but it paves the way for a later patch.
llvm-svn: 187116
These instructions are allowed to trap even if the condition is false,
so for now they are only used for "*ptr = (cond ? x : *ptr)"-style
constructs.
llvm-svn: 187111
Make sure the context and type fields are MDNodes. We will generate
verification errors if those fields are non-empty strings.
Fix testing cases to make them pass the verifier.
llvm-svn: 187106
The language reference says that:
"If a symbol appears in the @llvm.used list, then the compiler,
assembler, and linker are required to treat the symbol as if there is
a reference to the symbol that it cannot see"
Since even the linker cannot see the reference, we must assume that
the reference can be using the symbol table. For example, a user can add
__attribute__((used)) to a debug helper function like dump and use it from
a debugger.
llvm-svn: 187103
There's no need to specify a flag to omit frame pointer elimination on non-leaf
nodes...(Honestly, I can't parse that option out.) Use the function attribute
stuff instead.
llvm-svn: 187093
Prior to this patch, IfConverter may widen the cases where a sequence of
instructions were executed because of the way it uses nested predicates. This
result in incorrect execution.
For instance, Let A be a basic block that flows conditionally into B and B be a
predicated block.
B can be predicated with A.BrToBPredicate into A iff B.Predicate is less
"permissive" than A.BrToBPredicate, i.e., iff A.BrToBPredicate subsumes
B.Predicate.
The IfConverter was checking the opposite: B.Predicate subsumes
A.BrToBPredicate.
<rdar://problem/14379453>
llvm-svn: 187071
Before this patch we would strdup each argument. If one was a response file,
we would replace it with the response file contents, leaking the original
strdup result.
We now don't strdup the originals and let StringSaver free any memory it
allocated. This also saves a bit of malloc traffic when response files are
not used.
Leak found by the valgrind build bot.
llvm-svn: 187042
The Binary constructor takes ownership of the memory buffer. This is a fairly
unfortunate interface, but for now make createObjectFile consistent with it
by also deleting the buffer if it fails.
Fixes a leak in llvm-ar found by the valgrind bots.
llvm-svn: 187039
schedule an alloca for another iteration in SROA. This only showed up
with a mixture of promotable and unpromotable selects and phis. Added
a test case for this.
llvm-svn: 187031
pending speculation for a phi node. The problem here is that we were
using growth of the specluation set as an indicator of whether
speculation would occur, and if the phi node is already in the set we
don't see it grow. This is a symptom of the fact that this signal is
a total hack.
Unfortunately, I couldn't really come up with a non-hacky way of
signaling that promotion remains valid *after* speculation occurs, such
that we only speculate when all else looks good for promotion. In the
end, I went with at least a much more explicit approach of doing the
work of queuing inside the phi and select processing and setting
a preposterously named flag to convey that we're in the special state of
requiring speculating before promotion.
Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing
a testcase for this from a pretty giant, nasty assert in a big
application. =] The testcase was excellent.
llvm-svn: 187029
This removes the need to store the asm variant in each row of the single table that existed before. Shaves ~16K off the size of X86AsmParser.o.
llvm-svn: 187026
Similar to ARM change r182800, dynamic linker will read bits/addends from
the original object rather than from the object that might have been patched
previously. For the purpose of relocations for MCJIT stubs on MIPS, we
internally use otherwise unused MIPS relocations.
The change also enables MCJIT unit tests for MIPS (EL/BE), and the following
two tests now pass:
- MCJITTest.return_global and
- MCJITTest.multiple_functions.
These issues have been tracked as Bug 16250.
Patch by Petar Jovanovic.
llvm-svn: 187019
These are really the same address space in hardware. The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access. When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.
llvm-svn: 187006
When vectors are built from a single value, the ARM lowering issues a
scalar_to_vector node.
This node is then always morphed into a move from the general purpose unit to
the vector unit.
When the value comes from a load, this can be simplified into a vector load to
the right lane.
This patch changes the lowering of insert_vector_elt to expose a vector
friendly pattern in this situation.
This is a step toward fixing <rdar://problem/14170854>.
llvm-svn: 186999
The main observation is that we never need both the filesize and the map size.
When mapping a slice of a file, it doesn't make sense to request a null
terminator and that would be the only case where the filesize would be used.
There are other cleanups that should be done in this area:
* A client should not have to pass the size (even an explicit -1) to say if
it wants a null terminator or not, so we should probably swap the argument
order.
* The default should be to not require a null terminator. Very few clients
require this, but many end up asking for it just because it is the default.
llvm-svn: 186984
The gold plugin was passing the desired map size as the file size. This was
working for two reasons:
* Recent version of gold provide the get_view callback, so this code was not
used.
* In older versions, getOpenFile was called, but the file size is never used
if we don't require null terminated buffers and map size defaults to the
file size.
Thanks to Eli Bendersky for noticing this.
I will try to make this api a bit less error prone.
llvm-svn: 186978
The symbol table has forward references in the file. Instead of allocating
a temporary buffer or counting the size and then writing, this implementation
writes a dummy value first and patches it once the final value is known.
There is room for performance improvement. I will implement them as soon as I
get some other features (like a ranlib mode) in.
llvm-svn: 186934