The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.
If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.
Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.
Regression on self-hosting bots with no obvious explanation. Tidied up range
handling to be more obviously correct, but there was no smoking gun.
define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
%tmp1434 = icmp eq i32 %a, %b ; <i1> [#uses=1]
br i1 %tmp1434, label %bb17, label %bb.outer
bb.outer: ; preds = %cond_false, %entry
%b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
%a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
br label %bb
bb: ; preds = %cond_true, %bb.outer
%indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
%tmp. = sub i32 0, %b_addr.021.0.ph
%tmp.40 = mul i32 %indvar, %tmp.
%a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
%tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
br i1 %tmp3, label %cond_true, label %cond_false
cond_true: ; preds = %bb
%tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
%tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
%indvar.next = add i32 %indvar, 1
br i1 %tmp1437, label %bb17, label %bb
cond_false: ; preds = %bb
%tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
%tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
br i1 %tmp14, label %bb17, label %bb.outer
bb17: ; preds = %cond_false, %cond_true, %entry
%a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
ret i32 %a_addr.026.1
}
Without tail-merging or diamond-tail if conversion:
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ble LBB1_3
@ BB#2: @ %cond_true
@ in Loop: Header=BB1_1 Depth=1
subs r0, r0, r1
cmp r1, r0
it ne
cmpne r0, r1
bgt LBB1_4
LBB1_3: @ %cond_false
@ in Loop: Header=BB1_1 Depth=1
subs r1, r1, r0
cmp r1, r0
bne LBB1_1
LBB1_4: @ %bb17
bx lr
With diamond-tail if conversion, but without tail-merging:
@ BB#0: @ %entry
cmp r0, r1
it eq
bxeq lr
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ite le
suble r1, r1, r0
subgt r0, r0, r1
cmp r1, r0
bne LBB1_1
@ BB#2: @ %bb17
bx lr
llvm-svn: 279168
The cost of predicating a diamond is only the instructions that are not shared
between the two branches. Additionally If a predicate clobbering instruction
occurs in the shared portion of the branches (e.g. a cond move), it may still
be possible to if convert the sub-cfg. This change handles these two facts by
rescanning the non-shared portion of a diamond sub-cfg to recalculate both the
predication cost and whether both blocks are pred-clobbering.
llvm-svn: 279167
This may affect calculations for thresholds, but is not a significant change
in behavior.
The problem was that an inclusive range must have an additonal flag to showr
that it is empty, because otherwise begin == end implies that the range has one
element, and it may not be possible to move past on either side.
llvm-svn: 279166
Each runtime project has a top-level target that is the name of the runtime (minus the "lib" prefix if applicable). This creates top-level targets mapping to runtime projects.
llvm-svn: 279160
Summary:
Inline asm memory constraints can have the base or index register be assigned
to %r0 right now. Make sure that we assign only ADDR64 registers to the base
and index.
Reviewers: uweigand
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23367
llvm-svn: 279157
The subproject interface being used for runtime libraries expects that llvm-config is passed into the subproject for consumption. We currently do this for every subproject, so we should expect that all LLVM ExternalProjects depend on llvm-config for the time being.
Eventually I'd like to see the sub-projects using LLVMConfig.cmake instead of the llvm-config binary, but that will take time to roll out.
llvm-svn: 279155
Xcode 8 requires toolchain compatibility version 2. This allows us to select the correct compatibility version based on the installed version of Xcode.
llvm-svn: 279152
Clean up the existing code by:
1. Renaming variables
2. Adding local variables
3. Making it vector-safe
This is still guarded by a ConstantInt check, so no functional change is intended.
But this should be ready to go: if we move the ConstantInt check down, all of
these folds should do the right thing for vector types.
llvm-svn: 279150
Summary:
We need to use floating-point compares to ensure that s_cbranch_vcc*
instructions are always generated. With integer compares, future
optimizations could cause s_cbranch_scc* to be generated instead.
Reviewers: arsenm, nhaehnle
Subscribers: llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23401
llvm-svn: 279148
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.
Differential Revision: https://reviews.llvm.org/D23597
llvm-svn: 279129
We abort building vectorizable trees in some cases (e.g., if the maximum
recursion depth is reached, if the region size is too large, etc.). If this
happens for a reduction, we can be left with a root entry that needs to be
gathered. For these cases, we need make sure we actually set VectorizedValue to
the resulting vector.
This patch ensures we properly set VectorizedValue, and it also ensures the
insertelement sequence generated for the gathers is inserted at the correct
location.
Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
Differential Revison: https://reviews.llvm.org/D23410
llvm-svn: 279125
Re-apply r276044 with off-by-1 instruction fix for the reload placement.
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 279124
This is prep work for allowing the threshold to be different during layout,
and to enforce a single threshold between merging and duplicating during
layout. No observable change intended.
llvm-svn: 279117
When running 'opt -O2 verify-uselistorder-nodbg.lto.bc', there are 33m allocations. 8.2m
come from std::string allocations in Intrinsic::getName(). Turns out this method only
returns a std::string because it needs to handle overloads, but that is not the common case.
This adds an overload of getName which just returns a StringRef when there are no overloads
and so saves on the allocations.
llvm-svn: 279113
The Xcode and Visual Studio generators always log "-- No build type selected, default to Debug". This is because CMake doesn't initialize "CMAKE_CONFIGURATION_TYPES" until the generator's EnableLanguage call gets hit.
The first place EnableLanguage gets hit in our configuration is in the project() call. Since CMAKE_BUILD_TYPE isn't used until after we call project() it is safe to just move this check down a bit.
llvm-svn: 279110
Normally, when an AND with a constant is lowered to NILL, the constant value is truncated to 16 bits. However, since r274066, ANDs whose results are used in a shift are caught by a different pattern that does not truncate. The instruction printer expects a 16-bit unsigned immediate operand for NILL, so this results in an abort.
This patch adds code to manually truncate the constant in this situation. The rest of the bits are then set, so we will detect a case for NILL "naturally" rather than using peephole optimizations.
Differential Revision: http://reviews.llvm.org/D21854
llvm-svn: 279105
Remove an unnecessary round-trip:
iterator => operator->() => getIterator()
In some cases, the iterator is end(), so the dereference of operator->
is invalid (UB).
The testcase only crashes with r278974 (currently reverted to
investigate this), which adds an assertion for invalid dereferences of
ilist nodes.
Fixes PR29035.
llvm-svn: 279104
The WebAssemly spec removing the return value from store instructions, so
remove the associated optimization from LLVM.
This patch leaves the store instruction operands in place for now, so stores
now always write to "$drop"; these will be removed in a seperate patch.
llvm-svn: 279100
number of assume intrinsics.
The classical way to have a cache-friendly vector style container when
we need queue semantics for BFS instead of stack semantics for DFS is to
use an ever-growing vector and an index. Erasing from the front requires
O(size) work, and unless we expect the worklist to grow *very* large,
its probably cheaper to just grow and race down the list.
But that makes it more bad that we're putting the assume intrinsics in
this at all. We end up looking at the (by definition empty) use list to
see if they're ephemeral (when we've already put them in that set), etc.
Instead, directly populate the worklist with the operands when we mark
the assume intrinsics as ephemeral. Also, test the visited set *before*
putting things into the worklist so we don't accumulate the same value
in the list 100s of times.
It would be nice to use a set-vector for this but I think its useful to
test the set earlier to avoid repeatedly querying whether the same
instruction is safe to speculate.
Hopefully with these changes the number of values pushed onto the
worklist is smaller, and we avoid quadratic work by letting it grow as
necessary.
Differential Revision: https://reviews.llvm.org/D23396
llvm-svn: 279099
This reverts commit r279086, reapplying r279084. I'm not sure what I
ran before, because the compile failure for ADTTests reproduced locally.
The problem is that TestRev is calling BidirectionalVector::rbegin()
when the BidirectionalVector is const, but rbegin() is always non-const.
I've updated BidirectionalVector::rbegin() to be callable from const.
Original commit message follows.
--
As a follow-up to r278991, add some tests that check that
decltype(reverse(R).begin()) == decltype(R.rbegin()), and get them
passing by adding std::remove_reference to has_rbegin.
I'm using static_assert instead of EXPECT_TRUE (and updated the other
has_rbegin check from r278991 in the same way) since I figure that's
more helpful.
llvm-svn: 279091
The original patch was breaking some buildbots due to an
incorrect ordering of function definitions which caused some
compilers to recognize a definition but others to not.
llvm-svn: 279089
This adds behaviour similar to binutils' objdump which can show symbols in an
import library. Differences from that stem around the fact that we do not
create section symbols nor the all import import descriptor symbol reference.
However, this does mean that the tool can serve as a possible replacement for
the existing tool.
llvm-svn: 279088
As a follow-up to r278991, add some tests that check that
decltype(reverse(R).begin()) == decltype(R.rbegin()), and get them
passing by adding std::remove_reference to has_rbegin.
I'm using static_assert instead of EXPECT_TRUE (and updated the other
has_rbegin check from r278991 in the same way) since I figure that's
more helpful.
llvm-svn: 279084
It causes a regression on our internal benchmark. Introduce cvp-dont-process flag and set it off by default while investigating the regression.
llvm-svn: 279082
The ARMv8*-A descriptions in the ARM and AArch64 TargetParsers are incorrect
architecturally and mismatched to the backend descriptions.
RAS is an optional extension to ARMv8-A and ARMv8.1-A and mandatory in
ARMv8.2-A. Correct the ARMTargetParser descriptions which had this as enabled
by default in the earlier versions.
The FP16 and SPE extensions are optional in ARMv8.2-A and the backend defaults
them as off. They are not available as extensions to earlier ARMv8-A versions.
Correct the AArch64TargetParser which had these as enabled by default in all
ARMv8-A definitions.
These macros are only used to define preprocessor macros. There are no macros
yet as ACLE has not caught up with ARMv8.2-A so not possible to add a test.
Differential Revision: https://reviews.llvm.org/D23500
llvm-svn: 279078
This patch changes the code structure of
WebAssemblyLowerEmscriptenException pass to support both exception
handling and setjmp/longjmp. It also changes the name of the pass and
the source file.
1. Change the file/pass name to WebAssemblyLowerEmscriptenExceptions ->
WebAssemblyLowerEmscriptenEHSjLj to make it clear that it supports both
EH and SjLj
2. List function / global variable names at the top so they
can be changed easily
3. Some cosmetic changes
Patch by Heejin Ahn
Differential Revision: https://reviews.llvm.org/D23588
llvm-svn: 279075
There is no REM instruction; that will require an expansion.
It's not obvious that should be done in select, rather than as a
(custom?) legalization.
llvm-svn: 279074
`link -dump -exports` lists exported symbols from import libraries as well as
normal dlls. Ensure that we can handle import libraries as well in
llvm-readobj.
llvm-svn: 279069
r277708 enabled tails calls for MIPS but used the 'jr' instruction when the
jump target was held in a register. For MIPSR6, 'jalr $zero, $reg' should
have been used. Additionally, add missing patterns for external and global
symbols for tail calls.
Reviewers: dsanders, vkalintiris
Differential Review: https://reviews.llvm.org/D23301
llvm-svn: 279064
Summary:
This is a pretty trivial, but I thought it was worth just checking that nobody feels it's completely the wrong thing to be doing.
The motivation is that when starting a new backend, you often start with a minimal stub, pretty much just FooTargetMachine and FooTargetInfo. Once that's built, you might naturally try `llc -march=foo myinput.ll` and it seems more developer-friendly if this ends up asserting due to the lack of MCAsmInfo with an informative message rather than just segfaulting.
Reviewers: MatzeB, chandlerc
Subscribers: bogner, llvm-commits
Differential Revision: https://reviews.llvm.org/D23443
llvm-svn: 279061
I had updated the output file name but not the corresponding nm based check
before submitting as r279023. This should fix the bot failures
llvm-svn: 279025
Summary:
Skip the merging of common symbols for ThinLTO modules, they will be
merged by the final native object link. Trying to merge the symbols and
add to a combined module will incorrectly enable the common symbol to be
internalized in the ThinLTO module. Additionally, we will not want to
create a combined module for ThinLTO distributed builds.
This fixes failures in 7 cpu2006 benchmarks from the new LTO API in
ThinLTO mode.
Reviewers: mehdi_amini
Subscribers: pcc, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23637
llvm-svn: 279023
Summary:
This was reversed compared to ThinLTOCodeGenerator for some reason,
and lead to an increased code-size on my tests. I figured that the
weak resolution may internalize a linkonce function, which will be
promoted immediately (and renamed), before being internalized again.
Reviewers: tejohnson
Subscribers: pcc, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23632
llvm-svn: 279021
Summary:
We are going to combine poisoning of red zones and scope poisoning.
PR27453
Reviewers: kcc, eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23623
llvm-svn: 279020
RTDyldMemoryManager::getSymbolAddressInProcess()
This should allow JIT'd code for win32 to find in-process symbols. See
http://llvm.org/PR28699 .
Patch by James Holderness. Thanks James!
llvm-svn: 279016
Summary:
It does not play well with directories (end up with a bunch of hidden
files).
Also, do not strip the 0 suffix for the first task, especially since
0 can be used by ThinLTO as well now.
Reviewers: tejohnson
Subscribers: mehdi_amini, pcc, llvm-commits
Differential Revision: https://reviews.llvm.org/D23612
llvm-svn: 279014
Change the ilist traits to use decltype instead of sizeof, and add
HasObsoleteCustomization so that additions to this list don't need to be
added in two places.
I suspect this will now work with MSVC, since the trait tested in
r278991 seems to work. If for some reason it continues to fail on
Windows I'll follow up by adding back the #ifndef _MSC_VER.
llvm-svn: 279012
Primarily, this clarifies wording in a few places, and adds "\ "s to
make the formatting of things like "``Foo`` s" better.
Thanks to Michael Kuperstein for the comments.
llvm-svn: 279007
llvm-pdbdump already had code to retrieve column information in the line tables, but it wasn't using it.
Most Microsoft PDBs don't seem to have column info, so this wasn't missed. But Clang includes column info by default (at least for now), and being able to see that is useful for ensuring we get the column info correct.
Differential Revision: https://reviews.llvm.org/D23629
llvm-svn: 279001
Summary: I later (after r278573) found that LoopIterator.h has some overlapping with LoopBodyTraits. It's good to use LoopBodyTraits because a *Traits struct is algorithm independent.
Reviewers: anemet, nadav, mkuper
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23529
llvm-svn: 278996
Also, add a scalar test to demonstrate one of the intermediate folds that
is necessary to accomplish the existing, multi-step test. And simplify
the vector tests to only check the final piece of that multi-step transform.
llvm-svn: 278995
Duncan found that reverse worked on mutable rbegin(), but the has_rbegin
trait didn't work with a const method. See http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160815/382890.html
for more details.
Turns out this was already solved in clang with has_getDecl. Copied that and made it work for rbegin.
This includes the tests Duncan attached to that thread, including the traits test.
llvm-svn: 278991
Since I stopped writing empty export tries it causes LinkEdit to potentially be completely empty which results in invalid yaml being generated.
To prevent this we skip linkedit data if it is empty.
llvm-svn: 278985
This will allow tail duplication and tail merging during layout to have a
shared threshold to make sure that they don't overlap. No observable change
intended.
llvm-svn: 278981
This removes the undefined behaviour (UB) in ilist/ilist_node/etc.,
mainly by removing (gutting) the ilist_sentinel_traits customization
point and canonicalizing on a single, efficient memory layout. This
fixes PR26753.
The new ilist is a doubly-linked circular list.
- ilist_node_base has two ilist_node_base*: Next and Prev. Size-of: two
pointers.
- ilist_node<T> (size-of: two pointers) is a type-safe wrapper around
ilist_node_base.
- ilist_iterator<T> (size-of: two pointers) operates on an
ilist_node<T>*, and downcasts to T* on dereference.
- ilist_sentinel<T> (size-of: two pointers) is a wrapper around
ilist_node<T> that has some extra API for list management.
- ilist<T> (size-of: two pointers) has an ilist_sentinel<T>, whose
address is returned for end().
The new memory layout matches ilist_half_embedded_sentinel_traits<T>
exactly. The Head pointer that previously lived in ilist<T> is
effectively glued to the ilist_half_node<T> that lived in
ilist_half_embedded_sentinel_traits<T>, becoming the Next and Prev in
the ilist_sentinel_node<T>, respectively. sizeof(ilist<T>) is now the
size of two pointers, and there is never any additional storage for a
sentinel.
This is a much simpler design for a doubly-linked list, removing most of
the corner cases of list manipulation (add, remove, etc.). In follow-up
commits, I intend to move as many algorithms as possible into a
non-templated base class (ilist_base) to reduce code size.
Moreover, this fixes the UB in ilist_iterator/getNext/getPrev
operations. Previously, ilist_iterator<T> operated on a T*, even when
the sentinel was not of type T (i.e., ilist_embedded_sentinel_traits and
ilist_half_embedded_sentinel_traits). This added UB to all operations
involving end(). Now, ilist_iterator<T> operates on an ilist_node<T>*,
and only downcasts when the full type is guaranteed to be T*.
What did we lose? There used to be a crash (in some configurations) on
++end(). Curiously (via UB), ++end() would return begin() for users of
ilist_half_embedded_sentinel_traits<T>, but otherwise ++end() would
cause a nice dependable nullptr dereference, crashing instead of a
possible infinite loop. Options:
1. Lose that behaviour.
2. Keep it, by stealing a bit from Prev in asserts builds.
3. Crash on dereference instead, using the same technique.
Hans convinced me (because of the number of problems this and r278532
exposed on Windows) that we really need some assertion here, at least in
the short term. I've opted for #3 since I think it catches more bugs.
I added only a couple of unit tests to root out specific bugs I hit
during bring-up, but otherwise this is tested implicitly via the
extensive usage throughout LLVM.
Planned follow-ups:
- Remove ilist_*sentinel_traits<T>. Here I've just gutted them to
prevent build failures in sub-projects. Once I stop referring to them
in sub-projects, I'll come back and delete them.
- Add ilist_base and move algorithms there.
- Check and fix move construction and assignment.
Eventually, there are other interesting directions:
- Rewrite reverse iterators, so that rbegin().getNodePtr()==&*rbegin().
This allows much simpler logic when erasing elements during a reverse
traversal.
- Remove ilist_traits::createNode, by deleting the remaining API that
creates nodes. Intrusive lists shouldn't be creating nodes
themselves.
- Remove ilist_traits::deleteNode, by (1) asserting that lists are empty
on destruction and (2) changing API that calls it to take a Deleter
functor (intrusive lists shouldn't be in the memory management
business).
- Reconfigure the remaining callback traits (addNodeToList, etc.) to be
higher-level, pulling out a simple_ilist<T> that is much easier to
read and understand.
- Allow tags (e.g., ilist_node<T,tag1> and ilist_node<T,tag2>) so that T
can be a member of multiple intrusive lists.
llvm-svn: 278974
This reverts commit r278967, since the new test is failing when you
don't build the WebAssembly target (most people, since it's
off-by-default).
llvm-svn: 278973
Making explicit our current policy to accept new targets as experimental and
later official. Every new target should follow these rules to be added,
and kept relevant in the upstream tree.
llvm-svn: 278971
Summary:
This is part of the "NodeType* -> NodeRef" migration. Notice that since
GraphWriter prints object address as identity, I added a static_assert on
NodeRef to be a pointer type.
Reviewers: dblaikie
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D23580
llvm-svn: 278966