When a linkonce_odr value that is on the dso list is not unnamed_addr,
we can still check whether anything actually uses its address. If
not, it is safe to hide it.
This patch implements that by moving GlobalStatus to Transforms/Utils
and using it in Internalize.
llvm-svn: 193090
A landing pad can be jumped to only by the unwind edge of an invoke
instruction. If we eliminate a partially redundant load in a landing pad, we
create a basic block that violates this constraint, which then leads to other
problems down the line when we try to merge that basic block with the landing
pad. Avoid this by not eliminating the load in a landing pad.
PR17621
llvm-svn: 193064
One optimization simplify-cfg performs is converting switches to
lookup tables when the switch has more than 4 cases. This is done by:
1. Finding the max/min case values and calculating the switch's case range.
2. Creating a lookup table basic block.
3. Performing a check in the switch's BB to see whether the input value is in
the switch's case range. If the input value satisfies said predicate,
branch to the lookup table BB; otherwise branch to the switch's default
destination BB, using the default value as the result.
The conditional check consists of subtracting the min case value of the
table from any input iN value and then ensuring that said value is
unsigned less than the size of the lookup table, represented as an iN
value.
If the lookup table is a covered lookup table (one entry for every possible
input value), the size of the table will be 2^N, which is 0 as an iN value.
Thus the comparison becomes an `icmp ult` of an iN value against 0, which is
always false, yielding the incorrect result.
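As a standalone illustration (not the LLVM code itself), here is the
wraparound in C++: for a 2-bit input type a covered table has 2^2 = 4
entries, and 4 truncates to 0 at that bit width, so the emitted guard
can never pass.

#include <cassert>
#include <cstdint>

int main() {
  const unsigned BitWidth = 2;                  // the switch condition is iN with N == 2
  const uint64_t Mask = (1u << BitWidth) - 1;   // simulate iN arithmetic
  const uint64_t TableSize = 4;                 // covered table: one entry per value
  for (uint64_t X = 0; X <= Mask; ++X) {
    // The guard compares at width N, where TableSize wraps around to 0.
    bool InRange = ((X - /*MinCaseValue*/ 0) & Mask) < (TableSize & Mask);
    assert(!InRange && "icmp ult X, 0 is always false");
  }
  return 0;
}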
This patch fixes this problem by recognizing when we have a covered lookup
table and, if we do, unconditionally jumping to the lookup table BB, since the
covering property of the lookup table implies that every input value is
handled by said BB.
rdar://15268442
llvm-svn: 193045
If the predecessor is being spliced into a landing pad, then we need the PHIs to
come first and the rest of the predecessor's code to come *after* the landing
pad instruction.
llvm-svn: 193035
UpdatePHINodes has an optimization to reuse an existing PHI node, where it
first deletes all of its entries and then replaces them. Unfortunately, in the
case where we had duplicate predecessors (which are allowed so long as the
associated PHI entries have the same value), the loop removing the existing PHI
entries from the to-be-reused PHI would assert (if that PHI was not the one
which had the duplicates).
llvm-svn: 192001
Remove the very substantial, largely unmaintained legacy PGO
infrastructure.
This was essentially work toward PGO based on a design that had several
flaws, partially dating from a time when LLVM had a different
architecture, and with an effort to modernize it abandoned without being
completed. Since then, it has bitrotted for several years further. The
result is nearly unusable, and isn't helping any of the modern PGO
efforts. Instead, it is getting in the way, adding confusion about PGO
in LLVM and distracting everyone with maintenance on essentially dead
code. Removing it paves the way for modern efforts around PGO.
Among other effects, this removes the last of the runtime libraries from
LLVM. Those are being developed in the separate 'compiler-rt' project
now, with somewhat different licensing specifically more appropriate for
runtimes.
llvm-svn: 191835
This makes using array_pod_sort significantly safer. The implementation relies
on function pointer casting but that should be safe as we're dealing with void*
here.
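A hedged usage sketch of what this enables (my reading of the change, not
code from the patch): callers can pass a comparator over typed pointers
rather than void*.

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

// Typed comparator; array_pod_sort casts it to the void* form internally,
// which is safe because the elements are PODs.
static int compareInts(const int *LHS, const int *RHS) {
  return (*LHS > *RHS) - (*LHS < *RHS);
}

void sortValues(llvm::SmallVectorImpl<int> &V) {
  llvm::array_pod_sort(V.begin(), V.end(), compareInts);
}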
llvm-svn: 191175
The work on this project was left in an unfinished and inconsistent state.
Hopefully someone will eventually get a chance to implement this feature, but
in the meantime, it is better to put things back the way they were. I have
left support in the bitcode reader to handle the case-range bitcode format,
so that we do not lose bitcode compatibility with the llvm 3.3 release.
This reverts the following commits: 155464, 156374, 156377, 156613, 156704,
156757, 156804, 156808, 156985, 157046, 157112, 157183, 157315, 157384, 157575,
157576, 157586, 157612, 157810, 157814, 157815, 157880, 157881, 157882, 157884,
157887, 157901, 158979, 157987, 157989, 158986, 158997, 159076, 159101, 159100,
159200, 159201, 159207, 159527, 159532, 159540, 159583, 159618, 159658, 159659,
159660, 159661, 159703, 159704, 160076, 167356, 172025, 186736
llvm-svn: 190328
The existing code missed some edge cases when e.g. we're going to emit sqrtf but
only the availability of sqrt was checked. This happens on odd platforms like
Windows.
llvm-svn: 189724
If you have been experiencing extremely subtle miscompilations (such as a load
getting replaced with the value stored *below* the load within a basic block)
related to promoting an alloca to an SSA value, there is the dim possibility
that you hit this. Please let me know if you won this unfortunate lottery.
The first half of mem2reg's core logic (as it is used both in the
standalone mem2reg pass and in SROA) builds up a mapping from
'Instruction *' to the index of that instruction within its basic block.
This allows quickly establishing which stores dominate a particular load
even for large basic blocks. We cache this information throughout the
run of mem2reg over a function in order to amortize the cost of
computing it.
This is not in and of itself a strange pattern in LLVM. However, it
introduces a very important constraint: absolutely no instruction can be
deleted from the program without updating the mapping. Otherwise a newly
allocated instruction might get the same pointer address, and then end
up with a wrong index. Yes, LLVM routinely suffers from a *single
threaded* variant of the ABA problem. Most places in LLVM don't find
avoiding this an imposition because they don't both delete and create
new instructions iteratively, but mem2reg *loves* to do this... All the
time. Fortunately, the mem2reg code was really careful about updating
this cache to handle this eventuality... except when it comes to the
debug declare intrinsic. Oops. The fix is to invalidate that pointer in
the cache when we delete it, the same as we do when deleting alloca
instructions and other instructions.
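A minimal sketch of the invalidation rule the fix enforces (names are mine,
not mem2reg's): erase the cache entry before freeing the instruction, so a
later allocation reusing the same address cannot inherit a stale index.

#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Instruction.h"

struct BlockNumberingCache {
  llvm::DenseMap<const llvm::Instruction *, unsigned> InstNumbers;

  void eraseInstruction(llvm::Instruction *I) {
    InstNumbers.erase(I);  // invalidate first: a freshly allocated instruction
    I->eraseFromParent();  // may reuse this address (the single-threaded ABA)
  }
};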
I've also caused the same bug in new code while working on a fix to
PR16867, so this seems to be a really unfortunate pattern. Hopefully in
subsequent patches the deletion of dead instructions can be consolidated
sufficiently to make it less likely that we'll see future occurrences of
this bug.
Sorry for not having a test case, but I have literally no idea how to
reliably trigger this kind of thing. It may be single-threaded, but it
remains an ABA problem. It would require a really amazing number of
stars to align.
llvm-svn: 188367
However, opt -O2 doesn't run mem2reg directly so nobody noticed until r188146
when SROA started sending more things directly down the PromoteMemToReg path.
In order to revert r187191, I also revert dependent revisions r187296, r187322
and r188146. Fixes PR16867. Does not add the testcases from that PR, but both
of them should get added for both mem2reg and sroa when this revert gets
unreverted.
llvm-svn: 188327
Summary:
Doing work in constructors is bad: this change suggests calling
SpecialCaseList::create(Path, Error) instead of
"new SpecialCaseList(Path)". Currently the latter may crash with
report_fatal_error, which is undesirable: sometimes we want to report the
error to the user gracefully, for example when an incorrect file is passed
as an argument to Clang's -fsanitize-blacklist flag.
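A sketch of the factory-function shape being suggested, with the signature
approximated from the description above:

#include <memory>
#include <string>

class SpecialCaseList {
public:
  // Returns nullptr and fills Error on failure, instead of crashing with
  // report_fatal_error inside a constructor.
  static std::unique_ptr<SpecialCaseList> create(const std::string &Path,
                                                 std::string &Error);

private:
  SpecialCaseList() = default;  // construction only through create()
};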
Reviewers: pcc
Reviewed By: pcc
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1327
llvm-svn: 188156
It is breaking buildbots with libgmalloc enabled on Mac OS X.
$ cd llvm ; mkdir release ; cd release
$ ../configure --enable-optimized --prefix=$PWD/install
$ make
$ make check
$ Release+Asserts/bin/llvm-lit -v --param use_gmalloc=1 --param \
gmalloc_path=/usr/lib/libgmalloc.dylib \
../test/Instrumentation/DataFlowSanitizer/args-unreachable-bb.ll
llvm-svn: 188142
This moves removeUnreachableBlocksFromFn from SimplifyCFGPass.cpp
to Utils/Local.cpp and uses it to replace the implementation of
llvm::removeUnreachableBlocks, which appears to do a strict subset
of what removeUnreachableBlocksFromFn does.
Differential Revision: http://llvm-reviews.chandlerc.com/D1334
llvm-svn: 188119
Our internal regex implementation does not cope with large numbers
of anchors very efficiently. Given a ~3600-entry special case list,
regex compilation can take on the order of seconds. This patch solves
the problem for the special case of patterns matching literal global
names (i.e. patterns with no regex metacharacters). Rather than
forming regexes from literal global name patterns, add them to
a StringSet which is checked before matching against the regex.
This reduces regex compilation time by a factor of roughly a thousand
when reading the aforementioned special case list, according to a
completely unscientific study.
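A rough sketch of the fast path described above (class and helper names
are mine):

#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/StringSet.h"

class GlobalMatcher {
  llvm::StringSet<> Literals;  // exact global names, checked before the regex

public:
  void addPattern(llvm::StringRef Pat) {
    if (Pat.find_first_of("^$*+?.()[]{}|\\") == llvm::StringRef::npos)
      Literals.insert(Pat);  // literal: no regex needed
    // else: append to the regex alternation as before
  }

  bool matchesLiterally(llvm::StringRef GlobalName) const {
    return Literals.count(GlobalName) != 0;
  }
};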
No test cases. I figure that any new tests for this code should
check that regex metacharacters are properly recognised. However,
I could not find any documentation stating that the
syntax of global names in special case lists is based on regexes.
The extent to which regex syntax is supported in special case lists
should probably be decided on/documented before writing tests.
Differential Revision: http://llvm-reviews.chandlerc.com/D1150
llvm-svn: 187732
standards for LLVM. Remove duplicated comments on the interface from the
implementation file (implementation comments are left there of course).
Also clean up, re-word, and fix a few typos and errors in the comments
spotted along the way.
This is in preparation for changes to these files and to keep the
uninteresting tidying in a separate commit.
llvm-svn: 187335
analysis of the alloca. We don't need to visit all the users twice for
this. We build up a kill list during the analysis and then just process
it afterward. This recovers the tiny bit of performance lost by moving
to the visitor based analysis system as it removes one entire use-list
walk from mem2reg. In some cases, this is now faster than mem2reg was
previously.
llvm-svn: 187296
Adds unit tests for it too.
Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.
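A hedged usage sketch of the promoted helper in its new home:

#include "llvm/Analysis/CFG.h"
#include "llvm/IR/Instruction.h"

bool mayReach(const llvm::Instruction *From, const llvm::Instruction *To) {
  // Conservatively true unless the CFG proves To cannot execute after From.
  return llvm::isPotentiallyReachable(From, To);
}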
llvm-svn: 187283
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce the number of branches. The transformation
is guarded by a target hook, and is currently enabled only for +R600,
but its correctness has been tested on the X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
robust. It now uses an InstVisitor and worklist to actually walk the
uses of the Alloca transitively and detect the pattern which we can
directly promote: loads & stores of the whole alloca and instructions we
can completely ignore.
Also, with this new implementation teach both the predicate for testing
whether we can promote and the promotion engine itself to use the same
code so we no longer have strange divergence between the two code paths.
I've added some silly test cases to demonstrate that we can handle
slightly more degenerate code patterns now. See below for why this
is even interesting.
Performance impact: roughly 1% regression in the performance of SROA or
ScalarRepl on a large C++-ish test case where most of the allocas are
basically ready for promotion. The reason is silly redundant
work that I've left FIXMEs for and which I'll address in the next
commit. I wanted to separate this commit as it changes the behavior.
Once the redundant work in removing the dead uses of the alloca is
fixed, this code appears to be faster than the old version. =]
So why is this useful? Because the previous requirement for promotion
required a *specific* visit pattern of the uses of the alloca to verify:
we *had* to look for no more than 1 intervening use. The end goal is to
have SROA automatically detect when an alloca is already promotable and
directly hand it to the mem2reg machinery rather than trying to
partition and rewrite it. This is a 25% or more performance improvement
for SROA, and a significant chunk of the delta between it and
ScalarRepl. To get there, we need to make mem2reg actually capable of
promoting allocas which *look* promotable to SROA without having SROA do
tons of work to massage the code into just the right form.
This is actually the tip of the iceberg. There are tremendous potential
savings we can realize here by de-duplicating work between mem2reg and
SROA.
llvm-svn: 187191
The language reference says that:
"If a symbol appears in the @llvm.used list, then the compiler,
assembler, and linker are required to treat the symbol as if there is
a reference to the symbol that it cannot see"
Since even the linker cannot see the reference, we must assume that
the reference can be using the symbol table. For example, a user can add
__attribute__((used)) to a debug helper function like dump and use it from
a debugger.
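An illustration of that motivating case (hypothetical code, compiled with
Clang, where __attribute__((used)) places the function on @llvm.used):

#include <cstdio>

struct State { int Counter; };

// Never called from the program itself; only invoked by hand from a
// debugger, so the only "reference" lives in the symbol table.
__attribute__((used)) static void dumpState(const State *S) {
  std::printf("Counter = %d\n", S->Counter);
}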
llvm-svn: 187103
helper function. This leaves both trivial cases handled entirely in
helper functions and merely manages the list of allocas to process in
the run method.
The next step will be to handle all of the trivial promotion work prior
to even creating the core class and the subsequent simplifications that
enables.
llvm-svn: 186784
a single block into the helper routine. This takes advantage of the fact
that we can directly replace uses prior to any store with undef to
simplify matters and unconditionally promote allocas only used within
one block.
I've removed the special handling for the case of no stores existing.
This has no semantic effect but might slow things down. I'll fix that in
a later patch when I refactor this entire thing to be easier to manage
the different cases.
llvm-svn: 186783
handles the general cases.
The hope is to refactor this so that we don't end up building the entire
class for the trivial cases. I also want to lift a lot of the early
pre-processing in the initial segment of run() into a separate routine,
and really none of it needs to happen inside the primary promotion
class.
These routines in particular used none of the actual state in the
promotion class, so they don't really make sense as members.
llvm-svn: 186781
This struct is nicely independent of everything else, and we already
needed a forward declaration here. It's simpler to just define it
immediately.
llvm-svn: 186780
predecessors of the two blocks it is attempting to merge supply the
same incoming values to any phi in the successor block. This change
allows merging in the case where there is one or more incoming values
that are undef. The undef values are rewritten to match the non-undef
value that flows from the other edge. Patch by Mark Lacey.
llvm-svn: 186069
A special case list can now specify categories for specific globals,
which can be used to instruct an instrumentation pass to treat certain
functions or global variables in a specific way, such as by omitting
certain aspects of instrumentation while keeping others, or informing
the instrumentation pass that a specific uninstrumentable function
has certain semantics, thus allowing the pass to instrument callers
according to those semantics.
For example, AddressSanitizer now uses the "init" category instead of
global-init prefixes for globals whose initializers should not be
instrumented, but which in all other respects should be instrumented.
The motivating use case is DataFlowSanitizer, which will have a
number of different categories for uninstrumentable functions, such
as "functional" which specifies that a function has pure functional
semantics, or "discard" which indicates that a function's return
value should not be labelled.
Differential Revision: http://llvm-reviews.chandlerc.com/D1092
llvm-svn: 185978
This allows us to create switches even if instcombine has munged two of the
incoming compares into one compare plus some bit twiddling. This was motivated by enum
compares that are common in clang.
llvm-svn: 185632
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.
Also update testing cases to make them conform to the format of DI classes.
llvm-svn: 185135
The Builtin attribute is an attribute that can be placed on a function call
site to signal that even though a function is declared 'nobuiltin', the call
should still be treated as a builtin call at that site.
rdar://problem/13727199
llvm-svn: 185049
This commit completely removes what is left of the simplify-libcalls
pass. All of the functionality has now been migrated to the instcombine
and functionattrs passes. The following C API functions are now NOPs:
1. LLVMAddSimplifyLibCallsPass
2. LLVMPassManagerBuilderSetDisableSimplifyLibCalls
llvm-svn: 184459
The problem this time seems to be a thinko. We were assuming that in the CFG
A
| \
| B
| /
C
speculating the basic block B would cause only the phi value for the B->C edge
to be speculated. That is not true: the PHIs are semantically on the edges, so
if the A->B->C path is taken, any code needed for the A->C edge is not executed
and we have to consider it too when deciding whether to speculate B.
llvm-svn: 183226
PR16069 is an interesting case where an incoming value to a PHI is a
trap value while also being a 'ConstantExpr'.
We do not consider this case when performing the 'HoistThenElseCodeToIf'
optimization.
Instead, make our modifications more conservative if we detect that we
cannot transform the PHI to a select.
llvm-svn: 183152
Extend LinkModules to pass a ValueMaterializer to RemapInstruction and friends to lazily create Functions for lazily linked globals. This is a big win when linking small modules with large (mostly unused) library modules.
llvm-svn: 182776
Other passes, PPC counter-loop formation for example, also need to add loop
preheaders outside of the regular loop simplification pass. This makes
InsertPreheaderForLoop a global function so that it can be used by other
passes.
No functionality change intended.
llvm-svn: 182299
the things, and renames it to CBindingWrapping.h. I also moved
CBindingWrapping.h into Support/.
This new file just contains the macros for defining different wrap/unwrap
methods.
The calls to those macros, as well as any custom wrap/unwrap definitions
(like for array of Values for example), are put into corresponding C++
headers.
Doing this required some #include surgery, since some .cpp files relied
on the fact that including Wrap.h implicitly caused the inclusion of a
bunch of other things.
This also now means that the C++ headers will include their corresponding
C API headers; for example Value.h must include llvm-c/Core.h. I think
this is harmless, since the C API headers contain just external function
declarations and some C types, so I don't believe there should be any
nasty dependency issues here.
llvm-svn: 180881
This resurrects r179957, but adds code that makes sure we don't touch
atomic/volatile stores:
This transformation will transform a conditional store with a preceding
unconditional store to the same location:
a[i] = X
may-alias with a[i] load
if (cond)
    a[i] = Y
into an unconditional store:
a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp
We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outweigh the potential case where the branch would be correctly predicted
and the cost of executing the second store would be noticeably reflected in
performance.
hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2% performance improvement on an ARM Swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducibly below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade-off.
llvm-svn: 180731
Since we can't guarantee that the original dbg.declare intrinsic
is removed by LowerDbgDeclare(), we need to make sure that we are
not inserting the same dbg.value intrinsic over and over.
This removes tons of redundant DIEs when compiling optimized code.
rdar://problem/13056109
llvm-svn: 180615
debug location. This solves a problem where the range of an inlined
subroutine is emitted wrongly.
Patch by Manman Ren.
Fixes rdar://problem/12415623
llvm-svn: 180140
There is the temptation to make this transform dependent on target information as
it is not going to be beneficial on all (sub)targets. Therefore, we should
probably do this in MI Early-Ifconversion.
This reverts commit r179957. Original commit message:
"SimplifyCFG: If convert single conditional stores
This transformation will transform a conditional store with a preceding
unconditional store to the same location:
a[i] = X
may-alias with a[i] load
if (cond)
    a[i] = Y
into an unconditional store:
a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp
We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outweigh the potential case where the branch would be correctly predicted
and the cost of executing the second store would be noticeably reflected in
performance.
hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2% performance improvement on an ARM Swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducibly below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade-off.
I am going to watch performance numbers across the buildbots and will revert
this if anything unexpected comes up."
llvm-svn: 179980
This transformation will transform a conditional store with a preceding
unconditional store to the same location:
a[i] = X
may-alias with a[i] load
if (cond)
    a[i] = Y
into an unconditional store:
a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp
We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outweigh the potential case where the branch would be correctly predicted
and the cost of executing the second store would be noticeably reflected in
performance.
hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2% performance improvement on an ARM Swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducibly below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade-off.
I am going to watch performance numbers across the buildbots and will revert
this if anything unexpected comes up.
llvm-svn: 179957
If a switch instruction has a case for every possible value of its type,
with the same successor, SimplifyCFG would replace it with an icmp ult,
but the computation of the bound overflows in that case, which inverts
the test.
Patch by Jed Davis!
llvm-svn: 179587
How did this ever work?
Basically, if you have a function that's inlined into the caller, it may not
have any 'call' instructions, but any 'resume' instructions it may have should
still be forwarded to the outer (caller's) landing pad. This requires that all
of the 'landingpad' instructions in the callee have their clauses merged with
the caller's outer 'landingpad' instruction (hence the bit of ugly code in the
`forwardResume' method).
Testcase in a follow-up commit to the test-suite repository.
<rdar://problem/13360379> & PR15555
llvm-svn: 177680
Nadav reported a performance regression due to the work I did to
merge the library call simplifier into instcombine [1]. The issue
is that a new LibCallSimplifier object is being created whenever
InstCombiner::runOnFunction is called. Every time a LibCallSimplifier
object is used to optimize a call it creates a hash table to map from
a function name to an object that optimizes functions of that name.
For short-lived LibCallSimplifier instances this is quite inefficient,
especially for cases where no calls are actually simplified.
This patch fixes the issue by dropping the hash table and implementing
an explicit lookup function to correlate the function name to the object
that optimizes functions of that name. This avoids the cost of always
building and destroying the hash table in cases where the LibCallSimplifier
object is short-lived and avoids the cost of building the table when no
simplifications are actually performed.
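A hedged sketch of the shape of that explicit lookup (names approximated,
not code from the patch itself):

#include "llvm/ADT/StringRef.h"

struct LibCallOptimization;  // stand-in for the per-callee optimizer type

// Maps a callee name to its optimizer without building (and tearing down)
// a hash table per LibCallSimplifier instance.
LibCallOptimization *lookupOptimization(llvm::StringRef Name) {
  if (Name == "strcpy" || Name == "stpcpy") {
    // return the corresponding optimizer object here
  }
  return nullptr;  // not a recognized library call
}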
On a benchmark containing 100,000 calls where none of them are simplified
I noticed a 30% speedup. On a benchmark containing 100,000 calls where
all of them are simplified I noticed an 8% speedup.
[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130304/167639.html
llvm-svn: 176840
Fixes rdar:13349374.
Volatile loads and stores need to be preserved even if the language
standard says they are undefined. "volatile" in this context means "get
out of the way compiler, let my platform handle it".
Additionally, this is the only way I know of with llvm to write to the
first page (when hardware allows) without dropping to assembly.
llvm-svn: 176599
* Only apply divide bypass optimization when not optimizing for size.
* Fixed a bug caused by the constant for the 0 value being of type Int32;
now the dividend's type is used to generate the constant instead.
* For Atom x86-64, apply the divide bypass to use 16-bit divides instead of
64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.
Patch by Tyler Nowicki!
llvm-svn: 176442
enhancement done the trivial way: by extending inputs and truncating outputs,
which is adequate for targets with little or no support for integer arithmetic
on integer types narrower than 32 bits.
llvm-svn: 176139
The 'nobuiltin' attribute is applied to call sites to indicate that LLVM should
not treat the callee function as a built-in function. I.e., it shouldn't try to
replace that function with different code.
llvm-svn: 175835
isn't using the default calling convention. However, if the transformation is
from a call to inline IR, then the calling convention doesn't matter.
rdar://13157990
llvm-svn: 174724
This is a re-worked version of r174048.
Given source IR:
call void @llvm.dbg.declare(metadata !{i32* %argc.addr}, metadata !14), !dbg !15
we used to generate
call void @llvm.dbg.declare(metadata !27, metadata !28), !dbg !29
!27 = metadata !{null}
With this patch, we will correctly generate
call void @llvm.dbg.declare(metadata !{i32* %argc.addr}, metadata !27), !dbg !28
Looking up %argc.addr in the ValueMap will return null; since %argc.addr is
already correctly set up, we can use identity mapping.
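A small sketch of the identity-mapping fallback described (helper is mine):

#include "llvm/IR/Value.h"
#include "llvm/Transforms/Utils/ValueMapper.h"

llvm::Value *mapOrIdentity(llvm::Value *V, llvm::ValueToValueMapTy &VM) {
  if (llvm::Value *Mapped = VM.lookup(V))
    return Mapped;
  return V;  // the %argc.addr case: already correct, so map it to itself
}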
rdar://problem/13089880
llvm-svn: 174093
Given source IR:
call void @llvm.dbg.declare(metadata !{i32* %argc.addr}, metadata !14), !dbg !15
we used to generate
call void @llvm.dbg.declare(metadata !27, metadata !28), !dbg !29
!27 = metadata !{null}
With this patch, we will correctly generate
call void @llvm.dbg.declare(metadata !{i32* %argc.addr}, metadata !27), !dbg !28
Looking up %argc.addr in the ValueMap will return null; since %argc.addr is
already correctly set up, we can use identity mapping.
llvm-svn: 173946
loops over instructions in the basic block or the use-def list of the
value, neither of which are really efficient when repeatedly querying
about values in the same basic block.
What's more, we already know that the CondBB is small, and so we can do
a much more efficient test by counting the uses in CondBB, and seeing if
those account for all of the uses.
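A rough sketch of that counting test (helper name mine, written against the
current LLVM API):

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/Casting.h"

static bool allUsesInBlock(const llvm::Instruction *I,
                           const llvm::BasicBlock *CondBB) {
  unsigned InBlock = 0, Total = 0;
  for (const llvm::User *U : I->users()) {
    ++Total;
    if (auto *UI = llvm::dyn_cast<llvm::Instruction>(U))
      if (UI->getParent() == CondBB)
        ++InBlock;
  }
  return InBlock == Total;  // all uses accounted for inside CondBB
}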
Finally, we shouldn't blanket fail on any such instruction; instead we
should conservatively assume that those instructions are part of the
cost.
Note that this actually fixes a bug in the pass because
isUsedInBasicBlock has a really terrible bug in it. I'll fix that in my
next commit, but the fix for it would make this code suddenly take the
compile time hit I thought it already was taking, so I wanted to go
ahead and migrate this code to a faster & better pattern.
The bug in isUsedInBasicBlock was also causing other tests to test the
wrong thing entirely: for example we weren't actually disabling
speculation for floating point operations as intended (and tested), but
the test passed because we failed to speculate them due to the
isUsedInBasicBlock failure.
llvm-svn: 173417
Original commit message:
Plug TTI into the speculation logic, giving it a real cost interface
that can be specialized by targets.
The goal here is not to be more aggressive, but to just be more accurate
with very obvious cases. There are instructions which are known to be
truly free and which were not being modeled as such in this code -- see
the regression test which is distilled from an inner loop of zlib.
Everywhere the TTI cost model is insufficiently conservative I've added
explicit checks with FIXME comments to go add proper modelling of these
cost factors.
If this causes regressions, the likely solution is to make TTI even more
conservative in its cost estimates, but test cases will help here.
llvm-svn: 173357
that can be specialized by targets.
The goal here is not to be more aggressive, but to just be more accurate
with very obvious cases. There are instructions which are known to be
truly free and which were not being modeled as such in this code -- see
the regression test which is distilled from an inner loop of zlib.
Everywhere the TTI cost model is insufficiently conservative I've added
explicit checks with FIXME comments to go add proper modelling of these
cost factors.
If this causes regressions, the likely solution is to make TTI even more
conservative in its cost estimates, but test cases will help here.
llvm-svn: 173342
a cost function that seems both a bit ad-hoc and also poorly suited to
evaluating constant expressions.
Notably, it is missing any support for trivial expressions such as
'inttoptr'. I could fix this routine, but it isn't clear to me all of
the constraints its other users are operating under.
The core protection that seems relevant here is avoiding the formation
of a select instruction with a further chain of select operations in
a constant expression operand. Just explicitly encode that constraint.
Also, update the comments and organization here to make it clear where
this needs to go -- this should be driven off of real cost measurements
which take into account the number of constants expressions and the
depth of the constant expression tree.
llvm-svn: 173340
terms of cost rather than hoisting a single instruction.
This does *not* change the cost model! We still set the cost threshold
at 1 here, it's just that we track it by accumulating cost rather than
by storing an instruction.
The primary advantage is that we no longer leave no-op intrinsics in the
basic block. For example, this will now move both debug info intrinsics
and a single instruction, instead of only moving the instruction and
leaving a basic block with nothing but debug info intrinsics in it, with
those intrinsics no longer ordered correctly with the hoisted value.
Instead, we now splice the entire conditional basic block's instruction
sequence.
This also places the code for checking the safety of hoisting next to
the code computing the cost.
Currently, the only observable side-effect of this change is that debug
info intrinsics are no longer abandoned. I'm not sure how to craft
a test case for this, and my real goal was the refactoring, but I'll
talk to Dave or Eric about how to add a test case for this.
llvm-svn: 173339
Previously, the code would scan the PHI nodes and build up a small
setvector of candidate value pairs in phi nodes to go and rewrite. Once
certain the rewrite could be performed, the code walks the set, and for
each one re-scans the entire PHI node list looking for nodes whose
operands need rewriting.
Instead, scan the PHI nodes once to check for hazards, and then scan it
a second time to rewrite the operands to selects. No set vector, and
a max of two scans.
The only downside is that we might form identical selects, but
instcombine or anything else should fold those easily, and it seems
unlikely to happen often.
llvm-svn: 173337
pretty in doxygen, adding some of the details actually present in
a classic example where this matters (a loop from gzip and many other
compression algorithms), and a cautionary note about the risks inherent
in the transform. This has come up on the mailing lists recently, and
I suspect folks reading this code could benefit from going and looking
at the MI pass that can really deal with these issues.
llvm-svn: 173329
used uninitialized, since it fails to understand that Array is only used when
SingleValue is not, and outputs a warning. It also seems generally safer given
that the constructor is non-trivial and has plenty of early exits.
llvm-svn: 173242
Collections of attributes are handled via the AttributeSet class now. This
finally frees us up to make significant changes to how attributes are structured.
llvm-svn: 173228
Because the Attribute class is going to stop representing a collection of
attributes, limit the use of it as an aggregate in favor of using AttributeSet.
This replaces some of the uses for querying the function attributes.
llvm-svn: 172844
through as a reference rather than a pointer. There is always *some*
implementation of this available, so this simplifies code by not having
to test for whether it is available or not.
Further, it turns out there were piles of places where SimplifyCFG was
recursing and not passing down either TD or TTI. These are fixed to be
more pedantically consistent even though I don't have any particular
cases where it would matter.
llvm-svn: 171691
next to its only user. This helper relies on TargetLowering information
that shouldn't be generally used throughout the Transforms library, and
so it made little sense as a generic utility.
This also consolidates the file where we need to remove the remaining
uses of TargetLowering in favor of the IR-layer abstract interface in
TargetTransformInfo.
llvm-svn: 171590
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
utils/sort_includes.py script.
Most of these are updating the new R600 target and fixing up a few
regressions that have crept in since the last time I sorted the
includes.
llvm-svn: 171362
When ASan replaces <alloca instruction> with
<offset into a common large alloca>, it should also patch
llvm.dbg.declare calls and replace debug info descriptors to mark
that we've replaced alloca with a value that stores an address
of the user variable, not the user variable itself.
See PR11818 for more context.
llvm-svn: 169984
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
We're iterating over a non-deterministically ordered container looking
for two saturating flags. To do this correctly, we have to saturate
both, and only stop looping if both saturate to their final value.
Otherwise, which flag we see first changes the result.
This is also a micro-optimization of the previous version as now we
don't go into the (possibly expensive) test logic once the first
violation of either constraint is detected.
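The corrected loop shape, as a self-contained sketch with hypothetical
predicates standing in for the two conditions:

#include <vector>

bool checkA(int X) { return X < 0; }    // hypothetical condition A
bool checkB(int X) { return X > 100; }  // hypothetical condition B

bool bothSaturate(const std::vector<int> &Container) {
  bool A = false, B = false;
  for (int Item : Container) {  // iteration order is not deterministic
    A = A || checkA(Item);
    B = B || checkB(Item);
    if (A && B)
      break;  // early exit is safe only once *both* flags have saturated
  }
  return A && B;
}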
llvm-svn: 168989
functionality changed.
Evan's commit r168970 moved the code that the primary comment in this
function referred to to the other end of the function without moving the
comment, and there has been a steady creep of "boolean" logic in it that
is simpler if handled via early exit. That way each special case can
have its own comments. I've also made the variable name a bit more
explanatory than "AllFit". This is in preparation to fix the
non-deterministic output of this function.
llvm-svn: 168988
This patch migrates the puts optimizations from the simplify-libcalls
pass into the instcombine library call simplifier.
All the simplifiers from simplify-libcalls have now been migrated to
instcombine. Yay! Just a few other bits to migrate (prototype attribute
inference and a few statistics) and simplify-libcalls can finally be put
to rest.
llvm-svn: 168925
When code deletes the context, the AttributeImpls that the AttrListPtr points to
become invalid. Therefore, instead of keeping a separate managed static for the
AttrListPtrs that's reference counted, move it into the LLVMContext and delete
it when deleting the AttributeImpls.
llvm-svn: 168354
This patch migrates the math library call simplifications from the
simplify-libcalls pass into the instcombine library call simplifier.
I have typically migrated just one simplifier at a time, but the math
simplifiers are interdependent because:
1. CosOpt, PowOpt, and Exp2Opt all depend on UnaryDoubleFPOpt.
2. CosOpt, PowOpt, Exp2Opt, and UnaryDoubleFPOpt all depend on
the option -enable-double-float-shrink.
These two factors made migrating each of these simplifiers individually
more of a pain than it would be worth. So, I migrated them all together.
llvm-svn: 167815
The library call simplifier folds memcmp calls with all constant arguments
to a constant. For example:
memcmp("foo", "foo", 3) -> 0
memcmp("hel", "foo", 3) -> 1
memcmp("foo", "hel", 3) -> -1
The folding is implemented in terms of the system memcmp that LLVM gets
linked with. It currently just blindly uses the value returned from
the system memcmp as the folded constant.
This patch normalizes the values returned from the system memcmp to
(-1, 0, 1) so that we get consistent results across multiple platforms.
The test cases were adjusted accordingly.
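A sketch of that normalization (function name mine):

#include <cstring>

int foldedMemcmp(const void *LHS, const void *RHS, std::size_t N) {
  int R = std::memcmp(LHS, RHS, N);
  return (R > 0) - (R < 0);  // clamp the host's result to -1, 0, or 1
}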
llvm-svn: 167726
In some cases the library call simplifier may need to replace instructions
other than the library call being simplified. In those cases it may be
necessary for clients of the simplifier to override how the replacements
are actually done. As such, a new overrideable method for replacing
instructions was added to LibCallSimplifier.
A new subclass of LibCallSimplifier is also defined which overrides
the instruction replacement method. This is because the instruction
combiner defines its own replacement method which updates the worklist
when instructions are replaced.
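A hedged sketch of that override hook (simplified; class and method names
approximated):

#include "llvm/IR/Instruction.h"
#include "llvm/IR/Value.h"

class LibCallSimplifierBase {
public:
  virtual ~LibCallSimplifierBase() = default;
  // Default replacement: RAUW and erase.
  virtual void replaceAllUsesWith(llvm::Instruction *I, llvm::Value *With) {
    I->replaceAllUsesWith(With);
    I->eraseFromParent();
  }
};

class InstCombineLibCallSimplifier : public LibCallSimplifierBase {
  void replaceAllUsesWith(llvm::Instruction *I, llvm::Value *With) override {
    // ...also push affected instructions onto the instcombine worklist...
    LibCallSimplifierBase::replaceAllUsesWith(I, With);
  }
};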
llvm-svn: 167681
Several of the simplifiers migrated from the simplify-libcalls pass to
the instcombine pass were not correctly checking the target library
information to gate the simplifications. This patch ensures that the
check is made.
llvm-svn: 167660
r165941: Resubmit the changes to llvm core to update the functions to
support different pointer sizes on a per address space basis.
Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specifically w.r.t. alternate address spaces.
However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.
In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.
In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.
This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and which is not likely
to be finished before the 3.2 release, which looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.
llvm-svn: 167222
getIntPtrType support for multiple address spaces via a pointer type,
and also introduced a crasher bug in the constant folder reported in
PR14233.
These commits also contained several problems that should really be
addressed before they are re-committed. I have avoided reverting various
cleanups to the DataLayout APIs that are reasonable to have moving
forward in order to reduce the amount of churn, and minimize the number
of commits that were reverted. I've also manually updated merge
conflicts and manually arranged for the getIntPtrType function to stay
in DataLayout and to be defined in a plausible way after this revert.
Thanks to Duncan for working through this exact strategy with me, and
Nick Lewycky for tracking down the really annoying crasher this
triggered. (Test case to follow in its own commit.)
After discussing with Duncan extensively, and based on a note from
Micah, I'm going to continue to back out some more of the more
problematic patches in this series in order to ensure we go into the
LLVM 3.2 branch with a reasonable story here. I'll send a note to
llvmdev explaining what's going on and why.
Summary of reverted revisions:
r166634: Fix a compiler warning with an unused variable.
r166607: Add some cleanup to the DataLayout changes requested by
Chandler.
r166596: Revert "Back out r166591, not sure why this made it through
since I cancelled the command. Bleh, sorry about this!
r166591: Delete a directory that wasn't supposed to be checked in yet.
r166578: Add in support for getIntPtrType to get the pointer type based
on the address space.
llvm-svn: 167221
- Use value handle tricks to communicate use replacements instead of forgetLoop; this is a lot faster.
- Move the "big hammer" out of the main loop so it's not called for every instruction.
This should recover most (if not all) compile time regressions introduced by this code.
llvm-svn: 167136
By propagating the value for the switch condition, LLVM can now build
lookup tables for code such as:
switch (x) {
case 1: return 5;
case 2: return 42;
case 3: case 4: case 5:
return x - 123;
default:
return 123;
}
Given that x is known for each case, "x - 123" becomes a constant for
cases 3, 4, and 5.
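A hand-written sketch of the table the pass can now build for the example
above; because x is known in each case, "x - 123" folds to a constant per
slot:

static const int Table[5] = {5, 42, 3 - 123, 4 - 123, 5 - 123};

int lookup(unsigned x) {
  unsigned Index = x - 1;                 // minimum case value is 1
  return Index < 5 ? Table[Index] : 123;  // 123 is the default result
}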
llvm-svn: 167115
This patch migrates the stpcpy optimizations from the simplify-libcalls
pass into the instcombine library call simplifier. Note that the
__stpcpy_chk simplifications were migrated in a previous commit.
llvm-svn: 167083
r166198 migrated the strcpy optimization to instcombine. The strcpy
simplifier that was migrated from Transforms/Scalar/SimplifyLibCalls.cpp
was also doing some __strcpy_chk simplifications. Those fortified
simplifications were migrated as well, but introduced a bug in the
__stpcpy_chk simplifier in the process. This happened because the
__strcpy_chk and __stpcpy_chk simplifiers were both mapped to StrCpyChkOpt
which was updated with simplifications that worked for __strcpy_chk, but
not __stpcpy_chk.
This patch fixes the problem by adding proper test coverage and creating a
new simplifier for __stpcpy_chk (instead of sharing one with __strcpy_chk).
llvm-svn: 167082
When the switch-to-lookup tables transform landed in SimplifyCFG, it
was pointed out that this could be inappropriate for some targets.
Since there was no way at the time for the pass to know anything about
the target, an awkward reverse-transform was added in CodeGenPrepare
that turned lookup tables back into switches for some targets.
This patch uses the new TargetTransformInfo to determine if a
switch should be transformed, and removes
CodeGenPrepare::ConvertLoadToSwitch.
llvm-svn: 167011
wrapper returns a vector of integers when passed a vector of pointers) by having
getIntPtrType itself return a vector of integers in this case. Outside of this
wrapper, I didn't find anywhere in the codebase that was relying on the old
behaviour for vectors of pointers, so give this a whirl through the buildbots.
llvm-svn: 166939
This is currently true, but may change when DA grows more aggressive caching.
Without this setting it's impossible to use DA from a LoopPass because DA is a
function pass and cannot be properly scheduled in between LoopPasses. The
LoopPassManager reacts to this with an infinite loop, which made this really
to debug.
llvm-svn: 166788
The LoopSimplify bug is pretty harmless because the loop goes from unanalyzable
to analyzable but the LCSSA bug is very nasty. It only comes into play with a
specific order of the LoopPassManager worklist and can cause actual
miscompilations, when a SCEV refers to a value that has been replaced with a PHI
node. SCEVExpander may then insert code into the wrong place, either violating
domination or randomly miscompiling stuff.
Comes with an extensive test case reduced from the test-suite with
bugpoint+SCEVValidator.
llvm-svn: 166787
The isValueEqualityComparison() guard at the top of SimplifySwitch()
only applies to some of the possible transformations.
The newer transformations work just fine on large switches, and the
check on predecessor count is nonsensical.
llvm-svn: 166710
deterministic, replace it with a DenseMap<std::pair<unsigned, unsigned>,
PHINode*> (we already have a map from BasicBlock to unsigned).
<rdar://problem/12541389>
llvm-svn: 166435
This patch migrates the strcpy optimizations from the simplify-libcalls pass
into the instcombine library call simplifier. Note also that StrCpyChkOpt
has been updated with a few simplifications that were being done in the
simplify-libcalls version of StrCpyOpt, but not in the migrated implementation
of StrCpyOpt. There is no reason to overload StrCpyOpt with fortified and
regular simplifications in the new model since there is already a dedicated
simplifier for __strcpy_chk.
llvm-svn: 166198
The TargetTransform changes are breaking LTO bootstraps of clang. I am
working with Nadav to figure out the problem, but I am reverting it for now
to get our buildbots working.
This reverts svn commits: 165665 165669 165670 165786 165787 165997
and I have also reverted clang svn 165741
llvm-svn: 166168
Convert the internal representation of the Attributes class into a pointer to an
opaque object that's uniqued by and stored in the LLVMContext object. The
Attributes class then becomes a thin wrapper around this opaque
object. Eventually, the internal representation will be expanded to include
attributes that represent code generation options, etc.
llvm-svn: 165917
This patch migrates the strcmp and strncmp optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.
llvm-svn: 165915
This patch migrates the strchr and strrchr optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.
llvm-svn: 165875
This patch migrates the strcat and strncat optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.
llvm-svn: 165874
This patch implements the new LibCallSimplifier class as outlined in [1].
In addition to providing the new base library simplification infrastructure,
all the fortified library call simplifications were moved over to the new
infrastructure. The rest of the library simplification optimizations will
be moved over with follow up patches.
NOTE: The original fortified library call simplifier located in the
SimplifyFortifiedLibCalls class was not removed because it is still
used by CodeGenPrepare. This class will eventually go away too.
[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052283.html
llvm-svn: 165873
When all cases of a switch statement are dead, the weights vector only has one
element, and we will get an assertion failure when calling createBranchWeights.
llvm-svn: 165759
We conservatively only check the first use to avoid walking long use chains.
This catches the common case of having both a load and a store to a pointer
supplied by a PHI node.
llvm-svn: 165232
instruction (for Intel Atom) was not being done by Clang, because
the type context used by Clang is not the default context.
It fixes the problem by getting the global context types for each div/rem
instruction in order to compare them against the types in the BypassTypeMap.
Tests for this will be done as a separate patch to Clang.
Patch by Tyler Nowicki.
llvm-svn: 165126