While the subprogram map cache used by Dead Argument Elimination works
there, I made a mistake when reusing it for Argument Promotion in
r212128 because ArgPromo may transform functions more than once whereas
DAE transforms each function only once, removing all the dead arguments
in one go.
To address this, ensure that the map is updated after each argument
promotion.
In retrospect it might be a little wasteful to create a map of all
subprograms when only handling a single CGSCC, but the alternative is
walking the debug info for each function in the CGSCC that gets updated.
It's not clear to me what the right tradeoff is there, but since the
current tradeoff seems to be working OK (and the code to keep things
updated is very cheap), let's stick with that for now.
llvm-svn: 213805
Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended.
Test Plan: All tests (make check-all) still pass.
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4481
llvm-svn: 213474
Merges equivalent loads on both sides of a hammock/diamond
and hoists into into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks it to the footer.
Can enable if conversion and tolerate better load misses
and store operand latencies.
llvm-svn: 213396
isDereferenceablePointer should not give up upon encountering any bitcast. If
we're casting from a pointer to a larger type to a pointer to a small type, we
can continue by examining the bitcast's operand. This missing capability
was noted in a comment in the function.
In order for this to work, isDereferenceablePointer now takes an optional
DataLayout pointer (essentially all callers already had such a pointer
available). Most code uses isDereferenceablePointer though
isSafeToSpeculativelyExecute (which already took an optional DataLayout
pointer), and to enable the LICM test case, LICM needs to actually provide its DL
pointer to isSafeToSpeculativelyExecute (which it was not doing previously).
llvm-svn: 212686
This reverts commit 5b55a47e94e28fbb56d0cd5d72c3db9105c15b4c.
A test case was found to crash after this was applied. I'll file a bug to track fixing this with the test case needed.
llvm-svn: 212550
This is useful for functions that are not actually available externally but
referenced by a vtable of some kind. Clang emits functions like this for the MS
ABI.
PR20182.
llvm-svn: 212337
Exposes more constant globals that can be removed by
the global optimizer. A specific example is the removal
of the static global block address array in
clang/test/CodeGen/indirect-goto.c. This change impacts only
lower optimization levels. With LTO interprocedural
const prop runs already before global opt.
llvm-svn: 212284
Matching behavior with DeadArgumentElimination (and leveraging some
now-common infrastructure), keep track of the function from debug info
metadata if arguments are promoted.
This may produce interesting debug info - since the arguments may be
missing or of different types... but at least backtraces, inlining, etc,
will be correct.
llvm-svn: 212128
There were transforms whose *intent* was to downgrade the linkage of
external objects to have internal linkage.
However, it fired on things with private linkage as well.
llvm-svn: 212104
This new IR facility allows us to represent the object-file semantic of
a COMDAT group.
COMDATs allow us to tie together sections and make the inclusion of one
dependent on another. This is required to implement features like MS
ABI VFTables and optimizing away certain kinds of initialization in C++.
This functionality is only representable in COFF and ELF, Mach-O has no
similar mechanism.
Differential Revision: http://reviews.llvm.org/D4178
llvm-svn: 211920
Folding a reference to a thread_local variable into another global
variable's initializer is very problematic, there is no relocation that
exists to represent such an access.
llvm-svn: 211762
Referencing a dllimport variable requires actually instructions, not
just a relocation. This fixes PR19955.
Differential Revision: http://reviews.llvm.org/D4249
llvm-svn: 211571
Patch removes rest part of code related to old implementation.
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
This one was the final patch.
llvm-svn: 211457
Added short description for new comparison algorithm, that introduces
total ordering among functions set.
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211456
Patch activates new implementation.
So from now, merging process should take time O(N*log(N)).
Where N size of module (we are free to measure it in
functions or in instructions). Internally FnTree represents
binary tree. So every lookup operation takes O(log(N)) time.
It is still not the last patch in series, we also have to
clean-up pass from old code, and update pass comments.
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211445
Patch removed next old FunctionComparator methods:
* enumerate
* isEquivalentOperation
* isEquivalentGEP
* isEquivalentType
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211444
introduced among functions set.
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211442
methods.
Patch changes return type of FunctionComparator::compare() and
FunctionComparator::compare(const BasicBlock*, const BasicBlock*)
methods from bool (equal or not) to {-1, 0, 1} (less, equal, great).
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211437
Summary:
Different range metadata can lead to different optimizations in later
passes, possibly breaking the semantics of the merged function. So range
metadata must be taken into consideration when comparing Load
instructions.
Thanks!
llvm-svn: 211391
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.
As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.
At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.
By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.
Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.
Summary for out of tree users:
------------------------------
+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.
llvm-svn: 210903
It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables.
This also adds backend support for generating the jump-instruction tables on ARM and X86.
Note that since the jumptable attribute creates a second function pointer for a
function, any function marked with jumptable must also be marked with unnamed_addr.
llvm-svn: 210280
This extension point allows adding passes that perform peephole optimizations
similar to the instruction combiner. These passes will be inserted after
each instance of the instruction combiner pass.
Differential Revision: http://reviews.llvm.org/D3905
llvm-svn: 209595
Summary:
This adds two new diagnostics: -pass-remarks-missed and
-pass-remarks-analysis. They take the same values as -pass-remarks but
are intended to be triggered in different contexts.
-pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed,
which passes call when they tried to apply a transformation but
couldn't.
-pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis,
which passes call when they want to inform the user about analysis
results.
The patch also:
1- Adds support in the inliner for the two new remarks and a
test case.
2- Moves emitOptimizationRemark* functions to the llvm namespace.
3- Adds an LLVMContext argument instead of making them member functions
of LLVMContext.
Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3682
llvm-svn: 209442
This patch changes the design of GlobalAlias so that it doesn't take a
ConstantExpr anymore. It now points directly to a GlobalObject, but its type is
independent of the aliasee type.
To avoid changing all alias related tests in this patches, I kept the common
syntax
@foo = alias i32* @bar
to mean the same as now. The cases that used to use cast now use the more
general syntax
@foo = alias i16, i32* @bar.
Note that GlobalAlias now behaves a bit more like GlobalVariable. We
know that its type is always a pointer, so we omit the '*'.
For the bitcode, a nice surprise is that we were writing both identical types
already, so the format change is minimal. Auto upgrade is handled by looking
through the casts and no new fields are needed for now. New bitcode will
simply have different types for Alias and Aliasee.
One last interesting point in the patch is that replaceAllUsesWith becomes
smart enough to avoid putting a ConstantExpr in the aliasee. This seems better
than checking and updating every caller.
A followup patch will delete getAliasedGlobal now that it is redundant. Another
patch will add support for an explicit offset.
llvm-svn: 209007
Patch replaces old isEquivalentGEP implementation, and changes type of
comparison result from bool (equal or not) to {-1, 0, 1} (less, equal, greater).
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 208976
Patch replaces old isEquivalentOperation implementation, and changes type of
comparison result from bool (equal or not) to {-1, 0, 1} (less, equal, greater).
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 208973
The change to ExtractGV.cpp has no functionality change except to avoid
the asserts. Existing testcases already cover this, so I didn't add a
new one.
llvm-svn: 208264
This is a third patch of patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
This patch description:
Being comparing functions we need to compare values we meet at left and
right sides.
Its easy to sort things out for external values. It just should be
the same value at left and right.
But for local values (those were introduced inside function body)
we have to ensure they were introduced at exactly the same place,
and plays the same role.
In short, patch introduces values serial numbering and comparison routine.
The last one compares two values by their serial numbers.
llvm-svn: 208189
O(N*log(N)). The idea is to introduce total ordering among functions set.
It allows to build binary tree and perform function look-up procedure in O(log(N)) time.
This patch description:
Introduced total ordering among constants implemented in cmpConstants method.
Method performs lexicographical comparison between constants represented as
hypothetical numbers of next format:
<bitcastability-trait><raw-bit-contents>
Please, read cmpConstants declaration comments for more details.
llvm-svn: 208173
Visibility is meaningless when the linkage is local. Change
`-internalize` to reset the visibility to `default`.
<rdar://problem/16141113>
llvm-svn: 207979
There is no point in creating it if we're not going to vectorize
anything. Creating the map is expensive as it creates large values.
No functionality change.
llvm-svn: 207916
This moves most of GlobalOpt's constructor optimization
code out of GlobalOpt into Transforms/Utils/CDtorUtils.{h,cpp}. The
public interface is a single function OptimizeGlobalCtorsList() that
takes a predicate returning which constructors to remove.
GlobalOpt calls this with a function that statically evaluates all
constructors, just like it did before. This part of the change is
behavior-preserving.
Also add a call to this from GlobalDCE with a filter that removes global
constructors that contain a "ret" instruction and nothing else – this
fixes PR19590.
llvm-svn: 207856
It's fishy to be changing the `std::vector<>` owned by the iterator, and
no one actual does it, so I'm going to remove the ability in a
subsequent commit. First, update the users.
<rdar://problem/14292693>
llvm-svn: 207252
override the default cold threshold.
When we use command line argument to set the inline threshold, the default
cold threshold will not be used. This is in line with how we use
OptSizeThreshold. When we want a higher threshold for all functions, we
do not have to set both inline threshold and cold threshold.
llvm-svn: 207245
definition below all of the header #include lines, lib/Transforms/...
edition.
This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.
Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.
llvm-svn: 206844
Summary:
This patch adds backend support for -Rpass=, which indicates the name
of the optimization pass that should emit remarks stating when it
made a transformation to the code.
Pass names are taken from their DEBUG_NAME definitions.
When emitting an optimization report diagnostic, the lack of debug
information causes the diagnostic to use "<unknown>:0:0" as the
location string.
This is the back end counterpart for
http://llvm-reviews.chandlerc.com/D3226
Reviewers: qcolombet
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3227
llvm-svn: 205774
The generic (concatenation) loop unroller is currently placed early in the
standard optimization pipeline. This is a good place to perform full unrolling,
but not the right place to perform partial/runtime unrolling. However, most
targets don't enable partial/runtime unrolling, so this never mattered.
However, even some x86 cores benefit from partial/runtime unrolling of very
small loops, and follow-up commits will enable this. First, we need to move
partial/runtime unrolling late in the optimization pipeline (importantly, this
is after SLP and loop vectorization, as vectorization can drastically change
the size of a loop), while keeping the full unrolling where it is now. This
change does just that.
llvm-svn: 205264
Patch by Tobias Güntner.
I tried to write a test, but the only difference is the Changed value that
gets returned. It can be tested with "opt -debug-pass=Executions -functionattrs,
but that doesn't seem worth it.
llvm-svn: 205121
When GlobalOpt has determined that a GlobalVariable only ever has two values,
it would convert the GlobalVariable to a boolean, and introduce SelectInsts
at every load, to choose between the two possible values. These SelectInsts
introduce overhead and other unpleasantness.
This patch makes GlobalOpt just add range metadata to loads from such
GlobalVariables instead. This enables the same main optimization (as seen in
test/Transforms/GlobalOpt/integer-bool.ll), without introducing selects.
The main downside is that it doesn't get the memory savings of shrinking such
GlobalVariables, but this is expected to be negligible.
llvm-svn: 204076
O(N*log(N)). The idea is to introduce total ordering among functions set.
That allows to build binary tree and perform function look-up procedure in O(log(N)) time.
This patch description:
Introduced total ordering among Type instances. Actually it is improvement for existing
isEquivalentType.
0. Coerce pointer of 0 address space to integer.
1. If left and right types are equal (the same Type* value), return 0 (means equal).
2. If types are of different kind (different type IDs). Return result of type IDs
comparison, treating them as numbers.
3. If types are vectors or integers, return result of its
pointers comparison (casted to numbers).
4. Check whether type ID belongs to the next group:
* Void
* Float
* Double
* X86_FP80
* FP128
* PPC_FP128
* Label
* Metadata
If so, return 0.
5. If left and right are pointers, return result of address space
comparison (numbers comparison).
6. If types are complex.
Then both LEFT and RIGHT will be expanded and their element types will be checked with
the same way. If we get Res != 0 on some stage, return it. Otherwise return 0.
7. For all other cases put llvm_unreachable.
llvm-svn: 203788
There's a bit of duplicated "magic" code in opt.cpp and Clang's CodeGen that
computes the inliner threshold from opt level and size opt level.
This patch moves the code to a function that lives alongside the inliner itself,
providing a convenient overload to the inliner creation.
A separate patch can be committed to Clang to use this once it's committed to
LLVM. Standalone tools that use the inlining pass can also avoid duplicating
this code and fearing it will go out of sync.
Note: this patch also restructures the conditinal logic of the computation to
be cleaner.
llvm-svn: 203669
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
llvm-svn: 203559
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
llvm-svn: 203364
to ensure we don't mess up any of the overrides. Necessary for cleaning
up the Value use iterators and enabling range-based traversing of use
lists.
llvm-svn: 202958
Move the test for this class into the IR unittests as well.
This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.
llvm-svn: 202821
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.
Another step of modularizing the support library.
llvm-svn: 202815
business.
This header includes Function and BasicBlock and directly uses the
interfaces of both classes. It has to do with the IR, it even has that
in the name. =] Put it in the library it belongs to.
This is one step toward making LLVM's Support library survive a C++
modules bootstrap.
llvm-svn: 202814
We should apply fastcc whenever profitable. We can expand this list,
but there are lots of conventions with performance implications that we
don't want to change.
Differential Revision: http://llvm-reviews.chandlerc.com/D2705
llvm-svn: 202293
During the LTO phase LICM will move loop invariant global variables out of loops
(informed by GlobalModRef). This makes more loops countable presenting
opportunity for the loop vectorizer.
Adding the loop vectorizer improves some TSVC benchmarks and twolf/ref dataset
(5%) on x86-64.
radar://15970632
llvm-svn: 202051