This reapplies r252949. I've changed the type of FuncName to be
std::string instead of StringRef in emitFnAttrCompatCheck.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 252990
Summary:
The value that the CoreCLR personality passes to a funclet for the
establisher frame may be the root function's frame or may be the parent
funclet's (mostly empty) frame in the case of nested funclets. Each
funclet stores a pointer to the root frame in its own (mostly empty)
frame, as does the root function itself. All frames allocate this slot at
the same offset, measured from the post-prolog stack pointer, so that the
same sequence can accept any ancestor as an establisher frame parameter
value, and so that a single offset can be reported to the GC, which also
looks at this slot.
This change allocate the slot when processing function entry, and records
its frame index on the WinEHFuncInfo object, then inserts the code to
set/copy it during prolog emission.
Reviewers: majnemer, AndyAyers, pgavlin, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14614
llvm-svn: 252983
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 252949
r243347 was intended to support a change to LSR (r243348). That change
to LSR has since had to be reverted (r243939) because it was buggy, and
now the code added in r243347 is untested and unexercised. Given that,
I think it is appropriate to revert r243347 for now, with the intent of
adding it back in later if I get around to checking in a fixed version
of r243348.
llvm-svn: 252948
Summary:
This change addresses two possible instances of user error / confusion when
merging sampled profile data.
Previously any input that didn't match the raw or processed instrumented format
would automatically be interpreted as instrumented profile text format data.
No error would be reported during the merge.
Example:
If foo-sampled.profdata and bar-sampled.profdata are binary sampled profiles:
Old behavior:
$ llvm-profdata merge foo-sampled.profdata bar-sampled.profdata -output foobar-sampled.profdata
$ llvm-profdata show -sample foobar-sampled.profdata
error: foobar-sampled.profdata:1: Expected 'mangled_name:NUM:NUM', found lprofi
This change adds basic checks for valid input data when assuming text input.
It also makes error messages related to file format validity more specific about
the assumbed profile data type.
New behavior:
$ llvm-profdata merge foo-sampled.profdata bar-sampled.profdata -o foobar-sampled.profdata
error: foo.profdata: Unrecognized instrumentation profile encoding format
Perhaps you forgot to use the -sample option?
Reviewers: bogner, davidxl, dnovillo
Subscribers: davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D14558
llvm-svn: 252916
Summary:
This patch changes ARMV5, ARMV5E, ARMV6SM, ARMV6HL, ARMV7, ARMV7L,
ARMV7HL, ARMV7EM to be treated as aliases for the corresponding
standard architectures, instead of as actual architectures.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14577
llvm-svn: 252903
Summary:
Support for R_MIPS_NONE allows us to parse MIPS16's usage of .reloc.
R_MIPS_32 was included to be able to better test the directive.
Targets can add their relocations by overriding MCAsmBackend::getFixupKind().
Subscribers: grosbach, rafael, majnemer, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D13659
llvm-svn: 252888
Several backends have instructions to reverse the order of bits in an integer. Conceptually matching such patterns is similar to @llvm.bswap, and it was mentioned in http://reviews.llvm.org/D14234 that it would be best if these patterns were matched in InstCombine instead of reimplemented in every different target.
This patch introduces an intrinsic @llvm.bitreverse.i* that operates similarly to @llvm.bswap. For plumbing purposes there is also a new ISD node ISD::BITREVERSE, with simple expansion and promotion support.
The intention is that InstCombine's BSWAP detection logic will be extended to support BITREVERSE too, and @llvm.bitreverse intrinsics emitted (if the backend supports lowering it efficiently).
llvm-svn: 252878
Added "macro" option to "-debug-dump" flag, which trigger parsing and dumping of the ".debug_macinfo" section.
Differential Revision: http://reviews.llvm.org/D14294
llvm-svn: 252866
When working with tokens, it is often the case that one has instructions
which consume a token and produce a new token. Currently, we have no
mechanism to represent an initial token state.
Instead, we can create a notional "empty token" by inventing a new
constant which captures the semantics we would like. This new constant
is called ConstantTokenNone and is written textually as "token none".
Differential Revision: http://reviews.llvm.org/D14581
llvm-svn: 252811
Summary:
This change introduces the notion of "deoptimization" operand bundles.
LLVM can recognize and optimize these in more precise ways than it can a
generic "unknown" operand bundles.
The current form of this special recognition / optimization is an enum
entry in LLVMContext, a LangRef blurb and a verifier rule. Over time we
will teach LLVM to do more aggressive optimization around deoptimization
operand bundles, exploiting known facts about kinds of state
deoptimization operand bundles are allowed to track.
Reviewers: reames, majnemer, chandlerc, dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14551
llvm-svn: 252806
This is a step towards consolidating some of the information regarding
attributes in a single place.
This patch moves the enum attributes in Attributes.h to the table-gen
file. Additionally, it adds definitions of target independent string
attributes that will be used in follow-up commits by the inliner to
check attribute compatibility.
rdar://problem/19836465
llvm-svn: 252796
This patch adds DWARF values for the Delphi language and Borland C++
language extensions.
Reviewed by: dblaikie
Subscribers: llvm-commits, majnemer
Differential Revision: http://reviews.llvm.org/D14522
llvm-svn: 252776
Summary: Inlined callsites need to be emitted in debug info so that sample profile can be annotated to the correct inlined instance.
Reviewers: dnovillo, dblaikie
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D14511
llvm-svn: 252768
Re-implement `ilist_node::getNextNode()` and `getPrevNode()` without
relying on the sentinel having a "next" pointer. Instead, get access to
the owning list and compare against the `begin()` and `end()` iterators.
This only works when the node *can* get access to the owning list. The
new support is in `ilist_node_with_parent<>`, and any class `Ty`
inheriting from `ilist_node<NodeTy>` that wants `getNextNode()` and/or
`getPrevNode()` should inherit from
`ilist_node_with_parent<NodeTy, ParentTy>` instead. The requirements:
- `NodeTy` must have a `getParent()` function that returns the parent.
- `ParentTy` must have a `getSublistAccess()` static that, given a(n
ignored) `NodeTy*` (to determine which list), returns a member field
pointer to the appropriate `ilist<>`.
This isn't the cleanest way to get access to the owning list, but it
leverages the API already used in the IR hierarchy (see, e.g.,
`Instruction::getSublistAccess()`).
If anyone feels like ripping out the calls to `getNextNode()` and
`getPrevNode()` and replacing with direct iterator logic, they can also
remove the access function, etc., but as an incremental step, I'm
maintaining the API where it's currently used in tree.
If these requirements are *not* met, call sites with access to the ilist
can call `iplist<NodeTy>::getNextNode(NodeTy*)` directly, as in
ilistTest.cpp.
Why rewrite this?
The old code was broken, calling `getNext()` on a sentinel that possibly
didn't have a "next" pointer at all! The new code avoids that
particular flavour of UB (see the commit message for r252538 for more
details about the "lucky" memory layout that made this function so
interesting).
There's still some UB here: the end iterator gets downcast to `NodeTy*`,
even when it's a sentinel (which is typically
`ilist_half_node<NodeTy*>`). I'll tackle that in follow-up commits.
See this llvm-dev thread for more details:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091115.html
What's the danger?
There might be some code that relies on `getNextNode()` or
`getPrevNode()` *never* returning `nullptr` -- i.e., that relies on them
being broken when the sentinel is an `ilist_half_node<NodeTy>`. I tried
to root out those cases with the audits I did leading up to r252380, but
it's possible I missed one or two. I hope not.
(If (1) you have out-of-tree code, (2) you've reverted r252380
temporarily, and (3) you get some weird crashes with this commit, then I
recommend un-reverting r252380 and auditing the compile errors looking
for "strange" implicit conversions.)
llvm-svn: 252694
For CoreCLR on Windows, stack probes must be emitted as inline sequences that probe successive stack pages
between the current stack limit and the desired new stack pointer location. This implements support for
the inline expansion on x64.
For in-body alloca probes, expansion is done during instruction lowering. For prolog probes, a stub call
is initially emitted during prolog creation, and expanded after epilog generation, to avoid complications
that arise when introducing new machine basic blocks during prolog and epilog creation.
Added a new test case, modified an existing one to exclude non-x64 coreclr (for now).
Add test case
Fix tests
llvm-svn: 252578
- Make indexed value profile data more compact by peeling out
the per-site value count field into its own smaller sized array.
- Introduced formal data structure definitions to specify value
profile data layout in indexed format. Previously the layout
of the data is only assumed in the client code (scattered in
three different places : size computation, EmitData, and ReadData
- The new data structure serves as a central place for layout documentation.
- Add interfaces to force BE output for value profile data (testing purpose)
- Add byte swap unit tests
Differential Revision: http://reviews.llvm.org/D14401
llvm-svn: 252563
Be honest about using iterator semantics in `SlotIndex::getNextSlot()`
and `SlotIndex::getPrevSlot()`. Instead of calling `getNextNode()` --
which is documented (but fails) to check for the sentinel -- call
`&*++getIterator()`.
This is (surprisingly!) a NFC commit. `ilist_traits<IndexListEntry>`
has an `ilist_half_node<IndexListEntry>` as a sentinel (and no other
fields), and so the layout of `ilist<IndexListEntry>` is:
--
struct ilist<IndexListEntry> {
ilist_half_node<IndexListEntry> Sentinel;
IndexListEntry *Head;
IndexListEntry *getHead() { return Head; }
IndexListEntry *getSentinel() { return cast<...>(&Sentinel); }
};
--
In memory, this happens to look just like:
--
struct ilist<IndexListEntry> {
ilist_node<IndexListEntry> Sentinel;
IndexListEntry *getHead() { return Sentinel.getNext(); }
IndexListEntry *getSentinel() { return cast<...>(&Sentinel); }
};
--
As a result, `ilist_node<IndexListEntry>::getNextNode()` that checks
`getNext()` of the possible sentinel will get a pointer to the head of
the list; it will never detect the sentinel, and will return the
sentinel itself instead of `nullptr` in the special cases.
Since `getNextNode()` and `getPrevNode()` don't work, just be honest
that we're not checking for the end/beginning of the list here. Since
this code works, I guess we must never go past the sentinel.
(It's possible we're just getting lucky, and the new code will get
"lucky" in the same situations. To properly fix that hypothetical bug,
we would need to check the iterator against `end()`/`begin()`.)
llvm-svn: 252538
SCEVUnionPredicate is copied constructed here: lib/Transforms/Scalar/LoopDistribute.cpp:793
and move assigned (which can use the base class's copy ctor just
fine/without extra cost (I'd add it if it weren't for MSVC's issues
meaning = default is insufficient)) here: lib/Transforms/Utils/LoopVersioning.cpp:46
llvm-svn: 252537
This is a prerequisite for further optimisations of these functions,
which will be commited as a separate patch.
Differential Revision: http://reviews.llvm.org/D14219
llvm-svn: 252535
Summary:
LAA currently generates a set of SCEV predicates that must be checked by users.
In the case of Loop Distribute/Loop Load Elimination, no such predicates could have
been emitted, since we don't allow stride versioning. However, in the future there
could be SCEV predicates that will need to be checked.
This change adds support for SCEV predicate versioning in the Loop Distribute, Loop
Load Eliminate and the loop versioning infrastructure.
Reviewers: anemet
Subscribers: mssimpso, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D14240
llvm-svn: 252467
"GCC requires the freestanding environment provide memcpy, memmove, memset
and memcmp": https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Standards.html
Hence in GNUEABI targets LLVM should not convert 'memops' to their equivalent
'__aeabi_memops'. This convertion violates GCC contract.
The -meabi flag controls whether or not LLVM will modify 'memops' in GNUEABI
targets.
Without -meabi: use the triple default EABI.
With -meabi=default: use the triple default EABI.
With -meabi=gnu: use 'memops'.
With -meabi=4 or -meabi=5: use '__aeabi_memops'.
With -meabi set to an unknown value: same as -meabi=default.
Patch by Vinicius Tinti.
llvm-svn: 252462
Summary: Mimic parseTriple(); and exposes it to LTOModule.cpp
Reviewers: dexonsmith, rafael
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 252442
This commit adds enums in LLVMBitCodes.h to improve readability and
maintainability. This is a follow-up to r252368 which was discussed
here:
http://reviews.llvm.org/D12923
llvm-svn: 252395
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.
Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.
Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
Reviewers: pgavlin, majnemer, rnk
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14344
llvm-svn: 252383
This reverts commit r252373, reapplying r252372 now that I've updated
clang-tools-extra. Original commit message follows.
ADT: Require explicit ilist iterator/pointer conversions
Disallow implicit conversions between ilist iterators and element
points. Explicit conversions still work of course.
This is the first step toward removing the undefined behaviour in
`ilist` and `iplist`:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091115.html
The motivation for removing the implicit iterators is that I came across
real bugs (that were *really* getting lucky). More details and some
brief discussion later in that thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091617.html
Note: if you have out-of-tree code, it should be fairly easy to revert
this patch downstream while you update your out-of-tree call sites.
Note that these conversions are occasionally latent bugs (that may
happen to "work" now, but only because of getting lucky with UB;
follow-ups will change your luck). When they are valid, I suggest using
`->getIterator()` to go from pointer to iterator, and `&*` to go from
iterator to pointer.
llvm-svn: 252380
Disallow implicit conversions between ilist iterators and element
points. Explicit conversions still work of course.
This is the first step toward removing the undefined behaviour in
`ilist` and `iplist`:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091115.html
The motivation for removing the implicit iterators is that I came across
real bugs (that were *really* getting lucky). More details and some
brief discussion later in that thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091617.html
Note: if you have out-of-tree code, it should be fairly easy to revert
this patch downstream while you update your out-of-tree call sites.
Note that these conversions are occasionally latent bugs (that may
happen to "work" now, but only because of getting lucky with UB;
follow-ups will change your luck). When they are valid, I suggest using
`->getIterator()` to go from pointer to iterator, and `&*` to go from
iterator to pointer.
llvm-svn: 252372
This marker prevents optimization passes from adding 'tail' or
'musttail' markers to a call. Is is used to prevent tail call
optimization from being performed on the call.
rdar://problem/22667622
Differential Revision: http://reviews.llvm.org/D12923
llvm-svn: 252368
Summary:
This change makes the `isImpliedCondition` interface similar to the rest
of the functions in ValueTracking (in that it takes a DataLayout,
AssumptionCache etc.). This is an NFC, intended to make a later diff
less noisy.
Depends on D14369
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14391
llvm-svn: 252333
Summary:
In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.
Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer
Subscribers: MatzeB, llvm-commits
Differential Revision: http://reviews.llvm.org/D14407
llvm-svn: 252318
We now create the .eh_frame section early, just like every other special
section.
This means that the special flags are visible in code that explicitly
asks for ".eh_frame".
llvm-svn: 252313
This attribute allows the compiler to assume that the function never recurses into itself, either directly or indirectly (transitively). This can be used among other things to demote global variables to locals.
llvm-svn: 252282
The bug: I missed adding break statements in the switch / case.
Original commit message:
[SCEV] Teach SCEV some axioms about non-wrapping arithmetic
Summary:
- A s< (A + C)<nsw> if C > 0
- A s<= (A + C)<nsw> if C >= 0
- (A + C)<nsw> s< A if C < 0
- (A + C)<nsw> s<= A if C <= 0
Right now `C` needs to be a constant, but we can later generalize it to
be a non-constant if needed.
Reviewers: atrick, hfinkel, reames, nlewycky
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13686
llvm-svn: 252236
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.
For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.
This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.
Since this is an IR change, a bitcode upgrade has been provided.
Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.
Differential Revision: http://reviews.llvm.org/D14265
llvm-svn: 252219
Also, remove an enum hack where enum values were used as indexes into an array.
We may want to make this a real class to allow pattern-based queries/customization (D13417).
llvm-svn: 252196
The needed lld matching changes to be submitted immediately next,
but this revision will cause lld failures with this alone which is expected.
This removes the eating of the error in Archive::Child::getSize() when the characters
in the size field in the archive header for the member is not a number. To do this we
have all of the needed methods return ErrorOr to push them up until we get out of lib.
Then the tools and can handle the error in whatever way is appropriate for that tool.
So the solution is to plumb all the ErrorOr stuff through everything that touches archives.
This include its iterators as one can create an Archive object but the first or any other
Child object may fail to be created due to a bad size field in its header.
Thanks to Lang Hames on the changes making child_iterator contain an
ErrorOr<Child> instead of a Child and the needed changes to ErrorOr.h to add
operator overloading for * and -> .
We don’t want to use llvm_unreachable() as it calls abort() and is produces a “crash”
and using report_fatal_error() to move the error checking will cause the program to
stop, neither of which are really correct in library code. There are still some uses of
these that should be cleaned up in this library code for other than the size field.
The test cases use archives with text files so one can see the non-digit character,
in this case a ‘%’, in the size field.
These changes will require corresponding changes to the lld project. That will be
committed immediately after this change. But this revision will cause lld failures
with this alone which is expected.
llvm-svn: 252192
With this change, instrumentation code and reader/write
code related to profile data structs are kept strictly
in-sync. THis will be extended to cfe and compile-rt
references as well.
Differential Revision: http://reviews.llvm.org/D13843
llvm-svn: 252113
1. A macro with argument: LLVM_PACKED(StructDefinition)
2. A pair of macros defining scope of region with packing:
LLVM_PACKED_START
struct A { ... };
struct B { ... };
LLVM_PACKED_END
Differential Revision: http://reviews.llvm.org/D14337
llvm-svn: 252099
Splits PrintLoopPass into a new-style pass and a PrintLoopPassWrapper,
much like we already do for PrintFunctionPass and PrintModulePass.
llvm-svn: 252085
This is part-1 of the patch that replaces all edge weights in MBB by
probabilities, which only adds new interfaces. No functional changes.
Differential revision: http://reviews.llvm.org/D13908
llvm-svn: 252083
Summary:
Data operands of a call or invoke consist of the call arguments, and
the bundle operands associated with the `call` (or `invoke`)
instruction. The motivation for this change is that we'd like to be
able to query "argument attributes" like `readonly` and `nocapture`
for bundle operands naturally.
This change also provides a conservative "implementation" for these
attributes for any bundle operand, and an extension point for future
work.
Reviewers: chandlerc, majnemer, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14305
llvm-svn: 252077
We can often end up with conditional stores that cannot be speculated. They can come from fairly simple, idiomatic code:
if (c & flag1)
*a = x;
if (c & flag2)
*a = y;
...
There is no dominating or post-dominating store to a, so it is not legal to move the store unconditionally to the end of the sequence and cache the intermediate result in a register, as we would like to.
It is, however, legal to merge the stores together and do the store once:
tmp = undef;
if (c & flag1)
tmp = x;
if (c & flag2)
tmp = y;
if (c & flag1 || c & flag2)
*a = tmp;
The real power in this optimization is that it allows arbitrary length ladders such as these to be completely and trivially if-converted. The typical code I'd expect this to trigger on often uses binary-AND with constants as the condition (as in the above example), which means the ending condition can simply be truncated into a single binary-AND too: 'if (c & (flag1|flag2))'. As in the general case there are bitwise operators here, the ladder can often be optimized further too.
This optimization involves potentially increasing register pressure. Even in the simplest case, the lifetime of the first predicate is extended. This can be elided in some cases such as using binary-AND on constants, but not in the general case. Threading 'tmp' through all branches can also increase register pressure.
The optimization as in this patch is enabled by default but kept in a very conservative mode. It will only optimize if it thinks the resultant code should be if-convertable, and additionally if it can thread 'tmp' through at least one existing PHI, so it will only ever in the worst case create one more PHI and extend the lifetime of a predicate.
This doesn't trigger much in LNT, unfortunately, but it does trigger in a big way in a third party test suite.
llvm-svn: 252051
This was breaking the modules build and is being reverted while we reach consensus on the right way to solve this layering problem. This reverts commit r251785.
llvm-svn: 252040
Intended to make later changes simpler. Exposes
`getBundleOperandsStartIndex` and `getBundleOperandsEndIndex`, and uses
them for the computation in `getNumTotalBundleOperands`.
llvm-svn: 252037
Summary:
The goal of this pass is to perform store-to-load forwarding across the
backedge of a loop. E.g.:
for (i)
A[i + 1] = A[i] + B[i]
=>
T = A[0]
for (i)
T = T + B[i]
A[i + 1] = T
The pass relies on loop dependence analysis via LoopAccessAnalisys to
find opportunities of loop-carried dependences with a distance of one
between a store and a load. Since it's using LoopAccessAnalysis, it was
easy to also add support for versioning away may-aliasing intervening
stores that would otherwise prevent this transformation.
This optimization is also performed by Load-PRE in GVN without the
option of multi-versioning. As was discussed with Daniel Berlin in
http://reviews.llvm.org/D9548, this is inferior to a more loop-aware
solution applied here. Hopefully, we will be able to remove some
complexity from GVN/MemorySSA as a consequence.
In the long run, we may want to extend this pass (or create a new one if
there is little overlap) to also eliminate loop-indepedent redundant
loads and store that *require* versioning due to may-aliasing
intervening stores/loads. I have some motivating cases for store
elimination. My plan right now is to wait for MemorySSA to come online
first rather than using memdep for this.
The main motiviation for this pass is the 456.hmmer loop in SPECint2006
where after distributing the original loop and vectorizing the top part,
we are left with the critical path exposed in the bottom loop. Being
able to promote the memory dependence into a register depedence (even
though the HW does perform store-to-load fowarding as well) results in a
major gain (~20%). This gain also transfers over to x86: it's
around 8-10%.
Right now the pass is off by default and can be enabled
with -enable-loop-load-elim. On the LNT testsuite, there are two
performance changes (negative number -> improvement):
1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the
critical paths is reduced
2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this
outside of LNT
The pass is scheduled after the loop vectorizer (which is after loop
distribution). The rational is to try to reuse LAA state, rather than
recomputing it. The order between LV and LLE is not critical because
normally LV does not touch scalar st->ld forwarding cases where
vectorizing would inhibit the CPU's st->ld forwarding to kick in.
LoopLoadElimination requires LAA to provide the full set of dependences
(including forward dependences). LAA is known to omit loop-independent
dependences in certain situations. The big comment before
removeDependencesFromMultipleStores explains why this should not occur
for the cases that we're interested in.
Reviewers: dberlin, hfinkel
Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13259
llvm-svn: 252017
Summary: Will be used by the LoopLoadElimination pass.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13258
llvm-svn: 252016
Summary:
The functions use LAI and MemoryDepChecker classes so they need to be
defined after those definitions outside of the Dependence class.
Will be used by the LoopLoadElimination pass.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13257
llvm-svn: 252015
A profile of an LTO link of Chrome revealed that we were spending some
~30-50% of execution time in the function Constant::getRelocationInfo(),
which is called from TargetLoweringObjectFile::getKindForGlobal() and in turn
from TargetMachine::getNameWithPrefix().
It turns out that we only need the result of getKindForGlobal() when
targeting Mach-O, so this change moves the relevant part of the logic to
TargetLoweringObjectFileMachO.
NFCI.
Differential Revision: http://reviews.llvm.org/D14168
llvm-svn: 252014
Introduce DIPrinter which takes care of rendering DILineInfo and
friends. This allows LLVMSymbolizer class to return a structured data
instead of plain std::strings.
llvm-svn: 251989
Summary:
We now collect all types of dependences including lexically forward
deps not just "interesting" ones.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13256
llvm-svn: 251985
Make printDILineInfo and friends responsible for just rendering the
contents of the structures, demangling should actually be performed
earlier, when we have the information about the originating
SymbolizableModule at hand.
llvm-svn: 251981
Summary:
When the dependence distance in zero then we have a loop-independent
dependence from the earlier to the later access.
No current client of LAA uses forward dependences so other than
potentially hitting the MaxDependences threshold earlier, this change
shouldn't affect anything right now.
This and the previous patch were tested together for compile-time
regression. None found in LNT/SPEC.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13255
llvm-svn: 251973
Bypassing LLVM for this has a number of benefits:
1) Laziness support becomes asm-syntax agnostic (previously lazy jitting didn't
work on Windows as the resolver block was in Darwin asm).
2) For cross-process JITs, it allows resolver blocks and trampolines to be
emitted directly in the target process, reducing cross process traffic.
3) It should be marginally faster.
llvm-svn: 251933
When push instructions are being used to pass function arguments on
the stack, and either EH or debugging are enabled, we need to generate
.cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is
enough for the CFA offset to be correct at every call site, while
for debugging we want to be correct after every push.
Darwin does not support this well, so don't use pushes whenever it
would be required.
Differential Revision: http://reviews.llvm.org/D13767
llvm-svn: 251904
ScheduleDAGInstrs doesn't behave differently before or after register
allocation. It was only used in a method of MachineSchedulerBase which
behaved differently in MachineScheduler/PostMachineScheduler. Change
this to let MachineScheduler/PostMachineScheduler just pass in a
parameter to that function.
The order of the LiveIntervals* and bool RemoveKillFlags paramters have
been switched to make out-of-tree code fail instead of unintentionally
passing a value intended for the IsPostRA flag to the (previously
following and default initialized) RemoveKillFlags.
Differential Revision: http://reviews.llvm.org/D14245
llvm-svn: 251883
This reverts commit r251837, due to a number of bot failures of the form:
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-fast/llvm.obj/tools/llvm-link/Release+Asserts/llvm-link.o:llvm-link.cpp:function
loadIndex(llvm::LLVMContext&, llvm::Module const*): error: undefined
reference to
'llvm::object::FunctionIndexObjectFile::create(llvm::MemoryBufferRef,
llvm::LLVMContext&, llvm::Module const*, bool)'
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-fast/llvm.obj/tools/llvm-link/Release+Asserts/llvm-link.o:llvm-link.cpp:function
loadIndex(llvm::LLVMContext&, llvm::Module const*): error: undefined
reference to 'llvm::object::FunctionIndexObjectFile::takeIndex()'
I'm not sure why these are happening - I added Object to the requred
libraries in tools/llvm-link/LLVMBuild.txt and the LLVM_LINK_COMPONENTS
in tools/llvm-link/CMakeLists.txt. Confirmed for my build that these
symbols come out of libLLVMObject.a. What am I missing?
llvm-svn: 251841
Summary:
Support for necessary linkage changes and symbol renaming during
ThinLTO function importing.
Also includes llvm-link support for manually importing functions
and associated llvm-link based tests.
Note that this does not include support for intelligently importing
metadata, which is currently imported duplicate times. That support will
be in the follow-on patch, and currently is ignored by the tests.
Reviewers: dexonsmith, joker.eph, davidxl
Subscribers: tobiasvk, tejohnson, llvm-commits
Differential Revision: http://reviews.llvm.org/D13515
llvm-svn: 251837
Summary:
SCEV Predicates represent conditions that typically cannot be derived from
static analysis, but can be used to reduce SCEV expressions to forms which are
usable for different optimizers.
ScalarEvolution now has the rewriteUsingPredicate method which can simplify a
SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using
SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions
need to be made a new SCEV Predicate would be created and added to the set.
Each time after calling getSCEV, the user will call the rewriteUsingPredicate
method.
We add two types of predicates
SCEVPredicateSet - implements a set of predicates
SCEVEqualPredicate - tests for equality between two SCEV expressions
We use the SCEVEqualPredicate to re-implement stride versioning. Every time we
version a stride, we will add a SCEVEqualPredicate to the context.
Instead of adding specific stride checks, LoopVectorize now adds a more
generic SCEV check.
We only need to add support for this in the LoopVectorizer since this is the
only pass that will do stride versioning.
Reviewers: mzolotukhin, anemet, hfinkel, sanjoy
Subscribers: sanjoy, hfinkel, rengolin, jmolloy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13595
llvm-svn: 251800
Summary:
The new function sys::path::user_cache_directory tries to discover
a directory suitable for cache storage for current system user.
On Windows and Darwin it returns a path to system-specific user cache directory.
On Linux it follows XDG Base Directory Specification, what is:
- use non-empty $XDG_CACHE_HOME env var,
- use $HOME/.cache.
Reviewers: chapuni, aaron.ballman, rafael
Subscribers: rafael, aaron.ballman, llvm-commits
Differential Revision: http://reviews.llvm.org/D13801
llvm-svn: 251784
1. Added a set of public interfaces in InstrProfRecord
class to access (read/write) value profile data.
2. Changed IndexedProfile reader and writer code to
use the newly defined interfaces and hide implementation
details.
3. Added a couple of unittests for value profiling:
- Test new interfaces to get and set value profile data
- Test value profile data merging with various scenarios.
No functional change is expected. The new interfaces will also
make it possible to change on-disk format of value prof data
to be more compact (to be submitted).
llvm-svn: 251771
This is a bit ugly, but has a few advantages:
* Archive is now easy to copy since there is no Archive -> Child -> Archive
loop.
* It makes it clear that we already checked for errors when finding the Child
data.
llvm-svn: 251750
Introduce LLVMSymbolizer::symbolizeInlinedCode() instead of switching
on PrintInlining option passed to the constructor. This will be needed
once we retrun structured data (instead of std::string) from
LLVMSymbolizer and move printing logic out.
llvm-svn: 251675
Summary:
This is mostly NFC. It is a first step in cleaning up LLVMSymbolize
library. It removes "ModuleInfo" class which bundles together ObjectFile
and its debug info context in favor of:
* abstract SymbolizableModule in public headers;
* SymbolizableObjectFile subclass in implementation.
Additionally, SymbolizableObjectFile is now created via factory, so we
can properly detect object parsing error at this stage instead of keeping
the broken half-parsed object. As a next step, we would be able to
propagate the error all the way back to the library user.
Further improvements might include:
* factoring out the logic of finding appropriate file with debug info
for a given object file, and caching all parsed object files into a
separate class [A].
* factoring out DILineInfo rendering [B].
This would make what is now a heavyweight "LLVMSymbolizer" a relatively
straightforward class, that calls into [A] to turn filepath into a
SymbolizableModule, delegates actual symbolization to concrete SymbolizableModule
implementation, and lets [C] render the result.
Reviewers: dblaikie, echristo, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14099
llvm-svn: 251662
This was a layering violation in ScheduleDAGInstrs (and
MachineSchedulerBase) they both shouldn't know directly whether they are
used by the PostMachineScheduler or the MachineScheduler.
llvm-svn: 251608