we do not use the information from SCEVAddRecExpr to compute the shape of the array,
so a better place for this function is in ScalarEvolution.
llvm-svn: 208456
MSVC always places the implicit sret parameter after the implicit this
parameter of instance methods. We used to handle this for
x86_thiscallcc by allocating the sret parameter on the stack and leaving
the this pointer in ecx, but that doesn't handle alternative calling
conventions like cdecl, stdcall, fastcall, or the win64 convention.
Instead, change the verifier to allow sret on the second parameter.
This also requires changing the Mips and X86 backends to return the
argument with the sret parameter, instead of assuming that the sret
parameter comes first.
The Sparc backend also returns sret parameters in a register, but I
wasn't able to update it to handle secondary sret parameters. It
currently calls report_fatal_error if you feed it an sret in the second
parameter.
Reviewers: rafael.espindola, majnemer
Differential Revision: http://reviews.llvm.org/D3617
llvm-svn: 208453
When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must
be passed in a block of consecutive floating-point registers, or on the stack.
This means that unused floating-point registers cannot be back-filled with
part of an HFA, however this can currently happen. This patch, along with the
corresponding clang patch (http://reviews.llvm.org/D3083) prevents this.
llvm-svn: 208413
(r207876 was reverted in r208131 after seeing some consistent buildbot
failure for MSVC 2012. The original commits were in r207724-r207726)
Takumi was nice enough to dig into this and locate this Microsoft
Connect issue:
http://connect.microsoft.com/VisualStudio/feedback/details/814899/forward-as-tuple-debug-implementation-error
describing a bug in MSVC2012's forward_as_tuple implementation.
Since the parameters in this instance are trivial/small, pass them by
value (using make_tuple) instead of perfectly-forwarded tuple of rvalue
references (involving the broken forward_as_tuple). Hopefully this will
satisfy MSVC2012.
llvm-svn: 208364
This behavior was added to support StringMaps of StringMaps, default +
move construction are sufficient for this.
Real move construction support coming soon (& probably copy construction
too).
llvm-svn: 208360
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).
This does represent a small functionality change:
- For generic x86-64 (which uses the SB model and, thus, will get some
unrolling).
- For AMD cores (because they still currently use the SB scheduling model)
- For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.
llvm-svn: 208289
The change to ExtractGV.cpp has no functionality change except to avoid
the asserts. Existing testcases already cover this, so I didn't add a
new one.
llvm-svn: 208264
OnDiskHashTable::insert() calls the Item constructor via placement new, but
nothing called the destructor. This matters in cases when the Info template
parameter has key_type or data_type typedefs that have a destructor, for
example like IdentifierIndexWriterTrait in clang's GlobalModuleIndex.cpp.
This fixes a 5-year old bug that's been around since the OnDiskHashTable code
was added in r64192. Bug found by LSan!
llvm-svn: 208243
When reducing the bitwidth of a comparison against a constant, the
original setcc's result type was used, which was incorrect.
No test since I don't think any other in tree targets change the
bitwidth of the setcc type depending on the bitwidth of the compared
type.
llvm-svn: 208236
To compute the dimensions of the array in a unique way, we split the
delinearization analysis in three steps:
- find parametric terms in all memory access functions
- compute the array dimensions from the set of terms
- compute the delinearized access functions for each dimension
The first step is executed on all the memory access functions such that we
gather all the patterns in which an array is accessed. The second step reduces
all this information in a unique description of the sizes of the array. The
third step is delinearizing each memory access function following the common
description of the shape of the array computed in step 2.
This rewrite of the delinearization pass also solves a problem we had with the
previous implementation: because the previous algorithm was by induction on the
structure of the SCEV, it would not correctly recognize the shape of the array
when the memory access was not following the nesting of the loops: for example,
see polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll
; void foo(long n, long m, long o, double A[n][m][o]) {
;
; for (long i = 0; i < n; i++)
; for (long j = 0; j < m; j++)
; for (long k = 0; k < o; k++)
; A[i][k][j] = 1.0;
Starting with this patch we no longer delinearize access functions that do not
contain parameters, for example in test/Analysis/DependenceAnalysis/GCD.ll
;; for (long int i = 0; i < 100; i++)
;; for (long int j = 0; j < 100; j++) {
;; A[2*i - 4*j] = i;
;; *B++ = A[6*i + 8*j];
these accesses will not be delinearized as the upper bound of the loops are
constants, and their access functions do not contain SCEVUnknown parameters.
llvm-svn: 208232
Summary:
It concatenates two or more lists. In addition to the !strconcat semantics
the lists must have the same element type.
My overall aim is to make it easy to append to Instruction.Predicates
rather than override it. This can be done by concatenating lists passed as
arguments, or by concatenating lists passed in additional fields.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D3506
llvm-svn: 208183
1) Fix for printing debug locations for absolute paths.
2) Location printing is moved into public method DebugLoc::print() to avoid re-inventing the wheel.
Differential Revision: http://reviews.llvm.org/D3513
llvm-svn: 208177
If the source files referenced by a gcno file are missing, gcov
outputs a coverage file where every line is simply /*EOF*/. This also
occurs for lines in the coverage that are past the end of a file that
is found.
This change mimics gcov.
llvm-svn: 208149
In gcov, there's a -n/--no-output option, which disables the writing
of any .gcov files, so that it emits only the summary info on stdout.
This implements the same behaviour in llvm-cov.
llvm-svn: 208148
This is similar to the getAlignment patch, but is done just for
completeness. It looks like we never call getSection on an alias. All the
tests still pass if the if is replaced with an assert.
llvm-svn: 208139
Speculatively reverting due to a suspicious failure on a Windows
buildbot.
This reverts commit 10c37a012ea11596d44cd9059fe09c959caf30c8.
llvm-svn: 208131
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
llvm-svn: 208104
An alias has the address of what it points to, so it also has the same
alignment.
This allows a few optimizations to see past aliases for free.
llvm-svn: 208103
Also, provide the ability to create temporary and non-temporary
declarations, as not all declarations may be replaced by definitions
later on.
This provides the necessary infrastructure for Clang to fix PR19598,
leaking temporary MDNodes in Clang's debug info generation.
llvm-svn: 208054
The number of tail call to loop conversions remains the same (1618 by my count).
The new algorithm does a local scan over the use-def chains to identify local "alloca-derived" values, as well as points where the alloca could escape. Then, a visit over the CFG marks blocks as being before or after the allocas have escaped, and annotates the calls accordingly.
llvm-svn: 208017
operations on the call graph. This one forms a cycle, and while not as
complex as removing an internal edge from an SCC, it involves
a reasonable amount of work to find all of the nodes newly connected in
a cycle.
Also somewhat alarming is the worst case complexity here: it might have
to walk roughly the entire SCC inverse DAG to insert a single edge. This
is carefully documented in the API (I hope).
llvm-svn: 207935
The fix itself is fairly simple: move getAccessVariant to MCValue so that we
replace the old weak expression evaluation with the far more general
EvaluateAsRelocatable.
This then requires that EvaluateAsRelocatable stop when it finds a non
trivial reference kind. And that in turn requires the ELF writer to look
harder for weak references.
Last but not least, this found a case where we were being bug by bug
compatible with gas and accepting an invalid input. I reported pr19647
to track it.
llvm-svn: 207920
Committed initially in r207724-r207726 and reverted due to compiler-rt
crashes in r207732.
Instead, fix this harder with unordered_map and store the LexicalScopes
by value in the map. This did necessitate moving the definition of
LexicalScope above the definition of LexicalScopes.
Let's see how the buildbots/compilers tolerate unordered_map::emplace +
std::piecewise_construct + std::forward_as_tuple...
llvm-svn: 207876
This moves most of GlobalOpt's constructor optimization
code out of GlobalOpt into Transforms/Utils/CDtorUtils.{h,cpp}. The
public interface is a single function OptimizeGlobalCtorsList() that
takes a predicate returning which constructors to remove.
GlobalOpt calls this with a function that statically evaluates all
constructors, just like it did before. This part of the change is
behavior-preserving.
Also add a call to this from GlobalDCE with a filter that removes global
constructors that contain a "ret" instruction and nothing else – this
fixes PR19590.
llvm-svn: 207856
This optimization merges the common part of a group of GEPs, so we can compute
each pointer address by adding a simple offset to the common part.
The optimization is currently only enabled for the NVPTX backend, where it has
a large payoff on some benchmarks.
Review: http://reviews.llvm.org/D3462
Patch by Jingyue Wu.
llvm-svn: 207783
just connects an SCC to one of its descendants directly. Not much of an
impact. The last one is the hard one -- connecting an SCC to one of its
ancestors, and thereby forming a cycle such that we have to merge all
the SCCs participating in the cycle.
llvm-svn: 207751
of SCCs in the SCC DAG. Exercise them in the big graph test case. These
will be especially useful for establishing invariants in insertion
logic.
llvm-svn: 207749
removal. We can't just blindly increment (or decrement) the adapted
iterator when the value is null because doing so can walk past the end
(or beginning) and keep inspecting the value. The fix I've implemented
is to restrict this further to a forward iterator and add an end
iterator to the members (replacing a member that had become dead when
I switched to the adaptor base!) and using that to stop the iteration.
I'm not entirely pleased with this solution. I feel like forward
iteration is too restrictive. I wasn't even happy about bidirectional
iteration. It also makes the iterator objects larger and the iteration
loops more complex. However, I also don't really like the other
alternative that seems obvious: a sentinel node. I'm still hoping to
come up with a more elegant solution here, but this at least fixes the
MSan and Valgrind errors on this code.
llvm-svn: 207743
For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
%shr = lshr i64 %x, 4
%and = and i64 %shr, 15
%arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and
%0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
lsr x8, x0, #1
and x8, x8, #0x78
add x8, x9, x8
If using ubfx, it only needs 2 instrs:
ubfx x8, x0, #4, #4
add x8, x9, x8, lsl #3
This fixes bug 19589
llvm-svn: 207702
We already do this for shstrtab, so might as well do it for strtab. This
extracts the string table building code into a separate class. The idea
is to use it for other object formats too.
I mostly wanted to do this for the general principle, but it does save a
little bit on object file size. I tried this on a clang bootstrap and
saved 0.54% on the sum of object file sizes (1.14 MB out of 212 MB for
a release build).
Differential Revision: http://reviews.llvm.org/D3533
llvm-svn: 207670
When we were moving from a larger vector to a smaller one but didn't
need to re-allocate, we would move-assign over uninitialized memory in
the target, then move-construct that same data again.
llvm-svn: 207663
Summary:
The N64 ABI allows up to three operations to be specified per relocation record
independently of the endianness.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3529
llvm-svn: 207636
edge entirely within an existing SCC. Shockingly, making the connected
component more connected is ... a total snooze fest. =]
Anyways, its wired up, and I even added a test case to make sure it
pretty much sorta works. =D
llvm-svn: 207631
(OutBufCur + Size) might overflow if Size were large. For example on i686-linux,
OutBufCur: 0xFFFDF27D
OutBufEnd: 0xFFFDF370
Size: 0x0002BF20 (180,000)
It caused flaky error in MC/COFF/section-name-encoding.s.
llvm-svn: 207621
bits), and discover that it's totally broken. Yay tests. Boo bug. Fix
the basic edge removal so that it works by nulling out the removed edges
rather than actually removing them. This leaves the indices valid in the
map from callee to index, and preserves some of the locality for
iterating over edges. The iterator is made bidirectional to reflect that
it now has to skip over null entries, and the skipping logic is layered
onto it.
As future work, I would like to track essentially the "load factor" of
the edge list, and when it falls below a threshold do a compaction.
An alternative I considered (and continue to consider) is storing the
callees in a doubly linked list where each element of the list is in
a set (which is essentially the classical linked-hash-table
datastructure). The problem with that approach is that either you need
to heap allocate the linked list nodes and use pointers to them, or use
a bucket hash table (with even *more* linked list pointer overhead!),
etc. It's pretty easy to get 5x overhead for values that are just
pointers. So far, I think punching holes in the vector, and periodic
compaction is likely to be much more efficient overall in the space/time
tradeoff.
llvm-svn: 207619
Seems MSVC wants to be able to codegen inline-definitions of virtual
functions even in TUs that don't define the key function - and it's well
within its rights to do so.
llvm-svn: 207581
This starts in MCJIT::getSymbolAddress where the
unique_ptr<object::Binary> is release()d and (after a cast) passed to a
single caller, MCJIT::addObjectFile.
addObjectFile calls RuntimeDyld::loadObject.
RuntimeDld::loadObject calls RuntimeDyldELF::createObjectFromFile
And the pointer is never owned at this point. I say this point, because
the alternative codepath, RuntimeDyldMachO::createObjectFile certainly
does take ownership, so this seemed like a good hint that this was a/the
right place to take ownership.
llvm-svn: 207580
Before this patch, if 'nul' was passed in input to clang, function
getStatus() (in Path.inc) always returned an instance of file_status with
field 'nFileSizeHigh' and 'nFileSizeLow' left uninitialized.
This was causing the triggering of an assertion failure in MemoryBuffer.cpp due
to an invalid FileSize for device 'nul'.
This patch fixes the assertion failure modifying the constructors of class
file_status (in llvm/Support/FileSystem.h) so that every field of the class
gets initialized to zero by default.
A clang test will be submitted on a separate patch.
llvm-svn: 207575
Change `BlockFrequency` to defer to `BranchProbability::scale()` and
`BranchProbability::scaleByInverse()`.
This removes `BlockFrequency::scale()` from its API (and drops the
ability to see the remainder), but the only user was the unit tests. If
some code in the future needs an API that exposes the remainder, we can
add something to `BranchProbability`, but I find that unlikely.
llvm-svn: 207550
Since `BlockMass` is an implementation detail and there are no current
users of this, delete `BlockMass::operator*=(BlockMass)`. I might need
this when I try to strip out `UnsignedFloat`, but I can pull it back in
at that point.
llvm-svn: 207546
Summary:
This calls emitOptimizationRemark from the loop unroller and vectorizer
at the point where they make a positive transformation. For the
vectorizer, it reports vectorization and interleave factors. For the
loop unroller, it reports all the different supported types of
unrolling.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3456
llvm-svn: 207528
This patch centralizes the handling of the thumb bit around
MCStreamer::isThumbFunc and makes isThumbFunc handle aliases.
This fixes a corner case, but the main advantage is having just one
way to check if a MCSymbol is thumb or not. This should still be
refactored to be ARM only, but at least now it is just one predicate
that has to be refactored instead of 3 (isThumbFunc,
ELF_Other_ThumbFunc, and SF_ThumbFunc).
llvm-svn: 207522
requiring full control over the various parameters to the std::iterator
concept / trait thing. This is a precursor for adjusting these things to
where you can write a bidirectional iterator wrapping a random access
iterator with custom increment and decrement logic.
llvm-svn: 207487
When evaluating an assembly expression for a relocation, we want to
stop at MCSymbols that are in the symbol table, even if they are variables.
This is needed since the semantics may require that the relocation use them.
That is not the case when computing the value of a symbol in the symbol table.
There are no relocations in this case and we have to keep going until we hit
a section or find out that the expression doesn't have an assembly time
value.
llvm-svn: 207445
This reverts commit r207287, reapplying r207286.
I'm hoping that declaring an explicit struct and instantiating
`addBlockEdges()` directly works around the GCC crash from r207286.
This is a lot more boilerplate, though.
llvm-svn: 207438
This commit provides the necessary C/C++ APIs and infastructure to enable fine-
grain progress report and safe suspension points after each pass in the pass
manager.
Clients can provide a callback function to the pass manager to call after each
pass. This can be used in a variety of ways (progress report, dumping of IR
between passes, safe suspension of threads, etc).
The run listener list is maintained in the LLVMContext, which allows a multi-
threaded client to be only informed for it's own thread. This of course assumes
that the client created a LLVMContext for each thread.
This fixes <rdar://problem/16728690>
llvm-svn: 207430
domtree. When finding a nearest common dominator, if neither A dominates
B nor B dominates A, we immediately resorted to a tree walk. The tree
walk here is *particularly* expensive because we have to build
a (potentially very large) set for one side's dominators and compare it
with the other side's.
If at any point we have DFS info, we don't need to do any of this. We
can just walk up one side's immediate dominators and return the first
one which dominates the other side. Because of the DFS info, the
dominates queries are trivially constant time.
This reduces the optimizers time in the test case on PR19499 by 70%. It
now optimizes in about 30 seconds for me. And there is still more to be
done for this case.
llvm-svn: 207406
This introduces a target specific streamer, X86WinCOFFStreamer, which handles
the target specific behaviour (e.g. WinEH). This is mostly to ensure that
differences between ARM and X86 remain disjoint and do not accidentally cross
boundaries. This is the final staging change for enabling object emission for
Windows on ARM.
llvm-svn: 207344
This is in preparation for promoting WinCOFFStreamer to a base class which will
be shared by the X86 and ARM specific target COFF streamers. Also add a new
getOrCreateSymbolData interface (like MCELFStreamer) for the ARM COFF Streamer.
This makes the COFFStreamer more similar to the ELFStreamer.
llvm-svn: 207343
API requirements much more obvious.
The key here is that there are two totally different use cases for
mutating the graph. Prior to doing any SCC formation, it is very easy to
mutate the graph. There may be users that want to do small tweaks here,
and then use the already-built graph for their SCC-based operations.
This method remains on the graph itself and is documented carefully as
being cheap but unavailable once SCCs are formed.
Once SCCs are formed, and there is some in-flight DFS building them, we
have to be much more careful in how we mutate the graph. These mutation
operations are sunk onto the SCCs themselves, which both simplifies
things (the code was already there!) and helps make it obvious that
these interfaces are only applicable within that context. The other
primary constraint is that the edge being mutated is actually related to
the SCC on which we call the method. This helps make it obvious that you
cannot arbitrarily mutate some other SCC.
I've tried to write much more complete documentation for the interesting
mutation API -- intra-SCC edge removal. Currently one aspect of this
documentation is a lie (the result list of SCCs) but we also don't even
have tests for that API. =[ I'm going to add tests and fix it to match
the documentation next.
llvm-svn: 207339
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.
I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.
llvm-svn: 207315
them, just skip over any DFS-numbered nodes when finding the next root
of a DFS. This allows the entry set to just be a vector as we populate
it from a uniqued source. It also removes the possibility for a linear
scan of the entry set to actually do the removal which can make things
go quadratic if we get unlucky.
llvm-svn: 207312
makes working through the worklist much cleaner, and makes it possible
to avoid the 'bool-to-continue-the-outer-loop' hack. Not a huge
difference, but I think this is approaching as polished as I can make
it.
llvm-svn: 207310
processed in the DFS out of the stack completely. Keep it exclusively in
a variable. Re-shuffle some code structure to make this easier. This can
have a very dramatic effect in some cases because call graphs tend to
look like a high fan-out spanning tree. As a consequence, there are
a large number of leaf nodes in the graph, and this technique causes
leaf nodes to never even go into the stack. While this only reduces the
max depth by 1, it may cause the total number of round trips through the
stack to drop by a lot.
Now, most of this isn't really relevant for the incremental version. =]
But I wanted to prototype it first here as this variant is in ways more
complex. As long as I can get the code factored well here, I'll next
make the primary walk look the same. There are several refactorings this
exposes I think.
llvm-svn: 207306
a helper function. Also factor the other two places where we did the
same thing into the helper function. =] Much cleaner this way. NFC.
llvm-svn: 207300
This reverts commit r207286. It causes an ICE on the
cmake-llvm-x86_64-linux buildbot [1]:
llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function:
llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035
[1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio
llvm-svn: 207287
Previously, irreducible backedges were ignored. With this commit,
irreducible SCCs are discovered on the fly, and modelled as loops with
multiple headers.
This approximation specifies the headers of irreducible sub-SCCs as its
entry blocks and all nodes that are targets of a backedge within it
(excluding backedges within true sub-loops). Block frequency
calculations act as if we insert a new block that intercepts all the
edges to the headers. All backedges and entries to the irreducible SCC
point to this imaginary block. This imaginary block has an edge (with
even probability) to each header block.
The result is now reasonable enough that I've added a number of
testcases for irreducible control flow. I've outlined in
`BlockFrequencyInfoImpl.h` ways to improve the approximation.
<rdar://problem/14292693>
llvm-svn: 207286
This adds support for an -mattr option to the gold plugin and to llvm-lto. This
allows the caller to specify details of the subtarget architecture, like +aes,
or +ssse3 on x86. Note that this requires a change to the include/llvm-c/lto.h
interface: it adds a function lto_codegen_set_attr and it increments the
version of the interface.
llvm-svn: 207279
Actually use the `reference` typedef, and remove the private
redefinition of `pointer` since it has no users.
Using `reference` exposes a problem with r207257, which specified the
wrong `value_type` to `iterator_facade_base` (fixed that too).
llvm-svn: 207270
buildbot - do not insert debug intrinsics before phi nodes.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207269
Use the fancy new `iterator_facade_base` to add
`scc_iterator::operator->()`. Remove other definitions where
`iterator_facade_base` does the right thing.
<rdar://problem/14292693>
llvm-svn: 207257
This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic
which provides a generic, extensible manner for adding hint instructions. This
functionality can now be represented as @llvm.arm.hint(i32 5).
llvm-svn: 207246
Introduce the llvm.arm.hint(i32) intrinsic that can be used to inject hints into
the instruction stream. This is particularly useful for generating IR from a
compiler where the user may inject an intrinsic (e.g. __yield). These are then
pattern substituted into the correct instruction which already existed.
llvm-svn: 207242
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207235
Remove the concepts of "forward" and "general" mass distributions, which
was wrong. The split might have made sense in an early version of the
algorithm, but it's definitely wrong now.
<rdar://problem/14292693>
llvm-svn: 207195
Make `getPackagedNode()` a member function of
`BlockFrequencyInfoImplBase` so that it's available for templated code.
<rdar://problem/14292693>
llvm-svn: 207183
Continue refactoring to make `LoopData` first-class. Here I'm making
the `LoopData` hierarchy explicit, instead of bouncing back and forth
with `WorkingData`. This simplifies the logic and better matches the
`LoopInfo` design. (Eventually, `LoopInfo` should be restructured so
that it supports this pass, and `LoopData` can be removed.)
<rdar://problem/14292693>
llvm-svn: 207180
As pointed out by David Blaikie in code review, a `std::list<T>` is
simpler than a `std::vector<std::unique_ptr<T>>`. Another option is a
`std::deque<T>` (which allocates in chunks), but I'd like to leave open
the option of inserting in the middle of the sequence for handling
irreducible control flow on the fly.
<rdar://problem/14292693>
llvm-svn: 207177
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207165
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur. It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.
Reviewers: nicholas
Differential Revision: http://llvm-reviews.chandlerc.com/D3240
llvm-svn: 207143
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine-intrinsics testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207130
This patch:
- Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and
'int_x86_rdtscp') as GCCBuiltin intrinsics;
- Teaches the backend how to lower the two new builtins;
- Introduces a common function to lower READCYCLECOUNTER dag nodes
and the two new rdtsc/rdtscp intrinsics;
- Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll'
correctly verifies that both READCYCLECOUNTER and the two new intrinsics
work fine for both 64bit and 32bit Subtargets.
llvm-svn: 207127
I discovered this const-hole while attempting to coalesnce the Symbol
and SymbolMap data structures. There's some pending issues with that,
but I figured this change was easy to flush early.
llvm-svn: 207124
algorithm here: http://dl.acm.org/citation.cfm?id=177301.
The idea of isolating the roots has even more relevance when using the
stack not just to implement the DFS but also to implement the recursive
step. Because we use it for the recursive step, to isolate the roots we
need to maintain two stacks: one for our recursive DFS walk, and another
of the nodes that have been walked. The nice thing is that the latter
will be half the size. It also fixes a complete hack where we scanned
backwards over the stack to find the next potential-root to continue
processing. Now that is always the top of the DFS stack.
While this is a really nice improvement already (IMO) it further opens
the door for two important simplifications:
1) De-duplicating some of the code across the two different walks. I've
actually made the duplication a bit worse in some senses with this
patch because the two are starting to converge.
2) Dramatically simplifying the loop structures of both walks.
I wanted to do those separately as they'll be essentially *just* CFG
restructuring. This patch on the other hand actually uses different
datastructures to implement the algorithm itself.
llvm-svn: 207098
a SmallPtrSet. Currently, there is no need for stable iteration in this
dimension, and I now thing there won't need to be going forward.
If this is ever re-introduced in any form, it needs to not be
a SetVector based solution because removal cannot be linear. There will
be many SCCs with large numbers of parents. When encountering these, the
incremental SCC update for intra-SCC edge removal was quadratic due to
linear removal (kind of).
I'm really hoping we can avoid having an ordering property here at all
though...
llvm-svn: 207091
own CRTP base class for more general purpose use. Add some clarifying
comments for the exact way in which the adaptor uses it. Hopefully this
will help us write increasingly full featured iterators. This is
becoming important as they start to be used heavily inside of ranges.
llvm-svn: 207072
Boost's iterator_adaptor, and a specific adaptor which iterates over
pointees when wrapped around an iterator over pointers.
This is the result of a long discussion on IRC with Duncan Smith, Dave
Blaikie, Richard Smith, and myself. Essentially, I could use some subset
of the iterator facade facilities often used from Boost, and everyone
seemed interested in having the functionality in a reasonably generic
form. I've tried to strike a balance between the pragmatism and the
established Boost design. The primary differences are:
1) Delegating to the standard iterator interface names rather than
special names that then make up a second iterator-like API.
2) Using the name 'pointee_iterator' which seems more clear than
'indirect_iterator'. The whole business of calling the '*p' operation
'pointer indirection' in the standard is ... quite confusing. And
'dereference' is no better of a term for moving from a pointer to
a reference.
Hoping Duncan, and others continue to provide comments on this until
we've got a nice, minimal abstraction.
llvm-svn: 207069
than functions. So far, this access pattern is *much* more common. It
seems likely that any user of this interface is going to have nodes at
the point that they are querying the SCCs.
No functionality changed.
llvm-svn: 207045
GCOV provides an option to prepend output file names with the source
file name, to disambiguate between covered data that's included from
multiple sources. Add a flag to llvm-cov that does the same.
llvm-svn: 207035
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.
Patch by Yuri Gorshenin.
llvm-svn: 206971
This implements the core functionality necessary to remove an edge from
the call graph and correctly update both the basic graph and the SCC
structure. As part of that it has to run a tiny (in number of nodes)
Tarjan-style DFS walk of an SCC being mutated to compute newly formed
SCCs, etc.
This is *very rough* and a WIP. I have a bunch of FIXMEs for code
cleanup that will reduce the boilerplate in this change substantially.
I also have a bunch of simplifications to various parts of both
algorithms that I want to make, but first I'd like to have a more
holistic picture. Ideally, I'd also like more testing. I'll probably add
quite a few more unit tests as I go here to cover the various different
aspects and corner cases of removing edges from the graph.
Still, this is, so far, successfully updating the SCC graph in-place
without disrupting the identity established for the existing SCCs even
when we do challenging things like delete the critical edge that made an
SCC cycle at all and have to reform things as a tree of smaller SCCs.
Getting this to work is really critical for the new pass manager as it
is going to associate significant state with the SCC instance and needs
it to be stable. That is also the motivation behind the return of the
newly formed SCCs. Eventually, I'll wire this all the way up to the
public API so that the pass manager can use it to correctly re-enqueue
newly formed SCCs into a fresh postorder traversal.
llvm-svn: 206968
up the stack finishing the exploration of each entries children before
we're finished in addition to accounting for their low-links. Added
a unittest that really hammers home the need for this with interlocking
cycles that would each appear distinct otherwise and crash or compute
the wrong result. As part of this, nuke a stale fixme and bring the rest
of the implementation still more closely in line with the original
algorithm.
llvm-svn: 206966
parents of an SCC, and add a lookup method for finding the SCC for
a given function. These aren't used yet, but will be used shortly in
some unit tests I'm adding and are really part of the broader intended
interface for the analysis.
llvm-svn: 206959
resisted this for too long. Just with the basic testing here I was able
to exercise the analysis in more detail and sift out both type signature
bugs in the API and a bug in the DFS numbering. All of these are fixed
here as well.
The unittests will be much more important for the mutation support where
it is necessary to craft minimal mutations and then inspect the state of
the graph. There is just no way to do that with a standard FileCheck
test. However, unittesting these kinds of analyses is really quite easy,
especially as they're designed with the new pass manager where there is
essentially no infrastructure required to rig up the core logic and
exercise it at an API level.
As a minor aside about the DFS numbering bug, the DFS numbering used in
LCG is a bit unusual. Rather than numbering from 0, we number from 1,
and use 0 as the sentinel "unvisited" state. Other implementations often
use '-1' for this, but I find it easier to deal with 0 and it shouldn't
make any real difference provided someone doesn't write silly bugs like
forgetting to actually initialize the DFS numbering. Oops. ;]
llvm-svn: 206954
the Callee list. This is going to be quite important to prevent removal
from going quadratic. No functionality changed at this point, this is
one of the refactoring patches I've broken out of my initial work toward
mutation updates of the call graph.
llvm-svn: 206938
from places like MCCodeEmitter() in the MC backend when the
MCContext is const.
I was going to use this in my change for r206669 but Jim convinced
me to use an assert there. But this still is a good tweak.
llvm-svn: 206923
r206916 was not logically the same as the previous code because the
goto statements did not create loop. This should be the same as the
previous code.
llvm-svn: 206918
Goto statements jumping into previous inner blocks are pretty confusing
to read even though in this case they are valid. No reason to not use
while loops there.
llvm-svn: 206916
diagnostic that includes location information.
Currently if one has this assembly:
.quad (0x1234 + (4 * SOME_VALUE))
where SOME_VALUE is undefined ones gets the less than
useful error message with no location information:
% clang -c x.s
clang -cc1as: fatal error: error in backend: expected relocatable expression
With this fix one now gets a more useful error message
with location information:
% clang -c x.s
x.s:5:8: error: expected relocatable expression
.quad (0x1234 + (4 * SOME_VALUE))
^
To do this I plumbed the SMLoc through the MCObjectStreamer
EmitValue() and EmitValueImpl() interfaces so it could be used
when creating the MCFixup.
rdar://12391022
llvm-svn: 206906
Store pointers directly to loops inside the nodes. This could have been
done without changing the type stored in `std::vector<>`. However,
rather than computing the number of loops before constructing them
(which `LoopInfo` doesn't provide directly), I've switched to a
`vector<unique_ptr<LoopData>>`.
This adds some heap overhead, but the number of loops is typically
small.
llvm-svn: 206857
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.
Other sub-trees will follow.
llvm-svn: 206837
ELFEntityIterator does not implement RandomAccessIterator. It does
not even implement BidirectionalIterator.
This patch fixes LLD build issue when compiled with MSVC2013 with
debug: MSVC's find_if checks if the start iterator is before the end
iterator in the sense of operator< if it declares implementing
RandomAccessIterator. If a class does not have operator<, it fails
to compile.
llvm-svn: 206825
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.
This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:
- Header files that need to provide a DEBUG_TYPE for some inline code
can do so by defining the macro before their inline code and undef-ing
it afterward so the macro does not escape.
- We no longer have rampant ODR violations due to including headers with
different DEBUG_TYPE definitions. This may be mostly an academic
violation today, but with modules these types of violations are easy
to check for and potentially very relevant.
Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.
The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.
llvm-svn: 206822
This reverts commit r206707, reapplying r206704. The preceding commit
to CalcSpillWeights should have sorted out the failing buildbots.
<rdar://problem/14292693>
llvm-svn: 206766
We normally don't drop functions from the C API's, but in this case I think we
can:
* The old implementation of getFileOffset was fairly broken
* The introduction of LLVMGetSymbolFileOffset was itself a C api breaking
change as it removed LLVMGetSymbolOffset.
* It is an incredibly specialized use case. The only reason MCJIT needs it is
because of its odd position of being a dynamic linker of .o files.
llvm-svn: 206750
LazyCallGraph analysis framework. Wire it up all the way through the opt
driver and add some very basic testing that we can build pass pipelines
including these components. Still a lot more to do in terms of testing
that all of this works, but the basic pieces are here.
There is a *lot* of boiler plate here. It's something I'm going to
actively look at reducing, but I don't have any immediate ideas that
don't end up making the code terribly complex in order to fold away the
boilerplate. Until I figure out something to minimize the boilerplate,
almost all of this is based on the code for the existing pass managers,
copied and heavily adjusted to suit the needs of the CGSCC pass
management layer.
The actual CG management still has a bunch of FIXMEs in it. Notably, we
don't do *any* updating of the CG as it is potentially invalidated.
I wanted to get this in place to motivate the new analysis, and add
update APIs to the analysis and the pass management layers in concert to
make sure that the *right* APIs are present.
llvm-svn: 206745
It could even be made non-virtual if it weren't for bad compiler
warnings.
This demonstrates that ArgList objects aren't destroyed polymorphically
and possibly that they aren't even used polymorphically. If that's the
case, it might be possible to refactor the two ArgList types more
separately and simplify the Arg ownership model. *continues
experimenting*
llvm-svn: 206727
This might be able to be simplified further by using Arg as a value type
in a linked list (to maintain pointer validity), but here's something
simple to start with.
llvm-svn: 206724
This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite.
I've done a careful audit, added some asserts, and fixed a couple of
bugs (unfortunately, they were in unlikely code paths). There's a small
chance that this will appease the failing bots [1][2]. (If so, great!)
If not, I have a follow-up commit ready that will temporarily add
-debug-only=block-freq to the two failing tests, allowing me to compare
the code path between what the failing bots and what my machines (and
the rest of the bots) are doing. Once I've triggered those builds, I'll
revert both commits so the bots go green again.
[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445
<rdar://problem/14292693>
llvm-svn: 206704
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.
A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.
llvm-svn: 206684
This reverts commit r206666, as planned.
Still stumped on why the bots are failing. Sanitizer bots haven't
turned anything up. If anyone can help me debug either of the failures
(referenced in r206666) I'll owe them a beer. (In the meantime, I'll be
auditing my patch for undefined behaviour.)
llvm-svn: 206677