SCCMap to test for nodes that have been re-added to the root SCC rather
than a set vector. We already have done the SCCMap lookup, we juts need
to test it in two different ways. In turn, do most of the processing of
these nodes as they go into the root SCC rather than lazily. This
simplifies the final loop to just stitch the root SCC into its
children's parent sets. No functionlatiy changed.
However, this makes a few things painfully obvious, which was my intent.
=] There is tons of repeated code introduced here and elsewhere. I'm
splitting the refactoring of that code into helpers from this change so
its clear that this is the change which switches the datastructures used
around, and the other is a pure factoring & deduplication of code
change.
llvm-svn: 207217
This patch is a supplement of implementing predicate of FP, enabling aarch64 backend
no-fp tests on arm64 target for verification. During this, one bug is exposed and
fixed by this patch.
llvm-svn: 207215
remove the nodes in the SCC from the SCC map entirely prior to the DFS
walk. This allows the SCC map to represent both the state of
not-yet-re-added-to-an-SCC and added-back-to-this-SCC independently. The
first is being missing from the SCC map, the second is mapping back to
'this'. In a subsequent commit, I'm going to use this property to
simplify the new node list for this SCC.
In theory, I think this also makes the contract for orphaning a node
from the graph slightly less confusing. Now it is also orphaned from the
SCC graph. Still, this isn't quite right either, and so I'm not adding
test cases here. I'll add test cases for the behavior of orphaning nodes
when the code *actually* supports it. The change here is mostly
incidental, my goal is simplifying the algorithm.
llvm-svn: 207213
child from the worklist, wait until we actually need to pop another
element off of the worklist and skip over any that were already visited
by the DFS. This also enables swapping the nodes of the SCC into the
worklist. No functionality changed.
llvm-svn: 207212
thing, just mucking up the code. I feel bad that I even wrote this loop.
Very sorry. The diff is huge because of the indent change, but I promise
all this is doing is realizing that the outer two loops were actually
the exact same loops, and we didn't need two of them.
llvm-svn: 207202
factored into a more reasonable form, replace the tail call with
a simple outer-loop continuation. It's sad that C++ makes this so
awkward to write, but it seems more direct and clear than the tail call
at this point.
llvm-svn: 207201
Change the object streamer selection to a switch from a series of if conditions.
Rather than defaulting to ELF, require that an ELF format is requested. The
Windows/!ELF is maintained as MachO would have been selected first and will
still provide a MachO format. Add an assertion that if COFF is requested that
the target platform is Windows as only WinCOFF object emission is currently
supported.
llvm-svn: 207200
The returnvalue was handled as c_char_p which ment that ctypes
handled it as a NUL-terminated string making it cut the contents
at first NUL (or even worse - overrunning the buffer if it doesn't
contain a NUL).
Differential Revision: http://reviews.llvm.org/D3474
llvm-svn: 207199
Remove the concepts of "forward" and "general" mass distributions, which
was wrong. The split might have made sense in an early version of the
algorithm, but it's definitely wrong now.
<rdar://problem/14292693>
llvm-svn: 207195
Strip irreducible testcases to pure control flow. The function calls
made the branch weights more believable but cluttered it up a lot.
There isn't going to be any constant analysis here, so just use dumb
branch logic to clarify the important parts.
<rdar://problem/14292693>
llvm-svn: 207192
Rather than scaling loop headers and then scaling all the loop members
by the header frequency, scale `LoopData::Scale` itself, and scale the
loop members by it. It's much more obvious what's going on this way,
and doesn't cost any extra multiplies.
<rdar://problem/14292693>
llvm-svn: 207189
Make `getPackagedNode()` a member function of
`BlockFrequencyInfoImplBase` so that it's available for templated code.
<rdar://problem/14292693>
llvm-svn: 207183
Continue refactoring to make `LoopData` first-class. Here I'm making
the `LoopData` hierarchy explicit, instead of bouncing back and forth
with `WorkingData`. This simplifies the logic and better matches the
`LoopInfo` design. (Eventually, `LoopInfo` should be restructured so
that it supports this pass, and `LoopData` can be removed.)
<rdar://problem/14292693>
llvm-svn: 207180
As pointed out by David Blaikie in code review, a `std::list<T>` is
simpler than a `std::vector<std::unique_ptr<T>>`. Another option is a
`std::deque<T>` (which allocates in chunks), but I'd like to leave open
the option of inserting in the middle of the sequence for handling
irreducible control flow on the fly.
<rdar://problem/14292693>
llvm-svn: 207177
Should fix PR19526.
When Oscar added this code in the intial CMake build system port, he had
a TODO saying that ${CMAKE_SHARED_LINKER_FLAGS} was probably wrong. I
agree. I'm using ${CMAKE_CXX_LINK_FLAGS} to point LLVM at my custom
installation of gcc 4.recent, so that seems more correct. With this
change, I can build creduce against an installed clang, and it picks up
the write flags from --ldflags.
llvm-svn: 207171
When fixing the symbols in each compressed section we were iterating
over all symbols for each compressed section. In extreme cases this
could snowball severely (5min uncompressed -> 35min compressed) due to
iterating over all symbols for each compressed section (large numbers of
compressed sections can be generated by DWARF type units).
To address this, build a map of the symbols in each section ahead of
time, and access that map if a section is being compressed. This brings
compile time for the aforementioned example down to ~6 minutes.
llvm-svn: 207167
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207165
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur. It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.
Reviewers: nicholas
Differential Revision: http://llvm-reviews.chandlerc.com/D3240
llvm-svn: 207143
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine-intrinsics testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207130
This patch:
- Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and
'int_x86_rdtscp') as GCCBuiltin intrinsics;
- Teaches the backend how to lower the two new builtins;
- Introduces a common function to lower READCYCLECOUNTER dag nodes
and the two new rdtsc/rdtscp intrinsics;
- Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll'
correctly verifies that both READCYCLECOUNTER and the two new intrinsics
work fine for both 64bit and 32bit Subtargets.
llvm-svn: 207127
I discovered this const-hole while attempting to coalesnce the Symbol
and SymbolMap data structures. There's some pending issues with that,
but I figured this change was easy to flush early.
llvm-svn: 207124
This skips a couple of compare ones due to the different syntaxt for
floating-point 0.0. AArch64 does it more canonically, and we'll need to fiddle
ARM64 to make it work.
llvm-svn: 207119
Leak identified by LSan and reported by Kostya Serebryany.
Let's get a bit experimental here... in theory our minimum compiler
versions support unordered_map.
llvm-svn: 207118
This matches ARM64 behaviour, which I think is clearer. It also puts all the
churn from that difference into one easily ignored commit.
llvm-svn: 207116
These can have different relocations in ELF. In particular both:
b.eq global
ldr x0, global
are valid, giving different relocations. The only possible way to distinguish
them is via a different fixup, so the operands had to be separated throughout
the backend.
llvm-svn: 207105
ARM64 was not producing pure BFI instructions for bitfield insertion
operations, unlike AArch64. The approach had to be a little different (in
ISelDAGToDAG rather than ISelLowering), and the outcomes aren't identical but
hopefully this gives it similar power.
This should address PR19424.
llvm-svn: 207102
algorithm here: http://dl.acm.org/citation.cfm?id=177301.
The idea of isolating the roots has even more relevance when using the
stack not just to implement the DFS but also to implement the recursive
step. Because we use it for the recursive step, to isolate the roots we
need to maintain two stacks: one for our recursive DFS walk, and another
of the nodes that have been walked. The nice thing is that the latter
will be half the size. It also fixes a complete hack where we scanned
backwards over the stack to find the next potential-root to continue
processing. Now that is always the top of the DFS stack.
While this is a really nice improvement already (IMO) it further opens
the door for two important simplifications:
1) De-duplicating some of the code across the two different walks. I've
actually made the duplication a bit worse in some senses with this
patch because the two are starting to converge.
2) Dramatically simplifying the loop structures of both walks.
I wanted to do those separately as they'll be essentially *just* CFG
restructuring. This patch on the other hand actually uses different
datastructures to implement the algorithm itself.
llvm-svn: 207098
applied prior to pushing a node onto the DFSStack. This is the first
step toward avoiding the stack entirely for leaf nodes. It also
simplifies things a bit and I think is pointing the way toward factoring
some more of the shared logic out of the two implementations.
It is also making it more obvious how to restructure the loops
themselves to be a bit easier to read (although no different in terms of
functionality).
llvm-svn: 207095
a SmallPtrSet. Currently, there is no need for stable iteration in this
dimension, and I now thing there won't need to be going forward.
If this is ever re-introduced in any form, it needs to not be
a SetVector based solution because removal cannot be linear. There will
be many SCCs with large numbers of parents. When encountering these, the
incremental SCC update for intra-SCC edge removal was quadratic due to
linear removal (kind of).
I'm really hoping we can avoid having an ordering property here at all
though...
llvm-svn: 207091
This allows us to compile
return (mask & 0x8 ? a : b);
into
testb $8, %dil
cmovnel %edx, %esi
instead of
andl $8, %edi
shrl $3, %edi
cmovnel %edx, %esi
which we formed previously because dag combiner canonicalizes setcc of and into shift.
llvm-svn: 207088
Added support for bytes replication feature, so it could be GAS compatible.
E.g. instructions below:
"vmov.i32 d0, 0xffffffff"
"vmvn.i32 d0, 0xabababab"
"vmov.i32 d0, 0xabababab"
"vmov.i16 d0, 0xabab"
are incorrect, but we could deal with such cases.
For first one we should emit:
"vmov.i8 d0, 0xff"
For second one ("vmvn"):
"vmov.i8 d0, 0x54"
For last two instructions it should emit:
"vmov.i8 d0, 0xab"
P.S.: In ARMAsmParser.cpp I have also fixed few nearby style issues in old code.
Just for keeping method bodies in harmony with themselves.
llvm-svn: 207080
own CRTP base class for more general purpose use. Add some clarifying
comments for the exact way in which the adaptor uses it. Hopefully this
will help us write increasingly full featured iterators. This is
becoming important as they start to be used heavily inside of ranges.
llvm-svn: 207072
Boost's iterator_adaptor, and a specific adaptor which iterates over
pointees when wrapped around an iterator over pointers.
This is the result of a long discussion on IRC with Duncan Smith, Dave
Blaikie, Richard Smith, and myself. Essentially, I could use some subset
of the iterator facade facilities often used from Boost, and everyone
seemed interested in having the functionality in a reasonably generic
form. I've tried to strike a balance between the pragmatism and the
established Boost design. The primary differences are:
1) Delegating to the standard iterator interface names rather than
special names that then make up a second iterator-like API.
2) Using the name 'pointee_iterator' which seems more clear than
'indirect_iterator'. The whole business of calling the '*p' operation
'pointer indirection' in the standard is ... quite confusing. And
'dereference' is no better of a term for moving from a pointer to
a reference.
Hoping Duncan, and others continue to provide comments on this until
we've got a nice, minimal abstraction.
llvm-svn: 207069
This excludes avx512 as I don't have hardware to verify. It excludes _dq
variants because they are represented in the IR as <{2,4} x i64> when it's
actually a byte shift of the entire i{128,265}.
This also excludes _dq_bs as they aren't at all supported by the backend.
There are also no corresponding instructions in the ISA. I have no idea why
they exist...
llvm-svn: 207058
Summary:
Since the upper 64 bits of the destination register are undefined when
performing this operation, we can substitute it and let the optimizer
figure out that only a copy is needed.
Also added range merging, if an instruction copies a range that can be
merged with a previous copied range.
Added test cases for both optimizations.
Reviewers: grosbach, nadav
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3357
llvm-svn: 207055
than functions. So far, this access pattern is *much* more common. It
seems likely that any user of this interface is going to have nodes at
the point that they are querying the SCCs.
No functionality changed.
llvm-svn: 207045
values rather than an expensive dense map query to test whether children
have already been popped into an SCC. This matches the incremental SCC
building code. I've also included the assert that I put there but
updated both of their text.
No functionality changed here.
I still don't have any great ideas for sharing the code between the two
implementations, but I may try a brute-force approach to factoring it at
some point.
llvm-svn: 207042
This is dependent on changes that are not fully ready to be merged yet (WoA
object file emission). The test can be re-enabled for that target later.
llvm-svn: 207038
GCOV provides an option to prepend output file names with the source
file name, to disambiguate between covered data that's included from
multiple sources. Add a flag to llvm-cov that does the same.
llvm-svn: 207035
There's only ever one address pool, not one per DWARF output file, so
let's just have one.
(similar refactoring of the string pool to come soon)
llvm-svn: 207026
ANDS does not use the same encoding scheme as other xxxS instructions (e.g.,
ADDS). Take that into account to avoid wrong peephole optimization.
<rdar://problem/16693089>
llvm-svn: 207020
Some of these types (DwarfDebug in particular) are quite large to begin
with (and I keep forgetting whether DwarfFile is in DwarfDebug or
DwarfUnit... ) so having a few smaller files seems like goodness.
llvm-svn: 207010
We're currently copying CounterData from InstrProfWriter into the
OnDiskHashTable, even though we don't need to, and then carelessly
leaking those copies. A const pointer is much better here.
llvm-svn: 207009
Don't replace shifts greater than the type with the maximum shift.
This isn't hit anywhere in the tests, and somewhere else is replacing
these with undef.
llvm-svn: 207000
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.
Patch by Yuri Gorshenin.
llvm-svn: 206971
This implements the core functionality necessary to remove an edge from
the call graph and correctly update both the basic graph and the SCC
structure. As part of that it has to run a tiny (in number of nodes)
Tarjan-style DFS walk of an SCC being mutated to compute newly formed
SCCs, etc.
This is *very rough* and a WIP. I have a bunch of FIXMEs for code
cleanup that will reduce the boilerplate in this change substantially.
I also have a bunch of simplifications to various parts of both
algorithms that I want to make, but first I'd like to have a more
holistic picture. Ideally, I'd also like more testing. I'll probably add
quite a few more unit tests as I go here to cover the various different
aspects and corner cases of removing edges from the graph.
Still, this is, so far, successfully updating the SCC graph in-place
without disrupting the identity established for the existing SCCs even
when we do challenging things like delete the critical edge that made an
SCC cycle at all and have to reform things as a tree of smaller SCCs.
Getting this to work is really critical for the new pass manager as it
is going to associate significant state with the SCC instance and needs
it to be stable. That is also the motivation behind the return of the
newly formed SCCs. Eventually, I'll wire this all the way up to the
public API so that the pass manager can use it to correctly re-enqueue
newly formed SCCs into a fresh postorder traversal.
llvm-svn: 206968
up the stack finishing the exploration of each entries children before
we're finished in addition to accounting for their low-links. Added
a unittest that really hammers home the need for this with interlocking
cycles that would each appear distinct otherwise and crash or compute
the wrong result. As part of this, nuke a stale fixme and bring the rest
of the implementation still more closely in line with the original
algorithm.
llvm-svn: 206966
parents of an SCC, and add a lookup method for finding the SCC for
a given function. These aren't used yet, but will be used shortly in
some unit tests I'm adding and are really part of the broader intended
interface for the analysis.
llvm-svn: 206959
This model is not final and work is still in progress.
However there are substantial improvements on integer tests mainly because of better RAL with new scheduler.
Differential Revision: http://reviews.llvm.org/D3451
llvm-svn: 206957
Use -stats to see how many loops were analyzed for possible vectorization and how many of them were actually vectorized.
Patch by Zinovy Nis
Differential Revision: http://reviews.llvm.org/D3438
llvm-svn: 206956
resisted this for too long. Just with the basic testing here I was able
to exercise the analysis in more detail and sift out both type signature
bugs in the API and a bug in the DFS numbering. All of these are fixed
here as well.
The unittests will be much more important for the mutation support where
it is necessary to craft minimal mutations and then inspect the state of
the graph. There is just no way to do that with a standard FileCheck
test. However, unittesting these kinds of analyses is really quite easy,
especially as they're designed with the new pass manager where there is
essentially no infrastructure required to rig up the core logic and
exercise it at an API level.
As a minor aside about the DFS numbering bug, the DFS numbering used in
LCG is a bit unusual. Rather than numbering from 0, we number from 1,
and use 0 as the sentinel "unvisited" state. Other implementations often
use '-1' for this, but I find it easier to deal with 0 and it shouldn't
make any real difference provided someone doesn't write silly bugs like
forgetting to actually initialize the DFS numbering. Oops. ;]
llvm-svn: 206954
AArch64 has feature predicates for NEON, FP and CRYPTO instructions.
This allows the compiler to generate code without using FP, NEON
or CRYPTO instructions.
llvm-svn: 206949
the Callee list. This is going to be quite important to prevent removal
from going quadratic. No functionality changed at this point, this is
one of the refactoring patches I've broken out of my initial work toward
mutation updates of the call graph.
llvm-svn: 206938
This prompted me to push references through most of DwarfDebug. Sorry
for the churn.
Honestly it's a bit silly that we're passing around units all over the
place like that anyway and I think it's mostly due to the DIE attribute
adding utility functions being utilities in DwarfUnit. I should have
another go at moving them out of DwarfUnit...
llvm-svn: 206925
from places like MCCodeEmitter() in the MC backend when the
MCContext is const.
I was going to use this in my change for r206669 but Jim convinced
me to use an assert there. But this still is a good tweak.
llvm-svn: 206923
r206916 was not logically the same as the previous code because the
goto statements did not create loop. This should be the same as the
previous code.
llvm-svn: 206918
Goto statements jumping into previous inner blocks are pretty confusing
to read even though in this case they are valid. No reason to not use
while loops there.
llvm-svn: 206916
In the case where the constant comes from a cloned cast instruction, the
materialization code has to go before the cloned cast instruction.
This commit fixes the method that finds the materialization insertion point
by making it aware of this case.
This fixes <rdar://problem/15532441>
llvm-svn: 206913
diagnostic that includes location information.
Currently if one has this assembly:
.quad (0x1234 + (4 * SOME_VALUE))
where SOME_VALUE is undefined ones gets the less than
useful error message with no location information:
% clang -c x.s
clang -cc1as: fatal error: error in backend: expected relocatable expression
With this fix one now gets a more useful error message
with location information:
% clang -c x.s
x.s:5:8: error: expected relocatable expression
.quad (0x1234 + (4 * SOME_VALUE))
^
To do this I plumbed the SMLoc through the MCObjectStreamer
EmitValue() and EmitValueImpl() interfaces so it could be used
when creating the MCFixup.
rdar://12391022
llvm-svn: 206906
The point of these calls is to allow Thumb-1 code to make use of the VFP unit
to perform its operations. This is not desirable with -msoft-float, since most
of the reasons you'd want that apply equally to the runtime library.
rdar://problem/13766161
llvm-svn: 206874
This reverts commit r206780.
This commit was regressing gdb.opt/inline-locals.exp in the GDB 7.5 test
suite. Reverting until I can fix the issue.
llvm-svn: 206867
The branch that skips irreducible backedges was only active when
propagating mass at the top-level. In particular, when propagating mass
through a loop recognized by `LoopInfo` with irreducible control flow
inside, irreducible backedges would not be skipped.
Not sure where that idea came from, but the result was that mass was
lost until after loop exit. Added a testcase that covers this case.
llvm-svn: 206860
Store pointers directly to loops inside the nodes. This could have been
done without changing the type stored in `std::vector<>`. However,
rather than computing the number of loops before constructing them
(which `LoopInfo` doesn't provide directly), I've switched to a
`vector<unique_ptr<LoopData>>`.
This adds some heap overhead, but the number of loops is typically
small.
llvm-svn: 206857
This was implicitly with copy assignment before, which fails to actually
clear `std::vector<>`'s heap storage. Move assignment would work, but
since MSVC can't imply those anyway, explicitly `clear()`-ing members
makes more sense.
llvm-svn: 206856
definition below all of the header #include lines, lib/Transforms/...
edition.
This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.
Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.
llvm-svn: 206844
definition below all the header #include lines, lib/Analysis/...
edition.
This one has a bit extra as there were *other* #define's before #include
lines in addition to DEBUG_TYPE. I've sunk all of them as a block.
llvm-svn: 206843
of a '.inc' file before including actual headers. In this case we had
both duplicated a header's include and were including a standard header.
llvm-svn: 206840
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...
llvm-svn: 206838
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.
Other sub-trees will follow.
llvm-svn: 206837
while checking candidate for bit field extract.
Otherwise the value may not fit in uint64_t and this will trigger an
assertion.
This fixes PR19503.
llvm-svn: 206834
ELFEntityIterator does not implement RandomAccessIterator. It does
not even implement BidirectionalIterator.
This patch fixes LLD build issue when compiled with MSVC2013 with
debug: MSVC's find_if checks if the start iterator is before the end
iterator in the sense of operator< if it declares implementing
RandomAccessIterator. If a class does not have operator<, it fails
to compile.
llvm-svn: 206825
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.
This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:
- Header files that need to provide a DEBUG_TYPE for some inline code
can do so by defining the macro before their inline code and undef-ing
it afterward so the macro does not escape.
- We no longer have rampant ODR violations due to including headers with
different DEBUG_TYPE definitions. This may be mostly an academic
violation today, but with modules these types of violations are easy
to check for and potentially very relevant.
Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.
The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.
llvm-svn: 206822
Removes some extra manual dynamic memory allocation/management. It does
get a bit quirky having to make State's members mutable and
pointers/references to const rather than non-const, but that's a
necessary workaround to dealing with the std::set elements.
llvm-svn: 206807
The comment claimed that the register class information wasn't available
in the assembly parser, but that's not really true. It's just annoying to
get to. Replace the helper functions with references to the auto-generated
information.
llvm-svn: 206802
With a constant mask a vpermil* is just a shufflevector. This patch implements
that simplification. This allows us to produce denser code. It should also
allow more folding down the line.
llvm-svn: 206801
The canonical form for the extended addressing mode (e.g.,
"[x1, w2, uxtw #3]" is for the MCInst to have the second register be the
full 64-bit GPR64 register class. The instruction printer cleans up
the output for display to show the 32-bit register instead, per the
specification.
This simplifies 205893 now that the aliasing is handled in the printer
in 206495 so that the codegen path and the disassembler path give the
same MCInst form.
llvm-svn: 206797
The rationale for this artificial dependency seems to have been lost to the
ravages of time, it is covered by no regression tests, and has no impact on
test-suite performance numbers on either x86 or PPC.
For the test suite, on both x86 and PPC, I ran the test suite 10 times (both as
a baseline and with this change), and found no statistically-significant
changes. For PPC, I used a P7 box. For x86, I used an Intel Xeon E5430. Both
with -O3 -mcpu=native.
This was discussed on-list back in January, but I've not had a chance to run
the performance tests until today.
llvm-svn: 206795
This avoids copying the container by simply deleting until empty.
While I'd rather move to a stricter ownership semantic (unique_ptr),
SmallPtrSet can't cope with unique_ptr and the ownership semantics here
are a bit incestuous (Module sort of owns itself, but sort of doesn't
(if the LLVMContext is destroyed before the Module, then it deregisters
itself from the context... )).
Ideally Modules would be given to the context, or possibly an
emplace-like function to construct them there. Modules then shouldn't be
destroyed by LLVM API clients, but by interacting with the owner
(LLVMContext) directly (but even then, passing a Module* to LLVMContext
doesn't provide an easy way to destroy the Module, since the set would
be over unique_ptrs and you'd need a heterogenous lookup function which
SmallPtrSet doesn't have either).
llvm-svn: 206794
With this MC is able to handle _GLOBAL_OFFSET_TABLE_ in 64 bit mode, which is
needed for medium and large code models.
This fixes pr19470.
llvm-svn: 206793
The -tailcallelim pass should be checking if byval or inalloca args can
be captured before marking calls as tail calls. This was the real root
cause of PR7272.
With a better fix in place, revert the inliner change from r105255. The
test case it introduced still passes and has been moved to
test/Transforms/Inline/byval-tail-call.ll.
Reviewers: chandlerc
Differential Revision: http://reviews.llvm.org/D3403
llvm-svn: 206789
Requires switching some vectors to lists to maintain pointer validity.
These could be changed to forward_lists (singly linked) with a bit more
work - I've left comments to that effect.
llvm-svn: 206780
Summary:
The INSERTPS pattern fragment was called insrtps (mising 'e'), which
would make it harder to grep for the patterns related to this instruction.
Renaming it to use the proper instruction name.
Reviewers: nadav
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3443
llvm-svn: 206779
header files and into the cpp files.
These files will require more touches as the header files actually use
DEBUG(). Eventually, I'll have to introduce a matched #define and #undef
of DEBUG_TYPE for the header files, but that comes as step N of many to
clean all of this up.
llvm-svn: 206777
This reverts commit r206707, reapplying r206704. The preceding commit
to CalcSpillWeights should have sorted out the failing buildbots.
<rdar://problem/14292693>
llvm-svn: 206766
We normally don't drop functions from the C API's, but in this case I think we
can:
* The old implementation of getFileOffset was fairly broken
* The introduction of LLVMGetSymbolFileOffset was itself a C api breaking
change as it removed LLVMGetSymbolOffset.
* It is an incredibly specialized use case. The only reason MCJIT needs it is
because of its odd position of being a dynamic linker of .o files.
llvm-svn: 206750
LazyCallGraph analysis framework. Wire it up all the way through the opt
driver and add some very basic testing that we can build pass pipelines
including these components. Still a lot more to do in terms of testing
that all of this works, but the basic pieces are here.
There is a *lot* of boiler plate here. It's something I'm going to
actively look at reducing, but I don't have any immediate ideas that
don't end up making the code terribly complex in order to fold away the
boilerplate. Until I figure out something to minimize the boilerplate,
almost all of this is based on the code for the existing pass managers,
copied and heavily adjusted to suit the needs of the CGSCC pass
management layer.
The actual CG management still has a bunch of FIXMEs in it. Notably, we
don't do *any* updating of the CG as it is potentially invalidated.
I wanted to get this in place to motivate the new analysis, and add
update APIs to the analysis and the pass management layers in concert to
make sure that the *right* APIs are present.
llvm-svn: 206745
became empty. This would manifest later as an assert failure due to
a non-empty list map but an empty result map. This doesn't easily
manifest with just the module pass manager and the function pass
manager, but the next commit will add the CGSCC pass manager that hits
this assert immediately.
llvm-svn: 206744
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.
<rdar://problem/15480077>
llvm-svn: 206738
file. This will make it easy to scale up the number of passes supported.
Currently, it just supports the function and module transformation
passes that were already supported in the opt tool explicitly.
llvm-svn: 206737
It could even be made non-virtual if it weren't for bad compiler
warnings.
This demonstrates that ArgList objects aren't destroyed polymorphically
and possibly that they aren't even used polymorphically. If that's the
case, it might be possible to refactor the two ArgList types more
separately and simplify the Arg ownership model. *continues
experimenting*
llvm-svn: 206727
This might be able to be simplified further by using Arg as a value type
in a linked list (to maintain pointer validity), but here's something
simple to start with.
llvm-svn: 206724
reason to expose a global symbol 'decodeInstruction' nor to pollute the global
scope with a bunch of external linkage entities (some of which conflict with
others elsewhere in LLVM).
This is just the initial transition to C++; more cleanups to follow.
llvm-svn: 206717
entirely clear whether this should be valid with modules enabled, but the fixed
code is cleaner regardless.
Also fix a TU-local type that accidentally had external linkage.
llvm-svn: 206714
This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite.
I've done a careful audit, added some asserts, and fixed a couple of
bugs (unfortunately, they were in unlikely code paths). There's a small
chance that this will appease the failing bots [1][2]. (If so, great!)
If not, I have a follow-up commit ready that will temporarily add
-debug-only=block-freq to the two failing tests, allowing me to compare
the code path between what the failing bots and what my machines (and
the rest of the bots) are doing. Once I've triggered those builds, I'll
revert both commits so the bots go green again.
[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445
<rdar://problem/14292693>
llvm-svn: 206704
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.
A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.
llvm-svn: 206684