Based on discussions with Lang Hames and Jakob Stoklund Olesen at the hacker's lab, and in the light of upcoming work on the PBQP register allocator, it was though that CalcSpillWeights does not need to be a pass. This change will enable to customize / tune the spill weight computation depending on the allocator.
Update the documentation style while there.
No functionnal change.
llvm-svn: 194356
This is still just a skeleton. I'm trying to pull together the
experimentation I've done into committable chunks, and this is the first
coherent one. Others will follow in hopefully short order that move this
more toward a useful initial implementation. I still expect the design
to continue evolving in small ways as I work through the different
requirements and features needed here though.
Keep in mind, all of this is off by default.
Currently, this mostly exercises the use of a polymorphic smart pointer
and templates to hide the polymorphism for the pass manager from the
pass implementation. The next step will be more significant, adding the
first framework of analysis support.
llvm-svn: 194325
give the files a legacy prefix in the right directory. Use forwarding
headers in the old locations to paper over the name change for most
clients during the transitional period.
No functionality changed here! This is just clearing some space to
reduce renaming churn later on with a new system.
Even when the new stuff starts to go in, it is going to be hidden behind
a flag and off-by-default as it is still WIP and under development.
This patch is specifically designed so that very little out-of-tree code
has to change. I'm going to work as hard as I can to keep that the case.
Only direct forward declarations of the PassManager class are impacted
by this change.
llvm-svn: 194324
unique ownership smart pointer which is *deep* copyable by assuming it
can call a T::clone() method to allocate a copy of the owned data.
This is mostly useful with containers or other collections of uniquely
owned data in C++98 where they *might* copy. With C++11 we can likely
remove this in favor of move-only types and containers wrapped around
those types.
llvm-svn: 194315
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.
Differential Revision: http://llvm-reviews.chandlerc.com/D2074
Reviewed by Andy
llvm-svn: 194306
The new graph structure replaces the node and edge linked lists with vectors.
Free lists (well, free vectors) are used for fast insertion/deletion.
The ultimate aim is to make PBQP graphs cheap to clone. The motivation is that
the PBQP solver destructively consumes input graphs while computing a solution,
forcing the graph to be fully reconstructed for each round of PBQP. This
imposes a high cost on large functions, which often require several rounds of
solving/spilling to find a final register allocation. If we can cheaply clone
the PBQP graph and incrementally update it between rounds then hopefully we can
reduce this cost. Further, once we begin pooling matrix/vector values (future
work), we can cache some PBQP solver metadata and share it between cloned
graphs, allowing the PBQP solver to re-use some of the computation done in
earlier rounds.
For now this is just a data structure update. The allocator and solver still
use the graph the same way as before, fully reconstructing it between each
round. I expect no material change from this update, although it may change
the iteration order of the nodes, causing ties in the solver to break in
different directions, and this could perturb the generated allocations
(hopefully in a completely benign way).
Thanks very much to Arnaud Allard de Grandmaison for encouraging me to get back
to work on this, and for a lot of discussion and many useful PBQP test cases.
llvm-svn: 194300
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a paticular order into a
specified set of registers. Instead it is up tp the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).
Differential Revision: http://llvm-reviews.chandlerc.com/D2009
Reviewed by Andy
llvm-svn: 194293
Based on discussions with Lang Hames and Jakob Stoklund Olesen at the hacker's lab, and in the light of upcoming work on the PBQP register allocator, it was though that CalcSpillWeights does not need to be a pass. This change will enable to customize / tune the spill weight computation depending on the allocator.
Update the documentation style while there.
No functionnal change.
llvm-svn: 194269
Patch by Michele Scandale!
Rewrite of the functions used to compute the backedge taken count of a
loop on LT and GT comparisons.
I decided to split the handling of LT and GT cases becasue the trick
"a > b == -a < -b" in some cases prevents the trip count computation
due to the multiplication by -1 on the two operands of the
comparison. This issue comes from the conservative computation of
value range of SCEVs: taking the negative SCEV of an expression that
have a small positive range (e.g. [0,31]), we would have a SCEV with a
fullset as value range.
Indeed, in the new rewritten function I tried to better handle the
maximum backedge taken count computation when MAX/MIN expression are
used to handle the cases where no entry guard is found.
Some test have been modified in order to check the new value correctly
(I manually check them and reasoning on possible overflow the new
values seem correct).
I finally added a new test case related to the multiplication by -1
issue on GT comparisons.
llvm-svn: 194116
One of the uses of the IsValid flag is to support default constructing
a ErrorOr that is not a Error or a Value. There is not much value in
doing that IMHO. If ErrorOr was to have a default constructor, it
should be implemented by default constructing the value, but even that
looks unnecessary.
The other use is to avoid calling destructors on moved objects. This
looks wrong. If the data being moved has non trivial treatment of
moves (an std::vector for example), it is its destructor that should
handle it, not ~ErrorOr.
With this change ErrorOr becomes a fairly simple wrapper and should
always be better than using an error_code + value in an API.
llvm-svn: 194109
This patch enables llvm-cov to correctly output the run count stored in
the GCDA file. GCOVProfiling currently does not generate this
information, so the GCDA run data had to be hacked on from a GCDA file
generated by gcc. This is corrected by a subsequent patch.
With the run and program data included, both llvm-cov and gcov produced
the same output.
llvm-svn: 194033
ErrorOr had quiet a bit of complexity and indirection to be able to hold a user
type with the error.
That feature is not used anymore. This patch removes it, it will live in svn
history if we ever need it again.
If we do need it again, IMHO there is one thing that should be done
differently: Holding extra info in the error is not a property a function also
returning a value or not. The ability to hold extra info should be in the error
type and ErrorOr templated over it so that we don't need the funny looking
ErrorOr<void>.
llvm-svn: 194030
As with the other loop unrolling parameters (the unrolling threshold, partial
unrolling, etc.) runtime unrolling can now also be controlled via the
constructor. This will be necessary for moving non-trivial unrolling late in
the pass manager (after loop vectorization).
No functionality change intended.
llvm-svn: 194027
stack traces by default if you use PrettyStackTraceProgram, so that existing LLVM-based
tools will continue to get it without any changes.
llvm-svn: 193971
This adds an SimplifyLibCalls case which converts the special __sinpi and
__cospi (float & double variants) into a __sincospi_stret where appropriate to
remove duplicated work.
Patch by Tim Northover
llvm-svn: 193943
linkonce_odr_auto_hide was in incomplete attempt to implement a way
for the linker to hide symbols that are known to be available in every
TU and whose addresses are not relevant for a particular DSO.
It was redundant in that it all its uses are equivalent to
linkonce_odr+unnamed_addr. Unlike those, it has never been connected
to clang or llvm's optimizers, so it was effectively dead.
Given that nothing produces it, this patch just nukes it
(other than the llvm-c enum value).
llvm-svn: 193865
Objective-C data structures.
This is allows tools such as darwin's otool(1) that uses the
LLVM disassembler take a pointer value being loaded by
an instruction and add a comment to what it is being referenced
to make following disassembly of Objective-C programs
more readable.
For example disassembling the Mac OS X TextEdit app one
will see comments like the following:
movq 0x20684(%rip), %rsi ## Objc selector ref: standardUserDefaults
movq 0x21985(%rip), %rdi ## Objc class ref: _OBJC_CLASS_$_NSUserDefaults
movq 0x1d156(%rip), %r14 ## Objc message: +[NSUserDefaults standardUserDefaults]
leaq 0x23615(%rip), %rdx ## Objc cfstring ref: @"SelectLinePanel"
callq 0x10001386c ## Objc message: -[[%rdi super] initWithWindowNibName:]
These diffs also include putting quotes around C strings
in literal pools and uses "symbol address" in the comment
when adding a symbol name to the comment to tell these
types of references apart:
leaq 0x4f(%rip), %rax ## literal pool for: "Hello world"
movq 0x1c3ea(%rip), %rax ## literal pool symbol address: ___stack_chk_guard
Of course the easy changes are in the LLVM disassembler and
the hard work is up to the implementer of the SymbolLookUp()
call back.
rdar://10602439
llvm-svn: 193833
There are two ways one could implement hiding of linkonce_odr symbols in LTO:
* LLVM tells the linker which symbols can be hidden if not used from native
files.
* The linker tells LLVM which symbols are not used from other object files,
but will be put in the dso symbol table if present.
GOLD's API is the second option. It was implemented almost 1:1 in llvm by
passing the list down to internalize.
LLVM already had partial support for the first option. It is also very similar
to how ld64 handles hiding these symbols when *not* doing LTO.
This patch then
* removes the APIs for the DSO list.
* marks LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr
global values and other linkonce_odr whose address is not used.
* makes the gold plugin responsible for handling the API mismatch.
llvm-svn: 193800
With this patch llvm produces a weak_def_can_be_hidden for linkonce_odr
if they are also unnamed_addr or don't have their address taken.
There is not a lot of documentation about .weak_def_can_be_hidden, but
from the old discussion about linkonce_odr_auto_hide and the name of
the directive this looks correct: these symbols can be hidden.
Testing this with the ld64 in Xcode 5 linking clang reduces the number of
exported symbols from 21053 to 19049.
llvm-svn: 193718
startswith_lower is ocassionally useful and I think worth adding.
endwith_lower is added for completeness.
Differential Revision: http://llvm-reviews.chandlerc.com/D2041
llvm-svn: 193706
Also corrected the definition of the intrinsics for these instructions (the
result register is also the first operand), and added intrinsics for bsel and
bseli to clang (they already existed in the backend).
These four operations are mostly equivalent to bsel, and bseli (the difference
is which operand is tied to the result). As a result some of the tests changed
as described below.
bitwise.ll:
- bsel.v test adapted so that the mask is unknown at compile-time. This stops
it emitting bmnzi.b instead of the intended bsel.v.
- The bseli.b test now tests the right thing. Namely the case when one of the
values is an uimm8, rather than when the condition is a uimm8 (which is
covered by bmnzi.b)
compare.ll:
- bsel.v tests now (correctly) emits bmnz.v instead of bsel.v because this
is the same operation (see MSA.txt).
i8.ll
- CHECK-DAG-ized test.
- bmzi.b test now (correctly) emits equivalent bmnzi.b with swapped operands
because this is the same operation (see MSA.txt).
- bseli.b still emits bseli.b though because the immediate makes it
distinguishable from bmnzi.b.
vec.ll:
- CHECK-DAG-ized test.
- bmz.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
- bsel.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
llvm-svn: 193693
This problem was found and fixed by José Fonseca in March 2011 for
SmallPtrSet, committed r128566. But as far as I can tell, all other
llvm hash tables retain the same problem: the bucket count can grow
without bound while size() remains near constant by repeated
insert/erase cycles that tend to fill the container with tombstones.
Here is a demo that has been reduced to a trivial case:
int
main()
{
llvm::DenseSet<unsigned> d;
for (unsigned i = 0; i < 0xFFFFFFF; ++i)
{
d.insert(i);
d.erase(i);
}
}
While the container size() never grows above 1, the bucket count grows
like this:
nb = 64
nb = 128
nb = 256
nb = 512
nb = 1024
nb = 2048
nb = 4096
nb = 8192
nb = 16384
nb = 32768
nb = 65536
nb = 131072
nb = 262144
nb = 524288
nb = 1048576
nb = 2097152
nb = 4194304
nb = 8388608
nb = 16777216
nb = 33554432
nb = 67108864
nb = 134217728
nb = 268435456
The above program currently consumes a few GB ram. This patch brings
the memory consumption down by several orders of magnitude, and keeps
the bucket count at 64 for the above test.
llvm-svn: 193689
This required correcting the definition of the bins[lr]i intrinsics because
the result is also the first operand.
It also required removing the (arbitrary) check for 32-bit immediates in
MipsSEDAGToDAGISel::selectVSplat().
Currently using binsli.d with 2 bits set in the mask doesn't select binsli.d
because the constant is legalized into a ConstantPool. Similar things can
happen with binsri.d with more than 10 bits set in the mask. The resulting
code when this happens is correct but not optimal.
llvm-svn: 193687
This modifies the pass to classify every SSP-triggering AllocaInst according to
an SSPLayoutKind (LargeArray, SmallArray, AddrOf). This analysis is collected
by the pass and made available for use, but no other pass uses it yet.
The next patch will make use of this analysis in PEI and StackSlot
passes. The end goal is to support ssp-strong stack layout rules.
WIP.
Differential Revision: http://llvm-reviews.chandlerc.com/D1789
llvm-svn: 193653
Use 32-bit types for the array instead of 64. This should
generally be better anyway.
In optimized + assert builds, I saw a failure when a
cond code / type combination that is never set was loading
a non-zero value and hitting the != Promote assert.
It turns out when loading the 64-bit value to do the shift,
the assembly loads the 2 32-bit halves from non-consecutive
addresses. The address the second half of the loaded uint64_t
doesn't include the offset of the array in the struct. Instead
of being offset + 4, it's just + 4.
I'm not entirely sure why this wasn't observed before.
setCondCodeAction isn't heavily used by the in-tree targets,
and not with the higher valued vector SimpleValueTypes. Only
PPC is using one of the > 32 valued types, and that is probably
never used by anyone on a 32-bit MSVC compiled host.
I ran into this when upgrading LLVM versions, so I guess the
value loaded from the nonsense address happened to work out
before.
No test since I'm not really sure if / how it can be reproduced
with the current in tree targets, and it's not supposed to change
anything.
llvm-svn: 193650
Summary:
Use DWARF4 table of form classes to fetch attributes from DIE
in a more consistent way. This shouldn't change the functionality and
serves as a refactoring for upcoming change: DW_AT_high_pc has different
semantics depending on its form class.
Reviewers: dblaikie, echristo
Reviewed By: echristo
CC: echristo, llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1961
llvm-svn: 193553
This commit allows the ARM integrated assembler to parse
and assemble the code with .eabi_attribute, .cpu, and
.fpu directives.
To implement the feature, this commit moves the code from
AttrEmitter to ARMTargetStreamers, and several new test
cases related to cortex-m4, cortex-r5, and cortex-a15 are
added.
Besides, this commit also change the Subtarget->isFPOnlySP()
to Subtarget->hasD16() to match the usage of .fpu directive.
This commit changes the test cases:
* Several .eabi_attribute directives in
2010-09-29-mc-asm-header-test.ll are removed because the .fpu
directive already cover the functionality.
* In the Cortex-A15 test case, the value for
Tag_Advanced_SIMD_arch has be changed from 1 to 2,
which is more precise.
llvm-svn: 193524
Most SelectionDAG code drops the TBAA info when creating a new form of a
load and store (e.g. during legalization, or when converting a plain
load to an extending one). This patch tries to catch all cases where
the TBAA information can legitimately be carried over.
The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields. (The corresponding
getTruncStore() already exists.) The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info). If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.
llvm-svn: 193517
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)
When SCEV expands a recurrence outside of a loop it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would hve to
guarantee that the chained AddRec's are in a perfectly reduced form.
llvm-svn: 193438
This fix a memory leak found by valgrind.
Calling it from the base class destructor would not destroy the BasicCallGraph
bits.
FIXME: BasicCallGraph is the only thing that inherits from CallGraph. Can
we merge the two?
llvm-svn: 193412
When assembling, a .thumb_func directive is supposed to be applicable to the
next symbol definition, even if there are intervening directives. We were
racing ahead to try and find it, and this commit should fix the issue.
Patch by Gabor Ballabas
llvm-svn: 193403
ARM processors without ldrex/strex need to be able to make libcalls for all
atomic operations, including the newer min/max versions.
The alternative would probably be expanding these operations in terms of
cmpxchg (as x86 does always), but in the configurations where this matters
code-size tends to be paramount so the libcall is more desirable.
llvm-svn: 193398
llvm-cov will now be able to read program counts from the GCDA file and
output it in the same format as gcov. The program summary tag was
identified from gcov-io.h as "\0\0\0\a3".
There is currently a bug in GCOVProfiling.cpp which does not generate
the
run- or program-counting IR, so this change was tested manually by
modifying the GCDA file and comparing the gcov and llvm-cov outputs.
llvm-svn: 193389
Also improve the implementation of EmitRawText(Twine) so it doesn't
bother using the SmallString buffer if the Twine is a simple StringRef
anyway.
llvm-svn: 193378
This reverts commit r193255 and instead creates an lto_bool_t typedef
that points to bool, _Bool, or unsigned char depending on what is
available. Only recent versions of MSVC provide a stdbool.h header.
Reviewers: rafael.espindola
Differential Revision: http://llvm-reviews.chandlerc.com/D2019
llvm-svn: 193377
LLVM optimizers may widen accesses to packed structures that overflow the structure itself, but should be in bounds up to the alignment of the object
llvm-svn: 193317
This was a fundamental flaw in llvm-cov where it treated the values in
the GCDA files as block counts instead of edge counts. This created
incorrect line counts when branching was present. Instead, the edge
counts should be summed to obtain the correct block count.
The fix was tested using custom test files as well as single source
files from the test-suite directory. The behaviour can be verified by
reading the GCOV documentation that describes the GCDA spec ("ARC_COUNTS
gives the counter values for those arcs that are instrumented") and the
header description provided by GCOVProfiling.cpp ("instruments the code
that runs to records (sic) the edges between blocks that run and emit a
complementary "gcda" file on exit").
llvm-svn: 193299
There are a few motivations for this:
- Using a map allows for checking if line is in map. This differentiates
unexecutable lines (such as comments) from unexecuted logical lines of
code. "#####" is now outputted in this case, in line with gcov.
- Source files are no longer read in twice: once when storing the line
counts, and once when outputting the data.
- Greatly simplifies the function FileInfo::addLineCount().
llvm-svn: 193264
Major steps include:
1). introduces a not-addr-taken bit-field in GlobalVariable
2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable
dosen't have its address taken.
3). AA use this info for disambiguation.
llvm-svn: 193251
For some targets, it is useful to be able to look at the original
type of an argument without having to dig through the original IR.
This also fixes a bug in SelectionDAGBuilder where InputArg.PartOffset
was not taking into account the offset of structure elements.
Patch by: Justin Holewinski
Tom Stellard:
- Changed the type of ArgVT to EVT, so it can store non-simple types
like v3i32.
llvm-svn: 193214
Line counts in llvm-cov are read in as 64-bit integers but were being truncated
to 32-bit in collectLineCounts(), which caused overflow for large counts.
This patch fixes all counts to be uint64_t.
Patch by Yuchen Wu!
llvm-svn: 193172
VTList has a long life cycle through the module and getVTList is frequently called. In current getVTList, sequential search over a std::vector is used, this is inefficient in big module.
This patch use FoldingSet to implement hashing mechanism when searching.
Reviewer: Nadav Rotem
Test : Pass unit tests & LNT test suite
llvm-svn: 193150
- Replaced tabs with proper padding
- print() takes two arguments, which are the GCNO and GCDA filenames
- Files are listed at the top of output, appended by line 0
- Stripped strings of trailing \0s
- Removed last two lines of whitespace in output
Patch by Yuchen Wu!
llvm-svn: 193148
This allows various variables to be more self-documenting and easier to
debug by being of specific types without overlapping enum values.
Precommit review by Eric Christopher.
llvm-svn: 193091
When a linkonce_odr value that is on the dso list is not unnamed_addr
we can still look to see if anything is actually using its address. If
not, it is safe to hide it.
This patch implements that by moving GlobalStatus to Transforms/Utils
and using it in Internalize.
llvm-svn: 193090
This is another (final?) stab at making us able to parse our own asm output
on Windows.
Symbols on Windows often contain @'s and ?'s in their names. Our asm parser
didn't like this. ?'s were not allowed, and @'s were intepreted as trying to
reference PLT/GOT/etc.
We can't just add quotes around the bad names, since e.g. for MinGW, we use gas
to assemble, and it doesn't like quotes in some places (notably in .def
directives).
This commit makes us allow ?'s in symbol names, and @'s in symbol names for MS
assembly.
Differential Revision: http://llvm-reviews.chandlerc.com/D1978
llvm-svn: 193000
There are targets that support i128 sized scalars but cannot emit
instructions that modify them directly. The proper thing to do is to
emit a libcall.
This fixes PR17481.
llvm-svn: 192957
All of the Core API functions have versions which accept explicit context, in
addition to ones which work on global context. This commit adds functions
which accept explicit context to the Target API for consistency.
Patch by Peter Zotov
Differential Revision: http://llvm-reviews.chandlerc.com/D1912
llvm-svn: 192913
class. The instruction class includes the signed saturating doubling
multiply-add long, signed saturating doubling multiply-subtract long, and
the signed saturating doubling multiply long instructions.
llvm-svn: 192908