Original commit message:
Use uint16_t to store InstrNameIndices in MCInstrInfo. Add asserts to protect
all 16-bit string table offsets. Also make sure the string to offset table
string is not larger than 65536 characters since larger string literals aren't
portable.
llvm-svn: 152233
compilers. It seems that GCC 4.3 (and likely older) simply aren't going
to do SFINAE on non-type template parameters the way Clang and modern
GCCs do...
Now we detect the implicit conversion to an integer type, and then
blacklist classes, pointers, and floating point types. This seems to
work well enough, and I'm hopeful will return the bots to life.
llvm-svn: 152227
traits.
With this change, the pattern used here is *extremely* close to the
pattern used elsewhere in the file, so I'm hoping it survives the
build-bots.
llvm-svn: 152225
template argument and an *implicit* conversion from '0' to a null
pointer. For some bizarre reason, GCC 4.3.2 thinks that the cast to
'(T*)' is invalid inside of an enumerator's value... which it isn't but
whatever. ;] This pattern is used elsewhere in the type_traits header
and so hopefully will survive the wrath of the build bots.
llvm-svn: 152220
Deleting them because they aren't used. =D
Yell if you need these, I'm happy to instead replace them with nice uses
of the new infrastructure.
llvm-svn: 152219
This one is particularly annoying because the hashing algorithm is
highly specialized, with a strange "equivalence" definition that subsets
the fields involved.
Still, this looks at the exact same set of data as the old code, but
without bitwise or-ing over parts of it and other mixing badness. No
functionality changed here. I've left a substantial fixme about the fact
that there is a cleaner and more principled way to do this, but it
requires making the equality definition actual stable for particular
types...
llvm-svn: 152218
integral and enumeration types. This is accomplished with a bit of
template type trait magic. Thanks to Richard Smith for the core idea
here to detect viable types by detecting the set of types which can be
default constructed in a template parameter.
This is used (in conjunction with a system for detecting nullptr_t
should it exist) to provide an is_integral_or_enum type trait that
doesn't need a whitelist or direct compiler support.
With this, the hashing is extended to the more general facility. This
will be used in a subsequent commit to hashing more things, but I wanted
to make sure the type trait magic went through the build bots separately
in case other compilers don't like this formulation.
llvm-svn: 152217
the DebugLoc information can be maintained throughout by grabbing the DebugLoc
before the RemoveBranch and then passing the result to the InsertBranch.
Patch by Andrew Stanford-Jason!
llvm-svn: 152212
ScheduleDAG is responsible for the DAG: SUnits and SDeps. It provides target hooks for latency computation.
ScheduleDAGInstrs extends ScheduleDAG and defines the current scheduling region in terms of MachineInstr iterators. It has access to the target's scheduling itinerary data. ScheduleDAGInstrs provides the logic for building the ScheduleDAG for the sequence of MachineInstrs in the current region. Target's can implement highly custom schedulers by extending this class.
ScheduleDAGPostRATDList provides the driver and diagnostics for current postRA scheduling. It maintains a current Sequence of scheduled machine instructions and logic for splicing them into the block. During scheduling, it uses the ScheduleHazardRecognizer provided by the target.
Specific changes:
- Removed driver code from ScheduleDAG. clearDAG is the only interface needed.
- Added enterRegion/exitRegion hooks to ScheduleDAGInstrs to delimit the scope of each scheduling region and associated DAG. They should be used to setup and cleanup any region-specific state in addition to the DAG itself. This is necessary because we reuse the same ScheduleDAG object for the entire function. The target may extend these hooks to do things at regions boundaries, like bundle terminators. The hooks are called even if we decide not to schedule the region. So all instructions in a block are "covered" by these calls.
- Added ScheduleDAGInstrs::begin()/end() public API.
- Moved Sequence into the driver layer, which is specific to the scheduling algorithm.
llvm-svn: 152208
to hash_combine. One of the interfaces could already do this, and the
other can just use a small buffer. This is a much more efficient way to
use the hash_combine interface, although I don't have any particular
benchmark where this code was hot, so I can't measure much of an impact.
It at least doesn't slow anything down.
llvm-svn: 152200
"is sized". This prevents every query to isSized() from recursing over
every sub-type of a struct type. This could get *very* slow for
extremely deep nesting of structs, as in 177.mesa.
This change is a 45% speedup for 'opt -O2' of 177.mesa.linked.bc, and
likely a significant speedup for other cases as well. It even impacts
-O0 cases because so many part of the code try to check whether a type
is sized.
Thanks for the review from Nick Lewycky and Benjamin Kramer on IRC.
llvm-svn: 152197
This currently assumes that both sets have the same SmallSize to keep the implementation simple,
a limitation that can be lifted if someone cares.
llvm-svn: 152143
- On OS X 10.7+ this is apparently recommended practice. This maybe should
become a configurey thing one day, but I'm not sure it is right to
automatically turn it on.
llvm-svn: 152133
When an instruction only writes sub-registers, it is still necessary to
add an <imp-def> operand for the super-register. When reloading into a
virtual register, rewriting will add the operand, but when loading
directly into a virtual register, the <imp-def> operand is still
necessary.
llvm-svn: 152095
The fpscr register contains both flags (set by FP operations/comparisons) and
control bits. The control bits (FPSCR) should be reserved, since they're always
available and needn't be defined before use. The flag bits (FPSCR_NZCV) should
like to be unreserved so they can be hoisted by MachineCSE. This fixes PR12165.
llvm-svn: 152076
With the new composite physical registers to represent arbitrary pairs
of DPR registers, we don't need the pseudo-registers anymore. Get rid of
a bunch of them that use DPR register pairs and just use the real
instructions directly instead.
llvm-svn: 152045
Specifically, remove the magic number when checking to see if the copy has a
glue operand and simplify the checking logic.
rdar://10930395
llvm-svn: 152041
This was testing the handling of sub-register coalescing followed by
remat. The original problem was caused by the extra <imp-def> operands
added by sub-register coalescing. Those <imp-def> operands are not
added any longer, and the test case passes even when the original patch
is reverted.
llvm-svn: 152040
In this update:
- I assumed neon2 does not imply vfpv4, but neon and vfpv4 imply neon2.
- I kept setting .fpu=neon-vfpv4 code attribute because that is what the
assembler understands.
Patch by Ana Pazos <apazos@codeaurora.org>
llvm-svn: 152036
This implicitly fixes a nasty bug in the GVN hashing (that thankfully
could only manifest as a performance bug): actually include the opcode
in the hash. The old code started the hash off with the opcode, but then
overwrote it with the type pointer.
Since this is likely to be pretty hot (GVN being already pretty
expensive) I've included a micro-optimization to just not bother with
the varargs hashing if they aren't present. I can't measure any change
in GVN performance due to this, even with a big test case like Duncan's
sqlite one. Everything I see is in the noise floor. That said, this
closes a loop hole for a potential scaling problem due to collisions if
the opcode were the differentiating aspect of the expression.
llvm-svn: 152025
hashing infrastructure. I wonder why we don't just use StringMap here,
and I may revisit the issue if I have time, but for now I'm just trying
to consolidate.
llvm-svn: 152023
complains about the truncation of a 64-bit constant to a 32-bit value
when size_t is 32-bits wide, but *only with static_cast*!!! The exact
signal that should *silence* such a warning, and in fact does silence it
with both GCC and Clang.
Anyways, this was causing grief for all the MSVC builds, so pointless
change made. Thanks to Nikola on IRC for confirming that this works.
llvm-svn: 152021
The first def of a virtual register cannot also read the register.
Assert on such bad machine code instead of trying to fix it.
TwoAddressInstructionPass should never create code like that.
llvm-svn: 152010
MachineOperands that define part of a virtual register must have an
<undef> flag if they are not intended as read-modify-write operands.
The old trick of adding an <imp-def> operand doesn't work any longer.
Fixes PR12177.
llvm-svn: 152008
equalities into phi node operands for which the equality is known to
hold in the incoming basic block. That's because replaceAllDominatedUsesWith
wasn't handling phi nodes correctly in general (that this didn't give wrong
results was just luck: the specific way GVN uses replaceAllDominatedUsesWith
precluded wrong changes to phi nodes).
llvm-svn: 152006
new hash_value infrastructure, and replace their implementations using
hash_combine. This removes a complete copy of Jenkin's lookup3 hash
function (which is both significantly slower and lower quality than the
one implemented in hash_combine) along with a somewhat scary xor-only
hash function.
Now that APInt and APFloat can be passed directly to hash_combine,
simplify the rest of the LLVMContextImpl hashing to use the new
infrastructure.
llvm-svn: 152004
Some BBs can become dead after codegen preparation. If we delete them here, it
could help enable tail-call optimizations later on.
<rdar://problem/10256573>
llvm-svn: 152002
- Shrink the opcode field to 16 bits.
- Shrink the AsmVariantID field to 8 bits.
- Store the mnemonic string in a string table, store a 16 bit index.
- Store a pascal-style length byte in the string instead of a null terminator,
so we can avoid calling strlen on every entry we visit during mnemonic search.
Shrinks X86AsmParser.o from 434k to 201k on x86_64 and eliminates relocs from the table.
llvm-svn: 151984
This could probably be made a lot smarter, but this is a common case and doesn't require LVI to scan a lot
of code. With this change CVP can optimize away the "shift == 0" case in Hashing.h that only gets hit when
"shift" is in a range not containing 0.
llvm-svn: 151919
folks who know something about PPC tell me that the byte swap is crazy
fast and without this the bit mixture would actually be different. It
might not be worse, but I've not measured it and so I'd rather not trust
it. This way, the algorithm is identical on both endianness hosts. I'll
look into any performance issues etc stemming from this.
llvm-svn: 151892
just ensure that the number of bytes in the pair is the sum of the bytes
in each side of the pair. As long as thats true, there are no extra
bytes that might be padding.
Also add a few tests that previously would have slipped through the
checking. The more accurate checking mechanism catches these and ensures
they are handled conservatively correctly.
Thanks to Duncan for prodding me to do this right and more simply.
llvm-svn: 151891
This change replaces getTypeStoreSize with getTypeAllocSize in AddressSanitizer
instrumentation for stack allocations.
One case where old behaviour produced undesired results is an optimization in
InstCombine pass (PromoteCastOfAllocation), which can replace alloca(T) with
alloca(S), where S has the same AllocSize, but a smaller StoreSize. Another
case is memcpy(long double => long double), where ASan will poison bytes 10-15
of a stack-allocated long double (StoreSize 10, AllocSize 16,
sizeof(long double) = 16).
See http://llvm.org/bugs/show_bug.cgi?id=12047 for more context.
llvm-svn: 151887
hashable data. This matters when we have pair<T*, U*> as a key, which is
quite common in DenseMap, etc. To that end, we need to detect when this
is safe. The requirements on a generic std::pair<T, U> are:
1) Both T and U must satisfy the existing is_hashable_data trait. Note
that this includes the requirement that T and U have no internal
padding bits or other bits not contributing directly to equality.
2) The alignment constraints of std::pair<T, U> do not require padding
between consecutive objects.
3) The alignment constraints of U and the size of T do not conspire to
require padding between the first and second elements.
Grow two somewhat magical traits to detect this by forming a pod
structure and inspecting offset artifacts on it. Hopefully this won't
cause any compilers to panic.
Added and adjusted tests now that pairs, even nested pairs, are treated
as just sequences of data.
Thanks to Jeffrey Yasskin for helping me sort through this and reviewing
the somewhat subtle traits.
llvm-svn: 151883
an open question of whether we can do better than this by treating pairs
as boring data containers and directly hashing the two subobjects. This
at least makes the API reasonable.
In order to make this change, I reorganized the header a bit. I lifted
the declarations of the hash_value functions up to the top of the header
with their doxygen comments as these are intended for users to interact
with. They shouldn't have to wade through implementation details. I then
defined them at the very end so that they could be defined in terms of
hash_combine or any other hashing infrastructure.
Added various pair-hashing unittests.
llvm-svn: 151882
In this instance we are generating the tail-call during legalizeDAG. The 2nd
floor call can't be a tail call because it clobbers %xmm1, which is defined by
the first floor call. The first floor call can't be a tail-call because it's
not in the tail position. The only reasonable way I could think to fix this
in a target-independent manner was to check for glue logic on the copy reg.
rdar://10930395
llvm-svn: 151877
the hash_code. I'm not sure what I was thinking here, the use cases for
special values are in the *keys*, not in the hashes of those keys.
We can always resurrect this if needed, or clients can accomplish the
same goal themselves. This makes the general case somewhat faster (~5
cycles faster on my machine) and smaller with less branching.
llvm-svn: 151865
The inline table needs to be constructed ahead of time so that it doesn't try to
create new strings while we're emitting everything.
This reverts commit a8ff9bccb399183cdd5f1c3cec2bda763664b4b0.
llvm-svn: 151864
floating point equality comparisons into integer ones with -ffast-math. The
issue is the optimization causes +0.0 != -0.0.
Now the optimization is only done when one side is known to be 0.0. The other
side's sign bit is masked off for the comparison.
rdar://10964603
llvm-svn: 151861
to do more invasive refactoring here to get FoldingSet to use size_t or
even hash_code directly, but for now this is a good first step to remove
Yet Another Hashing Algorithm from LLVM.
llvm-svn: 151859
This function could have r12 live across a function call when compiling
thumb1 code.
The test case for this is not included because it is very long. It must
provoke emergency spilling near a function call. The behavior is
provoked by MultiSource/Applications/JM/lencod, and it triggers an
assertion in the scavenger.
<rdar://problem/10963642>
llvm-svn: 151855
fixups that are being used to determine section offsets. Reduces
the total number of fixups by 50% for a non-trivial testcase.
Part of rdar://10413936
llvm-svn: 151852
and stores was added.
- SelectAddr should return false if Parent is an unaligned f32 load or store.
- Only aligned load and store nodes should be matched to select reg+imm
floating point instructions.
- MIPS does not have support for f64 unaligned load or store instructions.
llvm-svn: 151843
so that the test will not fail when run on an Intel Atom
processor, due to the Atom scheduler producing an instruction sequence that is
different from that which is normally expected.
llvm-svn: 151832
of the proposed standard hashing interfaces (N3333), and to use
a modified and tuned version of the CityHash algorithm.
Some of the highlights of this change:
-- Significantly higher quality hashing algorithm with very well
distributed results, and extremely few collisions. Should be close to
a checksum for up to 64-bit keys. Very little clustering or clumping of
hash codes, to better distribute load on probed hash tables.
-- Built-in support for reserved values.
-- Simplified API that composes cleanly with other C++ idioms and APIs.
-- Better scaling performance as keys grow. This is the fastest
algorithm I've found and measured for moderately sized keys (such as
show up in some of the uniquing and folding use cases)
-- Support for enabling per-execution seeds to prevent table ordering
or other artifacts of hashing algorithms to impact the output of
LLVM. The seeding would make each run different and highlight these
problems during bootstrap.
This implementation was tested extensively using the SMHasher test
suite, and pased with flying colors, doing better than the original
CityHash algorithm even.
I've included a unittest, although it is somewhat minimal at the moment.
I've also added (or refactored into the proper location) type traits
necessary to implement this, and converted users of GeneralHash over.
My only immediate concerns with this implementation is the performance
of hashing small keys. I've already started working to improve this, and
will continue to do so. Currently, the only algorithms faster produce
lower quality results, but it is likely there is a better compromise
than the current one.
Many thanks to Jeffrey Yasskin who did most of the work on the N3333
paper, pair-programmed some of this code, and reviewed much of it. Many
thanks also go to Geoff Pike Pike and Jyrki Alakuijala, the original
authors of CityHash on which this is heavily based, and Austin Appleby
who created MurmurHash and the SMHasher test suite.
Also thanks to Nadav, Tobias, Howard, Jay, Nick, Ahmed, and Duncan for
all of the review comments! If there are further comments or concerns,
please let me know and I'll jump on 'em.
llvm-svn: 151822
Allows us to de-virtualize the function and provides access to it in
the instruction printer, which is useful for handling composite
physical registers (e.g., ARM register lists).
llvm-svn: 151815
This reverts commit 151760.
We want to move getSubReg() from TargetRegisterInfo into MCRegisterInfo,
but to do that, the type of the lookup table needs to be the same for
all targets.
llvm-svn: 151814
This allows us to make TRC non-polymorphic and value-initializable, eliminating a huge static
initializer and a ton of cruft from the generated code.
Shrinks ARMBaseRegisterInfo.o by ~100k.
llvm-svn: 151806
Simply treat bundles as instructions. Spill code is inserted between
bundles, never inside a bundle. Rewrite all operands in a bundle at
once.
Don't attempt and memory operand folding inside bundles.
llvm-svn: 151787
* Add begin_dynamic_table() / end_dynamic_table() private interface to ELFObjectFile.
* Add begin_libraries_needed() / end_libraries_needed() interface to ObjectFile, for grabbing the list of needed libraries for a shared object or dynamic executable.
* Implement this new interface completely for ELF, leave stubs for COFF and MachO.
* Add 'llvm-readobj' tool for dumping ObjectFile information.
llvm-svn: 151785
This allows the function to be inlined, and makes it suitable for use in
getInstructionIndex().
Also provide a const version. C++ is great for touch typing practice.
llvm-svn: 151782
- The search bounds are constant, in the worst case (ARM target) it will scan over 30 uint16_ts.
- This method isn't very hot, I had problems finding a testcase where it's called more than a dozen of times (no perf impact).
llvm-svn: 151773
Instead of nested switch statements, use a lookup table. On ARM, this replaces
a 23k (x86_64 release build) function with a 16k table. Its not unlikely to
be faster, as well.
llvm-svn: 151751
std::vector.
- Good for 1-2% speedup on writing PCH for Cocoa.h.
- Clang side API match to follow shortly, there wasn't an easy way to make this
non-breaking.
llvm-svn: 151750
Rename ST_External to ST_Unknown, and slightly change its semantics. It now only indicates that the symbol's type
is unknown, not that the symbol is undefined. (For that, use ST_Undefined).
llvm-svn: 151696
This function does more or less the same as
MI::readsWritesVirtualRegister(), but it supports bundles as well.
It also determines if any constraint requires reading and writing
operands to use the same register. Most clients want to know.
Use the more modern MO.readsReg() instead of trying to sort out undefs
and partial redefines. Stop supporting the extra full <imp-def> operand
as an alternative to <def,undef> sub-register defines.
llvm-svn: 151690
Extract a base class and provide four specific sub-classes for iterating
over const/non-const bundles/instructions.
This eliminates the mystery bool constructor argument.
llvm-svn: 151684
find root names on Unix.
- This fixes make_absolute to not basically always call current_path() on
Unix systems.
- I think the API probably needs cleanup in this area, but I'll let Michael
handle that.
llvm-svn: 151681
Without this hook, functions w/ a completely empty body (including no
epilogue) will cause an MCEmitter assertion failure.
For example,
define internal fastcc void @empty_function() {
unreachable
}
rdar://10947471
llvm-svn: 151673
SlotIndexes are not assigned to instructions inside bundles, but it is
still valid to look up the index of those instructions.
The reverse getInstructionFromIndex() will return the first instruction
in the bundle.
llvm-svn: 151672
methods are no longer needed now that LinearScan has gone away.
(Contains tweaks trivialSpillEverywhere to enable the removal of getNewVRegs).
llvm-svn: 151658
debug info for assembly files. We were already doing the right thing when
producing debug info for C/C++.
ELF linkers don't know dwarf, so they depend on these relocations to produce
valid dwarf output.
llvm-svn: 151655
Clang builds. The detection logic for compilers that support the warning
isn't working. Rafael is going to investigate it, but didn't want people
to have to wade through build spam until then.
llvm-svn: 151649
To avoid problems with zero shifts when getting the bits that move between words
we use a trick: first shift the by amount-1, then do another shift by one. When
amount is 0 (and size 32) we first shift by 31, then by one, instead of by 32.
Also fix a latent bug that emitted the low and high words in the wrong order
when shifting right.
Fixes PR12113.
llvm-svn: 151637
When the GEP index is a vector of pointers, the code that calculated the size
of the element started from the vector type, and not the contained pointer type.
As a result, instead of looking at the data element pointed by the vector, this
code used the size of the vector. This works for 32bit members (on 32bit
systems), but not for other types. Added code to peel the vector type and
added a test.
llvm-svn: 151626
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.
Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.
rdar://8979299
llvm-svn: 151623
When an outgoing call takes more than 2k of arguments on the stack, we
don't allocate that call frame in the prolog, but adjust the stack
pointer immediately before the call instead.
This causes problems with the emergency spill slot because PEI can't
track stack pointer adjustments on the second pass, and if the outgoing
arguments are too big, SP can't be used to reach the emergency spill
slot at all.
Work around these problems by ensuring there is a base or frame pointer
that can be used to access the emergency spill slot.
<rdar://problem/10917166>
llvm-svn: 151604
We on the linker to resolve calls to the appropriate BL/BLX instruction
to make interworking function correctly. It uses the symbol in the
relocation to do that, so we need to be careful about being too clever.
To enable this for ARM mode, split the BL/BLX fixup kind off from the
unconditional-branch fixups.
rdar://10927209
llvm-svn: 151571
After the SlotIndex slot names were updated, it is possible to apply
stricter checks to live intervals.
Also treat bundles as bags of operands when checking live intervals.
llvm-svn: 151531
The MIOperands iterator can visit operands on a single instruction, or
all operands in a bundle. This simplifies code like the register
allocator that treats bundles as a set of operands.
llvm-svn: 151529
value numbers to be assigned when calculating any particular value number.
Enhance the logic that detects new value numbers to take this into account,
for a tiny compile time speedup. Fix a comment typo while there.
llvm-svn: 151522
%cmp (eg: A==B) we already replace %cmp with "true" under the true edge, and
with "false" under the false edge. This change enhances this to replace the
negated compare (A!=B) with "false" under the true edge and "true" under the
false edge. Reported to improve perlbench results by 1%.
llvm-svn: 151517
verifier does. This correctly handles invoke.
Thanks to Duncan, Andrew and Chris for the comments.
Thanks to Joerg for the early testing.
llvm-svn: 151469