The bad behavior happens when we have a function with a long linear chain of
basic blocks, and have a live range spanning most of this chain, but with very
few uses.
Let say we have only 2 uses.
The Hopfield network is only seeded with two active blocks where the uses are,
and each iteration of the outer loop in `RAGreedy::growRegion()` only adds two
new nodes to the network due to the completely linear shape of the CFG.
Meanwhile, `SpillPlacer->iterate()` visits the whole set of discovered nodes,
which adds up to a quadratic algorithm.
This is an historical accident effect from r129188.
When the Hopfield network is expanding, most of the action is happening on the
frontier where new nodes are being added. The internal nodes in the network are
not likely to be flip-flopping much, or they will at least settle down very
quickly. This means that while `SpillPlacer->iterate()` is recomputing all the
nodes in the network, it is probably only the two frontier nodes that are
changing their output.
Instead of recomputing the whole network on each iteration, we can maintain a
SparseSet of nodes that need to be updated:
- `SpillPlacement::activate()` adds the node to the todo list.
- When a node changes value (i.e., `update()` returns true), its neighbors are
added to the todo list.
- `SpillPlacement::iterate()` only updates the nodes in the list.
The result of Hopfield iterations is not necessarily exact. It should converge
to a local minimum, but there is no guarantee that it will find a global
minimum. It is possible that updating nodes in a different order will cause us
to switch to a different local minimum. In other words, this is not NFC, but
although I saw a few runtime improvements and regressions when I benchmarked
this change, those were side effects and actually the performance change is in
the noise as expected.
Huge thanks to Jakob Stoklund Olesen <stoklund@2pi.dk> for his feedbacks,
guidance and time for the review.
llvm-svn: 263460
Fundamentally, the length of a variable or function name is bound by the
maximum size of a record: 0xffff. However, the name doesn't live in a
vacuum; other data is associated with the name, lowering the bound
further.
We would naively attempt to emit the name, causing us to assert because
the record would no-longer fit in 16-bits. Instead, truncate the name
but preserve as much as we can.
While I have tested this locally, I've decided to not commit it due to
the test's size.
N.B. While this behavior is undesirable, it is better than MSVC's
behavior. They seem to truncate to ~4000 characters.
llvm-svn: 263378
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.
Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).
Differential Revision: http://reviews.llvm.org/D17932
llvm-svn: 263303
The truncation was causing the sorting algorithm to behave oddly when comparing
positive and negative offsets. Fortunately, this doesn't currently happen in
practice and was exposed by a WIP. Thus, I can't test this change now, but the
follow on patch will.
llvm-svn: 263255
llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself.
Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function.
Ideally this should be NFC, but in reality its possible that a function:
has no !dbg (in which case there's likely a bug somewhere in an opt pass), or
that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will
Reviewed by Duncan Exon Smith.
Differential Revision: http://reviews.llvm.org/D18074
llvm-svn: 263184
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 263159
This is a fairly straightforward port to the new pass manager with one
exception. It removes a very questionable use of releaseMemory() in
the old pass to invalidate its caches between runs on a function.
I don't think this is really guaranteed to be safe. I've just used the
more direct port to the new PM to address this by nuking the results
object each time the pass runs. While this could cause some minor malloc
traffic increase, I don't expect the compile time performance hit to be
noticable, and it makes the correctness and other aspects of the pass
much easier to reason about. In some cases, it may make things faster by
making the sets and maps smaller with better locality. Indeed, the
measurements collected by Bruno (thanks!!!) show mostly compile time
improvements.
There is sadly very limited testing at this point as there are only two
tests of memdep, and both rely on GVN. I'll be porting GVN next and that
will exercise this heavily though.
Differential Revision: http://reviews.llvm.org/D17962
llvm-svn: 263082
This patch teaches CGP to duplicate addressing mode computations into cold paths (detected via explicit cold attribute on calls) if required to let addressing mode be safely sunk into the basic block containing each load and store.
In general, duplicating code into cold blocks may result in code growth, but should not effect performance. In this case, it's better to duplicate some code than to put extra pressure on the register allocator by making it keep the address through the entirely of the fast path.
This patch only handles addressing computations, but in principal, we could implement a more general cold cold scheduling heuristic which tries to reduce register pressure in the fast path by duplicating code into the cold path. Getting the profitability of the general case right seemed likely to be challenging, so I stuck to the existing case (addressing computation) we already had.
Differential Revision: http://reviews.llvm.org/D17652
llvm-svn: 263074
Summary:
The code in SelectionDAG did not handle the case where the
register type and output types were different, but had the same size.
Reviewers: arsenm, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17940
llvm-svn: 263022
This re-applies r262886 with a fix for 32 bit platforms that have 8 byte
pointer alignment, effectively reverting r262892.
Original Message:
Currently some SDNode operands are malloc'd, some are stored inline in
subclasses of SDNode, and some are thrown into a BumpPtrAllocator.
This scheme is complex, inconsistent, and makes refactoring SDNodes
fairly difficult.
Instead, we can allocate all of the operands using an ArrayRecycler
that wraps a BumpPtrAllocator. This keeps the cache locality when
iterating operands, improves locality when iterating SDNodes without
looking at operands, and vastly simplifies the ownership semantics.
It also means we stop overallocating SDNodes by 2-3x and will make it
simpler to fix the rampant undefined behaviour we have in how we
mutate SDNodes from one kind to another (See llvm.org/pr26808).
This is NFC other than the changes in memory behaviour, and I ran some
LNT tests to make sure this didn't hurt compile time. Not many tests
changed: there were a couple of 1-2% regressions reported, but there
were more improvements (of up to 4%) than regressions.
llvm-svn: 262902
Looks like the largest SDNode is different between 32 and 64 bit now,
so this is breaking 32 bit bots. Reverting while I figure out a fix.
This reverts r262886.
llvm-svn: 262892
Currently some SDNode operands are malloc'd, some are stored inline in
subclasses of SDNode, and some are thrown into a BumpPtrAllocator.
This scheme is complex, inconsistent, and makes refactoring SDNodes
fairly difficult.
Instead, we can allocate all of the operands using an ArrayRecycler
that wraps a BumpPtrAllocator. This keeps the cache locality when
iterating operands, improves locality when iterating SDNodes without
looking at operands, and vastly simplifies the ownership semantics.
It also means we stop overallocating SDNodes by 2-3x and will make it
simpler to fix the rampant undefined behaviour we have in how we
mutate SDNodes from one kind to another (See llvm.org/pr26808).
This is NFC other than the changes in memory behaviour, and I ran some
LNT tests to make sure this didn't hurt compile time. Not many tests
changed: there were a couple of 1-2% regressions reported, but there
were more improvements (of up to 4%) than regressions.
llvm-svn: 262886
Before this change, we would get the type definition in the middle
of the instruction.
E.g., %0(48) = G_ADD %struct_alias = type { i32, i16 } %edi, %edi
Now, we have just the expected type name:
%0(48) = G_ADD %struct_alias %edi, %edi
llvm-svn: 262885
Now the type API is always available, but when global-isel is not
built the implementation does nothing.
Note: The implementation free of ifdefs is WIP and tracked here in PR26576.
llvm-svn: 262873
Rematerializing and merging into a bigger register class at the same
time, requires the subregister range lanemasks getting remapped to the
new register class.
This fixes http://llvm.org/PR26805
llvm-svn: 262768
copy coalescing with enabled subregister liveness can reveal undef uses,
previously this was only checked for the SrcReg in updateRegDefsUses()
but we need to check DstReg as well.
llvm-svn: 262767
The divrem combine assumed the type of the div/rem is simple, which isn't
necessarily true. This probably worked fine until r250825, since it only
saw legal types, but now breaks when it runs as a pre-type-legalization
combine.
This fixes PR26835.
Differential Revision: http://reviews.llvm.org/D17878
llvm-svn: 262746
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262738
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 262599
Summary:
Removing MMOs is not our prefer behavior any more.
Reviewers: mcrosier, reames
Differential Revision: http://reviews.llvm.org/D17668
llvm-svn: 262580
If we have a loop with a rarely taken path, we will prune that from the blocks which get added as part of the loop chain. The problem is that we weren't then recognizing the loop chain as schedulable when considering the preheader when forming the function chain. We'd then fall to various non-predecessors before finally scheduling the loop chain (as if the CFG was unnatural.) The net result was that there could be lots of garbage between a loop preheader and the loop, even though we could have directly fallen into the loop. It also meant we separated hot code with regions of colder code.
The particular reason for the rejection of the loop chain was that we were scanning predecessor of the header, seeing the backedge, believing that was a globally more important predecessor (true), but forgetting to account for the fact the backedge precessor was already part of the existing loop chain (oops!.
Differential Revision: http://reviews.llvm.org/D17830
llvm-svn: 262547
Catch objects with a displacement of zero do not initialize a catch
object. The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.
If we place an object at the top-of-stack, we will end up wit a
displacement of zero resulting in our catch object remaining
uninitialized.
Address this by creating our catch objects as fixed objects. We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.
Differential Revision: http://reviews.llvm.org/D17823
llvm-svn: 262546
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262507
The placement new calls here were all calling the allocation function
in RecyclingAllocator/Recycler for SDNode, instead of the function for
the specific subclass we were constructing.
Since this particular allocator always overallocates it more or less
worked, but would hide what we're actually doing from any memory
tools. Also, if you tried to change this allocator so something like a
BumpPtrAllocator or MallocAllocator, the compiler would crash horribly
all the time.
Part of llvm.org/PR26808.
llvm-svn: 262500
On AMDGPU where operations i64 operations are often bitcasted to v2i32
and back, this pattern shows up regularly where it breaks some
expected combines on i64, such as load width reducing.
This fixes some test failures in a future commit when i64 loads
are changed to promote.
llvm-svn: 262397
This reverts commit r262316.
It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.
llvm-svn: 262387
Summary:
Calls sometimes need to be convergent. This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.
Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs. But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.
Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.
Reviewers: jholewinski
Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17423
llvm-svn: 262373
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.
The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D10970
llvm-svn: 262316
The CatchObjOffset is relative to the end of the EH registration node
for 32-bit x86 WinEH targets. A special sentinel value, 0, is used to
indicate that no catch object should be initialized.
This means that a catch object allocated immediately before the
registration node would be assigned a CatchObjOffset of 0, leading the
runtime to believe that a catch object should not be initialized.
To handle this, allocate the registration node prior to any other frame
object. This will ensure that catch objects will not be allocated
before the registration node.
This fixes PR26757.
Differential Revision: http://reviews.llvm.org/D17689
llvm-svn: 262294
When a variable is described by a single DBG_VALUE instruction we can
often use a more efficient inline DW_AT_location instead of using a
location list.
This commit makes the heuristic that decides when to apply this
optimization stricter by also verifying that the DBG_VALUE is live at the
entry of the function (instead of just checking that it is valid until
the end of the function).
<rdar://problem/24611008>
llvm-svn: 262247
InlineSpiller::rematerializeFor() never uses its parameter as an
iterator, so take it by reference instead. This removes an implicit
conversion from MachineBasicBlock::iterator to MachineInstr*.
llvm-svn: 262152
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null. Slowly inching
toward being able to fix PR26753.
llvm-svn: 262149
In the case where op = add, y = base_ptr, and x = offset, this
transform:
(op y, (op x, c1)) -> (op (op x, y), c1)
breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.
This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required no-trivial code transformations to fix.
llvm-svn: 262148
Take parameters as MachineInstr& instead of MachineInstr* in
AntiDepBreaker API, since these are required to be non-null. No
functionality change intended. Looking toward PR26753.
llvm-svn: 262145
In all but one case, change the DFAPacketizer API to take MachineInstr&
instead of MachineInstr*. In DFAPacketizer::endPacket(), take
MachineBasicBlock::iterator. Besides cleaning up the API, this is in
search of PR26753.
llvm-svn: 262142
Update APIs in MachineInstrBundle.h to take and return MachineInstr&
instead of MachineInstr* when the instruction cannot be null. Besides
being a nice cleanup, this is tacking toward a fix for PR26753.
llvm-svn: 262141
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are
never null, so this cleans up the API a bit. It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).
At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.
llvm-svn: 262115
Summary: This is extracted from D17555
Reviewers: davidxl, reames, sanjoy, MatzeB, pete
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17580
llvm-svn: 262058
MBB slot index intervals are half open, not closed. getMBBEndIndex()
returns the slot index of the start of the next block in layout order.
Placing a register mask there is incorrect if the successor of the
funclet return is not laid out after the return. Clang generates IR for
catch bodies before generating the following normal code, so we never
noticed this issue until the D frontend authors filed a bug about it.
Instead, we can put the clobber mask on the last instruction of the
funclet return block. We still aren't using a register mask operand on
the CATCHRET instruction because it would cause PEI to spill all CSRs,
including XMM regs, in the prologue.
Fixes PR26679.
llvm-svn: 262035
This also simplifies the code by removing the overly conservative
NoInterveningSideEffect() function. This function checked:
- That the two copies belong to the same block: We only process one
block at a time and clear our maps in between it is impossible to find a
copy from a different block.
- There is no terminator between the two copy instructions: This is not
allowed anyway (the MachineVerifier would complain)
- Does not have instructions with hasUnmodeledSideEffects() or isCall()
set: Even for those instructuction we must have all clobbers/defs of
registers explicit as an operand. If the register is explicitely
clobbered we would never come to the point of checking for
NoInterveningSideEffect() anyway.
(I also checked this with a temporary build of the test-suite with all
potentially failing conditions in NoInterveningSideEffect() turned into
asserts)
Differential Revision: http://reviews.llvm.org/D17474
llvm-svn: 261965
Inline-asm calls aren't annotated with funclet bundle operands because
they don't throw and cannot be inlined through. We shouldn't require
them to bear an funclet bundle operand.
llvm-svn: 261942
Summary:
Both the hardware and LLVM have changed since 2012.
Now, load-based heuristic don't show big differences any more on OoO cores.
There is no notable regressons and improvements on spec2000/2006. (Cortex-A57, Core i5).
Reviewers: spatel, zansari
Differential Revision: http://reviews.llvm.org/D16836
llvm-svn: 261809
(This is the second attemp to commit this patch, after fixing pr26652 & pr26653).
This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.
To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:
1. Reduction with another vector.
2. Reduction on all elements.
in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.
Differential revision: http://reviews.llvm.org/D15250
llvm-svn: 261804
This fixes bugs in copy elimination code in llvm. It slightly changes the
semantics of clearRegisterKills(). This is appropriate because:
- Users in lib/CodeGen/MachineCopyPropagation.cpp and
lib/Target/AArch64RedundantCopyElimination.cpp and
lib/Target/SystemZ/SystemZElimCompare.cpp are incorrect without it
(see included testcase).
- All other users in llvm are unaffected (they pass TRI==nullptr)
- (Kill flags are optional anyway so removing too many shouldn't hurt.)
Differential Revision: http://reviews.llvm.org/D17554
llvm-svn: 261763
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16180
llvm-svn: 261736
Summary: If a function is hot, put it in text.hot section.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17532
llvm-svn: 261607
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest). All of these
functions require non-null parameters already, so references are more
clear. As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.
No functionality change intended.
llvm-svn: 261605
Summary: If a function is hot, put it in text.hot section.
Reviewers: davidxl
Subscribers: eraman, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17460
llvm-svn: 261582
This was causing assertions later from using the wrong pointer
size with LDS operations. getOptimalMemOpType should also have
address space arguments later.
This avoids assertions in existing tests exposed by
a future commit.
llvm-svn: 261580
DMB instructions can be expensive, so it's best to avoid them if possible. In
atomicrmw operations there will always be an attempted store so a release
barrier is always needed, but in the cmpxchg case we can delay the DMB until we
know we'll definitely try to perform a store (and so need release semantics).
In the strong cmpxchg case this isn't quite free: we must duplicate the LDREX
instructions to skip the barrier on subsequent iterations. The basic outline
becomes:
ldrex rOld, [rAddr]
cmp rOld, rDesired
bne Ldone
dmb
Lloop:
strex rRes, rNew, [rAddr]
cbz rRes Ldone
ldrex rOld, [rAddr]
cmp rOld, rDesired
beq Lloop
Ldone:
So we'll skip this version for strong operations in "minsize" functions.
llvm-svn: 261568
Summary:
Also add a comment briefly explaining what ifcnv is.
No functional changes.
Reviewers: resistor
Subscribers: echristo, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17430
llvm-svn: 261543
Summary:
Convergent instrs shouldn't be made control-dependent on other values,
but this is basically the whole point of tail duplication. So just bail
if we see a convergent instruction.
Reviewers: iteratee
Subscribers: jholewinski, jhen, hfinkel, tra, jingyue, llvm-commits
Differential Revision: http://reviews.llvm.org/D17320
llvm-svn: 261540
This reverts commit r261510, effectively reapplying r261509. The
original commit missed a caller in AArch64ConditionalCompares.
Original commit message:
Pass non-null arguments by reference in MachineTraceMetrics::Trace,
simplifying future work to remove implicit iterator => pointer
conversions.
llvm-svn: 261511
Pass non-null arguments by reference in MachineTraceMetrics::Trace,
simplifying future work to remove implicit iterator => pointer
conversions.
llvm-svn: 261509
Delete MachineInstr::getIterator(), since the term "iterator" is
overloaded when talking about MachineInstr.
- Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so
that ilist_node::getIterator() is still available.
- Add it back as MachineInstr::getInstrIterator(). This matches the
naming in MachineBasicBlock.
- Add MachineInstr::getBundleIterator(). This is explicitly called
"bundle" (not matching MachineBasicBlock) to disintinguish it clearly
from ilist_node::getIterator().
- Update all calls. Some of these I switched to `auto` to remove
boiler-plate, since the new name is clear about the type.
There was one call I updated that looked fishy, but it wasn't clear what
the right answer was. This was in X86FrameLowering::inlineStackProbe(),
added in r252578 in lib/Target/X86/X86FrameLowering.cpp. I opted to
leave the behaviour unchanged, but I'll reply to the original commit on
the list in a moment.
llvm-svn: 261504
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.
Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators. (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)
There should be NFC here.
llvm-svn: 261498
Stop using `getNodePtrUnchecked()` when building IR. Eventually a
dereference will be required to get at the downcast node, since the
iterator will only store an `ilist_node_base` of some sort.
This should have no functionality change for now, but is a path towards
removing some more UB from ilist.
llvm-svn: 261495
`ilist_iterator<NodeTy>::getNodePtrUnchecked()` is documented as being
for internal use only, but CodeGenPrepare was using it anyway. This
code relies on pulling out the `Value*` pointer even after the lifetime
of the iterator is over. But having this pointer available in
ilist_iterator depends on UB in the first place.
Instead, safely pull out the `Value*` when the iterator is alive and
stop using the internal-only API.
There should be no functionality change here.
llvm-svn: 261493
COFF doesn't have sections with mergeable contents. Instead, each
constant pool entry ends up in a COMDAT section. The linker, when
choosing between COMDAT sections, doesn't choose the max alignment of
the two sections. You just get whatever alignment was on the section.
If one constant needed a higher alignment in one object file from
another one, then we will get into trouble if the linker chooses the
lower alignment one.
Instead, lets promote the alignment of the constant pool entry to make
sure we don't use an under aligned constant with an instruction which
assumed otherwise.
This fixes PR26680.
llvm-svn: 261462
TailDuplicate can run on either on SSA code or non-SSA code, as indicated to
it by MRI->isSSA() ("PreRegAlloc" here). TailDuplicate does extra work to
preserve SSA invariants when it duplicates code. This patch makes it skip
some of this extra work in the case where the code is not in SSA form.
llvm-svn: 261450
This avoids unnecessarily passing them around when calling helper
functions. It may also be slightly faster to call clear() on the
datastructures instead of freshly initializing them for each block.
llvm-svn: 261407
Now that we don't always add an element to AllocatedStackSlots if we
don't find a pre-existing unallocated stack slot, bumping
StatepointMaxSlotsRequired to `NumSlots + 1` is not correct. Instead
bump the statistic near the push_back, to
Builder.FuncInfo.StatepointStackSlots.size().
llvm-svn: 261348
The check on MFI->getObjectSize() has to be on the FrameIndex, not on
the index of the FrameIndex in AllocatedStackSlots. Weirdly, the tests
I added in rL261336 didn't catch this.
llvm-svn: 261347
NFCI. They key motivation here is that I'd like to use
SmallBitVector::all() in a later change. Also, using a bit vector here
seemed better in general.
The only interesting change here is that in the failure case of
allocateStackSlot, we no longer (the equivalent of) push_back(true) to
AllocatedStackSlots. As far as I can tell, this is fine, since we'd
never re-use those slots in the same StatepointLoweringState instance.
Technically there was no need to change the operator[] type accesses to
set() and test(), but I thought it'd be nice to make it obvious that
we're using something other than a std::vector like thing.
llvm-svn: 261337
allocateStackSlot did not consider the size of the value to be spilled
before deciding to re-use a spill slot. This was originally okay (since
originally we'd only ever spill pointers), but it became not okay when
we changed our scheme to directly spill vectors of pointers.
While this change fixes the bug pointed out, it has two performance
caveats:
- It matches spill slot and spillee size exactly, while in theory we
can spill, e.g., an 8 byte pointer into a 16 byte slot. This is
slightly complicated to fix since in the stackmaps section, we report
the size of the spill slot as the size of the "indirect value"; and
if they're no longer equivalent, we'll have to keep track of the
(indirect) value size separately from the stack slot size.
- It will "spuriously run out" of reusable slots, since we now have an
second check in the search loop in addition to the availablity
check (e.g. you had two free scalar slots, and you first ask for a
vector slot followed by a scalar slot). I'll fix this in a later
commit.
llvm-svn: 261336
This removes the unusual loop structure in allocateStackSlot in favor of
something more straightforward. I've also removed the cautionary
comment in the function, which I suspect is historical cruft now, and
confuses more than it enlightens.
llvm-svn: 261335
Certain optimization passes (like globaldce) can prune function
declaration that SjLjEHPrepare assumed would exit when it'd
runOnFunction.
This fixes PR26669.
llvm-svn: 261303
Summary:
Without this, this command
$ llvm-run llc -stop-after machine-cp -o - <( echo '' )
outputs an error, because we close stdout twice -- once when closing the
file opened for "-o", and again when closing outs().
Also clarify in the outs() definition that you can't ever call it if you
want to open your own raw_fd_ostream on stdout.
Reviewers: jroelofs, tstellarAMD
Subscribers: jholewinski, qcolombet, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D17422
llvm-svn: 261286
Today, we do not allow cmpxchg operations with pointer arguments. We require the frontend to insert ptrtoint casts and do the cmpxchg in integers. While correct, this is problematic from a couple of perspectives:
1) It makes the IR harder to analyse (for instance, it make capture tracking overly conservative)
2) It pushes work onto the frontend authors for no real gain
This patch implements the simplest form of IR support. As we did with floating point loads and stores, we teach AtomicExpand to convert back to the old representation. This prevents us needing to change all backends in a single lock step change. Over time, we can migrate each backend to natively selecting the pointer type. In the meantime, we get the advantages of a cleaner IR representation without waiting for the backend changes.
Differential Revision: http://reviews.llvm.org/D17413
llvm-svn: 261281
covmap needs to created as non allocatable, but not with
SHT_NOTE. The latter was needed to workaround a problem
of BFD linker with gc, which is no longer needed. (A more
proper longer term fix requires changing FE driver to force
referencing the section using linker script).
Differential Revision: http://reviews.llvm.org/D17309
llvm-svn: 261228
The commit breaks stage2 compilation on PowerPC. Reverting for now while
this is analyzed. I also have to revert the LiveIntervalTest for now as
that depends on this commit.
Revert "LiveIntervalAnalysis: Remove LiveVariables requirement"
This reverts commit r260806.
Revert "Remove an unnecessary std::move to fix -Wpessimizing-move warning."
This reverts commit r260931.
Revert "Fix typo in LiveIntervalTest"
This reverts commit r260907.
Revert "Add unittest for LiveIntervalAnalysis::handleMove()"
This reverts commit r260905.
llvm-svn: 261189
described by an immediate.
Found via http://reviews.llvm.org/D16867
Thanks to Paul Robinson for pointing this out.
<rdar://problem/24456528>
llvm-svn: 261168
This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.
To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:
1. Reduction with another vector.
2. Reduction on all elements.
in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.
Differential revision: http://reviews.llvm.org/D15250
llvm-svn: 261070
This apparently comes up when the register allocator decides that a
variable will become undef along a certain path.
Also improve the error message we emit when we can't map from LLVM
register number to CV register number.
llvm-svn: 261016
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
llvm-svn: 260998
This is an updated version which fixes a bug that happened with
uses tied to an earlyclobber operand which end at an unusual slotindex.
If two definitions write to independent subregisters then they can be
put in any order. LiveIntervalAnalysis::handleMove() did not support
this previously because it looks like moving a definition of a vreg past
another one.
This is a modified version of a patch proposed (two years ago) by
Vincent Lejeune! This version does not touch the read-undef flags and is
extended for the case of moving a subregister def behind all uses - this
can happen for subregister defs that are completely unused.
Differential Revision: http://reviews.llvm.org/D9067
llvm-svn: 260906
This function was basically useless, since volatile memacesses or MIs with
unmodelled sideffects become global memory objects, and the other little
checks are also done elsewhere.
Reviewed by Andy Trick
http://reviews.llvm.org/D16881
llvm-svn: 260899
This requirement was a huge hack to keep LiveVariables alive because it
was optionally used by TwoAddressInstructionPass and PHIElimination.
However we have AnalysisUsage::addUsedIfAvailable() which we can use in
those passes.
llvm-svn: 260806
Summary:
This patch skips DAG combine of fp_round (fp_round x) if it results in
an fp_round from f80 to f16.
fp_round from f80 to f16 always generates an expensive (and as yet,
unimplemented) libcall to __truncxfhf2. This prevents selection of
native f16 conversion instructions from f32 or f64. Moreover, the first
(value-preserving) fp_round from f80 to either f32 or f64 may become a
NOP in platforms like x86.
Reviewers: ab
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D17221
llvm-svn: 260769
The code change is simple enough: instead of attaching an anonymous SDLoc to splatted
vector constants, use the scalar constant's existing SDLoc since that is what is passed
into getConstant() as a param. But this changes instruction scheduling, so I'll explain
why that happens.
The motivation for this patch starts near:
http://reviews.llvm.org/rL258833
...x86's getZeroVector() could be similarly cleaned up and I thought it would be 'NFC'.
But when I made that change locally, several x86 codegen tests wiggled.
It turns out that the lack of SDLoc consistency in getConstant() changes the way
ScheduleDAGRRList behaves. This is because the SDLoc contains 'IROrder' and some DAG
scheduler algorithms use IROrder for tie-breaking.
Differential Revision: http://reviews.llvm.org/D16972
llvm-svn: 260582
Rather than storing type units in a vector and emitting them at the end
of code generation, emit them immediately and destroy them, reclaiming the
memory we were using for their DIEs.
In one benchmark carried out against Chromium's 50 largest (by bitcode
file size) translation units, total peak memory consumption with type units
decreased by median 17%, or by 7% when compared against disabling type units.
Tested using check-{llvm,clang}, the GDB 7.5 test suite (with
'-fdebug-types-section') and by eyeballing llvm-dwarfdump output on those
Chromium translation units with split DWARF both disabled and enabled, and
verifying that the only changes were to addresses and abbreviation ordering.
Differential Revision: http://reviews.llvm.org/D17118
llvm-svn: 260578
If there were wrapper functions with no instructions of their own in the
inlining tree, we would fail to emit InlineSite records for them.
llvm-svn: 260571
The rational for this change is that LLVMBuild cannot express conditional
dependencies. Therefore, when we start optionally using GlobalISel library for
say AArch64, without that change, all the tools that use the AArch64 library
would need to explicitly link with GlobalISel when we ask for it.
This does not scale.
Instead, we will set the dependencies between the target and GlobalISel and if
we did not ask to build GlobalISel, the library will just be empty.
Thanks to Chris Bieneman and Mehdi Animi for the idea.
llvm-svn: 260566
If two definitions write to independent subregisters then they can be
put in any order. LiveIntervalAnalysis::handleMove() did not support
this previously because it looks like moving a definition of a vreg past
another one.
This is a modified version of a patch proposed (two years ago) by
Vincent Lejeune! This version does not touch the read-undef flags and is
extended for the case of moving a subregister def behind all uses - this
can happen for subregister defs that are completely unused.
Differential Revision: http://reviews.llvm.org/D9067
llvm-svn: 260565
We actually need that information only for generic instructions, therefore it
would be nice not to have to pay the extra memory consumption for all
instructions. Especially because a typed non-generic instruction does not make
sense.
The question is then, is it possible to have that information in a union or
something?
My initial thought was that we could have a derived class GenericMachineInstr
with additional information, but in practice it makes little to no sense since
generic MachineInstrs are likely turned into non-generic ones by just switching
the opcode. In other words, we don't want to go through the process of creating
a new, non-generic MachineInstr, object each time we do this switch. The memory
benefit probably is not worth the extra compile time.
Another option would be to keep the type of the MachineInstr in a side table.
This would induce an extra indirection though.
Anyway, I will file a PR to discuss about it and remember we need to come back
to it at some point.
llvm-svn: 260558
If a class has hidden visibility all derived classes and all classes
that have it as a member must have hidden visibility too. That may
be fixable here but requires changes to quite a lot of debug info
classes.
This is also one of the things that GCC enforces aggressively while
clang ignores it, making testing more annoying than necessary.
llvm-svn: 260529
For now, generic virtual registers will not have a register class. We may want
to change that. For instance, if we want to use all the methods from
TargetRegisterInfo with generic virtual registers, we need to either have some
sort of generic register classes that do what we want, or teach those methods
how to deal with nullptr register class.
Although the latter seems easy enough to do, we may still want to differenciate
generic register classes from nullptr to catch cases where nullptr gets
introduced by a bug of some sort.
Anyway, I will file a PR to keep track of that.
llvm-svn: 260474
Summary:
Refactor common value, scope, and label tracking logic out of DwarfDebug
into a common base class called DebugHandlerBase.
Update an old LLVM IR test case to avoid an assertion in LexicalScopes.
Reviewers: dblaikie, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16931
llvm-svn: 260432
I reinvented this functionality in http://reviews.llvm.org/D16828 because it was
hidden away as a static function. The changes in x86 are not based on a complete
audit. I suspect there are other possible uses there, and there are almost certainly
more potential users in other targets.
llvm-svn: 260295
This matches GCC and MSVC's behaviour, and saves on code size.
We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.
The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].
Differential Revision: http://reviews.llvm.org/D16907
[1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
[2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ
llvm-svn: 260133
If a range has a lower bound of 0, add an AssertZext from the
nearest floor power of two.
This allows operations with some workitem intrinsics with known
maximum ranges to use fast 24-bit multiplies.
llvm-svn: 260109
LiveRangeEdit::eliminateDeadDef is used to remove dead define instructions
after rematerialization. To remove a VNI for a vreg from its LiveInterval,
LiveIntervals::removeVRegDefAt is used. However, after non-PHI VNIs are all
removed, PHI VNI are still left in the LiveInterval. Such unused vregs will
be kept in RegsToSpill[] at the end of InlineSpiller::reMaterializeAll and
spiller will allocate stackslot for them.
The fix is to get rid of unused reg by checking whether it has non-dbg
reference instead of whether it has non-empty interval.
llvm-svn: 259895
Only single and double FP immediates are correctly printed by
MachineInstr::print() during debug output. Half float type goes to
APFloat::convertToDouble() and hits assertion it is not a double
semantics. This diff prints half machine operands correctly.
This cannot currently be hit by any in-tree target.
Patch by Stanislav Mekhanoshin
llvm-svn: 259857
This patch corresponds to review:
http://reviews.llvm.org/D16847
There are some files in glibc that use the output operand modifier even though
it was deprecated in GCC. This patch just adds support for it to prevent issues
with such files.
llvm-svn: 259798
This patch implements softening of long double type (ppcf128) on ppc32
architecture and enables operations for this type for soft float.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D15811
llvm-svn: 259791
This is mostly about having shorter lines and standardizing on one
interface, but it also avoids some needless indirection.
No functional change.
llvm-svn: 259697
Summary:
In r257979, I added code to ensure that we wouldn't merge DebugLocEntries if
the pieces they describe overlap. Unfortunately, I failed to cover the case,
where there may have multiple active Expressions in the entry, in which case we
need to make sure that no two values overlap before we can perform the merge.
This fixed PR26148.
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D16742
llvm-svn: 259696
This patch consists of two parts: a performance fix in DAGCombiner.cpp
and a correctness fix in SelectionDAG.cpp.
The test case tests the bug that's uncovered by the performance fix, and
fixed by the correctness fix.
The performance fix keeps the containers required by the
hasPredecessorHelper (which is a lazy DFS) and reuse them. Since
hasPredecessorHelper is called in a loop, the overall efficiency reduced
from O(n^2) to O(n), where n is the number of SDNodes.
The correctness fix keeps iterating the neighbor list even if it's time
to early return. It will return after finishing adding all neighbors to
Worklist, so that no neighbors are discarded due to the original early
return.
llvm-svn: 259691
Recommited, after some fixing with test cases.
Updated test cases:
test/CodeGen/AArch64/arm64-misched-memdep-bug.ll
test/CodeGen/AArch64/tailcall_misched_graph.ll
Temporarily disabled test cases:
test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll
test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated)
test/CodeGen/PowerPC/vsx-fma-m.ll
test/CodeGen/PowerPC/vsx-fma-sp.ll
http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.
llvm-svn: 259673
The register coalescer can rematerialize constants that define
more of a register than the copy it is going to replace was going
to do.
This is valid in the case the register was undef before the
copy happened.
This patch makes sure that all the subranges defined by the new
rematerialization instructions have at least a dead def.
Review: http://reviews.llvm.org/D16693
llvm-svn: 259614
CodeView requires us to accurately describe the extent of the inlined
code. We did this by grabbing the next debug location in source order
and using *that* to denote where we stopped inlining. However, this is
not sufficient or correct in instances where there is no next debug
location or the next debug location belongs to the start of another
function.
To get this correct, use the end symbol of the function to denote the
last possible place the inlining could have stopped at.
llvm-svn: 259548
This directive emits the binary annotations that describe line and code
deltas in inlined call sites. Single-stepping through inlined frames in
windbg now works.
llvm-svn: 259535
When rematerializing a computation by replacing the copy, use the copy's
location. The location of the copy is more representative of the
original program.
This partially fixes PR10003.
llvm-svn: 259469
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.
Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB
Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16291
llvm-svn: 259387
Changed emitting offset of macinfo entry into compiler unit DIE to use "addSectionLabel" method rather than explicitly calculating size/offset of macro entry.
Differential Revision: http://reviews.llvm.org/D16292
llvm-svn: 259358
Those commits created an artificial edge from a cleanup to a synthesized
catchswitch in order to get the MSVC personality routine to execute
cleanups which don't cleanupret and are not wrapped by a catchswitch.
This worked well enough but is not a complete solution in situations
where there the cleanup infinite loops.
However, the real deal breaker behind this approach comes about from a
degenerate case where the cleanup is post-dominated by unreachable *and*
throws an exception. This ends poorly because the catchswitch will
inadvertently catch the exception.
Because of this we should go back to our previous behavior of not
executing certain cleanups (identical behavior with the Itanium ABI
implementation in clang, GCC and ICC).
N.B. I think this could be salvaged by making the catchpad rethrow the
exception and properly transforming throwing calls in the cleanup into
invokes.
llvm-svn: 259338
Summary:
There are three parts to inlined call frames:
1. The inlinee line subsection
2. The inline site symbol record
3. The function ids referenced by both
This change starts by emitting function ids (3) for all subprograms and
emitting the base inline site symbol record (2). The actual line numbers
in (2) use an encoded format that will come next, along with the inlinee
line subsection.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16333
llvm-svn: 259217
The buildSchedGraph() was in need of reworking as the AA features had been
added on top of earlier code. It was very difficult to understand, and buggy.
There had been found cases where scheduling dependencies had actually been
missed (see r228686).
AliasChain, RejectMemNodes, adjustChainDeps() and iterateChainSucc() have
been removed. There are instead now just the four maps from Value to SUs, which
have been renamed to Stores, Loads, NonAliasStores and NonAliasLoads.
An unknown store used to become the AliasChain, but now becomes a store mapped
to 'unknownValue' (in Stores). What used to be PendingLoads is instead the
list of SUs mapped to 'unknownValue' in Loads.
RejectMemNodes and adjustChainDeps() used to be a safety-net for everything.
The SU maps were sometimes cleared and SUs were put in RejectMemNodes, where
adjustChainDeps() would look. Instead of this, a more straight forward approach
is used in maintaining the SU maps without clearing them and simply letting
them grow over time. Instead of the cutt-off in adjustChainDeps() search, a
reduction of maps will be done if needed (see below).
Each SUnit either becomes the BarrierChain, or is put into one of the maps. For
each SUnit encountered, all the information about previous ones are still
available until a new BarrierChain is set, at which point the maps are cleared.
For huge regions, the algorithm becomes slow, therefore the maps will get
reduced at a threshold (current default is 1000 nodes), by a fraction (default 1/2).
These values can be tuned by use of CL options in case some test case shows that
they need to be changed (-dag-maps-huge-region and -dag-maps-reduction-size).
There has not been any considerable change observed in output quality or compile
time. There may now be more DAG edges inserted than before (i.e. if A->B->C,
then A->C is not needed). However, in a comparison run there were fewer total
calls to AA, and a somewhat improved compile time, which means this seems to
be not a problem.
http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.
llvm-svn: 259201
This reverts commit r259117.
The LineInfo constructor is defined in the codeview library and we have
to link against it now. Doing that isn't trivial, so reverting for now.
llvm-svn: 259126
Adds a new family of .cv_* directives to LLVM's variant of GAS syntax:
- .cv_file: Similar to DWARF .file directives
- .cv_loc: Similar to the DWARF .loc directive, but starts with a
function id. CodeView line tables are emitted by function instead of
by compilation unit, so we needed an extra field to communicate this.
Rather than overloading the .loc direction further, we decided it was
better to have our own directive.
- .cv_stringtable: Emits the codeview string table at the current
position. Currently this just contains the filenames as
null-terminated strings.
- .cv_filechecksums: Emits the file checksum table for all files used
with .cv_file so far. There is currently no support for emitting
actual checksums, just filenames.
This moves the line table emission code down into the assembler. This
is in preparation for implementing the inlined call site line table
format. The inline line table format encoding algorithm requires knowing
the absolute code offsets, so it must run after the assembler has laid
out the code.
David Majnemer collaborated on this patch.
llvm-svn: 259117
While legalizing a 64-bit shift left by 1, the following occurs:
We split the shift operand in half: a high half and a low half.
We then create an ADDC with the low half and a ADDE with the high half +
the carry bit from the ADDC.
This is problematic if X is any_ext'd because the high half computation
is now undef + undef + carry bit and there is no way to ensure that the
two undef values had the same bitwise representation. This results in
the lowest bit in the high half turning into garbage.
Instead, do not try to turn shifts into arithmetic during type
legalization.
This fixes PR26350.
llvm-svn: 259065
Re-commit of r258951 after fixing layering violation.
The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.
In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 259035
Summary:
findBetterNeighborChains does not handle volatile or indexed stores.
However, it did not check when adding stores to ChainedStores.
Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D16463
llvm-svn: 259024
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.
There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.
The implementation of DiagnosticInfoUnsupported::print must be in
lib/Codegen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 258951
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
These two functions are hard to reason about. This commit makes the code
more comprehensible:
- Use four distinct variables (OldIdxIn, OldIdxOut, NewIdxIn, NewIdxOut)
with a fixed value instead of a changing iterator I that points to
different things during the function.
- Remove the early explanation before the function in favor of more
detailed comments inside the function. Should have more/clearer comments now
stating which conditions are tested and which invariants hold at
different points in the functions.
The behaviour of the code was not changed.
I hope that this will make it easier to review the changes in
http://reviews.llvm.org/D9067 which I will adapt next.
Differential Revision: http://reviews.llvm.org/D16379
llvm-svn: 258756
When generating calls to memcpy, memmove, and memset, use void* as the return
type rather than void, to match the standard signatures for these functions.
This has no practical effect for most targets, since the return values of
these calls aren't being used anyway, and most calling conventions tolerate
this kind of mismatch. However, this change will help support future
optimizations to utilize the return value to avoid holding the argument
value live across a call.
llvm-svn: 258691
A cleanup can have paths which unwind or end up in unreachable.
If there is an unreachable path *and* a path which unwinds to caller,
we would mistakenly inject an unwind path to a catchswitch on the
unreachable path. This results in a verifier assertion firing because
the cleanup unwinds to two different places: to the caller and to the
catchswitch.
This occured because we used getCleanupRetUnwindDest to determine if the
cleanuppad had no cleanuprets.
This is incorrect, getCleanupRetUnwindDest returns null for cleanuprets
which unwind to caller.
llvm-svn: 258651
Cleanups in C++ are a little weird. They are only guaranteed to be
reliably executed if, and only if, there is a viable catch handler which
can handle the exception.
This means that reachability of a cleanup is lexically determined by it
being nested with a try-block which unwinds to a catch. It is *cannot*
be reasoned about by examining the control flow edges leaving a cleanup.
Usually this is not a problem. It becomes a problem when there are *no*
edges out of a cleanup because we believed that code post-dominated by
the cleanup is dead. In LLVM's case, this code is what informs the
personality routine about the presence of a suitable catch handler.
However, the lack of edges to that catch handler makes the handler
become unreachable which causes us to remove it. By removing the
handler, the cleanup becomes unreachable.
Instead, inject a catch-all handler with every cleanup that has no
unwind edges. This will allow us to properly unwind the stack.
This fixes PR25997.
llvm-svn: 258580
Summary:
The testing for returnBB was flipped which may cause ARM ld/st opt pass uses callee saved regs in returnBB when shrink-wrap is used.
Reviewers: t.p.northover, apazos, MatzeB
Subscribers: mcrosier, zzheng, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D16434
llvm-svn: 258569
This reapplies r258296 and r258366, and also fixes an existing bug in
SelectionDAG.cpp's isMemSrcFromString, neglecting to account for the
offset in a GlobalAddressSDNode, which is uncovered by those patches.
llvm-svn: 258482
This reverts r258296 and the follow up r258366. With this change, we
miscompiled the following program on Windows:
#include <string>
#include <iostream>
static const char kData[] = "asdf jkl;";
int main() {
std::string s(kData + 3, sizeof(kData) - 3);
std::cout << s << '\n';
}
llvm-svn: 258465
The X86 musttail implementation finds register parameters to forward by
running the calling convention algorithm until a non-register location
is returned. However, assigning a vector memory location has the side
effect of increasing the function's stack alignment. We shouldn't
increase the stack alignment when we are only looking for register
parameters, so this change conditionalizes it.
llvm-svn: 258442
Summary: This option is being added for testing purposes.
Reviewers: mcrosier
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16410
llvm-svn: 258409
Summary:
Although this is a slight cleanup on its own, the main motivation is to
refactor the constant folding API to ease migration to opaque pointers.
This will be follow-up work.
Reviewers: eddyb
Subscribers: zzheng, dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D16380
llvm-svn: 258390
Summary:
Mark the LLVMGlobalISel library as optional in
LLVMBuild.txt, since the library is only built
if LLVM_BUILD_GLOBAL_ISEL is set. Without doing
this, llvm-config includes the library in the
list of components regardless of whether it's
built, and then will error out when asked for
the library names/paths.
Reviewers: qcolombet
Subscribers: joker.eph, llvm-commits, vkalintiris
Differential Revision: http://reviews.llvm.org/D16386
llvm-svn: 258379
This patch adds the necessary plumbing to cmake to build the sources related to
GlobalISel.
To build the sources related to GlobalISel, we need to add -DBUILD_GLOBAL_ISEL=ON.
By default, this is OFF, thus GlobalISel sources will not impact people that do
not explicitly opt-in.
Differential Revision: http://reviews.llvm.org/D15983
llvm-svn: 258344
If converter was somewhat careless about "diamond" cases, where there
was no join block, or in other words, where the true/false blocks did
not have analyzable branches. In such cases, it was possible for it to
remove (needed) branches, resulting in a loss of entire basic blocks.
Differential Revision: http://reviews.llvm.org/D16156
llvm-svn: 258310
SelectionDAG previously missed opportunities to fold constants into
GlobalAddresses in several areas. For example, given `(add (add GA, c1), y)`, it
would often reassociate to `(add (add GA, y), c1)`, missing the opportunity to
create `(add GA+c, y)`. This isn't often visible on targets such as X86 which
effectively reassociate adds in their complex address-mode folding logic,
however it is currently visible on WebAssembly since it currently has very
simple address mode folding code that doesn't reassociate anything.
This patch fixes this by making SelectionDAG fold offsets into GlobalAddresses
at the same times that it folds constants together, so that it doesn't miss any
opportunities to perform such folding.
Differential Revision: http://reviews.llvm.org/D16090
llvm-svn: 258296
Note that this is disabled by default and still requires a patch to
handleMove() which is not upstreamed yet.
If the TrackLaneMasks policy/strategy is enabled the MachineScheduler
will build a schedule graph where definitions of independent
subregisters are no longer serialised.
Implementation comments:
- Without lane mask tracking a sub register def also counts as a use
(except for the first one with the read-undef flag set), with lane
mask tracking enabled this is no longer the case.
- Pressure Diffs where previously maintained per definition of a
vreg with the help of the SSA information contained in the
LiveIntervals. With lanemask tracking enabled we cannot do this
anymore and instead change the pressure diffs for all uses of the vreg
as it becomes live/dead. For this changed style to work correctly we
ignore uses of instructions that define the same register again: They
won't affect register pressure.
- With lanemask tracking we remove all read-undef flags from
sub register defs when building the graph and re-add them later when
all vreg lanes have become dead.
Differential Revision: http://reviews.llvm.org/D14969
llvm-svn: 258259
This renaming is necessary to avoid a subregister aware scheduler
accidentally creating liveness "holes" which are rejected by the
MachineVerifier.
Explanation as found in this patch:
Helper class that can divide MachineOperands of a virtual register into
equivalence classes of connected components.
MachineOperands belong to the same equivalence class when they are part of
the same SubRange segment or adjacent segments (adjacent in control
flow); Different subranges affected by the same MachineOperand belong to
the same equivalence class.
Example:
vreg0:sub0 = ...
vreg0:sub1 = ...
vreg0:sub2 = ...
...
xxx = op vreg0:sub1
vreg0:sub1 = ...
store vreg0:sub0_sub1
The example contains 3 different equivalence classes:
- One for the (dead) vreg0:sub2 definition
- One containing the first vreg0:sub1 definition and its use,
but not the second definition!
- The remaining class contains all other operands involving vreg0.
We provide a utility function here to rename disjunct classes to different
virtual registers.
Differential Revision: http://reviews.llvm.org/D16126
llvm-svn: 258257
Summary:
This teaches MachineSink to not sink instructions that might break the
implicit null check optimization that runs later. This should not
affect frontends that do not use implicit null checks.
Reviewers: aadg, reames, hfinkel, atrick
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D14632
llvm-svn: 258254
According the build bots, clang is using the Registry class somewhere as well. Will reapply with appropriate clang changes at a later point.
llvm-svn: 258159
The Registry class constructs a linked list of nodes whose storage is inside static variables and nodes are added via static initializers. The trick is that those static initializers are in both the LLVM code base, and some random plugin that might get loaded in at runtime. The existing code tries to use C++ templates and their ODR rules to get a single definition of the registry for each type, but, experimentally, this doesn't quite work as designed. (Well, the entire structure doesn't. It might not actually be an ODR problem.)
Previously, when I tried moving the GCStrategy class (along with it's registry) from CodeGen to IR, I ran into a problem where asking the GCStrategyRegistry a question would return inconsistent results depending on whether you asked from CodeGen (where the static initializers still were) or Transforms. My best guess is that this is a result of either a) an order of initialization error, or b) we ended up with two copies of the registry being created. I remember at the time having convinced myself it was probably (b), but I don't have any of my notes around from that investigation any more.
See http://reviews.llvm.org/rL226311 for the original patch in question.
This patch tries to remove the possibility of (b) above. (a) was already fixed in change 258109.
Differential Revision: http://reviews.llvm.org/D16170
llvm-svn: 258157
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
Combine a bunch of small files into a single, still rather small, file. The primary purpose of this is to get all of the static initializers into a single file so as to have a well defined order of initialization.
llvm-svn: 258109
Summary:
Currently llvm::SplitModule as the first step globalizes all local objects, which might not be desirable in some scenarios.
This change adds a new flag to llvm::SplitModule that uses SCC approach to search for a balanced partition without the need to externalize symbols.
Such partition might not be possible or fully balanced for a given number of partitions, and is a function of the module properties (global/local dependencies within the module).
Joint development Tobias Edler von Koch (tobias@codeaurora.org) and Sergei Larin (slarin@codeaurora.org)
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D16124
llvm-svn: 258083