The DIType* for void is the null pointer. A null DIType can never be a
qualified type, so we can just exit the loop at this point and go to
getTypeIndex(BaseTy).
Fixes PR27984
llvm-svn: 271550
Summary:
If the target requests it, use emptry spaces in the fixed and
callee-save stack area to allocate local stack objects.
AArch64: Change last callee-save reg stack object alignment instead of
size to leave a gap to take advantage of above change.
Reviewers: t.p.northover, qcolombet, MatzeB
Subscribers: rengolin, mcrosier, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D20220
llvm-svn: 271527
Although this was intended to be NFC, the test case wiggle shows a change in
code scheduling/RA caused by a difference in the SDLoc() generation.
Depending on how you look at it, this is the (dis)advantage of exact checking
in regression tests.
llvm-svn: 271526
Use the type index of the underlying type unless we have a typedef from
long to HRESULT; HRESULT typedefs are translated to T_HRESULT.
llvm-svn: 271494
When the index is known to be constant 0, insert directly into the the low half,
instead of spilling, performing the insert in-memory, and reloading.
Differential Revision: http://reviews.llvm.org/D20763
llvm-svn: 271428
Summary:
Re-enable lifetime-start-on-first-use for stack coloring,
but explicitly disable it for slots with more than one start
or end lifetime marker.
Bug: 27903
Reviewers: wmi, tejohnson, qcolombet, gbiv
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20739
llvm-svn: 271412
Summary:
This is meant to be the tiniest step towards DIType to CV type index
translation that I could come up with. Whenever translation fails, we use type
index zero, which is the unknown type.
Reviewers: aaboud, zturner
Subscribers: llvm-commits, amccarth
Differential Revision: http://reviews.llvm.org/D20840
llvm-svn: 271408
This should have been converting the size to bytes, but wasn't really.
These should probably all be using getStoreSize instead.
I haven't been able to come up with a meaningful testcase for this.
I can trigger it using combinations of struct loads and stores,
but can't observe a difference in non-broken testcases.
isAlias is only really used during store merging, so I'm not sure how
to get into the vector splitting situation the comment describes
since store merging is only done before type legalization.
llvm-svn: 271356
Refactor LiveIntervals::renameDisconnectedComponents() to be a pass.
Also change the name to "RenameIndependentSubregs":
- renameDisconnectedComponents() worked on a MachineFunction at a time
so it is a natural candidate for a machine function pass.
- The algorithm is testable with a .mir test now.
- This also fixes a problem where the lazy renaming as part of the
MachineScheduler introduced IMPLICIT_DEF instructions after the number
of a nodes in a region were counted leading to a mismatch.
Differential Revision: http://reviews.llvm.org/D20507
llvm-svn: 271345
We think it's OK to generate half fminnan because it's legal for the
transform-to type (f32; r245196). However, PromoteFloatRes was missing
the case; simply promote like the other binops, including minnum.
llvm-svn: 271317
Adds the method MCStreamer::EmitBinaryData, which is usually an alias
for EmitBytes. In the MCAsmStreamer case, it is overridden to emit hex
dump output like this:
.byte 0x0e, 0x00, 0x08, 0x10
.byte 0x03, 0x00, 0x00, 0x00
.byte 0x00, 0x00, 0x00, 0x00
.byte 0x00, 0x10, 0x00, 0x00
Also, when verbose asm comments are enabled, this patch prints the dump
output for each comment before its record, like this:
# ArgList (0x1000) {
# TypeLeafKind: LF_ARGLIST (0x1201)
# NumArgs: 0
# Arguments [
# ]
# }
.byte 0x06, 0x00, 0x01, 0x12
.byte 0x00, 0x00, 0x00, 0x00
This should make debugging easier and testing more convenient.
Reviewers: aaboud
Subscribers: majnemer, zturner, amccarth, aaboud, llvm-commits
Differential Revision: http://reviews.llvm.org/D20711
llvm-svn: 271313
This adds support to the backed to actually support SjLj EH as an exception
model. This is *NOT* the default model, and requires explicitly opting into it
from the frontend. GCC supports this model and for MinGW can still be enabled
via the `--using-sjlj-exceptions` options.
Addresses PR27749!
llvm-svn: 271244
Summary:
Turn off lifetime-start-on-first-use enhancement for the moment
pending a fix for bug 27903.
Bug: 27903
Reviewers: tejohnson, wmi, qcolombet, gbiv
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20731
llvm-svn: 271003
Fix: updated clang code which was not updated by mistake.
Original commit message:
[llvm-mc] - Teach llvm-mc to generate zlib styled compression sections.
This patch is strongly based on previously reverted D20331.
(because of gnuutils < 2.26 does not support compressed debug sections in non zlib-gnu style)
Difference that this patch supports both zlib and zlib-gnu styles.
-compress-debug-sections option now supports next values:
-compress-debug-sections=zlib-gnu
-compress-debug-sections=zlib
-compress-debug-sections=none
Previously specifying -compress-debug-sections enabled zlib-gnu compression,
so anyone can put "-compress-debug-sections=zlib-gnu" to restore the behavior
that was before this patch for case when compression was enabled.
Differential revision: http://reviews.llvm.org/D20676
llvm-svn: 270987
It broke buildbot:
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/13585/steps/build/logs/stdio
Initial commit message:
[llvm-mc] - Teach llvm-mc to generate zlib styled compression sections.
This patch is strongly based on previously reverted D20331.
(because of gnuutils < 2.26 does not support compressed debug sections in non zlib-gnu style)
Difference that this patch supports both zlib and zlib-gnu styles.
-compress-debug-sections option now supports next values:
-compress-debug-sections=zlib-gnu
-compress-debug-sections=zlib
-compress-debug-sections=none
Previously specifying -compress-debug-sections enabled zlib-gnu compression,
so anyone can put "-compress-debug-sections=zlib-gnu" to restore the behavior
that was before this patch for case when compression was enabled.
Differential revision: http://reviews.llvm.org/D20676
llvm-svn: 270978
This patch is strongly based on previously reverted D20331.
(because of gnuutils < 2.26 does not support compressed debug sections in non zlib-gnu style)
Difference that this patch supports both zlib and zlib-gnu styles.
-compress-debug-sections option now supports next values:
-compress-debug-sections=zlib-gnu
-compress-debug-sections=zlib
-compress-debug-sections=none
Previously specifying -compress-debug-sections enabled zlib-gnu compression,
so anyone can put "-compress-debug-sections=zlib-gnu" to restore the behavior
that was before this patch for case when compression was enabled.
Differential revision: http://reviews.llvm.org/D20676
llvm-svn: 270977
CriticalAntiDepBreaker was not correctly tracking defs of the high X86 byte
registers, leading to incorrect use of a busy register to break an
antidependence.
Fixes pr27681, and its duplicates pr27580, pr27804.
Differential Revision: http://reviews.llvm.org/D20456
llvm-svn: 270935
This patch builds upon r270776 and speeds up
LiveDebugValues::transferDebugValue() by adding an index that maps each
DebugVariable to its open VarLoc.
The transferDebugValue() function needs to close all open ranges for a
given DebugVariable. Iterating over the set bits of OpenRanges is
prohibitively slow in practice. I experimented with using the sorted map
of VarLocs in the UniqueVector to iterate only over the range of VarLocs
with a given DebugVariable, but the binary search turned out to be even
more expensive than just iterating over the set bits in OpenRanges.
Instead, this patch exploits the fact that there can only be one open
location for each DebugVariable and redundantly stores this location in a
DenseMap.
This patch brings the time spent in the LiveDebugValues pass down to an
almost neglectiable amount.
http://llvm.org/bugs/show_bug.cgi?id=26055http://reviews.llvm.org/D20636
rdar://problem/24091200
llvm-svn: 270923
Summary:
This allows the linker to discard unused symbol information for comdat
functions that were discarded during the link. Before this change,
searching for the name of an inline function in the debugger would
return multiple results, one per symbol subsection in the object file.
After this change, there is only one result, the result for the function
chosen by the linker.
Reviewers: zturner, majnemer
Subscribers: aaboud, amccarth, llvm-commits
Differential Revision: http://reviews.llvm.org/D20642
llvm-svn: 270792
This patch modifies the LiveDebugValues pass to use more efficient set
data structures as outlined in PR26055. Both VarLocSet and VarLocList are
now SparseBitVectors which allows us to perform much faster bitvector
arithmetic on them.
The speedup can be in the order of minutes especially on ASANified code.
The change is not NFC in the assembler output because the inserted
DBG_VALUEs are now sorted by variable and location.
Many thanks to Daniel Berlin for helping design the improved algorithm and
reviewing the patch.
https://llvm.org/bugs/show_bug.cgi?id=26055http://reviews.llvm.org/D20178
rdar://problem/24091200
llvm-svn: 270776
LegalizeIntegerTypes does not have a way to expand multiplications for large
integer types (i.e. larger than twice the native bit width). There's no
standard runtime call to use in that case, and so we'd just assert.
Unfortunately, as it turns out, it is possible to hit this case from
standard-ish C code in rare cases. A particular case a user ran into yesterday
involved an __int128 induction variable and a loop with a quadratic (not
linear) recurrence which triggered some backend logic using SCEVExpander. In
this case, the BinomialCoefficient code in SCEV generates some i129 variables,
which get widened to i256. At a high level, this is not actually good (i.e. the
underlying optimization, PPCLoopPreIncPrep, should not be transforming the loop
in question for performance reasons), but regardless, the backend shouldn't
crash because of cost-modeling issues in the optimizer.
This is a straightforward implementation of the multiplication expansion, based
on the algorithm in Hacker's Delight. I validated it against the code for the
mul256b function from http://locklessinc.com/articles/256bit_arithmetic/ using
random inputs. There should be no functional change for previously-working code
(the new expansion code only replaces an assert).
Fixes PR19797.
llvm-svn: 270720
Also, rename recognizeBitReverseOrBSwapIdiom to recognizeBSwapOrBitReverseIdiom,
so the ordering of the MatchBSwaps and MatchBitReversals arguments are
consistent with the function name.
llvm-svn: 270715
We have to modify V2SU before inserting new elements into the
CurrentVRegDefs set because that may move V2SU in memory invalidating
the reference.
llvm-svn: 270644
The benefits of this patch are
-- We call AnalyzeBranch() to optimize unanalyzable branches, but the result of
AnalyzeBranch() is not used. Now the result is useful.
-- Before the layout of all the MBBs is set, the result of AnalyzeBranch() is
not correct and needs to be fixed before using it to optimize the branch
conditions. Now this optimization is called after the layout, the code used
to fix the result of AnalyzeBranch() is not needed.
-- The branch condition of the last block is not optimized before. Now it is
optimized.
Differential Revision: http://reviews.llvm.org/D20177
llvm-svn: 270623
These attributes aren't used by other debuggers (& may be confused with
other DWARF extensions) so they just waste space (about 1.5% on .dwo
file size on a random large program I tested).
We could remove the ObjC property ones too, but I figured they were
probably more necessary when trying to understand ObjC (I could be wrong
though) & so any debugger interested in working with ObjC would use
them, perhaps? (also, there are some legacy tests in Clang that test for
them - making it one of those annoying cross-project commits and/or
cleanup to refactor those tests)
llvm-svn: 270613
Replace bidirectional flow analysis to compute liveness with forward
analysis pass. Treat lifetimes as starting when there is a first
reference to the stack slot, as opposed to starting at the point of the
lifetime.start intrinsic, so as to increase the number of stack
variables we can overlap.
Reviewers: gbiv, qcolumbet, wmi
Differential Revision: http://reviews.llvm.org/D18827
Bug: 25776
llvm-svn: 270559
Before r269750 we did the comparisons in this loop in signed ints so
that it DTRT when MinCSFrameIndex was 0. This was changed because it's
now possible for MinCSFrameIndex to be UINT_MAX, but that introduced a
bug when we were comparing `>= 0` - this is tautological in unsigned.
Rework the comparisons here to avoid issues with unsigned wrapping.
No test. I couldn't find a way to get any of the StackGrowsUp in-tree
targets to reach the code that sets MinCSFrameIndex.
llvm-svn: 270492
This effectively revers commit r270389 and re-lands r270106, but it's
almost a rewrite.
The behavior change in r270106 was that we could no longer assume that
each LF_FUNC_ID record got its own type index. This patch adds a map
from DINode* to TypeIndex, so we can stop making that assumption.
This change also emits padding bytes between type records similar to the
way MSVC does. The size of the type record includes the padding bytes.
llvm-svn: 270485
Summary:
MBBs don't necessarily have a name (in my experience, they almost never
do), in which case this logging is quite unhelpful. The number seems to
work well.
Reviewers: iteratee
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20533
llvm-svn: 270477
This will pave the way to introduce a full fledged symbol visitor
similar to how we have a type visitor, thus allowing the same
dumping code to be used in llvm-readobj and llvm-pdbdump.
Differential Revision: http://reviews.llvm.org/D20384
Reviewed By: rnk
llvm-svn: 270475
This fixes a bug introduced in:
r262115 - CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC
The iterator End here might == MBB->end(), and so we can't unconditionally
dereference it. This often goes unnoticed (I don't have a test case that always
crashes, and ASAN does not catch it either) because the function call arguments are
turned right back into iterators. MachineInstrBundleIterator's constructor,
however, does have an assert which might randomly fire.
llvm-svn: 270323
Prior to this patch, we were using 1 for all the repairing costs.
Now, we use the information from the target to get this information.
llvm-svn: 270304
We now use LiveRangeCalc::extendToUses() instead of a specially designed
algorithm in constructMainRangeFromSubranges():
- The original motivation for constructMainRangeFromSubranges() were
differences between the main liverange and subranges because of hidden
dead definitions. This case however cannot happen anymore with the
DetectDeadLaneMasks pass in place.
- It simplifies the code.
- This fixes a longstanding bug where we did not properly create new SSA
values on merging control flow (the MachineVerifier missed most of
these cases).
- Move constructMainRangeFromSubranges() to LiveIntervalAnalysis and
LiveRangeCalc to better match the implementation/available helper
functions.
This re-applies r269016. The fixes from r270290 and r270259 should avoid
the machine verifier problems this time.
llvm-svn: 270291
It is fine for subregister ranges to be undefined on some CFG paths as
we may have a "vregX:other_subreg<read-undef> =" def on that path. We
do not (and should not) have live segments for the subregister ranges.
The MachineVerifier should not complain about this.
This is a slight variant of http://llvm.org/PR27705
llvm-svn: 270290
Depending on the compiler used to build LLVM, llvm_unreachable can either
expand to a call to abort(), or to a __builtin_unreachable. The latter
does not have a predictable behavior at runtime.
llvm-svn: 270260
Fix renameDisconnectedComponents() creating vreg uses that can be
reached from function begin withouthaving a definition (or explicit
live-in). Fix this by inserting IMPLICIT_DEF instruction before
control-flow joins as necessary.
Removes an assert from MachineScheduler because we may now get
additional IMPLICIT_DEF when preparing the scheduling policy.
This fixes the underlying problem of http://llvm.org/PR27705
llvm-svn: 270259
This gives AsmPrinter a chance to initialize its DD field before
we call beginModule(), which is about to start using it.
Differential Revision: http://reviews.llvm.org/D20413
llvm-svn: 270258
We are about to start using DIEDwarfExpression to create global variable
DIEs, which happens before we generate code for functions.
Differential Revision: http://reviews.llvm.org/D20412
llvm-svn: 270257
The Fast mode takes the first mapping, the greedy mode loops over all
the possible mapping for an instruction and choose the cheaper one.
Test case will come with target specific code, since we currently do not
have instructions that have several mappings.
llvm-svn: 270249
computeMapping.
Computing the cost of a mapping takes some time.
Since in Fast mode, the cost is irrelevant, just spare some cycles by not
computing it.
In Greedy mode, we need to choose the best cost, that means that when
the local cost gets more expensive than the best cost, we can stop
computing the repairing and cost for the current mapping.
llvm-svn: 270245
The previous choice of the insertion points for repairing was
straightfoward but may introduce some basic block or edge splitting. In
some situation this is something we can avoid.
For instance, when repairing a phi argument, instead of placing the
repairing on the related incoming edge, we may move it to the previous
block, before the terminators. This is only possible when the argument
is not defined by one of the terminator.
llvm-svn: 270232
an instruction.
Use the previously introduced RepairingPlacement class to split the code
computing the repairing placement from the code doing the actual
placement. That way, we will be able to consider different placement and
then, only apply the best one.
llvm-svn: 270168
When assigning the register banks we may have to insert repairing code
to move already assigned values accross register banks.
Introduce a few helper classes to keep track of what is involved in the
repairing of an operand:
- InsertPoint and its derived classes record the positions, in the CFG,
where repairing has to be inserted.
- RepairingPlacement holds all the insert points for the repairing of an
operand plus the kind of action that is required to do the repairing.
This is going to be used to keep track of how the repairing should be
done, while comparing different solutions for an instruction. Indeed, we
will need the repairing placement to capture the cost of a solution and
we do not want to compute it a second time when we do the actual
repairing.
llvm-svn: 270167
register bank twice.
Prior to this change, we were checking if the assignment for the current
machine operand was matching, then we would check if the mismatch
requires to insert repair code.
We actually already have this information from the first check, so just
pass it along.
NFCI.
llvm-svn: 270166
Sorry for the lack testcase. There is one in the pr, but it depends on
std::sort and the .ll version is 110 lines, so I don't think it is
wort it.
The bug was that we were sorting after adding a terminator, and the
sorting algorithm could end up putting the terminator in the middle of
the List vector.
With that we would create a Spans map entry keyed on nullptr which would
then be added to CUs and fail in that sorting.
llvm-svn: 270165
This helper class will be used to represent the cost of mapping an
instruction to a specific register bank.
The particularity of these costs is that they are mostly local, thus the
frequency of the basic block is irrelevant. However, for few
instructions (e.g., phis and terminators), the cost may be non-local and
then, we need to account for the frequency of the involved basic blocks.
This will be used by the greedy mode I am working on.
llvm-svn: 270163
Using Chandler's words from r265331:
This commit was greatly exacerbating PR17409 and effectively regressed
build time for lot of (very large) code when compiled with ASan or MSan.
PR17409 is fixed by r269249, so this is fine to reapply r263460.
Original commit message:
The bad behavior happens when we have a function with a long linear
chain of basic blocks, and have a live range spanning most of this
chain, but with very few uses.
Let say we have only 2 uses.
The Hopfield network is only seeded with two active blocks where the
uses are, and each iteration of the outer loop in
`RAGreedy::growRegion()` only adds two new nodes to the network due to
the completely linear shape of the CFG. Meanwhile,
`SpillPlacer->iterate()` visits the whole set of discovered nodes, which
adds up to a quadratic algorithm.
This is an historical accident effect from r129188.
When the Hopfield network is expanding, most of the action is happening
on the frontier where new nodes are being added. The internal nodes in
the network are not likely to be flip-flopping much, or they will at
least settle down very quickly. This means that while
`SpillPlacer->iterate()` is recomputing all the nodes in the network, it
is probably only the two frontier nodes that are changing their output.
Instead of recomputing the whole network on each iteration, we can
maintain a SparseSet of nodes that need to be updated:
- `SpillPlacement::activate()` adds the node to the todo list.
- When a node changes value (i.e., `update()` returns true), its
neighbors are added to the todo list.
- `SpillPlacement::iterate()` only updates the nodes in the list.
The result of Hopfield iterations is not necessarily exact. It should
converge to a local minimum, but there is no guarantee that it will find
a global minimum. It is possible that updating nodes in a different
order will cause us to switch to a different local minimum. In other
words, this is not NFC, but although I saw a few runtime improvements
and regressions when I benchmarked this change, those were side effects
and actually the performance change is in the noise as expected.
Huge thanks to Jakob Stoklund Olesen <stoklund@2pi.dk> for his
feedbacks, guidance and time for the review.
llvm-svn: 270149
When matching an interleaved load to an ldN pattern, the interleaved access
pass checks that all users of the load are shuffles. If the load is used by an
instruction other than a shuffle, the pass gives up and an ldN is not
generated. This patch considers users of the load that are extractelement
instructions. It attempts to modify the extracts to use one of the available
shuffles rather than the load. After the transformation, the load is only used
by shuffles and will then be matched with an ldN pattern.
Differential Revision: http://reviews.llvm.org/D20250
llvm-svn: 270142
A baby step toward translating DIType records to CodeView.
This does not (yet) combine the record length with the record data. I'm going back and forth trying to determine if that's a good idea.
llvm-svn: 270106
Previously, specifying -post-RA-scheduler=true had the side effect of
disabling the antidependency breaker, yielding different behavior than
if the post-RA-scheduler was enabled via the scheduling model.
Differential Revision: http://reviews.llvm.org/D20186
llvm-svn: 270077
There are at least 2 places (DAGCombiner, X86ISelLowering) where this could be used instead
of ad-hoc and watered down code that is trying to match a power-of-2 pattern.
Differential Revision: http://reviews.llvm.org/D20439
llvm-svn: 270073
When looking for an available spill slot, the register scavenger would stop
after finding the first one with no register assigned to it. That slot may
have size and alignment that do not meet the requirements of the register
that is to be spilled. Instead, find an available slot that is the closest
in size and alignment to one that is needed to spill a register from RC.
Differential Revision: http://reviews.llvm.org/D20295
llvm-svn: 269969
with an additional fix to make RegAllocFast ignore undef physreg uses. It would
previously get confused about the "push %eax" instruction's use of eax. That
method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate
as well, but since that runs after register-allocation, we didn't run into the
RegAllocFast issue before.
llvm-svn: 269949
* Reworks the CVSymbolTypes.def to work similarly to TypeRecords.def.
* Moves some enums from SymbolRecords.h to CodeView.h to maintain
consistency with how we do type records.
* Generalize a few simple things like the record prefix
* Define the leaf enum and the kind enum similar to how we do with tyep
records.
Differential Revision: http://reviews.llvm.org/D20342
Reviewed By: amccarth, rnk
llvm-svn: 269867
instead of having DwarfUnit query the debugger tuning options.
Follow-up commmit to r269827.
Thanks to Paul Robinson for pointing this out!
llvm-svn: 269840
As discovered in PR27758, GDB does not fully support the DWARF 4 format.
This patch ensures we always emit bitfields in the DWARF 2 when tuning for GDB.
llvm-svn: 269827
When processing inline asm that contains errors, make sure we can recover
gracefully by creating an UNDEF SDValue for the inline asm statement before
returning from SelectionDAGBuilder::visitInlineAsm. This is necessary for
consumers that don't exit on the first error that is emitted (e.g. clang)
and that would assert later on.
Fixes PR24071.
Patch by Diana Picus.
llvm-svn: 269811
Since r207518 they are printed exactly like non-hidden stubs on x86 and
since r207517 on ARM.
This means we can use a single set for all stubs in those platforms.
llvm-svn: 269776
PrologEpilogInserter has these 3 phases, which are related, but not
all of them are needed by all targets. This patch reorganizes PEI's
varous functions around those phases for more clear separation. It also
introduces a new TargetMachine hook, usesPhysRegsForPEI, which is true
for non-virtual targets. When it is true, all the phases operate as
before, and PEI requires the AllVRegsAllocated property on
MachineFunctions. Otherwise, CSR spilling and scavenging are skipped and
only prolog/epilog insertion/frame finalization is done.
Differential Revision: http://reviews.llvm.org/D18366
llvm-svn: 269750
The DWARF spec states that a member entry may have either a
DW_AT_data_member_location or a DW_AT_data_bit_offset, but not both.
This fixes a bug found in PR 27758.
llvm-svn: 269731
This code currently relies on static methods in ProfileSummary to determine whether a function is hot or unlikley. I am refactoring the ProfileSummary code and these methods will be removed. As discussed offline, the right way to re-introduce this is to add a pass to annotate functions with unlikely/hot hints and use the hints to determine the prefix here.
llvm-svn: 269726
The DWARF spec clearly states that a bit field member should have either a
DW_AT_byte_size or a DW_AT_bit_size, but not both.
Also the DW_AT_byte_size is redundant with the size of the type of the member.
This fixes a bug found in PR 27758.
llvm-svn: 269714
In practice only a few well known appending linkage variables work.
Currently if codegen sees an unknown appending linkage variable it will
just print it as a regular global. That is wrong as the symbol in the
produced object file has different semantics as the one provided by the
appending linkage.
This just errors early instead of producing a broken .o.
llvm-svn: 269706
Allow two users of the condition if the other user
is also a min/max select. i.e.
%c = icmp slt i32 %x, %y
%min = select i1 %c, i32 %x, i32 %y
%max = select i1 %c, i32 %y, i32 %x
llvm-svn: 269699
Summary: This way we can get rid of one of the fields in the .def file.
Reviewers: llvm-commits
Subscribers: zturner
Differential Revision: http://reviews.llvm.org/D20251
llvm-svn: 269461
For BITREVERSE, bit shifting/masking every bit in a vector element is a very lengthy procedure.
If the input vector type is a whole multiple of bytes wide then we can split this into a BSWAP shuffle stage (to reverse at the byte level) and then a BITREVERSE stage applied to each byte. Most vector capable targets can efficiently BSWAP using shuffles resulting in a considerable reduction in instructions.
With this patch targets would only need to implement a target specific vXi8 BITREVERSE implementation to efficiently reverse most legal vector types.
Differential Revision: http://reviews.llvm.org/D19978
llvm-svn: 269290
Currently cost based loop rotation algo can only be turned on with
two conditions: the function has real profile data, and -precise-rotation-cost
flag is turned on. This is not convenient for developers to experiment
when profile is not available. Add a new option to force the new
rotation algorithm -force-precise-rotation-cost
llvm-svn: 269266
This is to fix the bug in https://llvm.org/bugs/show_bug.cgi?id=27612.
When spill is hoisted to a BB with landingpad successor, and if the VNI
of the spill reg lives into the landingpad successor, the spill should be
inserted before the call which may throw exception. InsertPointAnalysis
is used to compute the safe insert point.
http://reviews.llvm.org/D20027 is a preparing patch for this patch.
Differential Revision: http://reviews.llvm.org/D19884.
llvm-svn: 269249
InsertPointAnalysis.
Because both split and spill hoisting want to use LastSplitPoint computation
result, extract the LastSplitPoint computation from SplitAnalysis class which
also contains a bunch of other analysises only related to split.
Differential Revision: http://reviews.llvm.org/D20027.
llvm-svn: 269248
It's awkward to force callers of SelectNodeTo to figure out whether
the node was morphed or CSE'd. Update uses here instead of requiring
callers to (sometimes) do it.
llvm-svn: 269235
This means SelectCode unconditionally returns nullptr now. I'll follow
up with a change to make that return void as well, but it seems best
to keep that one very mechanical.
This is part of the work to have Select return void instead of an
SDNode *, which is in turn part of llvm.org/pr26808.
llvm-svn: 269136
Usually subregister definitions are consider uses of the remaining
lanes that did not get defined. Add a comment why the code in
ScheduleDAGInstrs does not add use dependencies regardless.
llvm-svn: 269107
for the same subprogram.
This fixes a bug where DW_AT_abstract_origin is being emitted twice for
the same subprogram if a function is both inlined and emitted in the same
translation unit, by restoring the pre-r266446 behavior.
http://reviews.llvm.org/D20072
llvm-svn: 269103
Summary:
While setting kill flags on instructions inside a BUNDLE, we bail out as soon
as we set kill flag on a register. But we are missing a check when all the
registers already have the correct kill flag set. We need to bail out in that
case as well.
This patch refactors the old code and simply makes use of the addRegisterKilled
function in MachineInstr.cpp in order to determine whether to set/remove kill
on an instruction.
Reviewers: apazos, t.p.northover, pete, MatzeB
Subscribers: MatzeB, davide, llvm-commits
Differential Revision: http://reviews.llvm.org/D17356
llvm-svn: 269092
An example from Hexagon where things went wrong:
%R0<def> = L2_loadrigp <ga:@fp04> ; load function address
J2_callr %R0<kill>, ..., %R0<imp-def> ; call *R0, return value in R0
ScheduleDAGInstrs::buildSchedGraph would visit all instructions going
backwards, and in each instruction it would visit all operands in their
order on the operand list. In the case of this call, it visited the use
of R0 first, then removed it from the set Uses after it visited the def.
This caused the DAG to be missing the data dependence edge on R0 between
the load and the call.
Differential Revision: http://reviews.llvm.org/D20102
llvm-svn: 269076
Currently, SelectionDAG assumes 8/16-bit cmpxchg returns either a sign
extended result, or a zero extended result. SystemZ takes a third
option by returning junk in the high bits (rotated contents of the other
bytes in the memory word). In that case, don't use Assert*ext, and
zero-extend the result ourselves if a comparison is needed.
Differential Revision: http://reviews.llvm.org/D19800
llvm-svn: 269075
SystemZ (and probably other targets as well) can fold a memory operand
by changing the opcode into a new instruction that as a side-effect
also clobbers the CC-reg.
In order to do this, liveness of that reg must first be checked. When
LIS is passed, getRegUnit() can be called on it and the right
LiveRange is computed on demand.
Reviewed by Matthias Braun.
http://reviews.llvm.org/D19861
llvm-svn: 269026
We now use LiveRangeCalc::extendToUses() instead of a specially designed
algorithm in constructMainRangeFromSubranges():
- The original motivation for constructMainRangeFromSubranges() were
differences between the main liverange and subranges because of hidden
dead definitions. This case however cannot happen anymore with the
DetectDeadLaneMasks pass in place.
- It simplifies the code.
- This fixes a longstanding bug where we did not properly create new SSA
values on merging control flow (the MachineVerifier missed most of
these cases).
- Move constructMainRangeFromSubranges() to LiveIntervalAnalysis and
LiveRangeCalc to better match the implementation/available helper
functions.
llvm-svn: 269016
Move the register stackification and coloring passes to run very late, after
PEI, tail duplication, and most other passes. This means that all code emitted
and expanded by those passes is now exposed to these passes. This also
eliminates the need for prologue/epilogue code to be manually stackified,
which significantly simplifies the code.
This does require running LiveIntervals a second time. It's useful to think
of these late passes not as late optimization passes, but as a domain-specific
compression algorithm based on knowledge of liveness information. It's used to
compress the code after all conventional optimizations are complete, which is
why it uses LiveIntervals at a phase when actual optimization passes don't
typically need it.
Differential Revision: http://reviews.llvm.org/D20075
llvm-svn: 269012
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into an own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.
llvm-svn: 269011
Add convenience function to create MachineModuleInfo and
MachineFunctionAnalysis passes and add them to a pass manager.
Despite factoring out some shared code in
LiveIntervalTest/LLVMTargetMachine this will be used by my upcoming llc
change.
llvm-svn: 269002
After looking at D19087 again, it occurred to me that we can do better. If we consolidate
the valueHasExactlyOneBitSet() transforms, we won't incur extra overhead from calling it a
2nd time, and we can shrink SimplifySetCC() a bit. No functional change intended.
Differential Revision: http://reviews.llvm.org/D20050
llvm-svn: 268932
In case of COPY-like instruction we may be able to deduce that a certain
input is unused, based on the used lanes of the register defined by the
instruction.
This even works accross otherwise incompatible copies (no need to have
compatible lanemasks, completely unused operands are still completely
unused). It even makes sense to redo the analysis in this case since we
gained information for a case we previously stopped at because of the
incompatible masks.
llvm-svn: 268815
Relying on the caller to clean up after we've replaced all uses of a
node won't work when we've migrated to the `void Select(...)` API.
llvm-svn: 268774
If we don't, values that aren't precisely representable in f16 could
be used as-is in a promoted f32 operation, which would produce
incorrect results.
AArch64 had the correct behavior; add a focused test.
Fixes http://llvm.org/PR26871
llvm-svn: 268700
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.
We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.
Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.
llvm-svn: 268693
This opcode never happens in practice, and yet the logic we have in
place to handle it would be undefined behaviour if we ever executed
it. Remove it rather than trying to refactor code that's never
reached.
llvm-svn: 268692
Some vector bit operations are promoted instead of having custom lowering. This patch changes the isOperationLegalOrCustom tests for vector AND/OR operations to use a new TLI helper isOperationLegalOrCustomOrPromote instead, allowing the SSE implementations to stay on the simd unit.
Differential Revision: http://reviews.llvm.org/D19805
llvm-svn: 268561
Vector bit operations are typically promoted instead of having custom lowering. This patch changes the isOperationLegalOrCustom tests for vector AND/OR operations to use isOperationLegalOrPromote instead, allowing the SSE implementations to stay on the simd unit.
Differential Revision: http://reviews.llvm.org/D19805
llvm-svn: 268504
The replaced load may have implicit-defs and those defs may be used
in the block of the original load. Make sure to update the liveness
accordingly.
This is a generalization of r267817.
llvm-svn: 268412
Remove the AddPristinesAndCSRs parameters from
addLiveIns()/addLiveOuts().
We need to respect pristine registers after prologue epilogue insertion,
Seeing that we got this wrong in at least two commits already, we should
rather pay the small price to query MachineFrameInfo for it.
There are three cases that did not set AddPristineAndCSRs to true even
after register allocation:
- ExecutionDepsFix: live-out registers are used as a hint that the
register is used soon. This is not true for pristine registers so
use the new addLiveOutsNoPristines() to maintain this behaviour.
- SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like
a bug, should do the right thing automatically now.
- StackMapLivenessAnalysis: Not adding pristine registers looks like a
bug to me. Added a FIXME comment but maintain the current behaviour
as a change may need to get coordinated with GC runtimes.
llvm-svn: 268336
Summary:
This adds a unique ID to the COFF section uniquing map, similar to the
one we have for ELF. The unique id is not currently exposed via the
assembler because we don't have a use case for it yet. Users generally
create .pdata with the .seh_* family of directives, and the assembler
internally needs to produce .pdata and .xdata sections corresponding to
the code section.
The association between .text sections and the assembler-created .xdata
and .pdata sections is maintained as an ID field of MCSectionCOFF. The
CFI-related sections are created with the given unique ID, so if more
code is added to the same text section, we can find and reuse the CFI
sections that were already created.
Reviewers: majnemer, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19376
llvm-svn: 268331
After the layout of the basic blocks is set, the target may be able to get rid
of unconditional branches to fallthrough blocks that the generic code does not
catch. This happens any time TargetInstrInfo::AnalyzeBranch is not able to
analyze all the branches involved in the terminators sequence, while still
understanding a few of them.
In such situation, AnalyzeBranch can directly modify the branches if it has been
instructed to do so.
This patch takes advantage of that.
llvm-svn: 268328
This operation may branch to the handler block and we do not want it
to happen anywhere within the basic block.
Moreover, by marking it "terminator and branch" the machine verifier
does not wrongly assume (because of AnalyzeBranch not knowing better)
the branch is analyzable. Indeed, the target was seeing only the
unconditional branch and not the faulting load op and thought it was
a simple unconditional block.
The machine verifier was complaining because of that and moreover,
other optimizations could have done wrong transformation!
In the process, simplify the representation of the handler block in
the faulting load op. Now, we directly reference the handler block
instead of using a label. This has the benefits of:
1. MC knows how to issue a label for a BB, so leave that to it.
2. Accessing the target BB from its label is painful, whereas it is
direct from a MBB operand.
Note: The 2 bytes offset in implicit-null-check.ll comes from the
fact the unconditional jumps are not removed anymore, as the whole
terminator sequence is not analyzable anymore.
Will fix it in a subsequence commit.
llvm-svn: 268327
Summary:
When SelectionDAG performs CSE it is possible that the context's source
location is different from that of the selected node. This can lead to
incorrect line number records. We update the debug location to the
one that occurs earlier in the instruction sequence.
This fixes PR21006.
Reviewers: echristo, sdmitrouk
Subscribers: jevinskie, asl, llvm-commits
Differential Revision: http://reviews.llvm.org/D12094
llvm-svn: 268323
Summary:
Historically, we had a switch in the Makefiles for turning on "expensive
checks". This has never been ported to the cmake build, but the
(dead-ish) code is still around.
This will also make it easier to turn it on in buildbots.
Reviewers: chandlerc
Subscribers: jyknight, mzolotukhin, RKSimon, gberry, llvm-commits
Differential Revision: http://reviews.llvm.org/D19723
llvm-svn: 268050
In case of missing live intervals for a physical registers
getLanesWithProperty() would report 0 which was not a safe default in
all situations. Add a parameter to pass in a safe default.
No testcase because in-tree targets do not skip computing register unit
live intervals.
Also cleanup the getXXX() functions to not perform the
RequireLiveIntervals checks anymore so we do not even need to return
safe defaults.
llvm-svn: 267977
With the DetectDeadLanes pass in place we cannot run into situations
anymore where defs suddenly become dead.
Also add a missing check so we do not try to add an undef flag to a
physreg (found by visual inspection, no failing test).
llvm-svn: 267976
This requirement was a huge hack to keep LiveVariables alive because it
was optionally used by TwoAddressInstructionPass and PHIElimination.
However we have AnalysisUsage::addUsedIfAvailable() which we can use in
those passes.
This re-applies r260806 with LiveVariables manually added to PowerPC to
hopefully not break the stage 2 bots this time.
llvm-svn: 267954
The DetectDeadLaneMask already ensures that we have no dead subregister
definitions making the special handling in LiveIntervalAnalysis
unnecessary. This reverts most of r248335.
llvm-svn: 267937
ScheduleDAGMI::initQueues changes the RegionBegin to the first non-debug
instruction. Since it does not track register pressure, it does not affect
any RP trackers. ScheduleDAGMILive inherits initQueues from ScheduleDAGMI,
and it does reset the TopTPTracker in its schedule method. Any derived,
target-specific scheduler will need to do it as well, but the TopRPTracker
is only exposed as a "const" object to derived classes. Without the ability
to modify the tracker directly, this leaves a derived scheduler with a
potential of having the TopRPTracker out-of-sync with the CurrentTop.
The symptom of the problem:
void llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit *, bool):
Assertion `TopRPTracker.getPos() == CurrentTop && "out of sync"' failed.
Differential Revision: http://reviews.llvm.org/D19438
llvm-svn: 267918
The DWARF2 specification of DW_AT_bit_offset is ambiguous for
little-endian machines, but by restoring to the old behavior
we match what debuggers expect and what other popular compilers
generate.
llvm-svn: 267896
The DWARF2 specification of DW_AT_bit_offset was written from the perspective of
a big-endian machine with unclear semantics for other systems. DWARF4
deprecated DW_AT_bit_offset and introduced a new attribute DW_AT_data_bit_offset
that simply counts the number of bits from the beginning of the containing
entity regardless of endianness.
After this patch LLVM emits DW_AT_bit_offset for DWARF 2 or 3 and
DW_AT_data_bit_offset when DWARF 4 or later is requested.
llvm-svn: 267895
The DetectDeadLanes pass performs a dataflow analysis of used/defined
subregister lanes across COPY instructions and instructions that will
get lowered to copies. It detects dead definitions and uses reading
undefined values which are obscured by COPY and subregister usage.
These dead definitions cause trouble in the register coalescer which
cannot deal with definitions suddenly becoming dead after coalescing
COPY instructions.
For now the pass only adds dead and undef flags to machine operands. It
should be possible to extend it in the future to remove the dead
instructions and redo the analysis for the affected virtual
registers.
Differential Revision: http://reviews.llvm.org/D18427
llvm-svn: 267851
handleMove() was incorrectly swapping two value numbers. This was missed
before because the problem only occured when moving subregister definitions
and needed -verify-machineinstrs to be detected.
I cannot add a testcase as long as I cannot reapply r260905/r260806.
llvm-svn: 267840
We basically replace:
HoistBB:
cond_br NullBB, NotNullBB
NullBB:
...
NotNullBB:
<reg> = load
into
HoistBB
<reg> = load_faulting_op NullBB
uncond_br NotNullBB
NullBB:
...
NotNullBB: ## <reg> is now live-in of NotNullBB
...
This partially fixes the machine verifier error for
test/CodeGen/X86/implicit-null-check.ll, but it still fails because
of the implicit CFG structure.
llvm-svn: 267817
The sink cast machinery is supposed to sink casts as close to their user
as possible. However, an EH pad is the first instruction in it's basic
block. Don't sink if the user is an EH pad.
This fixes PR27536.
llvm-svn: 267767
Summary:
Refactor debugging routines to reduce code duplication. Remove a couple
of #include's that were not needed. Don't require MachineDominator as a
prereq for this pass (not needed).
These changes split off from http://reviews.llvm.org/D18827.
Reviewers: wmi, gbiv, qcolombet
Subscribers: llvm-commits, davidxl, jevinskie
Differential Revision: http://reviews.llvm.org/D18992
llvm-svn: 267766
is defined!
The users were checking the proper thing (Defined + PartialDeadDef), but the
information may have been wrong for other use cases, so fix that.
llvm-svn: 267641
When a block is tail-duplicated, the PHI nodes from that block are
replaced with appropriate COPY instructions. When those PHI nodes
contained use operands with subregisters, the subregisters were
dropped from the COPY instructions, resulting in incorrect code.
Keep track of the subregister information and use this information
when remapping instructions from the duplicated block.
Differential Revision: http://reviews.llvm.org/D19337
llvm-svn: 267583
This is part of solving PR27344:
https://llvm.org/bugs/show_bug.cgi?id=27344
CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this
same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the
IR further with a select in place.
For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly
predictable branch. Even the most limited HW branch predictors will be correct on this branch almost
all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all
the times the branch was predicted correctly.
As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's
MispredictPenalty. Or we could just let targets override the default by implementing the hook with that
and other target-specific options. Note that trying to statically determine mispredict rates for
close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie,
50/50 taken/not-taken might still be 100% predictable.
Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable()
branch weight default values are 4 and 64. A proposal to change that is in D19435.
Differential Revision: http://reviews.llvm.org/D19488
llvm-svn: 267572