When one operand is a user of another in a promoted binary operation
we may replace and delete the returned value before returning
triggering an assertion. Reorder node replacements to prevent this.
Fixes PR34137.
Landing on behalf of Nirav.
Differential Revision: https://reviews.llvm.org/D36581
llvm-svn: 311623
G_PHI has the same semantics as PHI but also has types.
This lets us verify that the types in the G_PHI are consistent.
This also allows specifying legalization actions for G_PHIs.
https://reviews.llvm.org/D36990
llvm-svn: 311596
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.
This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.
See also PR22780, for making DIExpression not be an MDNode.
Reviewers: aprantl, dexonsmith, dblaikie
Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37075
llvm-svn: 311594
Implementing this pass as a PowerPC specific pass. Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.
Differential Revision : https: // reviews.llvm.org/D32776
llvm-svn: 311588
Summary:
This change achieves two things:
- Redefine the Custom Event handling instrumentation points emitted by
the compiler to not require dynamic relocation of references to the
__xray_CustomEvent trampoline.
- Remove the synthetic reference we emit at the end of a function that
we used to keep auxiliary sections alive in favour of SHF_LINK_ORDER
associated with the section where the function is defined.
To achieve the custom event handling change, we've had to introduce the
concept of sled versioning -- this will need to be supported by the
runtime to allow us to understand how to turn on/off the new version of
the custom event handling sleds. That change has to land first before we
change the way we write the sleds.
To remove the synthetic reference, we rely on a relatively new linker
feature that preserves the sections that are associated with each other.
This allows us to limit the effects on the .text section of ELF
binaries.
Because we're still using absolute references that are resolved at
runtime for the instrumentation map (and function index) maps, we mark
these sections write-able. In the future we can re-define the entries in
the map to use relative relocations instead that can be statically
determined by the linker. That change will be a bit more invasive so we
defer this for later.
Depends on D36816.
Reviewers: dblaikie, echristo, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36615
llvm-svn: 311525
This also changes the TailDuplicator to be configured explicitely
pre/post regalloc rather than relying on the isSSA() flag. This was
necessary to have `llc -run-pass` work reliably.
llvm-svn: 311520
Adds infrastructure to clone whole instruction bundles rather than just
single instructions. This fixes a bug where tail duplication would
unbundle instructions while cloning.
This should unbreak the "Clang Stage 1: cmake, RA, with expensive checks
enabled" build on greendragon. The bot broke with r311139 hitting this
pre-existing bug.
A proper testcase will come next.
llvm-svn: 311511
This partially reverts r311429 in favor of making ISD::isConstantSplatVector do something not confusing. Turns out the only other user of it was also having to deal with the weird property of it returning a smaller size.
So rather than continue to deal with this quirk everywhere, just make the interface do something sane.
Differential Revision: https://reviews.llvm.org/D37039
llvm-svn: 311510
I was contacted by Jesper Antonsson from Ericsson who ran into problems
with r311181 in their test suites with for an out-of-tree target.
Because of the latter I don't have a reproducer, but we definitely don't
want to modify the data structure on which we are iterating inside the
loop.
llvm-svn: 311466
When expanding a BRCOND into a BR_CC, do not create an AND 1
if one already exists.
Review: D36705
Patch by Joel Galenson <jgalenson@google.com>
llvm-svn: 311447
This adds debug messages to various functions that create new SDValue nodes.
This is e.g. useful to have during legalization, as otherwise it can prints
legalization info of nodes that did not appear in the dumps before.
Differential Revision: https://reviews.llvm.org/D36984
llvm-svn: 311444
ISD::isConstantSplatVector can shrink to the smallest splat width. But we don't check the size of the resulting APInt at all. This can cause us to misinterpret the results.
This patch just adds a flag to prevent the APInt from changing width.
Fixes PR34271.
Differential Revision: https://reviews.llvm.org/D36996
llvm-svn: 311429
When removing a live-range we used to not touch them making debug
prints harder to read because the IR was not matching what the
live-ranges information was saying.
This only affects debug printing and allows to put stronger asserts in
the code (see r308906 for instance).
llvm-svn: 311401
Summary:
If all the operands of a BUILD_VECTOR extract elements from same vector then split the
vector efficiently based on the maximum vector access index.
Reviewers: zvi, delena, RKSimon, thakis
Reviewed By: RKSimon
Subscribers: chandlerc, eladcohen, llvm-commits
Differential Revision: https://reviews.llvm.org/D35788
llvm-svn: 311255
They won't affect the DWARF output, but they will mess with the
sorting of the fragments. This fixes the crash reported in PR34159.
https://bugs.llvm.org/show_bug.cgi?id=34159
llvm-svn: 311217
This patch teaches the SDag type legalizer how to split up debug info for
integer values that are split into a hi and lo part.
(re-commit)
Differential Revision: https://reviews.llvm.org/D36805
llvm-svn: 311181
We see a modest performance improvement from this slightly higher tail dup threshold.
Differential Revision: https://reviews.llvm.org/D36775
llvm-svn: 311139
Two issues identified by buildbots were addressed:
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
llvm-svn: 311135
This patch teaches the SDag type legalizer how to split up debug info for
integer values that are split into a hi and lo part.
Differential Revision: https://reviews.llvm.org/D36805
llvm-svn: 311102
The SelectionDAGBuilder translates various conditional branches into
CaseBlocks which are then translated into SDNodes. If a conditional
branch results in multiple CaseBlocks only the first CaseBlock is
translated into SDNodes immediately, the rest of the CaseBlocks are
put in a queue and processed when all LLVM IR instructions in the
basic block have been processed.
When a CaseBlock is transformed into SDNodes the SelectionDAGBuilder
is queried for the current LLVM IR instruction and the resulting
SDNodes are annotated with the debug info of the current
instruction (if it exists and has debug metadata).
When the deferred CaseBlocks are processed, the SelectionDAGBuilder
does not have a current LLVM IR instruction, and the resulting SDNodes
will not have any debuginfo. As DwarfDebug::beginInstruction() outputs
a .loc directive for the first instruction in a labeled
block (typically the case for something coming from a CaseBlock) this
tends to produce a line-0 directive.
This patch changes the handling of CaseBlocks to store the current
instruction's debug info into the CaseBlock when it is created (and the
SelectionDAGBuilder knows the current instruction) and to always use
the stored debug info when translating a CaseBlock to SDNodes.
Patch by Frej Drejhammar!
Differential Revision: https://reviews.llvm.org/D36671
llvm-svn: 311097
The idea of this patch is to continue the scheduler state over an MBB boundary
in the case where the successor block has only one predecessor. This means
that the scheduler will continue in the successor block (after emitting any
branch instructions) with e.g. maintained processor resource counters.
Benchmarks have been confirmed to benefit from this.
The algorithm in MachineScheduler.cpp that extracts scheduling regions of an
MBB has been extended so that the strategy may optionally reverse the order
of processing the regions themselves. This is controlled by a new method
doMBBSchedRegionsTopDown(), which defaults to false.
Handling the top-most region of an MBB first also means that a top-down
scheduler can continue the scheduler state across any scheduling boundary
between to regions inside MBB.
Review: Ulrich Weigand, Matthias Braun, Andy Trick.
https://reviews.llvm.org/D35053
llvm-svn: 311072
When v1i1 is legal (e.g. AVX512) the legalizer can reach
a case where a v1i1 SETCC with an illgeal vector type operand
wasn't scalarized (since v1i1 is legal) but its operands does
have to be scalarized. This used to assert because SETCC was
missing from the vector operand scalarizer.
This patch attemps to teach the legalizer to handle these cases
by scalazring the operands, converting the node into a scalar
SETCC node.
Differential revision: https://reviews.llvm.org/D36651
llvm-svn: 311071
If we want to substitute the relocation of derived pointer with gep of base then
we must ensure that relocation of base dominates the relocation of derived pointer.
Currently only check for basic block is present. However it is possible that both
relocation are in the same basic block but relocation of derived pointer is defined
earlier.
The patch moves the relocation of base pointer right before relocation of derived
pointer in this case.
Reviewers: sanjoy,artagnon,igor-laevsky,reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36462
llvm-svn: 311067
This reverts commit r311038.
Several buildbots are breaking, and at least one appears to be due to
the forwarding of physical regs enabled by this change. Reverting while
I investigate further.
llvm-svn: 311062
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
llvm-svn: 311038
r310825 caused the clang-ppc64le-linux-lnt bot to go red
(http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/5712)
because of a test-suite failure of
SingleSource/UnitTests/2003-07-09-SignedArgs
This reverts commit 0028f6a87224fb595a1c19c544cde9b003035996.
llvm-svn: 311008
Undef subreg definition means that the content of the super register
doesn't matter at this point. While that's true for virtual registers,
this may not hold when replacing them with actual physical registers.
Indeed, some part of the physical register may be coalesced with the
related virtual register and thus, the values for those parts matter and
must be live.
The fix consists in checking whether or not subregs of the physical register
being assigned to an undef subreg definition are live through that def and
insert an implicit use if they are. Doing so, will keep them alive until
that point like they should be.
E.g., let vreg14 being assigned to R0_R1 then
%vreg14:gsub_0<def,read-undef> = COPY %R0 ; <-- R1 is still live here
%vreg14:gsub_1<def> = COPY %R1
Before this changes, the rewriter would change the code into:
%R0<def> = KILL %R0, %R0_R1<imp-def> ; <-- this tells R1 is redefined
%R1<def> = KILL %R1, %R0_R1<imp-def>, %R0_R1<imp-use> ; this value of this R1
; is believed to come
; from the previous
; instruction
Because of this invalid liveness, later pass could make wrong choices and in
particular clobber live register as it happened with the register scavenger in
llvm.org/PR34107
Now we would generate:
%R0<def> = KILL %R0, %R0_R1<imp-def>, %R0_R1<imp-use> ; This tells R1 needs to
; reach this point
%R1<def> = KILL %R1, %R0_R1<imp-def>, %R0_R1<imp-use>
The bug has been here forever, it got exposed recently because the register
scavenger got smarter.
Fixes llvm.org/PR34107
llvm-svn: 310979
Since we don't factor in instruction lengths into outlining calculations
right now, it's never the case that a candidate could have length < 2.
Thus, we should quit early when we see such candidates.
llvm-svn: 310894
This was running immediately after isel, before
isel pseudos were even expanded which is really
unreasonable. Move this to before pre-reglloc
passes in case some other pre-regalloc pass wants to
use the updated regmask info.
Fixes one of the reasons IPRA doesn't do anything on
AMDGPU currently. Tests will be included with future
patch after a few more are fixed.
llvm-svn: 310875
Summary: It is creating useless work as the commuted nodes is the same as the node we are working on in that case.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33840
llvm-svn: 310832
into vextract(vNiX,Idx) when creating vextract with getNode().
This case appeared in AVX512 after fixing pr33349 in r310552.
Differential revision: https://reviews.llvm.org/D36571
llvm-svn: 310828
Summary:
Without the SrcVT its hard to know what is really being asked for. For example if your target has 128, 256, and 512 bit vectors. Maybe extracting 128 from 256 is cheap, but maybe extracting 128 from 512 is not.
For x86 we do support extracting a quarter of a 512-bit register. But for i1 vectors we don't have isel patterns for extracting arbitrary pieces. So we need this to have a correct implementation of isExtractSubvectorCheap for mask vectors.
Reviewers: RKSimon, zvi, efriedma
Reviewed By: RKSimon
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D36649
llvm-svn: 310793
If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index.
Reapplied with fix to only work with simple value types.
Committed on behalf of @jbhateja (Jatin Bhateja)
Differential Revision: https://reviews.llvm.org/D35788
llvm-svn: 310782
The previous rev (r310208) failed to account for overflow when subtracting the
constants to see if they're suitable for shift/lea. This version add a check
for that and more test were added in r310490.
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d
For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.
The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.
Some notes about the test diffs:
1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. We could avoid the
push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a
post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
that's a regression, but those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs.
5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.
Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.
Differential Revision: https://reviews.llvm.org/D35340
llvm-svn: 310717
Move store merge to happen after intrinsic lowering to allow lowered
stores to be merged.
Some regressions due in MergeConsecutiveStores to missing
insert_subvector that are addressed in follow up patch.
Reviewers: craig.topper, efriedma, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34559
llvm-svn: 310710
rL310372 enabled simplifyShuffleMask to support undef shuffle mask inputs, but its causing hangs.
Removing support until I can triage the problem
llvm-svn: 310699
Summary:
This fixes PR32721 in IfConvertTriangle and possible similar problems in
IfConvertSimple, IfConvertDiamond and IfConvertForkedDiamond.
In PR32721 we had a triangle
EBB
| \
| |
| TBB
| /
FBB
where FBB didn't have any successors at all since it ended with an
unconditional return. Then TBB and FBB were be merged into EBB, but EBB
would still keep its successors, and the use of analyzeBranch and
CorrectExtraCFGEdges wouldn't help to remove them since the return
instruction is not analyzable (at least not on ARM).
The edge updating code and branch probability updating code is now pushed
into MergeBlocks() which allows us to share the same update logic between
more callsites. This lets us remove several dependencies on analyzeBranch
and completely eliminate RemoveExtraEdges.
One thing that showed up with this patch was that IfConversion sometimes
left a successor with 0% probability even if there was no branch or
fallthrough to the successor.
One such example from the test case ifcvt_bad_zero_prob_succ.mir. The
indirect branch tBRIND can only jump to bb.1, but without the patch we
got:
bb.0:
successors: %bb.1(0x80000000)
bb.1:
successors: %bb.1(0x80000000), %bb.2(0x00000000)
tBRIND %r1, 1, %cpsr
B %bb.1
bb.2:
There is no way to jump from bb.1 to bb2, but still there is a 0% edge
from bb.1 to bb.2.
With the patch applied we instead get the expected:
bb.0:
successors: %bb.1(0x80000000)
bb.1:
successors: %bb.1(0x80000000)
tBRIND %r1, 1, %cpsr
B %bb.1
Since bb.2 had no predecessor at all, it was removed.
Several testcases had to be updated due to this since the removed
successor made the "Branch Probability Basic Block Placement" pass
sometimes place blocks in a different order.
Finally added a couple of new test cases:
* PR32721_ifcvt_triangle_unanalyzable.mir:
Regression test for the original problem dexcribed in PR 32721.
* ifcvt_triangleWoCvtToNextEdge.mir:
Regression test for problem that caused a revert of my first attempt
to solve PR 32721.
* ifcvt_simple_bad_zero_prob_succ.mir:
Test case showing the problem where a wrong successor with 0% probability
was previously left.
* ifcvt_[diamond|forked_diamond|simple]_unanalyzable.mir
Very simple test cases for the simple and (forked) diamond cases
involving unanalyzable branches that can be nice to have as a base if
wanting to write more complicated tests.
Reviewers: iteratee, MatzeB, grosser, kparzysz
Reviewed By: kparzysz
Subscribers: kbarton, davide, aemerson, nemanjai, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34099
llvm-svn: 310697
The liveness-tracking code assumes that the registers that were saved
in the function's prolog are live outside of the function. Specifically,
that registers that were saved are also live-on-exit from the function.
This isn't always the case as illustrated by the LR register on ARM.
Differential Revision: https://reviews.llvm.org/D36160
llvm-svn: 310619
Summary:
Preserve chain dependecies between old and new loads constructed to
prevent loads from reordering below later stores.
Fixes PR34088.
Reviewers: craig.topper, spatel, RKSimon, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36528
llvm-svn: 310604
In FoldConstantArithmetic, handle BUILD_VECTOR nodes that do implicit truncation on the elements.
This is similar to what is done in FoldConstantVectorArithmetic.
Differential Revision:
https://reviews.llvm.org/D36506
llvm-svn: 310593
a legal cond operand.
When scalarizing the result of a vselect, the legalizer currently expects
to already have scalarized the operands. While this is true for the true/false
operands (which have the same type as the result), it is not case for the
condition operand. On X86 AVX512, v1i1 is legal - this leads to operations such
as '< N x type> vselect < N x i1> < N x type> < N x type>' where < N x type > is
illegal to hit an assertion during the scalarization.
The handling is similar to r205625.
This also exposes the fact that (v1i1 extract_subvector) should be legal
and selectable on AVX512 - We do this by custom lowering to vector_extract_elt.
This still leaves us in some cases with redundant dag nodes which will be
combined in a separate soon to come patch.
This fixes pr33349.
Differential revision: https://reviews.llvm.org/D36511
llvm-svn: 310552
It is possible that dependent instruction may access memory.
In this case we must reject optimization because the memory change will
be visible in null handler basic block. So we will execute an instruction which
we must not execute if check fails.
Reviewers: sanjoy, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36392
llvm-svn: 310443
Previously we limited ourselves to only emitting nested classes, but we
need other kinds of types as well.
This fixes the Visual Studio STL visualizers, so that users can
visualize std::string and other objects.
llvm-svn: 310410
Relanding after case to insert explicit truncation as necessary.
Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to
EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally
improves vector shuffle computations.
Reviewers: efriedma, RKSimon, spatel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35566
llvm-svn: 310256
The NewNodesMustHaveLegalTypes flag is set to false at the beginning of CodeGenAndEmitDAG, and set to true after legalizing types.
But before calling CodeGenAndEmitDAG we build the DAG for the basic block.
So for the first basic block NewNodesMustHaveLegalTypes would be 'false' during the SDAG building, and for all other basic blocks it would be 'true'.
This patch sets the flag to false before SDAG building each basic block.
Differential Revision:
https://reviews.llvm.org/D33435
llvm-svn: 310239
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d
For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.
The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.
Some notes about the test diffs:
1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. I
think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on
a different reg? That's a post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
that's a regression, but I think those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs.
5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.
Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.
Differential Revision: https://reviews.llvm.org/D35340
llvm-svn: 310208
If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index.
Committed on behalf of @jbhateja (Jatin Bhateja)
Differential Revision: https://reviews.llvm.org/D35788
llvm-svn: 310058
With this change, the GlobalISel library gets always built. In
particular, this is not possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable any more.
llvm-svn: 309990
The debug value live intervals computed by Live Debug Variables may extend
beyond the range of the debug location's lexical scope. In this case,
splitting of an interval can result in an interval outside of the scope being
created, causing extra unnecessary DBG_VALUEs to be emitted. To prevent this,
trim the intervals to the lexical scope.
This resolves PR33730.
Reviewers: aprantl
Differential Revision: https://reviews.llvm.org/D35953
llvm-svn: 309933
rL306209 taught SelectionDAG how to add the dereferenceable flag when
expanding memcpy and memmove. The fix however contained a nit where
the offset + size was constructed as an APInt of PointerSize rather
than PointerSizeInBits.
This lead to isDereferenceableAndAlignedPointer() get truncated values or
values which would be sign extended within that function leading to
incorrect results.
Thanks to Alex Crichton for reporting the issue!
This resolves PR33978.
Reviewers: inouehrs
Differential Revision: https://reviews.llvm.org/D36236
llvm-svn: 309930
Summary:
To delete an instruction the coalescer needs to call eraseFromParent()
on the MachineInstr, insert it in the ErasedInstrs list and update the
Live Ranges structure. This patch re-factors the code to do all that in
one function. This will also fix cases where previous code wasn't
inserting deleted instructions in the ErasedList.
Reviewers: qcolombet, kparzysz
Reviewed By: qcolombet
Subscribers: MatzeB, llvm-commits, qcolombet
Differential Revision: https://reviews.llvm.org/D36204
llvm-svn: 309915
IMHO it is an antipattern to have a enum value that is Default.
At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.
This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
llvm-svn: 309911
This patch is update after the first patch (https://reviews.llvm.org/rL309651) based on the post-commit comments.
Stack coloring pass need to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp
// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.
But, TBAA has been already enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.
This patch makes the stack coloring pass maintains AliasAnalysis information when merging multiple stack slots.
This patch fixes PR33928.
llvm-svn: 309849
During store merge we construct a sorted list of consecutive store
candidates and consider subsequences for merging into a single
store. For each subsequence we check if the stored value type is legal
the merged store would have valid and fast and if the constructed
value to be stored is valid. The only properties that affect this
check between subsequences is the size of the subsequence, the
alignment of the first store, the alignment of the stored load value
(when merging stores-of-loads), and whether the merged value is a
constant zero.
If we do not find a viable mergeable subsequence starting from the
first store of length N, we know that a subsequence starting at a
later store of length N will also fail unless the new store's
alignment, the new load's alignment (if we're merging store-of-loads),
or we've dropped stores of nonzero value and could construct a merged
stores of zero (for merging constants).
As a result if we fail to find a valid subsequence starting from the
first store we can safely skip considering subsequences that start
with subsequent stores unless one of the above properties is
true. This significantly (2x) improves compile time in some
pathological cases.
Reviewers: RKSimon, efriedma, zvi, spatel, waltl
Subscribers: grandinj, llvm-commits
Differential Revision: https://reviews.llvm.org/D35901
llvm-svn: 309830
This should enable us to test the generation of target-specific constant
pools, e.g. for ARM:
constants:
- id: 0
value: 'g(GOT_PREL)-(LPC0+8-.)'
alignment: 4
isTargetSpecific: true
I intend to use this to test PIC support in GlobalISel for ARM.
This is difficult to test outside of that context, since the existing
MIR tests usually rely on parser support as well, and that seems a bit
trickier to add. We could try to add a unit test, but the setup for that
seems rather convoluted and overkill.
We do test however that the parser reports a nice error when
encountering a target-specific constant pool.
Differential Revision: https://reviews.llvm.org/D36092
llvm-svn: 309806
This pattern shows up when lowering byval copies on AMDGPU.
The byval object access is split into 4-byte chunks, adding a
constant offset to the FixedStack base. When some of the offsets
turn into ors, this prevents combining the constant offsets.
This makes it not apparent that the object is there when matching
addressing modes, so it ends up using a scratch wave offset
relative access and the lengthy frame index expansion for that.
llvm-svn: 309775
instead of using the deprecated offset field of DBG_VALUE.
This has no observable effect on the generated DWARF, but the
assembler comments will look different.
rdar://problem/33580047
llvm-svn: 309773
In the last half-dozen commits to LLVM I removed code that became dead
after removing the offset parameter from llvm.dbg.value gradually
proceeding from IR towards the backend. Before I can move on to
DwarfDebug and friends there is one last side-called offset I need to
remove: This patch modifies PrologEpilogInserter's use of the
DBG_VALUE's offset argument to use a DIExpression instead. Because the
PrologEpilogInserter runs at the Machine level I had to play a little
trick with a named llvm.dbg.mir node to get the DIExpressions to print
in MIR dumps (which print the llvm::Module followed by the
MachineFunction dump).
I also had to add rudimentary DwarfExpression support to CodeView and
as a side-effect also fixed a bug (CodeViewDebug::collectVariableInfo
was supposed to give up on variables with complex DIExpressions, but
would fail to do so for fragments, which are also modeled as
DIExpressions).
With this last holdover removed we will have only one canonical way of
representing offsets to debug locations which will simplify the code
in DwarfDebug (and future versions of CodeViewDebug once it starts
handling more complex expressions) and make it easier to reason about.
This patch is NFC-ish: All test case changes are for assembler
comments and the binary output does not change.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D36125
llvm-svn: 309751
Summary:
We already have information about static alloca stack locations in our
side table. Emitting instructions for them is inefficient, and it only
happens when the address of the alloca has been materialized within the
current block, which isn't often.
Reviewers: aprantl, probinson, dblaikie
Subscribers: jfb, dschuff, sbc100, jgravelle-google, hiraditya, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D36117
llvm-svn: 309729
Summary:
FEntryInserter pass unconditionally derefs the first Instruction
in the first Basic Block. The pass crashes when the first
BasicBlock is empty. Fix the crash by not dereferencing the basic
Block iterator. This fixes an issue observed when building Linux kernel
4.4 with clang.
Fixes PR33971.
Reviewers: hfinkel, niravd, dblaikie
Reviewed By: niravd
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D35979
llvm-svn: 309694
Stack coloring pass need to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp
// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.
But, TBAA has been already enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.
This patch makes the stack coloring pass maintains AliasAnalysis information when merging multiple stack slots.
llvm-svn: 309651
https://reviews.llvm.org/D31536 didn't really solve the problem it was
trying to solve; it got rid of the assertion failure, but we were still
scheduling the DAG incorrectly (mixing together instructions from
different calls), leading to a MachineVerifier failure.
In order to schedule the DAG correctly, we have to make sure we don't
schedule a node which should be blocked by an interference. Fix
ScheduleDAGRRList::PickNodeToScheduleBottomUp so it doesn't pick a node
like that.
The added call to FindAvailableNode() is the key change here; this makes
sure we don't try to schedule a call while we're in the middle of
scheduling a different call. I'm not sure this is the right approach; in
particular, I'm not sure how to prove we don't end up with an infinite
loop of repeatedly backtracking.
This also reverts the code change from D31536. It doesn't do anything
useful: we should never schedule an ADJCALLSTACKDOWN unless we've
already scheduled the corresponding ADJCALLSTACKUP.
Differential Revision: https://reviews.llvm.org/D33818
llvm-svn: 309642
Chromium's gold build seems to have trouble with this (gold produces
errors) - not sure if it's gold that's not coping with the valid
representation, or a bug in the implementation in LLVM, etc.
llvm-svn: 309630
When the first instruction of a basic block has no location (consider a
LEA materializing the address of an alloca for a call), we want to start
the line table for the block with the first valid source location in the
block. We need to ignore DBG_VALUE instructions during this scan to get
decent line tables.
llvm-svn: 309628
This patch refactors the code used in llc such that all the users of the
addPassesToEmitFile API have access to a homogeneous way of handling
start/stop-after/before options right out of the box.
In particular, just invoking addPassesToEmitFile will set the proper
pipeline without additional effort (modulo parsing a .mir file if the
start-before/after options are used.
NFC.
Differential Revision: https://reviews.llvm.org/D30913
llvm-svn: 309599
As noted in the code comment, transforming this in the other direction might require
a separate transform here in CGP given the block-at-a-time DAG constraint.
Besides that theoretical motivation, there are 2 practical motivations for the
subtract-of-cmps form:
1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still).
There is discussion about canonicalizing IR to the select form
( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ),
so we probably need to add DAG transforms for those patterns anyway, but this improves the
memcmp output without waiting for that step.
2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert
that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided
if we choose to enable that.
Differential Revision: https://reviews.llvm.org/D34904
llvm-svn: 309597
PR33883 shows that calls to intrinsic functions should not have their vector
arguments or returns subject to ABI changes required by the target.
This resolves PR33883.
Thanks to Alex Crichton for reporting the issue!
Reviewers: zoran.jovanovic, atanasyan
Differential Revision: https://reviews.llvm.org/D35765
llvm-svn: 309561
Summary:
Since r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.
Reviewers: MatzeB
Reviewed By: MatzeB
Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D35949
llvm-svn: 309553
Missed the resetting base address selections when going from a base
address version to zero base address for non-base-addressed entries.
llvm-svn: 309529
(from comments in the test)
Group ranges in a range list that apply to the same section and use a base
address selection entry to reduce the number of relocations to one reloc per
section per range list. DWARF5 debug_rnglist will be more efficient than this
in terms of relocations, but it's still better than one reloc per entry in a
range list.
This is an object/executable size tradeoff - shrinking objects, but growing
the linked executable. In one large binary tested, total object size (not just
debug info) shrank by 16%, entirely relocation entries. Linked executable
grew by 4%. This was with compressed debug info in the objects, uncompressed
in the linked executable. Without compression in the objects, the win would be
smaller (the growth of debug_ranges itself would be more significant).
llvm-svn: 309526
This patch is in 2 parts:
1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.
2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.
Differential Revision: https://reviews.llvm.org/D35896
llvm-svn: 309486
This commit
- Removes IsTailCall and replaces it with a target-defined unsigned
- Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
- Adds a call class + frame class classification to OutlinedFunction and Candidate respectively
This accomplishes a couple things.
Firstly, we don't need the notion of *tail call* in the general outlining algorithm.
Secondly, we now can have different "outlining classes" for each candidate within a set of candidates.
This will make it easy to add new ways to outline sequences for certain targets and dynamically choose
an appropriate cost model for a sequence depending on the context that that sequence lives in.
Ultimately, this should get us closer to being able to do something like, say avoid saving the link
register when outlining AArch64 instructions.
llvm-svn: 309475
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951
llvm-svn: 309426
The conditional tail call logic did the wrong thing when both
destinations of a conditional branch were the same:
BB#1: derived from LLVM BB %entry
Live Ins: %EFLAGS
Predecessors according to CFG: BB#0
JE_1 <BB#5>, %EFLAGS<imp-use,kill>
JMP_1 <BB#5>
BB#5: derived from LLVM BB %sw.epilog
Predecessors according to CFG: BB#1
TCRETURNdi64 <ga:@mergeable_conditional_tailcall>, 0, ...
We would fold the JE_1 to a TCRETURNdi64cc, and then remove our BB#5
successor. Then BB#5 would be deleted as it had no predecessors, leaving
a dangling "JMP_1 <BB#5>" reference behind to cause assertions later.
This patch checks that both conditional branch destinations are
different before doing the transform. The standard branch folding logic
is able to remove both the JMP_1 and the JE_1, and for my test case we
end up forming a better conditional tail call later.
Fixes PR33980
llvm-svn: 309422
This is some more cleanup in preparation for some actual
functional changes. This splits getOutliningBenefit into
two cost functions: getOutliningCallOverhead and
getOutliningFrameOverhead. These functions return the
number of instructions that would be required to call
a specific function and the number of instructions
that would be required to construct a frame for a
specific funtion. The actual outlining benefit logic
is moved into the outliner, which calls these functions.
The goal of refactoring getOutliningBenefit is to:
- Get us closer to getting rid of the IsTailCall flag
- Further split up "target-specific" things and
"general algorithm" things
llvm-svn: 309356
This can come up in ThinLTO & wastes space & makes degenerate IR.
As per the added FIXME, ultimately, local imported entities should hang
off the function and that way the imported entity list on the CU can be
tested for emptiness like all the other CU lists.
(function-attached local imported entities are probably also the best
path forward for fixing how imported entities are handled both in
cross-module use (currently, while ThinLTO preserves the imported
entities, they would not get used at the imported inlined location -
only in the abstract origin that appears in the partial CU created by
the import (which isn't emitted under Fission due to cross-CU
limitations there)) and to reduce the number of points where imported
entities are emitted (they're currently emitted into every inlined
instance, concrete instance, and abstract origin - they should only go
in teh abstract origin if there is one, otherwise in the concrete
instance - but this requires lots of delayed handling and wiring up,
same as abstract variables & subprograms))
llvm-svn: 309354
Doing some cleanup in preparation for some functional changes.
This commit moves findCandidates out of the suffix tree and into the
MachineOutliner class. This is much easier to follow, and removes
the burden of candidate choice from the suffix tree.
It also adds a couple FIXMEs and simplifies building outlined function
names.
llvm-svn: 309334
Improve DAGTypeLegalizer::convertMask's isSETCCorConvertedSETCC assertion to properly check for any mixture of SETCC or BUILD_VECTOR of constants, or a logical mask op of them.
llvm-svn: 309302
Local imported entities at the top level of a subprogram were being
handled differently from those in nested scopes - that different
handling would cause pseudo concrete out-of-line definitions to be
created (but without any of their attributes, nor an abstract_origin) in
the case where there was no real concrete definition.
These local imported entities also only appeared in the concrete
definition where those imported entities in nested scopes appear in all
cases (abstract, concrete, and inlined). This change at least makes top
level case handle the same as the others - though there's a FIXME to
improve this to /only/ emit them into the abstract origin (though this
requires more plumbing - like the abstract subprogram and variable
handling that must defer population until the end of the unit to
discover if there is an abstract origin, or only a standalone concrete
definition).
llvm-svn: 309237