Similar to D55073. Without this change, the DAG combiner crashes on code
with more than 64k stores in a single basic block that form parallelizable
chains.
No test case, as it would require a very large IR file.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D56740
llvm-svn: 351571
Defer inline asm's output fixup work until after we've generated the
inline asm node itself. Remove StoresToEmit, IndirectStoresToEmit, and
RetValRegs in favor of using ConstraintOperands.
llvm-svn: 351558
This functionality is required at multiple places which potentially
create large operand lists, like SelectionDAGBuilder or DAGCombiner.
Differential Revision: https://reviews.llvm.org/D56739
llvm-svn: 351552
Summary:
Use this helper to make sure we use the same value at various places.
This will likely be needed in more places where we currently crash
because we use more operands than possible.
Also makes it easier to change in the future.
Reviewers: RKSimon, craig.topper, efriedma, aemerson
Reviewed By: RKSimon
Subscribers: hiraditya, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D56859
llvm-svn: 351537
We should not pre-schedule a node that has an ADJCALLSTACKDOWN parent;
otherwise, when bottom-up scheduling, ADJCALLSTACKDOWN and
ADJCALLSTACKUP may hold the CallResource for too long and prevent other
calls from being scheduled. If there's no other available node
to schedule, the scheduler will try to rename the register by
creating a copy to avoid the conflict, which will fail because
CallResource is not a real physical register.
llvm-svn: 351527
The callee address is added as an optional operand (MCSymbol) in
AdjustInstrPostInstrSelection() and is then used by the asm printer to insert:
'.reloc tmplabel, R_MIPS_JALR, symbol
tmplabel:'.
Controlled with '-mips-jalr-reloc', default is true.
Differential revision: https://reviews.llvm.org/D56694
llvm-svn: 351485
Currently we do not always collapse subsequent .loc 0 0 directives. The
reason is that we check PrevInstLoc, which is not set when we emit a
line-0 record. We should only check LastAsmLine, which seems to have
been created exactly for this purpose.
// When we emit a line-0 record, we don't update PrevInstLoc; so look at
// the last line number actually emitted, to see if it was line 0.
unsigned LastAsmLine =
    Asm->OutStreamer->getContext().getCurrentDwarfLoc().getLine();
Differential revision: https://reviews.llvm.org/D56767
llvm-svn: 351395
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.
We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.
Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric
Reviewed By: rnk, efriedma
Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D53540
llvm-svn: 351370
dbg.value intrinsics can appear in blocks where their operand is not used,
meaning the operand never receives an SDNode, and thus no DBG_VALUE will
be created. Get around this by looking to see whether the operand has already
been allocated a virtual register. This allows dbg.values of PHI nodes and
Values that are used across basic blocks to be successfully translated into
DBG_VALUEs.
Differential Revision: https://reviews.llvm.org/D56678
llvm-svn: 351358
The value returned by max() is the last valid value; adjust the
comparison accordingly.
The code added in D55073 creates TokenFactors with max() operands.
Reviewers: aemerson, efriedma, RKSimon, craig.topper
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D56738
llvm-svn: 351318
ReduceLoadWidth can trigger when a shifted mask is used, and this
requires the function to return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.
Differential Revision: https://reviews.llvm.org/D50432
llvm-svn: 351310
https://reviews.llvm.org/D52803
This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.
llvm-svn: 351283
Summary:
Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc.
Refer to D53541 for the context. The Clang counterpart is D56748.
Reviewers: rnk, efriedma
Reviewed By: rnk, efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56747
llvm-svn: 351281
The motivating case for this is shown in the first regression test. We are
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.
That's a special-case for a more general pattern as shown here. In all tests,
we're avoiding the vector-scalar-vector moves in favor of vector ops.
We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.
Differential Revision: https://reviews.llvm.org/D56281
llvm-svn: 351198
A block ending in an unconditional branch can have two successors if one
is a landing pad. In practice, I think this only has an effect on
Windows because landing pads are never empty for Itanium unwinding.
(Alternatively, I could add a check to
AArch64InstrInfo::canInsertSelect, but this seems more obvious.)
Differential Revision: https://reviews.llvm.org/D56468
llvm-svn: 351142
Split MachinePipeliner code into header and cpp files to allow
inheritance from SwingSchedulerDAG.
This reapplies https://reviews.llvm.org/D56084 after moving the
implementation of the dump functions into the .cpp files. This fixes a
linker error when building with Clang modules enabled and local
submodule visibility disabled.
Original patch by Lama Saba <lama.saba@intel.com>!
llvm-svn: 351077
Part of the effort to refactor frame pointer code generation. We used
to use two function attributes, "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf", to represent three kinds of frame
pointer usage: (all) all frames use the frame pointer, (non-leaf)
non-leaf frames use the frame pointer, (none) no frames use the frame
pointer. This CL makes the idea explicit by using only one enum function
attribute, "frame-pointer".
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
Tests are mostly updated with:
# replace command line args '-disable-fp-elim=false' with '-frame-pointer=none'
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
# replace command line args '-disable-fp-elim' with '-frame-pointer=all'
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
I accidentally triggered this code while doing some experiments and it doesn't look like it could possibly work.
It calls 'getNOT' on a node that should be a CondCode.
I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC?
llvm-svn: 351022
This pattern:
t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.
In the affected tests here, it looks like it just makes RA wiggle. But we might
as well squash this to prevent it interfering with other pattern-matching.
Differential Revision: https://reviews.llvm.org/D56604
llvm-svn: 351008
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.
Fix PR40273.
Reviewers: efriedma, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56566
llvm-svn: 350951
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).
I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.
llvm-svn: 350835
If the caller's return type does not have a zeroext attribute but the
callee does a tail call zeroext, we won't consider the tail call during
CodeGenPrepare because the attributes don't match.
However, if the result of the tail call has no uses, it makes sense to
drop the sext/zext attributes.
Differential Revision: https://reviews.llvm.org/D56486
llvm-svn: 350753
Summary:
This fixes PR39710. In that case we emitted a location list looking like
this:
.Ldebug_loc0:
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_begin0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # DW_OP_reg5
.quad .Lfunc_begin0-.Lfunc_begin0
.quad .Lfunc_end0-.Lfunc_begin0
.short 1 # Loc expr size
.byte 85 # super-register DW_OP_reg5
.quad 0
.quad 0
As seen, the first entry's beginning and ending addresses evaluate to 0,
which meant that the entry inadvertently became an "end of list" entry,
resulting in the location list ending sooner than expected.
To fix this, omit all entries with empty ranges. Location list entries
with empty ranges do not have any effect, as specified by DWARF, so we
might as well drop them:
"A location list entry (but not a base address selection or end of list
entry) whose beginning and ending addresses are equal has no effect
because the size of the range covered by such an entry is zero."
Reviewers: davide, aprantl, dblaikie
Reviewed By: aprantl
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D55919
llvm-svn: 350698
This removes the check for a single use from the generic
ShrinkDemandedConstant, pushing it into the BE, because of the AArch64
regression after D56289/rL350475.
After several hours of experiments I did not come up with a testcase
failing on any other targets if the check is not performed.
Moreover, a direct call to ShrinkDemandedConstant is not really needed
and is superseded by SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D56406
llvm-svn: 350684
Currently it's possible for the following
check on V.WriteLanes (which is not really meaningful
during SubRangeJoin) to pass for one half of the pair,
and then fall through to one of the impossible
or unresolved states. This then fails as inconsistent
on the other half.
During the main range join, the check between V.WriteLanes
and OtherV.ValidLanes must have passed, meaning this
should be a CR_Replace.
Fixes most of the testcases in bugs 39542 and 39602.
llvm-svn: 350678
We can't go back and recover the lanes if it turns
out the implicit_def really can't be erased.
Assume all lanes are valid if an unresolved conflict
is encountered. There aren't any tests where this
seems to matter either way, but this seems like a
safer option.
Fixes bug 39602
llvm-svn: 350676
This patch adds a convenience report() method for physical registers and
uses it to print the offending register with the 'MBB has allocatable
live-in' error.
Reviewers: MatzeB, rtereshin, dsanders
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D55946
llvm-svn: 350630
Commit rL347861 introduced an unintentional change in the behaviour when
compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
disabling GlobalISel resulted in using FastISel but an updated condition
in the commit changed it to using SelectionDAG. The patch fixes this
condition and slightly better organizes the code that chooses the
instruction selector.
Fixes PR40131.
Differential Revision: https://reviews.llvm.org/D56266
llvm-svn: 350626
This reverts commit rL350497.
The reported remaining issues seem to be unrelated to modules or this change.
more info: https://reviews.llvm.org/D56084
llvm-svn: 350621
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.
I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).
Also legalize brcond for AMDGPU.
llvm-svn: 350595
Update the destination register's live interval when merging a src register in the ToBeUpdated set.
This is to fix PR40061 related with https://reviews.llvm.org/rL339035.
In https://reviews.llvm.org/rL339035, live interval of source pseudo register
in rematerialized copy may be saved in ToBeUpdated set and its update may be
postponed.
In PR40061, %t2 = %t1 is rematerialized and %t1 is added into toBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains live interval larger than necessary.
Because %t3 is not in toBeUpdated set, its live interval is not updated after
register coalescing and it will break some assumption in regalloc.
The patch requires the live interval of destination register in a merge to be
updated if the source register is in ToBeUpdated.
Differential revision: https://reviews.llvm.org/D55867
llvm-svn: 350586
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when it's not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.
Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.
This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.
Differential Revision: https://reviews.llvm.org/D56087
llvm-svn: 350560
The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too.
llvm-svn: 350490
Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
source node to demanded bits only even if there are other uses.
Differential Revision: https://reviews.llvm.org/D56289
llvm-svn: 350475
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits implementation.
My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.
Differential Revision: https://reviews.llvm.org/D56283
llvm-svn: 350432
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
We want to have this in the DAG too because as we can see in some of the test diffs (reductions),
the pattern may not be visible in IR.
Given that this is already an IR canonicalization, any backend that would prefer a vector op over
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
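In scalar terms the fold is just the observation that a lane-wise binop followed by an extract equals the binop of the extracted lanes. A standalone C++ illustration of that identity (not the DAG code itself):
  #include <cassert>

  int main() {
    int X[4] = {1, 2, 3, 4}, Y[4] = {10, 20, 30, 40};
    int Sum[4];
    for (int i = 0; i < 4; ++i)
      Sum[i] = X[i] + Y[i];                      // binop X, Y (lane-wise)
    int Index = 2;
    assert(Sum[Index] == X[Index] + Y[Index]);   // extelt (binop X, Y), Index
    return 0;
  }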
Differential Revision: https://reviews.llvm.org/D55722
llvm-svn: 350354
A DBG_VALUE between a two-address instruction and a following COPY
would prevent rescheduleMIBelowKill optimization inside
TwoAddressInstructionPass.
Differential Revision: https://reviews.llvm.org/D55987
llvm-svn: 350289
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.
Improves the test case from PR38217. There may be additional opportunities after this.
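For intuition (a hand-written sketch, not the actual DAG expansion): when both results are expanded together, the remainder can be derived from the quotient, so the expensive divide sequence appears only once.
  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t x = 12345, d = 10;
    uint32_t q = x / d;      // imagine this expanded into a multiply-by-reciprocal sequence
    uint32_t r = x - q * d;  // the remainder reuses q instead of expanding a second divide
    assert(q == 1234 && r == 5);
    return 0;
  }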
Differential Revision: https://reviews.llvm.org/D56145
llvm-svn: 350239
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.
By creating the extract with the correct scalar type when we legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate, which can be optimized by getNode to maybe remove the truncate, followed by a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.
This fixes the regression on X86 in D56156.
Differential Revision: https://reviews.llvm.org/D56176
llvm-svn: 350236
If x has multiple sign bits then it doesn't matter which one we extend from, so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
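As a standalone illustration of the identity (plain C++, not the combine itself): when a value already has many sign bits, sign-extending from any bit position that still lies within the run of sign-bit copies reproduces the value unchanged.
  #include <cassert>
  #include <cstdint>

  int main() {
    int32_t x = -5;                            // the top bits are all copies of the sign bit
    int32_t from7  = static_cast<int8_t>(x);   // sign-extend from bit 7
    int32_t from15 = static_cast<int16_t>(x);  // sign-extend from bit 15
    assert(from7 == x && from15 == x);         // extending from either sign-bit copy is a no-op
    return 0;
  }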
Differential Revision: https://reviews.llvm.org/D56156
llvm-svn: 350235
During the lowering of a switch that would result in the generation of a jump
table, a range check is performed before indexing into the jump table, in case
the switch value is outside the jump table range, and a conditional branch is
inserted to jump to the default block. In case the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.
Review Reference: D52002
llvm-svn: 350186
Fixes crash reported after r347354 for frontends that don't always emit
'this' pointers for methods. Now we will silently produce debug info
that makes functions like this look like static methods, which seems
reasonable.
llvm-svn: 350073
The patch adds the possibility of making library calls on NVPTX.
An important thing about library functions - they must be defined within
the current module. This basically guarantees that we produce
valid PTX assembly (without calls to undefined functions). Anyone who
wants to use the libcalls will probably have to link against
compiler-rt or any other implementation.
Currently, it's completely impossible to make library calls because of
the error "LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'". But we
can lower ExternalSymbol to TargetExternalSymbol and verify whether the
function definition is available.
Also, there was an issue with the DAG during legalisation. When we expand
an instruction into a libcall, the inner call chain isn't being "integrated"
into the outer chain. Since the last "data-flow" (call retval load) node is
located in the call chain earlier than the CALLSEQ_END node, the latter
becomes a leaf and therefore a dead node (and is removed quite fast).
The solution proposed here relies on additional data-flow pseudo nodes
(ProxyReg) whose only purpose is to keep CALLSEQ_END alive through the
legalisation and instruction selection phases - we remove the pseudo
instructions before the register scheduling phase.
Patch by Denys Zariaiev!
Differential Revision: https://reviews.llvm.org/D34708
llvm-svn: 350069
Add widen scalar for type index 1 (i1 condition) for G_SELECT.
Select G_SELECT for pointer, s32(integer) and smaller low level
types on MIPS32.
Differential Revision: https://reviews.llvm.org/D56001
llvm-svn: 350063
This is an alternative to what I attempted in D56057.
GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.
I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.
Based on a patch that Simon Pilgrim sent me in email.
Fixes PR40142.
llvm-svn: 350059
More migration so we can disable the implicit int -> LocationSize
conversion.
All of these are either scatter/gather'ed vector instructions, or direct
loads. Hence, they're all precise.
Perhaps if we see way more getTypeStoreSize calls, we can make a
getTypeStoreLocationSize (or similar) as a wrapper that applies this
::precise. Doesn't appear that it's a good idea to make getTypeStoreSize
return a LocationSize itself, however.
llvm-svn: 350042
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.
This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146
llvm-svn: 350034
trunc (add X, C) --> add (trunc X), C'
If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).
This change used to show regressions for x86, but those are gone after D55494.
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic)
that does almost the same thing.
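The underlying identity, shown as a standalone C++ check rather than DAG code: truncation distributes over addition in modular arithmetic, so the low bits of a wide add equal the narrow add of the low bits.
  #include <cassert>
  #include <cstdint>

  int main() {
    uint64_t x = 0x123456789abcdef0ULL, c = 0xdeadbeefULL;
    uint32_t wide   = static_cast<uint32_t>(x + c);                        // trunc (add X, C)
    uint32_t narrow = static_cast<uint32_t>(x) + static_cast<uint32_t>(c); // add (trunc X), C'
    assert(wide == narrow);  // the low bits agree, so the add can be done in the narrow type
    return 0;
  }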
Differential Revision: https://reviews.llvm.org/D55866
llvm-svn: 350006
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.
This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.
llvm-svn: 349927
This saves materializing the immediate. The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.
I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.
Differential Revision: https://reviews.llvm.org/D55630
llvm-svn: 349857
When deciding lazily whether a CU would be split or non-split I
accidentally dropped some handling for the line tables comp_dir (by
doing it lazily it was too late to be handled properly by the MC line
table code).
Move that bit of the code back to the non-lazy place.
llvm-svn: 349819
Emit static locals within the correct lexical scope so variables with the same
name will not confuse the debugger into getting the wrong value.
Differential Revision: https://reviews.llvm.org/D55336
llvm-svn: 349777
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.
AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.
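A quick standalone check of the equivalence being matched (illustrative C++, not the DAG code): a funnel shift whose two inputs are the same value is a rotate.
  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t x = 0x12345678u;
    unsigned c = 13;  // assume 0 < c < 32 so both shifts below are well defined
    uint32_t rot  = (x << c) | (x >> (32 - c));             // rotl(x, c)
    uint32_t fshl = static_cast<uint32_t>(
        ((static_cast<uint64_t>(x) << 32 | x) << c) >> 32); // fshl(x, x, c): shift the concatenation x:x
    assert(rot == fshl);
    return 0;
  }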
Differential Revision: https://reviews.llvm.org/D55747
llvm-svn: 349765
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.
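For intuition (a hedged, hand-written sketch rather than the generated code), an equality-only 7-byte compare can be done with two overlapping 4-byte loads of each buffer:
  #include <cstdint>
  #include <cstring>

  // Hypothetical helper: equality-only compare of 7 bytes, no call to memcmp.
  static bool eq7(const void *p, const void *q) {
    uint32_t a0, b0, a1, b1;
    std::memcpy(&a0, p, 4);                                  // bytes 0..3
    std::memcpy(&b0, q, 4);
    std::memcpy(&a1, static_cast<const char *>(p) + 3, 4);   // bytes 3..6 (overlap is harmless)
    std::memcpy(&b1, static_cast<const char *>(q) + 3, 4);
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;                     // true iff all 7 bytes match
  }

  int main() { return eq7("abcdefg", "abcdefg") ? 0 : 1; }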
Reviewers: spatel, gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55263
llvm-svn: 349731
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.
Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).
Differential Revision: https://reviews.llvm.org/D55729
llvm-svn: 349693
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.
It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.
llvm-svn: 349664
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349629
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.
This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.
Differential Revision: https://reviews.llvm.org/D55822
llvm-svn: 349628
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect, as the output should be guaranteed to at least have the new upper bits set to zero.
SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.
Thanks to @dmgreen for catching this.
Differential Revision: https://reviews.llvm.org/D55883
llvm-svn: 349625
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.
This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.
I've updated SelectionDAG::simplifyShift to demonstrate its use.
Differential Revision: https://reviews.llvm.org/D55819
llvm-svn: 349616
This patch moved the following files from lib/CodeGen/AsmPrinter/
AsmPrinterHandler.h
DbgEntityHistoryCalculator.h
DebugHandlerBase.h
to the include/llvm/CodeGen directory.
Such a change will enable a Target to extend DebugHandlerBase
and emit Target-specific debug info sections.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D55755
llvm-svn: 349564
Clang uses weak linkage for objc runtime functions when they are not available on the platform.
The intrinsic has this linkage so we just need to pass that on to the runtime call.
llvm-svn: 349559
For performance reasons, clang set nonlazybind on these functions. Now that we
are using intrinsics instead of runtime calls, we should set this attribute when
creating the runtime functions.
llvm-svn: 349558
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.
llvm-svn: 349552
Summary: This is the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE.
Reviewers: aditya_nandakumar, bogner
Reviewed By: bogner
Subscribers: rovka, kristof.beyls, javed.absar
Differential Revision: https://reviews.llvm.org/D55668
llvm-svn: 349514
Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
and use integer type of correct size when creating arguments for
CLI.lowerCall.
Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
on MIPS32.
Differential Revision: https://reviews.llvm.org/D55651
llvm-svn: 349499
Add narrowScalar for G_AND and G_XOR.
Legalize G_AND, G_OR and G_XOR for types other than s32
with clampScalar on MIPS32.
Differential Revision: https://reviews.llvm.org/D55362
llvm-svn: 349475
- Reapply changes initially introduced in r343089
- The architecture info is no longer loaded whenever a DWARFContext is created
- The runtime libraries (sanitizers) make use of the dwarf context classes but
do not initialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
string printing later on
Differential Revision: https://reviews.llvm.org/D55774
llvm-svn: 349472
For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.
llvm-svn: 349466
In PDBs, symbol records must be aligned to four bytes. However, in the
object file, symbol records may not be aligned. MSVC does not pad out
symbol records to make sure they are aligned. That means the linker has
to do extra work to insert the padding. Currently, LLD calculates the
required space with alignment, and copies each record one at a time
while padding them out to the correct size. It has a fast path that
avoids this copy when the records are already aligned.
This change fixes a bug in that codepath so that the copy is actually
saved, and tweaks LLVM's symbol record emission to align symbol records.
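The alignment requirement itself is simple; as an illustration (hypothetical helper name, not the linker's code), each record size is rounded up to a multiple of four:
  #include <cstdint>

  // Round a symbol record size up to the 4-byte boundary PDBs require.
  inline uint32_t alignedRecordSize(uint32_t Size) { return (Size + 3) & ~uint32_t(3); }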
Here's how things compare when doing a plain clang Release+PDB build:
- objs are 0.65% bigger (negligible)
- link is 3.3% faster (negligible)
- saves allocating 441MB
- new LLD high water mark is ~1.05GB
llvm-svn: 349431
Mucking about simplifying a test case ( https://reviews.llvm.org/D55261 ) I stumbled across something I've hit before - that LLVM's assembly output (GCC's does too, FWIW) includes a hardcoded length for a DWARF unit in its header. Instead we could emit a label difference - making the assembly easier to read/edit (though potentially at a slight (I haven't tried to observe it) performance cost of delaying/sinking the length computation into the MC layer).
Fix: Predicated all the changes (including creating the labels, even if they aren't used/needed) behind the NVPTX useSectionsAsReferences, avoiding emitting labels in NVPTX where ptxas can't parse them.
Reviewers: JDevlieghere, probinson, ABataev
Differential Revision: https://reviews.llvm.org/D55281
llvm-svn: 349430
The assertion type is always supposed to be a scalar type. So if the result VT of the assertion is a vector, we need to get the scalar VT before we can compare them.
Similarly for the assert above it.
I don't have a test case because I don't know of any place we violate this today. A coworker found this while trying to use r347287 on the 6.0 branch without also having r336868.
llvm-svn: 349390
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.
I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done, but I'd like to add these methodically with proper test coverage; at the same time, the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.
Differential Revision: https://reviews.llvm.org/D55768
llvm-svn: 349374
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointing at dangling
memory.
llvm-svn: 349365
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)
There are a bunch of other checks that should prevent doing this when
it might be harmful.
We already do this transform for scalars in this spot. The vector
limitation was shared with a check for the case when the operands are
extended. I'm not sure if that limit is needed either, but that would
be a separate patch.
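The identity, checked standalone in C++ (the DAG transform applies it to vectors as well as scalars): bitwise logic commutes with truncation.
  #include <cassert>
  #include <cstdint>

  int main() {
    uint64_t x = 0xffeeddccbbaa9988ULL, y = 0x0123456789abcdefULL;
    uint32_t narrow = static_cast<uint32_t>(x) & static_cast<uint32_t>(y); // logic_op (truncate x), (truncate y)
    uint32_t wide   = static_cast<uint32_t>(x & y);                        // truncate (logic_op x, y)
    assert(narrow == wide);
    return 0;
  }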
Differential Revision: https://reviews.llvm.org/D55448
llvm-svn: 349303
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the target's shift amount type).
llvm-svn: 349298
Summary:
Make the machine PHIs optimization work for a single value register taken from
several different copies. This is the first step to fix PR38917. This change
allows us to get rid of redundant PHIs (see the opt_phis2.mir test) to make
the subsequent optimizations (like CSE) possible and simpler.
For instance, before this patch the code like this:
%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b
could be optimized to:
%a = %b
but the code like this:
%c = COPY %z
...
%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b; %bb3, %c
would remain unchanged.
With this patch the latter case will be optimized:
%a = %z.
Committed on behalf of: Anton Afanasyev anton.a.afanasyev@gmail.com
Reviewers: RKSimon, MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54839
llvm-svn: 349271
In ThinLTO many split CUs may be effectively empty because of the lack
of support for cross-unit references in split DWARF.
Using a split unit in those cases is just a waste/overhead - and turned
out to be one contributor to a significant symbolizer performance issue
when global variable debug info was being imported (see r348416 for the
primary fix) due to symbolizers seeing CUs with no ranges, assuming
there might still be addresses covered and walking into the split CU to
see if there are any ranges (when that split CU was in a DWP file, that
meant loading the DWP and its index, the index was extra large because
of all these fractured/empty CUs... and so was very expensive to load).
(The third fix, which will follow, is to assume that a CU with no ranges is
empty rather than merely missing its CU-level range data, and to not
walk into its DIEs (split or otherwise) in search of address information
that is generally not present.)
llvm-svn: 349207
Previously beginning a symbol record was excessively verbose. Now it's a
bit simpler. This follows the same pattern as begin/endCVSubsection.
llvm-svn: 349205
Summary:
This allows us to register it with the MachineFunction delegate and be
notified automatically about erasure and creation of instructions. However,
we still need explicit notification for modifications such as those caused
by setReg() or replaceRegWith().
There is a catch with this though. The notification for creation is
delivered before any operands can be added. While appropriate for
scheduling combiner work, this is unfortunate for debug output since an
opcode by itself doesn't provide sufficient information on what happened.
As a result, the work list remembers the instructions (when debug output is
requested) and emits a more complete dump later.
Another nit is that the MachineFunction::Delegate provides const pointers
which is inconvenient since we want to use it to schedule future
modification. To resolve this GISelWorkList now has an optional pointer to
the MachineFunction which describes the scope of the work it is permitted
to schedule. If a given MachineInstr* is in this function then it is
permitted to schedule work to be performed on the MachineInstr's. An
alternative to this would be to remove the const from the
MachineFunction::Delegate interface, however delegates are not permitted
to modify the MachineInstr's they receive.
In addition to this, the observer has three interface changes.
* erasedInstr() is now erasingInstr() to indicate it is about to be erased
but still exists at the moment.
* changingInstr() and changedInstr() have been added to report changes
before and after they are made. This allows us to trace the changes
in the debug output.
* As a convenience changingAllUsesOfReg() and
finishedChangingAllUsesOfReg() will report changingInstr() and
changedInstr() for each use of a given register. This is primarily useful
for changes caused by MachineRegisterInfo::replaceRegWith()
With this in place, both combine rules have been updated to report their
changes to the observer.
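For reference, a rough sketch of the observer callbacks listed above; this is a hedged outline using the names from this message, and the actual class name and signatures in the tree may differ.
  class MachineInstr;

  // Hedged sketch of the observer interface described above.
  struct Observer {
    virtual ~Observer() = default;
    virtual void erasingInstr(MachineInstr &MI) = 0;   // about to be erased, still valid here
    virtual void createdInstr(MachineInstr &MI) = 0;   // delivered before operands are added
    virtual void changingInstr(MachineInstr &MI) = 0;  // a change is about to be made
    virtual void changedInstr(MachineInstr &MI) = 0;   // the change has been made
  };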
Finally, make some cosmetic changes to the debug output and make Combiner
and CombinerHelp
Reviewers: aditya_nandakumar, bogner, volkan, rtereshin, javed.absar
Reviewed By: aditya_nandakumar
Subscribers: mgorny, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D52947
llvm-svn: 349167
Implement options in clang to enable recording the driver command-line
in an ELF section.
Implement a new special named metadata, llvm.commandline, to support
frontends embedding their command-line options in IR/ASM/ELF.
This differs from the GCC implementation in some key ways:
* In GCC there is only one command-line possible per compilation-unit,
in LLVM it mirrors llvm.ident and multiple are allowed.
* In GCC individual options are separated by NULL bytes, in LLVM entire
command-lines are separated by NULL bytes. The advantage of the GCC
approach is to clearly delineate options in the face of embedded
spaces. The advantage of the LLVM approach is to support merging
multiple command-lines unambiguously, while handling embedded spaces
with escaping.
Differential Revision: https://reviews.llvm.org/D54487
Clang Differential Revision: https://reviews.llvm.org/D54489
llvm-svn: 349155
It costs nothing to spill an IMPLICIT_DEF value (the only spill code that's
generated is a KILL of the value), so when creating split constraints if the
live-out value is IMPLICIT_DEF the exit constraint should be DontCare instead
of PrefReg.
Differential Revision: https://reviews.llvm.org/D55652
llvm-svn: 349151