We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8)))
llvm-svn: 364405
When we calculate MII, we use two loops, one with iterator R++ to
check whether we can reserve the resource, then --R to move back
the iterator to do reservation.
This is risky, as R++, --R may not point to the same element at all.
The can cause wrong MII.
Differential Revision: https://reviews.llvm.org/D63536
llvm-svn: 364353
Peephole opt has a one use limitation which appears to be accidental. The function being used was incorrectly documented as returning whether the def had one *user*, but instead returned true only when there was one *use*. Add a corresponding hasOneNonDbgUser helper, and adjust peephole-opt to use the appropriate one.
All of the actual folding code handles multiple uses within a single instruction. That codepath is well exercised through instruction selection.
Differential Revision: https://reviews.llvm.org/D63656
llvm-svn: 364336
Change the generic ctpop expansion to more efficiently handle a
check for not-a-power-of-two value:
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)
This is the inverted predicate sibling pattern that was added with:
D63004
This should have been done before I changed IR canonicalization to
favor this form with:
rL364246
...so if this requires revert/changing, the earlier commit may also
need to modified.
llvm-svn: 364319
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.
Matches what we already do for ZERO_EXTEND.
Reapplies rL363850 but now with legality checks added at rL364290
llvm-svn: 364303
This should not cause any visible change in output, but it's
more efficient because we were producing non-canonical 'sub x, 1'
and 'setcc ugt x, 0'. As mentioned in the TODO, we should also
be handling the inverse predicate.
llvm-svn: 364302
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.
Matches what we already do for SIGN_EXTEND.
Reapplies rL363802 but now with legality checks added at rL364290
llvm-svn: 364299
The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this.
This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this.
Part of the yak shaving to solve the crashes from rL364264 and rL364272
llvm-svn: 364295
As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.
We'll improve X86 legality in future commits.
llvm-svn: 364290
Summary:
This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll`
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two,
and the `urem` is of something that may at most have a single bit set (or no bits set at all),
the `urem` is not needed.
Reviewers: RKSimon, craig.topper, xbolva00, spatel
Reviewed By: xbolva00, spatel
Subscribers: xbolva00, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63390
llvm-svn: 364286
This reverts the following patches.
"[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"
We can end up with an any_extend_vector_inreg with a 256 bit result type
and a 128 bit result type. This is allowed by the ISD opcode, but the
generic operation legalizer is only able to expand cases where the
total vector width is the same.
The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg.
The SimplifyDemandedBits changes are allowing those nodes to become
aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom
handling and never lets them get to the generic legalizer. We need to do the same
for aext_vec_inreg.
llvm-svn: 364264
Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as
ctlz and cttz.
Differential Revision: https://reviews.llvm.org/D63463
llvm-svn: 364221
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). Its better to let the shift by zero to occur and perform any cleanup afterward.
Fixes OSS Fuzz #15429
llvm-svn: 364179
The code divides the alignment by 2 if the original alignment is
equal to the original VT size. But this wouldn't be correct
if the alignment was larger than the VT size.
The memory operand object already takes care of calling MinAlign
on the base alignment and the memory pointer offset. So we don't
need any special code at all.
llvm-svn: 364151
GlobalISel/IRTranslator.cpp now references SelectionDAG/FunctionLoweringInfo.cpp.
This fixes a link error in -DBUILD_SHARED_LIBS=on builds:
ld.lld: error: undefined symbol: llvm::FunctionLoweringInfo::clear()
>>> referenced by IRTranslator.cpp:2198 (../lib/CodeGen/GlobalISel/IRTranslator.cpp:2198)
>>> lib/CodeGen/GlobalISel/CMakeFiles/LLVMGlobalISel.dir/IRTranslator.cpp.o:(llvm::IRTranslator::finalizeFunction())
llvm-svn: 364124
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG to in order to generate jump tables and range checks where appropriate.
Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.
For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.
There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.
Actual arm64 support for selection of jump tables is coming in a later patch.
Differential Revision: https://reviews.llvm.org/D63169
llvm-svn: 364085
G_INTTOPTR can prevent the localizer from moving G_CONSTANTs, but since it's
essentially a side effect free cast instruction we can remat both instructions.
This patch changes the localizer to enable localization of the chains by
iterating over the entry block instructions in reverse order. That way, uses will
localized first, and then the defs are free to be localized as well.
This also changes the previous SmallPtrSet of localized instructions to use a
SetVector instead. We're dealing with pointers and need deterministic iteration
order.
Overall, this change improves ARM64 -O0 CTMark code size by around 0.7% geomean.
Differential Revision: https://reviews.llvm.org/D63630
llvm-svn: 364001
We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc.
llvm-svn: 363924
Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something.....
llvm-svn: 363887
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases.
This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths.
llvm-svn: 363792
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.
Patch by Matthias Braun
llvm-svn: 363757
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
llvm-svn: 363713
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
llvm-svn: 363710
This adds vector splitting for vaarg instructions during type legalization
Committed on behalf of @luke (Luke Lau)
Differential Revision: https://reviews.llvm.org/D60762
llvm-svn: 363671
Summary:
All the GlobalISel passes are initialized when the target calls
initializeGlobalISel(), so we don't need to call the initializers
from the pass constructors.
Reviewers: qcolombet, t.p.northover, paquette, dsanders, aemerson, aditya_nandakumar
Reviewed By: aemerson
Subscribers: rovka, kristof.beyls, hiraditya, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63235
llvm-svn: 363642
This was ignoring the flag on fneg, and using the source instruction's
flags. Also fixes tests missing from r358702.
Note the expansion itself isn't correct without nnan, but that should
be fixed separately.
llvm-svn: 363637
The goal is to improve hwasan's error reporting for stack use-after-return by
recording enough information to allow the specific variable that was accessed
to be identified based on the pointer's tag. Currently we record the PC and
lower bits of SP for each stack frame we create (which will eventually be
enough to derive the base tag used by the stack frame) but that's not enough
to determine the specific tag for each variable, which is the stack frame's
base tag XOR a value (the "tag offset") that is unique for each variable in
a function.
In IR, the tag offset is most naturally represented as part of a location
expression on the llvm.dbg.declare instruction. However, the presence of the
tag offset in the variable's actual location expression is likely to confuse
debuggers which won't know about tag offsets, and moreover the tag offset
is not required for a debugger to determine the location of the variable on
the stack, so at the DWARF level it is represented as an attribute so that
it will be ignored by debuggers that don't know about it.
Differential Revision: https://reviews.llvm.org/D63119
llvm-svn: 363635
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.
The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.
One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.
Overall, these changes improve CTMark code size on arm64 by 1.2%.
Full code size results:
Program baseline new diff
------------------------------------------------------------------------------
test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6%
test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6%
test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4%
test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2%
test-suite :: CTMark/lencod/lencod.test 1340592 1324200 -1.2%
test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9%
test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0%
test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0%
Geomean difference -1.2%
Differential Revision: https://reviews.llvm.org/D63303
llvm-svn: 363632
Summary: This case is related to D63405 in that we need to be propagating FMF on negates.
Reviewers: volkan, spatel, arsenm
Reviewed By: arsenm
Subscribers: wdng, javed.absar
Differential Revision: https://reviews.llvm.org/D63458
llvm-svn: 363631
Summary:
Change the way we deal with iterator invalidation in the extload combines as it
was still possible to neglect to visit a use. Even worse, it happened in the
in-tree test cases and the checks weren't good enough to detect it.
We now take a cheap copy of the use list before iterating over it. This
prevents iterator invalidation from occurring and has the nice side effect
of making the existing schedule-for-erase/schedule-for-insert mechanism
moot.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61813
llvm-svn: 363616
A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.
I fixed the same bug in SelectionDAG in r287593.
llvm-svn: 363580
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.
llvm-svn: 363579
Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.
Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
Reviewed By: Meinersbur, fhahn
Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63338
llvm-svn: 363566
The HardwareLoops pass finds exit blocks with a scevable exit count.
If the target specifies to update the loop counter in a register,
through a phi, we need to ensure that the exit block is a latch so
that we can insert the phi with the correct value for the incoming
edge.
Differential Revision: https://reviews.llvm.org/D63336
llvm-svn: 363556
Some GEPs were not being split, presumably because that split would just be
undone by the DAGCombiner. Not performing those splits can prevent important
optimizations, such as preventing the element indices / member offsets from
being (partially) folded into load/store instruction immediates. This patch:
- Makes the splits also occur in the cases where the base address and the GEP
are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.
Differential Revision: https://reviews.llvm.org/D60294
llvm-svn: 363544
This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source.
llvm-svn: 363542
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
llvm-svn: 363533
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.
This extends CGP functionality introduced with:
rL201655
Differential Revision: https://reviews.llvm.org/D63233
llvm-svn: 363511
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.
I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.
llvm-svn: 363482
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:
* a latch block
* it has two successors, one is loop header, another is exit
* it has more than one predecessors
If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.
Differential Revision: https://reviews.llvm.org/D43256
llvm-svn: 363471
This is a branch opcode that takes a jump table pointer, jump table index and an
index into the table to do an indirect branch.
We pass both the table pointer and JTI to allow targets like ARM64 to more
easily use the existing jump table compression optimization without having to
walk up the block to find a paired G_JUMP_TABLE.
Differential Revision: https://reviews.llvm.org/D63159
llvm-svn: 363434
Avoid producing illegal register bank copies for reg_sequence and
phi. The default implementation assumes it is possible to pick any
operand's bank and use that for the result, introducing a copy for
operands with a different bank. This does not check for illegal
copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR
operand requires the result to be a VGPR.
The changes in getInstrMappingImpl aren't strictly necessary, since
AMDGPU now just bypasses this for reg_sequence/phi. This could be
replaced with an assert in case other targets run into this. It is
currently responsible for producing the error for unsatisfiable
copies, but this will be better served with a verifier check.
For phis, for now assume any undetermined operands must be
VGPRs. Eventually, this needs to be able to defer mapping these
operations. This also does not yet have a way to check for whether the
block is in a divergent region.
llvm-svn: 363410
Avoid a check for valid and a set of redundant asserts. The place
InstructionMapping is constructed asserts all of the default fields
are passed anyway for an invalid mapping, so don't overcomplicate
this.
llvm-svn: 363391
This is consistent with GCC's behavior (which is the defacto standard
for pubnames). Though I find the presence of enumerators from enum
classes to be a bit confusing, possibly a bug on GCC's end (since they
can't be named unqualified, unlike the other names - and names nested in
classes don't go in pubnames, for instance - presumably because one must
name the class first & that's enough to limit the scope of the search)
llvm-svn: 363349
Summary:
Before it was using the fully qualified name only for static data members.
Now it does for all variable names to match MSVC.
Reviewers: rnk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63012
llvm-svn: 363335
Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which
lets us use the vreg def assuming it dominates all other users. However, it can
cause jumpy debug behaviour since the DebugLoc attached to these MIs are from
a user instruction that could be in a different block.
Fixes PR40887.
Differential Revision: https://reviews.llvm.org/D63286
llvm-svn: 363331
This was exposed by PowerPC target enablement.
In ScheduleDAG, if we haven't seen any uses in this scheduling region,
we will create a dependence edge to ExitSU to model the live-out latency.
This is required for vreg defs with no in-region use, and prefetches with
no vreg def.
When we build NodeOrder in Scheduler, we ignore these boundary nodes.
However, when we check Succs in checkValidNodeOrder, we did not skip
them, so we still assume all the nodes have been sorted and in order in
Indices array. So when we call lower_bound() for ExitSU, it will return
Indices.end(), causing memory issues in following Node access.
Differential Revision: https://reviews.llvm.org/D63282
llvm-svn: 363329
Summary:
I found the following case having tail blocks with no successors merging opportunities after block placement.
Before block placement:
bb0:
...
bne a0, 0, bb2:
bb1:
mv a0, 1
ret
bb2:
...
bb3:
mv a0, 1
ret
bb4:
mv a0, -1
ret
The conditional branch bne in bb0 is opposite to beq.
After block placement:
bb0:
...
beq a0, 0, bb1
bb2:
...
bb4:
mv a0, -1
ret
bb1:
mv a0, 1
ret
bb3:
mv a0, 1
ret
After block placement, that appears new tail merging opportunity, bb1 and bb3 can be merged as one block. So the conditional constraint for merging tail blocks with no successors should be removed. In my experiment for RISC-V, it decreases code size.
Author of original patch: Jim Lin
Reviewers: haicheng, aheejin, craig.topper, rnk, RKSimon, Jim, dmgreen
Reviewed By: Jim, dmgreen
Subscribers: xbolva00, dschuff, javed.absar, sbc100, jgravelle-google, aheejin, kito-cheng, dmgreen, PkmX, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54411
llvm-svn: 363284
Summary:
Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472
The shrink wrapping pass prematurally restores the stack, at a point where the stack might still be accessed.
Taking an exception can cause the stack to be corrupted.
As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access
the stack.
Reviewers: dmgreen, qcolombet
Reviewed By: qcolombet
Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63152
llvm-svn: 363265
This patch makes the LiveDebugValues pass consider fragments when propagating
DBG_VALUE insts between blocks, fixing PR41979. Fragment info for a variable
location is added to the open-ranges key, which allows distinct fragments to be
tracked separately. To handle overlapping fragments things become slightly
funkier. To avoid excessive searching for overlaps in the data-flow part of
LiveDebugValues, this patch:
* Pre-computes pairings of fragments that overlap, for each DILocalVariable
* During data-flow, whenever something happens that causes an open range to
be terminated (via erase), any fragments pre-determined to overlap are
also terminated.
The effect of which is that when encountering a DBG_VALUE fragment that
overlaps others, the overlapped fragments do not get propagated to other
blocks. We still rely on later location-list building to correctly handle
overlapping fragments within blocks.
It's unclear whether a mixture of DBG_VALUEs with and without fragmented
expressions are legitimate. To avoid suprises, this patch interprets a
DBG_VALUE with no fragment as overlapping any DBG_VALUE _with_ a fragment.
Differential Revision: https://reviews.llvm.org/D62904
llvm-svn: 363256
Since the DebugLocEntry::Value is used as part of DwarfDebug and
DebugLocEntry make it as the separate class.
Reviewers: aprantl, dstenb
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D63213
llvm-svn: 363246
We aim to ignore changes in variable locations during the prologue and
epilogue of functions, to avoid using space documenting location changes
that aren't visible. However in D61940 / r362951 this got ripped out as
the previous implementation was unsound.
Instead, use the FrameDestroy flag to identify when we're in the epilogue
of a function, and ignore variable location changes accordingly. This fits
in with existing code that examines the FrameSetup flag.
Some variable locations get shuffled in modified tests as they now cover
greater ranges, which is what would be expected. Some additional
single-location variables are generated too. Two tests are un-xfailed,
they were only xfailed due to r362951 deleting functionality they depended
on.
Apparently some out-of-tree backends don't accurately maintain FrameDestroy
flags -- if you're an out-of-tree maintainer and see changes in variable
locations disappear due to a faulty FrameDestroy flag, it's safe to back
this change out. The impact is just slightly more debug info than necessary.
Differential Revision: https://reviews.llvm.org/D62314
llvm-svn: 363245
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
This was using its own, outdated list of possible captures. This was
at minimum not catching cmpxchg and addrspacecast captures.
One change is now any volatile access is treated as capturing. The
test coverage for this pass is quite inadequate, but this required
removing volatile in the lifetime capture test.
Also fixes some infrastructure issues to allow running just the IR
pass.
Fixes bug 42238.
llvm-svn: 363169
Summary:
Fix hoisting to basic block which are not legal for hoisting cause
it can be terminated by exception or it is return block.
Reviewers: john.brawn, RKSimon, MatzeB
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63148
llvm-svn: 363164
This opcode generates a pointer to the address of the jump table
specified by the source operand, which is a jump table index.
It will be used in conjunction with an upcoming G_BRJT opcode to support
jump table codegen with GlobalISel.
Differential Revision: https://reviews.llvm.org/D63111
llvm-svn: 363096
Implement necessary target hooks to enable MachinePipeliner for P9 only.
The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9.
Differential Revision: https://reviews.llvm.org/D62164
llvm-svn: 363085
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.
llvm-svn: 363048
This patch changes how LLVM handles the accumulator/start value
in the reduction, by never ignoring it regardless of the presence of
fast-math flags on callsites. This change introduces the following
new intrinsics to replace the existing ones:
llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul
and adds functionality to auto-upgrade existing LLVM IR and bitcode.
Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D60261
llvm-svn: 363035
An earlier fix of a subtle iterator invalidation bug had uncovered a
nondeterminism that was present in the MultiUsers bag. Problem was that
MultiUsers was being looked up using pointers.
This patch is an NFC change that numbers each multiuser and processes each in
numbered order. This fixes the test failure on netbsd and will likely fix the
green-dragon bot too.
llvm-svn: 363012
If the source is undef, then just don't do anything.
This matches SelectionDAG's behaviour in SelectionDAG.cpp.
Also add a test showing that we do the right thing here.
(irtranslator-memfunc-undef.ll)
Differential Revision: https://reviews.llvm.org/D63095
llvm-svn: 362989
This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.
This re-enables it for FastISel with the corresponding fix.
This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.
FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.
This fix moves the replacement of registers before the emission of
copies for the live-ins.
The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).
A few tests are affected by this:
* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed
Differential Revision: https://reviews.llvm.org/D62361
llvm-svn: 362963
This commit reapplies r359426 (which was reverted in r360301 due to
performance problems) and rolls in D61940 to address the performance problem.
I've combined the two to avoid creating a span of slow-performance, and to
ease reverting if more problems crop up.
The summary of D61940: This patch removes the "ChangingRegs" facility in
DbgEntityHistoryCalculator, as its overapproximate nature can produce incorrect
variable locations. An unchanging register doesn't mean a variable doesn't
change its location.
The patch kills off everything that calculates the ChangingRegs vector.
Previously ChangingRegs spotted epilogues and marked registers as unchanging if
they weren't modified outside the epilogue, increasing the chance that we can
emit a single-location variable record. Without this feature,
debug-loc-offset.mir and pr19307.mir become temporarily XFAIL. They'll be
re-enabled by D62314, using the FrameDestroy flag to identify epilogues, I've
split this into two steps as FrameDestroy isn't necessarily supported by all
backends.
The logic for terminating variable locations at the end of a basic block now
becomes much more enjoyably simple: we just terminate them all.
Other test changes: inlined-argument.ll becomes XFAIL, but for a longer term.
The current algorithm for detecting that a variable has a single-location
doesn't work in this scenario (inlined function in multiple blocks), only other
bugs were making this test work. fission-ranges.ll gets slightly refreshed too,
as the location of "p" is now correctly determined to be a single location.
Differential Revision: https://reviews.llvm.org/D61940
llvm-svn: 362951
Variable's stack location can stretch longer than it should. If a
variable is placed at the stack in a some nested basic block its range
can be calculated to be up to the next occurrence of the variable's
DBG_VALUE, or up to the end of the function, thus covering a basic
blocks that should not be included in the variable’s location range.
This happens because the DbgEntityHistoryCalculator ends register
locations at the end of a basic block only if the variable’s location
register has been changed throughout the function, which is not the
case for the register used to reference stack objects.
This patch also tries to produce a single value location if the location
list builder managed to merge all the locations into one.
Reviewers: aprantl, dstenb, jmorse
Reviewed By: aprantl, dstenb, jmorse
Subscribers: djtodoro, ivanbaev, asowda
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D61600
llvm-svn: 362923
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D62897
llvm-svn: 362921
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.
The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.
First step: https://reviews.llvm.org/D54839
Fixes llvm.org/PR38917
This is fixed recommit of r361356 after PowerPC64 multistage build failure.
llvm-svn: 362901
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868
In order for GlobalISel to re-use the significant amount of analysis and
optimization code in SDAG's switch lowering, we first have to extract it and
create an interface to be used by both frameworks.
No test changes as it's NFC.
Differential Revision: https://reviews.llvm.org/D62745
llvm-svn: 362857
When we call checkResourceLimit in bumpCycle or bumpNode, and we
know the resource count has just reached the limit (the equations
are equal). We should return true to mark that we are resource
limited for next schedule, or else we might continue to schedule
in favor of latency for 1 more schedule and create a schedule that
actually overbook the resource.
When we call checkResourceLimit to estimate the resource limite before
scheduling, we don't need to return true even if the equations are
equal, as it shouldn't limit the schedule for it .
Differential Revision: https://reviews.llvm.org/D62345
llvm-svn: 362805
This could fail, which looked concerning. However nothing was actually
using the results of this. I assume this was intended to use the
anti-feature of analyzeBranch of removing instructions, but wasn't
actually calling it with AllowModify = true.
Fixes bug 42162.
llvm-svn: 362800
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
Takes as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
Takes the maximum number of elements processed in an iteration of
the loop body and subtracts this from the total count. Returns
false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
Takes the number of elements remaining to be processed as well as
the maximum numbe of elements processed in an iteration of the loop
body. Returns the updated number of elements remaining.
llvm-svn: 362774
Incorrect Debug Variable Range was calculated while "COMPUTING LIVE DEBUG VARIABLES" stage.
Range for Debug Variable("i") computed according to current state of instructions
inside of basic block. But Register Allocator creates new instructions which were not taken
into account when Live Debug Variables computed. In the result DBG_VALUE instruction for
the "i" variable was put after these newly inserted instructions. This is incorrect.
Debug Value for the loop counter should be inserted before any loop instruction.
Differential Revision: https://reviews.llvm.org/D62650
llvm-svn: 362750
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)
The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.
The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".
Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.
(2) Speaking of implementation on AIX, for direct function call target, we
create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to
replace original TargetGlobalAddress SDNode. Then down the path, we can
take advantage of this MCSymbol.
Patch by: Xiangling_L
Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
Differential Revision: https://reviews.llvm.org/D62532
llvm-svn: 362735
This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores:
1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another.
2 - The merged load\store node must retain the non-temporal flag.
Differential Revision: https://reviews.llvm.org/D62910
llvm-svn: 362723
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
Most parts of LLVM don't care whether the byval type is derived from an
explicit Attribute or from the parameter's pointee type, so it makes
sense for the main access function to just return the right value.
The very few users who do care (only BitcodeReader so far) can find out
how it's specified by accessing the Attribute directly.
llvm-svn: 362642
Instead of passing around fast-math-flags as a parameter, we can set those
using an IRBuilder guard object. This is no-functional-change-intended.
The motivation is to eventually fix the vectorizers to use and set the
correct fast-math-flags for reductions. Examples of that not behaving as
expected are:
https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
D61802 (should be able to reduce with IR-level FMF)
Differential Revision: https://reviews.llvm.org/D62272
llvm-svn: 362612
Summary:
An argument that is return by a function but bit-casted before can still
be annotated as "returned". Make sure we do not crash for this case.
Reviewers: sunfish, stephenwlin, niravd, arsenm
Subscribers: wdng, hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59917
llvm-svn: 362546
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial.
Fixes PR42118
Differential Revision: https://reviews.llvm.org/D62828
llvm-svn: 362533
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.
Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.
llvm-svn: 362507
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet.
As far as i can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.
This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/vMd3
Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma
Reviewed By: RKSimon
Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62774
llvm-svn: 362488
As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits.
The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.
It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.
Differential Revision: https://reviews.llvm.org/D62777
llvm-svn: 362486
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D61843
llvm-svn: 362472
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.
Reviewers: qcolombet, spatel
Reviewed By: qcolombet
Subscribers: nemanjai, jsji
Differential Revision: https://reviews.llvm.org/D62552
llvm-svn: 362439
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.
Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.
llvm-svn: 362436
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction
Differential Revision: https://reviews.llvm.org/D62807
llvm-svn: 362397
When LiveDebugValues deduces new variable's location from spill, restore or
register copy instruction it should close old variable's location. Otherwise
we can have multiple block output locations for same variable. That could lead
to inserting two DBG_VALUEs for same variable to the beginning of the successor
block which results to ignoring of first DBG_VALUE.
Reviewers: aprantl, jmorse, wolfgangp, dstenb
Reviewed By: aprantl
Subscribers: probinson, asowda, ivanbaev, petarj, djtodoro
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D62196
llvm-svn: 362373
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.
Reviewers: niravd, spatel, craig.topper, rupprecht
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D62633
llvm-svn: 362350
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.
Differential Revision: https://reviews.llvm.org/D59188
llvm-svn: 362324
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.
PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe.
Differential Revision: https://reviews.llvm.org/D62783
llvm-svn: 362323
The results of the dyn_casts were immediately dereferenced on the next line
so they had better not be null.
I don't think there's any way for these dyn_casts to fail, so use a cast
of adding null check.
llvm-svn: 362315
Over a year ago, MachineInstr gained a fourth boolean parameter that occurs
before the TII pointer. When this happened, several places started accidentally
passing TII into this boolean parameter instead of the TII parameter.
llvm-svn: 362312
We were hashing the string pointer, not the string, so two instructions
could be identical (isIdenticalTo), but have different hash codes.
This showed up as a very rare, non-deterministic assertion failure
rehashing a DenseMap constructed by MachineOutliner. So there's no
"real" testcase, just a unittest which checks that the hash function
behaves correctly.
I'm a little scared fixing this is going to cause a regression in
outlining or MachineCSE, but hopefully we won't run into any issues.
Differential Revision: https://reviews.llvm.org/D61975
llvm-svn: 362281
Just copy all of the operands except the chain and call MorphNode on that.
This removes the IsUnary and IsTernary flags.
Also always get the result type from the result type of the original
nodes. Previously we got it from the operand except for two nodes
where that didn't work.
llvm-svn: 362269
Summary:
Fixes a warning produced from scan-build (llvm.org/reports/scan-build/),
further warnings found by annotation isMoveInstr [[nodiscard]].
isMoveInstr potentially does not assign to its parameters, so if they
were uninitialized, they will potentially stay uninitialized. It seems
most call sites pass references to uninitialized values, then use them
without checking the return value.
Reviewers: wmi
Reviewed By: wmi
Subscribers: MatzeB, qcolombet, hiraditya, tpr, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62109
llvm-svn: 362265
We don't want to create vregs if there is nothing to use them for. That causes
verifier errors.
Differential Revision: https://reviews.llvm.org/D62740
llvm-svn: 362247
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: https://reviews.llvm.org/D62546
llvm-svn: 362241
This consolidates the vreg skip code into one function (SkipVRegs()).
SkipVRegs() now knows if it should skip as if it is the first initialization or
subsequent skips.
The first skip is also done the first time createVirtualRegister is called by
the cursor instead of by the cursor's constructor. This prevents verifier
errors on machine functions that have no vregs (where the verifier will
complain that there are vregs when the function uses none).
Differential Revision: https://reviews.llvm.org/D62717
llvm-svn: 362195
This is am almost NFC, it does the following:
- If there is no register class for a COPY's src or dst, bail.
- Fixes uses iterator invalidation bug.
Differential Revision: https://reviews.llvm.org/D62713
llvm-svn: 362191
Separate the remark serialization to YAML from the LLVM Diagnostics.
This adds a new serialization abstraction: remarks::Serializer. It's
completely independent from lib/IR and it provides an easy way to
replace YAML by providing a new remarks::Serializer.
Differential Revision: https://reviews.llvm.org/D62632
llvm-svn: 362160
I don't have a test case for these, but there is a test case for D62266
where, even after all the constant-folding patches, we still end up
with endless combine loop. Which makes sense, since we don't constant
fold for opaque constants.
llvm-svn: 362156
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.
No surprising test changes.
https://rise4fun.com/Alive/pbT
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62257
llvm-svn: 362146
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 362145
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 362144
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 362143
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 362142
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62664
llvm-svn: 362133
Summary:
No tests change, and i'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62663
llvm-svn: 362132
Summary:
This was the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, spatel, craig.topper, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62662
llvm-svn: 362131
Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62661
llvm-svn: 362130
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362128
There were crashes (addrspace-memoperands.mir was only one of them) in MIR that
had operands that came from before register classes were set. With these
operands, creating a replacement vreg (for MIR-Canon's renaming) needs to use
the vreg type rather than the RegisterClass which is not present.
Differential Revision: https://reviews.llvm.org/D62543
llvm-svn: 362122
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: http://reviews.llvm.org/D62546
llvm-svn: 362112
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.
Some cases may highlight missing constant-folds.
Reverts r361871,r361872,r361873,r361874.
llvm-svn: 362109
Summary:
Add static data members to IR debug info's list of global variables
so that they are emitted as S_CONSTANT records.
Related to https://bugs.llvm.org/show_bug.cgi?id=41615.
Reviewers: rnk
Subscribers: aprantl, cfe-commits, llvm-commits, thakis
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D62167
llvm-svn: 362038
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362012
The current design use DFA to do resource tracking in SMS,
and DFA only support InstrItins, and also has scaling limitation.
This patch extend SMS to allow Subtarget to use ProcResource in
InstrSchedModel instead.
Differential Revision: https://reviews.llvm.org/D62163
llvm-svn: 361919
avoid static check fail
RegClassOrBank is an object of RegClassOrRegBank, which is defined as
using llvm::RegClassOrRegBank = typedef PointerUnion<const
TargetRegisterClass *, const RegisterBank *>
so control flow can not get here. Use ""llvm_unreachable" here to avoid
"null pointer" confusion.
Patch by Shengchen Kan (skan)
Differential Revision: https://reviews.llvm.org/D62006
Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 361912
To determine the list of clobbered registers, the RegUsageInfoCollector pass
uses the list of callee saved registers provided by the target and then augments
it with the list of registers which have all their subregisters saved. It then
basically does the difference between all the registers and the saved registers
to come up with what is clobbered (plus it checks that the register is defined
within that functions).
The patch fixes a bug where when register does not have any subregister lane,
hence when checking if any of its subregister are not saved, we would find none
and think the register is saved as well.
That's obviously wrong.
The code was actually kind of checking for something like that with the
CoveredBySubRegs bit. What this bit says is that a register is completely
covered by its subregisters.
We required that this bit was set, to check that a register was saved by its
subregister lanes, since without this bit, we potentially would miss to check
some part of the register.
However, this bit is used de facto on registers that don't have any
subregisters (e.g., on ARM) and the code was not prepared for that.
This patch fixes this by checking that a register has subregisters before
declaring it saved when none of its lanes are modified.
llvm-svn: 361901
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D62017
llvm-svn: 361875
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 361874
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 361873
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 361872
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 361871
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 361856
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 361855
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.
No surprising test changes.
https://rise4fun.com/Alive/pbT
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62257
llvm-svn: 361854
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 361853
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 361852
Move the element index/count variables into the block where they are actually used - appeases cppcheck and helps avoid shadow variable warnings.
llvm-svn: 361821
Summary:
When DwarfDebug::buildLocationList() encountered an undef debug value,
it would truncate all open values, regardless if they were overlapping or
not. This patch fixes so that it only does that for overlapping fragments.
This change unearthed a bug that I had introduced in D57511,
which I have fixed in this patch. The code in DebugHandlerBase that
changes labels for parameter debug values could break DwarfDebug's
assumption that the labels for the entries in the debug value history
are monotonically increasing. Before this patch, that bug could result
in location list entries whose ending address was lower than the
beginning address, and with the changes for undef debug values that this
patch introduces it could trigger an assertion, due to attempting to
emit location list entries with empty ranges. A reproducer for the bug
is added in param-reg-const-mix.mir.
Reviewers: aprantl, jmorse, probinson
Reviewed By: aprantl
Subscribers: javed.absar, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D62379
llvm-svn: 361820
Setting mayLiveOut based only on use instructions after allocating the
def block did not work if the use block was allocated before the def
block, since the virtual register uses were already removed.
Fixes bug 41973.
llvm-svn: 361781
This is derived from the related fold for build vectors.
We also have a version of this in DAGCombiner. The benefit of
having this fold at node creation time is (1) efficiency and
(2) preventing infinite looping from creating patterns that
should not exist in the first place.
Currently, the inf-loop could happen with MergeConsecutiveStores()
because it naively creates concat of extracts when forming a wider
vector store. That could fight with target-specific store narrowing.
llvm-svn: 361780
There's a possible missing fold here for extracting from the
same source vector. It's similar to a check that we use to
squash a build vector with all extracted elements from the
same source vector.
llvm-svn: 361778
Summary:
- The current implementation simplifies the case where the source of
`copyto` is `implicit-def`ed. However, it only works when that
`implicit-def` is single-used since it detects that from
`implicit-def` and cannot determine which destination vreg should be
used if there are multiple uses.
- This patch changes that detection when `copyto` is being emitted. If
that `copyto`'s source is defined from `implicit-def`, it simplifies
it. Hence, it works even that `implicit-def` is multi-used.
- Except it simplifies the internal IR, it won't improve the quality of
code generation. However, it helps to detect 'implicit-def` in a
straight-forward manner in some passes, such as `si-i1-copies`. A test
case is added.
Reviewers: sunfish, nhaehnle
Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62342
llvm-svn: 361777
The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to.
llvm-svn: 361773
This is problematic on buildbots, as discussed here: https://reviews.llvm.org/rL361356
It seems like the plan already was to revert, but that hasn't happened yet.
llvm-svn: 361746
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
This commit was reverted because of the build failure.
The reason was mlformed patch.
Build failure fixed.
llvm-svn: 361741
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.
llvm-svn: 361696
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
llvm-svn: 361644
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Differential Revision: https://reviews.llvm.org/D61887
llvm-svn: 361620
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.
llvm-svn: 361613
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.
llvm-svn: 361608
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.
llvm-svn: 361547
When printing assembly for PtrToInt, AsmPrinter::lowerConstant
incorrectly assumed that if PtrToInt was not converting to an
int with exactly the same number of bits, it must be widening
to a larger int. But this isn't necessarily true; PtrToInt can
also shrink the size, which is useful when you want to produce
a known 32-bit pointer on a 64-bit platform (on x86_64 ELF
this yields a R_X86_64_32 relocation).
The old behavior of falling through to the widening case for a
narrowing PtrToInt yields bogus assembly code like this, which
fails to assemble because the no-op bit and it accidentally
creates is not a valid relocation:
```
.long a&-1
```
The fix is to treat a narrowing PtrToInt exactly the same as
it already treats Trunc: just emit the expression and let
the assembler deal with truncating it in the appropriate way.
Patch by Mat Hostetter <mjh@fb.com>.
Differential Revision: https://reviews.llvm.org/D61325
llvm-svn: 361508
Refactor location description kind in order to be easier for extensions
(needed for D60866).
In addition, cut off some bits from the other class fields.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D62002
llvm-svn: 361480
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.
For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.
llvm-svn: 361405
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.
The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.
First step: https://reviews.llvm.org/D54839
Fixes llvm.org/PR38917
llvm-svn: 361356
Register coalescer fails for the test in the patch with the assertion in
JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join
live intervals for two adjacent instructions and erase the copy:
%2:vreg_256 = COPY %1
%3:vreg_256 = COPY killed %1
The LI needs to be adjusted to kill subrange for the erased instruction
and extend the subrange of the original def. That was done for the main
interval only but not for the subrange. As a result subrange had a VNI
pointing to the erased slot resulting in the above failure.
Differential Revision: https://reviews.llvm.org/D62162
llvm-svn: 361293
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D55720
llvm-svn: 361289
DAGCombiner simplifies this more liberally as:
// If inserting an UNDEF, just return the original vector.
if (N1.isUndef())
return N0;
So there's no way to make this visible in output AFAIK, but
doing this at node creation time should be slightly more efficient.
llvm-svn: 361287
getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS():
// Concat of UNDEFs is UNDEF.
if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); }))
return DAG.getUNDEF(VT);
llvm-svn: 361284
There are no FP callers of DAGCombiner::reassociateOps() currently,
but we can add a fast-math check to make sure this API is not being
misused.
This was noted as a potential risk (and that risk might increase) with:
D62191
llvm-svn: 361268
Summary:
The endianess used in the calling convention does not always match the
endianess of the target on all architectures, namely AVR.
When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianess that function arguments must be laid
out in.
This approach was recommended by Eli Friedman.
Originally reported in https://github.com/avr-rust/rust/issues/129.
Patch by Carl Peto.
Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma
Reviewed By: efriedma
Subscribers: JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62003
llvm-svn: 361222
Since INLINEASM_BR is a terminator we need to flush the pending exports before
emitting it. If we don't do this, a TokenFactor can be inserted between it and
the BR instruction emitted to finish the callbr lowering.
It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit
the TokenFactor before that.
Differential Revision: https://reviews.llvm.org/D59981
llvm-svn: 361177
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.
Differential Revision: https://reviews.llvm.org/D62026
llvm-svn: 361169
This changes the isShift variable to include the constant operand
check that was previously in the if statement.
While there fix an 80 column violation and an unnecessary use of
getNode. Also fix variable name capitalization.
llvm-svn: 361168
Fixes issue reported by aemerson on D57348. Vector op legalization
support is added for uaddo, usubo, saddo and ssubo (umulo and smulo
were already supported). As usual, by extracting TargetLowering methods
and calling them from vector op legalization.
Vector op legalization doesn't really deal with multiple result nodes,
so I'm explicitly performing a recursive legalization call on the
result value that is not being legalized.
There are some existing test changes because expansion happens
earlier, so we don't get a DAG combiner run in between anymore.
Differential Revision: https://reviews.llvm.org/D61692
llvm-svn: 361166
The code did not match the example in the comment, and was checking
the undef flag on the copy dest instead of source. The existing tests
were only hitting the > 2 operands case.
llvm-svn: 361156
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D61943
llvm-svn: 361137
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.
This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.
llvm-svn: 361081
Summary:
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
https://rise4fun.com/Alive/ml6
So, why is it there then?
This changes the testcase that was touched by @spatel in rL347478,
but i'm not sure that test tests anything particular?
Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin
Reviewed By: spatel
Subscribers: javed.absar, llvm-commits, spatel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61918
llvm-svn: 361044
This patch implements a limited form of autolinking primarily designed to allow
either the --dependent-library compiler option, or "comment lib" pragmas (
https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in
C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically
add the specified library to the link when processing the input file generated
by the compiler.
Currently this extension is unique to LLVM and LLD. However, care has been taken
to design this feature so that it could be supported by other ELF linkers.
The design goals were to provide:
- A simple linking model for developers to reason about.
- The ability to to override autolinking from the linker command line.
- Source code compatibility, where possible, with "comment lib" pragmas in other
environments (MSVC in particular).
Dependent library support is implemented differently for ELF platforms than on
the other platforms. Primarily this difference is that on ELF we pass the
dependent library specifiers directly to the linker without manipulating them.
This is in contrast to other platforms where they are mapped to a specific
linker option by the compiler. This difference is a result of the greater
variety of ELF linkers and the fact that ELF linkers tend to handle libraries in
a more complicated fashion than on other platforms. This forces us to defer
handling the specifiers to the linker.
In order to achieve a level of source code compatibility with other platforms
we have restricted this feature to work with libraries that meet the following
"reasonable" requirements:
1. There are no competing defined symbols in a given set of libraries, or
if they exist, the program owner doesn't care which is linked to their
program.
2. There may be circular dependencies between libraries.
The binary representation is a mergeable string section (SHF_MERGE,
SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES
(0x6fff4c04). The compiler forms this section by concatenating the arguments of
the "comment lib" pragmas and --dependent-library options in the order they are
encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs
sections with the normal mergeable string section rules. As an example, #pragma
comment(lib, "foo") would result in:
.section ".deplibs","MS",@llvm_dependent_libraries,1
.asciz "foo"
For LTO, equivalent information to the contents of a the .deplibs section can be
retrieved by the LLD for bitcode input files.
LLD processes the dependent library specifiers in the following way:
1. Dependent libraries which are found from the specifiers in .deplibs sections
of relocatable object files are added when the linker decides to include that
file (which could itself be in a library) in the link. Dependent libraries
behave as if they were appended to the command line after all other options. As
a consequence the set of dependent libraries are searched last to resolve
symbols.
2. It is an error if a file cannot be found for a given specifier.
3. Any command line options in effect at the end of the command line parsing apply
to the dependent libraries, e.g. --whole-archive.
4. The linker tries to add a library or relocatable object file from each of the
strings in a .deplibs section by; first, handling the string as if it was
specified on the command line; second, by looking for the string in each of the
library search paths in turn; third, by looking for a lib<string>.a or
lib<string>.so (depending on the current mode of the linker) in each of the
library search paths.
5. A new command line option --no-dependent-libraries tells LLD to ignore the
dependent libraries.
Rationale for the above points:
1. Adding the dependent libraries last makes the process simple to understand
from a developers perspective. All linkers are able to implement this scheme.
2. Error-ing for libraries that are not found seems like better behavior than
failing the link during symbol resolution.
3. It seems useful for the user to be able to apply command line options which
will affect all of the dependent libraries. There is a potential problem of
surprise for developers, who might not realize that these options would apply
to these "invisible" input files; however, despite the potential for surprise,
this is easy for developers to reason about and gives developers the control
that they may require.
4. This algorithm takes into account all of the different ways that ELF linkers
find input files. The different search methods are tried by the linker in most
obvious to least obvious order.
5. I considered adding finer grained control over which dependent libraries were
ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this
is not necessary: if finer control is required developers can fall back to using
the command line directly.
RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html.
Differential Revision: https://reviews.llvm.org/D60274
llvm-svn: 360984
Summary:
This emits S_CONSTANT records for global variables.
Currently this emits records for the global variables already being tracked in the
LLVM IR metadata, which are just constant global variables; we'll also want S_CONSTANTs
for static data members and enums.
Related to https://bugs.llvm.org/show_bug.cgi?id=41615
Reviewers: rnk
Subscribers: aprantl, hiraditya, llvm-commits, thakis
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61926
llvm-svn: 360948
The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU
3-dword memory instructions, caused a de-optimization problem for code
with such a load that then bitcasts via vector of i8, because v12i8 is
not an MVT so it legalizes the bitcast by widening it.
This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.
Differential Revision: https://reviews.llvm.org/D60457
Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64
llvm-svn: 360942
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lround/llround generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
llvm-svn: 360889
Trace through multiple COPYs when looking for a physreg source. Add
hinting for vregs that will be copied into physregs (we only hinted
for vregs getting copied to a physreg previously). Give hinted a
register a bonus when deciding which value to spill. This is part of
my rewrite regallocfast series. In fact this one doesn't even have an
effect unless you also flip the allocation to happen from back to
front of a basic block. Nonetheless it helps to split this up to ease
review of D52010
Patch by Matthias Braun
llvm-svn: 360887
Before this change, they were erroneously constructed with the EH_LABEL
SDNode opcode, which caused other passes to interact with them in
incorrect ways. See the FIXME about fastisel that this addresses in the
existing test case.
Fixes PR41890
llvm-svn: 360818
Summary:
Analogous to the other ChangeToXXX methods. See the next patch for a
use case.
Change-Id: I6548d614706834fb9109ab3c8fe915e9c6ece2a7
Reviewers: arsenm, kzhuravl
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61651
llvm-svn: 360789
Summary:
The emitError path allows the program to continue, unlike report_fatal_error.
This is friendlier to use cases where LLVM is embedded in a larger program,
because the caller may be able to deal with the error somewhat gracefully.
Change the number of requested NOP bytes in the AArch64 and PowerPC
test cases to avoid triggering an unrelated assertion. The compilation
still fails, as verified by the test.
Change-Id: Iafb9ca341002a597b82e59ddc7a1f13c78758e3d
Reviewers: arsenm, MatzeB
Subscribers: qcolombet, nemanjai, wdng, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61489
llvm-svn: 360786
The 3-field form was introduced by D3499 in 2014 and the legacy 2-field
form was planned to be removed in LLVM 4.0
For the textual format, this patch migrates the existing 2-field form to
use the 3-field form and deletes the compatibility code.
test/Verifier/global-ctors-2.ll checks we have a friendly error message.
For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the
2-field form (add i8* null as the third field).
Reviewed By: rnk, dexonsmith
Differential Revision: https://reviews.llvm.org/D61547
llvm-svn: 360742
The condition !AddrPool.empty() is tested before attachRangesOrLowHighPC(), which may add an entry to AddrPool. We emit DW_AT_low_pc (DW_FORM_addrx) but may incorrectly omit DW_AT_addr_base for LineTablesOnly. This can be easily reproduced:
clang -gdwarf-5 -gmlt -c a.cc
Fix this by moving !AddrPool.empty() below.
This was discovered while investigating an lld crash (fixed by D61889) on such object files: ld.lld --gdb-index a.o
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D61891
llvm-svn: 360678
When breaking up loads and stores of aggregates, the IRTranslator uses
LLT::scalar(64) for the index type of the G_GEP instructions that
compute the addresses. This is unnecessarily large for 32-bit targets.
Use the int ptr type provided by the DataLayout instead.
Note that we're already doing the right thing when translating
getelementptr instructions from the IR. This is just an oversight when
generating new ones while translating loads/stores.
Both x86 and AArch64 already have tests confirming that the old
behaviour is preserved for 64-bit targets.
Differential Revision: https://reviews.llvm.org/D61852
llvm-svn: 360656
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.
Link: https://github.com/ClangBuiltLinux/linux/issues/469
Reviewers: echristo, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61560
llvm-svn: 360604
We catch most of these patterns (on x86 at least) by matching
a concat vectors opcode early in combining, but the pattern may
emerge later using insert subvector instead.
The AVX1 diffs for add/sub overflow show another missed narrowing
pattern. That one may be falling though the cracks because of
combine ordering and multiple uses.
llvm-svn: 360585
The new fptrunc and fpext intrinsics are constrained versions of the
regular fptrunc and fpext instructions.
Reviewed by: Andrew Kaylor, Craig Topper, Cameron McInally, Conner Abbot
Approved by: Craig Topper
Differential Revision: https://reviews.llvm.org/D55897
llvm-svn: 360581
Summary:
When we know for sure whether two addresses do or do not alias, we
should immediately return from DAGCombiner::isAlias().
I think this comes from a bad copy/paste, Sorry for not catching that during the
code review.
Fixes PR41855.
Reviewers: niravd, gchatelet, EricWF
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61846
llvm-svn: 360566
I've included a new fix in X86RegisterInfo to prevent PR41619 without
reintroducing r359392. We might be able to improve that in the base class
implementation of shouldRewriteCopySrc somehow. But this hopefully enables
forward progress on SimplifyDemandedBits improvements for now.
Original commit message:
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb
but it caused a lot of noise on other targets - some improvements, some regressions.
The X86 changes are all definite wins.
llvm-svn: 360552
I noticed that we were failing to narrow an x86 ymm math op in a case similar
to the 'madd' test diff. That is because a bitcast is sitting between the math
and the extract subvector and thwarting our pattern matching for narrowing:
t56: v8i32 = add t59, t58
t68: v4i64 = bitcast t56
t73: v2i64 = extract_subvector t68, Constant:i64<2>
t96: v4i32 = bitcast t73
There are a few wins and neutral diffs in the other tests.
Differential Revision: https://reviews.llvm.org/D61806
llvm-svn: 360541
We already updated the LegalizedNodes map at the end of the Expand call. This
would have marked the new node as being mapped to itself. So the LegalizeOp
call will find that an immediately return.
llvm-svn: 360472
Split out from D61692 per RKSimon's suggestion. Vector op
legalization will automatically recursively legalize the returned
SDValue, but we need to take care of the other results ourselves.
Otherwise it will end up getting legalized only during op
legalization, by which point it might be too late (though I'm not
aware of any specific cases right now).
There are codegen differences because expansion occurs earlier now
and we don't get a DAGCombiner run in between.
Differential Revision: https://reviews.llvm.org/D61744
llvm-svn: 360470
Follow up to r359122, after a bug was reported in it - the original
change too aggressively tried to move related types out of type units,
which included unnamed types (like array types) which can't reasonably
be declared-but-not-defined.
A step beyond that is that some types in type units can be anonymous, if
they are types with a name for linkage purposes (eg: "typedef struct { }
x;"). So ensure those don't get turned into plain declarations (without
signatures) because, lacking names, they can't be resolved to the
definition.
[Also include a fix for llvm-dwarfdump/libDebugInfoDWARF to pretty print
types in type units]
llvm-svn: 360458
This fix allows the scheduler to take into account the number of instances of
each ProcResource specified. Previously a declaration in a scheduler of
ProcResource<1> would be treated identically to a declaration of
ProcResource<2>. Now the hazard recognizer would report a hazard only after all
of the resource instances are busy.
Patch by Jackson Woodruff and Momchil Velikov.
Differential Revision: https://reviews.llvm.org/D51160
llvm-svn: 360441
To find the candidates to merge stores we iterate over all nodes in a chain
for each store, which leads to quadratic compile times for large basic blocks
with a large number of stores.
Reviewers: niravd, spatel, craig.topper
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D61511
llvm-svn: 360357