We don't want to create vregs if there is nothing to use them for. That causes
verifier errors.
Differential Revision: https://reviews.llvm.org/D62740
llvm-svn: 362247
If we can determine that a saturating add/sub will not overflow
based on range analysis, convert it into a simple binary operation.
This is a sibling transform to the existing with.overflow handling.
Differential Revision: https://reviews.llvm.org/D62703
llvm-svn: 362242
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: https://reviews.llvm.org/D62546
llvm-svn: 362241
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.
Differential Revision: https://reviews.llvm.org/D62707
llvm-svn: 362239
In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%.
condc = conda || condb
br condc, label %target, label %fallthrough
It can be transformed to following,
br conda, label %target, label %newbb
newbb:
br condb, label %target, label %fallthrough
Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%.
This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged.
Differential Revision: https://reviews.llvm.org/D62430
llvm-svn: 362237
These can take a significant amount of time in some builds.
Suggested by Andrea Di Biagio.
Differential Revision: https://reviews.llvm.org/D62666
llvm-svn: 362219
It looks this fold was already partially happening, indirectly
via some other folds, but with one-use limitation.
No other fold here has that restriction.
https://rise4fun.com/Alive/ftR
llvm-svn: 362217
Summary:
A three sources variant of the TBL instruction is added to the existing
SVE instruction in SVE2. This is implemented with minor changes to the
existing TableGen class. TBX is a new instruction with its own
definition.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62600
llvm-svn: 362214
Handle position independent code for MIPS32.
When callee is global address, lower call will emit callee
as G_GLOBAL_VALUE and add target flag if needed.
Support $gp in getRegBankFromRegClass().
Select G_GLOBAL_VALUE, specially handle case when
there are target flags attached by lowerCall.
Differential Revision: https://reviews.llvm.org/D62589
llvm-svn: 362210
Move initGlobalBaseReg from MipsSEDAGToDAGISel to MipsFunctionInfo.
This way functions used for handling position independent code during
instruction selection, getGlobalBaseReg and initGlobalBaseReg,
end up in same class.
Differential Revision: https://reviews.llvm.org/D62586
llvm-svn: 362206
Lower call for callee that is register for MIPS32.
Register should contain callee function address.
Differential Revision: https://reviews.llvm.org/D62585
llvm-svn: 362204
These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits.
Similar to PR42079.
Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the
load pattern used in the instructions.
This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe
we should use VZMOVL for widened loads even when we don't need the upper bits
as zeroes?
llvm-svn: 362203
DAG combine will usually fold fpextend+load to an fp extload anyway. So the
256 and 512 patterns were probably unnecessary. The 128 bit pattern was special
in that it looked for a v4f32 load, but then used it in an instruction that
only loads 64-bits. This is bad if the load happens to be volatile. We could
probably make the patterns volatile aware, but that's more work for something
that's probably rare. The peephole pass might kick in and save us anyway. We
might also be able to fix this with some additional DAG combines.
This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be
used. Previously we looked for the unlikely vselect+fpextend+load.
llvm-svn: 362199
This consolidates the vreg skip code into one function (SkipVRegs()).
SkipVRegs() now knows if it should skip as if it is the first initialization or
subsequent skips.
The first skip is also done the first time createVirtualRegister is called by
the cursor instead of by the cursor's constructor. This prevents verifier
errors on machine functions that have no vregs (where the verifier will
complain that there are vregs when the function uses none).
Differential Revision: https://reviews.llvm.org/D62717
llvm-svn: 362195
This makes the 5 address operands come first. And the data operand comes last.
This matches the operand order the instruction is created with. It's also the
expected order in X86MCInstLower. So everything appeared to work, but the
operands didn't match their declared type.
Fixes a -verify-machineinstrs failure.
Also remove the isel patterns from these instructions since they should only
be used for stack spills and reloads. I'm not even sure what types the patterns
were looking for to match.
llvm-svn: 362193
This is am almost NFC, it does the following:
- If there is no register class for a COPY's src or dst, bail.
- Fixes uses iterator invalidation bug.
Differential Revision: https://reviews.llvm.org/D62713
llvm-svn: 362191
The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags.
The users of these either have their own type constraint or rely on the type constranit system to realize the only legal extend would be to f64.
llvm-svn: 362175
The LoadExt table defaults to all combinations being Legal. For
vector types, only src VTs with an i1 element type were ever changed.
So we don't need to mark them legal manually.
llvm-svn: 362170
Separate the remark serialization to YAML from the LLVM Diagnostics.
This adds a new serialization abstraction: remarks::Serializer. It's
completely independent from lib/IR and it provides an easy way to
replace YAML by providing a new remarks::Serializer.
Differential Revision: https://reviews.llvm.org/D62632
llvm-svn: 362160
I don't have a test case for these, but there is a test case for D62266
where, even after all the constant-folding patches, we still end up
with endless combine loop. Which makes sense, since we don't constant
fold for opaque constants.
llvm-svn: 362156
Previously, this used a statement like this:
Map[A] = Map[B];
This is equivalent to the following:
const auto &Src = Map[B];
auto &Dest = Map[A];
Dest = Src;
The second statement, "auto &Dest = Map[A];" can insert a new
element into the DenseMap, which can potentially grow and reallocate
the DenseMap's internal storage, which will invalidate the existing
reference to the source. When doing the actual assignment,
the Src reference is dereferenced, accessing memory that was
freed when the DenseMap grew.
This issue hasn't shown up when LLVM was built with Clang, because
the right hand side ended up dereferenced before evaulating the
left hand side. (If the value type is a larger data type, Clang doesn't
do this but behaves like GCC.)
With GCC, a cast to Value* isn't enough to make it dereference the
right hand side reference before invoking operator[] (while that is
enough to make Clang/LLVM do the right thing for larger types), but
storing it in an intermediate variable in a separate statement works.
This fixes PR42065.
Differential Revision: https://reviews.llvm.org/D62624
llvm-svn: 362150
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.
No surprising test changes.
https://rise4fun.com/Alive/pbT
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62257
llvm-svn: 362146
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 362145
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 362144
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 362143
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 362142
increase the MachO/x86-64 stub alignment to 8.
Stub alignment should be guaranteed for any section containing RuntimeDyld
stubs/GOT-entries. To do this we should pad and align all sections containing
stubs, not just code sections.
This commit also bumps the MachO/x86-64 stub alignment to 8, so that GOT entries
will be aligned.
llvm-svn: 362139
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62664
llvm-svn: 362133
Summary:
No tests change, and i'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62663
llvm-svn: 362132
Summary:
This was the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, spatel, craig.topper, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62662
llvm-svn: 362131
Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62661
llvm-svn: 362130
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362128
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.
That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.
Differential Revision: https://reviews.llvm.org/D62572
Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127