We were calculating incorrect extract/insert offsets by trying to be too
tricksy with min/max. It's clearer to just split the logic up into "register
starts before this segment" vs "after".
llvm-svn: 297226
Some intrinsics take metadata parameters. These all need custom
handling of some form, and cannot possibly be lowered generically to
G_INTRINSIC calls with vreg operands.
Reject them, instead of hitting an assert later in getOrCreateVReg.
llvm-svn: 297209
When we translate a no-op (same type) bitcast, we try to be clever and
only emit a COPY if we already assigned a vreg to the defined value.
However, when we didn't, we tried to assign to a reference into the
ValToVReg DenseMap, even though the RHS of the assignment
(getOrCreateVReg) could potentially grow that DenseMap, invalidating the
reference.
Avoid that by getting the source vreg first.
I audited the rest of the translator; this is the only tricky case.
The test is quite unwieldy, as the problem is caused by the DenseMap
growing, which happens after the 47th mapped value.
llvm-svn: 297208
For vector operands, the `select` instruction supports both vector and
non-vector conditions. The MIR builder had an overly restrictive
assertion, that only accepted vector conditions for vector selects
(in effect implementing ISD::VSELECT).
Make it possible to express the full range of G_SELECTs.
llvm-svn: 297207
When computing the mapping for non-generic instructions, we skipped
%noreg operands, because we can't always reason about their banks.
Also skip them when applying the mapping. Otherwise, we could end
up with mappings that we can't apply.
While there, duplicate an assert to distinguish between the two
error conditions.
llvm-svn: 297201
When a dbg_value has a constant operand that isn't representable in MI,
there isn't much we can do. Use %noreg (0) for those situations.
This matches the SelectionDAG behavior.
llvm-svn: 297200
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.
swifterror arguments have restrictions on their uses.
rdar://30839288
llvm-svn: 297197
Broadcom Vulcan is now Cavium ThunderX2T99.
LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113
Minor fixes for the alignments of loops and functions for
ThunderX T81/T83/T88 (better performance).
Patch was tested with SpecCPU2006.
Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D30510
llvm-svn: 297190
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.
The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.
Differential Revision: https://reviews.llvm.org/D30645
llvm-svn: 297137
Summary: Previously, it had always been materialized as a push/pop sequence.
Reviewers: labrinea, jroelofs
Reviewed By: jroelofs
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30648
llvm-svn: 297134
evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries.
Differential Revision: https://reviews.llvm.org/D30501
llvm-svn: 297126
This reverts commit r296771.
We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)
llvm-svn: 297124
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.
llvm-svn: 297100
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.
llvm-svn: 297081
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle CMN instructions as well as a shifted
immediates.
Differential Revision: https://reviews.llvm.org/D30576.
llvm-svn: 297078
If a block has non-analyzable branches, the listed successors don't need
to add up to one. For example, if a block has a conditional tail call,
that tail call will not have a corresponding successor in the successor
list, but will still be a possible branch.
Differential Revision: https://reviews.llvm.org/D30556
llvm-svn: 297054
Before, we were producing G_INSERT instructions that were actually closer to a
cast or even a COPY when both input and output sizes are the same. This doesn't
really make sense and means that everything interpreting a G_INSERT also has to
handle all these kinds of casts.
So now we detect these degenerate cases and emit real casts instead.
llvm-svn: 297051
Use the store size of the argument type, which will be a byte-sized
quantity, rather than dividing the size in bits by 8.
Fixes PR32136 and re-enables copy elision from i64 arguments.
Reverts the workaround in from r296950.
llvm-svn: 297045
Now that G_INSERT instructions can only insert one register, this code was
overly general. In another direction it didn't handle registers that crossed
split boundaries properly, which needed to be fixed.
llvm-svn: 297042
Merge the tail block into the loop in cases where the main loop body
exits early, subject to profitability constraints. This will coalesce
the loop body into fewer blocks.
For example:
loop: loop:
// loop body // loop body
if (...) jump exit --> // more body
more: if (...) jump exit
// more body jump loop
jump loop
llvm-svn: 297033
The code in updateDeadFlags removed unnecessary <dead> flags, but there
can be cases where such a flag is not set, and yet a register has become
dead. For example, if a mux with identical inputs is replaced with a COPY,
the predicate register may no longer be used after that.
llvm-svn: 297032
Refactoring of duplicated code and more fixes to follow.
This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html
Ie, we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.
The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.
llvm-svn: 297026
These are not x86-specific, but the problem is not visible for all targets
because it is masked by other transforms. These can lead to compiler crashes.
llvm-svn: 297017
Summary:
Functions with the "xray-log-args" attribute will have a special XRay sled kind
emitted, for compiler-rt to copy any call arguments to your logging handler.
For practical and performance reasons, only the first argument is supported, and
only up to 64 bits.
Reviewers: dberris
Reviewed By: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29702
llvm-svn: 296998
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.
We're missing a couple of shuffle combines that will be added in a future patch for review.
Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.
Differential Revision: https://reviews.llvm.org/D30549
llvm-svn: 296985
The larger goal is to move the ADC/SBB transforms currently in
combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're
creating ADC/SBB in lots of places where we shouldn't.
This was intended to be an NFC change, but avx-512 has something
strange going on. It doesn't seem like any of the affected tests
should really be using SET+TEST or ADC; a simple ADD could replace
several instructions. But that's another bug...
llvm-svn: 296978
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.
This is part of the ongoing process to obsolete D24480. The motivation is to
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
PowerPC is an obvious winner for this kind of transform because it has fast and complete
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).
x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86
changes just shows that that code is a mess of special-case holes. We may be able to remove
some of that logic now.
My guess is that other targets will want to enable this hook for most cases. The likely
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to
math/logic, but the general transform for any 2 constants needs one more instruction
(multiply or 'and').
ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.
Differential Revision: https://reviews.llvm.org/D30537
llvm-svn: 296977
Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks.
Differential Revision: https://reviews.llvm.org/D30213
llvm-svn: 296966
Summary:
When replacing a SDValue, we should remove the replaced value from
SoftenedFloats (and possibly the other maps as well?).
When we revisit a Node because it needs analyzing again, we have to
remove all result values from SoftenedFloats (and possibly other maps?).
This fixes the fp128 test failures with expensive checks for X86.
I think we probably should also remove the values from the other maps
(PromotedIntegers and so on), let me know what you think.
Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon
Reviewed By: chh
Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits
Differential Revision: https://reviews.llvm.org/D29265
llvm-svn: 296964
This fixes cases where i1 types were not properly legalized yet and lead
to the creating of 0-sized stack slots.
This fixes http://llvm.org/PR32136
llvm-svn: 296950