We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.
llvm-svn: 297621
This reverts r297596.
There were other issues that were making this not work that have been fixed now. Reverting this results in a more accurate table.
llvm-svn: 297602
This exposed that we have several intrinsic instructions that have identical TSFlags to other instructions. We should merge their patterns and kill of the duplicate. I'll fix that in a follow up patch.
llvm-svn: 297596
The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly.
llvm-svn: 297591
I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently.
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.
This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.
Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.
The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.
Differential Revision: https://reviews.llvm.org/D30611
llvm-svn: 297586
Summary:
A53 scheduler causes an assertion failure on all CRC instructions:
include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand
&llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i <
getNumOperands() && "getOperand() out of range!"' failed.
The case statements corresponding to CRC instructions are incorrect and should
be removed.
Also adding a testcase while on this.
Reviewers: t.p.northover, javed.absar, apazos, rengolin
Reviewed By: rengolin
Subscribers: evandro, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30274
llvm-svn: 297582
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.
I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.
llvm-svn: 297574
Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte.
This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte.
Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate?
Differential Revision: https://reviews.llvm.org/D29841
llvm-svn: 297568
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.
This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.
Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 297556
This method inverts the Reason field of a scheduling candidate.
It does right comparison between RegCritical and RegExcess, but
everything else is broken. In fact it can prefer less strong reason
such as Weak over RegCritical because Weak > -RegCritical.
The CandReason enum is properly sorted, so just remove artificial
ranking.
Differential Revision: https://reviews.llvm.org/D30557
llvm-svn: 297536
This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl
llvm-svn: 297523
SelectionDAG::ComputeNumSignBits is poor at build_vector handling, meaning that we can't see that all the vXi64 sources are in fact sign extended i32 or smaller.
llvm-svn: 297486
Summary:
Depends on D30379
This improves the state of things for the sub class of operation.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30436
llvm-svn: 297482
If we are transferring MMX registers to XMM for conversion we could use the MMX equivalents (CVTPI2PD + CVTPI2PS) without affecting rounding/exceptions etc.
llvm-svn: 297481
Summary: As per title. This is extracted from D29872 and I threw SADDO in.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30379
llvm-svn: 297479
If we are transferring XMM conversion results to MMX registers we could use the MMX equivalents (CVTPD2PI/CVTTPD2PI + CVTPS2PI/CVTTPS2PI) with affecting rounding/expections etc.
llvm-svn: 297476
This patches teaches the MIPS backend to accept more values for constant
splats. Previously, only 10 bit signed immediates or values that could be
loaded using an ldi.[bhwd] instruction would be acceptted. This patch relaxes
that constraint so that any constant value that be splatted is accepted.
As a result, the constant pool is used less for vector operations, and the
suite of bit manipulation instructions b(clr|set|neg)i can now be used with
the full range of their immediate operand.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D30640
llvm-svn: 297457
same as already done for ARM and Thumb2.
Reviewers: jmolloy, rogfer01, efriedma
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30400
llvm-svn: 297443
Summary: This essentially does the same transform as for ADC.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30417
llvm-svn: 297416
- Fix the insertion point, which occasionally could have been incorrect.
- Avoid creating multiple bitsplits with the same operands, if an old one
could be reused.
llvm-svn: 297414
The good reason to do this is that static allocas are pretty simple to handle
(especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline
should give some kind of performance benefit.
The bad reason is that the debug pipeline is an unholy mess of implicit
contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a
load or not involves the services of at least 3 soothsayers and the sacrifice
of at least one chicken. And it still gets it wrong if the variable is at SP
directly.
llvm-svn: 297410
Summary: This essentially does the same transform as for SUBC.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30437
llvm-svn: 297404
I previously removed the T2XtPk feature from the ARM backend, but it
looks like I missed some of the tests that were using the feature.
Differential Revision: https://reviews.llvm.org/D30778
llvm-svn: 297386
As discussed in the review thread for rL297026, this is actually 2 changes that
would independently fix all of the test cases in the patch:
1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before
foldBinopIntoSelect() as a matter of efficiency.
I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().
I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).
Differential Revision:
https://reviews.llvm.org/D30741
llvm-svn: 297384
The fix introduces segfaults and clobbers the value to be stored when
the atomic sequence loops.
Revert "[Target/MIPS] Kill dead code, no functional change intended."
This reverts commit r296153.
Revert "Recommit "[mips] Fix atomic compare and swap at O0.""
This reverts commit r296134.
llvm-svn: 297380
Fix a machine verifier issue where a instruction was using a invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
mark as killed earlier.
This partially resolves PR/27458
Thanks to Quentin Colombet for reporting the issue!
llvm-svn: 297372
When extracting a bitfield from the high register in a register pair,
the final offset should be relative to the high register (for 32-bit
extracts).
llvm-svn: 297288
Summary: By using reg_nodbg_empty() to determine if a function can be
treated as a leaf function or not, we miss the case when the register
pair L0_L1 is used but not L0 by itself. This has the effect that
use_all_i32_regs(), a test in reserved-regs.ll which tries to use all
registers, gets treated as a leaf function.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: davide, RKSimon, sepavloff, llvm-commits
Differential Revision: https://reviews.llvm.org/D27089
llvm-svn: 297285
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/
Re-commit r296771 to continue testing on the patch.
Sorry for the trouble!
llvm-svn: 297256
If there is only one successor, and that successor only
has one predecessor the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.
llvm-svn: 297251
This helps in cases involving bitfields where an AND is exposed by
legalization.
Differential Revision: https://reviews.llvm.org/D30472
llvm-svn: 297249
We were calculating incorrect extract/insert offsets by trying to be too
tricksy with min/max. It's clearer to just split the logic up into "register
starts before this segment" vs "after".
llvm-svn: 297226
Some intrinsics take metadata parameters. These all need custom
handling of some form, and cannot possibly be lowered generically to
G_INTRINSIC calls with vreg operands.
Reject them, instead of hitting an assert later in getOrCreateVReg.
llvm-svn: 297209
When we translate a no-op (same type) bitcast, we try to be clever and
only emit a COPY if we already assigned a vreg to the defined value.
However, when we didn't, we tried to assign to a reference into the
ValToVReg DenseMap, even though the RHS of the assignment
(getOrCreateVReg) could potentially grow that DenseMap, invalidating the
reference.
Avoid that by getting the source vreg first.
I audited the rest of the translator; this is the only tricky case.
The test is quite unwieldy, as the problem is caused by the DenseMap
growing, which happens after the 47th mapped value.
llvm-svn: 297208
For vector operands, the `select` instruction supports both vector and
non-vector conditions. The MIR builder had an overly restrictive
assertion, that only accepted vector conditions for vector selects
(in effect implementing ISD::VSELECT).
Make it possible to express the full range of G_SELECTs.
llvm-svn: 297207
When computing the mapping for non-generic instructions, we skipped
%noreg operands, because we can't always reason about their banks.
Also skip them when applying the mapping. Otherwise, we could end
up with mappings that we can't apply.
While there, duplicate an assert to distinguish between the two
error conditions.
llvm-svn: 297201
When a dbg_value has a constant operand that isn't representable in MI,
there isn't much we can do. Use %noreg (0) for those situations.
This matches the SelectionDAG behavior.
llvm-svn: 297200
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.
swifterror arguments have restrictions on their uses.
rdar://30839288
llvm-svn: 297197
Broadcom Vulcan is now Cavium ThunderX2T99.
LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113
Minor fixes for the alignments of loops and functions for
ThunderX T81/T83/T88 (better performance).
Patch was tested with SpecCPU2006.
Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D30510
llvm-svn: 297190
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.
The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.
Differential Revision: https://reviews.llvm.org/D30645
llvm-svn: 297137
Summary: Previously, it had always been materialized as a push/pop sequence.
Reviewers: labrinea, jroelofs
Reviewed By: jroelofs
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30648
llvm-svn: 297134
evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries.
Differential Revision: https://reviews.llvm.org/D30501
llvm-svn: 297126
This reverts commit r296771.
We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)
llvm-svn: 297124
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.
llvm-svn: 297100
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.
llvm-svn: 297081
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle CMN instructions as well as a shifted
immediates.
Differential Revision: https://reviews.llvm.org/D30576.
llvm-svn: 297078
If a block has non-analyzable branches, the listed successors don't need
to add up to one. For example, if a block has a conditional tail call,
that tail call will not have a corresponding successor in the successor
list, but will still be a possible branch.
Differential Revision: https://reviews.llvm.org/D30556
llvm-svn: 297054
Before, we were producing G_INSERT instructions that were actually closer to a
cast or even a COPY when both input and output sizes are the same. This doesn't
really make sense and means that everything interpreting a G_INSERT also has to
handle all these kinds of casts.
So now we detect these degenerate cases and emit real casts instead.
llvm-svn: 297051
Use the store size of the argument type, which will be a byte-sized
quantity, rather than dividing the size in bits by 8.
Fixes PR32136 and re-enables copy elision from i64 arguments.
Reverts the workaround in from r296950.
llvm-svn: 297045
Now that G_INSERT instructions can only insert one register, this code was
overly general. In another direction it didn't handle registers that crossed
split boundaries properly, which needed to be fixed.
llvm-svn: 297042
Merge the tail block into the loop in cases where the main loop body
exits early, subject to profitability constraints. This will coalesce
the loop body into fewer blocks.
For example:
loop: loop:
// loop body // loop body
if (...) jump exit --> // more body
more: if (...) jump exit
// more body jump loop
jump loop
llvm-svn: 297033
The code in updateDeadFlags removed unnecessary <dead> flags, but there
can be cases where such a flag is not set, and yet a register has become
dead. For example, if a mux with identical inputs is replaced with a COPY,
the predicate register may no longer be used after that.
llvm-svn: 297032
Refactoring of duplicated code and more fixes to follow.
This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html
Ie, we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.
The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.
llvm-svn: 297026
These are not x86-specific, but the problem is not visible for all targets
because it is masked by other transforms. These can lead to compiler crashes.
llvm-svn: 297017
Summary:
Functions with the "xray-log-args" attribute will have a special XRay sled kind
emitted, for compiler-rt to copy any call arguments to your logging handler.
For practical and performance reasons, only the first argument is supported, and
only up to 64 bits.
Reviewers: dberris
Reviewed By: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29702
llvm-svn: 296998
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.
We're missing a couple of shuffle combines that will be added in a future patch for review.
Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.
Differential Revision: https://reviews.llvm.org/D30549
llvm-svn: 296985
The larger goal is to move the ADC/SBB transforms currently in
combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're
creating ADC/SBB in lots of places where we shouldn't.
This was intended to be an NFC change, but avx-512 has something
strange going on. It doesn't seem like any of the affected tests
should really be using SET+TEST or ADC; a simple ADD could replace
several instructions. But that's another bug...
llvm-svn: 296978
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.
This is part of the ongoing process to obsolete D24480. The motivation is to
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
PowerPC is an obvious winner for this kind of transform because it has fast and complete
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).
x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86
changes just shows that that code is a mess of special-case holes. We may be able to remove
some of that logic now.
My guess is that other targets will want to enable this hook for most cases. The likely
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to
math/logic, but the general transform for any 2 constants needs one more instruction
(multiply or 'and').
ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.
Differential Revision: https://reviews.llvm.org/D30537
llvm-svn: 296977