SelectionDAG::ComputeNumSignBits is poor at build_vector handling, meaning that we can't see that all the vXi64 sources are in fact sign extended i32 or smaller.
llvm-svn: 297486
Summary:
Depends on D30379
This improves the state of things for the sub class of operation.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30436
llvm-svn: 297482
If we are transferring MMX registers to XMM for conversion we could use the MMX equivalents (CVTPI2PD + CVTPI2PS) without affecting rounding/exceptions etc.
llvm-svn: 297481
Summary: As per title. This is extracted from D29872 and I threw SADDO in.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30379
llvm-svn: 297479
If we are transferring XMM conversion results to MMX registers we could use the MMX equivalents (CVTPD2PI/CVTTPD2PI + CVTPS2PI/CVTTPS2PI) with affecting rounding/expections etc.
llvm-svn: 297476
This patches teaches the MIPS backend to accept more values for constant
splats. Previously, only 10 bit signed immediates or values that could be
loaded using an ldi.[bhwd] instruction would be acceptted. This patch relaxes
that constraint so that any constant value that be splatted is accepted.
As a result, the constant pool is used less for vector operations, and the
suite of bit manipulation instructions b(clr|set|neg)i can now be used with
the full range of their immediate operand.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D30640
llvm-svn: 297457
same as already done for ARM and Thumb2.
Reviewers: jmolloy, rogfer01, efriedma
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30400
llvm-svn: 297443
Summary: This essentially does the same transform as for ADC.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30417
llvm-svn: 297416
- Fix the insertion point, which occasionally could have been incorrect.
- Avoid creating multiple bitsplits with the same operands, if an old one
could be reused.
llvm-svn: 297414
The good reason to do this is that static allocas are pretty simple to handle
(especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline
should give some kind of performance benefit.
The bad reason is that the debug pipeline is an unholy mess of implicit
contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a
load or not involves the services of at least 3 soothsayers and the sacrifice
of at least one chicken. And it still gets it wrong if the variable is at SP
directly.
llvm-svn: 297410
Summary: This essentially does the same transform as for SUBC.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30437
llvm-svn: 297404
I previously removed the T2XtPk feature from the ARM backend, but it
looks like I missed some of the tests that were using the feature.
Differential Revision: https://reviews.llvm.org/D30778
llvm-svn: 297386
As discussed in the review thread for rL297026, this is actually 2 changes that
would independently fix all of the test cases in the patch:
1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before
foldBinopIntoSelect() as a matter of efficiency.
I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().
I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).
Differential Revision:
https://reviews.llvm.org/D30741
llvm-svn: 297384
The fix introduces segfaults and clobbers the value to be stored when
the atomic sequence loops.
Revert "[Target/MIPS] Kill dead code, no functional change intended."
This reverts commit r296153.
Revert "Recommit "[mips] Fix atomic compare and swap at O0.""
This reverts commit r296134.
llvm-svn: 297380
Fix a machine verifier issue where a instruction was using a invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
mark as killed earlier.
This partially resolves PR/27458
Thanks to Quentin Colombet for reporting the issue!
llvm-svn: 297372
When extracting a bitfield from the high register in a register pair,
the final offset should be relative to the high register (for 32-bit
extracts).
llvm-svn: 297288
Summary: By using reg_nodbg_empty() to determine if a function can be
treated as a leaf function or not, we miss the case when the register
pair L0_L1 is used but not L0 by itself. This has the effect that
use_all_i32_regs(), a test in reserved-regs.ll which tries to use all
registers, gets treated as a leaf function.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: davide, RKSimon, sepavloff, llvm-commits
Differential Revision: https://reviews.llvm.org/D27089
llvm-svn: 297285
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/
Re-commit r296771 to continue testing on the patch.
Sorry for the trouble!
llvm-svn: 297256
If there is only one successor, and that successor only
has one predecessor the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.
llvm-svn: 297251
This helps in cases involving bitfields where an AND is exposed by
legalization.
Differential Revision: https://reviews.llvm.org/D30472
llvm-svn: 297249
We were calculating incorrect extract/insert offsets by trying to be too
tricksy with min/max. It's clearer to just split the logic up into "register
starts before this segment" vs "after".
llvm-svn: 297226
Some intrinsics take metadata parameters. These all need custom
handling of some form, and cannot possibly be lowered generically to
G_INTRINSIC calls with vreg operands.
Reject them, instead of hitting an assert later in getOrCreateVReg.
llvm-svn: 297209
When we translate a no-op (same type) bitcast, we try to be clever and
only emit a COPY if we already assigned a vreg to the defined value.
However, when we didn't, we tried to assign to a reference into the
ValToVReg DenseMap, even though the RHS of the assignment
(getOrCreateVReg) could potentially grow that DenseMap, invalidating the
reference.
Avoid that by getting the source vreg first.
I audited the rest of the translator; this is the only tricky case.
The test is quite unwieldy, as the problem is caused by the DenseMap
growing, which happens after the 47th mapped value.
llvm-svn: 297208
For vector operands, the `select` instruction supports both vector and
non-vector conditions. The MIR builder had an overly restrictive
assertion, that only accepted vector conditions for vector selects
(in effect implementing ISD::VSELECT).
Make it possible to express the full range of G_SELECTs.
llvm-svn: 297207
When computing the mapping for non-generic instructions, we skipped
%noreg operands, because we can't always reason about their banks.
Also skip them when applying the mapping. Otherwise, we could end
up with mappings that we can't apply.
While there, duplicate an assert to distinguish between the two
error conditions.
llvm-svn: 297201
When a dbg_value has a constant operand that isn't representable in MI,
there isn't much we can do. Use %noreg (0) for those situations.
This matches the SelectionDAG behavior.
llvm-svn: 297200
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.
swifterror arguments have restrictions on their uses.
rdar://30839288
llvm-svn: 297197
Broadcom Vulcan is now Cavium ThunderX2T99.
LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113
Minor fixes for the alignments of loops and functions for
ThunderX T81/T83/T88 (better performance).
Patch was tested with SpecCPU2006.
Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D30510
llvm-svn: 297190
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.
The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.
Differential Revision: https://reviews.llvm.org/D30645
llvm-svn: 297137
Summary: Previously, it had always been materialized as a push/pop sequence.
Reviewers: labrinea, jroelofs
Reviewed By: jroelofs
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30648
llvm-svn: 297134
evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries.
Differential Revision: https://reviews.llvm.org/D30501
llvm-svn: 297126
This reverts commit r296771.
We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)
llvm-svn: 297124
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.
llvm-svn: 297100
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.
llvm-svn: 297081
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle CMN instructions as well as a shifted
immediates.
Differential Revision: https://reviews.llvm.org/D30576.
llvm-svn: 297078
If a block has non-analyzable branches, the listed successors don't need
to add up to one. For example, if a block has a conditional tail call,
that tail call will not have a corresponding successor in the successor
list, but will still be a possible branch.
Differential Revision: https://reviews.llvm.org/D30556
llvm-svn: 297054
Before, we were producing G_INSERT instructions that were actually closer to a
cast or even a COPY when both input and output sizes are the same. This doesn't
really make sense and means that everything interpreting a G_INSERT also has to
handle all these kinds of casts.
So now we detect these degenerate cases and emit real casts instead.
llvm-svn: 297051
Use the store size of the argument type, which will be a byte-sized
quantity, rather than dividing the size in bits by 8.
Fixes PR32136 and re-enables copy elision from i64 arguments.
Reverts the workaround in from r296950.
llvm-svn: 297045
Now that G_INSERT instructions can only insert one register, this code was
overly general. In another direction it didn't handle registers that crossed
split boundaries properly, which needed to be fixed.
llvm-svn: 297042
Merge the tail block into the loop in cases where the main loop body
exits early, subject to profitability constraints. This will coalesce
the loop body into fewer blocks.
For example:
loop: loop:
// loop body // loop body
if (...) jump exit --> // more body
more: if (...) jump exit
// more body jump loop
jump loop
llvm-svn: 297033
The code in updateDeadFlags removed unnecessary <dead> flags, but there
can be cases where such a flag is not set, and yet a register has become
dead. For example, if a mux with identical inputs is replaced with a COPY,
the predicate register may no longer be used after that.
llvm-svn: 297032
Refactoring of duplicated code and more fixes to follow.
This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html
Ie, we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.
The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.
llvm-svn: 297026
These are not x86-specific, but the problem is not visible for all targets
because it is masked by other transforms. These can lead to compiler crashes.
llvm-svn: 297017
Summary:
Functions with the "xray-log-args" attribute will have a special XRay sled kind
emitted, for compiler-rt to copy any call arguments to your logging handler.
For practical and performance reasons, only the first argument is supported, and
only up to 64 bits.
Reviewers: dberris
Reviewed By: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29702
llvm-svn: 296998
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.
We're missing a couple of shuffle combines that will be added in a future patch for review.
Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.
Differential Revision: https://reviews.llvm.org/D30549
llvm-svn: 296985
The larger goal is to move the ADC/SBB transforms currently in
combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're
creating ADC/SBB in lots of places where we shouldn't.
This was intended to be an NFC change, but avx-512 has something
strange going on. It doesn't seem like any of the affected tests
should really be using SET+TEST or ADC; a simple ADD could replace
several instructions. But that's another bug...
llvm-svn: 296978
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.
This is part of the ongoing process to obsolete D24480. The motivation is to
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
PowerPC is an obvious winner for this kind of transform because it has fast and complete
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).
x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86
changes just shows that that code is a mess of special-case holes. We may be able to remove
some of that logic now.
My guess is that other targets will want to enable this hook for most cases. The likely
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to
math/logic, but the general transform for any 2 constants needs one more instruction
(multiply or 'and').
ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.
Differential Revision: https://reviews.llvm.org/D30537
llvm-svn: 296977
Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks.
Differential Revision: https://reviews.llvm.org/D30213
llvm-svn: 296966
Summary:
When replacing a SDValue, we should remove the replaced value from
SoftenedFloats (and possibly the other maps as well?).
When we revisit a Node because it needs analyzing again, we have to
remove all result values from SoftenedFloats (and possibly other maps?).
This fixes the fp128 test failures with expensive checks for X86.
I think we probably should also remove the values from the other maps
(PromotedIntegers and so on), let me know what you think.
Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon
Reviewed By: chh
Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits
Differential Revision: https://reviews.llvm.org/D29265
llvm-svn: 296964
This fixes cases where i1 types were not properly legalized yet and lead
to the creating of 0-sized stack slots.
This fixes http://llvm.org/PR32136
llvm-svn: 296950
These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which
assume the individual parts will be contiguous, homogeneous, and occupy the
entirity of the larger register. This makes reasoning about them much easer
since you only have to look at the first register being merged and the result
to know what the instruction is doing.
I intend to gradually replace all uses of the more complicated sequence/extract
with these (or single-element insert/extracts), and then remove the older
variants. For now we start with legalization.
llvm-svn: 296921
The intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr read and
write to the fpscr (Floating-Point Status and Control Register) register.
A bug exists in the __builtin_arm_get_fpscr intrinsic definition in llvm which
treats this intrinsic as a IntroNoMem which means it's not a memory access and
doesn't have any other side-effects. Having this property on this intrinsic
means that various optimizations can be done on this such as common
sub-expression elimination with other reads. This can cause issues if there has
been write to this register, e.g.
void foo(int *p) {
p[0] = __builtin_arm_get_fpscr();
__builtin_arm_set_fpscr(1);
p[1] = __builtin_arm_get_fpscr();
}
in the above example the second read is currently CSE'd into the first read,
this is because llvm isn't aware that the write done by __builtin_arm_set_fpscr
effects the same register that __builtin_arm_get_fpscr reads from, to fix this
problem I've removed the property IntrNoMem so that __builtin_arm_get_fpscr is
treated as a memory access.
Differential Revision: https://reviews.llvm.org/D30542
llvm-svn: 296865
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.
llvm-svn: 296862
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.
Differential Revision: https://reviews.llvm.org/D29874
llvm-svn: 296859
For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:
1) The post-dominators marked 50% are actually taken 56% (This shrinks with
longer chains)
2) The chains are statically correlated. Branch probabilities have a very
U-shaped distribution.
[http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
If the branches in a chain are likely to be from the same side of the
distribution as their predecessor, but are independent at runtime, this
transformation is profitable. (Because the cost of being wrong is a small
fixed cost, unlike the standard triangle layout where the cost of being
wrong scales with the # of triangles.)
3) The chains are dynamically correlated. If the probability that a previous
branch was taken positively influences whether the next branch will be
taken
We believe that 2 and 3 are common enough to justify the small margin in 1.
The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.
llvm-svn: 296845
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below
```
extern int bar();
extern int baz();
int foo(int x, int y) {
if (x != y)
return bar();
else
return baz();
}
```
, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:
```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
%cmp = icmp ne i32 %x, %y, !dbg !14
br i1 %cmp, label %if.then, label %if.else, !dbg !16
if.then: ; preds = %entry
%call = tail call i32 (...) @bar() #3, !dbg !17
br label %return, !dbg !18
if.else: ; preds = %entry
%call1 = tail call i32 (...) @baz() #3, !dbg !19
br label %return, !dbg !20
return: ; preds = %if.else, %if.then
%retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
ret i32 %retval.0, !dbg !21
}
!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)
```
As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.
Reviewers: atrick, bogner, andreadb, craig.topper, aprantl
Reviewed By: andreadb
Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29813
llvm-svn: 296825
This commit also relied on r296812, which I just reverted. We should probably
apply it again, after the r296812 has been discussed and been reapplied in some
variant.
llvm-svn: 296820