Unsigned addition and subtraction can reuse the instructions created to
legalize large width operations (i.e. both produce and consume a carry flag).
Signed operations and multiplies get a dedicated op-with-overflow instruction.
Once this is produced, the two values are combined into a struct register (which
will almost always be merged with a corresponding G_EXTRACT as part of
legalization).
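For reference, a minimal IR-level sketch of the kind of operation involved (hypothetical function; the intrinsic is what gets turned into the op-with-overflow instruction, and the extractvalues are what become the G_EXTRACTs mentioned above):
  declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)
  define i32 @f(i32 %a, i32 %b) {
    %res = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
    %sum = extractvalue { i32, i1 } %res, 0   ; value half of the struct register
    %ovf = extractvalue { i32, i1 } %res, 1   ; overflow-flag half
    %out = select i1 %ovf, i32 0, i32 %sum
    ret i32 %out
  }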
llvm-svn: 279278
This doesn't change the tests' codegen, as we already combined to blend+zero, which is what we lower VZEXT_MOVL to on SSE41+ targets, but it does put us in a better position when we improve shuffling for optsize.
llvm-svn: 279273
In addition, the branch instructions now have proper BB destinations instead of the offsets they had before.
Patch by Vadzim Dambrouski!
Differential Revision: https://reviews.llvm.org/D20162
llvm-svn: 279242
Improved handling of fma and floating-point min/max, and additional load/store
instructions for floating-point types.
Patch by Jyotsna Verma.
llvm-svn: 279239
INSERTPS doesn't fit well with our shuffle mask canonicalization, so we need to attempt both the original mask and the commuted mask to make a match more likely.
llvm-svn: 279230
The new version has several advantages:
1) IMHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 3, i32 4
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
llvm-svn: 279229
The heuristic above this code is incredibly suspect but, disregarding that, it mutates the cast opcode, so we need to check the *mutated* opcode later to see if we need to emit an AssertSext or AssertZext node.
Fixes PR29041.
llvm-svn: 279223
Without the synthesized reference to a symbol in the xray_instr_map,
linker section garbage collection will helpfully remove the whole
xray_instr_map section from the final executable (or archive), leaving
the runtime unable to identify the sleds and hot-patch the calls/jumps
into the runtime trampolines.
This change adds a reference from the text section at the end of the
function to keep around the associated xray_instr_map section as well.
We also make sure that we catch this reference in the test.
Reviewers: chandlerc, echristo, majnemer, mehdi_amini
Subscribers: mehdi_amini, llvm-commits, dberris
Differential Revision: https://reviews.llvm.org/D23398
llvm-svn: 279204
The ppc64 multistage bot fails on this.
This reverts commit r279124.
Also Revert "CodeGen: Add/Factor out LiveRegUnits class; NFCI" because it depends on the previous change
This reverts commit r279171.
llvm-svn: 279199
The following function currently relies on tail-merging for if-conversion
to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.
If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.
Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.
Regression on self-hosting bots with no obvious explanation. Tidied up range
handling to be more obviously correct, but there was no smoking gun.
define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
%tmp1434 = icmp eq i32 %a, %b ; <i1> [#uses=1]
br i1 %tmp1434, label %bb17, label %bb.outer
bb.outer: ; preds = %cond_false, %entry
%b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
%a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
br label %bb
bb: ; preds = %cond_true, %bb.outer
%indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
%tmp. = sub i32 0, %b_addr.021.0.ph
%tmp.40 = mul i32 %indvar, %tmp.
%a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
%tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
br i1 %tmp3, label %cond_true, label %cond_false
cond_true: ; preds = %bb
%tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
%tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
%indvar.next = add i32 %indvar, 1
br i1 %tmp1437, label %bb17, label %bb
cond_false: ; preds = %bb
%tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
%tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
br i1 %tmp14, label %bb17, label %bb.outer
bb17: ; preds = %cond_false, %cond_true, %entry
%a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
ret i32 %a_addr.026.1
}
Without tail-merging or diamond-tail if conversion:
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ble LBB1_3
@ BB#2: @ %cond_true
@ in Loop: Header=BB1_1 Depth=1
subs r0, r0, r1
cmp r1, r0
it ne
cmpne r0, r1
bgt LBB1_4
LBB1_3: @ %cond_false
@ in Loop: Header=BB1_1 Depth=1
subs r1, r1, r0
cmp r1, r0
bne LBB1_1
LBB1_4: @ %bb17
bx lr
With diamond-tail if conversion, but without tail-merging:
@ BB#0: @ %entry
cmp r0, r1
it eq
bxeq lr
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ite le
suble r1, r1, r0
subgt r0, r0, r1
cmp r1, r0
bne LBB1_1
@ BB#2: @ %bb17
bx lr
llvm-svn: 279168
Summary:
Inline asm memory constraints can currently have the base or index register
assigned to %r0. Make sure that we assign only ADDR64 registers to the base
and index, since %r0 in an address context reads as the constant zero rather
than the register's contents.
Reviewers: uweigand
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23367
llvm-svn: 279157
Summary:
We need to use floating-point compares to ensure that s_cbranch_vcc*
instructions are always generated. With integer compares, future
optimizations could cause s_cbranch_scc* to be generated instead.
Reviewers: arsenm, nhaehnle
Subscribers: llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23401
llvm-svn: 279148
Re-apply r276044 with an off-by-one-instruction fix for the reload placement.
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 279124
Normally, when an AND with a constant is lowered to NILL, the constant value is truncated to 16 bits. However, since r274066, ANDs whose results are used in a shift are caught by a different pattern that does not truncate. The instruction printer expects a 16-bit unsigned immediate operand for NILL, so this results in an abort.
This patch adds code to manually truncate the constant in this situation. The rest of the bits are then set; since NILL only modifies the low 16 bits of its operand, a mask with all remaining bits set is equivalent, and we then detect a case for NILL "naturally" rather than via peephole optimizations.
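A hypothetical IR sketch of the kind of input involved (illustrative only, not the exact test case; names are made up):
  define i64 @f(i64 %x) {
    %a = and i64 %x, -4   ; mask 0xfffffffffffffffc: all high bits set, low 16 bits 0xfffc for NILL
    %s = shl i64 %a, 10   ; the shift use sends the AND down the non-truncating pattern
    ret i64 %s
  }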
Differential Revision: http://reviews.llvm.org/D21854
llvm-svn: 279105
Remove an unnecessary round-trip:
iterator => operator->() => getIterator()
In some cases, the iterator is end(), so the dereference of operator->
is invalid (UB).
The testcase only crashes with r278974 (currently reverted to
investigate this), which adds an assertion for invalid dereferences of
ilist nodes.
Fixes PR29035.
llvm-svn: 279104
The WebAssembly spec has removed the return value from store instructions, so
remove the associated optimization from LLVM.
This patch leaves the store instruction operands in place for now, so stores
now always write to "$drop"; these will be removed in a separate patch.
llvm-svn: 279100
This patch changes the code structure of
WebAssemblyLowerEmscriptenException pass to support both exception
handling and setjmp/longjmp. It also changes the name of the pass and
the source file.
1. Change the file/pass name from WebAssemblyLowerEmscriptenExceptions to
WebAssemblyLowerEmscriptenEHSjLj, to make it clear that it supports both
EH and SjLj
2. List function / global variable names at the top so they
can be changed easily
3. Some cosmetic changes
Patch by Heejin Ahn
Differential Revision: https://reviews.llvm.org/D23588
llvm-svn: 279075
There is no REM instruction; that will require an expansion.
It's not obvious that it should be done in select, rather than as a
(custom?) legalization.
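For reference, the standard expansion in IR terms (rem in terms of div, mul, and sub; a sketch, not the committed code):
  define i32 @rem(i32 %a, i32 %b) {
    %q = sdiv i32 %a, %b
    %m = mul i32 %q, %b
    %r = sub i32 %a, %m   ; equals srem i32 %a, %b
    ret i32 %r
  }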
llvm-svn: 279074
r277708 enabled tail calls for MIPS but used the 'jr' instruction when the
jump target was held in a register. For MIPSR6, 'jalr $zero, $reg' should
have been used. Additionally, add missing patterns for external and global
symbols for tail calls.
Reviewers: dsanders, vkalintiris
Differential Revision: https://reviews.llvm.org/D23301
llvm-svn: 279064
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=29010
Root cause of the bug is that the register class of the machine instruction operand does not fully reflect which registers can actually be allocated.
For both i386 and x86_64 the operand's register class is VR128RegClass, which contains xmm0-xmm15, though on i386 we can only use xmm0-xmm7.
In order to get the actual allocable registers of the class we need to use RegisterClassInfo.
Differential Revision: https://reviews.llvm.org/D23613
llvm-svn: 278954
Refactored so that a LSRUse owns its fixups, as opposed to letting the
LSRInstance own them. This makes it easier to rate formulas for
LSRUses, since the fixups are available directly. The Offsets vector
has been removed since it was no longer necessary.
New target hook isFoldableMemAccessOffset(), which is used during formula
rating.
For SystemZ, this is useful to express that loads and stores with
float or vector types with a big/negative offset should be avoided in
loops. Without this, LSR will generate a lot of negative offsets that
would require extra instructions for loading the address.
Updated tests:
test/CodeGen/SystemZ/loop-01.ll
Reviewed by: Quentin Colombet and Ulrich Weigand.
https://reviews.llvm.org/D19152
llvm-svn: 278927
This is a quick workaround: in some cases, e.g. when the caller's stack
size is larger than the callee's, we are still able to apply the sibling call
optimization even if the callee has byval args.
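A sketch in IR of the now-permitted shape (hypothetical type and function names; whether the optimization fires still depends on the stack-size condition above):
  %struct.S = type { [4 x i32] }
  declare i32 @callee(%struct.S* byval)
  define i32 @caller(%struct.S* byval %s) {
    %r = tail call i32 @callee(%struct.S* byval %s)   ; sibling call with a byval arg
    ret i32 %r
  }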
This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28328
Reviewers: hfinkel, kbarton, nemanjai, amehsan
Subscribers: hans, tjablin
https://reviews.llvm.org/D23441
llvm-svn: 278900
If AnalyzeBranch can't analyze a block and it is possible to
fallthrough, then duplicating the block doesn't make sense, as only one
block can be the layout predecessor for the un-analyzable fallthrough.
Submitted with a test case, but NOTE: the test case doesn't currently
fail. However, the test case fails with D20505 and would have saved me
some time debugging.
llvm-svn: 278866
This patch handles 64-bit constants which can be encoded as 32-bit immediates.
It extends the functionality added by https://reviews.llvm.org/D11363 for 32-bit constants to 64-bit constants.
Patch by Sunita Marathe!
Differential Revision: https://reviews.llvm.org/D23391
llvm-svn: 278857
Do not reorder and move up a loop latch block before a loop header
when optimising for size, because this will generate an extra
unconditional branch.
Differential Revision: https://reviews.llvm.org/D22521
llvm-svn: 278840
Check both operands for use of the $zero register, which cannot be used with
a compact branch instruction.
Reviewers: dsanders, vkalintiris
Differential Revision: https://reviews.llvm.org/D23547
llvm-svn: 278824
The pipeliner was generating an invalid Phi name for an operand
in the epilog block, which caused an assert in the live variable
analysis pass. The fix is to the code that generates new Phis
in the epilog block. In this case, there is an existing Phi that
needs to be reused rather than creating a new Phi instruction.
Differential Revision: https://reviews.llvm.org/D23513
llvm-svn: 278805
This reverts commit r278660.
It causes downstream assertion failure in InstCombine on shuffle
instructions. Comes up in __mm_swizzle_epi32.
llvm-svn: 278672
The new version has several advantages:
1) IMHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 3, i32 4
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
llvm-svn: 278660
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression, but we do so for all IVs!
Motivating testcase:
void f(float *a, float *b, float *c, int n) {
while (n-- > 0)
*c++ = *a++ + *b++;
}
It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores, and we have to keep both the post-inc and pre-inc values around until the end of the latch, which bloats register usage.
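Sketched in roughly equivalent IR (hand-written for illustration, not compiler output; in this unrotated form the backedge block -- the latch -- is where the increments need to land):
  define void @f(float* %a, float* %b, float* %c, i32 %n0) {
  entry:
    br label %header
  header:                 ; unrotated: the exit test stays in the header
    %n = phi i32 [ %n0, %entry ], [ %n.dec, %latch ]
    %pa = phi float* [ %a, %entry ], [ %pa.next, %latch ]
    %pb = phi float* [ %b, %entry ], [ %pb.next, %latch ]
    %pc = phi float* [ %c, %entry ], [ %pc.next, %latch ]
    %cmp = icmp sgt i32 %n, 0
    br i1 %cmp, label %latch, label %exit
  latch:                  ; increments here allow post-inc loads and stores
    %va = load float, float* %pa
    %vb = load float, float* %pb
    %vs = fadd float %va, %vb
    store float %vs, float* %pc
    %pa.next = getelementptr float, float* %pa, i64 1
    %pb.next = getelementptr float, float* %pb, i64 1
    %pc.next = getelementptr float, float* %pc, i64 1
    %n.dec = add nsw i32 %n, -1
    br label %header
  exit:
    ret void
  }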
llvm-svn: 278658
1. Use a shuffle to insert an i1 element into a vector. The previous implementation was incorrect (dest_bit OR src_bit; it doesn't clear the bit when src_bit = 0) -- see the sketch below.
2. Improve shuffling of i1 vectors: use CVT2MASK if supported, instead of TRUNCATE.
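For illustration, the kind of operation affected by (1) (a minimal sketch; names are hypothetical):
  define <16 x i1> @f(<16 x i1> %v, i1 %bit) {
    %r = insertelement <16 x i1> %v, i1 %bit, i32 5
    ret <16 x i1> %r
  }
With the old OR-based lowering, the destination bit could never be cleared when %bit was 0.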
Differential Revision: http://reviews.llvm.org/D23347
llvm-svn: 278623
LowerTargetConstantPool was not properly setting the TargetFlag to indicate the
desired relocation. Coding error: the offset parameter was omitted, so the
TargetFlag was used as the offset, and the TargetFlag defaulted to zero.
This only affects -fpic compilation, and only those items created in a
Constant Pool, for example a vector of constants. Halide ran into this issue.
llvm-svn: 278614
Summary:
This test was resulting in asan/valgrind failures due to undefined
DWARF register mappings for WebAssembly, and was disabled in r278495.
These have been resolved.
Reviewers: sunfish, dschuff
Subscribers: bkramer, llvm-commits, jfb
Differential Revision: https://reviews.llvm.org/D23459
llvm-svn: 278576
This brings LLVM-generated PTX closer to what nvcc generates and avoids
triggering issues in ptxas.
For instance, ptxas does not accept .s16 (or .u16) registers as operands
for .fp16 instructions.
Differential Revision: https://reviews.llvm.org/D23460
llvm-svn: 278568
Trunk would try to create something like "stp x9, x8, [x0], #512", which isn't actually a valid instruction: the post-index immediate for a 64-bit STP is a signed 7-bit value scaled by 8, so it must lie in the range [-512, 504].
Differential revision: https://reviews.llvm.org/D23368
llvm-svn: 278559
Currently X86ISelLowering has a similar transformation for sexts:
sext(add_nsw(x, C)) --> add(sext(x), C_sext)
In this change I extend this code to handle zexts as well.
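Written out in IR terms for illustration (a sketch of the sext case stated above; the zext case is the analogous fold where the code can prove it safe):
  %a = add nsw i32 %x, 5
  %r = sext i32 %a to i64
  =>
  %x.ext = sext i32 %x to i64
  %r = add i64 %x.ext, 5
The nsw flag guarantees the add cannot wrap in the signed sense, so extending before adding produces the same value.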
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D23359
llvm-svn: 278520
...and the two followup commits:
Revert "[Sparc][Leon] Missed resetting option flags from check-in 278489."
Revert "[Sparc][Leon] Errata fixes for various errata in different
versions of the Leon variants of the Sparc 32 bit processor."
This reverts commit r274856, r278489, and r278492.
llvm-svn: 278511
The PALIGNR target shuffle decode was not taking into account that DecodePALIGNRMask (rather oddly) expects the operands to be in reverse order, nor was it detecting unary patterns, causing combines to proceed with the incorrect input.
The cgbuiltin, auto-upgrade, and instruction comments code correctly swap the operands, so they are not affected.
llvm-svn: 278494
The nature of the errata is described in the comments preceding the errata fix passes. Relevant unit tests are implemented for each of these.
These changes update older versions of these errata fixes with improvements to code and unit tests.
Differential Revision: https://reviews.llvm.org/D21960
llvm-svn: 278489
"insert_subreg, subreg_to_reg, and reg_sequence" instructions' after
adjusting some unittest checks.
This is to solve PR28852. The restriction was added in 2010 to enable better register
coalescing. We assumed that it is no longer necessary. Testing results on x86
supported the assumption.
We will watch closely for any performance impact this brings and will be prepared
to help analyze performance problems found on other architectures.
Differential Revision: https://reviews.llvm.org/D23210
llvm-svn: 278466
To fix PR28014, this patch restricts tail merging to blocks that belong to the
same loop after MBP.
Differential Revision: https://reviews.llvm.org/D23191
llvm-svn: 278463
This helps to improve the memory-folding and register coalescing optimizations.
Also, this patch fixes tracker issue #17229.
Reviewer: Craig Topper.
Differential Revision: https://reviews.llvm.org/D23108
llvm-svn: 278431
It's sharing the integer G_CONSTANT for now since I don't *think* it creates
any ambiguity (even on weird archs). If that turns out wrong we can create a
G_PTRCONSTANT or something.
llvm-svn: 278423