The immediate offset of the non-writeback i8 form (encoding T4) allows
negative offsets only. The positive offset form of the encoding is the
LDRT instruction. Immediate offsets in the range [0,255] use encoding T3
instead.
llvm-svn: 139254
In some cases such as interpreters using indirectbr, the CFG can be very
complicated, and live range splitting may be forced to insert a large
number of phi-defs. When that happens, traceSiblingValue can spend a
lot of time zipping around in the CFG looking for defs and reloads.
This patch causes more information to be cached in SibValues, and the
cached values are used to terminate searches early. This speeds up
spilling by 20x in one interpreter test case. For more typical code,
this is just a 10% speedup of spilling.
llvm-svn: 139247
There is no 16-bit wide encoding, so the .w suffix isn't needed (indeed, isn't
documented as allowed). Also add the missing '!' token on the _UPD
variant.
llvm-svn: 139243
Choose 32-bit vs. 16-bit encoding when there's no .w suffix in post-processing
as match classes are insufficient to handle the context-sensitiveness of
the writeback operand's legality for the 16-bit encodings.
llvm-svn: 139242
(The fix for the related failures on x86 is going to be nastier because we actually need Acquire memoperands attached to the atomic load instrs, etc.)
llvm-svn: 139221
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
llvm-svn: 139159
Now the 'S' instructions, e.g. ADDS, treat S bit as optional operand as well.
Also fix isel hook to correctly set the optional operand.
rdar://10073745
llvm-svn: 139157
Even if there's no mode switch performed, the .code directive should still
be sent to the output streamer. Otherwise, for example, an output asm stream
is not equivalent to the input stream which generated it (a dependency on
the input target triple arm vs. thumb is introduced which was not originally
there).
llvm-svn: 139155
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
llvm-svn: 139140
instructions are more aligned than the CPU requires, and adds some additional
directives, to follow in future patches. Patch by David Meyer!
llvm-svn: 139125
- Drop support for X >u 0, it's equivalent to X != 0 and should be canonicalized into the latter.
- Add X < 1 -> unlikely, which is what instcombine canonicalizes X <= 0 into.
- Add X > -1 -> likely, which is what instcombine canonicalizes X >= 0 into.
llvm-svn: 139110
call. The call may be in the same BB as the landingpad instruction. If that's
the case, then inserting the loads after the landingpad inst, but before the
extractvalues, causes undefined behavior.
llvm-svn: 139088
If we have a chain of zext -> assert_zext -> zext -> use, the first zext would get simplified away because of the later zext, and then the later zext would get simplified away because of the assert. The solution is to teach SimplifyDemandedBits that assert_zext demands all of the high bits of its input, rather than only those demanded by its users. No testcase because the only example I have manifests as llvm-gcc miscompiling LLVM, and I haven't found a smaller case that reproduces this problem.
Fixes <rdar://problem/10063365>.
llvm-svn: 139059
The explanation about a 0 argument being materialized as xor is no
longer valid. Rematerialization will check if EFLAGS is live before
clobbering it.
The code produced by X86TargetLowering::EmitLoweredSelect does not
clobber EFLAGS.
This causes one less testb instruction to be generated in the cmov.ll
test case.
llvm-svn: 139057
slots. This fixes a bug where the number of nodes coming into the PHI node may
not equal the number of predecessors. E.g., two or more landingpad instructions
may require a PHI before reaching the eh.exception and eh.selector instructions.
llvm-svn: 139035
This changes loop unrolling to use the same mechanism for trip count
computation as indvars. This is a stronger check that tends to unroll
more loops. A very common side-effect is that many single iteration
loops will be removed sooner. The real goal was simply to remove
dependence on canonical IVs.
x86 is break even.
ARM performance changes to expect (+ is good):
External/SPEC/CFP2000/183.equake/183.equake +13%
SingleSource/Benchmarks/Dhrystone/fldry +21%
MultiSource/Applications/spiff/spiff +3%
SingleSource/Benchmarks/Stanford/Puzzle -14%
The Puzzle regression is actually an improvement in loop optimization
that defeats GVN: rdar://problem/10065079.
llvm-svn: 139009
Perform the upgrading in steps.
* First, create a map of the invokes to the EH intrinsics.
* Next, take that mapping and determine if the invoke's unwind destination has a
single predecessor. If not, then create a new empty block to hold the new
landingpad instruction.
* Create a landingpad instruction into the uwnind destination. Fill it with the
values from the old selector. Map the old intrinsic calls to the new
landingpad values (there may be multiple landingpad instructions per instrinic
call pairs).
* Go through the old intrinsic calls, create a PHI node when necessary, and then
replace their values with the new values from the landingpad instructions.
* Delete all dead instructions.
* ???
* Profit!
llvm-svn: 138990
to be unreliable on platforms which require memcpy calls, and it is
complicating broader legalize cleanups. It is hoped that these cleanups
will make memcpy byval easier to implement in the future.
llvm-svn: 138977
- On COFF the .lcomm directive has an alignment argument.
- On ELF we fall back to .local + .comm
Based on a patch by NAKAMURA Takumi.
Fixes PR9337, PR9483 and PR10128.
llvm-svn: 138976
An instruction may define part of a register where the other bits are
undefined. In that case, it is safe to rematerialize the instruction.
For example:
%vreg2:ssub_0<def> = VLDRS <cp#0>, 0, pred:14, pred:%noreg, %vreg2<imp-def>
The extra <imp-def> operand indicates that the instruction does not read
the other parts of the virtual register, so a remat is safe.
This patch simply allows multiple def operands for the virtual register.
It is MI->readsVirtualRegister() that determines if we depend on a
previous value so remat is impossible.
llvm-svn: 138953
An instruction that redefines only part of a larger register can never
be rematerialized since the virtual register value depends on the old
value in other parts of the register.
This was fixed for the inline spiller in r138794. This patch fixes the
problem for all register allocators, and includes a small test case.
<rdar://problem/10032939>
llvm-svn: 138944
Added canClobberReachingPhysRegUse() to handle a particular pattern in
which a two-address instruction could be forced to interfere with
EFLAGS, causing a compare to be unnecessarilly cloned.
Fixes rdar://problem/5875261
llvm-svn: 138924
The landingpad instruction is required in the landing pad block. Because we're
not deleting terminating instructions, the invoke may still jump to here (see
Transforms/SCCP/2004-11-16-DeadInvoke.ll). Remove all uses of the landingpad
instruction, but keep it around until code-gen can remove the basic block.
llvm-svn: 138890
When we want encoding T3 (the wide encoding), we can explicitly check for
that and twiddle the CanAcceptCarrySet accordingly. For now, just correctly
handle encodings T1 and T2 when in Thumb2 mode.
llvm-svn: 138879
When the destination register of an add immediate instruction is
explicitly specified, encoding T1 is preferred, else encoding T2 is
preferred.
llvm-svn: 138862
It appears that our use of the imp-use and imp-def flags with
sub-registers is not yet robust enough to support this.
The failing test case is complicated, I am working on a reduction.
<rdar://problem/10044201>
llvm-svn: 138861
- Duplicate some store patterns to their AVX forms!
- Catched a bug while restricting the patterns subtarget, fix it
and update a testcase to check it properly
llvm-svn: 138851
code is inserted to first check if the current stacklet has enough
space. If so, space is allocated by simply decrementing the stack
pointer. Otherwise a runtime routine (__morestack_allocate_stack_space
in libgcc) is called which allocates the required memory from the
heap.
Patch by Sanjoy Das.
llvm-svn: 138818
from DYNAMIC_STACKALLOC.
Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which
will match X86SegAlloca (based on word size) are also added. They
will be custom emitted to inject the actual stack handling code.
Patch by Sanjoy Das.
llvm-svn: 138814
X86. Modify the pass added in the previous patch to call this new
code.
This new prologues generated will call a libgcc routine (__morestack)
to allocate more stack space from the heap when required
Patch by Sanjoy Das.
llvm-svn: 138812
Add a instruction flag: hasPostISelHook which tells the pre-RA scheduler to
call a target hook to adjust the instruction. For ARM, this is used to
adjust instructions which may be setting the 's' flag. ADC, SBC, RSB, and RSC
instructions have implicit def of CPSR (required since it now uses CPSR physical
register dependency rather than "glue"). If the carry flag is used, then the
target hook will *fill in* the optional operand with CPSR. Otherwise, the hook
will remove the CPSR implicit def from the MachineInstr.
llvm-svn: 138810
register dependency (rather than glue them together). This is general
goodness as it gives scheduler more freedom. However it is motivated by
a nasty bug in isel.
When a i64 sub is expanded to subc + sube.
libcall #1
\
\ subc
\ / \
\ / \
\ / libcall #2
sube
If the libcalls are not serialized (i.e. both have chains which are dag
entry), legalizer can serialize them in arbitrary orders. If it's
unlucky, it can force libcall #2 before libcall #1 in the above case.
subc
|
libcall #2
|
libcall #1
|
sube
However since subc and sube are "glued" together, this ends up being a
cycle when the scheduler combine subc and sube as a single scheduling
unit.
The right solution is to fix LegalizeType too chains the libcalls together.
However, LegalizeType is not processing nodes in order so that's harder than
it should be. For now, the move to physical register dependency will do.
rdar://10019576
llvm-svn: 138791