On non-Darwin PPC systems, we currently strip off the register name prefix
prior to instruction printing. So instead of something like this:
mr r3, r4
we print this:
mr 3, 4
The first form is the default on Darwin, and is understood by binutils, but not
yet understood by our integrated assembler. Once our integrated-as understands
full register names as well, this temporary option will be replaced by tying
this functionality to the verbose-asm option. The numeric-only form is
compatible with legacy assemblers and tools, and is also gcc's default on most
PPC systems. On the other hand, it is harder to read, and there are some
analysis tools that expect full register names.
llvm-svn: 194384
formal arguments on the stack and stores created afterwards. We need this to
ensure tail call optimized function calls do not write over the argument area
of the stack before it is read out.
llvm-svn: 194309
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.
Differential Revision: http://llvm-reviews.chandlerc.com/D2074
Reviewed by Andy
llvm-svn: 194306
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a paticular order into a
specified set of registers. Instead it is up tp the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).
Differential Revision: http://llvm-reviews.chandlerc.com/D2009
Reviewed by Andy
llvm-svn: 194293
ARM prologues usually look like:
push {r7, lr}
sub sp, sp, #4
If code size is extremely important, this can be optimised to the single
instruction:
push {r6, r7, lr}
where we don't actually care about the contents of r6, but pushing it subtracts
4 from sp as a side effect.
This should implement such a conversion, predicated on the "minsize" function
attribute (-Oz) since I've yet to find any code it actually makes faster.
llvm-svn: 194264
MorphNodeTo is not safe to call during DAG building. It eagerly
deletes dependent DAG nodes which invalidates the NodeMap. We could
expose a safe interface for morphing nodes, but I don't think it's
worth it. Just create a new MachineNode and replaceAllUsesWith.
My understaning of the SD design has been that we want to support
early target opcode selection. That isn't very well supported, but
generally works. It seems reasonable to rely on this feature even if
it isn't widely used.
llvm-svn: 194102
Submit the basic port of the rest of ARM constant islands code to Mips.
Two test cases are added which reflect the next level of functionality:
constants getting moved to water areas that are out of range from the
initial placement at the end of the function and basic blocks being split to
create water when none exists that can be used. There is a bunch of this
code that is not complete and has been marked with IN_PROGRESS. I will
finish cleaning this all up during the next week or two and submit the
rest of the test cases. I have elminated some code for dealing with
inline assembly because to me it unecessarily complicates things and
some of the newer features of llvm like function attributies and builtin
assembler give me better tools to solve the alignment issues created
there. Also, for Mips16 I even have the option of not doing constant
islands in the present of inline assembler if I chose. When everything
has been completed I will summarize the port and notify people that
are knowledgable regarding the ARM Constant Islands code so they can
review it in it's entirety if they wish.
llvm-svn: 194053
If an inline assembly operand has multiple constraints (e.g. "Ir" for immediate
or register) and an operand modifier (E.g. "w" for "print register as wN") then
we need to decide behaviour when the modifier doesn't apply to the constraint.
Previousely produced some combination of an assertion failure and a fatal
error. GCC's behaviour appears to be to ignore the modifier and print the
operand in the default way. This patch should implement that.
llvm-svn: 194024
Add a Virtualization ARM subtarget feature along with adding proper build
attribute emission for Tag_Virtualization_use (encodes Virtualization and
TrustZone) and Tag_MPextension_use.
Also rework test/CodeGen/ARM/2010-10-19-mc-elf-objheader.ll testcase to
something that is more maintainable. This changes the focus of this
testcase away from testing CPU defaults (which is tested elsewhere), onto
specifically testing that attributes are encoded correctly.
llvm-svn: 193859
Fix Tag_ABI_HardFP_use build attribute to handle single precision FP,
replace deprecated Tag_ABI_HardFP_use value of 3 with 0 and also add
some tests for Tag_ABI_VFP_args.
llvm-svn: 193856
As on other hosts, the CPU identification instruction is priveleged,
so we need to look through /proc/cpuinfo. I copied the PowerPC way of
handling "generic".
Several tests were implicitly assuming z10 and so failed on z196.
llvm-svn: 193742
This adds a new subtarget feature called FPARMv8 (implied by NEON), and
predicates the support of the FP instructions and registers on this feature.
llvm-svn: 193739
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors will run into problems as by the time the destination vector is
legal, the source vector is illegal. The end result is the operation
often becoming scalarized, with the typical horrible performance. For
example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
%tmp = zext <16 x i8> %a to <16 x i32>
store <16 x i32> %tmp, <16 x i32>*%p
ret void
}
Generates:
.section __TEXT,__text,regular,pure_instructions
.section __TEXT,__const
.align 5
LCPI0_0:
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.section __TEXT,__text,regular,pure_instructions
.globl _bar
.align 4, 0x90
_bar:
vpunpckhbw %xmm0, %xmm0, %xmm1
vpunpckhwd %xmm0, %xmm1, %xmm2
vpmovzxwd %xmm1, %xmm1
vinsertf128 $1, %xmm2, %ymm1, %ymm1
vmovaps LCPI0_0(%rip), %ymm2
vandps %ymm2, %ymm1, %ymm1
vpmovzxbw %xmm0, %xmm3
vpunpckhwd %xmm0, %xmm3, %xmm3
vpmovzxbd %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vandps %ymm2, %ymm0, %ymm0
vmovaps %ymm0, (%rdi)
vmovaps %ymm1, 32(%rdi)
vzeroupper
ret
So instead we can check if there are legal types that enable us to split
more cleverly when the input vector is already legal such that we don't
turn it into an illegal type. If the extend is such that it's more than
doubling the size of the input we check if
- the number of vector elements is even,
- the source type is legal,
- the type of a split source is illegal,
- the type of an extended (by doubling element size) source is legal, and
- the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and the continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more effecient, termination condition for the loop of "split the
operation until the destination type is legal."
With this change, the above example now compiles to:
_bar:
vpxor %xmm1, %xmm1, %xmm1
vpunpcklbw %xmm1, %xmm0, %xmm2
vpunpckhwd %xmm1, %xmm2, %xmm3
vpunpcklwd %xmm1, %xmm2, %xmm2
vinsertf128 $1, %xmm3, %ymm2, %ymm2
vpunpckhbw %xmm1, %xmm0, %xmm0
vpunpckhwd %xmm1, %xmm0, %xmm3
vpunpcklwd %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vmovaps %ymm0, 32(%rdi)
vmovaps %ymm2, (%rdi)
vzeroupper
ret
This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.
rdar://14735100
llvm-svn: 193727
With this patch llvm produces a weak_def_can_be_hidden for linkonce_odr
if they are also unnamed_addr or don't have their address taken.
There is not a lot of documentation about .weak_def_can_be_hidden, but
from the old discussion about linkonce_odr_auto_hide and the name of
the directive this looks correct: these symbols can be hidden.
Testing this with the ld64 in Xcode 5 linking clang reduces the number of
exported symbols from 21053 to 19049.
llvm-svn: 193718
Also corrected the definition of the intrinsics for these instructions (the
result register is also the first operand), and added intrinsics for bsel and
bseli to clang (they already existed in the backend).
These four operations are mostly equivalent to bsel, and bseli (the difference
is which operand is tied to the result). As a result some of the tests changed
as described below.
bitwise.ll:
- bsel.v test adapted so that the mask is unknown at compile-time. This stops
it emitting bmnzi.b instead of the intended bsel.v.
- The bseli.b test now tests the right thing. Namely the case when one of the
values is an uimm8, rather than when the condition is a uimm8 (which is
covered by bmnzi.b)
compare.ll:
- bsel.v tests now (correctly) emits bmnz.v instead of bsel.v because this
is the same operation (see MSA.txt).
i8.ll
- CHECK-DAG-ized test.
- bmzi.b test now (correctly) emits equivalent bmnzi.b with swapped operands
because this is the same operation (see MSA.txt).
- bseli.b still emits bseli.b though because the immediate makes it
distinguishable from bmnzi.b.
vec.ll:
- CHECK-DAG-ized test.
- bmz.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
- bsel.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
llvm-svn: 193693
This required correcting the definition of the bins[lr]i intrinsics because
the result is also the first operand.
It also required removing the (arbitrary) check for 32-bit immediates in
MipsSEDAGToDAGISel::selectVSplat().
Currently using binsli.d with 2 bits set in the mask doesn't select binsli.d
because the constant is legalized into a ConstantPool. Similar things can
happen with binsri.d with more than 10 bits set in the mask. The resulting
code when this happens is correct but not optimal.
llvm-svn: 193687
(or (and $a, $mask), (and $b, $inverse_mask)) => (vselect $mask, $a, $b).
where $mask is a constant splat. This allows bitwise operations to make use
of bsel.
It's also a stepping stone towards matching bins[lr], and bins[lr]i from
normal IR.
Two sets of similar tests have been added in this commit. The bsel_* functions
test the case where binsri cannot be used. The binsr_*_i functions will
start to use the binsri instruction in the next commit.
llvm-svn: 193682
splat.d is implemented but this subtest is currently disabled. This is because
it is difficult to match the appropriate IR on MIPS32. There is a patch under
review that should help with this so I hope to enable the subtest soon.
llvm-svn: 193680
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask has
usually the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
Reviewed by Nadav
llvm-svn: 193676
This commit allows the ARM integrated assembler to parse
and assemble the code with .eabi_attribute, .cpu, and
.fpu directives.
To implement the feature, this commit moves the code from
AttrEmitter to ARMTargetStreamers, and several new test
cases related to cortex-m4, cortex-r5, and cortex-a15 are
added.
Besides, this commit also change the Subtarget->isFPOnlySP()
to Subtarget->hasD16() to match the usage of .fpu directive.
This commit changes the test cases:
* Several .eabi_attribute directives in
2010-09-29-mc-asm-header-test.ll are removed because the .fpu
directive already cover the functionality.
* In the Cortex-A15 test case, the value for
Tag_Advanced_SIMD_arch has be changed from 1 to 2,
which is more precise.
llvm-svn: 193524
useAA significantly improves the handling of vector code that has TBAA
information attached. It also helps other cases, as shown by the testsuite
changes here. The only real downside I've seen is that it interferes with
MergeConsecutiveStores. The problem is that that optimization works top
down, starting at the first store in the chain, and looks for cases where
the chain result is only used by a single related store. These related
stores don't alias, so useAA will have rewritten all the later stores to
use a different chain input (typically the same one as the first store).
I think the advantages outweigh the disadvantages though, so for now I've
just disabled alias analysis for the unaligned-01.ll test.
llvm-svn: 193521
Making useAA() default to true for SystemZ showed that the combiner alias
analysis wasn't handling volatile accesses. This hit many of the SystemZ
tests, but I arbitrarily picked one for the purpose of this patch.
llvm-svn: 193518
Most SelectionDAG code drops the TBAA info when creating a new form of a
load and store (e.g. during legalization, or when converting a plain
load to an extending one). This patch tries to catch all cases where
the TBAA information can legitimately be carried over.
The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields. (The corresponding
getTruncStore() already exists.) The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info). If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.
llvm-svn: 193517
Before I just ported the shell of the pass. I've tried to keep everything
nearly identical to the ARM version. I think it will be very easy to eventually
merge these two and create a new more general pass that other targets can
use. I have some improvements I would like to make to allow pools to
be shared across functions and some other things. When I'm all done we
can think about making a more general pass. More to be ported but the
basic mechanism works now almost as good as gcc mips16.
llvm-svn: 193509
There's a barrier instruction so that should still be used, but most actual
atomic operations are going to need a platform decision on the correct
behaviour (either nop if single-threaded or OS-support otherwise).
rdar://problem/15287210
llvm-svn: 193399
ARM processors without ldrex/strex need to be able to make libcalls for all
atomic operations, including the newer min/max versions.
The alternative would probably be expanding these operations in terms of
cmpxchg (as x86 does always), but in the configurations where this matters
code-size tends to be paramount so the libcall is more desirable.
llvm-svn: 193398
The compiler-rt functions __adddf3vfp and so on exist purely to allow Thumb1
code to make use of VFP instructions by switching back to ARM mode, they make
no sense for M-class processors which don't even have an ARM mode.
Given that justification, in practice this is a platform ABI decision so the
actual check is based on that rather than CPU features.
rdar://problem/15302004
llvm-svn: 193327
When generating the IfTrue basic block during the F128CSEL pseudo-instruction
handling, the NZCV live-in for the newly created BB wasn't being added. This
caused a fault during MI-sched/live range calculation when the predecessor
for the fall-through BB didn't have a live-in for phys-reg as expected.
llvm-svn: 193316
On sandy bridge (PR17654) we now get
vpxor %xmm1, %xmm1, %xmm1
vpunpckhbw %xmm1, %xmm0, %xmm2
vpunpcklbw %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm2, %ymm0, %ymm0
On haswell it's a simple
vpmovzxbw %xmm0, %ymm0
There is a maze of duplicated and dead transforms and patterns in this
area. Remove the dead custom lowering of zext v8i16 to v8i32, that's
already handled by LowerAVXExtend.
llvm-svn: 193262
- Skip instructions added in prolog. For specific targets, prolog may
insert helper function calls (e.g. _chkstk will be called when
there're more than 4K bytes allocated on stack). However, these
helpers don't use/def YMM/XMM registers.
llvm-svn: 193261
The SelectionDAGBuilder was promoting vector kernel arguments to legal
types, but this won't work for R600 and SI since kernel arguments are
stored in memory and can't be promoted. In order to handle vector
arguments correctly we need to look at the original types from the LLVM IR
function.
llvm-svn: 193215
The AMDGPUIndirectAddressing pass was previously responsible for
lowering private loads and stores to indirect addressing instructions.
However, this pass was buggy and way too complicated. The only
advantage it had over the new simplified code was that it saved one
instruction per direct write to private memory. This optimization
likely has a minimal impact on performance, and we may be able
to duplicate it using some other transformation.
For the private address space, we now:
1. Lower private loads/store to Register(Load|Store) instructions
2. Reserve part of the register file as 'private memory'
3. After regalloc lower the Register(Load|Store) instructions to
MOV instructions that use indirect addressing.
llvm-svn: 193179
the instruction defenitions and ISEL reflect this.
Prior to this patch these instructions took an i32i8imm, and the high bits were
dropped during encoding. This led to incorrect behavior for shifts by
immediates higher than 255. This patch fixes that issue by detecting large
immediate shifts and returning constant zero (for logical shifts) or capping
the shift amount at an encodable value (for arithmetic shifts).
Fixes <rdar://problem/14968098>
llvm-svn: 193096
This ensures that the prefix data is treated as part of the function for
the purpose of debug info. This provides a better debugging experience,
among other things by allowing a debug info client to correctly look up
a function in debug info given a function pointer.
llvm-svn: 193042
PR17168 describes a test case that fails when compiling for debug with
fast-isel. Investigation showed that the test was failing because a DBG_VALUE
machine instruction was placed prior to a PHI.
For this problem to occur requires the following:
* Compile for debug
* Compile with fast-isel
* In a block B, fast-isel must partially succeed before punting to DAG-isel
* B must start with a PHI
* The first unhandled node in the DAG must not generate a machine instruction
* A debug value with an order less than that of that first node exists
When all of these circumstances apply, the existing test that an instruction
was not inserted won't fire. Currently it tests whether the block is empty,
or whether the last instruction generated is a phi. When fast-isel has
partially succeeded, the last instruction generated will not be a phi.
Instead, we need to check whether the current insert position is immediately
following a phi. This patch adds that check, and adds the test case from the
PR as a regression test.
llvm-svn: 192976
This caused the clang-native-mingw32-win7 buildbot to break.
The assembler was complaining about the following lines that were showing up
in the asm for CrashRecoveryContext.cpp:
movl $"__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4", 4(%eax)
calll "_AddVectoredExceptionHandler@8"
.def "__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4";
"__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4":
calll "_RemoveVectoredExceptionHandler@4"
Reverting for now.
llvm-svn: 192940
This commit implements the correct lowering of the
COPY_STRUCT_BYVAL_I32 pseudo-instruction for thumb1 targets.
Previously, the lowering of COPY_STRUCT_BYVAL_I32 generated the
post-increment forms of ldr/ldrh/ldrb instructions. Thumb1 does not
have the post-increment form of these instructions so the generated
assembly contained invalid instructions.
Passing the generated assembly to gcc caused it to complain with an
error like this:
Error: cannot honor width suffix -- `ldrb r3,[r0],#1'
and the integrated assembler would generate an object file with an
invalid instruction encoding.
This commit contains a small test case that demonstrates the problem
with thumb1 targets as well as an expanded test case that more
throughly tests the lowering of byval struct passing for arm,
thumb1, and thumb2 targets.
llvm-svn: 192916
class. The instruction class includes the signed saturating doubling
multiply-add long, signed saturating doubling multiply-subtract long, and
the signed saturating doubling multiply long instructions.
llvm-svn: 192908
When canonicalizing dags according to the rule
(shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))
remember to add the new shl dag to the DAGCombiner worklist of nodes.
If we don't explicitly add it to the worklist of nodes to visit, we
may not trigger later on the rule that folds the shift left + logical
shift right into a AND instruction with bitmask.
llvm-svn: 192883
Consider the following:
typedef unsigned short ushort4U __attribute__((ext_vector_type(4),
aligned(2)));
typedef unsigned short ushort4 __attribute__((ext_vector_type(4)));
typedef unsigned short ushort8 __attribute__((ext_vector_type(8)));
typedef int int4 __attribute__((ext_vector_type(4)));
int4 __bbase_cvt_int(ushort4 v) {
ushort8 a;
a.lo = v;
return _mm_cvtepu16_epi32(a);
}
This generates the, not unreasonable, IR:
define <4 x i32> @foo0(double %v.coerce) nounwind ssp {
%tmp = bitcast double %v.coerce to <4 x i16>
%tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32
%0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
%tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1)
ret <4 x i32> %tmp2
}
The problem is when type legalization gets hold of the v4i16. It
legalizes that by spilling to the stack, then doing a zero-extending
load. Things go even more silly from there, ending up with something
like:
_foo0:
movsd %xmm0, -8(%rsp) <== Spill to the stack.
movq -8(%rsp), %xmm0 <== Reload it right back out.
pmovzxwd %xmm0, %xmm1 <== Here's what we actually asked for.
pblendw $1, %xmm1, %xmm0 <== We don't need this at all
pmovzxwd %xmm0, %xmm0 <== We already did this
ret
The v8i8 to v8i16 zext intrinsic gives even worse results, with two
table lookups via pshufb instructions(!!).
To avoid all that, we can move the bitcasting until after we've formed
the wider (legal) vector type. Then our normal codegen flows along
nicely and we get the expected:
_foo0:
pmovzxwd %xmm0, %xmm0
ret
rdar://15245794
llvm-svn: 192866
The reason this got reverted was that the @feat.00 symbol which was emitted
for every TU became quoted, and on cygwin/mingw we use the gas assembler which
couldn't handle the quotes.
This commit fixes the problem by only emitting @feat.00 for win32, where we use
clang -cc1as to assemble. gas would just drop this symbol anyway, so there is no
loss there.
With @feat.00 gone, there shouldn't be quoted symbols showing up on cygwin since
it uses the Itanium ABI, which doesn't put these funny characters in symbols.
> Because of win32 mangling, we produce symbol and section names with
> funny characters in them, most notably @ characters.
>
> MC would choke on trying to parse its own assembly output. This patch addresses
> that by:
>
> - Making @ trigger quoting of symbol names
> - Also quote section names in the same way
> - Just parse section names like other identifiers (to allow for quotes)
> - Don't assume @ signifies a symbol variant if it is in a string.
llvm-svn: 192859
bulldozer and piledriver. Support for the instruction itself seems to have
already been added in r178040.
Differential Revision: http://llvm-reviews.chandlerc.com/D1933
llvm-svn: 192828
We were calling llvm_unreachable() when failing to optimize the
branch into if case. However, it is still possible for us
to structurize the CFG by duplicating blocks even if this optimization
fails.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192813
This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly
because i64 is illegal. It would be nice if getNOT would handle this
transparently, but I don't see a way to generate a legal constant there right
now. Fixes PR17487.
llvm-svn: 192795
We previously used the default expansion to SELECT_CC, which in turn would
expand to "LHI; BRC; LHI". In most cases it's better to use an IPM-based
sequence instead.
llvm-svn: 192784
This is really an extension of the current (shl (shr ...)) -> shl optimization.
The main difference is that certain upper bits must also not be demanded.
The motivating examples are the first two in the testcase, which occur
in llvmpipe output.
llvm-svn: 192783
GNU AS didn't like quotes in symbol names.
Error: junk at end of line, first unrecognized character is `"'
.def "@feat.00";
"@feat.00" = 1
Reproduced on Cygwin's 2.23.52.20130309 and mingw32's 2.20.1.20100303.
llvm-svn: 192775
Because of win32 mangling, we produce symbol and section names with
funny characters in them, most notably @ characters.
MC would choke on trying to parse its own assembly output. This patch addresses
that by:
- Making @ trigger quoting of symbol names
- Also quote section names in the same way
- Just parse section names like other identifiers (to allow for quotes)
- Don't assume @ signifies a symbol variant if it is in a string.
Differential Revision: http://llvm-reviews.chandlerc.com/D1945
llvm-svn: 192758
This changes the SelectionDAG scheduling preference to source
order. Soon, the SelectionDAG scheduler can be bypassed saving
a nice chunk of compile time.
Performance differences that result from this change are often a
consequence of register coalescing. The register coalescer is far from
perfect. Bugs can be filed for deficiencies.
On x86 SandyBridge/Haswell, the source order schedule is often
preserved, particularly for small blocks.
Register pressure is generally improved over the SD scheduler's ILP
mode. However, we are still able to handle large blocks that require
latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also
attempts to discover the critical path in single-block loops and
adjust heuristics accordingly.
The MI scheduler relies on the new machine model. This is currently
unimplemented for AVX, so we may not be generating the best code yet.
Unit tests are updated so they don't depend on SD scheduling heuristics.
llvm-svn: 192750
- Type of index used in extract_vector_elt or insert_vector_elt supposes
to be TLI.getVectorIdxTy() which is pointer type on most targets. It'd
better to truncate (or zero-extend in case it's changed later) it to
mask element type to guarantee they are matching instead of asserting
that.
llvm-svn: 192722
- Lower signed division by constant powers-of-2 to target-independent
DAG operators instead of target-dependent ones to support them better
on targets where vector types are legal but shift operators on that
types are illegal. E.g., on AVX, PSRAW is only available on <8 x i16>
though <16 x i16> is a legal type.
llvm-svn: 192721
rdar:15221834 False AVX register dependencies cause 5x slowdown on
flops-5/6 and significant slowdown on several others.
This was blocking the switch to MI-Sched.
llvm-svn: 192669
through bitcast, ptrtoint, and inttoptr instructions. This is valid
only if the related instructions are in that same basic block, otherwise
we may reference variables that were not live accross basic blocks
resulting in undefined virtual registers.
The bug was exposed when both SDISel and FastISel were used within the same
function, i.e., one basic block is issued with FastISel and another with SDISel,
as demonstrated with the testcase.
<rdar://problem/15192473>
llvm-svn: 192636
a) x86-64 TLS has been documented
b) the code path should use movq for the correct relocation
to be generated.
I've also added a fixme for the test case that we should improve
the code generated, it should look something like is documented
in the tls abi document.
llvm-svn: 192631
Per original comment, the intention of this loop
is to go ahead and break the critical edge
(in order to sink this instruction) if there's
reason to believe doing so might "unblock" the
sinking of additional instructions that define
registers used by this one. The idea is that if
we have a few instructions to sink "together"
breaking the edge might be worthwhile.
This commit makes a few small changes
to help better realize this goal:
First, modify the loop to ignore registers
defined by this instruction. We don't
sink definitions of physical registers,
and sinking an SSA definition isn't
going to unblock an upstream instruction.
Second, ignore uses of physical registers.
Instructions that define physical registers are
rejected for sinking, and so moving this one
won't enable moving any defining instructions.
As an added bonus, while virtual register
use-def chains are generally small due
to SSA goodness, iteration over the uses
and definitions (used by hasOneNonDBGUse)
for physical registers like EFLAGS
can be rather expensive in practice.
(This is the original reason for looking at this)
Finally, to keep things simple continue
to only consider this trick for registers that
have a single use (via hasOneNonDBGUse),
but to avoid spuriously breaking critical edges
only do so if the definition resides
in the same MBB and therefore this one directly
blocks it from being sunk as well.
If sinking them together is meant to be,
let the iterative nature of this pass
sink the definition into this block first.
Update tests to accomodate this change,
add new testcase where sinking avoids pipeline stalls.
llvm-svn: 192608
When if converting something like:
true:
... = R0<kill>
false:
... = R0<kill>
then the instructions of the true block must not have a <kill> flag
anymore, as the instruction of the false block follow and do still read
the R0 value.
Specifically this patch determines the set of register live-in in the
false block (possibly after simulating the liveness changes of the
duplicated instructions). Each of these live-in registers mustn't be
killed.
llvm-svn: 192482
This should fix the buildbots.
Original commit message:
[DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.
E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32
into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.
One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.
<rdar://problem/14477220>
llvm-svn: 192476
This reverts r192454
Apparently FileCheck isn't as smart as I though and does not enforce a
topological order between variable defs+uses.
llvm-svn: 192472
other in memory and the target has paired load and performs post-isel loads
combining.
E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32
into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.
One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.
<rdar://problem/14477220>
llvm-svn: 192471
For NVPTX, this fixes a crash where the emitImplicitDef implementation was expecting physical registers,
while NVPTX uses virtual registers (with a couple of exceptions). Now, the implicit def comment will be
emitted as a true PTX register name. Other targets can use this to customize the output of implicit def
comments.
Fixes PR17519
llvm-svn: 192444
When a ConstantExpr which uses a thread local is part of a PHI node
instruction, the insruction that replaces the ConstantExpr must
be inserted in the predecessor block, in front of the terminator instruction.
If the predecessor block has multiple successors, the edge is first split.
llvm-svn: 192432
We can't enable the verifier for tests with SI_IF and SI_ELSE, because
these instructions are always followed by a COPY which copies their
result to the next basic block. This violates the machine verifier's
rule that non-terminators can not folow terminators.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192366
Including following 14 instructions:
4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4).
llvm-svn: 192361
Including following 14 instructions:
4 ld1 insts: load multiple 1-element structure to sequential 1/2/3/4 registers.
ld2/ld3/ld4: load multiple N-element structure to sequential N registers (N=2,3,4).
4 st1 insts: store multiple 1-element structure from sequential 1/2/3/4 registers.
st2/st3/st4: store multiple N-element structure from sequential N registers (N = 2,3,4).
llvm-svn: 192352
When we had a sequence like:
s1 = VLDRS [r0, 1], Q0<imp-def>
s3 = VLDRS [r0, 2], Q0<imp-use,kill>, Q0<imp-def>
s0 = VLDRS [r0, 0], Q0<imp-use,kill>, Q0<imp-def>
s2 = VLDRS [r0, 4], Q0<imp-use,kill>, Q0<imp-def>
we were gathering the {s0, s1} loads below the s3 load. This is fine,
but confused the verifier since now the s3 load had Q0<imp-use> with
no definition above it.
This should mark such uses <undef> as well. The liveness structure at
the beginning and end of the block is unaffected, and the true sN
definitions should prevent any dodgy reorderings being introduced
elsewhere.
rdar://problem/15124449
llvm-svn: 192344
Substantial SelectionDAG scheduling is going away soon, and is
interfering with Hao's attempts to implement LDn/STn instructions, so
I say we make the leap first.
There were a few reorderings (inevitably) which broke some tests. I
tried to replace them with CHECK-DAG variants mostly, but some too
complex for that to be useful and I just reordered them.
llvm-svn: 192282
from struct byval to registers.
We used to pass 0 which means the alignment of PtrVT. Even when the alignment
of the struct is smaller than 4, the LOADs would have alignment of 4, and
further optimizations could combine the LOADs into a ldm, which would
cause crash.
The fix is to pass the alignment of the struct byval.
rdar://problem/15144402
llvm-svn: 192126
The most likely case where this error happens is when the user specifies
too many register operands. Don't make it look like an internal LLVM bug
when we can see that the error is coming from an inline asm instruction.
For other instructions we keep the "ran out of registers" error.
llvm-svn: 192041
optimizeSelect folds (predicated) copy instructions, it must not ignore
the original register class of the operand when replacing the register
with the copies dest register.
llvm-svn: 191963
The jump doesn't really kill the registers, the following call does but
we never get back anyway.
This avoids some verify-machineinstrs problems when TAILJUMPs are
if-converted.
llvm-svn: 191962
This function-attribute modifies the callee-saved register list and function
epilogue (specifically the return instruction) so that a routine is suitable
for use as an interrupt-handler of the specified type without disrupting
user-mode applications.
rdar://problem/14207019
llvm-svn: 191766
This just adds the basics necessary for allocating the upper words to
virtual registers (move, load and store). The move support is parameterised
in a way that makes it easy to handle zero extensions, but the associated
zero-extend patterns are added by a later patch.
The easiest way of testing this seemed to be add a new "h" register
constraint for high words. I don't expect the constraint to be useful
in real inline asms, but it should work, so I didn't try to hide it
behind an option.
llvm-svn: 191739
We were completely ignoring the unorder/ordered attributes of condition
codes and also incorrectly lowering seto and setuo.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 191603
of loops.
Previously, two consecutive calls to function "func" would result in the
following sequence of instructions:
1. load $16, %got(func)($gp) // load address of lazy-binding stub.
2. move $25, $16
3. jalr $25 // jump to lazy-binding stub.
4. nop
5. move $25, $16
6. jalr $25 // jump to lazy-binding stub again.
With this patch, the second call directly jumps to func's address, bypassing
the lazy-binding resolution routine:
1. load $25, %got(func)($gp) // load address of lazy-binding stub.
2. jalr $25 // jump to lazy-binding stub.
3. nop
4. load $25, %got(func)($gp) // load resolved address of func.
5. jalr $25 // directly jump to func.
llvm-svn: 191591
Remove the command line argument "struct-path-tbaa" since we should not depend
on command line argument to decide which format the IR file is using. Instead,
we check the first operand of the tbaa tag node, if it is a MDNode, we treat
it as struct-path aware TBAA format, otherwise, we treat it as scalar TBAA
format.
When clang starts to use struct-path aware TBAA format no matter whether
struct-path-tbaa is no, and we can auto-upgrade existing bc files, the support
for scalar TBAA format can be dropped.
Existing testing cases are updated to use the struct-path aware TBAA format.
llvm-svn: 191538
The backend tries to use block operations like MVC, NC, OC and XC for
simple scalar operations. For correctness reasons, it rejects any case
in which the regions might partially overlap. However, for performance
reasons, it should also reject cases where the regions might be equal,
since the instruction might then not use the fast path.
This fixes a performance regression seen in bzip2. We may want to limit
the optimisation even more in future, or even remove it entirely, but I'll
try with this for now.
llvm-svn: 191525
The backend previously folded offsets into PC-relative addresses
whereever possible. That's the right thing to do when the address
can be used directly in a PC-relative memory reference (using things
like LRL). But if we have a register-based memory reference and need
to load the PC-relative address separately, it's better to use an anchor
point that could be shared with other accesses to the same area of the
variable.
Fixes a FIXME.
llvm-svn: 191524
For v4f32 and v2f64, EXTRACT_VECTOR_ELT is matched by a pseudo-insn which may
be expanded to subregister copies and/or instructions as appropriate.
llvm-svn: 191514
This change fixes the problem reported in pr17380 and re-add the dagcombine
transformation ensuring that the value types are always legal if the
transformation is triggered after Legalization took place.
Added the test case from pr17380.
llvm-svn: 191509
When generating code for shared libraries, even local calls may be
intercepted, so we need a nop after the call for the linker to fix up the
TOC. Test case adapted from the one provided in PR17354.
llvm-svn: 191440
Generally, it is desirable to distribute (a + b) * c to a*c + b*c for
ARM with VMLx forwarding, where a, b and c are vectors.
However, for (a + b)*(a + b), distribution will result in one extra
instruction.
With distribution:
x = a + b (add)
y = a * x (mul)
z = y + b * y (mla)
Without distribution:
x = a + b (add)
z = x * x (mul)
This patch checks if a mul is a square of add/sub. If yes, skip
distribution.
llvm-svn: 191410
(shl (zext (shr A, X)), X) => (zext (shl (shr A, X), X)).
The rule only triggers when there are no other uses of the
zext to avoid materializing more instructions.
This helps the DAGCombiner understand that the shl/shr
sequence can then be converted into an and instruction.
llvm-svn: 191393
PEI inserts a save/restore sequence for the link register, according to the
information it gets from the MachineRegisterInfo.
MachineRegisterInfo is populated by the VirtRegMap pass.
This pass was not aware of noreturn calls and was registering the definitions of
these calls the same way as regular operations.
Modify VirtRegPass so that it does not set the isPhysRegUsed information for
registers only defined by noreturn calls.
The rational is that a noreturn call is the "last instruction" of the program
(if it returns the behavior is undefined), so everything that is defined by it
cannot be used and will not interfere with anything else. Therefore, it is
pointless to account for then.
llvm-svn: 191349
This is being disabled because it is no longer needed for
performance. It is only used by postRAscheduler which is also planned
for removal, and it is implemented with an out-dated view of register
liveness. It consideres aliases instead of register units, assumes
valid kill flags, and assumes implicit uses on partial register
defs. Kill flags and implicit operands are error prone and impossible
to verify. We should gradually eliminate dependence on them in the
postRA phases.
Targets that still benefit from this should move to the MI
scheduler. If that doesn't solve the problem, then we should add a
hook to regalloc to optimize reload placement.
llvm-svn: 191348
Most constant BUILD_VECTOR's are matched using ComplexPatterns which cover
bitcasted as well as normal vectors. However, it doesn't seem to be possible to
match ldi.[bhwd] in a type-agnostic manner (e.g. to support the widest range of
immediates, it should be possible to use ldi.b to load v2i64) using TableGen so
ldi.[bhwd] is matched using custom code in MipsSEISelDAGToDAG.cpp
This made the majority of the constant splat BUILD_VECTOR lowering redundant.
The only transformation remaining for constant splats is when an (up-to) 32-bit
constant splat is possible but the value does not fit into a 10-bit signed
integer. In this case, the BUILD_VECTOR is transformed into a bitcasted
BUILD_VECTOR so that fill.[bhw] can be used to splat the vector from a GPR32
register (which is initialized using the usual lui/addui sequence).
There are no additional tests since this is a re-implementation of previous
functionality. The change is intended to make it easier to implement some of
the upcoming instruction selection patches since they can rely on existing
support for BUILD_VECTOR's in the DAGCombiner.
compare_float.ll changed slightly because a BITCAST is no longer
introduced during legalization.
llvm-svn: 191299
Patch by Ana Pazos.
1.Added support for v1ix and v1fx types.
2.Added Scalar Pairwise Reduce instructions.
3.Added initial implementation of Scalar Arithmetic instructions.
llvm-svn: 191263
Sometimes a copy from a vreg -> vreg sneaks into the middle of a terminator
sequence. It is safe to slice this into the stack protector success bb.
This fixes PR16979.
llvm-svn: 191260
The recursive nature of the address selection code can cause the stack to
explode if there is a long chain of GEPs. Convert the recursive bit into a
iterative method to avoid this.
<rdar://problem/12445434>
llvm-svn: 191252
Changes to MIPS SelectionDAG:
* Added nodes VEXTRACT_[SZ]EXT_ELT to represent extract and extend in a single
operation and implemented the DAG combines necessary to fold sign/zero
extends into the extract.
llvm-svn: 191199
Note: There's a later patch on my branch that re-implements this to select
build_vector without the custom SelectionDAG nodes. The future patch avoids
the constant-folding problems stemming from the custom node (i.e. it doesn't
need to re-implement all the DAG combines related to BUILD_VECTOR).
Changes to MIPS specific SelectionDAG nodes:
* Added VSPLAT
This is a special case of BUILD_VECTOR that covers the case the
BUILD_VECTOR is a splat operation.
* Added VSPLATD
This is a special case of VSPLAT that handles the cases when v2i64 is legal
llvm-svn: 191191
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.
Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.
This should fix PR15840.
llvm-svn: 191165
In AVX 256bit vectors are valid vectors and therefore the Type Legalizer doesn't
split the VSELECT and SETCC nodes. AVX only supports MIN/MAX on 128bit vectors
and this fix enables vector splitting for this special case in the X86 DAG
Combiner.
This fix is related to PR16695, PR17002, and <rdar://problem/14594431>.
llvm-svn: 191131
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask for the given target. This mask has usually
te same size as the VSELECT return type (except for Intel KNL). Now the type
legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
llvm-svn: 191130
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner in the way that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.
This fixes PR16726.
llvm-svn: 191049
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner in the way that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.
This fixes PR16726.
llvm-svn: 191045
When selecting the DAG (add (WrapperRIP ...), (FrameIndex ...)), X86 code had
spotted the FrameIndex possibility and was working out whether it could fold
the WrapperRIP into this.
The test for forming a %rip version is notionally whether we already have a
base or index register (%rip precludes both), but we were forgetting to account
for the register that would be inserted later to access the frame.
rdar://problem/15024520
llvm-svn: 190995
1) make sure that the first two instructions of the sequence cannot
separate from each other. The linker requires that they be sequential.
If they get separated, it can still work but it will not work in all
cases because the first of the instructions mostly involves the hi part
of the pc relative offset and that part changes slowly. You would have
to be at the right boundary for this to matter.
2) make sure that this sequence begins on a longword boundary.
There appears to be a bug in binutils which makes some of these calculations
get messed up if the instruction sequence does not begin on a longword
boundary. This is being investigated with the appropriate binutils folks.
llvm-svn: 190966
For some reason I never got around to adding these at the same time as
the signed versions. No idea why.
I'm not sure whether this SystemZII::BranchC* stuff is useful, or whether
it should just be replaced with an "is normal" flag. I'll leave that
for later though.
There are some boundary conditions that can be tweaked, such as preferring
unsigned comparisons for equality with [128, 256), and "<= 255" over "< 256",
but again I'll leave those for a separate patch.
llvm-svn: 190930
Summary:
We indicate that the object files are safe by emitting a @feat.00
absolute address symbol. The address is presumably interpreted as a
bitfield of features that the compiler would like to enable. Bit 0 is
documented in the PE COFF spec to opt in to "registered SEH", which is
what /safeseh enables.
LLVM's object files are safe by default because LLVM doesn't know how to
produce SEH handlers.
Reviewers: Bigcheese
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1691
llvm-svn: 190898
Large code model on PPC64 requires creating and referencing TOC entries when
using the addis/ld form of addressing. This was not being done in all cases.
The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this. Two
test cases are also modified to reflect this requirement.
Fast-isel was not creating correct code for loading floating-point constants
using large code model. This also requires the addis/ld form of addressing.
Previously we were using the addis/lfd shortcut which is only applicable to
medium code model. One test case is modified to reflect this requirement.
llvm-svn: 190882
When a truncate node defines a legal vector type but uses an illegal
vector type, the legalization process was splitting the vector until
<1 x vector> type, but then it was failing to scalarize the node because
it did not know how to handle TRUNCATE.
<rdar://problem/14989896>
llvm-svn: 190830
- check that -mcpu=slm uses the call register indirect optimization
- check that -mcpu=slm runs the scheduler
- check that -mcpu=slm supports the movbe instruction
llvm-svn: 190814
The port originally had special patterns for extload, mapping them to the
same instructions as sextload. It seemed neater to have patterns that
match "an extension that is allowed to be signed" and "an extension that
is allowed to be unsigned".
This was originally meant to be a clean-up, but it does improve the handling
of promoted integers a little, as shown by args-06.ll.
llvm-svn: 190777
This is a re-commit of r190764, with an extra check to make sure that we're not
performing the transformation on illegal types (a small test case has been
added for this as well).
Original commit message:
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190771
This is causing test-suite failures.
Original commit message:
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190765
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190764
DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we
can't use AA in this case (if we try, then the casting code in AA will assert).
llvm-svn: 190763
When a structure is passed by value, and that structure contains a vector
member, according to the PPC ABI, the structure will receive enhanced alignment
(so that the vector within the structure will always be aligned).
This should resolve PR16641.
llvm-svn: 190636
In fast-math mode sqrt(x) is calculated using the fast expansion of the
reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal
sqrt expansions use the associated estimate instructions along with some Newton
iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN,
which is not correct. Now we explicitly return a result of zero if the input is
zero.
llvm-svn: 190624
Aggressive anti-dependency breaking is enabled by default for all PPC cores.
This provides a general speedup on the P7 and other platforms (among other
factors, the instruction group formation for the non-embedded PPC cores is done
during post-RA scheduling). In order to do this safely, the incompatibility
between uses of the MFOCRF instruction and anti-dependency breaking are
resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed
FIXME, the problem was that MFOCRF's output is sensitive to the identify of the
source register, and always paired with a shift to undo this effect. Because
anti-dependency breaking is unaware of this hidden dependency of the shift
amount on the source register of the MFOCRF instruction, changing that register
must be inhibited.
Two test cases were adjusted: The SjLj test was made more insensitive to
register choices and scheduling; the saveCR test disabled anti-dependency
breaking because part of what it is testing is proper register reuse.
llvm-svn: 190587
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.
The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.
The maximum number of input SGPRs is bumped to 17.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575
The main complication here is that TM and TMY (the memory forms) set
CC differently from the register forms. When the tested bits contain
some 0s and some 1s, the register forms set CC to 1 or 2 based on the
value the uppermost bit. The memory forms instead set CC to 1
regardless of the uppermost bit.
Until now, I've tried to make it so that a branch never tests for an
impossible CC value. E.g. NR only sets CC to 0 or 1, so branches on the
result will only test for 0 or 1. Originally I'd tried to do the same
thing for TM and TMY by using custom matching code in ISelDAGToDAG.
That ended up being very ugly though, and would have meant duplicating
some of the chain checks that the common isel code does.
I've therefore gone for the simpler alternative of adding an extra
operand to the TM DAG opcode to say whether a memory form would be OK.
This means that the inverse of a "TM;JE" is "TM;JNE" rather than the
more precise "TM;JNLE", just like the inverse of "TMLL;JE" is "TMLL;JNE".
I suppose that's arguably less confusing though...
llvm-svn: 190400
Fix XCoreLowerThreadLocal trying to initialise globals
which have no initializer.
Add handling of const expressions containing thread local variables.
These need to be replaced with instructions, as the thread ID is
used to access the thread local variable.
llvm-svn: 190300
This sidesteps a bug in PrescheduleNodesWithMultipleUses() which
does not check if callResources will be affected by the transformation.
llvm-svn: 190299
We used to generate the compact unwind encoding from the machine
instructions. However, this had the problem that if the user used `-save-temps'
or compiled their hand-written `.s' file (with CFI directives), we wouldn't
generate the compact unwind encoding.
Move the algorithm that generates the compact unwind encoding into the
MCAsmBackend. This way we can generate the encoding whether the code is from a
`.ll' or `.s' file.
<rdar://problem/13623355>
llvm-svn: 190290
precision loads and stores as well as reg+imm double precision loads and stores.
Previously, expansion of loads and stores was done after register allocation,
but now it takes place during legalization. As a result, users will see double
precision stores and loads being emitted to spill and restore 64-bit FP registers.
llvm-svn: 190235
Field 2 of DIType (Context), field 9 of DIDerivedType (TypeDerivedFrom),
field 12 of DICompositeType (ContainingType), fields 2, 7, 12 of DISubprogram
(Context, Type, ContainingType).
llvm-svn: 190205
Occasionally DAGCombiner can spot that a SETCC operation is completely
redundant and reduce it to "all true" or "all false". If this happens to a
vector, the value produced has to take account of what a normal comparison
would have produced, which may be an all-1s bitmask.
The fix in SelectionDAG.cpp is tested, however, as far as I can see the code in
TargetLowering.cpp is possibly unreachable and almost certainly irrelevant when
triggered so there are no tests. However, I believe it's still clearly the
right change and may save someone else some hassle if it suddenly becomes
reachable. So I'm doing it anyway.
llvm-svn: 190147
The architecture has many comparison instructions, including some that
extend one of the operands. The signed comparison instructions use sign
extensions and the unsigned comparison instructions use zero extensions.
In cases where we had a free choice between signed or unsigned comparisons,
we were trying to decide at lowering time which would best fit the available
instructions, taking things like extension type into account. The code
to do that was getting increasingly hairy and was also making some bad
decisions. E.g. when comparing the result of two LLCs, it is better to use
CR rather than CLR, since CR can be fused with a branch while CLR can't.
This patch removes the lowering code and instead adds an operand to
integer comparisons to say whether signed comparison is required,
whether unsigned comparison is required, or whether either is OK.
We can then leave the choice of instruction up to the normal isel code.
llvm-svn: 190138
If the DAG already has only legal types, then the second round of DAG combines
is skipped. In this case VSELECT+SETCC patterns that match a more efficient
instruction (e.g. min/max) are never recognized.
This fix allows VSELECT+SETCC combines if the types are already legal before DAG
type legalization.
Reviewer: Nadav
llvm-svn: 190105
Solution is not sufficient to prevent 'mov pc, lr' being emitted for jump table code.
Test case doesn't trigger the added functionality.
llvm-svn: 190047
This improves code generation for jump tables by avoiding the emission of "mov pc, lr" which could fool the processor into believing this is a return from a function causing mispredicts. The code generation logic for jump tables uses ADR to materialize the address of the jump target.
Patch by Daniel Stewart!
llvm-svn: 190043
In sparc, setjmp stores only the registers %fp, %sp, %i7 and %o7. longjmp restores
the stack, and the callee-saved registers (all local/in registers: %i0-%i7, %l0-%l7)
using the stored %fp and register windows. However, this does not guarantee that the longjmp
will restore the registers, as they were when the setjmp was called. This is because these
registers may be clobbered after returning from setjmp, but before calling longjmp.
This patch prevents the registers %i0-%i5, %l0-l7 to live across the setjmp call using the register mask.
llvm-svn: 190033
Fast register pressure tracking currently only takes effect during
bottom up scheduling. Forcing this is a bit faster and simpler for
targets that don't have many scheduling constraints and don't need
top-down scheduling.
llvm-svn: 190014
'Force' values in registers using the calling convention. Now, we only depend on
the calling convention and that the allocator performs copy coalescing.
llvm-svn: 189985
This reverts commit r189648.
Fixes for the previously failing clang-side arm_neon_intrinsics test
cases will be checked in separately.
llvm-svn: 189841
For now this just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.
llvm-svn: 189819
What we really want is to enable Swift by default for *v7s triples (and there already seems to be some logic which attempts to do that). In that case the iOS version doesn't matter.
llvm-svn: 189763
don't exist in libc. This is really not the right way to solve this problem;
but it's not clear to me at this time exactly what is the right way.
If we create stubs here, they will cause link errors because these functions
do not exist in libc.
llvm-svn: 189727
This patch adds fast-isel support for calls (but not intrinsic calls
or varargs calls). It also removes a badly-formed assert. There are
some new tests just for calls, and also for folding loads into
arguments on calls to avoid extra extends.
llvm-svn: 189701
has hard float, when you compile the mips32 code you have to make sure
that it knows to compile any mips32 routines as hard float. I need to clean
up the way mips16 hard float is specified but I need to first think through
all the details. Mips16 always has a form of soft float, the difference being
whether the underlying hardware has floating point. So it's not really
necessary to pass the -soft-float to llvm since soft-float is always true
for mips16 by virtue of the fact that it will not register floating point
registers. By using this fact, I can simplify the way this is all handled.
llvm-svn: 189690
Yet another chunk of fast-isel code. This one handles various
conversions involving floating-point. (It also includes some
miscellaneous handling throughout the back end for LWA_32 and LWAX_32
that should have been part of the load-store patch.)
llvm-svn: 189677
Created SUPressureDiffs array to hold the per node PDiff computed during DAG building.
Added a getUpwardPressureDelta API that will soon replace the old
one. Compute PressureDelta here from the precomputed PressureDiffs.
Updating for liveness will come next.
llvm-svn: 189640
This is the next big chunk of fast-isel code. The primary purpose is
to implement selection of loads and stores, but there is a lot of
drag-along to support this. The common code to analyze addresses for
both loads and stores is substantial. It's also necessary to add the
materialization code for global values.
Related to load-store processing is the code to fold loads into
integer extends, since otherwise we generate lots of redundant
instructions. We also need to add some overrides to some FastEmit
routines to ensure we don't assign GPR 0 to a virtual register when
this would change the meaning of an instruction.
I added handling selection of a few binary arithmetic instructions, to
enable committing some test cases I wrote a while back.
Finally, ap couple of miscellaneous changes:
* I cleaned up some poor style from a previous patch in
PPCISelLowering.cpp, pointed out by David Blaikie.
* I enlarged the Addr.Offset field to avoid sign problems with 32-bit
offsets.
llvm-svn: 189636
In addition to recognizing when the multiply's second argument is
coming from an explicit VDUPLANE, also look for a plain scalar
f32 reference and reference it via the corresponding vector
lane.
rdar://14870054
llvm-svn: 189619
The usual default of "dmb ish" (inner-shareable) isn't even a valid instruction
on v6M or v7M (well, it does the same thing but software is strongly
discouraged from using it) so we should emit a full-system barrier there.
llvm-svn: 189483
The vqdmlal and vqdmlls instructions are really just a fused pair consisting of
a vqdmull.sN and a vqadd.sN. This adds patterns to LLVM so that we can switch
Clang's CodeGen over to generating these instead of the special vqdmlal
intrinsics.
llvm-svn: 189480
These intrinsics are legalized to V(ALL|ANY)_(NON)?ZERO nodes,
are matched as SN?Z_[BHWDV]_PSEUDO pseudo's, and emitted as
a branch/mov sequence to evaluate to 0 or 1.
Note: The resulting code is sub-optimal since it doesnt seem to be possible
to feed the result of an intrinsic directly into a brcond. At the moment
it uses (SETCC (VALL_ZERO $ws), 0, SETEQ) and similar which unnecessarily
evaluates the boolean twice.
llvm-svn: 189478
For now just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.
llvm-svn: 189469
The MSA control registers have been added as reserved registers,
and are only used via ISD::Copy(To|From)Reg. The intrinsics are lowered
into these nodes.
llvm-svn: 189468
We want to convert code like (or (srl N, 8), (shl N, 8)) into (srl (bswap N),
const), but this is only valid if the bits above 16 on the source pattern are
0, the checks we were doing on this were slightly wrong before.
llvm-svn: 189348
These instructions aren't particularly complicated and it's well worth having
patterns for some reasonably useful LLVM IR that will match them. Soon we
should be able to switch Clang over to producing this natural version.
llvm-svn: 189335
Note that all of these tests use ld.b and st.b for the loads and stores
regardless of the data size. This is because the definition of bitcast is
equivalent to a store/load sequence and DAG combiner accordingly folds bitcasts
to/from v16i8 into the load/store nodes to product load/store nodes with
type v16i8.
llvm-svn: 189333
Lengths up to a certain threshold (currently 6 * 256) use a series of MVCs.
Lengths above that threshold use a loop to handle X*256 bytes followed
by a single MVC to handle the excess (if any). This loop will also be
needed in future when support for variable lengths is added.
Because the same tablegen classes are used to define MVC and CLC,
the patch also has the side-effect of defining a pseudo loop instruction
for CLC. That instruction isn't used yet (and wouldn't be handled correctly
if it were). I'm planning to use it soon though.
llvm-svn: 189331
DICompositeType will have an identifier field at position 14. For now, the
field is set to null in DIBuilder.
For DICompositeTypes where the template argument field (the 13th field)
was optional, modify DIBuilder to make sure the template argument field is set.
Now DICompositeType has 15 fields.
Update DIBuilder to use NULL instead of "i32 0" for null value of a MDNode.
Update verifier to check that DICompositeType has 15 fields and the last
field is null or a MDString.
Update testing cases to include an extra field for DICompositeType.
The identifier field will be used by type uniquing so a front end can
genearte a DICompositeType with a unique identifer.
llvm-svn: 189282
Get the register class right for the TST instruction. This keeps the
machine verifier happy, enabling us to turn it on for another test.
rdar://12594152
llvm-svn: 189274
Incremental improvement to fast-isel for PPC64. This allows us to
select on ret, sext, and zext. Filling in sext/zext improves some of
the existing logic in handling compare-immediates that needed extends.
A simplified return convention for fast-isel is also added to the
PPC64 calling conventions. All call/return processing for DAG
selection is handled with custom code, so there isn't an existing CC
to rely on here. The include of PPCGenCallingConv.inc causes compiler
warnings due to the 32-bit calling conventions that are not used, so
the dummy function "usePPC32CCs()" is added here to silence those.
Test cases for the return and extend logic are added.
llvm-svn: 189266
If we have a binary operation like ISD:ADD, we can set the result type
equal to the result type of one of its operands rather than using
TargetLowering::getPointerTy().
Also, any use of DAG.getIntPtrConstant(C) as an operand for a binary
operation can be replaced with:
DAG.getConstant(C, OtherOperand.getValueType());
llvm-svn: 189227
This adds minimal support to the SelectionDAG for handling address spaces
with different pointer sizes. The SelectionDAG should now correctly
lower pointer function arguments to the correct size as well as generate
the correct code when lowering getelementptr.
This patch also updates the R600 DataLayout to use 32-bit pointers for
the local address space.
v2:
- Add more helper functions to TargetLoweringBase
- Use CHECK-LABEL for tests
llvm-svn: 189221
First chunk of actual fast-isel selection code. This handles direct
and indirect branches, as well as feeding compares for direct
branches. PPCFastISel::PPCEmitIntExt() is just roughed in and will be
expanded in a future patch. This also corrects a problem with
selection for constant pool entries in JIT mode or with small code
model.
llvm-svn: 189202
I need to add the rest of these to the list or else to delay putting
out the actual stub until later in code generation when I know if
the external function ever got emitted
Resubmit this patch. The target triple needs to be added to the test so that
clang does not tell the backend the wrong target when the host is BSD. There
is a clang bug in here somewhere that I need to track down. At Mips this
has been filed internally as a bug.
llvm-svn: 189186
I need to add the rest of these to the list or else to delay putting
out the actual stub until later in code generation when I know if
the external function ever got emitted.
llvm-svn: 189161