regs. This is the only change in this checkin that may affect the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.
Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.
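As a rough, hedged illustration (this IR is not from the commit), the kind of
code isHighLatencyDef is meant to flag contains fdiv and sqrt operations:
declare double @llvm.sqrt.f64(double)
define double @norm(double %x, double %y) nounwind readnone {
entry:
  %s = call double @llvm.sqrt.f64(double %y)
  %d = fdiv double %x, %s
  ret double %d
}
Without an itinerary these would look like cheap operations to the scheduler;
the new hook lets list-hybrid and list-ilp assume the fixed (default 10 cycle)
latency for them instead.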
Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute-intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.
llvm-svn: 127067
There was a previous implementation with patterns that would
have matched e.g.
shl <v4i32> <i32>,
but this is not valid LLVM IR, so they were never selected.
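For reference, a well-formed vector shift requires the shift amount to have
the same vector type as the value being shifted, e.g. (illustrative only):
%r = shl <4 x i32> %a, <i32 2, i32 2, i32 2, i32 2>
so a pattern expecting a scalar i32 amount for a <4 x i32> value has nothing
to match.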
llvm-svn: 126998
- Allow i16, i32, i64, float, and double types, using the native .u16,
.u32, .u64, .f32, and .f64 PTX types.
- Allow loading/storing of all primitive types.
- Allow primitive types to be passed as parameters.
- Allow selection of PTX Version and Shader Model as sub-target attributes.
- Merge integer/floating-point test cases for load/store.
- Use .u32 instead of .s32 to conform to output from NVidia nvcc compiler.
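As a hedged sketch (illustrative IR, not taken from the test suite), the kind
of function these additions cover looks like:
define i32 @copy(i32* %src, i32* %dst) {
entry:
  %v = load i32* %src
  store i32 %v, i32* %dst
  ret i32 %v
}
where the i32 parameters, load, and store all map to the PTX .u32 type.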
Patch by Justin Holewinski
llvm-svn: 126824
and 256-bit forms. Because the number of elements in a vector
does not determine the vector type (4 elements could be v4f32 or
v4f64), pass the full type of the vector to decode routines.
llvm-svn: 126664
- Add appropriate TableGen patterns for fadd, fsub, fmul.
- Add .f32 as the PTX type for the LLVM float type.
- Allow parameters, return values, and global variable declarations
to accept the float type.
- Add appropriate test cases.
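A minimal hedged sketch of the kind of IR the new patterns select (the
function name is illustrative):
define float @mul_add(float %a, float %b, float %c) {
entry:
  %m = fmul float %a, %b
  %r = fadd float %m, %c
  ret float %r
}
with float mapping to the PTX .f32 type for the parameters, the return value,
and the fmul/fadd operations.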
Patch by Justin Holewinski
llvm-svn: 126636
1. Inform users of ADDEs with two 0 operands that they never set carry
2. Fold other ADDs or ADDCs into the ADDE if possible
It would be neat if we could do the same thing for SETCC+ADD eventually, but we can't do that in target independent code.
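For context on where ADDE arises (a hedged sketch; ADDC/ADDE are produced by
legalization rather than written directly), an i64 add split on a 32-bit
target is the typical source:
define i64 @add64(i64 %a, i64 %b) nounwind readnone {
entry:
  %sum = add i64 %a, %b
  ret i64 %sum
}
The low halves become an ADDC and the high halves an ADDE consuming its
carry; when both value operands of an ADDE are known to be zero, the result
can never produce a carry, and neighboring ADDs/ADDCs can sometimes be folded
into the ADDE itself.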
llvm-svn: 126557
D registers since the vpush list may not have gaps. Make sure the stack
adjustment instruction isn't moved between them. Ditto for vpop in
epilogues.
Sorry, can't reduce a small test case.
rdar://9043312
llvm-svn: 126457
The previous codegen for the slow path (when values are in VFP / NEON
registers) was incorrect if the source is NaN.
The new codegen uses NEON vbsl instruction to copy the sign bit. e.g.
vmov.i32 d1, #0x80000000
vbsl d1, d2, d0
If NEON is not available, it uses integer instructions to copy the sign bit.
rdar://9034702
llvm-svn: 126295
In other words, do not keep track of the arguments' locations. The debugger (gdb) is not prepared to see line table entries for arguments; for the debugger, the "second" line table entry marks the beginning of the function body.
This requires some coordination with the debugger to get this working.
- The debugger needs to be aware of the prolog_end attribute attached to line table entries.
- The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+).
llvm-svn: 126155
"dllimport" function must not be GlobalVariable, but Function. It is enough to check with GlobalValue.
test/CodeGen/X86/dll-linkage.ll is updated to check llc -O0.
llvm-svn: 126110
of a constant had a minor typo introduced when copying it from the book, which
caused it to favor negative approximations over positive approximations in many
cases. Positive approximations require fewer operations beyond the multiplication.
In the case of division by 3, we still generate code that is a single instruction
larger than GCC's code.
llvm-svn: 126097
of testing for its presence at cmake time.
This way the build automatically regenerates the makefiles when an svn
update brings in a new sublibrary.
llvm-svn: 126068
query about available library functions. For now this just has
memset_pattern16, which exists on darwin, but it can be extended for a
bunch of other things in the future.
llvm-svn: 125965
(LLVMX86Utils.a) to break cyclic library dependencies between
LLVMX86CodeGen.a and LLVMX86AsmParser.a. Previously this code was in
a header file and marked static but AVX requires some additional
functionality here that won't be used by all clients. Since including
unused static functions causes a gcc compiler warning, keeping it as a
header would break builds that use -Werror. Putting this in its own
library solves both problems at once.
llvm-svn: 125765
No one uses *-mingw64. mingw-w64 is represented as {i686|x86_64}-w64-mingw32. On the LLVM side, i686 and x64 can be treated in a similar way.
llvm-svn: 125747
This is necessary to avoid a crash in certain tangled situations where a kill
flag is first correctly moved to a merged instruction, and then needs to be
moved again:
STR %R0, a...
STR %R0<kill>, b...
First becomes:
STR %R0, b...
STM a, %R0<kill>, ...
and then:
STM a, %R0, ...
STM b, %R0<kill>, ...
We can now remove the kill flag from the merged STM when needed. 8960050.
llvm-svn: 125591
- Add custom operand matching for imod and iflags.
- Rename SplitMnemonicAndCC to SplitMnemonic since it splits more than CC
from mnemonic.
- While adding ".w" as an operand, don't change "Head" to avoid passing the
wrong mnemonic to ParseOperand.
- Add asm parser tests.
- Add disassembler tests just to make sure it can catch all cps versions.
llvm-svn: 125489
have their low bits set to zero. This allows us to optimize
out explicit stack alignment code like in stack-align.ll:test4 when
it is redundant.
Doing this causes the code generator to start turning FI+cst into
FI|cst all over the place, which is general goodness (that is the
canonical form) except that various pieces of the code generator
don't handle OR aggressively. Fix this by introducing a new
SelectionDAG::isBaseWithConstantOffset predicate, and using it
in places that are looking for ADD(X,CST). The ARM backend in
particular was missing a lot of addressing mode folding opportunities
around OR.
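As a worked illustration (hedged; this IR is not from stack-align.ll), an
aligned frame object plus a small constant offset is exactly the FI+cst case
that now becomes FI|cst:
define void @touch() nounwind {
entry:
  %buf = alloca [16 x i8], align 16
  %p = getelementptr [16 x i8]* %buf, i32 0, i32 4
  store i8 0, i8* %p
  ret void
}
Because the frame index is 16-byte aligned, its low four bits are zero, so
FI+4 and FI|4 denote the same address, and the OR form can be folded into
addressing modes once backends look through isBaseWithConstantOffset.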
llvm-svn: 125470
These are just FXSAVE and FXRSTOR with REX.W prefixes. These versions use
64-bit pointer values instead of 32-bit pointer values in the memory map they
dump and restore.
llvm-svn: 125446
Teach the AsmMatcher handling to distinguish between an error custom-parsing
an operand and a failure to match. The former should propagate the error
upwards, while the latter should continue attempting to parse with
alternative matchers.
Update the ARM asm parser accordingly.
llvm-svn: 125426
This:
define float @foo(float %x, float %y) nounwind readnone {
entry:
%0 = tail call float @copysignf(float %x, float %y) nounwind readnone
ret float %0
}
was compiled to:
vmov s0, r1
bic r0, r0, #-2147483648
vmov s1, r0
vcmpe.f32 s0, #0
vmrs apsr_nzcv, fpscr
it lt
vneglt.f32 s1, s1
vmov r0, s1
bx lr
This fails to copy the sign of -0.0f because it's lost during the float to int
conversion. Also, it's sub-optimal when the inputs are in GPR registers.
Now it uses integer AND + OR operations when it's profitable. And it's correct!
lsrs r1, r1, #31
bfi r0, r1, #31, #1
bx lr
rdar://8984306
llvm-svn: 125357
t2LDRpci with t2LDRi12.
There are a couple of problems with this.
1. The encodings for the literal and the immediate constant are different.
Note bit 7 of the literal case is 'U' so it can be negative.
2. t2LDRi12 is now narrowed to tLDRpci before the constant island pass is run.
So we end up never using the Thumb2 instruction, which ends up creating a
lot more constant islands.
llvm-svn: 125074
parsing of operands introduced in r125030. As a small note, besides using a more
generic approach, we can also get more descriptive output when debugging
llvm-mc, for example:
mcr p7, #1, r5, c1, c1, #4
note: parsed instruction:
['mcr', <ARMCC::al>,
<coprocessor number: 7>,
1,
<register 73>,
<coprocessor register: 1>,
<coprocessor register: 1>,
4]
llvm-svn: 125052
The vld1-lane, vld1-dup and vst1-lane instructions do not yet support using
post-increment versions, but all the rest of the NEON load/store instructions
should be handled now.
llvm-svn: 125014
These operations are expanded to pairs of loads or stores, and the first one
uses the address register update to produce the address for the second one.
So far, the second load/store has also updated the address register, just
for convenience, since that output has never been used. In anticipation of
actually supporting post-increment updates for these operations, this changes
the non-updating operations to use a non-updating load/store for the second
instruction.
llvm-svn: 125013
This allows us to easily support 256-bit operations that don't have
native 256-bit support. This applies to integer operations, certain
types of shuffles, and various other things.
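A hedged example of such an operation (illustrative IR): AVX has no native
256-bit integer add, but one can be lowered as two 128-bit halves via
insert/extract:
define <8 x i32> @add256(<8 x i32> %a, <8 x i32> %b) nounwind readnone {
entry:
  %sum = add <8 x i32> %a, %b
  ret <8 x i32> %sum
}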
llvm-svn: 124910
(yes, this is different from R_ARM_CALL)
- Adds a new method getARMBranchTargetOpValue() which handles the
necessary distinction between the conditional and unconditional br/bl
needed for ARM/ELF
At least for ARM mode, the needed fixup for conditional versus unconditional
br/bl is identical, but the ARM docs and existing ARM tools expect this
reloc type...
Added a few FIXME's for future naming fixups in ARMInstrInfo.td
llvm-svn: 124895
matching EXTRACT_SUBVECTOR to VEXTRACTF128 along with support routines
to examine and translate index values. VINSERTF128 comes next. With
these two in place we can begin supporting more AVX operations as
INSERT/EXTRACT can be used as a fallback when 256-bit support is not
available.
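For illustration (a hedged sketch), a shuffle that takes the upper 128-bit
half of a 256-bit vector is the kind of pattern that reaches the DAG as
EXTRACT_SUBVECTOR and can now be matched to VEXTRACTF128:
define <4 x float> @upper_half(<8 x float> %v) nounwind readnone {
entry:
  %hi = shufflevector <8 x float> %v, <8 x float> undef,
                      <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  ret <4 x float> %hi
}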
llvm-svn: 124797
Reversing the operands allows us to fold, but doesn't force us to. Also, at
this point the DAG is still being optimized, so the check for hasOneUse is not
very precise.
llvm-svn: 124773
This makes the job of the later optzn passes easier, allowing the vast number of
icmp transforms to chew on it.
We transform 840 switches in gcc.c, leading to a 16k byte shrink of the resulting
binary on i386-linux.
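A hedged sketch of the rewrite (illustrative IR, not the gcc.c case): a
switch whose cases form a contiguous range collapses to a single compare,
switch i32 %x, label %other [ i32 0, label %hit
                              i32 1, label %hit
                              i32 2, label %hit ]
becomes
%cmp = icmp ult i32 %x, 3
br i1 %cmp, label %hit, label %other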
The testcase from README.txt now compiles into
decl %edi
cmpl $3, %edi
sbbl %eax, %eax
andl $1, %eax
ret
llvm-svn: 124724
the load, then it may be legal to transform the load and store to integer
load and store of the same width.
This is done if the target considers the transformation profitable. For example,
on ARM this can transform:
vldr.32 s0, []
vstr.32 s0, []
to
ldr r12, []
str r12, []
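The kind of source pattern involved is a floating-point value that is loaded
and then simply stored again, e.g. (a hedged IR sketch, names illustrative):
%v = load float* %src
store float %v, float* %dst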
rdar://8944252
llvm-svn: 124708
This happens all the time when a smul is promoted to a larger type.
On x86-64 we now compile "int test(int x) { return x/10; }" into
movslq %edi, %rax
imulq $1717986919, %rax, %rax
movq %rax, %rcx
shrq $63, %rcx
sarq $34, %rax <- used to be "shrq $32, %rax; sarl $2, %eax"
addl %ecx, %eax
This fires 96 times in gcc.c on x86-64.
llvm-svn: 124559
default implementation for x86, going through the stack in a similar
fashion to how the codegen implements BUILD_VECTOR. Eventually this
will get matched to VINSERTF128 if AVX is available.
llvm-svn: 124307
implementation of EXTRACT_SUBVECTOR for x86, going through the stack
in a similar fashion to how the codegen implements BUILD_VECTOR.
Eventually this will get matched to VEXTRACTF128 if AVX is available.
llvm-svn: 124292