Commit Graph

26683 Commits

Author SHA1 Message Date
Gabor Greif 5fde43bf2e typo in comment
llvm-svn: 197136
2013-12-12 08:00:34 +00:00
Hao Liu 46a10eec28 [AArch64]Fix the problem that AArch64 backend fails to select scalar_to_vector of vector types having more than one element.
llvm-svn: 197135
2013-12-12 07:36:26 +00:00
Reed Kotler 3230e725aa Check for null pointer before dereferencing. A careless typo on my part.
I don't know why this did not show up earlier. This code has been
around for ages. 

llvm-svn: 197119
2013-12-12 02:41:11 +00:00
Yi Jiang f92a574246 Resubmit r196544: Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) ―> __exp10(x)
llvm-svn: 197109
2013-12-12 01:55:04 +00:00
Hal Finkel fa50630e43 Remove unused multiclass from PPCInstrInfo.td
llvm-svn: 197100
2013-12-12 00:23:29 +00:00
Hal Finkel ceb1f12d9a Improve instruction scheduling for the PPC POWER7
Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.

significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):

speedups:

MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.55171% +/- 0.333168%

MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
	-17.5576% +/- 14.598%

MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	-29.5708% +/- 7.09058%

MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
	-34.9471% +/- 11.4391%

SingleSource/Benchmarks/BenchmarkGame/puzzle
	-25.1347% +/- 11.0104%

SingleSource/Benchmarks/Misc/flops-8
	-17.7297% +/- 9.79061%

SingleSource/Benchmarks/Shootout-C++/ary3
	-35.5018% +/- 23.9458%

SingleSource/Regression/C/uint64_to_float
	-56.3165% +/- 25.4234%

SingleSource/UnitTests/Vectorizer/gcc-loops
	-18.5309% +/- 6.8496%

regressions:

MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
	18.351% +/- 12.156%

SingleSource/Benchmarks/Shootout-C++/methcall
	27.3086% +/- 14.4733%

llvm-svn: 197099
2013-12-12 00:19:11 +00:00
Chad Rosier 446d8ea0fb [AArch64] Refactor NEON floating-point Max/Min/Maxnm/Minnm across vector AArch64
intrinsics to use f32 types, rather than their vector equivalents.

llvm-svn: 197090
2013-12-11 23:21:25 +00:00
Hal Finkel 94a6f380bb Fix the PPC subsumes-predicate check
For one predicate to subsume another, they must both check the same condition
register. Failure to check this prerequisite was causing miscompiles.

Fixes PR18003.

llvm-svn: 197089
2013-12-11 23:12:25 +00:00
Chad Rosier 088f93d4b5 [AArch64] Add NEON scalar floating-point compare LLVM AArch64 intrinsics that
use f32/f64 types, rather than their vector equivalents.

llvm-svn: 197068
2013-12-11 21:03:46 +00:00
Chad Rosier 473a01e1c9 [AArch64] Refactor the NEON scalar floating-point reciprocal step and
floating-point reciprocal square root step LLVM AArch64 intrinsics to
use f32/f64 types, rather than their vector equivalents.

llvm-svn: 197067
2013-12-11 21:03:43 +00:00
Chad Rosier 7098fcc062 [AArch64] Refactor the NEON scalar floating-point reciprocal estimate, floating-
point reciprocal exponent, and floating-point reciprocal square root estimate
LLVM AArch64 intrinsics to use f32/f64 types, rather than their vector
equivalents.

llvm-svn: 197066
2013-12-11 21:03:40 +00:00
Rafael Espindola 009e758628 Don't set unused variable.
llvm-svn: 197064
2013-12-11 20:40:57 +00:00
Tom Stellard d7e146ede6 R600: Re-format Processors.td
This makes it a little easier to read.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 197058
2013-12-11 17:51:51 +00:00
Tom Stellard f2ba972af6 R600: Register AMDGPUCFGStructurizer pass
This enables -print-before-all to dump MachineInstrs after it is run.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 197057
2013-12-11 17:51:47 +00:00
Tom Stellard 1de5582d06 R600: Register R600EmitClauseMarkers pass
This enables -print-before-all to dump MachineInstrs after it is run.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 197056
2013-12-11 17:51:41 +00:00
Logan Chien 439e8f9e38 [arm] Implement ARM .arch directive.
llvm-svn: 197052
2013-12-11 17:16:25 +00:00
Tim Northover 76fc8a4c40 ARM: constrain register-class in fast-isel
The tests were no longer using fast-isel at all (MachO needs an "ios" rather
than "darwin" triple at the moment and Linux needs ARM mode). Once that was
corrected, the verifier complained about a t2ADDri created for the alloca.

llvm-svn: 197046
2013-12-11 16:04:57 +00:00
Elena Demikhovsky cf08809813 AVX-512: Removed "z" suffix from AVX-512 instructions, since it is incompatible with GCC.
I moved a test from avx512-vbroadcast-crash.ll to avx512-vbroadcast.ll
I defined HasAVX512 predicate as AssemblerPredicate. It means that you should invoke llvm-mc with "-mcpu=knl" to get encoding for AVX-512 instructions. I need this to let AsmMatcher to set different encoding for AVX and AVX-512 instructions that have the same mnemonic and operands (all scalar instructions).

llvm-svn: 197041
2013-12-11 14:31:04 +00:00
Richard Sandiford 73170f8488 [SystemZ] Optimize fcmp X, 0 in cases where X is also negated
In such cases it's often better to test the result of the negation instead,
since the negation also sets CC.

llvm-svn: 197032
2013-12-11 11:45:08 +00:00
Reed Kotler 5bde5c35f4 Distinguish and choose 16 or 32 bit forms of save/restore for Mips16.
llvm-svn: 196999
2013-12-11 03:32:44 +00:00
Kevin Qin 310b6c08ba [AArch64 NEON] Get instruction BSL matched to VSELECT.
llvm-svn: 196998
2013-12-11 02:33:50 +00:00
Rafael Espindola b2fb78d45a Move mips' datalayout computation out of line and add comments.
llvm-svn: 196996
2013-12-11 01:41:10 +00:00
Rafael Espindola 60f48e5a67 Move Sparc's getDataLayout out of line and add comments.
llvm-svn: 196990
2013-12-11 01:07:43 +00:00
NAKAMURA Takumi 8bc9bfaa5a Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
Rafael Espindola 5b3585871b Move PPC's getDataLayoutString out of line and document it better.
llvm-svn: 196987
2013-12-11 00:09:06 +00:00
Reid Kleckner ad92aca47c Revert the backend fatal error from r196939
The combination of inline asm, stack realignment, and dynamic allocas
turns out to be too common to reject out of hand.

ASan inserts empy inline asm fragments and uses aligned allocas.
Compiling any trivial function containing a dynamic alloca with ASan is
enough to trigger the check.

XFAIL the test cases that would be miscompiled and add one that uses the
relevant functionality.

llvm-svn: 196986
2013-12-10 23:23:52 +00:00
Rafael Espindola 002f8aa584 Refactor the computation of the x86 datalayout.
llvm-svn: 196976
2013-12-10 22:05:32 +00:00
Matt Arsenault eaa3a7efab Use llvm_unreachable instead of assert(0)
llvm-svn: 196971
2013-12-10 21:37:42 +00:00
David Fang 1b01849f2d on darwin<10, fallback to .weak_definition (PPC,X86)
.weak_def_can_be_hidden was not yet supported by the system assembler

llvm-svn: 196970
2013-12-10 21:37:41 +00:00
Chad Rosier f70af21651 [AArch64] Refactor the NEON floating-point absolute difference LLVM AArch64
intrinsic to use f32/f64 types, rather than their vector equivalents.

llvm-svn: 196965
2013-12-10 21:33:59 +00:00
Chad Rosier 07cc3f9100 [AArch64] Refactor the NEON signed/unsigned floating-point convert to fixed-point
LLVM AArch64 intrinsics to use f32/f64, rather than their vector equivalents.

llvm-svn: 196964
2013-12-10 21:33:56 +00:00
Chad Rosier 98b8baa35c [AArch64] Overload NEON signed/unsigned floating-point convert to fixed-point
and fixed-point convert to floating-point LLVM AArch64 intrinsics.

llvm-svn: 196963
2013-12-10 21:33:53 +00:00
Chad Rosier cc34d187b8 [AArch64] Overload NEON signed/unsigned integer convert to floating-point
LLVM AArch64 intrinsics.

llvm-svn: 196962
2013-12-10 21:33:50 +00:00
Reid Kleckner ee08897fb8 Reland "Fix miscompile of MS inline assembly with stack realignment"
This re-lands commit r196876, which was reverted in r196879.

The tests have been fixed to pass on platforms with a stack alignment
larger than 4.

Update to clang side tests will land shortly.

llvm-svn: 196939
2013-12-10 18:27:32 +00:00
Tim Northover 9653eb5759 Make Triple's isOSBinFormatXXX functions partition triple-space.
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously
true, unless they'd put the compiler in a box with a gun attached to a photon
detector.

This makes sure precisely one of the three formats is true for any triple and
simplifies some target logic based on that.

llvm-svn: 196934
2013-12-10 16:57:43 +00:00
Chad Rosier 7a9bba442f [AArch64] Refactor the Neon vector/scalar floating-point convert intrinsics so
that they use float/double rather than the vector equivalents when appropriate.

llvm-svn: 196930
2013-12-10 16:11:39 +00:00
Chad Rosier fcc4c366d1 [AArch64] Refactor the Neon vector/scalar floating-point convert implementation.
Specifically, reuse the ARM intrinsics when possible.

llvm-svn: 196926
2013-12-10 15:35:33 +00:00
Andrea Di Biagio f7c33c8162 Ensure that the backend no longer emits unnecessary vector insert instructions
immediately after SSE scalar fp instructions like addss or mulss.

Added patterns to select SSE scalar fp arithmetic instructions from a scalar
fp operation followed by a blend.

For example, given the following code:
  __m128 foo(__m128 A, __m128 B) {
    A[0] += B[0];
    return A;
  }

previously we generated:
  addss %xmm0, %xmm1
  movss %xmm1, %xmm0

now we generate:
  addss %xmm1, %xmm0

llvm-svn: 196925
2013-12-10 15:22:48 +00:00
Vincent Lejeune cc0ea74c7b R600: Fix an infinite loop when trying to reorganize export/tex vector input
llvm-svn: 196923
2013-12-10 14:43:31 +00:00
Vincent Lejeune f92d64d160 R600: Fix input modifiers lost for Cayman
llvm-svn: 196922
2013-12-10 14:43:27 +00:00
Reed Kotler 0ff4001781 Next step in Mips16 prologue/epilogue cleanup.
Save S2(reg 18) only when we are calling floating point stubs that
have a return value of float or complex. Some more work to make this
better but this is the first step.

llvm-svn: 196921
2013-12-10 14:29:38 +00:00
Elena Demikhovsky e382c3fdcd AVX-512: changed intrinsics for mask operations
llvm-svn: 196918
2013-12-10 13:53:10 +00:00
Elena Demikhovsky 6270b388c8 AVX-512: Changed intrinsics of VPCONFLICT to match GCC builtin form
llvm-svn: 196914
2013-12-10 11:58:35 +00:00
Daniel Sanders c309be2f1f [mips][msa] Correct sld and sldi builtins.
Summary: The result register of these instructions is also the first operand.

Reviewers: jacksprat, dsanders

Reviewed By: dsanders

Differential Revision: http://llvm-reviews.chandlerc.com/D2362
Differential Revision: http://llvm-reviews.chandlerc.com/D2363

llvm-svn: 196910
2013-12-10 11:37:00 +00:00
Richard Sandiford bef3d7af2b Add TargetLowering::prepareVolatileOrAtomicLoad
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.

Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.

The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences.  It is a no-op for targets other than SystemZ.

llvm-svn: 196906
2013-12-10 10:49:34 +00:00
Richard Sandiford 9afe613d12 Add TargetLowering::prepareVolatileOrAtomicLoad
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.

Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.

The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences.  It is a no-op for targets other than SystemZ.

llvm-svn: 196905
2013-12-10 10:36:34 +00:00
Kevin Qin 43385c7065 [AArch64 NEON] Replace fpimm with fpz32 for floating compare with zero.
This is a small change to be strict. Just want get pattern safer.

llvm-svn: 196889
2013-12-10 06:51:07 +00:00
Kevin Qin 04396d1e69 [AArch64 NEON] Support poly128_t and implement relevant intrinsic.
llvm-svn: 196887
2013-12-10 06:48:35 +00:00
NAKAMURA Takumi 396d4d3c7e Add proper dependencies to LLVMBuild.txt in llvm/lib.
I'll prune redundant deps in LLVMBuild.txt, later.

llvm-svn: 196881
2013-12-10 05:39:34 +00:00
NAKAMURA Takumi e3afe2ef62 Whitespaces.
llvm-svn: 196880
2013-12-10 05:39:12 +00:00
Reid Kleckner 0a9509f080 Revert "Fix miscompile of MS inline assembly with stack realignment"
This reverts commit r196876.  Its tests failed on the bots, so I'll
figure it out tomorrow.

llvm-svn: 196879
2013-12-10 05:31:27 +00:00
Reid Kleckner 7f10a8cd45 Fix miscompile of MS inline assembly with stack realignment
For stack frames requiring realignment, three pointers may be needed:
- ebp to address incoming arguments
- esi (could be any callee-saved register) to address locals
- esp to address outgoing arguments

We would use esi unconditionally without verifying that it did not
conflict with inline assembly.

This change doesn't do the verification, it simply emits a fatal error
on functions that use stack realignment, dynamic SP adjustments, and
inline assembly.

Because stack realignment is common on Windows, we also no longer assume
that MS inline assembly clobbers esp.  Instead, we analyze the inline
instructions for implicit definitions and check if esp is there.  If so,
we require the use of a base pointer and consider it in the condition
above.

Mostly fixes PR16830, but we could try harder to find a non-conflicting
base pointer.

Reviewers: sunfish

Differential Revision: http://llvm-reviews.chandlerc.com/D1317

llvm-svn: 196876
2013-12-10 05:12:23 +00:00
Rafael Espindola 1d224bd65f Add comments documenting the ARM datalayout string.
llvm-svn: 196850
2013-12-10 00:37:37 +00:00
Rafael Espindola 74d682b443 Simplify further.
Thanks to Jim Grosbach for noticing it.

llvm-svn: 196846
2013-12-10 00:15:35 +00:00
Rafael Espindola 964bf07fb8 Refactor the construction of the DataLayout string on ARM.
llvm-svn: 196843
2013-12-09 23:56:41 +00:00
Chad Rosier 5c8bf9c3db [AArch64] Refactor the NEON scalar reduce pairwise intrinsics, so that they use
float/double rather than the vector equivalents when appropriate.

llvm-svn: 196833
2013-12-09 22:47:38 +00:00
Chad Rosier 3b0b3ee71e [AArch64] Refactor NEON scalar reduce pairwise front-end codegen to remove
unnecessary patterns in tablegen.

llvm-svn: 196832
2013-12-09 22:47:34 +00:00
Chad Rosier 397ff3945c [AArch64] Remove q and non-q intrinsic definitions in the NEON scalar reduce
pairwise implementation, using an overloaded definition instead.

llvm-svn: 196831
2013-12-09 22:47:31 +00:00
Reed Kotler b102fa5aef get rid of superfluous comment
llvm-svn: 196829
2013-12-09 22:08:32 +00:00
Reed Kotler 2e362b3b4b Delete some old code used for testing that is not needed anymore.
This is part of the mips16 epilogue/prologue cleanup.

llvm-svn: 196824
2013-12-09 21:19:51 +00:00
Rafael Espindola 1a3a22fad1 Don't add suffixes for stdcall/fastcall on 64 coff.
This matches the behavior of both msvc and mingw.

llvm-svn: 196814
2013-12-09 20:44:48 +00:00
Rafael Espindola e2a1418e68 Don't set a variable to its default value.
llvm-svn: 196807
2013-12-09 19:36:11 +00:00
Ana Pazos bde2828ae0 Fix pattern match for movi with 0D result
Patch by Jiangning Liu.

With some test case changes:
- intrinsic test added to the existing /test/CodeGen/AArch64/neon-aba-abd.ll.
- New test cases to cover movi 1D scenario without using the intrinsic in
test/CodeGen/AArch64/neon-mov.ll.

llvm-svn: 196806
2013-12-09 19:29:14 +00:00
Daniel Sanders 3519dce968 [mips][msa] Fix invalid generated code when lowering FrameIndex involving unaligned offsets.
Summary:
The MSA ld.[bhwd] and st.[bhwd] instructions scale the immediate by the
element size before use as an offset. The offset must therefore be a
multiple of the element size to be valid in these instructions. However,
an unaligned base address is valid in MSA.

This commit causes the compiler to emit valid code when the calculated
offset is not a multiple of the element size by accounting for the offset
using addiu and using a zero offset in the load/store.

Depends on D2338

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D2339

llvm-svn: 196777
2013-12-09 12:47:12 +00:00
Daniel Sanders 26a5a7475e [mips][msa] Fix suboptimal FrameIndex lowering for ld.[hwd] and st.[hwd]
Summary:
The immediate in these instructions is scaled before use as an offset.
They therefore have a wider reach than ld.b/st.b.

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D2338

llvm-svn: 196775
2013-12-09 11:50:16 +00:00
Vladimir Medic 0d02be37c2 Method parseSetAssignment treats every operand with '$' sign as register and the parsing is directed to set alias for register. This will result in errors reported when expressions containing label references are parsed(for example long jumps)
As we can't make a complete solution now it has been decided to enable .set directive to handle long jump expressions. This will cause parser to report errors when parsing integer based register assignments, for example:
   .set r3, will be reported as error. Still, the need for expressions is higher priority as the integer based register assignments are Mips specific and can be avoided using register names.

llvm-svn: 196773
2013-12-09 11:03:25 +00:00
Venkatraman Govindaraju 61116e7084 [SPARCV9]: Adjust the resultant pointer of DYNAMIC_STACKALLOC with the stack BIAS on sparcV9.
llvm-svn: 196755
2013-12-09 05:13:25 +00:00
Venkatraman Govindaraju f6c8fe983b [Sparc]: Implement getSetCCResultType() in SparcTargetLowering so that umulo/smulo can be lowered on sparcv9 without an assertion error.
llvm-svn: 196751
2013-12-09 04:02:15 +00:00
Hao Liu 96a587a9f7 [AArch64]Add missing pair intrinsics such as:
int32_t vminv_s32(int32x2_t a)
which should be compiled into SMINP Vd.2S,Vn.2S,Vm.2S

llvm-svn: 196749
2013-12-09 03:51:42 +00:00
Hao Liu 868caea6d1 [AArch64]Pattern match failures for truncate store and extend load
llvm-svn: 196748
2013-12-09 03:34:08 +00:00
Venkatraman Govindaraju 72cc248524 [SparcV9]: Expand MULHU/MULHS:i64 and UMUL_LOHI/SMUL_LOHI:i64 on sparcv9.
This fixes PR18150.

llvm-svn: 196735
2013-12-08 22:06:07 +00:00
Manman Ren 2e06c8c777 Revert 196544 due to internal bot failures.
llvm-svn: 196732
2013-12-08 20:28:33 +00:00
Reed Kotler abaed9ecea Make sure we mark these registers as defined. Previously was done
in the td file.

llvm-svn: 196731
2013-12-08 19:21:47 +00:00
Reed Kotler e0a34ee66e Cleaning up of prologue/epilogue code for Mips16. First step
here is to make save/restore into variable number of argument instructions.

llvm-svn: 196726
2013-12-08 16:51:52 +00:00
Tim Northover a4173715f7 ARM: fix folding of stack-adjustment (yet again).
When trying to eliminate an "sub sp, sp, #N" instruction by folding
it into an existing push/pop using dummy registers, we need to account
for the fact that this might affect precisely how "fp" gets set in the
prologue.

We were attempting this, but assuming that *whenever* we performed a
fold it would make a difference. This is false, for example, in:
    push {r4, r7, lr}
    add fp, sp, #4
    vpush {d8}
    sub sp, sp, #8

we can fold the "sub" into the "vpush", forming "vpush {d7, d8}".
However, in that case the "add fp" instruction mustn't change, which
we were getting wrong before.

Should fix PR18160.

llvm-svn: 196725
2013-12-08 15:56:50 +00:00
Rafael Espindola 080133453b Remove the notion of primitive types.
They were out of place since the introduction of arbitrary precision integer
types.

This also synchronizes the documentation to Types.h, so it refers to first class
types and single value types.

llvm-svn: 196661
2013-12-07 19:34:20 +00:00
Vincent Lejeune 92b0a64906 Add a RequireStructuredCFG Field to TargetMachine.
llvm-svn: 196634
2013-12-07 01:49:19 +00:00
Vincent Lejeune ae7e96062c R600: Remove orphaned declarations
llvm-svn: 196633
2013-12-07 01:49:10 +00:00
Ana Pazos 93a07c2185 Added support for mcpu krait
- krait processor currently modeled with the same features as A9.
- Krait processor additionally has VFP4 (fused multiply add/sub)
and hardware division features enabled.
- krait has currently the same Schedule model as A9
- krait cpu flag is not recognized by the GNU assembler yet,
it is replaced with march=armv7-a to avoid a lower march
from being used.

llvm-svn: 196619
2013-12-06 22:48:17 +00:00
Weiming Zhao 43d8e6cb3b Bug 18149: [AArch32] VSel instructions has no ARMCC field
The current peephole optimizing for compare inst assumes an instr that
uses CPSR has an MO for ARM Cond code.However, for VSEL instructions
(vseqeq, vselgt, vselgt, vselvs), there is no such operand nor do
they support the modification of Cond Code.

llvm-svn: 196588
2013-12-06 17:56:48 +00:00
Cameron McInally e3cc4aacb9 Update AVX512 vector blend intrinsic names.
llvm-svn: 196581
2013-12-06 13:35:35 +00:00
Richard Sandiford 198ddf83c1 [SystemZ] Use LOAD AND TEST for comparisons with -0
...since it os equivalent to comparison with +0.

llvm-svn: 196580
2013-12-06 09:59:12 +00:00
Richard Sandiford 7b4118a0fc [SystemZ] Extend the use of C(L)GFR
instcombine prefers to put extended operands first, so this patch
handles that case for C(L)GFR.

llvm-svn: 196579
2013-12-06 09:56:50 +00:00
Richard Sandiford 48ef6abddc [SystemZ] Optimize selects between 0 and -1
Since z has no setcc instruction as such, the choice of setBooleanContents
is a bit arbitrary.  Currently it's set to ZeroOrOneBooleanContent,
so we produced a branch-free form when selecting between 0 and 1,
but not when selecting between 0 and -1.  This patch handles the latter
case too.

At some point I'd like to measure whether it's better to use conditional
moves for constant selects on z196, but that's future work.

llvm-svn: 196578
2013-12-06 09:53:09 +00:00
Eric Christopher 99952a0823 Fix an index array check.
Patch by Marius Wachtler.

llvm-svn: 196561
2013-12-06 02:45:24 +00:00
Reed Kotler 2db182b5e8 Delete dead code.
llvm-svn: 196551
2013-12-06 00:13:50 +00:00
Yi Jiang 01cfa94212 Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) ―> __exp10(x)
llvm-svn: 196544
2013-12-05 22:42:50 +00:00
Ana Pazos 6b0a8c50dd Implemented vget/vset_lane_f16 intrinsics
llvm-svn: 196533
2013-12-05 21:07:49 +00:00
Andrew Trick 880e573d98 MI-Sched: handle latency of in-order operations with the new machine model.
The per-operand machine model allows the target to define "unbuffered"
processor resources. This change is a quick, cheap way to model stalls
caused by the latency of operations that use such resources. This only
applies when the processor's micro-op buffer size is non-zero
(Out-of-Order). We can't precisely model in-order stalls during
out-of-order execution, but this is an easy and effective
heuristic. It benefits cortex-a9 scheduling when using the new
machine model, which is not yet on by default.

MI-Sched for armv7 was evaluated on Swift (and only not enabled because
of a performance bug related to predication). However, we never
evaluated Cortex-A9 performance on MI-Sched in its current form. This
change adds MI-Sched functionality to reach performance goals on
A9. The only remaining change is to allow MI-Sched to run as a PostRA
pass.

I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false

For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)

Speedups:
| Benchmarks/BenchmarkGame/recursive         |  52.39% |
| Benchmarks/VersaBench/beamformer           |  20.80% |
| Benchmarks/Misc/pi                         |  19.97% |
| Benchmarks/Misc/mandel-2                   |  19.95% |
| SPEC/CFP2000/188.ammp                      |  18.72% |
| Benchmarks/McCat/08-main/main              |  18.58% |
| Benchmarks/Misc-C++/Large/sphereflake      |  18.46% |
| Benchmarks/Olden/power                     |  17.11% |
| Benchmarks/Misc-C++/mandel-text            |  16.47% |
| Benchmarks/Misc/oourafft                   |  15.94% |
| Benchmarks/Misc/flops-7                    |  14.99% |
| Benchmarks/FreeBench/distray               |  14.26% |
| SPEC/CFP2006/470.lbm                       |  14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode      |  12.28% |
| Benchmarks/SmallPT/smallpt                 |  10.36% |
| Benchmarks/Misc-C++/Large/ray              |   8.97% |
| Benchmarks/Misc/fp-convert                 |   8.75% |
| Benchmarks/Olden/perimeter                 |   7.10% |
| Benchmarks/Bullet/bullet                   |   7.03% |
| Benchmarks/Misc/mandel                     |   6.75% |
| Benchmarks/Olden/voronoi                   |   6.26% |
| Benchmarks/Misc/flops-8                    |   5.77% |
| Benchmarks/Misc/matmul_f64_4x4             |   5.19% |
| Benchmarks/MiBench/security-rijndael       |   5.15% |
| Benchmarks/Misc/flops-6                    |   5.10% |
| Benchmarks/Olden/tsp                       |   4.46% |
| Benchmarks/MiBench/consumer-lame           |   4.28% |
| Benchmarks/Misc/flops-5                    |   4.27% |
| Benchmarks/mafft/pairlocalalign            |   4.19% |
| Benchmarks/Misc/himenobmtxpa               |   4.07% |
| Benchmarks/Misc/lowercase                  |   4.06% |
| SPEC/CFP2006/433.milc                      |   3.99% |
| Benchmarks/tramp3d-v4                      |   3.79% |
| Benchmarks/FreeBench/pifft                 |   3.66% |
| Benchmarks/Ptrdist/ks                      |   3.21% |
| Benchmarks/Adobe-C++/loop_unroll           |   3.12% |
| SPEC/CINT2000/175.vpr                      |   3.12% |
| Benchmarks/nbench                          |   2.98% |
| SPEC/CFP2000/183.equake                    |   2.91% |
| Benchmarks/Misc/perlin                     |   2.85% |
| Benchmarks/Misc/flops-1                    |   2.82% |
| Benchmarks/Misc-C++-EH/spirit              |   2.80% |
| Benchmarks/Misc/flops-2                    |   2.77% |
| Benchmarks/NPB-serial/is                   |   2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk           |   2.33% |
| Benchmarks/BenchmarkGame/n-body            |   2.28% |
| Benchmarks/SciMark2-C/scimark2             |   2.27% |
| Benchmarks/Olden/bh                        |   2.03% |
| skidmarks10/skidmarks                      |   1.81% |
| Benchmarks/Misc/flops                      |   1.72% |

Slowdowns:
| Benchmarks/llubenchmark/llu                | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d    |  -5.67% |
| Benchmarks/Adobe-C++/functionobjects       |  -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8            |  -5.00% |
| Benchmarks/Shootout/hash                   |  -2.35% |
| Benchmarks/Prolangs-C++/ocean              |  -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall |  -1.98% |
| Polybench/linear-algebra/kernels/3mm       |  -1.95% |
| Benchmarks/McCat/09-vor/vor                |  -1.68% |

llvm-svn: 196516
2013-12-05 17:55:58 +00:00
Andrew Trick ff199a4b8e Fix the A9 machine model. VTRN writes two registers.
llvm-svn: 196514
2013-12-05 17:55:49 +00:00
Rafael Espindola 4cc2b87375 Add a default constructor to get deterministic behavior.
Should fix the msan and valgrind bots.

llvm-svn: 196509
2013-12-05 16:21:17 +00:00
Justin Holewinski 4459717bab [NVPTX] Fix off-by-one error when creating the VT list for an SDNode
llvm-svn: 196503
2013-12-05 12:58:00 +00:00
Matheus Almeida a6beac1acc [mips] Small code generation improvement for conditional operator (select)
in case the operands are constants and its difference is |1|.
It should be possible in those cases to rematerialize the result using
MIPS's slt and similar instructions.

The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed
otherwise the optimization implemented in this patch would have been triggered
(difference between the operands was 1) and that would have changed the semantic
of the tests.

llvm-svn: 196498
2013-12-05 12:07:05 +00:00
Matheus Almeida a611c0f405 [mips] Add some comments related to the optimization performed in performSELECTCombine.
The structure of the code was slightly modified so that the next patch is easier to read/review.

No functional changes.

llvm-svn: 196496
2013-12-05 11:56:56 +00:00
Matheus Almeida 6b59c449d9 [mips][msa] Fix issue with immediate fields of LD/ST instructions
not being correctly encoded/decoded.
In more detail, immediate fields of LD/ST instructions should be
divided/multiplied by the size of the data format before encoding and
after decoding, respectively.

llvm-svn: 196494
2013-12-05 11:06:22 +00:00
Tim Northover e4def5e228 ARM: fix yet another stack-folding bug
We were trying to fold the stack adjustment into the wrong instruction in the
situation where the entire basic-block was epilogue code. Really, it can only
ever be valid to do the folding precisely where the "add sp, ..." would be
placed so there's no need for a separate iterator to track that.

Should fix PR18136.

llvm-svn: 196493
2013-12-05 11:02:02 +00:00
Rafael Espindola 117b20c492 Remove the isImplicitlyPrivate argument of getNameWithPrefix.
getSymbolWithGlobalValueBase use is to create a name of a new symbol based
on the name of an existing GV. Assert that and then remove the last call
to pass true to isImplicitlyPrivate.

This gives the mangler API a 1:1 mapping from GV to names, which is what we
need to drop the mangler dependency on the target (and use an extended
datalayout instead).

llvm-svn: 196472
2013-12-05 05:53:12 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Rafael Espindola 01d19d0299 Hide the stub created for MO_ExternalSymbol too.
given

declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
declare void @foo()
define void @bar() {
  call void @foo()
  call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
  ret void
}

We used to produce

L_foo$stub:
        .indirect_symbol        _foo
        .ascii  "\364\364\364\364\364"

_memset$stub:
        .indirect_symbol        _memset
        .ascii  "\364\364\364\364\364"

We not produce a private stub for memset too.

Stubs are not needed with recent linkers, but we still produce them for darwin8.

Thanks to David Fang for confirming that gcc used to do this too.

llvm-svn: 196468
2013-12-05 05:19:12 +00:00
Matt Arsenault 89cc49fe5d R600/SI: Add comments for number of used registers.
llvm-svn: 196467
2013-12-05 05:15:35 +00:00
Jiangning Liu 65d8e3422a For AArch64, add missing register cost calculation for big value types like v4i64 and v8i64.
llvm-svn: 196456
2013-12-05 02:12:01 +00:00
Cameron McInally 30bbb214e5 Add AVX512 patterns for v16i32 broadcast and v2i64 zero extend load.
Patch by Aleksey Bader.

llvm-svn: 196435
2013-12-05 00:11:25 +00:00
Kevin Enderby 86496a45cb Fix a bug in darwin's 32-bit X86 handling of evaluating fixups.
Where it would use a scattered relocation entry but falls back to a
normal relocation entry because the FixupOffset is more than 24-bits.

The bug is in the X86MachObjectWriter::RecordScatteredRelocation() where
it changes reference parameter FixedValue but then returns false to indicate
it did not create a scattered relocation entry.  The fix is simply to save the
original value of the parameter FixedValue at the start of the method and
restore it if we are returning false in that case.

rdar://15526046

llvm-svn: 196432
2013-12-04 23:36:24 +00:00
David Peixotto 8ad70b3542 Add support for parsing ARM symbol variants on ELF targets
ARM symbol variants are written with parens instead of @ like this:

  .word __GLOBAL_I_a(target1)

This commit adds support for parsing these symbol variants in
expressions. We introduce a new flag to MCAsmInfo that indicates the
parser should use parens to parse the symbol variant. The expression
parser is modified to look for symbol variants using parens instead
of @ when the corresponding MCAsmInfo flag is true.

The MCAsmInfo parens flag is enabled only for ARM on ELF.

By adding this flag to MCAsmInfo, we are able to get rid of
redundant ARM-specific symbol variants and use the generic variants
instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
the symbol variants for arm.

To achive this we need to keep a handle to the MCAsmInfo in the
MCSymbolRefExpr class that we can check when printing the symbol
variant.

Updated Tests:
  Changed case of symbol variant to match the generic kind.
  test/CodeGen/ARM/tls-models.ll
  test/CodeGen/ARM/tls1.ll
  test/CodeGen/ARM/tls2.ll
  test/CodeGen/Thumb2/tls1.ll
  test/CodeGen/Thumb2/tls2.ll

PR18080

llvm-svn: 196424
2013-12-04 22:43:20 +00:00
Cameron McInally cbb51dacfb Fix assembly syntax for AVX512 vector blend instructions.
llvm-svn: 196393
2013-12-04 18:05:36 +00:00
Michael Liao 9a0e3f4823 [X86] Check YMM31/ZMM31 as well
- No test case as there's no calling convention preserve YMM31/ZMM31 only

llvm-svn: 196391
2013-12-04 17:44:22 +00:00
Chad Rosier 1d22b5d1c0 Update the UseFusedMAC definition to directly specify its dependence on having
VFP4.
Patch by Daniel Stewart!

llvm-svn: 196390
2013-12-04 17:16:36 +00:00
Cameron McInally c5f420e129 Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers.
Patch by Aleksey Bader.

llvm-svn: 196386
2013-12-04 14:52:33 +00:00
Kevin Qin afd095de8b [AArch64 Neon] Add ACLE intrinsic vceqz_f64.
llvm-svn: 196362
2013-12-04 08:02:34 +00:00
Kevin Qin f9832e8de7 [AArch64 NEON] Add missing compare intrinsics.
llvm-svn: 196360
2013-12-04 07:53:28 +00:00
Juergen Ributzka 17e0d9ee6c [Stackmap] Emit multi-byte nops for X86.
llvm-svn: 196334
2013-12-04 00:39:08 +00:00
Reed Kotler 59975c2cba final patch for very long conditional branches for mips16 constant islands.
this completes the basic port of ARM constant islands to Mips16.
More testing, code review, cleanup is in order but basically everything
seems to be working. A bug in gas is preventing some of the runtime
testing but I hope to resolve this soon.

llvm-svn: 196331
2013-12-03 23:42:51 +00:00
Rafael Espindola 0a2baf8eaf Fix mingw32 thiscall + sret.
Unlike msvc, when handling a thiscall + sret gcc will
* Put the sret in %ecx
* Put the this pointer is (%esp)

This fixes, for example, calling stringstream::str.

llvm-svn: 196312
2013-12-03 20:51:23 +00:00
James Molloy 8a25992f39 Addrspacecasts are no-ops on ARM.
Testcase added.

llvm-svn: 196269
2013-12-03 11:23:11 +00:00
Richard Sandiford ccc2a7c1a0 [SystemZ] Fix choice of known-zero mask in insertion optimization
The backend converts 64-bit ORs into subreg moves if the upper 32 bits
of one operand and the low 32 bits of the other are known to be zero.
It then tries to peel away redundant ANDs from the upper 32 bits.

Since AND masks are canonicalized to exclude known-zero bits,
the test ORs the mask and the known-zero bits together before
checking for redundancy.  The problem was that it was using the
wrong node when checking for known-zero bits, so could drop ANDs
that were still needed.

llvm-svn: 196267
2013-12-03 11:01:54 +00:00
Michael Liao 14b02848a3 Enhance the fix of PR17631
- The fix to PR17631 fixes part of the cases where 'vzeroupper' should
  not be issued before 'call' insn. There're other cases where helper
  calls will be inserted not limited to epilog. These helper calls do
  not follow the standard calling convention and won't clobber any YMM
  registers. (So far, all call conventions will clobber any or part of
  YMM registers.)
  This patch enhances the previous fix to cover more cases 'vzerosupper' should
  not be inserted by checking if that function call won't clobber any YMM
  registers and skipping it if so.

llvm-svn: 196261
2013-12-03 09:17:32 +00:00
Hao Liu dca64f4a20 [AArch64]Add missing floating point convert, round and misc intrinsics.
E.g. int64x1_t vcvt_s64_f64(float64x1_t a) -> FCVTZS Dd, Dn

llvm-svn: 196210
2013-12-03 06:06:55 +00:00
Hao Liu c250cbc095 AArch64: add missing ACLE intrinsics mapping to general arithmetic operation from VFP instructions.
E.g. float64x1_t vadd_f64(float64x1_t a, float64x1_t b) -> FADD Dd, Dn, Dm.

llvm-svn: 196208
2013-12-03 05:58:30 +00:00
NAKAMURA Takumi bc815b2d21 Whitespace.
llvm-svn: 196203
2013-12-03 05:28:27 +00:00
Hao Liu 21a461353a AArch64: Add missing scalar pair intrinsics.
E.g. "float32_t vaddv_f32(float32x2_t a)" to be matched into "faddp s0, v1.2s".

llvm-svn: 196198
2013-12-03 03:39:47 +00:00
Jiangning Liu 3a541d46a1 Add some missing pattern matches for AArch64 Neon intrinsics like vuqadd_s64 and friends.
llvm-svn: 196192
2013-12-03 01:33:52 +00:00
Jiangning Liu 94a7bb2130 Add some missing pattern matches for AArch64 Neon intrinsics like vmull_high_n_s16 and friends.
llvm-svn: 196190
2013-12-03 01:29:32 +00:00
Rafael Espindola 20a8621e5f Don't set PrivateGlobalPrefix for NVPTX and R600.
These targets have special asm printers that don't use these.

llvm-svn: 196187
2013-12-03 01:03:35 +00:00
Hal Finkel 563cc05cae Remove PPCScoreboardHazardRecognizer
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer
which did only one thing: filtered out nodes in EmitInstruction for which
DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo
instructions. As far as I can tell, this is no longer true, and so we can use
ScoreboardHazardRecognizer directly.

llvm-svn: 196171
2013-12-02 23:52:46 +00:00
Rafael Espindola 5113d166f5 Refactor the setting of PrivateGlobalPrefix.
No functionality change.

llvm-svn: 196170
2013-12-02 23:39:26 +00:00
Rafael Espindola 5733d9bbb0 Don't set PrivateGlobalPrefix twice in the same function.
llvm-svn: 196169
2013-12-02 23:26:31 +00:00
Rafael Espindola 04867ce9b0 Convert two char* that are only ever used as booleans to bool.
llvm-svn: 196168
2013-12-02 23:04:51 +00:00
Chad Rosier 3106de3f9d [AArch64] Implemented vcopy_lane patterns using scalar DUP instruction.
Patch by Ana Pazos!

llvm-svn: 196151
2013-12-02 21:05:16 +00:00
Vincent Lejeune 4b8d9e303c R600: Workaround for cayman loop bug
llvm-svn: 196121
2013-12-02 17:29:37 +00:00
Rafael Espindola f4e6b29a03 Move getSymbolWithGlobalValueBase to TargetLoweringObjectFile.
This allows it to be used in TargetLoweringObjectFileImpl.cpp.

llvm-svn: 196117
2013-12-02 16:25:47 +00:00
Alp Toker a5b88a5851 Introduce poor man's consumeToken() in X86AsmParser
This makes the code a little more idiomatic.

No change in behaviour.

llvm-svn: 196113
2013-12-02 16:06:06 +00:00
Rafael Espindola 957cf6f9e1 Remove dead code.
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.

Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.

llvm-svn: 196111
2013-12-02 15:36:37 +00:00
Tim Northover dee8604caf ARM: decide whether to use movw/movt based on "minsize" attribute.
llvm-svn: 196102
2013-12-02 14:46:26 +00:00
NAKAMURA Takumi e4b8a2e559 XCoreFrameLowering.cpp: Use [in,out] instead of [in] [out]. [-Wdocumentation]
llvm-svn: 196094
2013-12-02 11:31:25 +00:00
Robert Lytton 7fbca3cce0 XCore target: Make handling of large frames not dependent upon an FP.
eliminateFrameIndex() has been reworked to handle both small & large frames
with either a FP or SP.
An additional Slot is required for Scavenging spills when not using FP for large frames.
Reworked the handling of Register Scavenging.

Whether we are using an FP or not, whether it is a large frame or not,
and whether we are using a large code model or not are now independent.

llvm-svn: 196091
2013-12-02 11:05:28 +00:00
Tim Northover 72360d201c ARM: add pseudo-instructions for lit-pool global materialisation
These are used by MachO only at the moment, and (much like the existing
MOVW/MOVT set) work around the fact that the labels used in the actual
instructions often contain PC-dependent components, which means that repeatedly
materialising the same global can't be CSEed.

With small modifications, it could be adapted to how ELF finds the address of
_GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there.

llvm-svn: 196090
2013-12-02 10:35:41 +00:00
Benjamin Kramer 2725bd21ff XCore: Unbreak C++11 build.
llvm-svn: 196089
2013-12-02 10:29:26 +00:00
Robert Lytton d3ffa66c6c XCore target: fix large code model 'select' indirect address handling.
llvm-svn: 196088
2013-12-02 10:18:37 +00:00
Robert Lytton ff38d37c77 XCore target: Add large code model
When using large code model:
Global objects larger than 'CodeModelLargeSize' bytes are placed in sections named with a trailing ".large"
The folded global address of such objects are lowered into the const pool.

During inspection it was noted that LowerConstantPool() was using a default offset of zero.
A fix was made, but due to only offsets of zero being generated, testing only verifies the change is not detrimental.

Correct the flags emitted for explicitly specified sections.

We assume the size of the object queried by getSectionForConstant() is never greater than CodeModelLargeSize.
To handle greater than CodeModelLargeSize, changes to AsmPrinter would be required.

llvm-svn: 196087
2013-12-02 10:18:31 +00:00
Robert Lytton 0abd2c96b5 XCore target: Fix eliminateFrameIndex() to handle large frames
Large frame offsets are loaded from the ConstantPool.
Where possible, offsets are encoded using the smaller MKMSK instruction.
Large frame offsets can only be used when there is a frame-pointer.

llvm-svn: 196085
2013-12-02 10:18:19 +00:00
Robert Lytton a9f984fb76 XCore target: Enable frames larger than 65535 to be lowered
llvm-svn: 196084
2013-12-02 10:18:14 +00:00
Rafael Espindola 321f55ae9f Remove leftovers from a non-MC asm printer.
llvm-svn: 196068
2013-12-02 05:42:16 +00:00
Rafael Espindola 8473259602 Remove #if 0 declarations.
llvm-svn: 196067
2013-12-02 05:24:28 +00:00
Rafael Espindola 50712a456d Change the default of AsmWriterClassName and isMCAsmWriter.
llvm-svn: 196065
2013-12-02 04:55:42 +00:00
Rafael Espindola d0f993ae68 Remove dead declarations.
llvm-svn: 196063
2013-12-02 04:18:19 +00:00
Rafael Espindola 32635781bb Refactor for clarity and efficiency.
The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so
this should be a nop.

llvm-svn: 196059
2013-12-02 03:26:43 +00:00
Tim Northover 45479dcf49 ARM: fix bug in -Oz stack adjustment folding
Previously, we clobbered callee-saved registers when folding an "add
sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
a register we're going to add to the "pop" could actually be live
outside the function before doing so and should fix the issue.

This should fix PR18081.

llvm-svn: 196046
2013-12-01 14:16:24 +00:00
Benjamin Kramer 951b15eb09 Revamp error checking in the ms inline asm parser.
- Actually abort when an error occurred.
- Check that the frontend lookup worked when parsing length/size/type operators.

Tested by a clang test. PR18096.

llvm-svn: 196044
2013-12-01 11:47:42 +00:00
Hal Finkel 42daeae9bd Add a scheduling model (with itinerary) for the PPC POWER7
This adds a scheduling model for the POWER7 (P7) core, and enables the
machine-instruction scheduler when targeting the P7. Scheduling for the P7,
like earlier ooo PPC cores, requires considering both dispatch group hazards,
and functional unit resources and latencies. These are both modeled in a
combined itinerary. Dispatch group formation is still handled by the post-RA
scheduler (which still needs to be updated for the P7, but nevertheless does a
pretty good job).

One interesting aspect of this change is that I've also enabled to use of AA
duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark
results seem to support this decision (see below), and while this is normally
useful for in-order cores, and not for ooo cores like the P7, I think that the
dispatch slot hazards are enough like in-order resources to make the AA useful.

Test suite significant performance differences (where negative is a speedup,
and positive is a regression) vs. the current situation:

MultiSource/Benchmarks/BitBench/drop3/drop3
  with AA: N/A
  without AA: -28.7614% +/- 19.8356%
(significantly against AA)

MultiSource/Benchmarks/FreeBench/neural/neural
  with AA: -17.7406% +/- 11.2712%
  without AA: N/A
(significantly in favor of AA)

MultiSource/Benchmarks/SciMark2-C/scimark2
  with AA: -11.2079% +/- 1.80543%
  without AA: -11.3263% +/- 2.79651%

MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
  with AA: -41.8649% +/- 17.0053%
  without AA: -34.5256% +/- 23.7072%

MultiSource/Benchmarks/mafft/pairlocalalign
  with AA: 25.3016% +/- 17.8614%
  without AA: 38.6629% +/- 14.9391%
(significantly in favor of AA)

MultiSource/Benchmarks/sim/sim
  with AA: N/A
  without AA: 13.4844% +/- 7.18195%
(significantly in favor of AA)

SingleSource/Benchmarks/BenchmarkGame/Large/fasta
  with AA: 15.0664% +/- 6.70216%
  without AA: 12.7747% +/- 8.43043%

SingleSource/Benchmarks/BenchmarkGame/puzzle
  with AA: 82.2713% +/- 26.3567%
  without AA: 75.7525% +/- 41.1842%

SingleSource/Benchmarks/Misc/flops-2
  with AA: -37.1621% +/- 20.7964%
  without AA: -35.2342% +/- 20.2999%
(significantly in favor of AA)

These are 99.5% confidence intervals from 5 runs per configuration. Regarding
the choice to turn on AA during CodeGen, of these results, four seem
significantly in favor of using AA, and one seems significantly against. I'm
not making this decision based on these numbers alone, but these results
seem consistent with results I have from other tests, and so I think that, on
balance, using AA is a win.

llvm-svn: 195981
2013-11-30 20:55:12 +00:00
Hal Finkel 46402a4211 Split some PPC itinerary classes
In preparation for adding scheduling definitions for the POWER7, split some PPC
itinerary classes so that the P7's latencies and hazards can be better
described. For the most part, this means differentiating indexed from non-index
pre-increment loads and stores. Also, differentiate single from
double-precision sqrt.

No functionality change intended (except for a more-specific latency for
single-precision sqrt on the A2).

llvm-svn: 195980
2013-11-30 20:41:13 +00:00
Zoran Jovanovic 9d86e26e62 Fixed issue with microMIPS long branch.
llvm-svn: 195975
2013-11-30 19:12:28 +00:00
Daniel Sanders 7fd68d6018 [mips][msa] MSA loads and stores have a 10-bit offset. Account for this when lowering FrameIndex.
This prevents the compiler from emitting invalid ld.[bhwd]'s and st.[bhwd]'s
when the stack frame is between 512 and 32,768 bytes in size.

llvm-svn: 195973
2013-11-30 13:47:57 +00:00
Daniel Sanders 7153414768 [mips][msa] A small refactor to reduce patch noise in my next commit
No functional change. An if-statement has been split into two nested if-statements.

llvm-svn: 195972
2013-11-30 13:15:21 +00:00
Reed Kotler ad450f239f Part 1 of 3 patches that completes very long conditional branches
in constant islands for Mips16. We introdcuce JalB16 as a synomnym
for Jal16. It makes it easier to read and is also necessary because
Jal16 is a call instruction but JalB16 is being used as a branch.
Various parts of LLVM will not work properly even in this late stage of
the backend if we use what was declared as a call instruction to function
as a branch. For one, basic block labels may not get emitted in some
situations. 

llvm-svn: 195968
2013-11-29 22:32:56 +00:00
Zoran Jovanovic 1bc3cce040 Revert revision 195965.
llvm-svn: 195967
2013-11-29 22:10:02 +00:00
Zoran Jovanovic ff2a40ce4d Fixed issue with microMIPS long branch.
llvm-svn: 195965
2013-11-29 21:41:24 +00:00
Hal Finkel 1df3205e8c Adjust PPC A2 input operand latencies
On the PPC A2, instructions are only issued after their input operands are
ready. Model this by specifying that input operands are read at dispatch (0
cycles after issue). This changes all input operand latencies from 1 to 0.

Significant test-suite performance changes (these are 99.5% confidence
intervals on 6 runs for both before and after):

speedups:
MultiSource/Benchmarks/sim/sim
	-1.21915% +/- 0.175063%
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
	-1.23946% +/- 1.05133%
SingleSource/Benchmarks/Misc/flops-2
	-1.24237% +/- 0.681362%
MultiSource/Applications/JM/lencod/lencod
	-1.33992% +/- 0.757498%
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt
	-1.51802% +/- 1.21468%
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt
	-2.18818% +/- 1.28605%
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt
	-2.21977% +/- 1.19499%
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
	-2.29822% +/- 0.671871%
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl
	-2.40975% +/- 0.355931%
SingleSource/Benchmarks/Misc/fp-convert
	-2.41899% +/- 1.04751%
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
	-2.50349% +/- 0.126765%
SingleSource/Benchmarks/Misc/flops-3
	-3.00214% +/- 0.700795%
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt
	-3.56995% +/- 3.2929%
MultiSource/Applications/sgefa/sgefa
	-4.24908% +/- 2.00413%
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk
	-18.1294% +/- 3.96489%

regressions:
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	1.03249% +/- 0.178547%
MultiSource/Applications/hexxagon/hexxagon
	1.16597% +/- 0.285235%
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt
	1.39576% +/- 1.07855%
SingleSource/Benchmarks/Misc-C++/stepanov_v1p2
	1.71539% +/- 0.173182%
MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1
	1.90013% +/- 0.866472%
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl
	2.39854% +/- 1.05914%
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl
	2.4402% +/- 0.817904%
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl
	5.87997% +/- 3.3172%
MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc
	9.02643% +/- 5.79591%
MultiSource/Benchmarks/VersaBench/bmm/bmm
	10.3517% +/- 1.227%

Obviously, there are data points on both sides of this; but I think, overall,
this supports making the change.

llvm-svn: 195951
2013-11-29 07:04:59 +00:00
Hal Finkel 5a7162f36b Create a PPC440 SchedMachineModel
Some of the older PPC processor definitions don't have associated
SchedMachineModels; correct this for the PPC440.

llvm-svn: 195949
2013-11-29 06:32:17 +00:00
Hal Finkel 4035e8d86a Fixup PPC440 load/store operand latencies
The operand latencies for loads and stores in the PPC440 itinerary were wrong
(the store operands are all inputs, and the "with update" (pre-increment)
instructions need a latency for the additional output).

llvm-svn: 195948
2013-11-29 06:19:43 +00:00
Hal Finkel a10bd1d23a Adjust PPC440 operand latencies
The operand latencies for the PPC440 should be specified relative to dispatch,
not relative to the initial fetch-and-decode stages. Because most instructions
(ignoring bypass) wait in dispatch until their operands are ready, this is
modeled as reading input operands "at dispatch" (0 cycles after issue), and so
every input and output operand has 4 cycles subtracted from it.

This could alter scheduling slightly, but I don't expect a large effect.

llvm-svn: 195947
2013-11-29 05:59:00 +00:00
Hal Finkel dd06369913 Don't model the fetch and decode units for the PPC440
Modeling the fetch and decode units in the PPC440 itinerary does not add
anything to the hazard detection capability (and so modeling them just wastes
compile time).

No functionality change intended.

llvm-svn: 195946
2013-11-29 05:58:38 +00:00
Lang Hames 39609996d9 Refactor a lot of patchpoint/stackmap related code to simplify and make it
target independent.

Most of the x86 specific stackmap/patchpoint handling was necessitated by the
use of the native address-mode format for frame index operands. PEI has now
been modified to treat stackmap/patchpoint similarly to DEBUG_INFO, allowing
us to use a simple, platform independent register/offset pair for frame
indexes on stackmap/patchpoints.

Notes:
  - Folding is now platform independent and automatically supported.
  - Emiting patchpoints with direct memory references now just involves calling
    the TargetLoweringBase::emitPatchPoint utility method from the target's
    XXXTargetLowering::EmitInstrWithCustomInserter method. (See
    X86TargetLowering for an example).
  - No more ugly platform-specific operand parsers.

This patch shouldn't change the generated output for X86. 

llvm-svn: 195944
2013-11-29 03:07:54 +00:00
Hao Liu ba38eee8ac AArch64: The pattern match should check the range of the immediate value.
Or we can generate some illegal instructions.
E.g. shrn2 v0.4s, v1.2d, #35. The legal range should be in [1, 16].

llvm-svn: 195941
2013-11-29 02:11:22 +00:00
Jiangning Liu c429c00f3b Add missing pattern for supporting intrinsic function vbsl_f64 with
argument double floating point.

llvm-svn: 195938
2013-11-29 01:37:15 +00:00
Kevin Qin 337cfcc83c [AArch64 NEON]Fix a assertion failure when disassemble SHLL instruction.
llvm-svn: 195936
2013-11-29 01:29:16 +00:00
Rafael Espindola d5bd5a4716 Refactor to remove a bit of duplication. No functionality change.
llvm-svn: 195933
2013-11-28 20:12:44 +00:00
Benjamin Kramer ea1982aff9 Silence sign-compare warning and reduce nesting.
No functionality change.

llvm-svn: 195932
2013-11-28 19:58:56 +00:00
NAKAMURA Takumi 226e10edff [CMake] Let add_public_tablegen_target() provide intrinsics_gen, too.
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.

Explicit add_dependencies() have been pruned under lib/Target.

llvm-svn: 195929
2013-11-28 17:04:31 +00:00
NAKAMURA Takumi ce746c6c49 [CMake] Let add_public_tablegen_target responsible to provide dependency to CommonTableGen.
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.

llvm-svn: 195927
2013-11-28 17:04:04 +00:00
Rafael Espindola 848493d886 The global prefix is always one char. Don't use a string for it.
llvm-svn: 195926
2013-11-28 17:00:49 +00:00
NAKAMURA Takumi b2abd160b3 [CMake] Prune include_directories() in llvm/lib/Target, take #2.
I forgot to commit them. They were staging in my local repo.

llvm-svn: 195924
2013-11-28 15:30:37 +00:00
Daniel Sanders 063b74ad4e [mips] Revert test commit r195922.
llvm-svn: 195923
2013-11-28 15:26:33 +00:00
Daniel Sanders eb16443fca [mips] A test commit to test my Herald and Audit workflow
Will be reverted in the next commit

llvm-svn: 195922
2013-11-28 15:25:43 +00:00
NAKAMURA Takumi 413518f1f8 [CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() sets them.
llvm-svn: 195921
2013-11-28 14:53:30 +00:00
NAKAMURA Takumi 979e604d8c Add newline at eof.
llvm-svn: 195920
2013-11-28 14:52:52 +00:00
Rafael Espindola 3e3a3f1f85 Use the mangler consistently instead of using getGlobalPrefix directly.
llvm-svn: 195911
2013-11-28 08:59:52 +00:00
Hal Finkel 92720ab1b2 Don't share functional units among the PPC itineraries
Instead of sharing functional unit names between the various PPC itineraries,
give each core its own unit names prefixed with the core name.  This follows
the convention used by other backends (such as ARM), and removes a non-obvious
ordering dependency between the various PPCSchedule*.td files.

No functionality change intended.

llvm-svn: 195908
2013-11-28 06:05:59 +00:00
Jiangning Liu 4bc9dbd846 Remove the variable only used by assert to avoid the build failure
caused by build options [-Werror,-Wunused-variable].

llvm-svn: 195905
2013-11-28 01:34:55 +00:00
Hao Liu f9f468abee AArch64: Fix a bug about disassembling post-index load single element to 4 vectors
llvm-svn: 195903
2013-11-28 01:07:45 +00:00
Reed Kotler 0d409e2dfe Check in conditional branches for constant islands. Still need to finish
conditional branches for very large targets. That will be the next small
patch. Everything now should in principle work as good (functionality
wise) as without constant islands so we decided at Mips/Imagination to
make constant islands the default for Mips16 now so that it will get
excercised a lot and this port is still experimentatl though hopefully soon
we will change the status. Some more cleanup and code review is in order
but things are converging fast.

llvm-svn: 195902
2013-11-28 00:56:37 +00:00
Akira Hatanaka f6109e4ad7 [mips] Redefine TAILCALL as a pseudo instruction.
No functionality change.

llvm-svn: 195896
2013-11-27 23:58:32 +00:00
Akira Hatanaka f9a0ec4fc4 Add MipsOptimizePICCall.cpp to CMakeLists.txt.
llvm-svn: 195894
2013-11-27 23:47:25 +00:00
Akira Hatanaka 168d4e5b20 [mips] Implement the following optimizations using dominance information to
make PIC calls a little more efficient:

1. Remove instructions setting up $gp if it is known that a function has been
   called at least once.
2. Save the address of a called function in a register instead of loading
   it from the GOT at every call site.

llvm-svn: 195892
2013-11-27 23:38:42 +00:00
Hal Finkel 3e5a360ba3 Add IIC_ prefix to PPC instruction-class names
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.

Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.

No functionality change intended.

llvm-svn: 195890
2013-11-27 23:26:09 +00:00
Rafael Espindola c90584b6f6 Don't set GlobalPrefix to the default value.
llvm-svn: 195884
2013-11-27 21:57:54 +00:00
Rafael Espindola 429e3fb068 The R600 has its own asm printer which doesn't use GlobalPrefix. Drop it.
llvm-svn: 195883
2013-11-27 21:52:37 +00:00
Tom Stellard 175e7a8c97 R600: Expand vector FABS
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195881
2013-11-27 21:23:39 +00:00
Tom Stellard c149dc02d3 R600/SI: Implement spilling of SGPRs v5
SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions.

v2:
  - Fix encoding of Lane Mask
  - Use correct register flags, so we don't overwrite the low dword
    when restoring multi-dword registers.

v3:
  - Register spilling seems to hang the GPU, so replace all shaders
    that need spilling with a dummy shader.

v4:
  - Fix *LANE definitions
  - Change destination reg class for 32-bit SMRD instructions

v5:
  - Remove small optimization that was crashing Serious Sam 3.

https://bugs.freedesktop.org/show_bug.cgi?id=68224
https://bugs.freedesktop.org/show_bug.cgi?id=71285

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195880
2013-11-27 21:23:35 +00:00
Tom Stellard 859199dad8 R600/SI: Use SGPR_32 register class for 32-bit SMRD outputs
Writing to the M0 register from an SMRD instruction hangs the GPU, so
we need to use the SGPR_32 register class, which does not include M0.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195879
2013-11-27 21:23:29 +00:00
Tom Stellard 4d566b2edf R600: Add support for ISD::FROUND
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195878
2013-11-27 21:23:20 +00:00
Rafael Espindola 3dc549dbe3 Remove dead code.
MO_ExternalSymbol and MO_JumpTableIndex don't show up in inline asm.

llvm-svn: 195861
2013-11-27 18:38:14 +00:00
Rafael Espindola 52434f9673 Convert two if sequences to switches.
llvm-svn: 195859
2013-11-27 18:26:51 +00:00
Rafael Espindola ed20f478bc Use a switch.
llvm-svn: 195857
2013-11-27 18:18:24 +00:00
Rafael Espindola c5c7bb6b20 Remove more dead code now that this is only used for inline asm.
MO_ConstantPoolIndex is handled in printLeaMemReference.
MO_JumpTableIndex and MO_ExternalSymbol don't show up in inline asm.

llvm-svn: 195847
2013-11-27 15:13:06 +00:00
Jiangning Liu 97aa8cf8b7 Fix the AArch64 NEON bug exposed by checking constant integer argument range of ACLE intrinsics.
llvm-svn: 195843
2013-11-27 14:02:25 +00:00
Rafael Espindola e370147b8c Convert more methods in static helpers.
llvm-svn: 195826
2013-11-27 07:34:09 +00:00
Rafael Espindola 7caa135677 Convert these methods into static functions.
llvm-svn: 195825
2013-11-27 07:14:26 +00:00
Rafael Espindola 09cf06c75e Cleanup and test X86AsmPrinter::printPCRelImm.
It is only used for asm printing.

On X86 we put basic block addresses on register before passing them to inline
asm, so the MO_MachineBasicBlock case was dead.

MO_ExternalSymbol was dead since any symbol being passed to inline asm
is represented as MO_GlobalAddress.

The MO_GlobalAddress and MO_Register cases were not tested.

llvm-svn: 195824
2013-11-27 06:53:13 +00:00
Hal Finkel 8081ae9134 Fix comment in PPCA2Model
llvm-svn: 195807
2013-11-27 03:12:56 +00:00
Rafael Espindola d0ed730f92 Remove dead argument.
llvm-svn: 195806
2013-11-27 02:25:20 +00:00
Chad Rosier 75290c6307 [AArch64] Add support for NEON scalar floating-point absolute difference.
llvm-svn: 195803
2013-11-27 01:45:58 +00:00
Chad Rosier 9653d5c989 [AArch64] Add support for NEON scalar floating-point to integer convert
instructions.

llvm-svn: 195788
2013-11-26 22:17:37 +00:00
Reed Kotler 3aeb1d0857 Fix a bug related to constant islands for Mips16 and mips16/32 dual mode.
The determination of when we are doing constant pools was being made too
early in the asm printer.

llvm-svn: 195781
2013-11-26 20:38:40 +00:00
Michael Liao d617a3015d Fix PR18054
- Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG
  lowering where we need to check whether x is a vector type (in-reg
  type) of i8, i16 or i32; otherwise, that optimization is not valid.

llvm-svn: 195779
2013-11-26 20:31:31 +00:00
Tim Northover fa36dfeeca Darwin-ARM: use movw/movt for static relocations
llvm-svn: 195759
2013-11-26 12:45:05 +00:00
Richard Sandiford dd7dd930d1 [SystemZ] Fix incorrect use of RISBG for a zero-extended right shift
We would wrongly transform the testcase into the equivalent of an AND with 1.
The problem was that, when testing whether the shifted-in bits of the right
shift were significant, we used the width of the final zero-extended result
rather than the width of the shifted value.

llvm-svn: 195731
2013-11-26 10:53:16 +00:00
Kevin Qin 599c47d0de Refactored the implementation of AArch64 NEON instruction ZIP, UZP
and TRN.
Fix a bug when mixed use of vget_high_u8() and vuzp_u8().

llvm-svn: 195716
2013-11-26 03:26:47 +00:00
Kevin Qin 33ca18fdcf [AArch64]Implement 128 bit register copy with NEON.
llvm-svn: 195713
2013-11-26 02:33:42 +00:00
Andrew Trick 391dbadb51 StackMap: Implement support for DirectMemRefOp.
A Direct stack map location records the address of frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to a stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address if an alloca, while IndirectMemRefOp
handle register spills.

For example:

entry:
  %a = alloca i64...
  llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)

Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.

llvm-svn: 195712
2013-11-26 02:03:25 +00:00
Andrew Trick d3ab37cfeb whitespace
llvm-svn: 195711
2013-11-26 02:03:20 +00:00
Cameron McInally c592e5251c Add an intrinsic for the SSE2 PAUSE instruction.
llvm-svn: 195697
2013-11-26 00:20:43 +00:00
Rafael Espindola a834e30130 Do the string comparison in the constructor instead of once per nop.
Thanks to Roman Divacky for the suggestion.

llvm-svn: 195684
2013-11-25 20:50:03 +00:00
Rafael Espindola 1b8bfdaae3 Don't use nopl in cpus that don't support it.
Patch by Mikulas Patocka. I added the test. I checked that for cpu names that
gas knows about, it also doesn't generate nopl.

The modified cpus:
i686 - there are i686-class CPUs that don't have nopl: Via c3, Transmeta
        Crusoe, Microsoft VirtualBox - see
        https://bbs.archlinux.org/viewtopic.php?pid=775414
k6, k6-2, k6-3, winchip-c6, winchip2 - these are 586-class CPUs
via c3 c3-2 - see https://bugs.archlinux.org/task/19733 as a proof that
        Via c3 and c3-Nehemiah don't have nopl

llvm-svn: 195679
2013-11-25 20:15:14 +00:00
Tim Northover d34094e525 Fix indentation typo
llvm-svn: 195660
2013-11-25 17:04:35 +00:00
Tim Northover db962e2c45 ARM: remove special cases for Darwin dynamic-no-pic mode.
These are handled almost identically to static mode (and ELF's global address
materialisation), except that a symbol may have "$non_lazy_ptr" appended. This
can be handled by passing appropriate flags along with the instruction instead
of using entirely separate pseudo-instructions.

llvm-svn: 195655
2013-11-25 16:24:52 +00:00
Tim Northover dfe2156c91 ARM: remove unused patterns.
There is no sane way for an LEApcrel (= single ADR) instruction to generate a
global address on any ARM target I know of. Fortunately, no-one was trying to
any more, but there were vestigial patterns.

llvm-svn: 195644
2013-11-25 14:40:57 +00:00
Amara Emerson 34df448f7c [ARM] Enable FeatureMP for Cortex-A5 by default.
Patch by Oliver Stannard.

llvm-svn: 195640
2013-11-25 13:17:15 +00:00
Tim Northover 89ccb616bd X86: enable AVX2 under Haswell native compilation
Patch by Adam Strzelecki

llvm-svn: 195632
2013-11-25 09:52:59 +00:00
Hao Liu fbd2b4484c Fixed a bug about disassembling AArch64 post-index load/store single element instructions.
ie. echo "0x00 0x04 0x80 0x0d" | ../bin/llvm-mc -triple=aarch64 -mattr=+neon -disassemble
    echo "0x00 0x00 0x80 0x0d" | ../bin/llvm-mc -triple=aarch64 -mattr=+neon -disassemble
will be disassembled into the same instruction st1 {v0b}[0], [x0], x0.

llvm-svn: 195591
2013-11-25 01:53:26 +00:00
NAKAMURA Takumi edbeaee857 SparcFrameLowering.cpp: Prune 'DL' [-Wunused-variable]
llvm-svn: 195590
2013-11-25 00:52:46 +00:00
Venkatraman Govindaraju 1116868a0d [Sparc] Emit large negative adjustments to SP/FP with sethi+xor instead of sethi+or. This generates correct code for both sparc32 and sparc64.
llvm-svn: 195576
2013-11-24 20:23:25 +00:00
Venkatraman Govindaraju 9c338504e5 [Sparc]: Implement LEA pattern for sparcv9.
llvm-svn: 195575
2013-11-24 20:07:35 +00:00
Venkatraman Govindaraju f79528c132 [SparcV9]: Do not emit .register directives for global registers that are clobbered by calls but not used in the function itself.
llvm-svn: 195574
2013-11-24 18:41:49 +00:00
Venkatraman Govindaraju 0510db0597 [SparcV9] Enable custom lowering of DYNAMIC_STACKALLOC in sparc64.
llvm-svn: 195573
2013-11-24 17:41:41 +00:00
Reed Kotler a787aa2b1e Make sure that for C++ emitting LwConstant32 pseudos, that it corresponds
to what is needed for constant islands. The prescan method for Mips16 constant
islands will eventually go away. It is only temporary and should be done
earlier when the instructions are first created or from the DAG. If we keep
it here we need to handle better the situation where constant islands
is called multiple times since don't want to prescan more than once.

llvm-svn: 195569
2013-11-24 06:18:50 +00:00
Reed Kotler d3b28ebe03 Fix a funny bug I introduced during conversion of ARM constant islands to Mips.
I had to move some code and I moved a declaration forward past it's first use
in the function but by nutty coincidence there was another variable of the same
name and type and  with completely unrelated function that was declared globally
in the class so no compilation error ensued.
It required some unusual conditions for it to even matter. Caused test
case casts.c in test-suite to fail during compilation with a duplicate 
symbol error. I would have noticed it during final code review for this port.

llvm-svn: 195565
2013-11-24 02:53:09 +00:00
Tom Stellard c0845334da R600/SI: Fixing handling of condition codes
We were ignoring the ordered/onordered bits and also the signed/unsigned
bits of condition codes when lowering the DAG to MachineInstrs.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195514
2013-11-22 23:07:58 +00:00
Jim Grosbach 860934a924 X86: Perform integer comparisons at i32 or larger.
Utilizing the 8 and 16 bit comparison instructions, even when an input can
be folded into the comparison instruction itself, is typically not worth it.
There are too many partial register stalls as a result, leading to significant
slowdowns. By always performing comparisons on at least 32-bit
registers, performance of the calculation chain leading to the
comparison improves. Continue to use the smaller comparisons when
minimizing size, as that allows better folding of loads into the
comparison instructions.

rdar://15386341

llvm-svn: 195496
2013-11-22 19:57:47 +00:00
Paul Robinson d89125a5d8 Teach ISel not to optimize 'optnone' functions (revised).
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
  SelectAllBasicBlocks; the flag is checked in various places, and
  FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
  something more subtle that doesn't work everywhere.

Based on work by Andrea Di Biagio.

llvm-svn: 195491
2013-11-22 19:11:24 +00:00
Michael Liao 02160d580b Fix PR18014
- When simplifying the mask generation for BLEND, check whether that mask is
  also consumed by other non-BLEND insns. If true, skip that simplification.

llvm-svn: 195476
2013-11-22 17:56:57 +00:00
Richard Sandiford f03789ca3f [SystemZ] Fix TMHH and TMHL usage for z10 with -O0
I've no idea why I decided to handle TMxx differently from all the other
high/low logic operations, but it was a stupid thing to do.  The high
registers aren't available as separate 32-bit registers on z10,
so subreg_h32 can't be used on a GR64 there.

I've normally been testing with z196 and with -O3 and so hadn't noticed
this until now.

llvm-svn: 195473
2013-11-22 17:28:28 +00:00
Rafael Espindola 5a8e985ad3 Don't produce tail calls when the caller is x86_thiscallcc.
The callee will not pop the stack for us.

llvm-svn: 195467
2013-11-22 15:18:28 +00:00
Daniel Sanders d40aea8768 Fix typo in a comment added in r195455.
Credit to Matheus Almeida for spotting it.

llvm-svn: 195456
2013-11-22 13:22:52 +00:00
Daniel Sanders 630dbe0a14 [mips][msa] Fix corner case for integer constant splats with undef values.
lowerBUILD_VECTOR() was treating integer constant splats as being legal
regardless of whether they had undef values. This caused instruction
selection failures when the undefs were legalized to zero, making the
constant non-splat.

Fixed this by requiring HasAnyUndef to be false for a integer constant
splat to be legal. If it is true, a new node is generated with the undefs
replaced with the necessary values to remain a splat.

llvm-svn: 195455
2013-11-22 13:14:06 +00:00
Richard Barton c31078cded Add support for Cortex-A12.
Patch by Oliver Stannard!

llvm-svn: 195448
2013-11-22 11:53:16 +00:00
Daniel Sanders fd8e416879 [mips][msa] Float vector constants cannot use ldi.[wd] directly. Bitcast from the appropriate integer vector type.
Fixes an instruction selection failure detected by llvm-stress.

llvm-svn: 195444
2013-11-22 11:24:50 +00:00
Kostya Serebryany 4007009815 Revert r195318 as it causes miscompilation (PR18029)
llvm-svn: 195439
2013-11-22 10:30:39 +00:00
Hao Liu e8bdc8c864 Fix a Cygwin build failure caused by enum values starting with '_', which is conflicted with some platform macros.
This patch only renames variables, no functional change.

llvm-svn: 195432
2013-11-22 09:24:41 +00:00
Hao Liu 25aed9bb5b Fix the bugs about AArch64 Load/Store vector types and bitcast between i64 and vector types.
e.g. "%tmp = load <2 x i64>* %ptr" can't be selected. 
     "%tmp = bitcast i64 %in to <2 x i32>" can't be selected.

llvm-svn: 195424
2013-11-22 08:47:22 +00:00
Hao Liu 91ae869692 Revert last change by haoliu because of buildbot failure.
llvm-svn: 195423
2013-11-22 08:34:54 +00:00
Hao Liu b75d80fdf0 Fix a Cygwin build failure caused by enum values starting with '_', which is conflicted with some platform macros.
This solution only renames variables, no functional change.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195421
2013-11-22 08:17:16 +00:00
Jiangning Liu a91633a435 For AArch64 back-end instruction selection, lower Neon_Lowxxx with EXTRCT_SUBREG.
llvm-svn: 195408
2013-11-22 02:45:13 +00:00
Lang Hames 1ca1123598 Fix a typo where we were creating <def,kill> operands instead of
<def,dead> ones.

Add an assertion to make sure we catch this in the future.

Fixes <rdar://problem/15464559>.

llvm-svn: 195401
2013-11-22 00:46:32 +00:00
Tom Stellard cd6b0a658a R600: Implement TargetInstrInfo::isLegalToSplitMBBAt()
Splitting a basic block will create a new ALU clause, so we need to make
sure we aren't moving uses of registers that are local to their
current clause into a new one.

I had a test case for this, but unfortunately unrelated schedule changes
invalidated it, and I wasn't been able to come up with another one.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195399
2013-11-22 00:41:08 +00:00
Ekaterina Romanova d5fa55470c SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction.
AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.

It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.

llvm-svn: 195383
2013-11-21 23:21:26 +00:00
Daniel Sanders c8c50fb41f [mips][msa] Fix a corner case in performORCombine() when combining nodes into VSELECT.
Mask == ~InvMask asserts if the width of Mask and InvMask differ.
The combine isn't valid (with two exceptions, see below) if the widths differ
so test for this before testing Mask == ~InvMask.

In the specific cases of Mask=~0 and InvMask=0, as well as Mask=0 and
InvMask=~0, the combine is still valid. However, there are more appropriate
combines that could be used in these cases such as folding x & 0 to 0, or
x & ~0 to x.

llvm-svn: 195364
2013-11-21 16:11:31 +00:00
Artyom Skrobov 468ee230ea [ARM] add basic Cortex-A7 support to LLVM backend
llvm-svn: 195358
2013-11-21 14:03:21 +00:00
Daniel Sanders 6e664bcef3 [mips][msa/dsp] Only do DSP combines if DSP is enabled.
Fixes a crash (null pointer dereferenced) when MSA is enabled.

llvm-svn: 195343
2013-11-21 11:40:14 +00:00
NAKAMURA Takumi 66c95430b8 Whitespace.
llvm-svn: 195341
2013-11-21 11:08:31 +00:00
NAKAMURA Takumi 43aa939625 Revert r195317 (and r195333), "Teach ISel not to optimize 'optnone' functions."
It broke, at least, i686 target. It is reproducible with "llc -mtriple=i686-unknown".

FYI, it didn't appear to add either "-O0" or "-fast-isel".

llvm-svn: 195339
2013-11-21 10:55:15 +00:00