llvm-project

Commit Graph

Author	SHA1	Message	Date
Christian Pirker	39db7ec81f	ARMEB: Fix byte order of EH frame unwinding instructions, with modified test file This commit was already committed as revision rL208689 and discussed in phabricator revision D3704. But the test file was crashing on OS X and Windows. I fixed the test file in the same way as in rL208340. llvm-svn: 208711	2014-05-13 16:44:30 +00:00
Joey Gouly	12a8bf09d0	[CGP] r205941 changed the logic, so that a cast happens before 'Result' is compared to 'AddrMode.BaseReg'. In the case that 'AddrMode.BaseReg' is nullptr, 'Result' will also be nullptr, so the cast causes an assertion. We should use dyn_cast_or_null here to check 'Result' is not null and it is an instruction. Bug found by Mats Petersson, and I reduced his IR to get a test case. llvm-svn: 208705	2014-05-13 15:42:45 +00:00
Rafael Espindola	2e7eceb317	Revert "ARMEB: Fix byte order of EH frame unwinding instructions" This reverts commit r208689. The test was crashing on OS X and windows. llvm-svn: 208704	2014-05-13 15:19:56 +00:00
Christian Pirker	ea3514ecdb	ARMEB: Fix byte order of EH frame unwinding instructions llvm-svn: 208689	2014-05-13 11:41:49 +00:00
Weiming Zhao	dd83691cc3	Folding into CSEL when there is ZEXT between SETCC and ADD Normally, patterns like (add x, (setcc cc ...)) will be folded into (csel x, x+1, not cc). However, if there is a ZEXT after SETCC, they won't be folded. This patch recognizes the ZEXT and allows the generation of CSINC. This patch fixes bug 19680. llvm-svn: 208660	2014-05-13 00:40:58 +00:00
Adam Nemet	5d78558c2b	[DAGCombiner] Split up an indexed load if only the base pointer value is live Right now the load may not get DCE'd because of the side-effect of updating the base pointer. This can happen if we lower a read-modify-write of an illegal larger type (e.g. i48) such that the modification only affects one of the subparts (the lower i32 part but not the higher i16 part). See the testcase. In order to spot the dead load we need to revisit it when SimplifyDemandedBits decided that the value of the load is masked off. This is the CommitTargetLoweringOpt piece. I checked compile time with ARM64 by sending SPEC bitcode files through llc. No measurable change. Fixes <rdar://problem/16031651> llvm-svn: 208640	2014-05-12 23:00:03 +00:00
Louis Gerbarg	b4013235e3	Fix ARM bswap16.ll test on Windows Windows on ARM only supports thumb mode execution, so we have to explicitly pick some non-Windows OS to test ARM mode codegen. llvm-svn: 208638	2014-05-12 22:13:07 +00:00
Reid Kleckner	7a59e0845f	Try to fix an SDAG dependence issue with sret r208453 added support for having sret on the second parameter. In that change, the code for copying sret into a virtual register was hoisted into the loop that lowers formal parameters. This caused a "Wrong topological sorting" assertion failure during scheduling when a parameter is passed in memory. This change undoes that by creating a second loop that deals with sret. I'm worried that this fix is incomplete. I don't fully understand the dependence issues. However, with this change we produce the same DAGs we used to produce, so if they are broken, they are just as broken as they have always been. llvm-svn: 208637	2014-05-12 22:01:27 +00:00
Adam Nemet	63e4b30f79	[Test] Trim unnecessary .c and .cpp from config.suffix in lit.local.cfg Tested by comparing make check VERBOSE=1 before and after to make sure no tests are missed. (VERBOSE=1 prints the list of tests.) Only one test :( remains where .cpp is required: tools/llvm-cov/range_based_for.cpp:// RUN: llvm-cov range_based_for.cpp | FileCheck %s --check-prefix=STDOUT The topic was discussed in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html llvm-svn: 208621	2014-05-12 19:57:31 +00:00
Louis Gerbarg	efdcf23736	Add support for bswap16 to/from memory compiling to rev16 on ARM/Thumb The current patterns for REV16 miss most __builtin_bswap16() due to legalization promoting the operands to/from loads/stores to i32s and then truncating/extending them. This patch adds new patterns that catch the resultant DAGs and codegens them to rev16 instructions. Tests included. rdar://15353652 llvm-svn: 208620	2014-05-12 19:53:52 +00:00
Tim Northover	ee20caaf82	TableGen: use PrintMethods to print more aliases llvm-svn: 208607	2014-05-12 18:04:06 +00:00
Matt Arsenault	62b1737081	R600: Add mul24 intrinsics llvm-svn: 208604	2014-05-12 17:49:57 +00:00
Matt Arsenault	2adca6090f	Make SimplifyDemandedBits understand BUILD_PAIR llvm-svn: 208598	2014-05-12 17:14:48 +00:00
Benjamin Kramer	3b36b72a9c	X86: Make sure that we have SSE4.1 before we generate insertps nodes. PR19721. llvm-svn: 208552	2014-05-12 13:12:08 +00:00
Christian Pirker	238c7c165b	ARM: Implement big endian bit-conversion for NEON type llvm-svn: 208538	2014-05-12 11:19:20 +00:00
Elena Demikhovsky	8e8fde8e93	AVX-512: changes in intrinsics 1) Changed gather and scatter intrinsics. Now they are aligned with GCC built-ins. There is no more non-masked form. Masked intrinsic receives -1 if all lanes are executed. 2) I changed the function that works with intrinsics inside X86ISelLowering.cpp. I put all intrinsics in one table. I did it for INTRINSICS_W_CHAIN and plan to put all intrinsics from WO_CHAIN set to the same table in order to avoid the long-long "switch". (I wanted to use static map initialization that is allowed by C++11 but I wasn't able to compile it on VS2012). 3) I added gather/scatter prefetch intrinsics. 4) I fixed MRMm encoding for masked instructions. llvm-svn: 208522	2014-05-12 07:18:51 +00:00
Hal Finkel	0d8db46799	[PowerPC] Add global named register support Support for the intrinsics that read from and write to global named registers is added for r1, r2 and r13 (depending on the subtarget). llvm-svn: 208509	2014-05-11 19:29:11 +00:00
Hal Finkel	c4c6c87666	[PowerPC] On PPC32, 128-bit shifts might be runtime calls The counter-loops formation pass needs to know what operations might be function calls (because they can't appear in counter-based loops). On PPC32, 128-bit shifts might be runtime calls (even though you can't use __int128 on PPC32, it seems that SROA might form them). Fixes PR19709. llvm-svn: 208501	2014-05-11 16:23:29 +00:00
Filipe Cabecinhas	0e3d1cb5d6	Fixed a bug when lowering build_vector (PR19694) When lowering build_vector to an insertps, we would still lower it, even if the source vectors weren't v4x32. This would break on avx if the source was a v8x32. We now check the type of the source vectors. llvm-svn: 208487	2014-05-11 08:12:56 +00:00
Vincent Lejeune	29c0c210fc	R600/SI: Fold fabs/fneg into src input modifier llvm-svn: 208480	2014-05-10 19:18:39 +00:00
Vincent Lejeune	94af31fbe8	R600/SI: Prettier display of input modifiers llvm-svn: 208479	2014-05-10 19:18:33 +00:00
Tim Northover	55b3e22927	ARM64: fix SELECT_CC lowering in absence of NaNs. We were swapping the true & false results while testing for FMAX/FMIN, but not putting them back to the original state if the later checks failed. Should fix PR19700. llvm-svn: 208469	2014-05-10 07:37:50 +00:00
Reid Kleckner	c487d73f41	Revert "[ms-cxxabi] Add a new calling convention that swaps 'this' and 'sret'" This reverts commit r200561. This calling convention was an attempt to match the MSVC C++ ABI for methods that return structures by value. This solution didn't scale, because it would have required splitting every CC available on Windows into two: one for methods and one for free functions. Now that we can put sret on the second arg (r208453), and Clang does that (r208458), revert this hack. llvm-svn: 208459	2014-05-09 22:56:42 +00:00
Reid Kleckner	7941856445	Allow sret on the second parameter as well as the first MSVC always places the implicit sret parameter after the implicit this parameter of instance methods. We used to handle this for x86_thiscallcc by allocating the sret parameter on the stack and leaving the this pointer in ecx, but that doesn't handle alternative calling conventions like cdecl, stdcall, fastcall, or the win64 convention. Instead, change the verifier to allow sret on the second parameter. This also requires changing the Mips and X86 backends to return the argument with the sret parameter, instead of assuming that the sret parameter comes first. The Sparc backend also returns sret parameters in a register, but I wasn't able to update it to handle secondary sret parameters. It currently calls report_fatal_error if you feed it an sret in the second parameter. Reviewers: rafael.espindola, majnemer Differential Revision: http://reviews.llvm.org/D3617 llvm-svn: 208453	2014-05-09 22:32:13 +00:00
Reid Kleckner	d0eda92845	Fix ARM intrinsics-overflow.ll test on Windows Windows on ARM only supports thumb mode execution, so we have to explicitly pick some non-Windows OS to test ARM mode codegen. llvm-svn: 208448	2014-05-09 21:52:48 +00:00
Louis Gerbarg	3342bf1451	Add custom lowering for add/sub with overflow intrinsics to ARM This patch adds support to ARM for custom lowering of the llvm.{u|s}add.with.overflow.i32 intrinsics for i32/i64. This is particularly useful for handling idiomatic saturating math functions as generated by InstCombineCompare. Test cases included. rdar://14853450 llvm-svn: 208435	2014-05-09 17:02:49 +00:00
Tom Stellard	4c00b52e1a	R600/SI: Teach SIInstrInfo::moveToVALU() how to move S_LOAD_*_IMM instructions llvm-svn: 208432	2014-05-09 16:42:22 +00:00
Tom Stellard	d6cb8e8efd	R600/SI: Fix SMRD pattern for offsets > 32 bits We were dropping the high bits of 64-bit immediate offsets. llvm-svn: 208431	2014-05-09 16:42:21 +00:00
Tom Stellard	a2acad785a	R600: Expand i64 SELECT_CC llvm-svn: 208430	2014-05-09 16:42:19 +00:00
Tom Stellard	afa8b532b1	R600: Move MIN/MAX matching from LowerOperation() to PerformDAGCombine() llvm-svn: 208429	2014-05-09 16:42:16 +00:00
James Molloy	dd1aa14a21	Attempt to pacify the bots - this commit requires asserts. llvm-svn: 208424	2014-05-09 16:20:53 +00:00
Oliver Stannard	c24f2171ca	ARM: HFAs must be passed in consecutive registers When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must be passed in a block of consecutive floating-point registers, or on the stack. This means that unused floating-point registers cannot be back-filled with part of an HFA, however this can currently happen. This patch, along with the corresponding clang patch (http://reviews.llvm.org/D3083) prevents this. llvm-svn: 208413	2014-05-09 14:01:47 +00:00
Daniel Sanders	b7f1c6ff3e	[mips][mips64r6] Add experimental support for MIPS32r6 and MIPS64r6 Summary: Adds MIPS32r6/MIPS64r6 and checks the compatibility requirements for these processors. I've also included comments to describe removed and re-encoded instructions, along with placeholder def's for the new instructions but there are no functional changes to codegen at this point. Reviewers: jkolek, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3622 llvm-svn: 208399	2014-05-09 09:46:21 +00:00
Saleem Abdulrasool	40bca0afab	ARM: support PIC on Windows on ARM Handle lowering of global addresses for PIC mode compilation on Windows. Always use the movw/movt load to load the address as Windows on ARM requires ARMv7+ and is a pure Thumb environment. llvm-svn: 208385	2014-05-09 00:58:32 +00:00
Filipe Cabecinhas	e4b482b3ed	Optimize shufflevector that copies an i64/f64 and zeros the rest. Summary: Also ran clang-format on the function. The code added is the last else if block. Reviewers: nadav, craig.topper, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3518 llvm-svn: 208372	2014-05-08 23:16:08 +00:00
Justin Bogner	7833d9facb	test/CodeGen: Check that the correct register is used in a store This tightens up r208351 to ensure that a store is fed with the correct value. Thanks to Quentin Colombet for spotting this! llvm-svn: 208368	2014-05-08 22:45:07 +00:00
Justin Bogner	1de42075fc	Make a CodeGen test more robust against vector register selection llvm-svn: 208351	2014-05-08 18:53:56 +00:00
Andrea Di Biagio	e85ba4df52	[X86] Add target specific combine rules to fold SSE2/AVX2 packed arithmetic shift intrinsics. This patch teaches the backend how to combine packed SSE2/AVX2 arithmetic shift intrinsics. The rules are: - Always fold a packed arithmetic shift by zero to its first operand; - Convert a packed arithmetic shift intrinsic dag node into an ISD::SRA only if the shift count is known to be smaller than the vector element size. This patch also teaches the function 'getTargetVShiftByConstNode' how to fold target specific vector shifts by zero. Added two new tests to verify that the DAGCombiner is able to fold sequences of SSE2/AVX2 packed arithmetic shift calls. llvm-svn: 208342	2014-05-08 17:44:04 +00:00
Saleem Abdulrasool	39a939d7d2	test: fix test on Windows When building on Windows, the default target is Windows. Windows on ARM does not support ARM mode compilation, resulting in test failures. Simply specify a triple to ensure that we are testing the correct behaviour. llvm-svn: 208340	2014-05-08 17:11:29 +00:00
Christian Pirker	b5728191c2	ARM big endian function argument passing llvm-svn: 208316	2014-05-08 14:06:24 +00:00
James Molloy	c42ea14f74	[ARM64-BE] Teach fast-isel about how to set up sub-word stack arguments for big endian calls. SelectionDAG already knows about this, but fast-isel was ignorant. llvm-svn: 208307	2014-05-08 12:53:50 +00:00
Tim Northover	18f8bb84fa	ARM64: make sure FastISel emits SSA MachineInstrs We need to use a temporary register for a 2-step operation like REM. llvm-svn: 208297	2014-05-08 10:30:56 +00:00
Hao Liu	1187a3d8db	AArch64/ARM64: Port NEON post-increment load/store with 2/3/4 vectors to ARM64 backend. llvm-svn: 208284	2014-05-08 07:38:13 +00:00
Filipe Cabecinhas	095d9d573a	Lower certain build_vectors to insertps instructions Summary: Vectors built with zeros and elements in the same order as another (source) vector are optimized to be built using a single insertps instruction. Also optimize when we move one element in a vector to a different place in that vector while zeroing out some of the other elements. Further optimizations are possible, described in TODO comments. I will be implementing at least some of them in the near future. Added some tests for different cases where this optimization triggers. Reviewers: nadav, delena, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3521 llvm-svn: 208271	2014-05-08 00:25:16 +00:00
Quentin Colombet	60cdff65c7	[X86] Add a test case for r208252. Prior to r208252, the FMA 231 family was marked as isCommutable. However the memory variants of this family are not commutable. Therefore, we did not implement the findCommutedOpIndices for those variants and missed that the default implementation (more or less: commute indices 1 and 2) was firing behind our back. As a result, as demonstrated in the test case before the fix, we were transforming a = b * c + a into a = a * c + b. I.e., before r208252 we were generating for this test case: vmovaps %xmm0, %xmm1 vmovss (%rsi), %xmm0 vfmadd231ss (%rdi), %xmm1, %xmm0 Instead of: vmovss (%rsi), %xmm1 vfmadd231ss (%rdi), %xmm1, %xmm0 <rdar://problem/16800495> llvm-svn: 208260	2014-05-07 22:52:58 +00:00
Chad Rosier	788e5e3d7c	[ARM64][fast-isel] Disable target specific optimizations at -O0. Functionally, this patch disables the dead register elimination pass and the load/store pair optimization pass at -O0. The ILP optimizations don't require the optimization level to be checked because the call to addILPOpts is predicated with the necessary check. The AdvSIMDScalar pass is disabled by default at all optimization levels. This patch leaves that pass disabled by default. Also, move command-line options into ARM64TargetMachine.cpp and add a few additional flags to aid in debugging. This fixes an issue with the -debug-pass=Structure flag where passes were printed, but not actually run (i.e., AdvSIMDScalar pass). llvm-svn: 208223	2014-05-07 16:41:55 +00:00
Tim Northover	88a51d983e	AArch64/ARM64: optimise vector selects & enable test When performing a scalar comparison that feeds into a vector select, it's actually better to do the comparison on the vector side: the scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP" since the vector comparisons are all mask based. llvm-svn: 208210	2014-05-07 14:10:27 +00:00
James Molloy	d3c401a2d0	[ARM64-BE] Fix fast-isel, and add appropriate RUN lines to appropriate tests. llvm-svn: 208200	2014-05-07 12:33:55 +00:00
James Molloy	36132057da	[ARM64-BE] Fix variable-argument saving. llvm-svn: 208199	2014-05-07 12:33:48 +00:00
James Molloy	4049e4fd77	[ARM64-BE] Implement the lane-twiddling logic at AAPCS boundaries for big endian. The AAPCS states that values passed in registers must have a value as though they had been loaded with "LDR". LDR is equivalent to "LD1.64 vX.1D" - that is, loading scalars to vector registers and loading 1-element vectors is equivalent. The logic implemented here is to ensure that at all call boundaries and during formal argument lowering all vectors are treated as their bitwidth-based floating point scalar counterpart, which is always one of f64 or f128 (v2i32 -> f64, v4i32 -> f128 etc). A BITCAST is inserted so that the appropriate REV will be generated during code generation. llvm-svn: 208198	2014-05-07 12:33:41 +00:00
James Molloy	30e0e11eb4	[ARM64-BE] Implement the crazy bitcast handling for big endian vectors. Because we've canonicalised on using LD1/ST1, every time we do a bitcast between vector types we must do an equivalent lane reversal. Consider a simple memory load followed by a bitconvert then a store. v0 = load v2i32 v1 = BITCAST v2i32 v0 to v4i16 store v4i16 v2 In big endian mode every memory access has an implicit byte swap. LDR and STR do a 64-bit byte swap, whereas LD1/ST1 do a byte swap per lane - that is, they treat the vector as a sequence of elements to be byte-swapped. The two pairs of instructions are fundamentally incompatible. We've decided to use LD1/ST1 only to simplify compiler implementation. LD1/ST1 perform the equivalent of a sequence of LDR/STR + REV. This makes the original code sequence: v0 = load v2i32 v1 = REV v2i32 (implicit) v2 = BITCAST v2i32 v1 to v4i16 v3 = REV v4i16 v2 (implicit) store v4i16 v3 But this is now broken - the value stored is different to the value loaded due to lane reordering. To fix this, on every BITCAST we must perform two other REVs: v0 = load v2i32 v1 = REV v2i32 (implicit) v2 = REV v2i32 v3 = BITCAST v2i32 v2 to v4i16 v4 = REV v4i16 v5 = REV v4i16 v4 (implicit) store v4i16 v5 This means an extra two instructions, but actually in most cases the two REV instructions can be combined into one. For example: (REV64_2s (REV64_4h X)) === (REV32_4h X) There is also no 128-bit REV instruction. This must be synthesized with an EXT instruction. Most bitconverts require some sort of conversion. The only exceptions are: a) Identity conversions - vNfX <-> vNiX b) Single-lane-to-scalar - v1fX <-> fX or v1iX <-> iX Even though there are hundreds of changed lines, I have a fairly high confidence that they are somewhat correct. The changes to add two REV instructions per bitcast were pretty mechanical, and once I'd done that I threw the resulting .td at a script I wrote which combined the two REVs together (and added an EXT instruction, for f128) based on an instruction description I gave it. This was much less prone to error than doing it all manually, plus my brain would not just have melted but would have vapourised. llvm-svn: 208194	2014-05-07 11:28:53 +00:00
James Molloy	ccc7f982c1	[ARM64-BE] Make big endian (scalar) argument passing work correctly. This completes the port of r204814 (cpirker "AArch64_BE function argument passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues found during later development along the way. The biggest of these was that the alignment fixup logic wasn't replicated into all the places it should have been. llvm-svn: 208192	2014-05-07 11:28:36 +00:00
Tim Northover	df723343fa	AArch64/ARM64: run test on ARM64 too. llvm-svn: 208188	2014-05-07 10:47:04 +00:00
Tim Northover	76a94e6ead	AArch64/ARM64: put annotation in test It makes finding already covered tests much easier with "grep -L arm64". llvm-svn: 208187	2014-05-07 10:47:00 +00:00
Joerg Sonnenberger	cf86ce136c	Allow using normal .eh_frame based unwinding on ARM. Use the same encodings as x86. Use this exception model for NetBSD. llvm-svn: 208166	2014-05-07 07:49:34 +00:00
Saleem Abdulrasool	acd0338c61	ARM: fix WoA PEI instruction selection The ARM::BLX instruction is an ARM mode instruction. The Windows on ARM target is limited to Thumb instructions. Correctly use the thumb mode tBLXr instruction. This would manifest as an errant write into the object file as the instruction is 4-bytes in length rather than 2. The result would be a corrupted object file that would eventually result in an executable that would crash at runtime. llvm-svn: 208152	2014-05-07 03:03:27 +00:00
Joerg Sonnenberger	818e725158	If a function needs a frame pointer, but r11 (aka fp) has not been used, remove it from the list of unspilled registers. Otherwise the following attempt to keep the stack aligned by picking an extra GPR register to spill will not work as it picks up r11. llvm-svn: 208129	2014-05-06 20:43:01 +00:00
Andrea Di Biagio	c14ccc9184	[X86] Improve the lowering of BITCAST dag nodes from type f64 to type v2i32 (and vice versa). Before this patch, the backend always emitted a store+load sequence to bitconvert from f64 to i64 the input operand of a ISD::BITCAST dag node that performed a bitconvert from type MVT::f64 to type MVT::v2i32. The resulting i64 node was then used to build a v2i32 vector. With this patch, the backend now produces a cheaper SCALAR_TO_VECTOR from MVT::f64 to MVT::v2f64. That SCALAR_TO_VECTOR is then followed by a "free" bitcast to type MVT::v4i32. The elements of the resulting v4i32 are then extracted to build a v2i32 vector (which is illegal and therefore promoted to MVT::v2i64). This is in general cheaper than emitting a stack store+load sequence to bitconvert the operand from type f64 to type i64. llvm-svn: 208107	2014-05-06 17:09:03 +00:00
Renato Golin	c7aea40ec6	Implementing named register intrinsics This patch implements the infrastructure to use named register constructs in programs that need access to specific registers (bare metal, kernels, etc). So far, only the stack pointer is supported as a technology preview, but as it is, the intrinsic can already support all non-allocatable registers from any architecture. llvm-svn: 208104	2014-05-06 16:51:25 +00:00
Rafael Espindola	52dc5d828f	Special case aliases in GlobalValue::getAlignment. An alias has the address of what it points to, so it also has the same alignment. This allows a few optimizations to see past aliases for free. llvm-svn: 208103	2014-05-06 16:48:58 +00:00
Kevin Qin	1353c3405d	[ARM64] Enable alignment control option in front-end for ARM64. This is the modification in llvm part. llvm-svn: 208074	2014-05-06 09:48:52 +00:00
Reid Kleckner	4a406d32e9	Fix i128 div/mod on mingw64 The Win64 docs are very clear that anything larger than 8 bytes is passed by reference, and GCC MinGW64 honors that for __modti3 and friends. Patch by Jameson Nash! llvm-svn: 208029	2014-05-06 01:20:42 +00:00
Tom Stellard	45b3dcd35b	R600: Expand i64 ISD:SUB llvm-svn: 208005	2014-05-05 21:47:15 +00:00
Filipe Cabecinhas	fe59062b75	Revert "Optimize shufflevector that copies an i64/f64 and zeros the rest." This reverts commit 207992. I misread the phab number on the LGTM. llvm-svn: 207993	2014-05-05 19:40:36 +00:00
Filipe Cabecinhas	263d98c19f	Optimize shufflevector that copies an i64/f64 and zeros the rest. Summary: Also ran clang-format on the function. The code added is the last else if block. Reviewers: nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3518 llvm-svn: 207992	2014-05-05 19:36:28 +00:00
Michael Zolotukhin	e37f33c466	Move test from r207969 to another folder and rename it. llvm-svn: 207984	2014-05-05 18:10:15 +00:00
Rafael Espindola	595f54205c	Remove the -disable-cfi option. This also adds a release note about it. If this stays I will clean up MC next week. llvm-svn: 207977	2014-05-05 17:33:26 +00:00
Rafael Espindola	82ad91915e	Modify test to not use -disable-cfi. llvm-svn: 207974	2014-05-05 16:47:07 +00:00
Rafael Espindola	f463b63448	Convert a CodeGen test into a MC test. llvm-svn: 207971	2014-05-05 15:34:13 +00:00
Saleem Abdulrasool	e8a7afef86	CodeGen: correct memset emittance for WoA Windows on ARM does not conform to AEABI. However, memset would be emitted using the AEABI signature, resulting in inverted parameters. Handle this special case appropriately. llvm-svn: 207943	2014-05-04 23:13:21 +00:00
Saleem Abdulrasool	9c4716e4b6	CodeGen: strengthen WoA AEABI avoidance tests Add additional test cases for WoA AEABI avoidance checking. llvm-svn: 207942	2014-05-04 23:13:18 +00:00
Elena Demikhovsky	e73333a50f	AVX-512: minor change in rndscale intrinsic llvm-svn: 207937	2014-05-04 13:35:37 +00:00
Saleem Abdulrasool	82b69fa105	X86: repair export compatibility with MinGW/cygwin Both MinGW and cygwin (i686) construct export directives without the global leader prefix. This is mostly due to the fact that they use GNU ld which does not correctly handle the export directive. This apparently has been broken for a while. However, this was recently reported as being broken by mingwandroid and diorcety of the msys2 project. Remove the global leader prefix if targeting MinGW or cygwin, otherwise, retain the global leader prefix. Add an explicit test for cygwin's behaviour of export directives. llvm-svn: 207926	2014-05-04 00:03:48 +00:00
Joey Gouly	b0afd1b929	[ARM64] Correctly select ANDWri in FastISel. http://reviews.llvm.org/D3598 llvm-svn: 207917	2014-05-03 17:27:06 +00:00
Tim Northover	820e041a3c	DAGCombine: prevent formation of illegal ConstantFP nodes. llvm-svn: 207850	2014-05-02 17:25:02 +00:00
Tom Stellard	3dbf1f8df0	R600: Expand vector sin and cos. v2: move code to AMDGPUISelLowering.cpp squash with tests (both EG and SI) Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 207845	2014-05-02 15:41:47 +00:00
Tom Stellard	605e116e8e	R600: Expand TruncStore i64 -> {i16,i8} llvm-svn: 207844	2014-05-02 15:41:46 +00:00
Tim Northover	d7360900a8	AArch64/ARM64: add patterns for post-indexed ST1 ops. llvm-svn: 207840	2014-05-02 14:54:27 +00:00
Tim Northover	d0b07e133b	AArch64/ARM64: support indexed loads/stores on vector types. While post-indexed LD1/ST1 instructions do exist for vector loads, this patch makes use of the more flexible addressing-modes in LDR/STR instructions. llvm-svn: 207838	2014-05-02 14:54:15 +00:00
Benjamin Kramer	42d262f410	Allow SelectionDAG::FoldConstantArithmetic to work when it's called with a vector VT but scalar values. llvm-svn: 207835	2014-05-02 12:35:22 +00:00
Michael J. Spencer	1f10c5ea94	[IR] Make {extract,insert}element accept an index of any integer type. Given the following C code llvm currently generates suboptimal code for x86-64: __m128 bss4( const __m128 ptr, size_t i, size_t j ) { float f = ptr[i][j]; return (__m128) { f, f, f, f }; } ================================================= define <4 x float> @_Z4bss4PKDv4_fmm(<4 x float> nocapture readonly %ptr, i64 %i, i64 %j) #0 { %a1 = getelementptr inbounds <4 x float>* %ptr, i64 %i %a2 = load <4 x float>* %a1, align 16, !tbaa !1 %a3 = trunc i64 %j to i32 %a4 = extractelement <4 x float> %a2, i32 %a3 %a5 = insertelement <4 x float> undef, float %a4, i32 0 %a6 = insertelement <4 x float> %a5, float %a4, i32 1 %a7 = insertelement <4 x float> %a6, float %a4, i32 2 %a8 = insertelement <4 x float> %a7, float %a4, i32 3 ret <4 x float> %a8 } ================================================= shlq $4, %rsi addq %rdi, %rsi movslq %edx, %rax vbroadcastss (%rsi,%rax,4), %xmm0 retq ================================================= The movslq is unneeded, but is present because of the trunc to i32 and then sext back to i64 that the backend adds for vbroadcastss. We can't remove it because it changes the meaning. The IR that clang generates is already suboptimal. What clang really should emit is: %a4 = extractelement <4 x float> %a2, i64 %j This patch makes that legal. A separate patch will teach clang to do it. Differential Revision: http://reviews.llvm.org/D3519 llvm-svn: 207801	2014-05-01 22:12:39 +00:00
Reed Kotler	bab3f23da6	Add basic functionality for assignment of ints. This creates a lot of core infrastructure in which to add, with little effort, quite a bit more to mips fast-isel Test Plan: simplestore.ll Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3527 llvm-svn: 207790	2014-05-01 20:39:21 +00:00
Matt Arsenault	06028dd7be	R600/SI: Fix verifier error with pseudo store instructions. Use i32 instead of specifying SReg_32. When this is the pseudo INDIRECT_BASE_ADDR, this would give a bogus verifier error. llvm-svn: 207770	2014-05-01 16:37:52 +00:00
Bradley Smith	3567cc1b42	[ARM64] Prefer generation of bzero on Darwin only llvm-svn: 207760	2014-05-01 13:11:59 +00:00
Tim Northover	534acbdf73	AArch64/ARM64: print BFM instructions as BFI or BFXIL The canonical form of the BFM instruction is always one of the more explicit extract or insert operations, which makes reading output much easier. llvm-svn: 207752	2014-05-01 12:29:38 +00:00
Weiming Zhao	7f6daf1799	[ARM64] Prevent bit extraction to be adjusted by following shift For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx more difficult. For example: Given %shr = lshr i64 %x, 4 %and = and i64 %shr, 15 %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, i64 2, i64 %and %0 = load i64* %arrayidx With current shift folding, it takes 3 instrs to compute base address: lsr x8, x0, #1 and x8, x8, #0x78 add x8, x9, x8 If using ubfx, it only needs 2 instrs: ubfx x8, x0, #4, #4 add x8, x9, x8, lsl #3 This fixes bug 19589 llvm-svn: 207702	2014-04-30 21:07:24 +00:00
Michael Zolotukhin	1f4a960ccf	[X86] Never hoist the shift value of a shift instruction. There is no need to check if we want to hoist the immediate value of a shift instruction. Simply return TCC_Free right away. This change is like r206101, but for X86. rdar://problem/16190769 llvm-svn: 207692	2014-04-30 19:17:32 +00:00
Tim Northover	a8c577e454	ARM64: print fp immediates without using scientific notation. llvm-svn: 207669	2014-04-30 16:13:34 +00:00
Tom Stellard	1bd80725b3	R600/SI: Use VALU instructions for copying i1 values We can't use SALU instructions for this since they ignore the EXEC mask and are always executed. This fixes several OpenCV tests. llvm-svn: 207661	2014-04-30 15:31:33 +00:00
Tom Stellard	0c354f25c9	R600/SI: Teach moveToVALU how to handle some SMRD instructions llvm-svn: 207660	2014-04-30 15:31:29 +00:00
Chad Rosier	864e35db0a	[ARM64][fast-isel] Fast-isel doesn't know how to handle f128. llvm-svn: 207659	2014-04-30 15:29:57 +00:00
Sasa Stankovic	7b061a42b1	[mips] Fix MipsLongBranch pass to work when the offset from the branch to the target cannot be determined accurately. This is the case for NaCl where the sandboxing instructions are added in MC layer, after the MipsLongBranch pass. It is also the case when the code has inline assembly. Instead of calculating offset in the MipsLongBranch pass, use %hi(sym1 - sym2) and %lo(sym1 - sym2) expressions that are resolved during the fixup. This patch also deletes microMIPS test file test/CodeGen/Mips/micromips-long-branch.ll and implements microMIPS CHECKs in a much simpler way in a file test/CodeGen/Mips/longbranch.ll, together with MIPS32 and MIPS64. llvm-svn: 207656	2014-04-30 15:06:25 +00:00
Tim Northover	0ac99404f0	ARM64: print lsr instead of lsrv for variable shifts (etc) The canonical syntax for shifts by a variable amount does not end with 'v', but that syntax should be supported as an alias (presumably for legacy reasons). llvm-svn: 207649	2014-04-30 13:37:07 +00:00
Tim Northover	20ad359b77	AArch64/ARM64: use HS instead of CS & LO instead of CC. On instructions using the NZCV register, a couple of conditions have dual representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and unsigned-lower/carry-clear). The first of these is more descriptive in most circumstances, so we should print it. llvm-svn: 207644	2014-04-30 13:14:03 +00:00
Daniel Sanders	e296a0fce5	[mips][msa] Fix vector insertions where the index is variable Summary: This isn't supported directly so we rotate the vector by the desired number of elements, insert to element zero, then rotate back. The i64 case generates rather poor code on MIPS32. There is an obvious optimisation to be made in future (do both insert.w's inside a shared rotate/unrotate sequence) but for now it's sufficient to select valid code instead of aborting. Depends on D3536 Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3537 llvm-svn: 207640	2014-04-30 12:09:32 +00:00
Tim Northover	970c4a8d35	ARM64: use hex immediates for movz/movk instructions Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to piece together an immediate in 16-bit chunks, hex is probably the most appropriate format. llvm-svn: 207635	2014-04-30 11:19:40 +00:00
Tim Northover	4b2f8a990e	ARM64: hexify printing various immediate operands This is mostly aimed at the NEON logical operations and MOVI/MVNI (since they accept weird shifts which are more naturally understandable in hex notation). Also changes BRK/HINT etc, which is probably a neutral change, but easier than the alternative. llvm-svn: 207634	2014-04-30 11:19:28 +00:00
Tim Northover	cfd6e66544	ARM64: print canonical syntax for add/sub (imm) instructions. Since these instructions only accept a 12-bit immediate, possibly shifted left by 12, the canonical syntax used by the architecture reference manual is "#N {, lsl #12 }". We should accept an immediate that has already been shifted, (e.g. Also, print a comment giving the full addend since it can be helpful. llvm-svn: 207633	2014-04-30 11:19:15 +00:00
James Molloy	7c39df37b2	[ARM64] Ensure arm64_be is dealt with when emitting debug info. This is a partial port of r204816 (cpirker "Elf support for MC-JIT runtime dynamic linker") from AArch64 to ARM64. llvm-svn: 207625	2014-04-30 10:15:35 +00:00
Tim Northover	41cec5c3cb	ARM64: make sure FastISel uses a GPR64 source in 64-bit extensions. llvm-svn: 207620	2014-04-30 09:32:01 +00:00
Saleem Abdulrasool	25947c318b	ARM: support stack probe emission for Windows on ARM This introduces the stack lowering emission of the stack probe function for Windows on ARM. The stack on Windows on ARM is a dynamically paged stack where any page allocation which crosses a page boundary of the following guard page will cause a page fault. This page fault must be handled by the kernel to ensure that the page is faulted in. If this does not occur and a write accesses any memory beyond that, the page fault will go unserviced, resulting in an abnormal program termination. The watermark for the stack probe appears to be at 4080 bytes (for accommodating the stack guard canaries and stack alignment) when SSP is enabled. Otherwise, the stack probe is emitted on the page size boundary of 4096 bytes. llvm-svn: 207615	2014-04-30 07:05:07 +00:00
Saleem Abdulrasool	f8222631a5	ARM: partially handle 32-bit relocations for WoA IMAGE_REL_ARM_MOV32T relocations require that the movw/movt pair-wise relocation is not split up and reordered. When expanding the mov32imm pseudo-instruction, create a bundle if the machine operand is referencing an address. This helps ensure that the relocatable address load is not reordered by subsequent passes. Unfortunately, this only partially handles the case as the Constant Island Pass occurs after the instructions are unbundled and does not properly handle bundles. That is a more fundamental issue with the pass itself and beyond the scope of this change. llvm-svn: 207608	2014-04-30 04:54:58 +00:00
Reid Kleckner	fb69308568	Implement X86 code generation for musttail Currently, musttail codegen is relying on sibcall optimization, and reporting a fatal error if it fails. Sibcall optimization fails when stack arguments need to be modified, which is insufficient for musttail. The logic for moving arguments in memory safely is already implemented for GuaranteedTailCallOpt. This change merely arranges for musttail calls to use it. No functional change for GuaranteedTailCallOpt. Reviewers: espindola Differential Revision: http://reviews.llvm.org/D3493 llvm-svn: 207598	2014-04-29 23:55:41 +00:00
Tom Stellard	919bb6b83f	R600/SI: Custom lower SI_IF and SI_ELSE to avoid machine verifier errors SI_IF and SI_ELSE are terminators which also produce a value. For these instructions ISel always inserts a COPY to move their value to another basic block. This COPY ends up between SI_(IF\|ELSE) and the S_BRANCH* instruction at the end of the block. This breaks MachineBasicBlock::getFirstTerminator() and also the machine verifier which assumes that terminators are grouped together at the end of blocks. To solve this we coalesce the copy away right after ISel to make sure there are no instructions in between terminators at the end of blocks. llvm-svn: 207591	2014-04-29 23:12:53 +00:00
Tom Stellard	58ac7440e6	R600/SI: Only select SALU instructions in the entry or exit block SALU instructions ignore control flow, so it is not always safe to use them within branches. This is a partial solution to this problem until we can come up with something better. llvm-svn: 207590	2014-04-29 23:12:48 +00:00
Tom Stellard	676f571999	R600: optimize the UDIVREM 64 algorithm This is a squash of several optimization commits: - calculate DIV_Lo and DIV_Hi separately - use BFE_U32 if we are operating on 32bit values - use precomputed constants instead of shifting in UDIVREM - skip the first 32 iterations of udivrem v2: Check whether BFE is supported before using it Patch by: Jan Vesely Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 207589	2014-04-29 23:12:46 +00:00
Reed Kotler	67077b3032	Add Simple return instruction to Mips fast-isel Reviewers: dsanders Reviewed by: dsanders Differential Revision: http://reviews.llvm.org/D3430 llvm-svn: 207565	2014-04-29 17:57:50 +00:00
Daniel Sanders	6857800b67	[mips][msa] Use CHECK-LABEL in basic_operations*.ll Differential Revision: http://reviews.llvm.org/D3536 llvm-svn: 207529	2014-04-29 14:28:58 +00:00
Daniel Sanders	b3268e71e2	[mips][msa] Fix element extraction where the index is variable. Summary: This isn't supported directly so we splat the vector element and extract the most convenient copy. Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3530 llvm-svn: 207524	2014-04-29 13:31:37 +00:00
Tim Northover	aacce57d61	ARM: fix test after change to indirect symbol emission. llvm-svn: 207519	2014-04-29 10:13:10 +00:00
Tim Northover	9e7782dcf3	X86: emit hidden stubs into a proper non_lazy_symbol_pointer section. rdar://problem/16660411 llvm-svn: 207518	2014-04-29 10:06:10 +00:00
Tim Northover	2372301bcf	ARM: emit hidden stubs into a proper non_lazy_symbol_pointer section. rdar://problem/16660411 llvm-svn: 207517	2014-04-29 10:06:05 +00:00
Benjamin Kramer	e1ab3f062e	AArch64: Mark vector long multiplication as expand. There are no patterns for this. This was already fixed for ARM64 but I forgot to apply it to AArch64 too. llvm-svn: 207515	2014-04-29 09:37:54 +00:00
Elena Demikhovsky	299cf511c4	AVX-512: optimized a shuffle pattern to VINSERTI64x4. Added intrinsics for VPERMT2PS/PD/D/Q instructions. llvm-svn: 207513	2014-04-29 09:09:15 +00:00
Hao Liu	6db3410071	[ARM64]Fix a bug about incorrect operand order in an EXT instruction, which is introduced by r207485. llvm-svn: 207500	2014-04-29 07:51:19 +00:00
Hao Liu	cf37110920	[ARM64]Fix a bug when lowering shuffle vector to an EXT instruction. E.g. Mask like <-1, -1, 1, ...> will generate incorrect EXT index. llvm-svn: 207485	2014-04-29 01:50:36 +00:00
Chad Rosier	0def8e2652	[ARM64] Fix an issue where we were always assuming a copy was coming from a D subregister. llvm-svn: 207423	2014-04-28 16:21:50 +00:00
Hao Liu	9a342778b9	[ARM64] Fix a bug where UQSHL/SQSHL with a constant i64 shift amount cannot be selected. llvm-svn: 207399	2014-04-28 07:34:27 +00:00
Benjamin Kramer	3693e77cb4	X86: If SSE4.1 is missing lower SMUL_LOHI of v4i32 to pmuludq and fix up the high parts. This is more expensive than pmuldq but still cheaper than scalarizing the whole thing. llvm-svn: 207370	2014-04-27 18:47:41 +00:00
Benjamin Kramer	99767ddf0b	Update test not to check for a shuffle of an all-zero vector. llvm-svn: 207354	2014-04-27 11:54:45 +00:00
Benjamin Kramer	6bca8ef667	SelectionDAG: Aggressively fold shuffles of constant splats. llvm-svn: 207352	2014-04-27 11:41:06 +00:00
Benjamin Kramer	da4841b3a9	DAGCombiner: Simplify code a bit, make more transforms work with vectors. llvm-svn: 207338	2014-04-26 23:09:49 +00:00
Benjamin Kramer	6d2dff61f9	X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. llvm-svn: 207318	2014-04-26 14:12:19 +00:00
Benjamin Kramer	c9827ab103	X86: Add patterns for MULHU/MULHS of v8i16 and v16i16. This gets us pretty code for divs of i16 vectors. Turn the existing intrinsics into the corresponding nodes. llvm-svn: 207317	2014-04-26 13:01:03 +00:00
Benjamin Kramer	4dae598bc8	DAGCombiner: Turn divs of vector splats into vectorized multiplications. Otherwise the legalizer would just scalarize everything. Support for mulhi in the targets isn't that great yet so on most targets we get exactly the same scalarized output. Add a test for x86 vector udiv. I had to disable the mulhi nodes on ARM because there aren't any patterns for it. As far as I know ARM has instructions for getting the high part of a multiply so this should be fixed. llvm-svn: 207315	2014-04-26 12:06:28 +00:00
Michael Zolotukhin	1a97a7bcbf	Revert r206749 till a final decision about the intrinsics is made. llvm-svn: 207313	2014-04-26 09:56:41 +00:00
Juergen Ributzka	a6bda8bae2	[DAG] During DAG legalization keep opaque constants even after expanding. The included test case would return incorrect results, because the expansion of a shift with a constant shift amount of 0 would generate undefined behavior. This is because ExpandShiftByConstant assumes that all shifts by constants with a value of 0 have already been optimized away. This doesn't happen for opaque constants and usually this isn't a problem, because opaque constants won't take this code path - they are not supposed to. In the case that the opaque constant has to be expanded by the legalizer, the legalizer would drop the opaque flag. In this case we hit the limitations of ExpandShiftByConstant and create incorrect code. This commit fixes the legalizer by not dropping the opaque flag when expanding opaque constants and adding an assertion to ExpandShiftByConstant to catch this unsupported case in the future. This fixes <rdar://problem/16718472> llvm-svn: 207304	2014-04-26 02:58:04 +00:00
Quentin Colombet	ea18933d97	[X86] Implement TargetLowering::getScalingFactorCost hook. Scaling factors are not free on X86 because every "complex" addressing mode breaks the related instruction into 2 allocations instead of 1. <rdar://problem/16730541> llvm-svn: 207301	2014-04-26 01:11:26 +00:00
Filipe Cabecinhas	d71f110fe9	Appease the almighty buildbots. llvm-svn: 207295	2014-04-26 00:02:37 +00:00
Filipe Cabecinhas	363b570d2a	Optimization for certain shufflevector by using insertps. Summary: If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower certain shufflevectors to an insertps instruction: When most of the shufflevector result's elements come from one vector (and keep their index), and one element comes from another vector or a memory operand. Added tests for insertps optimizations on shufflevector. Added support and tests for v4i32 vector optimization. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3475 llvm-svn: 207291	2014-04-25 23:51:17 +00:00
Saleem Abdulrasool	99f0d458c3	ARM: remove @llvm.arm.sevl This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic which provides a generic, extensible manner for adding hint instructions. This functionality can now be represented as @llvm.arm.hint(i32 5). llvm-svn: 207246	2014-04-25 17:51:25 +00:00
Saleem Abdulrasool	7e7c2f9ca6	ARM: provide a new generic hint intrinsic Introduce the llvm.arm.hint(i32) intrinsic that can be used to inject hints into the instruction stream. This is particularly useful for generating IR from a compiler where the user may inject an intrinsic (e.g. __yield). These are then pattern substituted into the correct instruction which already existed. llvm-svn: 207242	2014-04-25 17:24:24 +00:00
Tilmann Scheller	2c65bbddd8	[ARM64] When compiling for ELF in PIC mode, local symbols shouldn't go through the GOT There's no need for local symbols to go through the GOT, in fact it seems GNU ld is not even emitting GOT entries for local symbols and will error out when trying to resolve a GOT relocation for a local symbol. This bug triggers when bootstrapping clang on AArch64 Linux with -fPIC and the ARM64 backend. The AArch64 backend is not affected. With this commit it's now possible to bootstrap clang on AArch64 Linux with the ARM64 backend (-fPIC, -O3). llvm-svn: 207226	2014-04-25 13:43:18 +00:00
Jiangning Liu	533b560bc6	[ARM64] Handle fp128 for parameter passing on stack llvm-svn: 207222	2014-04-25 12:07:03 +00:00
Tim Northover	eb7354fd3b	ARM64: fix assertion in ISelDAGToDAG Also an unused variable, so double bonus! This should deal with PR19548. llvm-svn: 207221	2014-04-25 10:48:47 +00:00
Bradley Smith	672df15122	[ARM64] Print preferred aliases for SBFM/UBFM in InstPrinter llvm-svn: 207219	2014-04-25 10:25:29 +00:00
Kevin Qin	022d395c9c	[ARM64] Add RUN lines for "–target arm64 –mattr=-fp-armv8" on AArch64 no-fp test. This patch is a supplement of implementing predicate of FP, enabling aarch64 backend no-fp tests on arm64 target for verification. During this, one bug is exposed and fixed by this patch. llvm-svn: 207215	2014-04-25 09:44:20 +00:00
Kevin Qin	0e7b07704e	[ARM64] Support crc predicate on ARM64. According to the specification, CRC is an optional extension of the architecture. llvm-svn: 207214	2014-04-25 09:25:42 +00:00
Benjamin Kramer	76f753e9a9	X86: Don't transform shifts into ands when the sign bit is tested. Should unbreak MultiSource/Benchmarks/mediabench/g721/g721encode/encode. llvm-svn: 207145	2014-04-24 20:51:37 +00:00
Reid Kleckner	5772b77789	Add 'musttail' marker to call instructions This is similar to the 'tail' marker, except that it guarantees that tail call optimization will occur. It also comes with conservative IR verification rules that ensure that tail call optimization is possible. Reviewers: nicholas Differential Revision: http://llvm-reviews.chandlerc.com/D3240 llvm-svn: 207143	2014-04-24 20:14:34 +00:00
Reid Kleckner	0fbb1e91e5	Fix rdtsc.ll test to match r8 on win64 llvm-svn: 207142	2014-04-24 20:14:08 +00:00
Andrea Di Biagio	d1ab866868	[X86] Add support for Read Time Stamp Counter x86 builtin intrinsics. This patch: - Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and 'int_x86_rdtscp') as GCCBuiltin intrinsics; - Teaches the backend how to lower the two new builtins; - Introduces a common function to lower READCYCLECOUNTER dag nodes and the two new rdtsc/rdtscp intrinsics; - Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll' correctly verifies that both READCYCLECOUNTER and the two new intrinsics work fine for both 64bit and 32bit Subtargets. llvm-svn: 207127	2014-04-24 17:18:27 +00:00
Tim Northover	6331d4b975	AArch64: print NEON lists with a space. This matches ARM64 behaviour, which I think is clearer. It also puts all the churn from that difference into one easily ignored commit. llvm-svn: 207116	2014-04-24 14:06:20 +00:00
Tim Northover	9b594d1163	AArch64/ARM64: port bitfield test to ARM64. llvm-svn: 207103	2014-04-24 12:11:56 +00:00
Tim Northover	eb6611e727	AArch64/ARM64: implement BFI optimisation ARM64 was not producing pure BFI instructions for bitfield insertion operations, unlike AArch64. The approach had to be a little different (in ISelDAGToDAG rather than ISelLowering), and the outcomes aren't identical but hopefully this gives it similar power. This should address PR19424. llvm-svn: 207102	2014-04-24 12:11:53 +00:00
Tim Northover	1cb984fbcf	AArch64/ARM64: port more tests llvm-svn: 207101	2014-04-24 12:11:46 +00:00
Benjamin Kramer	f4575db2fd	X86: Emit test instead of constant shift + compare if the shift result is unused. This allows us to compile return (mask & 0x8 ? a : b); into testb $8, %dil cmovnel %edx, %esi instead of andl $8, %edi shrl $3, %edi cmovnel %edx, %esi which we formed previously because dag combiner canonicalizes setcc of and into shift. llvm-svn: 207088	2014-04-24 08:15:31 +00:00
Saleem Abdulrasool	9e6a524551	MC: move test from Generic to COFF This is a COFF specific test, move it to COFF to fix the Hexagon buildbots. llvm-svn: 207030	2014-04-23 21:41:07 +00:00
Saleem Abdulrasool	11049a0fef	MC: honour IMAGE_SCN_CNT_INITIALIZED_DATA Emit the flag to indicate to the assembler that a section contains data if there is pre-populated data present. llvm-svn: 207028	2014-04-23 21:29:34 +00:00
Quentin Colombet	ef86b4067c	[ARM64] Fix the information we give to the peephole optimizer for comparison. ANDS does not use the same encoding scheme as other xxxS instructions (e.g., ADDS). Take that into account to avoid wrong peephole optimization. <rdar://problem/16693089> llvm-svn: 207020	2014-04-23 20:43:38 +00:00
Matt Arsenault	4c6ab696e2	R600: Add a test that used to be broken that I forgot to add llvm-svn: 207017	2014-04-23 19:45:05 +00:00
Kevin Qin	a4ee178762	[ARM64] Enable feature predicates for NEON / FP / CRYPTO. AArch64 has feature predicates for NEON, FP and CRYPTO instructions. This allows the compiler to generate code without using FP, NEON or CRYPTO instructions. llvm-svn: 206949	2014-04-23 06:22:48 +00:00
Reid Kleckner	feb1148ed6	Fix test/CodeGen/arm.ll The 'CHECK: add' line was occasionally matching against the filename, breaking the subsequent CHECK-NOT. Also use CHECK-LABEL. llvm-svn: 206936	2014-04-23 01:09:29 +00:00
Matt Arsenault	16353871c3	R600: Emit error instead of unreachable on function call llvm-svn: 206904	2014-04-22 16:42:00 +00:00
Elena Demikhovsky	acc5c9e83e	AVX-512: store and truncstore for i1 values llvm-svn: 206897	2014-04-22 14:13:10 +00:00
Tim Northover	52d3283026	AArch64/ARM64: more testing from AArch64 to ARM64 llvm-svn: 206889	2014-04-22 12:45:47 +00:00
Tim Northover	a962398a3f	AArch64/ARM64: make use of ANDS and BICS instructions for comparisons. llvm-svn: 206888	2014-04-22 12:45:42 +00:00
Tim Northover	31ebef86b8	AArch64/ARM64: add extra testing from AArch64 to ARM64 llvm-svn: 206887	2014-04-22 12:45:32 +00:00
Tim Northover	2b73e74238	AArch64/ARM64: enable various AArch64 tests on ARM64. llvm-svn: 206877	2014-04-22 10:10:26 +00:00
Tim Northover	00b4ee848f	AArch64/ARM64: add patterns for scalar_to_vector/extract pairs llvm-svn: 206876	2014-04-22 10:10:18 +00:00
Tim Northover	e74fb0d7b9	AArch64/ARM64: mark fmul intrinsic as commutative. This gives DAG patterns matching indexed patterns where either side is an indexed vector. llvm-svn: 206875	2014-04-22 10:10:14 +00:00
Tim Northover	978d25f391	ARM: disable emission of __XYZvfp in soft-float environment. The point of these calls is to allow Thumb-1 code to make use of the VFP unit to perform its operations. This is not desirable with -msoft-float, since most of the reasons you'd want that apply equally to the runtime library. rdar://problem/13766161 llvm-svn: 206874	2014-04-22 10:10:09 +00:00
Hao Liu	c636d15284	Fix an infinite loop bug in DAG Combine caused by repeatedly transferring between ANY_EXTEND and SIGN_EXTEND. llvm-svn: 206873	2014-04-22 09:57:06 +00:00
Lang Hames	f6f42cac3f	[X86] Don't use BZHI for short masks (>=32 bits). Thanks to Ben Kramer for the review. llvm-svn: 206869	2014-04-22 07:40:34 +00:00
Matt Arsenault	5dbd5db518	R600: Make sign_extend_inreg legal. Don't know why I didn't just do this in the first place. llvm-svn: 206862	2014-04-22 03:49:30 +00:00
Jiangning Liu	87486e0bac	[AArch64] Enable global merge pass. llvm-svn: 206861	2014-04-22 03:33:26 +00:00
Quentin Colombet	d4f44690ef	[CodeGenPrepare] Use APInt to check the value of the immediate in an 'and' while checking candidates for bit field extract. Otherwise the value may not fit in uint64_t and this will trigger an assertion. This fixes PR19503. llvm-svn: 206834	2014-04-22 01:20:34 +00:00
Yi Jiang	d069f6393a	ARM64: Combine shifts and uses from different basic block to bit-extract instruction llvm-svn: 206774	2014-04-21 19:34:27 +00:00
Duncan P. N. Exon Smith	10be9a8868	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206707, reapplying r206704. The preceding commit to CalcSpillWeights should have sorted out the failing buildbots. <rdar://problem/14292693> llvm-svn: 206766	2014-04-21 17:57:07 +00:00
Eli Bendersky	7cd70df708	Fix the test: DCE optimized away everything. Use volatile store to protect the generated PTX from DCE. Patch by Jingyue Wu. llvm-svn: 206763	2014-04-21 17:23:12 +00:00
Michael Zolotukhin	f2ba994bf6	Reapply r206732. This time without optimization of branches. llvm-svn: 206749	2014-04-21 12:01:33 +00:00
NAKAMURA Takumi	54d9f88bed	llvm/test/CodeGen/X86/bmi.ll: Relax expressions for targeting win32. llvm-svn: 206743	2014-04-21 11:01:46 +00:00
Lang Hames	5aa6ee80b6	[X86] ISEL (and X, <constant mask>) to BZHI when BMI2 is available. Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)), was already supported, but we were missing the constant-mask case. This patch fixes that. <rdar://problem/15480077> llvm-svn: 206738	2014-04-21 08:18:53 +00:00
Chandler Carruth	a2533a7bef	Revert r206732 which is causing llc to crash on most of the build bots. Original commit message: Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN, safe.srem.iN, safe.urem.iN (iN = i8, i16, i32, or i64). llvm-svn: 206735	2014-04-21 07:11:15 +00:00
Michael Zolotukhin	137a84616c	Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN, safe.srem.iN, safe.urem.iN (iN = i8, i16, i32, or i64). llvm-svn: 206732	2014-04-21 05:33:09 +00:00
Duncan P. N. Exon Smith	e63327e967	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206704, as expected. llvm-svn: 206707	2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith	6611a377eb	Revert "blockfreq: Temporarily turn on -debug-only=block-freq" This reverts commit r206705, as planned. llvm-svn: 206706	2014-04-19 22:45:44 +00:00
Duncan P. N. Exon Smith	bffee5bb90	blockfreq: Temporarily turn on -debug-only=block-freq These tests fail after my BlockFrequencyInfo rewrite on two buildbots [1][2]. I can't reproduce it locally, so I'm temporarily turning on -debug-only=block-freq so I can find the problem. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1860 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18477 llvm-svn: 206705	2014-04-19 22:40:56 +00:00
Duncan P. N. Exon Smith	875ddfac75	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite. I've done a careful audit, added some asserts, and fixed a couple of bugs (unfortunately, they were in unlikely code paths). There's a small chance that this will appease the failing bots [1][2]. (If so, great!) If not, I have a follow-up commit ready that will temporarily add -debug-only=block-freq to the two failing tests, allowing me to compare the code path between what the failing bots and what my machines (and the rest of the bots) are doing. Once I've triggered those builds, I'll revert both commits so the bots go green again. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 <rdar://problem/14292693> llvm-svn: 206704	2014-04-19 22:34:26 +00:00
Yaron Keren	d7ba46b287	Patch by Vadim Chugunov Win64 stack unwinder gets confused when execution flow "falls through" after a call to 'noreturn' function. This fixes the "missing epilogue" problem by emitting a trap instruction for IR 'unreachable' on x86_64-pc-windows. A secondary use for it would be for anyone wanting to make double-sure that 'noreturn' functions, indeed, do not return. llvm-svn: 206684	2014-04-19 13:47:43 +00:00
Duncan P. N. Exon Smith	76b813619a	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2) This reverts commit r206666, as planned. Still stumped on why the bots are failing. Sanitizer bots haven't turned anything up. If anyone can help me debug either of the failures (referenced in r206666) I'll owe them a beer. (In the meantime, I'll be auditing my patch for undefined behaviour.) llvm-svn: 206677	2014-04-19 00:42:46 +00:00
Duncan P. N. Exon Smith	b3caf3646f	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2) This reverts commit r206628, reapplying r206622 (and r206626). Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce on Darwin, and Chandler can't reproduce on Linux. Asan and valgrind don't tell us anything, but we're hoping the msan bot will catch it. So, I'm applying this again to get more feedback from the bots. I'll leave it in long enough to trigger builds in at least the sanitizer buildbots (it was failing for reasons unrelated to my commit last time it was in), and hopefully a few others.... and then I expect to revert a third time. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 llvm-svn: 206666	2014-04-18 22:30:03 +00:00
Chad Rosier	9149acb053	[ARM64] Ports the Cortex-A53 Machine Model description from AArch64. Summary: This port includes the rudimentary latencies that were provided for the Cortex-A53 Machine Model in the AArch64 backend. It also changes the SchedAlias for COPY in the Cyclone model to an explicit WriteRes mapping to avoid conflicts in other subtargets. Differential Revision: http://reviews.llvm.org/D3427 Patch by Dave Estes <cestes@codeaurora.org>! llvm-svn: 206652	2014-04-18 21:22:04 +00:00
Yaron Keren	d0d38bf91e	Expanded test for x86-pc-windows-gnu and x86_64-pc-windows-gnu environments. llvm-svn: 206649	2014-04-18 21:10:11 +00:00
Adam Nemet	ee7a3e38c9	[X86] Improve buildFromShuffleMostly for AVX For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors, both the BUILD_VECTOR and its operands may need to be legalized in multiple steps. Consider: (v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0), Constant<1>), (extract_vector_elt %vreg0, Constant<2>), (extract_vector_elt %vreg0, Constant<3>), (extract_vector_elt %vreg0, Constant<4>), (extract_vector_elt %vreg0, Constant<5>), (extract_vector_elt %vreg0, Constant<6>), (extract_vector_elt %vreg0, Constant<7>), %vreg1)) a. We can't build a 256-bit vector efficiently so, we need to split it into two 128-bit vecs and combine them with VINSERTX128. b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) need to be split into a VEXTRACTX128 and a further extract_vector_elt from the resulting 128-bit vector. c. The extract_vector_elt from b. is lowered into a shuffle to the first element and a movss. Depending on the order in which we legalize the BUILD_VECTOR and its operands[1], buildFromShuffleMostly may be faced with: (v4f32 (BUILD_VECTOR (extract_vector_elt (vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef), Constant<0>), (extract_vector_elt (vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef), Constant<0>), (extract_vector_elt (vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef), Constant<0>), %vreg1)) In order to figure out the underlying vector and their identity we need to see through the shuffles. [1] Note that the order in which operations and their operands are legalized is only guaranteed in the first iteration of LegalizeDAG. Fixes <rdar://problem/16296956> llvm-svn: 206634	2014-04-18 19:44:16 +00:00
Duncan P. N. Exon Smith	0842ff36a6	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2) This reverts commit r206622 and the MSVC fixup in r206626. Apparently the remotely failing tests are still failing, despite my attempt to fix the nondeterminism in r206621. llvm-svn: 206628	2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith	f8361d127a	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206556, effectively reapplying commit r206548 and its fixups in r206549 and r206550. In an intervening commit I've added target triples to the tests that were failing remotely [1] (but passing locally). I'm hoping the mystery is solved? I'll revert this again if the tests are still failing remotely. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206622	2014-04-18 17:22:25 +00:00
Duncan P. N. Exon Smith	c812b5b33a	Add some target triples for better determinism These tests were failing on some buildbots after r206548 (reverted in r206556), but passing locally. They were missing target triples, so maybe that's the problem? llvm-svn: 206621	2014-04-18 17:22:19 +00:00
Tim Northover	ff046b64d9	AArch64/ARM64: add more NEON tests. Mostly no testing this time, since they were just wrangling target-specific intrinsics. llvm-svn: 206613	2014-04-18 14:54:53 +00:00
Tim Northover	37d9a9cebf	ARM64: disable generation of .loh directives outside MachO. Part of PR19455. llvm-svn: 206611	2014-04-18 14:54:46 +00:00
Tim Northover	be1d1b6681	ARM64: don't emit .subsections_via_symbols on ELF. Part of PR19455. llvm-svn: 206610	2014-04-18 14:54:41 +00:00
Tim Northover	be3941cc79	ARM64: add extra NEG pattern. llvm-svn: 206609	2014-04-18 14:54:35 +00:00
Tim Northover	0dbdfb8522	AArch64/ARM64: port more AArch64 tests to ARM64. llvm-svn: 206592	2014-04-18 13:16:55 +00:00
Tim Northover	e3028832d1	AArch64/ARM64: add non-scalar lowering for more FCVT operations. llvm-svn: 206591	2014-04-18 13:16:42 +00:00
Tim Northover	01f315a556	AArch64/ARM64: improve spotting of EXT instructions from VECTOR_SHUFFLE. We couldn't cope if the first mask element was UNDEF before, which isn't ideal. llvm-svn: 206588	2014-04-18 12:50:58 +00:00
Benjamin Kramer	e6c821ef4c	X86: Pattern match scalar loads + vcvtph2ps into just vcvtph2ps. vcvtph2ps only reads the lower 64 bits of the address passed to the intrinsic. llvm-svn: 206579	2014-04-18 10:45:33 +00:00
Tim Northover	66c36b814f	AArch64/ARM64: port atomics test to ARM64. Covers quite a few extra instructions (like any of the max/min ones which were broken until recently on ARM64). llvm-svn: 206575	2014-04-18 09:31:31 +00:00
Tim Northover	a2c4c71c12	AArch64/ARM64: spot a greater variety of concat_vector operations. Code mostly copied from AArch64, just tidied up a trifle and plumbed into the ARM64 way of doing things. This also enables the AArch64 tests which inspired the previous untested commits. llvm-svn: 206574	2014-04-18 09:31:27 +00:00
Tim Northover	848bb3ced5	ARM64: implement cunning optimisation from AArch64 A vector extract followed by a dup can become a single instruction even if the types don't match. AArch64 handled this in ISelLowering, but a few reasonably simple patterns can take care of it in TableGen, so that's where I've put it. llvm-svn: 206573	2014-04-18 09:31:20 +00:00
Tim Northover	8b2fa3dfef	AArch64/ARM64: emit all vector FP comparisons as such. ARM64 was scalarizing some vector comparisons which don't quite map to AArch64's compare and mask instructions. AArch64's approach of sacrificing a little efficiency to emulate them with the limited set available was better, so I ported it across. More "inspired by" than copy/paste since the backend's internal expectations were a bit different, but the tests were invaluable. llvm-svn: 206570	2014-04-18 09:31:07 +00:00
Tim Northover	0a44e66bb8	AArch64/ARM64: port BSL logic from AArch64 & enable test. I enhanced it a little in the process. The decision shouldn't really be based on whether a BUILD_VECTOR is a splat: any set of constants will do the job provided they're related in the correct way. Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's best to check for all 4 possibilities rather than assuming it'll be the RHS. llvm-svn: 206569	2014-04-18 09:31:01 +00:00
Tim Northover	547a4ae6fa	AArch64/ARM64: copy byval implementation from AArch64. It's not actually used to handle C or C++ ABI rules on ARM64, but could well be emitted by other language front-ends, so it's as well to have a sensible implementation. llvm-svn: 206568	2014-04-18 09:30:52 +00:00
Jiangning Liu	40d81e10c5	This is one of the optimizations ported from ARM64 to AArch64 to address the performance gap between these two back ends. The test case newly added for AArch64 already exists in ARM64. Patched by Z.Zheng llvm-svn: 206559	2014-04-18 05:58:09 +00:00
Matt Arsenault	78b8670aac	R600/SI: Try to use scalar BFE. Use scalar BFE with constant shift and offset when possible. This is complicated by the fact that the scalar version packs the two operands of the vector version into one. llvm-svn: 206558	2014-04-18 05:19:26 +00:00
Jiangning Liu	e56c30614f	This commit enables unaligned memory accesses of vector types on AArch64 back end. This should boost vectorized code performance. Patched by Z. Zheng llvm-svn: 206557	2014-04-18 03:58:38 +00:00
Duncan P. N. Exon Smith	e576167df8	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commits r206548, r206549 and r206549. There are some unit tests failing that aren't failing locally [1], so reverting until I have time to investigate. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206556	2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith	12e68e1733	blockfreq: Rewrite BlockFrequencyInfoImpl Rewrite the shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo entirely. The old implementation had a fundamental flaw: precision losses from nested loops (or very wide branches) compounded past loop exits (and convergence points). The @nested_loops testcase at the end of test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This function has three nested loops, with branch weights in the loop headers of 1:4000 (exit:continue). The old analysis gives non-sensical results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': ---- Block Freqs ---- entry = 1.0 for.cond1.preheader = 1.00103 for.cond4.preheader = 5.5222 for.body6 = 18095.19995 for.inc8 = 4.52264 for.inc11 = 0.00109 for.end13 = 0.0 The new analysis gives correct results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': block-frequency-info: nested_loops - entry: float = 1.0, int = 8 - for.cond1.preheader: float = 4001.0, int = 32007 - for.cond4.preheader: float = 16008001.0, int = 128064007 - for.body6: float = 64048012001.0, int = 512384096007 - for.inc8: float = 16008001.0, int = 128064007 - for.inc11: float = 4001.0, int = 32007 - for.end13: float = 1.0, int = 8 Most importantly, the frequency leaving each loop matches the frequency entering it. The new algorithm leverages BlockMass and PositiveFloat to maintain precision, separates "probability mass distribution" from "loop scaling", and uses dithering to eliminate probability mass loss. I have unit tests for these types out of tree, but it was decided in the review to make the classes private to BlockFrequencyInfoImpl, and try to shrink them (or remove them entirely) in follow-up commits. The new algorithm should generally have a complexity advantage over the old. The previous algorithm was quadratic in the worst case. 
The new algorithm is still worst-case quadratic in the presence of irreducible control flow, but it's linear without it. The key difference between the old algorithm and the new is that control flow within a loop is evaluated separately from control flow outside, limiting propagation of precision problems and allowing loop scale to be calculated independently of mass distribution. Loops are visited bottom-up, their loop scales are calculated, and they are replaced by pseudo-nodes. Mass is then distributed through the function, which is now a DAG. Finally, loops are revisited top-down to multiply through the loop scales and the masses distributed to pseudo nodes. There are some remaining flaws. - Irreducible control flow isn't modelled correctly. LoopInfo and MachineLoopInfo ignore irreducible edges, so this algorithm will fail to scale accordingly. There's a note in the class documentation about how to get closer. See also the comments in test/Analysis/BlockFrequencyInfo/irreducible.ll. - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. - The "bias" calculation proposed on llvmdev is not incorporated here. This will be added in a follow-up commit, once comments from this review have been handled. llvm-svn: 206548	2014-04-18 01:57:45 +00:00
Matt Arsenault	27cc958dff	R600/SI: Match sign_extend_inreg to s_sext_i32_i8 and s_sext_i32_i16 llvm-svn: 206547	2014-04-18 01:53:18 +00:00
Tom Stellard	1aa6cb4d88	R600/SI: Use SReg_64 instead of VSrc_64 when selecting BUILD_PAIR llvm-svn: 206541	2014-04-18 00:36:21 +00:00
Louis Gerbarg	e43a24f444	Make test/CodeGen/ARM64/vector-insertion.ll explicitly select neon syntax Change the command line vector-insertion.ll to explicitly set the neon syntax to apple so that buildbots that default to other syntaxes won't fail. llvm-svn: 206502	2014-04-17 21:32:41 +00:00
Tom Stellard	868fd92e54	R600/SI: Stop using i128 as the resource descriptor type Having i128 as a legal type complicates the legalization phase. v4i32 is already a legal type, so we will use that instead. This fixes several piglit tests. llvm-svn: 206500	2014-04-17 21:00:11 +00:00
Louis Gerbarg	153e695ee2	Improve ARM64 vector creation This patch improves the performance of vector creation in cases where several of the lanes in the vector are a constant floating point value. It also includes new patterns to fold together some of the instructions when the value is 0.0f. Test cases included. rdar://16349427 llvm-svn: 206496	2014-04-17 20:51:50 +00:00
Jim Grosbach	0fba6d98fc	ARM64: [su]xtw use W regs as inputs, not X regs. Update the SXT[BHW]/UXTW instruction aliases and the shifted reg addressing mode handling. PR19455 and rdar://16650642 llvm-svn: 206495	2014-04-17 20:47:31 +00:00
Tim Northover	11a6082e33	ARM64: switch to IR-based atomic operations. Goodbye code! (Game: spot the bug fixed by the change). llvm-svn: 206490	2014-04-17 20:00:33 +00:00
Tim Northover	0129f298c4	ARM64: add acquire/release versions of the existing atomic intrinsics. These will be needed to support IR-level lowering of atomic operations. llvm-svn: 206489	2014-04-17 20:00:24 +00:00
Josh Magee	adfde5fef6	[stack protector] Make the StackProtector pass respect ssp-buffer-size. Previously, SSPBufferSize was assigned the value of the "stack-protector-buffer-size" attribute after all uses of SSPBufferSize. The effect was that the default SSPBufferSize was always used during analysis. I moved the check for the attribute before the analysis; now --param ssp-buffer-size= works correctly again. Differential Revision: http://reviews.llvm.org/D3349 llvm-svn: 206486	2014-04-17 19:08:36 +00:00
Matt Arsenault	a90d22fad5	R600/SI: f64 frint is legal on CI llvm-svn: 206475	2014-04-17 17:06:37 +00:00
Matt Arsenault	51df0c1965	R600/SI: Fix zext from i1 to i64 llvm-svn: 206437	2014-04-17 02:03:08 +00:00
Adam Nemet	287f989dde	[ARM64] Fix "Cannot select" for vector ctpop The commit of r205855: Author: Arnold Schwaighofer <aschwaighofer@apple.com> Date: Wed Apr 9 14:20:47 2014 +0000 SLPVectorizer: Only vectorize intrinsics whose operands are widened equally The vectorizer only knows how to vectorize intrinsics by widening all operands by the same factor. Patch by Tyler Nowicki! exposed a backend bug causing a regression (Cannot select ctpop). The commit msg is a bit confusing because the patch actually changes the behavior for the loop-vectorizer as well. As things got refactored into a helper ctpop got snuck in to the trivially-vectorizable helper which is now used by both vectorizers. In other words, we started seeing vector-ctpops in the backend. This change makes ctpop LegalizeAction::Expand for the types not supported by the byte-only CNT instruction. We may be able to custom-lower these later to a single CNT but this is to fix the compiler crash first. Fixes <rdar://problem/16578951> llvm-svn: 206433	2014-04-17 01:01:37 +00:00
Matheus Almeida	0051f2dc78	[mips] Add initial support for NaN2008 in the back-end. This is so that EF_MIPS_NAN2008 is set if we are using IEEE 754-2008 NaN encoding (-mnan=2008). This patch also adds support for parsing '.nan legacy' and '.nan 2008' assembly directives. The handling of these directives should match GAS' behaviour i.e., the last directive in use sets the ELF header bit (EF_MIPS_NAN2008). Differential Revision: http://reviews.llvm.org/D3346 llvm-svn: 206396	2014-04-16 15:48:55 +00:00
Tim Northover	cb37ab2d9c	AArch64/ARM64: port some NEON tests to ARM64 These ones used completely different sets of intrinsics, so the only way to do it is create a separate ARM64 copy and change them all. Other than that, CodeGen was straightforward, no deficiencies detected here. llvm-svn: 206392	2014-04-16 15:28:02 +00:00
Daniel Sanders	16fa1db637	[mips] Fix emission of '.option pic0' for MIPS-IV. Summary: This was a case of incorrect usage of hasMips64() vs isABI_N64() Reviewers: matheusalmeida, dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3398 llvm-svn: 206388	2014-04-16 13:58:57 +00:00
Daniel Sanders	a024fb0e04	[mips] Correct r206370 to account for non-Linux targets using the small data section. This should fix the ninja-x64-msvc-RA-centos6 builder. I suspect the check in MipsSubtarget.cpp is incorrect and is really trying to check for a bare-metal target rather than anything other than linux. I'll investigate this. llvm-svn: 206385	2014-04-16 12:29:08 +00:00
Tim Northover	05a4039fc9	ARM64: specify triple so that Linux tests pass Now that Linux is trying to reparse all inline asm it chokes on the different comment character in this test. llvm-svn: 206382	2014-04-16 12:03:56 +00:00
Tim Northover	46ecdf5a0f	AArch64/ARM64: add another set of tests from AArch64 Another batch with no code changes. llvm-svn: 206381	2014-04-16 11:53:07 +00:00
Tim Northover	3ec1de7767	AArch64/ARM64: port across stub handling for ELF C++ exceptions. The most important part here is that we should actually emit the stubs we refer to in the exception table, but as a side issue this uses more sensible & GCC compatible representations for some of the bits of information. llvm-svn: 206380	2014-04-16 11:52:55 +00:00
Tim Northover	18f68f6d1a	ARM64: use 32-bit moves for constants where possible. If we know that a particular 64-bit constant has all high bits zero, then we can rely on the fact that 32-bit ARM64 instructions automatically zero out the high bits of an x-register. This gives the expansion logic fewer constraints to satisfy and so sometimes allows it to pick better sequences. Came up while porting test/CodeGen/AArch64/movw-consts.ll: this will allow a 32-bit MOVN to be used in @test8 soon. llvm-svn: 206379	2014-04-16 11:52:51 +00:00
Tim Northover	9cfb57dafa	ARM64: use the integrated assembler on ELF. llvm-svn: 206378	2014-04-16 11:52:40 +00:00
Matheus Almeida	dc7e48e084	[mips] Emit '.set nomicromips' before a function's entry label if not in micromips mode. The test (elf_st_other.ll) was renamed as the name and description didn't make sense as the test wasn't checking any symbol table entry. Differential Revision: http://reviews.llvm.org/D3346 llvm-svn: 206377	2014-04-16 11:46:59 +00:00
Daniel Sanders	11c0c067c2	[mips] Correct callee saved list for the N32 ABI and enable test Summary: Depends on D3339 Reviewers: matheusalmeida, vmedic Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3340 llvm-svn: 206371	2014-04-16 10:23:37 +00:00
Daniel Sanders	9fe0ad0c07	[mips] Add calling convention tests covering O32, N32, and N64. Summary: I had difficulty finding tests for the N32 and N64 ABI so I've added a collection of calling convention tests based on the document MIPS ABIs Described (MD00305), the MIPSpro N32 Handbook, and the SYSV ABI. Where the documents/implementations disagree, I've used GCC to resolve the conflict. A few interesting details: * For N32, LLVM uses 64-bit pointers when saving $ra despite pointers being 32-bit. I've yet to find a supporting statement in the ABI documentation but the current behaviour matches GCC. * For O32, the non-variable portion of a varargs argument list is also subject to the rule that floating-point is passed via GPR's (on N32/N64 only the variable portion is subject to this rule). This agrees with GCC's behaviour and the SYSV ABI but contradicts part of the MIPSpro N32 Handbook which talks about O32's behaviour. * The N32 implementation has the wrong callee-saved register list. (I already have a fix for this but will commit it as a follow-up). I've left RUN-TODO lines in for O32 on MIPS64. I don't plan to support this case for now but we should revisit it. Reviewers: matheusalmeida, vmedic Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3339 llvm-svn: 206370	2014-04-16 09:59:46 +00:00
Tim Northover	f8d183e8b9	ARM64: explicitly ask for Apple NEON syntax so test passes on Linux llvm-svn: 206368	2014-04-16 09:13:44 +00:00
Tim Northover	97c5b6fe4f	ARM64: mark x7 as used when an i128 gets shunted onto the stack. The second half of a split i128 was ending up in x7, which is not a good thing. This is another part of PR19432. llvm-svn: 206366	2014-04-16 09:03:25 +00:00
Tim Northover	863a789a99	DAGCombiner: don't optimise non-existent litpool load This particular DAG combine is designed to kick in when both ConstantFPs will end up being loaded via a litpool, however those nodes have a semi-legal status, dictated by isFPImmLegal so in some cases there wouldn't have been a litpool in the first place. Don't try to be clever in those circumstances. Picked up while merging some AArch64 tests. llvm-svn: 206365	2014-04-16 09:03:09 +00:00
Matt Arsenault	4ef2588b65	R600: Extend r600 sign_extend_inreg tests for EG Patch by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 206349	2014-04-16 01:41:34 +00:00
Matt Arsenault	4d7d38333b	R600/SI: Print more immediates in hex format Print in decimal for inline immediates, and hex otherwise. Use hex always for offsets in addressing offsets. This approximately matches what the shader compiler does. llvm-svn: 206335	2014-04-15 22:32:49 +00:00
Nick Lewycky	43855af9a7	Make this test not match its own filename, when being run from a path that includes the string 'add'. llvm-svn: 206331	2014-04-15 22:29:32 +00:00
Matt Arsenault	470acd81a8	R600/SI: Fix loads of i1 llvm-svn: 206330	2014-04-15 22:28:39 +00:00
Akira Hatanaka	3d90f99d1a	Make FastISel::SelectInstruction return before target specific fast-isel code handles Intrinsic::trap if TargetOptions::TrapFuncName is set. This fixes a bug in which the trap function was not taken into consideration when a program was compiled without optimization (at -O0). <rdar://problem/16291933> llvm-svn: 206323	2014-04-15 21:30:06 +00:00
Andrea Di Biagio	aac2eac4c2	[X86] Improve the lowering of packed shifts by constant build_vector. This patch teaches the backend how to efficiently lower logical and arithmetic packed shifts on both SSE and AVX/AVX2 machines. When possible, instead of scalarizing a vector shift, the backend should try to expand the shift into a sequence of two packed shifts by immediate count followed by a MOVSS/MOVSD. Example (v4i32 (srl A, (build_vector < X, Y, Y, Y>))) Can be rewritten as: (v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>))) [with X and Y ConstantInt] The advantage is that the two new shifts from the example would be lowered into X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into four scalar shifts plus four pairs of vector insert/extract. llvm-svn: 206316	2014-04-15 19:30:48 +00:00
Quentin Colombet	72dad56c53	[ARM64] Set default CPU to generic instead of cyclone. llvm-svn: 206313	2014-04-15 19:08:46 +00:00
Robert Lougher	a9bf2463b9	Revert r191049/r191059 as it can produce wrong code (see PR17975). It has already been reverted on the 3.4 branch in r196521. llvm-svn: 206311	2014-04-15 18:34:24 +00:00
Tim Northover	bd668872c0	AArch64/ARM64: enable more AArch64 tests on ARM64. No code changes for this bunch, just some test rejigs. llvm-svn: 206291	2014-04-15 14:00:29 +00:00
Tim Northover	ebb3123a5f	AArch64/ARM64: add missing pattern for extending load. llvm-svn: 206290	2014-04-15 14:00:19 +00:00
Tim Northover	cbcb7a37f7	AArch64/ARM64: only mangle MOVZ/MOVN during encoding when needed Sometimes we need to emit the bits that would actually be a MOVN when producing a relocated MOVZ instruction (don't ask). But not always, a check which ARM64 got wrong until now. llvm-svn: 206289	2014-04-15 14:00:15 +00:00
Tim Northover	6e27b8ded5	AArch64/ARM64: add support for large code-model jump tables. I've left the MachO CodeGen as it is, there's a reasonable chance it should use the GOT like ConstPools, but I'm not certain. llvm-svn: 206288	2014-04-15 14:00:11 +00:00
Tim Northover	221b583951	AArch64/ARM64: add patterns for various commutations of FNMADD. llvm-svn: 206287	2014-04-15 14:00:06 +00:00
Tim Northover	b37cff1ae2	AArch64/ARM64: add half as a storage type on ARM64. This brings it into line with the AArch64 behaviour and should open the way for certain OpenCL features. llvm-svn: 206286	2014-04-15 14:00:03 +00:00
Tim Northover	80a70a265a	AArch64/ARM64: copy patterns for fixed-point conversions Code is mostly copied directly across, with a slight extension of the ISelDAGToDAG function so that it can cope with the floating-point constants being behind a litpool. llvm-svn: 206285	2014-04-15 13:59:57 +00:00
Tim Northover	f70577b1cd	ARM64: add constraints to various FastISel operations llvm-svn: 206284	2014-04-15 13:59:53 +00:00
Tim Northover	27010074fb	AArch64/ARM64: add more arm64 lines to AArch64 regression tests llvm-svn: 206282	2014-04-15 13:59:44 +00:00
Tim Northover	20603726ce	AArch64/ARM64: add dp tests from AArch64 llvm-svn: 206281	2014-04-15 13:59:40 +00:00
Quentin Colombet	c396019837	[Register Coalescer] Add a test case for 206060. <rdar://problem/16582185> llvm-svn: 206235	2014-04-15 01:15:32 +00:00
Louis Gerbarg	cfc05450e5	Fix for codegen bug that could cause illegal cmn instruction generation In rare cases the dead definition elimination pass code can cause illegal cmn instructions when it replaces dead registers on instructions that use unmaterialized frame indexes. This patch disables the dead definition optimization for instructions which include frame index operands. rdar://16438284 llvm-svn: 206208	2014-04-14 21:05:05 +00:00
Louis Gerbarg	6d2e3c638f	Add a flag to disable the ARM64DeadRegisterDefinitionsPass This patch adds a -arm64-dead-def-elimination flag so that it is possible to disable dead definition elimination. Includes test case. llvm-svn: 206207	2014-04-14 21:05:02 +00:00
Akira Hatanaka	5638b89944	Fix a bug in which BranchProbabilityInfo wasn't setting branch weights of basic blocks inside loops correctly. Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights. This fixes PR18705 and <rdar://problem/15991090>. Differential Revision: http://reviews.llvm.org/D3363 llvm-svn: 206194	2014-04-14 16:56:19 +00:00
Richard Trieu	3df79775c5	Fix 2008-03-05-SxtInRegBug.ll so that the CHECK-NOT will not match the filename. llvm-svn: 206193	2014-04-14 16:53:50 +00:00
Daniel Sanders	863c35a358	[mips] Fix fcopysign for MIPS-IV and add the test. Summary: This was another incorrect use of hasMips64() vs isGP64bit(). Depends on D3344 Reviewers: matheusalmeida, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3347 llvm-svn: 206187	2014-04-14 16:24:12 +00:00
Daniel Sanders	1d3ae27f01	[mips] MIPS-IV is broadly the same as MIPS64 so duplicate all -mcpu=mips64 tests with -mcpu=mips4 as a starting point Summary: Two exceptions to this: test/CodeGen/Mips/octeon.ll test/CodeGen/Mips/octeon_popcnt.ll these test extensions to MIPS64 One test is altered for MIPS-IV: test/CodeGen/Mips/mips64countleading.ll Tests dclo/dclz which were added in MIPS64. The MIPS-IV version tests that dclo/dclz are not emitted. Four tests fail and are not in this patch: test/CodeGen/Mips/abicalls.ll test/CodeGen/Mips/fcopysign-f32-f64.ll test/CodeGen/Mips/fcopysign.ll test/CodeGen/Mips/stack-alignment.ll Depends on D3343 Reviewers: matheusalmeida, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3344 llvm-svn: 206185	2014-04-14 16:00:28 +00:00
Daniel Sanders	3d84935d28	[mips] Fix more incorrect uses of HasMips64 and isMips64() Summary: - Conditional moves acting on 64-bit GPR's should require MIPS-IV rather than MIPS64 - ISD::MUL, and ISD::MULH[US] should be lowered on all 64-bit ISA's Patch by David Chisnall His work was sponsored by: DARPA, AFRL I've added additional testcases to cover as much of the codegen changes affecting MIPS-IV as I can. Where I've been unable to find an existing MIPS64 testcase that can be re-used for MIPS-IV (mainly tests covering ISD::GlobalAddress and similar), I at least agree that MIPS-IV should behave like MIPS64. Further testcases that are fixed by this patch will follow in my next commit. The testcases from that commit that fail for MIPS-IV without this patch are: LLVM :: CodeGen/Mips/2010-07-20-Switch.ll LLVM :: CodeGen/Mips/cmov.ll LLVM :: CodeGen/Mips/eh-dwarf-cfa.ll LLVM :: CodeGen/Mips/largeimmprinting.ll LLVM :: CodeGen/Mips/longbranch.ll LLVM :: CodeGen/Mips/mips64-f128.ll LLVM :: CodeGen/Mips/mips64directive.ll LLVM :: CodeGen/Mips/mips64ext.ll LLVM :: CodeGen/Mips/mips64fpldst.ll LLVM :: CodeGen/Mips/mips64intldst.ll LLVM :: CodeGen/Mips/mips64load-store-left-right.ll LLVM :: CodeGen/Mips/sint-fp-store_pattern.ll Reviewers: dsanders Reviewed By: dsanders CC: matheusalmeida Differential Revision: http://reviews.llvm.org/D3343 llvm-svn: 206183	2014-04-14 15:44:42 +00:00
Tim Northover	db2860f49e	ARM64: specify full triple in tests to pacify Windows. llvm-svn: 206175	2014-04-14 13:18:48 +00:00
Tim Northover	a89617bd33	AArch64: add newline to end of test files. Should be no other change. llvm-svn: 206174	2014-04-14 13:18:40 +00:00
Tim Northover	cb9c3cfb58	ARM64: remove buggy REV16 pattern. The 32-bit pattern is still valid: 0123 -> 3210 -> 1032. llvm-svn: 206172	2014-04-14 12:59:52 +00:00
Tim Northover	b6abe806c7	AArch64/ARM64: enable directcond.ll test on ARM64. Code change is because optimizeCompareInstr didn't know how to pull the condition code out of FCSEL instructions. llvm-svn: 206171	2014-04-14 12:51:06 +00:00
Tim Northover	0d7bd4f444	ARM64: add patterns for csXYZ with reversed operands. AArch64 tests for this, and it's obviously a good idea. Have to invert the condition code, of course. llvm-svn: 206170	2014-04-14 12:51:02 +00:00
Tim Northover	c398cd53aa	ARM64: enable more regression tests from AArch64 llvm-svn: 206169	2014-04-14 12:50:58 +00:00
Tim Northover	2f48303436	ARM64: add support for AArch64's addsub_ext.ll There was one definite issue in ARM64 (the off-by-1 check for whether a shift could be folded in) and one difference that is probably correct: ARM64 didn't fold nodes with multiple uses into the arithmetic operations unless optimising for code size. llvm-svn: 206168	2014-04-14 12:50:50 +00:00
Tim Northover	23b1f08282	ARM64: optimise (cmp x, (sub 0, y)) to (cmn x, y). This transformation is only valid when being used for an EQ or NE comparison since the flags change otherwise. llvm-svn: 206167	2014-04-14 12:50:47 +00:00
Tim Northover	d1719a8f76	ARM64: start porting regression test suite from AArch64 llvm-svn: 206166	2014-04-14 12:50:41 +00:00
Richard Osborne	da16ff47cd	[XCore] Don't create invalid MKMSK instructions inside loadImmediate(). Summary: Previously loadImmediate() would produce MKMSK instructions with invalid immediate values such as mkmsk r0, 9. Fix this by checking the mask size is valid. Reviewers: robertlytton Reviewed By: robertlytton CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3289 llvm-svn: 206163	2014-04-14 12:30:35 +00:00
Hal Finkel	d9963c75da	[PowerPC] Fix rlwimi isel when mask is not constant We had been using the known-zero values of the operand of the or to construct the mask for an rlwimi; this is not quite correct, but fine when the mask is constant. When the mask is constant, then the known zeros of the operand must be a superset of the zeros in the mask. However, when the mask is not a constant, then there might be bits in the operand that are not known to be zero that, at runtime, might be zero in the mask. Therefore, we check that any bits not known to be zero are known to be one in the mask. Otherwise, we can't fold the mask with the or and shift. This was revealed as a miscompile of MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with constant hoisting. llvm-svn: 206136	2014-04-13 17:10:58 +00:00
Hal Finkel	34974ed503	[PowerPC] Implement some additional TLI callbacks Add implementations of: bool isLegalICmpImmediate(int64_t Imm) const bool isLegalAddImmediate(int64_t Imm) const bool isTruncateFree(Type Ty1, Type Ty2) const bool isTruncateFree(EVT VT1, EVT VT2) const bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const Unfortunately, this regresses counter-register-based loop formation because some of the loops now end up in forms where SE cannot compute loop counts. However, nevertheless, the test-suite results favor committing: SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown llvm-svn: 206120	2014-04-12 21:52:38 +00:00
Richard Trieu	97a268d905	Add extra checks to mvn.ll test to prevent the "f1" check from matching on a directory name instead of the function name. llvm-svn: 206104	2014-04-12 04:47:04 +00:00
Hal Finkel	3b48d08f54	Reenable use of TBAA during CodeGen We had disabled use of TBAA during CodeGen (even when otherwise using AA) because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to miss basic type punning that it should catch (and, thus, we'd fail to override TBAA when we should). However, when AA is in use during CodeGen, CGP now uses normal GEPs and bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a result, BasicAA should be able to make us do the right thing in the face of type-punning, and it seems safe to enable use of TBAA again. self-hosting seems fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle. Note: We still don't update TBAA when merging stack slots, although because BasicAA should now catch all such cases, this is no longer a blocking issue. Nevertheless, I plan to commit code to deal with this properly in the near future. llvm-svn: 206093	2014-04-12 01:26:00 +00:00
Hal Finkel	c3998306f4	Add the ability to use GEPs for address sinking in CGP The current memory-instruction optimization logic in CGP, which sinks parts of the address computation that can be absorbed by the addressing mode, does this by explicitly converting the relevant part of the address computation into IR-level integer operations (making use of ptrtoint and inttoptr). For most targets this is currently not a problem, but for targets wishing to make use of IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a problem for two reasons: 1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr 2. In cases where type-punning was used, and BasicAA was used to override TBAA, BasicAA may no longer do so. (this had forced us to disable all use of TBAA in CodeGen; something which we can now enable again) This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by default (except for those targets that use AA during CodeGen), and so aside from some PowerPC subtargets and SystemZ, there should be no change in behavior. We may be able to switch completely away from the ptrtoint/inttoptr sinking on all targets, but further testing is required. I've doubled-up on a number of existing tests that are sensitive to the address sinking behavior (including some store-merging tests that are sensitive to the order of the resulting ADD operations at the SDAG level). llvm-svn: 206092	2014-04-12 00:59:48 +00:00
Louis Gerbarg	b9a0551862	Add ARM64 CLS patterns This patch adds patterns to generate the cls instruction ARM64. Includes tests for 64 bit and 32 bit operands. rdar://15611957 llvm-svn: 206079	2014-04-11 22:27:58 +00:00
Quentin Colombet	4344da1c71	[RegAllocGreedy][Last Chance Recoloring] Change the name of the exhaustive search option. fexhaustive-register-search => exhaustive-register-search 'f' is a Clang thing! This is related to PR18747. llvm-svn: 206075	2014-04-11 21:51:09 +00:00
Quentin Colombet	567e30bc2b	[RegAllocGreedy][Last Chance Recoloring] Addition of -fexhaustive-register-search option to allow an exhaustive search during last chance recoloring. This is related to PR18747 Patch by MAYUR PANDEY <mayur.p@samsung.com>. llvm-svn: 206072	2014-04-11 21:39:44 +00:00
Tom Stellard	a1a5d9aa2e	SelectionDAG: Use helper function to improve legalization of ISD::MUL The TargetLowering::expandMUL() helper contains lowering code extracted from the DAGTypeLegalizer and allows the SelectionDAGLegalizer to expand more ISD::MUL patterns without having to use a library call. llvm-svn: 206037	2014-04-11 16:12:01 +00:00
Reid Kleckner	9c6582129a	Move the segmented stack switch to a function attribute This removes the -segmented-stacks command line flag in favor of a per-function "split-stack" attribute. Patch by Luqman Aden and Alex Crichton! llvm-svn: 205997	2014-04-10 22:58:43 +00:00
Josh Magee	79ae600818	[stack protector] Refactor and clean-up test. No functionality change. Refactored stack-protector.ll to use new-style function attributes everywhere and eliminated unnecessary attributes. This cleanup is in preparation for an upcoming test change. llvm-svn: 205996	2014-04-10 22:47:27 +00:00
Jim Grosbach	576f8cf19f	X86: Tighten up test. llc CPU autodetection bites again. Speculative fix for bot failures. llvm-svn: 205940	2014-04-10 00:27:43 +00:00
Jim Grosbach	e4fef71981	Add support for load folding of avx1 logical instructions AVX supports logical operations using an operand from memory. Unfortunately because integer operations were not added until AVX2 the AVX1 logical operation's types were preventing the isel from folding the loads. In a limited number of cases the peephole optimizer would fold the loads, but most were missed. This patch adds explicit patterns with appropriate casts in order for these loads to be folded. The included test cases run on reduced examples and disable the peephole optimizer to ensure the folds are being pattern matched. Patch by Louis Gerbarg <lgg@apple.com> rdar://16355124 llvm-svn: 205938	2014-04-09 23:39:25 +00:00
Jim Grosbach	cad4cd6c9e	SelectionDAG: Don't constant fold target-specific nodes. FoldConstantArithmetic() only knows how to deal with a few target independent ISD opcodes. Bail early if it sees a target-specific ISD node. These nodes do funny things with operand types which may break the assumptions of the code that follows, and there's no actual folding that can be done anyway. For example, non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a 128-bit v4i32 vector regardless of what the first operand type is and that breaks the assumption that the operand types must match. rdar://16530923 llvm-svn: 205937	2014-04-09 23:28:11 +00:00
Chad Rosier	5f8d6a6c15	[AArch64] Implement the isZExtFree APIs. llvm-svn: 205926	2014-04-09 20:51:21 +00:00
Chad Rosier	9ce19fb65c	[AArch64] Implement the isTruncateFree API. In AArch64 i64 to i32 truncate operation is a subregister access. This allows more opportunities for LSR optimization to eliminate variables of different types (i32 and i64). llvm-svn: 205925	2014-04-09 20:43:40 +00:00
Quentin Colombet	0b1a5584d6	[DAGCombiner] DAG combine does not know how to combine indexed loads with sign/zero/any extensions. However a few places were not checking properly the property of the load and were turning an indexed load into a regular extended load. Therefore the indexed value was lost during the process and this was triggering an assertion. <rdar://problem/16389332> llvm-svn: 205923	2014-04-09 20:03:05 +00:00
Justin Holewinski	30d56a7b86	[NVPTX] Add preliminary intrinsics and codegen support for textures/surfaces This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later). llvm-svn: 205907	2014-04-09 15:39:15 +00:00
Justin Holewinski	9d852a8e08	[NVPTX] Add support for addrspacecast in global variable initializers, including emitting generic() when casting to address space 0. llvm-svn: 205906	2014-04-09 15:39:11 +00:00
Alp Toker	16f98b255d	Fix some doc and comment typos llvm-svn: 205899	2014-04-09 14:47:27 +00:00
Bradley Smith	3971d3dc75	[ARM64] Rename LR to the UAL-compliant 'X30'. llvm-svn: 205885	2014-04-09 14:43:59 +00:00
Bradley Smith	6f1aa59c31	[ARM64] Rename FP to the UAL-compliant 'X29'. llvm-svn: 205884	2014-04-09 14:43:50 +00:00
Elena Demikhovsky	cf0b9bafc3	AVX-512: insert element to mask vector; store i1 data Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors; Implemented "store" for i1 type llvm-svn: 205850	2014-04-09 12:37:50 +00:00
Daniel Sanders	b282f1fec5	Re-commit: [mips] abs.[ds], and neg.[ds] should be allowed regardless of -enable-no-nans-fp-math Summary: They behave in accordance with the Has2008 and ABS2008 configuration bits of the processor which are used to select between the 1985 and 2008 versions of IEEE 754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid operation exceptions when given NaN), in 2008 mode they are non-arithmetic (i.e. they are copies). nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because the ISA spec does not explicitly state that they obey Has2008 and ABS2008. Fixed the issue with the previous version of this patch (r205628). A pre-existing 'let Predicate =' statement was removing some predicates that were necessary for FP64 to behave correctly. Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://llvm-reviews.chandlerc.com/D3274 llvm-svn: 205844	2014-04-09 09:56:43 +00:00
Matt Arsenault	2c33562cd6	R600/SI: Match not instruction. llvm-svn: 205837	2014-04-09 07:16:16 +00:00
Tim Northover	b36d428d27	ARM64: scalarize v1i64 mul operation This is the second part of fixing PR19367. llvm-svn: 205836	2014-04-09 07:07:02 +00:00
Tim Northover	b430cf6681	ARM64: add pattern for <1 x i64> custom not node. This should fix PR19367. llvm-svn: 205835	2014-04-09 06:55:39 +00:00
Juergen Ributzka	c11e8b67bb	[Constant Hoisting][ARM64] Enable constant hoisting for ARM64. This implements the target-hooks for ARM64 to enable constant hoisting. This fixes <rdar://problem/14774662> and <rdar://problem/16381500>. llvm-svn: 205791	2014-04-08 20:39:59 +00:00
Tim Northover	33d07468bc	ARM64: fix fmsub patterns which assumed accum operand was first Confusingly, the NEON fmla instructions put the accumulator first but the scalar versions put it at the end (like the fma lib function & LLVM's intrinsic). This should fix PR19345, assuming there's only one issue. llvm-svn: 205758	2014-04-08 12:23:51 +00:00
Elena Demikhovsky	3dcfbdfa54	AVX-512: Added fp_to_uint and uint_to_fp patterns. llvm-svn: 205754	2014-04-08 07:24:02 +00:00

10005 Commits