llvm-project

Commit Graph

Author	SHA1	Message	Date
Chad Rosier	1bbd7fb38e	[AArch64] Add support for generating pre- and post-index load/store pairs. llvm-svn: 248593	2015-09-25 17:48:17 +00:00
Matt Arsenault	4bf43d4e68	AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589	2015-09-25 17:27:08 +00:00
Matt Arsenault	5f70436c49	AMDGPU: Improve accuracy of instruction rates for VOPC These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582	2015-09-25 16:58:25 +00:00
Saleem Abdulrasool	fe83b50289	ARM: address WoA division limitation We now emit the compiler generated divide by zero check that was needed for the MSVC routines. We construct a psuedo-instruction for the DBZ check as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication. The division library calls now match MSVC semantically. llvm-svn: 248561	2015-09-25 05:15:46 +00:00
Simon Pilgrim	68d0050c6a	[X86][SSE2] Fix zero/any extension shuffles that don't start from the first element Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H llvm-svn: 248536	2015-09-24 21:02:17 +00:00
Matt Arsenault	e66621b306	AMDGPU: Add s_dcache_* instructions llvm-svn: 248533	2015-09-24 19:52:27 +00:00
Matt Arsenault	d6adfb401c	AMDGPU: Add cache invalidation instructions. These are necessary for implementing mem_fence for OpenCL 2.0. The VI assembler tests are disabled since it seems to be using the wrong encoding or opcode. llvm-svn: 248532	2015-09-24 19:52:21 +00:00
Mohammad Shahid	d0203cbf1c	Regression Test: Deletes redundant/invalid test. Removes absdiff_expand.ll regression test file which is invalid. Diffrential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248493	2015-09-24 14:37:25 +00:00
Mohammad Shahid	13f1dfdf2e	Codegen: Fix llvm.absdiff semantic. Fixes the overflow case of llvm.absdiff intrinsic also updats the tests and LangRef.rst accordingly. Differential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248483	2015-09-24 10:35:03 +00:00
Matt Arsenault	68d938649e	Introduce target hook for optimizing register copies Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisified with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478	2015-09-24 08:36:14 +00:00
Matt Arsenault	cab64f1c75	AMDGPU: Fix printing trailing whitespace for mubuf atomics llvm-svn: 248472	2015-09-24 07:51:17 +00:00
Matt Arsenault	c721df0478	Use new TokenFactor chain when merging stores If the stores are storing values from loads which partially alias the stores, we could end up placing the merged loads and stores on the same chain which has the potential to break. Each store may have a different chain dependency on only some of the original loads. Create a new TokenFactor to capture all of the required dependencies of the stores rather than assuming all stores can use the same chain. The testcase is a situation where this happens, although it does not have an observable change from this. The DAG nodes just happened to not be reordered before despite this missing chain dependency. This is based on an off-list report for an out of tree target which regressed due to r246307 and I haven't managed to find a case where the nodes do end up reordered with an in tree target. llvm-svn: 248468	2015-09-24 07:22:38 +00:00
Matt Arsenault	c8e2ce4046	AMDGPU: Reduce number of copies emitted Instead of always inserting a copy in case the super register is itself a subregister, only extract to the super reg class if this is actually the case. This shouldn't really change codegen, but makes looking at the output of SIFixSGPRCopies easier to read. llvm-svn: 248467	2015-09-24 07:16:37 +00:00
Tim Northover	beb5bccf88	ARM: fix folding stack adjustment (again again again...) This time, the issue is that we weren't accounting for the possibility that aligned DPRs could have been stored after the final "push" in a prologue. When that happened we effectively moved a "sub sp, #N" from below the aligned stores to above them, and everything went to pot. To make it worse, I'd actually committed something testing that we produced wrong code, so the test update is tiny. llvm-svn: 248437	2015-09-23 22:21:09 +00:00
Lawrence Hu	cac0b89289	Swap loop invariant GEP with loop variant GEP to allow more LICM. This patch changes the order of GEPs generated by Splitting GEPs pass, specially when one of the GEPs has constant and the base is loop invariant, then we will generate the GEP with constant first when beneficial, to expose more cases for LICM. If originally Splitting GEP generate the following: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 %3 %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032 ... Now it genereates: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 1032 %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3 ... For no-loop cases, the original way of generating GEPs seems to expose more CSE cases, so we don't change the logic for no-loop cases, and only limit our change to the specific case we are interested in. llvm-svn: 248420	2015-09-23 19:25:30 +00:00
Sanjay Patel	1a6534661b	[x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx xorl %eax, %ecx movd %ecx, %xmm0 into this: xorps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248415	2015-09-23 18:33:42 +00:00
Sanjay Patel	aba37553c4	[x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx orl %eax, %ecx movd %ecx, %xmm0 into this: orps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248409	2015-09-23 18:19:07 +00:00
Sanjay Patel	df2495f331	[x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx andl %eax, %ecx movd %ecx, %xmm0 into this: andps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 Differential Revision: http://reviews.llvm.org/D13065 llvm-svn: 248395	2015-09-23 17:00:06 +00:00
Simon Pilgrim	9cb018b6b6	[X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead. LLVM counterpart to D12835 Differential Revision: http://reviews.llvm.org/D13002 llvm-svn: 248368	2015-09-23 08:48:33 +00:00
Cong Hou	b54a72ef78	Add a test case for the fix of profile update issue when lowering switch statement. llvm-svn: 248356	2015-09-23 00:34:56 +00:00
Matthias Braun	73e4221e6c	LiveIntervalAnalysis: Avoid multiple connected liveness components We may have subregister defs which are unused but not discovered and cleaned up prior to liveness analysis. This creates multiple connected components in the resulting live range which are forbidden in the MachineVerifier because they would unnecesarily constrain the register allocator. Rewrite those dead definitions to define a newly created virtual register. Differential Revision: http://reviews.llvm.org/D13035 llvm-svn: 248335	2015-09-22 22:37:44 +00:00
Ahmed Bougacha	81616a72ea	[ARM] Emit clrex in the expanded cmpxchg fail block. ARM counterpart to r248291: In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248294	2015-09-22 17:22:58 +00:00
Ahmed Bougacha	07a844d758	[AArch64] Emit clrex in the expanded cmpxchg fail block. In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248291	2015-09-22 17:21:44 +00:00
Stephen Canon	8216d88511	Don't raise inexact when lowering ceil, floor, round, trunc. The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions did raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice. Differential Revision: http://reviews.llvm.org/D12969 llvm-svn: 248266	2015-09-22 11:43:17 +00:00
Simon Pilgrim	1cad0cd3ce	[X86][SSE] Match zero/any extension shuffles that don't start from the first element This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane. The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well. Differential Revision: http://reviews.llvm.org/D12561 llvm-svn: 248250	2015-09-22 08:16:08 +00:00
Simon Pilgrim	4003ed2da3	[DAGCombiner] Improve FMA support for interpolation patterns This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents. This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))) Differential Revision: http://reviews.llvm.org/D13003 llvm-svn: 248210	2015-09-21 20:32:48 +00:00
Jeroen Ketema	41681a5329	[ARM] Do not scale vext with a factor The vext pseudo-instruction takes the number of elements that need to be extracted, not the number of bytes. Hence, use the number of elements directly instead of scaling them with a factor. Reviewers: Silviu Baranga, James Molloy (not reflected in the differential revision) Differential Revision: http://reviews.llvm.org/D12974 llvm-svn: 248208	2015-09-21 20:28:04 +00:00
Ulrich Weigand	126caeb043	[SystemZ] Fix expansion of ISD::FPOW and ISD::FSINCOS The ISD::FPOW and ISD::FSINCOS opcodes default to Legal, but there is no legal instruction for those on SystemZ. This could cause LLVM internal errors. Fixed by setting the operation action to Expand for those opcodes. Also added test cases for all other LLVM IR intrinsics that should generate a library call. (Those already work correctly since the default operation action is fine.) llvm-svn: 248180	2015-09-21 17:35:45 +00:00
Matt Arsenault	b774834429	DAGCombiner: Replace store of FP constant after attemping store merges If storing multiple FP constants, some subset of the stores would be replaced with integers due to visit order, so MergeConsecutiveStores would only partially merge these. llvm-svn: 248169	2015-09-21 15:59:46 +00:00
Sanjay Patel	bab5d6c636	add test file ahead of any functional changes for PR22428 llvm-svn: 248123	2015-09-20 15:58:00 +00:00
Simon Pilgrim	c6a553241c	[X86][SSE] Intrinsics builtins test refresh. NFCI llvm-svn: 248122	2015-09-20 15:41:35 +00:00
Igor Breger	b7e1f9d680	AVX512: Implemented encoding and intrinsics for vcmpss/sd. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12593 llvm-svn: 248121	2015-09-20 15:15:10 +00:00
Asaf Badouh	2744d21fb8	[X86][AVX512] extend support in Scalar conversion add scalar FP to Int conversion with truncation intrinsics add scalar conversion FP32 from/to FP64 intrinsics add rounding mode and SAE mode encoding for these intrinsics Differential Revision: http://reviews.llvm.org/D12665 llvm-svn: 248117	2015-09-20 14:31:19 +00:00
Igor Breger	4c4cd789c9	AVX512: vsqrtss/sd encoding and intrinsics implementation. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12102 llvm-svn: 248116	2015-09-20 09:13:41 +00:00
Asaf Badouh	572bbceecc	[X86][AVX512DQ] Add fpclass instruction Differential Revision: http://reviews.llvm.org/D12931 llvm-svn: 248115	2015-09-20 08:46:07 +00:00
Michael Kuperstein	58e86bc893	[X86] Fix sitofp and uitofp instruction matching failures with long double and avx512 The operation action for i32 and i64 cannot be set to legal, as long double needs custom lowering. Patch by: mitch.l.bodart@intel.com Differential Revision: http://reviews.llvm.org/D12372 llvm-svn: 248114	2015-09-20 08:12:17 +00:00
Igor Breger	1d55f20bee	AVX512: Implemented intrinsics for vshuff32x4, vshuff64x2, vshufi64x2, vshufi32x4 Added tests for intrinsics. Differential Revision: http://reviews.llvm.org/D12525 llvm-svn: 248113	2015-09-20 07:18:53 +00:00
Igor Breger	0ede3cbb5c	AVX512: Implement instructions encoding, lowering and intrinsics vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4 Added tests for encoding, lowering and intrinsics. Differential Revision: http://reviews.llvm.org/D11893 llvm-svn: 248111	2015-09-20 06:52:42 +00:00
Simon Pilgrim	27f81776ad	[X86][AVX2] Use general sext IR for vpmovsx stack folding tests llvm-svn: 248093	2015-09-19 17:04:18 +00:00
Simon Pilgrim	d0448ee59f	[X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF Now that we have fast vector CTPOP implementations we can use this to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1)) Additionally, for AVX512CD that provides lzcnt instructions we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x)) Differential Revision: http://reviews.llvm.org/D12663 llvm-svn: 248091	2015-09-19 13:22:57 +00:00
Matt Arsenault	cc5d106263	AMDGPU: Add failing testcase for live interval construction llvm-svn: 248067	2015-09-19 00:03:56 +00:00
Cong Hou	d40105d321	Update edge weights properly when merging blocks in if-conversion. In if-conversion, there is a utility function MergeBlocks() that is used to merge blocks. However, when new edges are built in this function the edge weight is either not provided or not updated properly, leading to a modified CFG with incorrect edge weights. This patch corrects this issue. Differential Revision: http://reviews.llvm.org/D12513 llvm-svn: 248030	2015-09-18 20:22:41 +00:00
Eric Christopher	a835956bda	Limit the range of processors supported by ARM fast isel to v6 or later as that's all that is tested right now. Fixes PR24858. llvm-svn: 248027	2015-09-18 20:08:18 +00:00
Cong Hou	f9f9ffb98b	Scaling up values in ARMBaseInstrInfo::isProfitableToIfCvt() before they are scaled by a probability to avoid precision issue. In ARMBaseInstrInfo::isProfitableToIfCvt(), there is a simple cost model in which the number of cycles is scaled by a probability to estimate the cost. However, when the number of cycles is small (which is usually the case), there is a precision issue after the computation. To avoid this issue, this patch scales those cycles by 1024 (chosen to make the multiplication a litter faster) before they are scaled by the probability. Other variables are also scaled up for the final comparison. Differential Revision: http://reviews.llvm.org/D12742 llvm-svn: 248018	2015-09-18 18:19:40 +00:00
Matthias Braun	0b7d6c14c9	SelectionDAG: Introduce PersistentID to SDNode for assert builds. This gives us more human readable numbers to identify nodes in debug dumps. Before: 0x7fcbd9700160: ch = EntryToken 0x7fcbd985c7c8: i64 = Register %RAX ... 0x7fcbd9700160: <multiple use> 0x7fcbd985c578: i64,ch = MOV64rm 0x7fcbd985c6a0, 0x7fcbd985cc68, 0x7fcbd985c200, 0x7fcbd985cd90, 0x7fcbd985ceb8, 0x7fcbd9700160<Mem:LD8[@foo]> [ORD=2] 0x7fcbd985c8f0: ch,glue = CopyToReg 0x7fcbd9700160, 0x7fcbd985c7c8, 0x7fcbd985c578 [ORD=3] 0x7fcbd985c7c8: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985ca18: ch = RETQ 0x7fcbd985c7c8, 0x7fcbd985c8f0, 0x7fcbd985c8f0:1 [ORD=3] Now: t0: ch = EntryToken t5: i64 = Register %RAX ... t0: <multiple use> t3: i64,ch = MOV64rm t10, t12, t11, t13, t14, t0<Mem:LD8[@foo]> [ORD=2] t6: ch,glue = CopyToReg t0, t5, t3 [ORD=3] t5: <multiple use> t6: <multiple use> t6: <multiple use> t7: ch = RETQ t5, t6, t6:1 [ORD=3] Differential Revision: http://reviews.llvm.org/D12564 llvm-svn: 248010	2015-09-18 17:41:00 +00:00
Geoff Berry	43ec15e57e	[AArch64] Improved bitfield instruction selection. Summary: For bitfield insert OR matching, check both operands for larger pattern first before checking for smaller pattern. Add pattern for unsigned bitfield insert-in-zero done with SHL+AND. Resolves PR21631. Reviewers: jmolloy, t.p.northover Subscribers: aemerson, rengolin, llvm-commits, mcrosier Differential Revision: http://reviews.llvm.org/D12908 llvm-svn: 248006	2015-09-18 17:11:53 +00:00
Quentin Colombet	b4c6886215	[ShrinkWrap] Refactor the handling of infinite loop in the analysis. - Strenghten the logic to be sure we hoist the restore point out of the current loop. (The fixes a bug with infinite loop, added as part of the patch.) - Walk over the exit blocks of the current loop to conver to the desired restore point in one iteration of the update loop. llvm-svn: 247958	2015-09-17 23:21:34 +00:00
David Majnemer	163b7f121c	[WinEH] Fix tests broken by funclet-layout llvm-svn: 247944	2015-09-17 21:11:12 +00:00
David Majnemer	978902309a	[WinEH] Add a funclet layout pass Windows EH funclets need to be contiguous. The FuncletLayout pass will ensure that the funclets are together and begin with a funclet entry MBB. Differential Revision: http://reviews.llvm.org/D12943 llvm-svn: 247937	2015-09-17 20:45:18 +00:00
Reid Kleckner	5b8a46e771	[WinEH] Make funclet return instrs pseudo instrs This makes catchret look more like a branch, and less like a weird use of BlockAddress. It also lets us get away from llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label arithmetic. llvm-svn: 247936	2015-09-17 20:43:47 +00:00

1 2 3 4 5 ...

13778 Commits