llvm-project

Commit Graph

Author	SHA1	Message	Date
Lei Huang	34e6621724	Update branch coalescing to be a PowerPC specific pass Implementing this pass as a PowerPC specific pass. Branch coalescing utilizes the analyzeBranch method which currently does not include any implicit operands. This is not an issue on PPC but must be handled on other targets. Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce. Differential Revision: https://reviews.llvm.org/D32776 llvm-svn: 313061	2017-09-12 18:39:11 +00:00
Craig Topper	958106d0f1	[X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so it's as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp. This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we don't recognize yet. We already had this problem for several other TBM patterns so I think this is fine and we can address all of them together. I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before we got to isel. I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch. Differential Revision: https://reviews.llvm.org/D37592 llvm-svn: 313054	2017-09-12 17:40:25 +00:00
Elena Demikhovsky	18ff5c1374	Added "zext" from v2i8 to v2i32. In the next patch I'll optimize the sequence. llvm-svn: 313052	2017-09-12 17:27:53 +00:00
Hans Wennborg	8c1eb106bd	Revert r313009 "[ARM] Use ADDCARRY / SUBCARRY" This was causing PR34045 to fire again. > This is a preparatory step for D34515 and also is being recommitted as its > first version caused PR34045. > > This change: > - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 > - lowering is done by first converting the boolean value into the carry flag > using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value > using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two > operations does the actual addition. > - for subtraction, given that ISD::SUBCARRY second result is actually a > borrow, we need to invert the value of the second operand and result before > and after using ARMISD::SUBE. We need to invert the carry result of > ARMISD::SUBE to preserve the semantics. > - given that the generic combiner may lower ISD::ADDCARRY and > ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering > as well otherwise i64 operations now would require branches. This implies > updating the corresponding test for unsigned. > - add new combiner to remove the redundant conversions from/to carry flags > to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C > - fixes PR34045 > > Differential Revision: https://reviews.llvm.org/D35192 Also revert follow-up r313010: > [ARM] Fix typo when creating ISD::SUB nodes > > In D35192, I accidentally introduced a typo when creating ISD::SUB nodes, > giving them two values instead of one. > > This fails when the merge_values combiner finds one of these nodes. > > This change fixes PR34564. > > Differential Revision: https://reviews.llvm.org/D37690 llvm-svn: 313044	2017-09-12 16:24:17 +00:00
Simon Pilgrim	76418aae74	[X86][AVX2] Add gather/movntdqa/pmaskmov/pmovmskb/pslldq/psrldq instructions to scheduling tests llvm-svn: 313039	2017-09-12 15:52:01 +00:00
Simon Pilgrim	0af5a772e0	[X86][AVX2] Add further instructions to scheduling tests llvm-svn: 313032	2017-09-12 15:01:20 +00:00
Simon Pilgrim	d2d2b37cc9	[X86][AVX2] Add integer broadcast scheduling tests llvm-svn: 313026	2017-09-12 12:59:20 +00:00
Jonas Paulsson	fc4f323ac1	[SystemZ] Add the CoveredBySubRegs bit to GPR64, GPR128 and FPR128 registers. This bit is needed in order for the CalleeSavedRegs list to automatically include the super registers if all of their subregs are present. Thanks to Wei Mi for initially indicating this deficiency in the SystemZ backend. Review: Ulrich Weigand. https://bugs.llvm.org/show_bug.cgi?id=34550 llvm-svn: 313023	2017-09-12 12:11:29 +00:00
Simon Pilgrim	5a931c641e	[X86][AVX2] Add additional fp-broadcast/subvector/shuffle scheduling tests llvm-svn: 313022	2017-09-12 11:17:01 +00:00
Simon Pilgrim	ef9a9d709a	[X86][AVX] Add vperm2f128 scheduling test llvm-svn: 313021	2017-09-12 11:10:59 +00:00
Simon Pilgrim	f336d9ce3c	[X86][AVX2] Remove old (unused) intrinsic declarations llvm-svn: 313020	2017-09-12 11:09:30 +00:00
Yael Tsafrir	47668b5e03	[X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR Differential Revision: https://reviews.llvm.org/D37560 llvm-svn: 313013	2017-09-12 07:50:35 +00:00
Roger Ferrer Ibanez	4f92b4162f	[ARM] Use ADDCARRY / SUBCARRY This is a preparatory step for D34515 and also is being recommitted as its first version caused PR34045. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 313009	2017-09-12 07:40:09 +00:00
Craig Topper	afdc36ed74	[X86] Add an extra instruction to TruncAssertSext.ll to prevent the 'or' from being narrowed so that the movl is really required to avoid a miscompile. If we allow the OR to be narrowed then the upper bits really are zero and we can't tell if the zeroing movl was removed on purpose. While here regenerate the test with update_llc_test_checks.py llvm-svn: 312995	2017-09-12 03:50:44 +00:00
Craig Topper	66e4ace1c8	[X86] Rename TruncAssertZext.ll test to TruncAssertSext.ll. Since it's testing AssertSext. llvm-svn: 312991	2017-09-12 01:30:10 +00:00
Hans Wennborg	075e5a2e2b	Revert r312898 "[ARM] Use ADDCARRY / SUBCARRY" It caused PR34564. > This is a preparatory step for D34515 and also is being recommitted as its > first version caused PR34045. > > This change: > - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 > - lowering is done by first converting the boolean value into the carry flag > using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value > using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two > operations does the actual addition. > - for subtraction, given that ISD::SUBCARRY second result is actually a > borrow, we need to invert the value of the second operand and result before > and after using ARMISD::SUBE. We need to invert the carry result of > ARMISD::SUBE to preserve the semantics. > - given that the generic combiner may lower ISD::ADDCARRY and > ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering > as well otherwise i64 operations now would require branches. This implies > updating the corresponding test for unsigned. > - add new combiner to remove the redundant conversions from/to carry flags > to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C > - fixes PR34045 > > Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 312980	2017-09-11 23:52:02 +00:00
Yonghong Song	be9c00347f	bpf: add " ll" in the LD_IMM64 asmstring This partially reverts the previous fix in commit f5858045aa0b ("bpf: proper print imm64 expression in inst printer"). In that commit, the original suffix "ll" is removed from LD_IMM64 asmstring. In the custom print method, the "ll" suffix is printed if the rhs is an immediate. For example, "r2 = 5ll" => "r2 = 5ll", and "r3 = varll" => "r3 = var". This has an issue though for the assembler. Since the assembler relies on asmstring to do pattern matching, it will not be able to distinguish between "mov r2, 5" and "ld_imm64 r2, 5" since both asmstrings are "r2 = 5". In such cases, the assembler uses 64bit load for all "r = <val>" asm insts. This patch adds back " ll" suffix for ld_imm64 with one additional space for "#reg = #global_var" case. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 312978	2017-09-11 23:43:35 +00:00
Adrian Prantl	16aa4cf7ef	llvm-dwarfdump: Make -brief the default and add a -verbose option instead. Differential Revision: https://reviews.llvm.org/D37717 llvm-svn: 312972	2017-09-11 23:05:20 +00:00
Adrian Prantl	7bc1b28291	llvm-dwarfdump: Replace -debug-dump=sect option with individual options. As discussed on llvm-dev in http://lists.llvm.org/pipermail/llvm-dev/2017-September/117301.html this changes the command line interface of llvm-dwarfdump to match the one used by the dwarfdump utility shipping on macOS. In addition to being shorter to type this format also has the advantage of allowing more than one section to be specified at the same time. In a nutshell, with this change $ llvm-dwarfdump --debug-dump=info $ llvm-dwarfdump --debug-dump=apple-objc becomes $ dwarfdump --debug-info --apple-objc Differential Revision: https://reviews.llvm.org/D37714 llvm-svn: 312970	2017-09-11 22:59:45 +00:00
Matt Arsenault	537bd3b906	AMDGPU: Allow coldcc calls llvm-svn: 312936	2017-09-11 18:54:20 +00:00
Stanislav Mekhanoshin	710da42b86	[AMDGPU] Produce madak and madmk from the two-address pass These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928	2017-09-11 17:13:57 +00:00
Zvi Rackover	255488a1e0	X86 Tests: More AVX512 conversions tests. NFC Adding more tests for AVX512 fp<->int conversions that were missing. llvm-svn: 312921	2017-09-11 15:54:38 +00:00
Simon Pilgrim	b092bd321a	[X86][SSE] Add support for X86ISD::PACKSS to ComputeNumSignBitsForTargetNode Helps improve combineLogicBlendIntoPBLENDV support by allowing us to peek into through PACKSS truncations of vector comparison results. Differential Revision: https://reviews.llvm.org/D37680 llvm-svn: 312916	2017-09-11 14:03:47 +00:00
Tim Renouf	660ba2b8af	[AMDGPU] exp should not be in WQM mode A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915	2017-09-11 13:55:39 +00:00
Simon Pilgrim	d0ff65b50e	[X86][SSE] Add further test cases showing failure to compute sign bits through PACKSS Suggested in D37680 Note: had to drop AVX512VL tests as there is an infinite loop in the new tests that needs further investigation (not relevant to D37680). llvm-svn: 312910	2017-09-11 12:18:43 +00:00
Gadi Haber	3ddffced43	[X86][SKX][KNL] Updating several CodeGen tests to use the attr flag instead of mcpu flag NFC. Updated 3 Codegen regression tests to use the -mattr flag instead of the -mcpu flags as follows: Instead of -mcpu=skx use -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq Instead of -mcpu=knl use -mattr=+avx512f Reviewers: delena Revision: https://reviews.llvm.org/D37674 llvm-svn: 312909	2017-09-11 11:26:20 +00:00
Michael Zuckerman	9707ba0957	[Interleaved][Stride 3] Adding test for the VF=64 case targeting AVX512. llvm-svn: 312907	2017-09-11 10:57:15 +00:00
Simon Pilgrim	f6fa1d0369	[X86][SSE] Add test showing failure to compute sign bits through PACKSS Prevents combineLogicBlendIntoPBLENDV from merging to PBLENDV llvm-svn: 312906	2017-09-11 10:50:03 +00:00
Dylan McKay	0fc5fe0a58	[AVR] Enable the '__do_copy_data' function Also enables '__do_clear_bss'. These functions are automatically called by the CRT if they are declared. We need these to be called otherwise RAM will start completely uninitialised, even though we need to copy RAM variables from progmem to RAM. llvm-svn: 312905	2017-09-11 10:32:51 +00:00
Igor Breger	1f14364d64	[GlobalISel][X86] G_ANYEXT support. Summary: G_ANYEXT support Reviewers: zvi, delena Reviewed By: delena Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D37675 llvm-svn: 312903	2017-09-11 09:41:13 +00:00
Roger Ferrer Ibanez	12b20f2307	[ARM] Use ADDCARRY / SUBCARRY This is a preparatory step for D34515 and also is being recommitted as its first version caused PR34045. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 312898	2017-09-11 07:38:05 +00:00
Elena Demikhovsky	cc477bbcea	Fixed a bug in splitting Scatter operation in the Type Legalizer. After the split of the Scatter operation, the order of the new instructions is well defined - Lo goes before Hi. Otherwise the semantic of Scatter (from LSB to MSB) is broken. I'm chaining 2 nodes to prevent reordering. Differential Revision https://reviews.llvm.org/D37670 llvm-svn: 312894	2017-09-11 06:18:15 +00:00
Elena Demikhovsky	9afc3d7b82	Added a test that demonstrates a bug in Scatter scheduling. The bug is going to be fixed in an upcoming patch. llvm-svn: 312883	2017-09-10 13:20:42 +00:00
Simon Pilgrim	ed27bea373	[X86] Add v2i4 store test case (PR20012) llvm-svn: 312874	2017-09-09 20:28:50 +00:00
Simon Pilgrim	e932c7fafa	[X86] Add v2i2 test case (PR20011) llvm-svn: 312873	2017-09-09 20:22:35 +00:00
Simon Pilgrim	da41ca5a25	[X86][FMA] Regenerate FMA tests llvm-svn: 312871	2017-09-09 19:25:59 +00:00
Simon Pilgrim	97a56866a2	[X86][SSE] i32 vector multiplications test cases from PR6399 llvm-svn: 312868	2017-09-09 18:18:17 +00:00
Simon Pilgrim	a866a190d6	[X86][MOVBE] Fix typo in MOVBE scheduling test names Copy+paste is not your friend llvm-svn: 312867	2017-09-09 17:52:44 +00:00
Craig Topper	3be1db82b6	[X86] Don't disable slow INC/DEC if optimizing for size Summary: Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size. This appears to match gcc behavior. Reviewers: chandlerc, zvi, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37177 llvm-svn: 312866	2017-09-09 17:11:59 +00:00
Kyle Butt	8c0314c3ed	PPC: Don't select lxv/stxv for insufficiently aligned stack slots. The lxv/stxv instructions require an offset that is 0 % 16. Previously we were selecting lxv/stxv for loads and stores to the stack where the offset from the slot was a multiple of 16, but the stack slot was not 16 or more byte aligned. When the frame gets lowered these transform to r(1|31) + slot + offset. If slot is not aligned, slot + offset may not be 0 % 16. Now we require 16 byte or more alignment to select lxv/stxv for stack slots. Includes a testcase that shows both sufficiently and insufficiently aligned stack slots. llvm-svn: 312843	2017-09-09 00:37:56 +00:00
Yonghong Song	6807778e52	bpf: fix test failures due to previous bpf change of assembly code syntax Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 312840	2017-09-09 00:11:13 +00:00
Matt Arsenault	2f4df7ec41	AMDGPU: Recompute scc liveness The various scalar bit operations set SCC, so if one is erased or moved, it needs to be recomputed. Not sure why the existing tests don't fail on this. llvm-svn: 312819	2017-09-08 18:51:26 +00:00
Craig Topper	56af2cad89	[X86] Simplify the slow-incdec test and add test cases with optsize. I think we want to consider using inc/dec with optsize. llvm-svn: 312804	2017-09-08 17:33:54 +00:00
Wei Mi	5d84d9b35c	Fix a bug for rL312641. rL312641 allowed llvm.memcpy/memset/memmove to be tail calls when the parent function returns the intrinsic's first argument. However, on the arm-none-eabi platform, llvm.memcpy will be expanded to __aeabi_memcpy, which doesn't have a return value. The fix is to check the libcall name after expansion to match "memcpy/memset/memmove" before allowing those intrinsics to be tail calls. llvm-svn: 312799	2017-09-08 16:44:52 +00:00
Krzysztof Parzyszek	f78eca8fb5	Preserve existing regs when adding pristines to LivePhysRegs/LiveRegUnits Differential Revision: https://reviews.llvm.org/D37600 llvm-svn: 312797	2017-09-08 16:29:50 +00:00
Simon Pilgrim	2e4fb24173	[X86] Added PR31045 test case Reduced version of 'addr-calc-crash.ll' that was included in D27044, that had been fixed already by D31286/rL298633 llvm-svn: 312786	2017-09-08 10:49:11 +00:00
Jatin Bhateja	a251312719	[X86] Adding a test point for PR34149 'Suboptimal codegen for "fast" minnum and maxnum' Differential Revision: https://reviews.llvm.org/D37614 llvm-svn: 312778	2017-09-08 09:15:36 +00:00
Dean Michael Berris	711dec260f	[XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC Summary: This fixes code-gen for XRay in PPC. The regression wasn't caught by codegen tests which we add in this change. What happened was the following: - For tail exits, we used to unconditionally prepend the returns/exits with a pseudo-instruction that gets lowered to the instrumentation sled (and leave the actual return/exit instruction as-is). - Changes to the XRay instrumentation pass caused the tail exits to suddenly also emit the tail exit pseudo-instruction, since the check for whether a return instruction was also a call instruction meant it was a tail exit instruction. - None of the tests caught the regression either due to non-existent tests, or the tests being disabled/removed for continuous breakage. This change re-introduces some of the basic tests and verifies that we're back to a state that allows the back-end to generate appropriate XRay instrumented binaries for PPC in the presence of tail exits. Reviewers: echristo, timshen Subscribers: nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D37570 llvm-svn: 312772	2017-09-08 01:47:56 +00:00
Chandler Carruth	acbcf06f03	[x86] Flesh out the custom ISel for RMW arithmetic ops with used flags to cover the bitwise operators. Nothing really exciting here, this just stamps out the rest of the core operations that can RMW memory and set flags. Still not implemented here: ADC, SBB. Those will require more interesting logic to channel the flags in, and I'm not currently planning to try to tackle that. It might be interesting for someone who wants to improve our code generation for bignum implementations. Differential Revision: https://reviews.llvm.org/D37141 llvm-svn: 312768	2017-09-08 00:17:12 +00:00
Chandler Carruth	52a31bf268	[x86] Extend the manual ISel of `add` and `sub` with both RMW memory operands and used flags to support matching immediate operands. This is a bit trickier than register operands, and we still want to fall back on register operands even for things that appear to be "immediates" when they won't actually select into the operation's immediate operand. This also requires us to handle things like selecting `sub` vs. `add` to minimize the number of bits needed to represent the immediate, and picking the shortest immediate encoding. In order to do that, we in turn need to scan to make sure that CF isn't used as it will get inverted. The end result seems very nice though, and we're now generating optimal instruction sequences for these patterns IMO. A follow-up patch will further expand this to other operations with RMW memory operands. But handling `add` and `sub` is a useful starting point to flesh out the machinery and make sure interesting and complex cases can be handled. Thanks to Craig Topper who provided a few fixes and improvements to this patch in addition to the review! Differential Revision: https://reviews.llvm.org/D37139 llvm-svn: 312764	2017-09-07 23:54:24 +00:00
Paul Robinson	bb92137080	[DWARF] Line 0 should not have a discriminator. It's meaningless and takes up extra space in the line table. Differential Revision: https://reviews.llvm.org/D37364 llvm-svn: 312751	2017-09-07 22:15:44 +00:00
Artem Belevich	8af4e23d1e	[CUDA] Added rudimentary support for CUDA-9 and sm_70. For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On LLVM side NVPTX added sm_70 GPU type which bumps required PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 llvm-svn: 312734	2017-09-07 18:14:32 +00:00
Matt Arsenault	d7e2303df2	AMDGPU: Start selecting v_mad_mix_f32 llvm-svn: 312732	2017-09-07 18:05:07 +00:00
Konstantin Zhuravlyov	5f5b586c99	AMDGPU: Handle non-temporal loads and stores Differential Revision: https://reviews.llvm.org/D36862 llvm-svn: 312729	2017-09-07 17:14:54 +00:00
Konstantin Zhuravlyov	c8c9d4a0a6	AMDGPU: Handle more than one memory operand in SIMemoryLegalizer Differential Revision: https://reviews.llvm.org/D37397 llvm-svn: 312725	2017-09-07 16:14:21 +00:00
Benjamin Kramer	6ef976d5e1	[ARM] Remove redundant vcvt patterns. These don't add any value as they're just compositions of existing patterns. However, they can confuse the cost logic in ISel, leading to duplicated vcvt instructions like in PR33199. llvm-svn: 312724	2017-09-07 14:52:26 +00:00
Michael Zuckerman	5a385940d3	[X86][LLVM] Expanding support for lowerInterleavedLoad() in X86InterleavedAccess (VF{8|16|32} stride 3). This patch expands the support of lowerInterleavedLoad to {8|16|32}x8i stride 3. LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3 VF={8|16|32}) and we plan to include the store (deinterleaved side). The patch goal is to optimize the following sequence: a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7 into a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7 c0 c1 c2 c3 c4 c5 c6 c7 Reviewers 1. zvi 2. igor 3. guyblank 4. dorit 5. Ayal llvm-svn: 312722	2017-09-07 14:02:13 +00:00
Florian Hahn	d39b8a3533	[MachineCombiner] Update instruction depths incrementally for large BBs. Summary: For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors. In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance. Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn Reviewed By: fhahn Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 312719	2017-09-07 12:49:39 +00:00
Alexander Ivchenko	f3a3cd198e	[x86] Update to cmov promotion tests for D36711; NFC Adding i8 -> [i16, i32, i64] and i32 -> i64 cases. This way we can see what the current codegen looks like. llvm-svn: 312707	2017-09-07 08:59:05 +00:00
Zvi Rackover	25799d93f0	X86: Improve AVX512 fptoui lowering Summary: Add patterns for fptoui <16 x float> to <16 x i8> fptoui <16 x float> to <16 x i16> Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 llvm-svn: 312704	2017-09-07 07:40:34 +00:00
Matt Arsenault	65ca292a8d	AMDGPU: Don't legalize i16 extloads to i32 with legal i16 Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. llvm-svn: 312699	2017-09-07 05:37:34 +00:00
Saleem Abdulrasool	5fba8ba9cc	ARM: track globals promoted to coalesced const pool entries Globals that are promoted to an ARM constant pool may alias with another existing constant pool entry. We need to keep a reference to all globals that were promoted to each constant pool value so that we can emit a distinct label for each promoted global. These labels are necessary so that debug info can refer to the promoted global without an undefined reference during linking. Patch by Stephen Crane! llvm-svn: 312692	2017-09-07 04:00:13 +00:00
Stanislav Mekhanoshin	442e28dd42	[AMDGPU] Use v_pk_max_f16 for fcanonicalize Differential Revision: https://reviews.llvm.org/D37325 llvm-svn: 312676	2017-09-06 22:27:29 +00:00
Matthias Braun	c9056b834d	Insert IMPLICIT_DEFS for undef uses in tail merging Tail merging can convert an undef use into a normal one when creating a common tail. Doing so can make the register live out from a block which previously contained the undef use. To keep the liveness up-to-date, insert IMPLICIT_DEFs in such blocks when necessary. To enable this patch the computeLiveIns() function which used to compute live-ins for a block and set them immediately is split into new functions: - computeLiveIns() just computes the live-ins in a LivePhysRegs set. - addLiveIns() applies the live-ins to a block live-in list. - computeAndAddLiveIns() is a convenience function combining the other two functions and behaving like computeLiveIns() before this patch. Based on a patch by Krzysztof Parzyszek <kparzysz@codeaurora.org> Differential Revision: https://reviews.llvm.org/D37034 llvm-svn: 312668	2017-09-06 20:45:24 +00:00
Sanjay Patel	e96f875deb	[x86] fix triple and regenerate checks for psubus; NFC Patch by Yulia Koval! Differential Revision: https://reviews.llvm.org/D37523 llvm-svn: 312662	2017-09-06 19:05:20 +00:00
Stanislav Mekhanoshin	ea134bcb13	[AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize Differential Revision: https://reviews.llvm.org/D37522 llvm-svn: 312660	2017-09-06 18:29:51 +00:00
Krzysztof Parzyszek	a3017aa2ab	[IfConversion] Remove kill flags from common instructions as well When if-converting a diamond, two separate blocks will be placed back to back to form a straight line code. To ensure correctness of the liveness information, any registers that are live in the second block should not be killed in the first block, even if they were in the original code. Additionally, when the two blocks share common instructions at the beginning, these instructions will not be duplicated, but only placed once, before both of the blocks. Since the function "isIdenticalTo" (as used here) ignores kill flags, the common initial code in one block may have a kill flag for a register that is live in the other block. Because the code that removes kill flags only runs for the non-common parts of the predicated blocks, a kill flag mismatch in the common code could still lead to a live register being killed prematurely. llvm-svn: 312654	2017-09-06 17:57:13 +00:00
Krzysztof Parzyszek	daf1a5f94e	[Hexagon] Add option to generate calls to "abort" for "unreachable" llvm-svn: 312644	2017-09-06 16:22:55 +00:00
Wei Mi	818d50a93d	[TailCall] Allow llvm.memcpy/memset/memmove to be tail calls when parent function return the intrinsics's first argument. llvm.memcpy/memset/memmove return void but they will return the first argument after they are expanded as libcalls. Now if the parent function has any return value, llvm.memcpy cannot be turned into tail call after expansion. The patch is to handle that case in SelectionDAGBuilder so when caller function return the same value as the first argument of llvm.memcpy, tail call is allowed. Differential Revision: https://reviews.llvm.org/D37406 llvm-svn: 312641	2017-09-06 16:05:17 +00:00
Stanislav Mekhanoshin	949fac9e40	[AMDGPU] Fix shouldClusterMemOps to process flat loads Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640	2017-09-06 15:31:30 +00:00
Nicolai Haehnle	523827145b	AMDGPU: Make worst-case assumption about the wait states in inline assembly Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 llvm-svn: 312635	2017-09-06 13:50:13 +00:00
Chandler Carruth	585bfc8443	[x86] Fix PR34377 by disabling cmov conversion when we relied on it performing a zext of a register. On the PR there is discussion of how to more effectively handle this, but this patch prevents us from miscompiling code. Differential Revision: https://reviews.llvm.org/D37504 llvm-svn: 312620	2017-09-06 06:28:08 +00:00
Zvi Rackover	5ebe94a84d	X86 Tests: Tidy up AVX512 conversion tests. NFC. Rename functions to a consistent format to make it easier to track coverage. llvm-svn: 312619	2017-09-06 05:33:04 +00:00
Jatin Bhateja	80b5e38c4e	Updating a test reference for rL312608. Differential Revision: https://reviews.llvm.org/D37501 llvm-svn: 312614	2017-09-06 03:58:14 +00:00
Hal Finkel	112a6bac72	[PowerPC] Don't use xscvdpspn on the P7 xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a regression introduced in r288152. llvm-svn: 312612	2017-09-06 03:08:26 +00:00
Jatin Bhateja	2c139f77c7	[X86] Allow cross-lane permutations for sub targets supporting AVX2. Summary: Most instructions in AVX work “in-lane”, that is, each source element is applied only to other elements of the same lane, thus a cross lane permutation is costly and needs more than one instruction. AVX2 includes instructions to perform any-to-any permutation of words over a 256-bit register and vectorized table lookup. This should also fix PR34369 Differential Revision: https://reviews.llvm.org/D37388 llvm-svn: 312608	2017-09-06 02:58:47 +00:00
Yaxun Liu	fc5121a722	[AMDGPU] Transform __read_pipe_* and __write_pipe_* When packet size equals packet align and is power of 2, transform __read_pipe* and __write_pipe* to specialized library function. Differential Revision: https://reviews.llvm.org/D36831 llvm-svn: 312598	2017-09-06 00:30:27 +00:00
Eli Friedman	c22c699882	[ARM] Make ARMExpandPseudo add implicit uses for predicated instructions Missing these could potentially screw up post-ra scheduling. Issue found by inspection, so I don't have a real testcase. Included test just verifies the expected operands after expansion. Differential Revision: https://reviews.llvm.org/D35156 llvm-svn: 312589	2017-09-05 22:54:06 +00:00
Reid Kleckner	e33c94f1b0	Add llvm.codeview.annotation to implement MSVC __annotation Summary: This intrinsic represents a label with a list of associated metadata strings. It is modelled as reading and writing inaccessible memory so that it won't be removed as dead code. I think the intention is that the annotation strings should appear at most once in the debug info, so I marked it noduplicate. We are allowed to inline code with annotations as long as we strip the annotation, but that can be done later. Reviewers: majnemer Subscribers: eraman, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D36904 llvm-svn: 312569	2017-09-05 20:14:58 +00:00
Craig Topper	784fa8a4e3	[X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512. With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroes. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128. The same thing can happen for AVX with vblendps and those separate patterns already exist. For AVX512, (v4f32 (X86vzmov VR128)) will select a VMOVSS instruction instead of VBLENDPS due to there not being an EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too. For SSE1-SSSE3 we can rely on (v4f32 (X86vzmov VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too. So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register. llvm-svn: 312564	2017-09-05 19:09:02 +00:00
Matt Arsenault	22cdb61a78	AMDGPU: Fix not accounting for tail call resource usage If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. llvm-svn: 312561	2017-09-05 18:36:36 +00:00
Zvi Rackover	2096893f34	X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC. Some of the cases show missing patterns I intend to fix shortly. llvm-svn: 312560	2017-09-05 18:24:39 +00:00
Craig Topper	33caeadd90	[AVX512] Remove patterns for (v8f32 (X86vzmovl (insert_subvector undef, (v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64. We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32. llvm-svn: 312543	2017-09-05 17:33:58 +00:00
Simon Pilgrim	ab48e5e244	[AMDGPU] Added extra test checks to make D19325 diff clearer llvm-svn: 312537	2017-09-05 14:32:06 +00:00
Simon Pilgrim	49f9ba37d8	[X86] Limit store merge size when implicitfloat is enabled (PR34421) As suggested by @niravd : https://bugs.llvm.org/show_bug.cgi?id=34421#c2 Differential Revision: https://reviews.llvm.org/D37464 llvm-svn: 312534	2017-09-05 13:40:29 +00:00
Simon Pilgrim	8dbd745b09	[X86] Regenerate scalar rotation tests llvm-svn: 312530	2017-09-05 12:28:30 +00:00
Simon Pilgrim	08246d185b	[X86][AVX512] Use AVX512 attributes instead of -mcpu in vector shift tests llvm-svn: 312529	2017-09-05 12:23:45 +00:00
Simon Pilgrim	3cbe005a69	[X86][AVX512] Use AVX512 attributes instead of -mcpu llvm-svn: 312528	2017-09-05 12:05:14 +00:00
Diana Picus	abb088691b	[ARM] GlobalISel: Support global variables for RWPI In RWPI code, globals that are not read-only are accessed relative to the SB register (R9). This is achieved by explicitly generating an ADD instruction between SB and an offset that we either load from a constant pool or movw + movt into a register. llvm-svn: 312521	2017-09-05 07:57:41 +00:00
Hiroshi Inoue	614453b797	[PowerPC] eliminate redundant compare instruction If multiple conditional branches are executed based on the same comparison, we can execute multiple conditional branches based on the result of one comparison on PPC. For example, if (a == 0) { ... } else if (a < 0) { ... } can be executed by one compare and two conditional branches instead of two pairs of a compare and a conditional branch. This patch identifies a code sequence of the two pairs of a compare and a conditional branch and merges the compares if possible. To maximize the opportunity, we do canonicalization of code sequence before merging compares. For the above example, the input for this pass looks like: cmplwi r3, 0 beq 0, .LBB0_3 cmpwi r3, -1 bgt 0, .LBB0_4 So, before merging two compares, we canonicalize it as cmpwi r3, 0 ; cmplwi and cmpwi yield same result for beq beq 0, .LBB0_3 cmpwi r3, 0 ; greater than -1 means greater or equal to 0 bge 0, .LBB0_4 The generated code should be cmpwi r3, 0 beq 0, .LBB0_3 bge 0, .LBB0_4 Differential Revision: https://reviews.llvm.org/D37211 llvm-svn: 312514	2017-09-05 04:15:17 +00:00
Sanjay Patel	8d7c8c7960	[x86] add tests for vector store merge opportunity; NFC llvm-svn: 312504	2017-09-04 22:01:25 +00:00
Sanjay Patel	543f3fda83	[x86] auto-generate complete checks; NFC llvm-svn: 312503	2017-09-04 21:46:05 +00:00
Sanjay Patel	4e10b61d8f	[x86] add/regenerate complete checks; NFC llvm-svn: 312502	2017-09-04 21:43:32 +00:00
Sanjay Patel	d413303b83	[x86] add test for unnecessary cmp + masked store; NFC As noted in PR11210: https://bugs.llvm.org/show_bug.cgi?id=11210 ...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR. (Although more testing will be needed to confirm that.) llvm-svn: 312496	2017-09-04 17:21:17 +00:00
Sam McCall	f71bb198ed	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This crashes on boringSSL on PPC (will send reduced testcase) This reverts commit r312328. llvm-svn: 312490	2017-09-04 15:47:00 +00:00
Simon Pilgrim	91751b42f6	[X86][AVX512] Add support for VPERMILPS v16f32 shuffle lowering (PR34382) Avoid use of VPERMPS where we don't need it by instead using the variable mask version of VPERMILPS for unary shuffles. llvm-svn: 312486	2017-09-04 13:51:57 +00:00
Simon Pilgrim	adffa8b2e9	Added shuffle test case from PR34382 llvm-svn: 312485	2017-09-04 13:43:13 +00:00
Simon Pilgrim	62c78f27d4	Added shuffle test case from PR34369 llvm-svn: 312481	2017-09-04 11:08:47 +00:00
Ayman Musa	5defce3986	[X86] Replace -mcpu option with -mattr in LIT tests added in https://reviews.llvm.org/rL312442 llvm-svn: 312474	2017-09-04 09:31:32 +00:00
Igor Breger	2661ae48c7	[GlobalISel][X86] G_PHI support. llvm-svn: 312473	2017-09-04 09:06:45 +00:00
Dean Michael Berris	ebc1659016	[XRay][CodeGen] Use PIC-friendly code in XRay sleds and remove synthetic references in .text Summary: This is a re-roll of D36615 which uses PLT relocations in the back-end to the call to __xray_CustomEvent() when building in -fPIC and -fxray-instrument mode. Reviewers: pcc, djasper, bkramer Subscribers: sdardis, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D37373 llvm-svn: 312466	2017-09-04 05:34:58 +00:00
Craig Topper	76f44015e7	[X86] Add a combine to recognize when we have two insert subvectors that together write the whole vector, but the starting vector isn't undef. In this case we should replace the starting vector with undef. llvm-svn: 312462	2017-09-04 01:13:36 +00:00
Craig Topper	bc13af84f2	[X86] Add a combine to turn (insert_subvector zero, (insert_subvector zero, X, Idx), Idx) into an insert of X into the larger zero vector. llvm-svn: 312460	2017-09-03 22:25:52 +00:00
Craig Topper	fcf6bc5503	[X86] Add more patterns to use moves to zero the upper portions of a vector register that I missed in r312450. llvm-svn: 312459	2017-09-03 22:25:50 +00:00
Craig Topper	788fbe08db	[X86] Combine inserting a vector of zeros into a vector of zeros into just the larger vector. llvm-svn: 312458	2017-09-03 22:25:49 +00:00
Craig Topper	8ee36ffb54	[X86] Add patterns to turn an insert into lower subvector of a zero vector into a move instruction which will implicitly zero the upper elements. Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed. llvm-svn: 312450	2017-09-03 17:52:25 +00:00
Craig Topper	fa82efb50a	[X86] Add VBLENDPS/VPBLENDD to the execution domain fixing tables. llvm-svn: 312449	2017-09-03 17:52:23 +00:00
Craig Topper	bb6506d251	[X86] Canonicalize (concat_vectors X, zero) -> (insert_subvector zero, X, 0). In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern. llvm-svn: 312448	2017-09-03 17:52:19 +00:00
Ayman Musa	2927ea0b19	[X86] Add -mtriple option to LIT tests added in https://reviews.llvm.org/rL312442 llvm-svn: 312443	2017-09-03 15:06:26 +00:00
Ayman Musa	ef8f61bce6	[X86][AVX512] Add simple tests for all AVX512 shuffle instructions. Throughout an effort to strongly check the behavior of CodeGen with the IR shufflevector instruction we generated many tests while predicting the best X86 sequence that may be generated. This is a subset of the generated tests that we think may add value to our X86 set of tests. Some of the checks are not optimal and will be changed after fixing: 1. PR34394 2. PR34382 3. PR34380 4. PR34359 Differential Revision: https://reviews.llvm.org/D37329 llvm-svn: 312442	2017-09-03 13:53:44 +00:00
Ayman Musa	ac12849d32	[X86] Add RUN line for LIT test committed in "rL312438: [X86] Fix crash on assert of non-simple type after type-legalization.". llvm-svn: 312439	2017-09-03 10:44:18 +00:00
Ayman Musa	44cde94935	[X86] Fix crash on assert of non-simple type after type-legalization The function combineShuffleToVectorExtend in DAGCombine might generate an illegal typed node after "legalize types" phase, causing assertion on non-simple type to fail afterwards. Adding a type check in case the combine is running after the type legalize pass. Differential Revision: https://reviews.llvm.org/D37330 llvm-svn: 312438	2017-09-03 09:09:16 +00:00
Craig Topper	619b759a57	[X86] Teach fastisel to handle zext/sext i8->i16 and sext i1->i8/i16/i32/i64 Summary: ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32-bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means without handling in fast isel we end up triggering a fast isel abort. We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code. Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one. The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed. Reviewers: spatel, zvi, igorb, guyblank, RKSimon Reviewed By: guyblank Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37320 llvm-svn: 312422	2017-09-02 18:53:46 +00:00
Stanislav Mekhanoshin	520608b268	[AMDGPU] Testcase for computeKnownBits recursion. NFC. Testcase for rL312364: [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits() llvm-svn: 312388	2017-09-01 22:25:22 +00:00
Jessica Paquette	b0d17d99dd	[MIParser] Ensure getHexUint doesn't produce APInts with a bitwidth of 0 If getHexUint reads in a hex 0, it will create an APInt with a value of 0. The number of active bits on this APInt is used to calculate the bitwidth of Result. The number of active bits is defined as an APInt's bitwidth - its number of leading 0s. Since this APInt is 0, its bitwidth and number of leading 0s are equal. Thus, Result is constructed with a bitwidth of 0, triggering an APInt assert. This commit fixes that by checking if the APInt is equal to 0, and setting the bitwidth to 32 if it is. Otherwise, it sets the bitwidth using getActiveBits. This caused issues when compiling MIR files with successor probabilities. In the case that a successor is tagged with a probability of 0, this assert would fire on debug builds. https://reviews.llvm.org/D37401 llvm-svn: 312387	2017-09-01 22:17:14 +00:00
Sanjay Patel	f4425e9a66	[x86] eliminate redundant shuffle of horizontal math ops when both inputs are the same This is limited to a set of patterns based on the example in PR34111: https://bugs.llvm.org/show_bug.cgi?id=34111 ...but as I was investigating this, I see that horizontal patterns can go wrong in many, many other ways that would not be handled by this patch. Each data type may even go different in the DAG after starting with the same basic IR pattern, so even proper IR canonicalization won't fix it all. Differential Revision: https://reviews.llvm.org/D37357 llvm-svn: 312379	2017-09-01 21:09:04 +00:00
Matthias Braun	cebdb17522	LiveIntervalAnalysis: Fix alias regunit reserved definition A register in CodeGen can be marked as reserved: In that case we consider the register always live and do not use (or rather ignore) kill/dead/undef operand flags. LiveIntervalAnalysis however tracks liveness per register unit (not per register). We already needed adjustments for this in r292871 to deal with super/sub registers. However I did not look at aliased registers there. Looking at ARM: FPSCR (regunits FPSCR, FPSCR~FPSCR_NZCV) aliases with FPSCR_NZCV (regunits FPSCR_NZCV, FPSCR~FPSCR_NZCV) hence they share a register unit (FPSCR~FPSCR_NZCV) that represents the aliased parts of the registers. This shared register unit was previously considered non-reserved, however given that uses of the reserved FPSCR potentially violate some rules (like uses without defs) we should make FPSCR~FPSCR_NZCV reserved too and stop tracking liveness for it. This patch: - Defines a register unit as reserved when: At least for one root register, the root register and all its super registers are reserved. - Adjust LiveIntervals::computeRegUnitRange() for new reserved definition. - Add MachineRegisterInfo::isReservedRegUnit() to have a canonical way of testing. - Stop computing LiveRanges for reserved register units in HMEditor even with UpdateFlags enabled. - Skip verification of uses of reserved reg units in the machine verifier (this usually didn't happen because there would be no cached liverange but there is no guarantee for that and I would run into this case before the HMEditor tweak, so may as well fix the verifier too). Note that this should only affect ARM's FPSCR/FPSCR_NZCV registers today; aliased registers are rarely used, the only other cases are Hexagon's P0-P3/P3_0 and C8/USR pairs which are not mixing reserved/non-reserved registers in an alias. Differential Revision: https://reviews.llvm.org/D37356 llvm-svn: 312348	2017-09-01 18:36:26 +00:00
Nicolai Haehnle	75c98c365b	AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states Summary: This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36193 llvm-svn: 312337	2017-09-01 16:56:32 +00:00
Craig Topper	2a75b6f26b	[X86] Add test case I forgot to commit with r312285. llvm-svn: 312335	2017-09-01 16:40:24 +00:00
Manoj Gupta	6b54c7e11b	[LoopVectorizer] Use two step casting for float to pointer types. Summary: LoopVectorizer is creating casts between vec<ptr> and vec<float> types on ARM when compiling OpenCV. It is illegal to directly cast a floating point type to a pointer type even if the types have the same size, so this causes a crash. Fix the crash using a two-step casting by bitcasting to integer and integer to pointer/float. Fixes PR33804. Reviewers: mkuper, Ayal, dlj, rengolin, srhines Reviewed By: rengolin Subscribers: aemerson, kristof.beyls, mkazantsev, Meinersbur, rengolin, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D35498 llvm-svn: 312331	2017-09-01 15:36:00 +00:00
Geoff Berry	65528f2991	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312328	2017-09-01 14:27:20 +00:00
Diana Picus	f959791189	[ARM] GlobalISel: Support ROPI global variables In the ROPI relocation model, read-only variables are accessed relative to the PC. We use the (MOV\|LDRLIT)_ga_pcrel pseudoinstructions for this. llvm-svn: 312323	2017-09-01 11:13:39 +00:00
Diana Picus	b67264b182	[ARM] GlobalISel: More tests. NFC. Test constants as well in the PIC tests. These are also represented as G_GLOBAL_VALUE, and although they are treated just like other globals for PIC, they won't be for ROPI, so it's good to have this coverage. llvm-svn: 312319	2017-09-01 10:18:37 +00:00
Craig Topper	70e581cdd6	[X86] Add isel patterns for memory forms of FMA3 intrinsic instructions llvm-svn: 312309	2017-09-01 07:58:13 +00:00
Matt Arsenault	ab4a5cd335	AMDGPU: Fold clamp modifier for packed instructions llvm-svn: 312297	2017-08-31 23:53:50 +00:00
Derek Schuff	0f3bc0f478	[WebAssembly] Refactor load ISel tablegen patterns into classes Not all of these will be able to be used by atomics because tablegen, but it still seems like a good change by itself. Differential Revision: https://reviews.llvm.org/D37345 llvm-svn: 312287	2017-08-31 21:51:48 +00:00
Jessica Paquette	ffe4abc51b	[MachineOutliner] Recommit r312194, missed optimization remarks Before, this commit caused a buildbot failure: http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/6026/steps/test_llvm/logs/LLVM%20%3A%3A%20CodeGen__AArch64__machine-outliner-remarks.ll This was caused by the Key value in DiagnosticInfoOptimizationBase being deallocated before emitting the remarks defined in MachineOutliner.cpp. As of r312277 this should no longer be an issue. llvm-svn: 312280	2017-08-31 21:02:45 +00:00
Sanjay Patel	841acbbca0	[x86] add more tests for horizontal ops; NFC llvm-svn: 312279	2017-08-31 20:59:25 +00:00
Reid Kleckner	08f5fd51cc	[codeview] Generalize DIExpression parsing to handle load chains Summary: Hopefully this also clarifies exactly when and why we're rewriting certain S_LOCALs using reference types: We're using the reference type to stand in for a zero-offset load. Reviewers: inglorion Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D37309 llvm-svn: 312247	2017-08-31 15:56:49 +00:00
Daniel Jasper	c0a976d417	Revert r311525: "[XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text" Breaks builds internally. Will forward repro instructions to author. llvm-svn: 312243	2017-08-31 15:17:17 +00:00
Yael Tsafrir	185c81725e	[X86] Added run line to intrinsics upgrade test. NFC. llvm-svn: 312241	2017-08-31 13:56:22 +00:00
Ashutosh Nema	bfcac0b480	AMD family 17h (znver1) scheduler model update. Summary: This patch enables the following: 1) Regex based Instruction itineraries for integer instructions. 2) The instructions are grouped as per the nature of the instructions (move, arithmetic, logic, Misc, Control Transfer). 3) FP instructions and their itineraries are added which includes values for SSE4A, BMI, BMI2 and SHA instructions. Patch by Ganesh Gopalasubramanian Reviewers: RKSimon, craig.topper Subscribers: vprasad, shivaram, ddibyend, andreadb, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D36617 llvm-svn: 312237	2017-08-31 12:38:35 +00:00
Martin Storsjo	865d01a3cf	[AArch64] Support COFF linker directives This is similar to what was done for ARM in SVN r269574; the code and the test are straight copypaste to the corresponding AArch64 code and test directory. Differential revision: https://reviews.llvm.org/D37204 llvm-svn: 312223	2017-08-31 08:28:48 +00:00
Daniel Jasper	b8198f02e6	Revert r312194: "[MachineOutliner] Add missed optimization remarks for the outliner." Breaks on buildbot: http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/6026/steps/test_llvm/logs/LLVM%20%3A%3A%20CodeGen__AArch64__machine-outliner-remarks.ll llvm-svn: 312219	2017-08-31 06:22:35 +00:00
Eric Christopher	e42ac21499	Temporarily revert "Update branch coalescing to be a PowerPC specific pass" From comments and code review it wasn't intended to be enabled by default yet. This reverts commit r311588. llvm-svn: 312214	2017-08-31 05:56:16 +00:00
Jessica Paquette	65d953e0b1	[MachineOutliner] Add missed optimization remarks for the outliner. This adds missed optimization remarks which report viable candidates that were not outlined because they would increase code size. Other remarks will come in separate commits. This will help to diagnose code size regressions and changes in outliner behaviour in projects using the outliner. https://reviews.llvm.org/D37085 llvm-svn: 312194	2017-08-30 23:31:49 +00:00
Hans Wennborg	24775a0a6c	Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!") > Issues identified by buildbots addressed since original review: > - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. > - The pass no longer forwards COPYs to physical register uses, since > doing so can break code that implicitly relies on the physical > register number of the use. > - The pass no longer forwards COPYs to undef uses, since doing so > can break the machine verifier by creating LiveRanges that don't > end on a use (since the undef operand is not considered a use). > > [MachineCopyPropagation] Extend pass to do COPY source forwarding > > This change extends MachineCopyPropagation to do COPY source forwarding. > > This change also extends the MachineCopyPropagation pass to be able to > be run during register allocation, after physical registers have been > assigned, but before the virtual registers have been re-written, which > allows it to remove virtual register COPY LiveIntervals that become dead > through the forwarding of all of their uses. llvm-svn: 312178	2017-08-30 22:11:37 +00:00
Brian Gesiak	3332976478	[ARM] Use Swift error registers on non-Darwin targets Summary: Remove a check for `ARMSubtarget::isTargetDarwin` when determining whether to use Swift error registers, so that Swift errors work properly on non-Darwin ARM32 targets (specifically Android). Before this patch, generated code would save and restore ARM register r8 at the entry and returns of a function that throws. As r8 is used as a virtual return value for the object being thrown, this gets overwritten by the restore, and calling code is unable to catch the error. In turn this caused Swift code that used `do`/`try`/`catch` to work improperly on Android ARM32 targets. Addresses Swift bug report https://bugs.swift.org/browse/SR-5438. Patch by John Holdsworth. Reviewers: manmanren, rjmccall, aschwaighofer Reviewed By: aschwaighofer Subscribers: srhines, aschwaighofer, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35835 llvm-svn: 312164	2017-08-30 20:03:54 +00:00
Aditya Nandakumar	c6615f56f5	[GISel]: Add a clean up combiner during legalization. Added a combiner which can clean up truncs/extends that are created in order to make the types work during legalization. Also moved the combineMerges to the LegalizeCombiner. https://reviews.llvm.org/D36880 llvm-svn: 312158	2017-08-30 19:32:59 +00:00
Geoff Berry	feffb0c8af	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues identified by buildbots addressed since original review: - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312154	2017-08-30 18:41:07 +00:00
Derek Schuff	18ba192843	[WebAssembly] Add target feature for atomics Summary: This tracks the WebAssembly threads feature proposal at https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md Differential Revision: https://reviews.llvm.org/D37300 llvm-svn: 312145	2017-08-30 18:07:45 +00:00
Adrian Prantl	05782218ab	Canonicalize the representation of an empty expression in DIGlobalVariableExpression This change simplifies code that has to deal with DIGlobalVariableExpression and mirrors how we treat DIExpressions in debug info intrinsics. Before this change there were two ways of representing empty expressions on globals, a nullptr and an empty !DIExpression(). If someone needs to upgrade out-of-tree testcases: perl -pi -e 's/(!DIGlobalVariableExpression$var: ![0-9]*)$/\1, expr: !DIExpression())/g' <MYTEST.ll> will catch 95%. llvm-svn: 312144	2017-08-30 18:06:51 +00:00
Craig Topper	afce0baacd	[AVX512] Don't use 32-bit elements version of AND/OR/XOR/ANDN during isel unless we're matching a masked op or broadcast Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does. Since there's no functional difference, just remove the extra patterns and save some isel table size. Differential Revision: https://reviews.llvm.org/D36854 llvm-svn: 312138	2017-08-30 16:38:33 +00:00
Igor Breger	36d447d8a8	[GlobalISel][X86] Support variadic function call. Summary: Support variadic function call. Port the implementation from X86FastISel. Reviewers: zvi, guyblank, oren_ben_simhon Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D37261 llvm-svn: 312130	2017-08-30 15:10:15 +00:00
Balaram Makam	42adadfca0	Re-land MachineInstr: Reason locally about some memory objects before going to AA. Summary: Reverts r311008 to reinstate r310825 with a fix. Refine alias checking for pseudo vs value to be conservative. This fixes the original failure in buildbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs. Reviewers: hfinkel, nemanjai, efriedma Reviewed By: efriedma Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D36900 llvm-svn: 312126	2017-08-30 14:57:12 +00:00
Strahinja Petrovic	89df797ee9	[MIPS] Add support to match more patterns for BBIT instruction This patch supports one more pattern for bbit0 and bbit1 instructions. The CBranchBitNum class is expanded so it can take a 32-bit immediate. Differential Revision: https://reviews.llvm.org/D36222 llvm-svn: 312111	2017-08-30 11:25:38 +00:00
Sjoerd Meijer	be5b60f735	[AArch64] allow v4f16 types when FullFP16 is supported Support for scalars was committed in r311154, this adds support for allowing v4f16 vector types (thus avoiding conversions from/to single precision for these types). Differential Revision: https://reviews.llvm.org/D37145 llvm-svn: 312104	2017-08-30 08:38:13 +00:00
Gadi Haber	767d98bad8	[X86][Skylake] Fixing duplicated prefixes in the run command of Code Gen regression tests NFC. Replaced duplicated HASWELL prefixes in run commands in the X86 Code Gen regression tests by the SKYLAKE prefix when the -mcpu is set to skylake. The fix is needed in preparation of an upcoming patch containing the Skylake scheduling info. Reviewers: zvi, RKSimon, aymanmus, igorb Differential Revision: https://reviews.llvm.org/D37258 llvm-svn: 312103	2017-08-30 08:08:50 +00:00
Craig Topper	17854ecf24	[AVX512] Correct isel patterns to support selecting masked vbroadcastf32x2/vbroadcasti32x2 Summary: This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern. Fixes PR34357 We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch. Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one. Reviewers: aymanmus, zvi, igorb Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37286 llvm-svn: 312101	2017-08-30 07:48:39 +00:00
Craig Topper	48a7917079	[AVX512] Use 256-bit extract instructions for extracting bits [255:128] from a 512-bit register This enables the use of a smaller encoding by using a VEX instruction when possible. Differential Revision: https://reviews.llvm.org/D37092 llvm-svn: 312100	2017-08-30 07:26:12 +00:00
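
Commit b0d17d99dd in the list above describes how getHexUint picks a bit width from the number of active bits and falls back to 32 when the parsed value is 0. The rule can be sketched in standalone C++ as follows. This is only an illustration under stated assumptions, not the code from the commit: `activeBits` and `chooseBitWidth` are hypothetical names, and a plain 64-bit integer stands in for APInt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bits needed to represent Value.
// Returns 0 for Value == 0, matching "bit width minus leading zeros".
unsigned activeBits(std::uint64_t Value) {
  unsigned Bits = 0;
  while (Value != 0) {
    ++Bits;
    Value >>= 1;
  }
  return Bits;
}

// Hypothetical helper: choose a bit width for a parsed hex literal.
// A value of 0 has no active bits, so fall back to 32 instead of
// producing a width of 0, as the commit message describes.
unsigned chooseBitWidth(std::uint64_t Value) {
  unsigned Bits = activeBits(Value);
  assert(Bits <= 64 && "a 64-bit value has at most 64 active bits");
  return Bits == 0 ? 32 : Bits;
}
```

Under these assumptions, `chooseBitWidth(0)` yields 32, while any non-zero input yields its active-bit count, so the resulting width is never 0.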


21569 Commits