llvm-project

Author	SHA1	Message	Date
Sanjay Patel	87cb0745eb	[x86] auto-generate better checks; NFC llvm-svn: 287048	2016-11-15 22:42:20 +00:00
Sanjay Patel	9a4ce290d0	[x86] auto-generate better checks; NFC llvm-svn: 287046	2016-11-15 22:33:16 +00:00
Chad Rosier	201fc1ed26	[AArch64] Add support for Qualcomm's Falkor CPU. Differential Revision: https://reviews.llvm.org/D26673 llvm-svn: 287036	2016-11-15 21:34:12 +00:00
Tom Stellard	d23de360db	AMDGPU/SI: Fix pattern for i16 = sign_extend i1 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26670 llvm-svn: 287035	2016-11-15 21:25:56 +00:00
Sanjay Patel	2a51748a5d	[x86] add tests for FP-logic equivalent instruction replacement The ANDN test needs at least 3 different fixes. llvm-svn: 287032	2016-11-15 21:19:28 +00:00
Matt Arsenault	d4bb5e4831	AMDGPU: Enable store clustering Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. llvm-svn: 287021	2016-11-15 20:22:55 +00:00
Haicheng Wu	faee2b71a7	[AArch64] Lower multiplication by a constant int to shl+add+shl Lower a = b * C where C = (2^n + 1) * 2^m to add w0, w0, w0, lsl n lsl w0, w0, m Differential Revision: https://reviews.llvm.org/D229245 llvm-svn: 287019	2016-11-15 20:16:48 +00:00
Matt Arsenault	3666629837	AMDGPU: Analyze mubuf with immediate soffset Fixes giving up on clustering common addr64 accesses with constant 0 soffset. llvm-svn: 287018	2016-11-15 20:14:27 +00:00
Wei Mi	37c4aaaf52	Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific. llvm-svn: 287014	2016-11-15 19:42:05 +00:00
Stanislav Mekhanoshin	ea91cca593	[AMDGPU] Add wave barrier builtin The wave barrier represents the discardable barrier. Its main purpose is to carry convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave come to convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 llvm-svn: 287007	2016-11-15 19:00:15 +00:00
Sanjay Patel	22465125b3	[x86] auto-generate checks; NFC Also, fix the test params to use an attribute rather than a CPU model and remove the AVX run because that does nothing but check for a 'v' prefix in all of these tests. llvm-svn: 287003	2016-11-15 18:44:53 +00:00
Wei Mi	7ccf7651c0	[LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop. In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 286999	2016-11-15 18:35:53 +00:00
Pawel Bylica	c3f6c97f71	Integer legalization: fix MUL expansion Summary: This fixes the runtime results produced by the fallback multiplication expansion introduced in r270720. For tests I created a fuzz tester that compares the results with Boost.Multiprecision. Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26628 llvm-svn: 286998	2016-11-15 18:29:24 +00:00
Zaara Syeda	a19c9e60e9	vector load store with length (left justified) llvm portion llvm-svn: 286993	2016-11-15 17:54:19 +00:00
Simon Pilgrim	ceffb43b1b	[X86][SSE] Improve SINT_TO_FP of boolean vector results (signum) This patch helps avoid poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so helps improve the truncation logic. This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases. Fix for PR13248 Differential Revision: https://reviews.llvm.org/D26583 llvm-svn: 286979	2016-11-15 16:24:40 +00:00
Tony Jiang	5f850cd1b1	[PowerPC] Implement BE VSX load/store builtins - llvm portion. This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE, they behave exactly the same as vec_xl and vec_xst, therefore they are simply implemented by defining a matching macro. On LE, they are implemented by defining new builtins and intrinsics. For int/float/long long/double, it is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short, we also need some extra shuffling before or after calling the builtins to get the desired BE order. For int128, simply call vec_xl or vec_xst. llvm-svn: 286967	2016-11-15 14:25:56 +00:00
Zvi Rackover	f0b9b57bd3	[X86][FastISel] Fix lowering of overflow result on AVX512 targets Summary: Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class. We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction. Fixes pr30981. Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet Subscribers: qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D26620 llvm-svn: 286958	2016-11-15 13:29:23 +00:00
Javed Absar	f043dac25d	[ARM] Add machine scheduler for Cortex-R52 This patch adds the Sched Machine Model for Cortex-R52. Details of the pipeline and descriptions are in comments in file ARMScheduleR52.td included in this patch. Reviewers: rengolin, jmolloy Differential Revision: https://reviews.llvm.org/D26500 llvm-svn: 286949	2016-11-15 11:34:54 +00:00
Asaf Badouh	b573553424	DAGCombiner: fix combine of trunc and select bugzilla: https://llvm.org/bugs/show_bug.cgi?id=29002 pr29002 Differential Revision: https://reviews.llvm.org/D26449 llvm-svn: 286938	2016-11-15 07:55:22 +00:00
Zvi Rackover	76dbf26599	[X86][GlobalISel] Add minimal call lowering support to the IRTranslator Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573. Reviewers: ab, qcolombet, t.p.northover Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26593 llvm-svn: 286935	2016-11-15 06:34:33 +00:00
Craig Topper	0637099f24	[AVX-512] Add an example test case for PR31018. llvm-svn: 286934	2016-11-15 05:21:55 +00:00
Matt Arsenault	c79dc70d50	AMDGPU: Fix f16 fabs/fneg llvm-svn: 286931	2016-11-15 02:25:28 +00:00
Matt Arsenault	972034bda9	AMDGPU: Fix formatting of 1/2pi immediate llvm-svn: 286912	2016-11-15 00:04:33 +00:00
Tom Stellard	9c884e495c	MIRParser: Add support for parsing vreg reg alloc hints Reviewers: qcolombet, MatzeB Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26573 llvm-svn: 286911	2016-11-15 00:03:14 +00:00
Evandro Menezes	9fc54826e0	[AArch64] Compute the Newton series for reciprocals natively Implement the Newton series for square root, its reciprocal and reciprocal natively using the specialized instructions in AArch64 to perform each series iteration. Differential revision: https://reviews.llvm.org/D26518 llvm-svn: 286907	2016-11-14 23:29:01 +00:00
Tim Northover	e33b175411	GlobalISel: add tests for G_ZEXT/G_SEXT to types smaller than 32-bits. Support was accidentally added in r286407, but there were no tests at the time. llvm-svn: 286903	2016-11-14 22:50:22 +00:00
Tom Stellard	11e60ff7da	RegAllocGreedy: Properly initialize this pass, so that -run-pass will work Reviewers: qcolombet, MatzeB Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26572 llvm-svn: 286895	2016-11-14 21:50:13 +00:00
Tim Northover	46a6f0fbf0	Recommit: ARM: sort register lists by encoding in push/pop instructions. For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. Fixed usage of std::sort so that we (hopefully) use instantiations that actually exist in GCC 4.8. llvm-svn: 286881	2016-11-14 20:28:24 +00:00
Michael Kuperstein	f221f13ccc	[X86] Tests exhibiting bad partial reloading behavior. NFC. llvm-svn: 286878	2016-11-14 19:58:11 +00:00
Geoff Berry	526c50588d	[AArch64] Split 0 vector stores into scalar store pairs. Summary: Replace a splat of zeros to a vector store by scalar stores of WZR/XZR. The load store optimizer pass will merge them to store pair stores. This should be better than a movi to create the vector zero followed by a vector store if the zero constant is not re-used, since one instructions and one register live range will be removed. For example, the final generated code should be: stp xzr, xzr, [x0] instead of: movi v0.2d, #0 str q0, [x0] Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26561 llvm-svn: 286875	2016-11-14 19:39:04 +00:00
Tim Northover	1b66f39cf2	Revert "ARM: sort register lists by encoding in push/pop instructions." This reverts commit 286866. It broke a bot, something to do with exactly which templates std::sort accepts. llvm-svn: 286867	2016-11-14 19:05:28 +00:00
Tim Northover	e908ea844c	ARM: sort register lists by encoding in push/pop instructions. For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. llvm-svn: 286866	2016-11-14 19:02:17 +00:00
Sean Fertile	a435e07de8	[PPC] Add intrinsic mapping to the xscvhpsp instruction add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to Single-Precision' instruction. Differential review: https://reviews.llvm.org/D26536 llvm-svn: 286862	2016-11-14 18:43:59 +00:00
Changpeng Fang	8236fe103f	AMDGPU/SI: Support data types other than V4f32 in image intrinsics Summary: Extend image intrinsics to support data types of V1F32 and V2F32. TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active, even though such case should be very rare. Reviewers: tstellarAMD Differential Revision: http://reviews.llvm.org/D26472 llvm-svn: 286860	2016-11-14 18:33:18 +00:00
Zvi Rackover	35bb7fdadc	[X86] Adding reproducer for pr30981 llvm-svn: 286855	2016-11-14 18:10:44 +00:00
Sumanth Gundapaneni	d428cf8b5f	[Hexagon] Remove unsafe load instructions that affect Stack Slot Coloring The Stack slot coloring pass removes a store that is followed by a load that deal with the same stack slot. The function isLoadFromStackSlot is supposed to consider the loads that have no side-effects. This patch fixed the issue by removing the unsafe loads from this function Eg: %vreg0<def> = L2_loadruh_io <fi#15>, 0 S2_storeri_io <fi#15>, 0, %vreg0 In this case, we load an unsigned extended half word and store this in to the same stack slot. The Stack slot coloring pass considers safe to remove the store. This patch marked all the non-vector byte and half word loads as unsafe. llvm-svn: 286843	2016-11-14 17:11:00 +00:00
Sean Fertile	adda5b2d2b	[PPC] add intrinsics for vec extract exp/significand and vec test data class. Differential Revision: https://reviews.llvm.org/D26272 llvm-svn: 286829	2016-11-14 14:42:37 +00:00
Craig Topper	b8596e4d1d	[X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. -Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions. -Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions. -Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax. -Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing. This should fix at least some of PR28850. llvm-svn: 286787	2016-11-14 01:53:29 +00:00
Craig Topper	353e59b6d6	[AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects. llvm-svn: 286786	2016-11-14 01:53:22 +00:00
Sanjay Patel	cfcc42bdc2	[ValueTracking] recognize even more variants of smin/smax Similar to: https://reviews.llvm.org/rL285499 https://reviews.llvm.org/rL286318 We can't minimally expose this in IR tests because we don't have min/max intrinsics, but the difference is visible in codegen because SelectionDAGBuilder::visitSelect() uses matchSelectPattern(). We're not canonicalizing these patterns in IR (yet), so I don't expect there to be any regressions as noted here: http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html llvm-svn: 286776	2016-11-13 20:04:52 +00:00
Matt Arsenault	dc45274d54	AMDGPU: Implement SGPR spilling with scalar stores This avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766	2016-11-13 18:20:54 +00:00
Igor Breger	e2399f9e0e	revert commit r286761, some builds failed on Win platforms llvm-svn: 286765	2016-11-13 15:48:11 +00:00
Simon Pilgrim	055c09c1c0	[X86][SSE] Add zero lower 32-bits test case for PR30845 llvm-svn: 286764	2016-11-13 15:32:11 +00:00
Simon Pilgrim	8f7c56125e	[X86][AVX512] Add masked VPMOVZX test case for PR26762 llvm-svn: 286763	2016-11-13 15:16:43 +00:00
Simon Pilgrim	ce59a536f7	[X86][SSE] Add additional test case for PR30845 llvm-svn: 286762	2016-11-13 14:57:52 +00:00
Ayman Musa	c09b3769ae	[X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics. Differential Revision: https://reviews.llvm.org/D26128 llvm-svn: 286761	2016-11-13 14:51:25 +00:00
Ayman Musa	46af8f9c6f	[X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions. Differential Revision: https://reviews.llvm.org/D26022 llvm-svn: 286758	2016-11-13 14:29:32 +00:00
Craig Topper	43e97649a1	[AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords. These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2. llvm-svn: 286754	2016-11-13 07:26:15 +00:00
Konstantin Zhuravlyov	f86e4b7266	[AMDGPU] Add f16 support (VI+) Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753	2016-11-13 07:01:11 +00:00
Craig Topper	706d897d8a	[AVX-512] Move masked shift intrinsics tests to the autoupgrade test file. These missed being moved in r286725. llvm-svn: 286746	2016-11-13 03:42:27 +00:00

18074 Commits