llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	02cb0ff7db	R600/SI: Fix using mad with multiplies by 2 These turn into fadds, so combine them into the target mad node. fadd (fadd (a, a), b) -> mad 2.0, a, b llvm-svn: 218608	2014-09-29 14:59:34 +00:00
Matt Arsenault	ed8a3e0a08	R600/SI: Add strict check lines to div_scale tests. This has weird operand requirements so it's worthwhile to have very strict checks for its operands. Add different combinations of SGPR operands. llvm-svn: 218535	2014-09-26 17:55:11 +00:00
Matt Arsenault	6a0919fb9b	R600/SI Allow same SGPR to be used for multiple operands Instead of moving the first SGPR that is different than the first, legalize the operand that requires the fewest moves if one SGPR is used for multiple operands. This saves extra moves and is also required for some instructions which require that the same operand be used for multiple operands. llvm-svn: 218532	2014-09-26 17:55:03 +00:00
Matt Arsenault	cb0ac3d1fb	R600/SI: Partially move operand legalization to post-isel hook. Disable the SGPR usage restriction parts of the DAG legalizeOperands. It now should only be doing immediate folding until it can be replaced later. The real legalization work is now done by the other SIInstrInfo::legalizeOperands llvm-svn: 218531	2014-09-26 17:54:59 +00:00
Matt Arsenault	5885bef6cf	R600/SI: Don't move operands that are required to be SGPRs e.g. v_cndmask_b32 requires the condition operand be an SGPR. If one of the source operands were an SGPR, that would be considered the one SGPR use and the condition operand would be illegally moved. llvm-svn: 218529	2014-09-26 17:54:52 +00:00
Matt Arsenault	aff65fbca5	R600/SI: Fix using wrong operand indices when commuting No test since the current SIISelLowering::legalizeOperands effectively hides this, and the general uses seem to only fire on SALU instructions which don't have modifiers between the operands. When trying to use legalizeOperands immediately after instruction selection, it now sees a lot more patterns it did not see before which break on this. llvm-svn: 218527	2014-09-26 17:54:43 +00:00
Matt Arsenault	0c652c3fbc	R600: Avoid repeated check lines llvm-svn: 218487	2014-09-26 01:12:36 +00:00
Matt Arsenault	3a99759498	R600/SI: Fix emitting trailing whitespace after s_waitcnt llvm-svn: 218486	2014-09-26 01:09:46 +00:00
Matt Arsenault	42d1565844	R600: Fix some missing conversion testcases llvm-svn: 218474	2014-09-25 23:16:18 +00:00
Matt Arsenault	c16fafb24d	Remove duplicated RUN lines in middle of test llvm-svn: 218473	2014-09-25 23:16:14 +00:00
Tom Stellard	7980fc8562	R600/SI: Add support for global atomic add llvm-svn: 218457	2014-09-25 18:30:26 +00:00
Matt Arsenault	3e0effa223	R600/SI: Fix weird CHECK-DAG usage This prevents these from failing in a future commit. llvm-svn: 218356	2014-09-24 02:14:26 +00:00
Tom Stellard	744b99b476	R600/SI: Enable selecting SALU inside branches We can do this now that the FixSGPRLiveRanges pass is working. llvm-svn: 218353	2014-09-24 01:33:28 +00:00
Tom Stellard	9f73851e39	Revert "R600/SI: Add support for global atomic add" This reverts commit r218254. The global_atomics.ll test fails with asserts disabled. For some reason, the compiler fails to produce the atomic no return variants. llvm-svn: 218257	2014-09-22 16:44:04 +00:00
Tom Stellard	2355a77e74	R600/SI: Add support for global atomic add llvm-svn: 218254	2014-09-22 15:35:35 +00:00
Matt Arsenault	de0253791c	R600: Un-xfail a test which passes with pass disabled llvm-svn: 218165	2014-09-19 23:02:20 +00:00
Matt Arsenault	5e5b242946	R600/SI: Un-xfail tests which work now llvm-svn: 218164	2014-09-19 23:02:18 +00:00
Matt Arsenault	a986554377	R600/SI: Un xfail a test that works now llvm-svn: 218162	2014-09-19 22:42:40 +00:00
Matt Arsenault	4505f3a73d	R600/SI: Fix test to prepare for scheduler llvm-svn: 218131	2014-09-19 18:11:16 +00:00
Matt Arsenault	46cbc4367b	R600: Better fix for bug 20982 Just do the left shift as unsigned to avoid the UB. llvm-svn: 218092	2014-09-19 00:42:06 +00:00
Matt Arsenault	6462f94884	R600: Bug 20982 - Avoid undefined left shift of negative value I'm not sure what the hardware actually does, so don't bother trying to fold it for now. llvm-svn: 218057	2014-09-18 15:52:26 +00:00
Alexey Samsonov	7bddb0a56a	Exclude known and bugzilled failures from UBSan bootstrap llvm-svn: 217979	2014-09-17 20:17:52 +00:00
Matt Arsenault	02dc26529e	R600/SI: Change formatting of printed FP immediates Only 1 decimal place should be printed for inline immediates. Other constants should be hex constants. Does not include f64 tests because folding those inline immediates currently does not work. llvm-svn: 217964	2014-09-17 17:32:13 +00:00
Matt Arsenault	49dd4283ed	R600/SI: Prefer selecting more e64 instruction forms. Add some more tests to make sure better operand choices are still made. Leave some cases that seem to have no reason to ever be e64 alone. llvm-svn: 217789	2014-09-15 17:15:02 +00:00
Matt Arsenault	0fd0a316ed	R600/SI: Make sure double vector fmul is tested llvm-svn: 217787	2014-09-15 17:04:54 +00:00
Matt Arsenault	72aafd0689	R600/SI: Add some mubuf testcases. I noticed some odd looking cases where addr64 wasn't set when storing to a pointer in an SGPR. This seems to be intentional, and partially tested already. The documentation seems to describe addr64 in terms of which registers addressing modifiers come from, but I would expect to always need addr64 when using 64-bit pointers. If no offset is applied, it makes sense to not need to worry about doing a 64-bit add for the final address. A small immediate offset can be applied, so is it OK to not have addr64 set if a carry is necessary when adding the base pointer in the resource to the offset? llvm-svn: 217785	2014-09-15 16:48:01 +00:00
Matt Arsenault	3f98140c87	R600/SI: Add preliminary support for flat address space llvm-svn: 217777	2014-09-15 15:41:53 +00:00
Matt Arsenault	f620a575bf	R600/SI: Fix broken check lines llvm-svn: 217736	2014-09-14 18:32:05 +00:00
Matt Arsenault	362f345bab	R600/SI: Fix off by 1 error in used register count The register numbers start at 0, so if only 1 register was used, this was reported as 0. llvm-svn: 217636	2014-09-11 22:51:37 +00:00
Matt Arsenault	8239eaab99	Add DAG combine for shl + add of constants. Do (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) This is already done for multiplies, but since multiplies by powers of two are turned into shifts, we also need to handle it here. This might want checks for isLegalAddImmediate to avoid transforming an add of a legal immediate with one that isn't. llvm-svn: 217610	2014-09-11 17:34:19 +00:00
Aaron Watry	3ffc560094	R600: Test local atomics for evergreen Now that the operations are all implemented, we can test this sub-arch here. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217595	2014-09-11 15:02:52 +00:00
Matt Arsenault	61a528adc7	R600/SI: Fix losing chain when fixing reg class of loads. The lost chain resulting in earlier side effecting nodes being deleted. llvm-svn: 217561	2014-09-10 23:26:19 +00:00
Matt Arsenault	16e313343d	R600: Custom lower frem llvm-svn: 217553	2014-09-10 21:44:27 +00:00
Matt Arsenault	7ac9c4a074	R600/SI: Replace LDS atomics with no return versions llvm-svn: 217379	2014-09-08 15:07:31 +00:00
Matt Arsenault	7b46a59b5a	R600/SI: Relax a few tests to help enable scheduler llvm-svn: 217320	2014-09-06 20:44:41 +00:00
Matt Arsenault	a9fcf62a9c	R600/SI: Fix broken check lines. Fix missing check, and hardcoded register numbers. llvm-svn: 217318	2014-09-06 20:37:56 +00:00
Matt Arsenault	8ae5961065	R600/SI: Use same complex patterns for DS atomics This fixes hitting the same negative base offset problem that was already fixed for regular loads and stores. llvm-svn: 217256	2014-09-05 16:24:58 +00:00
Jan Vesely	d1d1334064	R600: Fix FROUND round halfway cases away from zero Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 217250	2014-09-05 14:26:54 +00:00
Tom Stellard	80942a1b50	R600/SI: Use S_ADD_U32 and S_SUB_U32 for low half of 64-bit operations https://bugs.freedesktop.org/show_bug.cgi?id=83416 llvm-svn: 217248	2014-09-05 14:07:59 +00:00
Matt Arsenault	869cd07158	R600/SI: Try to keep i32 mul on SALU Also fix bug this exposed where when legalizing an immediate operand, a v_mov_b32 would be created with a VSrc dest register. llvm-svn: 217108	2014-09-03 23:24:35 +00:00
Tom Stellard	102c68786c	R600/SI: Add a pattern for i64 and in a branch llvm-svn: 217041	2014-09-03 15:22:41 +00:00
Matt Arsenault	4c24d73709	R600/SI: Relax some ordering in tests. This will help with enabling misched llvm-svn: 216971	2014-09-02 21:45:50 +00:00
Matt Arsenault	b78875e979	R600/SI: Fix hardcoded register numbers in test llvm-svn: 216944	2014-09-02 20:43:07 +00:00
Matt Arsenault	d1649db2fc	R600/SI: Add failing testcase. This is broken when 64-bit add is only partially moved to the VALU. llvm-svn: 216933	2014-09-02 19:12:31 +00:00
Matt Arsenault	c1a71217b3	Fix interference caused by fmul 2, x -> fadd x, x If an fmul was introduced by lowering, it wouldn't be folded into a multiply by a constant since the earlier combine would have replaced the fmul with the fadd. llvm-svn: 216932	2014-09-02 19:02:53 +00:00
Matt Arsenault	8675db15da	R600/SI: Use mad for fsub + fmul We can use a negate source modifier to match this for fsub. llvm-svn: 216735	2014-08-29 16:01:14 +00:00
Tom Stellard	f3fc555e3b	R600/SI: Use READ2/WRITE2 instructions for 64-bit mem ops with 32-bit alignment llvm-svn: 216279	2014-08-22 18:49:35 +00:00
Tom Stellard	85e8b6d5f9	R600/SI: Use a ComplexPattern for DS loads and stores llvm-svn: 216278	2014-08-22 18:49:33 +00:00
Tom Stellard	745f2eddef	R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions llvm-svn: 216220	2014-08-21 20:41:00 +00:00
Tom Stellard	162a947160	R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function This fixes a crash in an ocl conformance test. llvm-svn: 216219	2014-08-21 20:40:58 +00:00
Matt Arsenault	fabf545299	R600/SI: Move all fabs / fneg handling to patterns llvm-svn: 215749	2014-08-15 18:42:22 +00:00
Matt Arsenault	13623d0e28	R600/SI: Use source modifiers for f64 fneg llvm-svn: 215748	2014-08-15 18:42:18 +00:00
Matt Arsenault	a147438e37	R600/SI: Use source modifier for f64 fabs llvm-svn: 215747	2014-08-15 18:42:15 +00:00
Matt Arsenault	b2baffaffd	R600/SI: Fix offset folding in some cases with shifted pointers. Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) is only done if the add has one use. If the resulting constant add can be folded into an addressing mode, force this to happen for the pointer operand. This ends up happening a lot because of how LDS objects are allocated. Since the globals are allocated next to each other, acessing the first element of the second object is directly indexed by a shifted pointer. llvm-svn: 215739	2014-08-15 17:49:05 +00:00
Matt Arsenault	2e7cc48baa	R600/SI: Add intrinsic for ldexp llvm-svn: 215734	2014-08-15 17:30:25 +00:00
Matt Arsenault	5015a89aa5	R600/SI: Implement isLegalAddressingMode The default assumes that a 16-bit signed offset is used. LDS instruction use a 16-bit unsigned offset, so it wasn't being used in some cases where it was assumed a negative offset could be used. More should be done here, but first isLegalAddressingMode needs to gain an addressing mode argument. For now, copy most of the rest of the default implementation with the immediate offset change. llvm-svn: 215732	2014-08-15 17:17:07 +00:00
Matt Arsenault	74ef277774	R600: Correctly set the src value offset for scalarized kernel args This for some reason fixes v1i64 kernel arguments on pre-SI. This currently breaks some other cases in the kernel-args.ll test for R600, but I'm not particularly confident in the new output. VTX_READ_* are not used for some of the scalarized cases, and the code reading from the constant buffer doesn't make much sense to me. llvm-svn: 215564	2014-08-13 18:14:11 +00:00
Hal Finkel	415e344f29	Fix classof for ISD::INTRINSIC_W_CHAIN and INTRINSIC_VOID Unfortunately, our use of the SDNode class hierarchy for INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes is somewhat broken right now. These nodes sometimes are used for memory intrinsics (those with MachineMemOperands), and sometimes not. When not, the nodes are not created as instances of MemIntrinsicSDNode, but rather created as some other subclass of SDNode using DAG::getNode. When they are memory intrinsics, they are created using DAG::getMemIntrinsicNode as instances of MemIntrinsicSDNode. MemIntrinsicSDNode is a subclass of MemSDNode, but prior to r214452, we had a non-self-consistent setup whereby MemIntrinsicSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return true but MemSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return false. In r214452, MemSDNode::classof was changed to return true for INTRINSIC_W_CHAIN and INTRINSIC_VOID, which is now self-consistent. The problem is that neither the pre-r214452 logic and the post-r214452 logic are really right. The truth is that not all INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes are instances of MemIntrinsicSDNode (or MemSDNode for that matter), and the return value from classof needs to reflect that. This was broken before r214452 (because MemIntrinsicSDNode::classof always returned true), and was broken afterward (because MemSDNode::classof also always returned true), and will now be correct. The minimal solution is to grab one of the SubclassData bits (there is one left for MemIntrinsicSDNode nodes) and use it to store whether or not a particular INTRINSIC_W_CHAIN or INTRINSIC_VOID is really an instance of MemIntrinsicSDNode or not. Doing this allows both MemIntrinsicSDNode::classof and MemSDNode::classof to return the correct answer for the underlying object for both the memory-intrinsic and non-memory-intrinsic cases. This fixes the problem that r214452 created in the SelectionDAGDumper (thanks to Matt Arsenault for pointing it out). Because PowerPC does not implement getTgtMemIntrinsic, this change breaks test/CodeGen/PowerPC/unal-altivec-wint.ll. I've XFAILed it for now, and will fix it in a follow-up commit. llvm-svn: 215511	2014-08-13 01:15:37 +00:00
Jan Vesely	e5ca27d716	R600: Use optimized 24bit path in udivrem v2: drop enum keyword use correct extension mode don't bother computing the sign in unsinged case Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215462	2014-08-12 17:31:20 +00:00
Jan Vesely	4a33bc6206	R600: Use i24 optimized path for SREM v2: add tests rename LowerSDIV24 to LowerSDIVREM24 handle the rem part in this function Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215460	2014-08-12 17:31:17 +00:00
Tom Stellard	155bbb7713	R600/SI: Add a ComplexPattern for selecting MUBUF _OFFSET variant This saves us from having to copy a 64-bit 0 value into VGPRs for BUFFER_* instruction which only have a 12-bit immediate offset. llvm-svn: 215399	2014-08-11 22:18:17 +00:00
Tom Stellard	48194cdd81	R600/SI: Add check for low 32 bits of encoding to mubuf tests There are no variable values like registers encoded in the low 32 bits of MUBUF instructions, so it is relatively easy to check these bits, and it will help prevent us from introducing encoding bugs. llvm-svn: 215397	2014-08-11 22:18:11 +00:00
Tom Stellard	93ba12f163	R600/SI: Clear lds bit on MUBUF instructions used for private stores This bit was left uninitialized, which was causing some random failures of piglit tests. NOTE: This is a candidate for the 3.5 branch. llvm-svn: 215396	2014-08-11 22:18:09 +00:00
Tom Stellard	e04fd9d430	R600/SI: Fix broken test llvm-svn: 215395	2014-08-11 22:18:05 +00:00
Tom Stellard	c0503db9e2	R600/SI: Custom lower CONCAT_VECTORS This will lower them using register copies rather than loads and stores to the stack. llvm-svn: 215270	2014-08-09 01:06:56 +00:00
Tom Stellard	4f575f7aaf	R600/SI: Update concat_vectors.ll to check for scratch usage These tests were using SI-NOT: MOVREL to make sure concat vectors weren't being lowered to stack loads and stores, but we are using scratch buffers for the stack now instead of registers, so we need to add an additional SI-NOT check for scratch buffers. With this change I was able to uncover one broken test which will be fixed in a future commit. llvm-svn: 215269	2014-08-09 01:06:53 +00:00
Matt Arsenault	a6dc6c281c	R600: Cleanup fadd and fsub tests llvm-svn: 214991	2014-08-06 20:27:55 +00:00
Matt Arsenault	d5f4de27b6	R600: Increase nearby load scheduling threshold. This partially fixes weird looking load scheduling in memcpy test. The load clustering doesn't seem particularly smart, but this method seems to be partially deprecated so it might not be worth trying to fix. llvm-svn: 214943	2014-08-06 00:29:49 +00:00
Matt Arsenault	c10853f29f	R600/SI: Implement areLoadsFromSameBasePtr This currently has a noticable effect on the kernel argument loads. LDS and global loads are more problematic, I think because of how copies are currently inserted to ensure that the address is a VGPR. llvm-svn: 214942	2014-08-06 00:29:43 +00:00
Tom Stellard	229d5e669b	R600/SI: Update MUBUF assembly string to match AMD proprietary compiler llvm-svn: 214866	2014-08-05 14:48:12 +00:00
Tom Stellard	b37f797678	R600/SI: Avoid generating REGISTER_LOAD instructions. SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214865	2014-08-05 14:40:52 +00:00
Matt Arsenault	9215b17eb7	R600/SI: Fix extra whitespace in asm str This slipped in in r214467, so something like V_MOV_B32_e32 v0, ... is now printed with 2 spaces between the instruction name and first operand. llvm-svn: 214660	2014-08-03 05:27:14 +00:00
Matt Arsenault	4de324442b	R600: Cleanup fneg tests llvm-svn: 214612	2014-08-02 02:26:51 +00:00
Tom Stellard	4973a13680	Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp" This reverts commit r214566. I did not mean to commit this yet. llvm-svn: 214572	2014-08-01 21:55:50 +00:00
Tom Stellard	c16f73d7c5	R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214566	2014-08-01 21:50:47 +00:00
Matt Arsenault	06bd3933ba	R600: Cleanup test Remove -CHECKs, use multiple prefixes, name values, also test the @llvm.fabs version llvm-svn: 214525	2014-08-01 17:00:29 +00:00
Tom Stellard	b4a313a76f	R600/SI: Do abs/neg folding with ComplexPatterns Abs/neg folding has moved out of foldOperands and into the instruction selection phase using complex patterns. As a consequence of this change, we now prefer to select the 64-bit encoding for most instructions and the modifier operands have been dropped from integer VOP3 instructions. llvm-svn: 214467	2014-08-01 00:32:39 +00:00
Tom Stellard	6407e1e632	R600/SI: Fold immediates when shrinking instructions This will prevent us from using extra MOV instructions once we prefer selecting 64-bit instructions. llvm-svn: 214464	2014-08-01 00:32:33 +00:00
Tom Stellard	86d12ebdbd	R600/SI: Fix incorrect commute operation in shrink instructions pass We were commuting the instruction by still shrinking it using the original opcode. NOTE: This is a candidate for the 3.5 branch. llvm-svn: 214463	2014-08-01 00:32:28 +00:00
Jan Vesely	3047950964	R600: Modernize work item intrinsics test Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 214451	2014-07-31 22:11:03 +00:00
Matt Arsenault	2b252ecf6b	R600: Modernize test llvm-svn: 214108	2014-07-28 18:06:08 +00:00
Matt Arsenault	46645fa102	R600/SI: Implement getOptimalMemOpType The default guess uses i32. This needs an address space argument to really do the right thing in all cases. llvm-svn: 214104	2014-07-28 17:49:26 +00:00
Matt Arsenault	6f2a526101	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055	2014-07-27 17:46:40 +00:00
Matt Arsenault	24aa028cfa	R600/SI: Fix broken test. There was no check prefix for the instruction lines. Match what is emitted though, although I'm pretty sure it is incorrect. llvm-svn: 214035	2014-07-26 21:21:42 +00:00
Chandler Carruth	411fb407f8	[SDAG] When performing post-legalize DAG combining, run the legalizer over each node in the worklist prior to combining. This allows the combiner to produce new nodes which need to go back through legalization. This is particularly useful when generating operands to target specific nodes in a post-legalize DAG combine where the operands are significantly easier to express as pre-legalized operations. My immediate use case will be PSHUFB formation where we need to build a constant shuffle mask with a build_vector node. This also refactors the relevant functionality in the legalizer to support this, and updates relevant tests. I've spoken to the R600 folks and these changes look like improvements to them. The avx512 change needs to be investigated, I suspect there is a disagreement between the legalizer and the DAG combiner there, but it seems a minor issue so leaving it to be re-evaluated after this patch. Differential Revision: http://reviews.llvm.org/D4564 llvm-svn: 214020	2014-07-26 05:49:40 +00:00
Chandler Carruth	9f4530b95d	[SDAG] Introduce a combined set to the DAG combiner which tracks nodes which have successfully round-tripped through the combine phase, and use this to ensure all operands to DAG nodes are visited by the combiner, even if they are only added during the combine phase. This is critical to have the combiner reach nodes that are introduced during combining. Previously these would sometimes be visited and sometimes not be visited based on whether they happened to end up on the worklist or not. Now we always run them through the combiner. This fixes quite a few bad codegen test cases lurking in the suite while also being more principled. Among these, the TLS codegeneration is particularly exciting for programs that have this in the critical path like TSan-instrumented binaries (although I think they engineer to use a different TLS that is faster anyways). I've tried to check for compile-time regressions here by running llc over a merged (but not LTO-ed) clang bitcode file and observed at most a 3% slowdown in llc. Given that this is essentially a worst case (none of opt or clang are running at this phase) I think this is tolerable. The actual LTO case should be even less costly, and the cost in normal compilation should be negligible. With this combining logic, it is possible to re-legalize as we combine which is necessary to implement PSHUFB formation on x86 as a post-legalize DAG combine (my ultimate goal). Differential Revision: http://reviews.llvm.org/D4638 llvm-svn: 213898	2014-07-24 22:15:28 +00:00
Matt Arsenault	83592a2d32	R600: Add FMA instructions for Evergreen llvm-svn: 213882	2014-07-24 17:41:01 +00:00
Matt Arsenault	9acb978105	R600: Match rcp node on pre-SI llvm-svn: 213844	2014-07-24 06:59:24 +00:00
Matt Arsenault	0daeb63f03	R600: Fix LowerSDIV24 Use ComputeNumSignBits instead of checking for i8 / i16 which only worked when AMDIL was lying about having legal i8 / i16. If an integer is known to fit in 24-bits, we can do division faster with float ops. llvm-svn: 213843	2014-07-24 06:59:20 +00:00
Chandler Carruth	9a0051cd59	[SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions. The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist where we null-out removed entries and only add new entries to the worklist. My measurements of codegen time shows slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself I suspect). This change results in subtle, frustrating churn in the particular order in which DAG combines are applied which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist to directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better. A major change required to make this work is to significantly harden the way in which the DAG combiner handle nodes which become dead (zero-uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner. I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor. The consensus in review is to fix any regressions that show up after the fact here. Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit. Differential Revision: http://reviews.llvm.org/D4616 llvm-svn: 213727	2014-07-23 07:08:53 +00:00
Tom Stellard	1aaad6970c	R600/SI: Add instruction shrinking pass This pass converts 64-bit instructions to 32-bit when possible. llvm-svn: 213561	2014-07-21 16:55:33 +00:00
Tom Stellard	e812f2fdd8	R600/SI: Clean up some of the unused REGISTER_{LOAD,STORE} code There are a few more cleanups to do, but I ran into some problems with ext loads and trunc stores, when I tried to change some of the vector loads and stores from custom to legal, so I wasn't able to get rid of everything. llvm-svn: 213552	2014-07-21 15:45:06 +00:00
Tom Stellard	b02094e115	R600/SI: Use scratch memory for large private arrays llvm-svn: 213551	2014-07-21 15:45:01 +00:00
Tom Stellard	067c81567b	R600/SI: Store constant initializer data in constant memory This implements a solution for constant initializers suggested by Vadim Girlin, where we store the data after the shader code and then use the S_GETPC instruction to compute its address. This saves use the trouble of creating a new buffer for constant data and then having to pass the pointer to the kernel via user SGPRs or the input buffer. llvm-svn: 213530	2014-07-21 14:01:14 +00:00
Tom Stellard	54a3b65bb9	R600/SI: Use VALU for i1 XOR llvm-svn: 213528	2014-07-21 14:01:10 +00:00
Matt Arsenault	4100ebd67b	R600: Add missing test for concat_vectors llvm-svn: 213473	2014-07-20 07:13:17 +00:00
Matt Arsenault	e261b6e853	R600/SI: Remove dead code and add missing tests. This probably was killed by some generic DAGCombiner improvements in checking the TargetBooleanContents instead of just 1. llvm-svn: 213471	2014-07-20 06:11:02 +00:00
Matt Arsenault	ad14ce84b7	R600/SI: implement range reduction for sin/cos These instructions can only take a limited input range, and return the constant value 1 out of range. We should do range reduction to be able to process arbitrary values. Use a FRACT instruction after normalization to achieve this. Also add a test for constant folding with the lowered code with unsafe-fp-math enabled. v2: use DAG lowering instead of intrinsic, adapt test v3: calculate constant, fold pattern into instruction definition v4: misc style fixes, add sin-fold testcase, cosmetics Patch by Grigori Goronzy llvm-svn: 213458	2014-07-19 18:44:39 +00:00
Tim Northover	00fdbbbf60	R600: support fpext/fptrunc operations to and from f16. llvm-svn: 213376	2014-07-18 13:01:37 +00:00
Tim Northover	20bd0ced30	CodeGen: soften f16 type by default instead of marking legal. Actual support for softening f16 operations is still limited, and can be added when it's needed. But Soften is much closer to being a useful thing to try than keeping it Legal when no registers can actually hold such values. Longer term, we probably want something between Soften and Promote semantics for most targets, it'll be more efficient to promote the 4 basic operations to f32 than libcall them. llvm-svn: 213372	2014-07-18 12:41:46 +00:00

1 2 3 4 5 ...

668 Commits