llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	5885bef6cf	R600/SI: Don't move operands that are required to be SGPRs e.g. v_cndmask_b32 requires the condition operand be an SGPR. If one of the source operands were an SGPR, that would be considered the one SGPR use and the condition operand would be illegally moved. llvm-svn: 218529	2014-09-26 17:54:52 +00:00
Matt Arsenault	aff65fbca5	R600/SI: Fix using wrong operand indices when commuting No test since the current SIISelLowering::legalizeOperands effectively hides this, and the general uses seem to only fire on SALU instructions which don't have modifiers between the operands. When trying to use legalizeOperands immediately after instruction selection, it now sees a lot more patterns it did not see before which break on this. llvm-svn: 218527	2014-09-26 17:54:43 +00:00
Matt Arsenault	0c652c3fbc	R600: Avoid repeated check lines llvm-svn: 218487	2014-09-26 01:12:36 +00:00
Matt Arsenault	3a99759498	R600/SI: Fix emitting trailing whitespace after s_waitcnt llvm-svn: 218486	2014-09-26 01:09:46 +00:00
Matt Arsenault	42d1565844	R600: Fix some missing conversion testcases llvm-svn: 218474	2014-09-25 23:16:18 +00:00
Matt Arsenault	c16fafb24d	Remove duplicated RUN lines in middle of test llvm-svn: 218473	2014-09-25 23:16:14 +00:00
Tom Stellard	7980fc8562	R600/SI: Add support for global atomic add llvm-svn: 218457	2014-09-25 18:30:26 +00:00
Matt Arsenault	3e0effa223	R600/SI: Fix weird CHECK-DAG usage This prevents these from failing in a future commit. llvm-svn: 218356	2014-09-24 02:14:26 +00:00
Tom Stellard	744b99b476	R600/SI: Enable selecting SALU inside branches We can do this now that the FixSGPRLiveRanges pass is working. llvm-svn: 218353	2014-09-24 01:33:28 +00:00
Tom Stellard	9f73851e39	Revert "R600/SI: Add support for global atomic add" This reverts commit r218254. The global_atomics.ll test fails with asserts disabled. For some reason, the compiler fails to produce the atomic no return variants. llvm-svn: 218257	2014-09-22 16:44:04 +00:00
Tom Stellard	2355a77e74	R600/SI: Add support for global atomic add llvm-svn: 218254	2014-09-22 15:35:35 +00:00
Matt Arsenault	de0253791c	R600: Un-xfail a test which passes with pass disabled llvm-svn: 218165	2014-09-19 23:02:20 +00:00
Matt Arsenault	5e5b242946	R600/SI: Un-xfail tests which work now llvm-svn: 218164	2014-09-19 23:02:18 +00:00
Matt Arsenault	a986554377	R600/SI: Un-xfail a test that works now llvm-svn: 218162	2014-09-19 22:42:40 +00:00
Matt Arsenault	4505f3a73d	R600/SI: Fix test to prepare for scheduler llvm-svn: 218131	2014-09-19 18:11:16 +00:00
Matt Arsenault	46cbc4367b	R600: Better fix for bug 20982 Just do the left shift as unsigned to avoid the UB. llvm-svn: 218092	2014-09-19 00:42:06 +00:00
Matt Arsenault	6462f94884	R600: Bug 20982 - Avoid undefined left shift of negative value I'm not sure what the hardware actually does, so don't bother trying to fold it for now. llvm-svn: 218057	2014-09-18 15:52:26 +00:00
Alexey Samsonov	7bddb0a56a	Exclude known and bugzilled failures from UBSan bootstrap llvm-svn: 217979	2014-09-17 20:17:52 +00:00
Matt Arsenault	02dc26529e	R600/SI: Change formatting of printed FP immediates Only 1 decimal place should be printed for inline immediates. Other constants should be hex constants. Does not include f64 tests because folding those inline immediates currently does not work. llvm-svn: 217964	2014-09-17 17:32:13 +00:00
Matt Arsenault	49dd4283ed	R600/SI: Prefer selecting more e64 instruction forms. Add some more tests to make sure better operand choices are still made. Leave some cases that seem to have no reason to ever be e64 alone. llvm-svn: 217789	2014-09-15 17:15:02 +00:00
Matt Arsenault	0fd0a316ed	R600/SI: Make sure double vector fmul is tested llvm-svn: 217787	2014-09-15 17:04:54 +00:00
Matt Arsenault	72aafd0689	R600/SI: Add some mubuf testcases. I noticed some odd looking cases where addr64 wasn't set when storing to a pointer in an SGPR. This seems to be intentional, and partially tested already. The documentation seems to describe addr64 in terms of which registers addressing modifiers come from, but I would expect to always need addr64 when using 64-bit pointers. If no offset is applied, it makes sense to not need to worry about doing a 64-bit add for the final address. A small immediate offset can be applied, so is it OK to not have addr64 set if a carry is necessary when adding the base pointer in the resource to the offset? llvm-svn: 217785	2014-09-15 16:48:01 +00:00
Matt Arsenault	3f98140c87	R600/SI: Add preliminary support for flat address space llvm-svn: 217777	2014-09-15 15:41:53 +00:00
Matt Arsenault	f620a575bf	R600/SI: Fix broken check lines llvm-svn: 217736	2014-09-14 18:32:05 +00:00
Matt Arsenault	362f345bab	R600/SI: Fix off by 1 error in used register count The register numbers start at 0, so if only 1 register was used, this was reported as 0. llvm-svn: 217636	2014-09-11 22:51:37 +00:00
Matt Arsenault	8239eaab99	Add DAG combine for shl + add of constants. Do (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) This is already done for multiplies, but since multiplies by powers of two are turned into shifts, we also need to handle it here. This might want checks for isLegalAddImmediate to avoid transforming an add of a legal immediate with one that isn't. llvm-svn: 217610	2014-09-11 17:34:19 +00:00
Aaron Watry	3ffc560094	R600: Test local atomics for evergreen Now that the operations are all implemented, we can test this sub-arch here. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217595	2014-09-11 15:02:52 +00:00
Matt Arsenault	61a528adc7	R600/SI: Fix losing chain when fixing reg class of loads. The lost chain resulted in earlier side-effecting nodes being deleted. llvm-svn: 217561	2014-09-10 23:26:19 +00:00
Matt Arsenault	16e313343d	R600: Custom lower frem llvm-svn: 217553	2014-09-10 21:44:27 +00:00
Matt Arsenault	7ac9c4a074	R600/SI: Replace LDS atomics with no return versions llvm-svn: 217379	2014-09-08 15:07:31 +00:00
Matt Arsenault	7b46a59b5a	R600/SI: Relax a few tests to help enable scheduler llvm-svn: 217320	2014-09-06 20:44:41 +00:00
Matt Arsenault	a9fcf62a9c	R600/SI: Fix broken check lines. Fix missing check, and hardcoded register numbers. llvm-svn: 217318	2014-09-06 20:37:56 +00:00
Matt Arsenault	8ae5961065	R600/SI: Use same complex patterns for DS atomics This fixes hitting the same negative base offset problem that was already fixed for regular loads and stores. llvm-svn: 217256	2014-09-05 16:24:58 +00:00
Jan Vesely	d1d1334064	R600: Fix FROUND round halfway cases away from zero Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 217250	2014-09-05 14:26:54 +00:00
Tom Stellard	80942a1b50	R600/SI: Use S_ADD_U32 and S_SUB_U32 for low half of 64-bit operations https://bugs.freedesktop.org/show_bug.cgi?id=83416 llvm-svn: 217248	2014-09-05 14:07:59 +00:00
Matt Arsenault	869cd07158	R600/SI: Try to keep i32 mul on SALU Also fix bug this exposed where when legalizing an immediate operand, a v_mov_b32 would be created with a VSrc dest register. llvm-svn: 217108	2014-09-03 23:24:35 +00:00
Tom Stellard	102c68786c	R600/SI: Add a pattern for i64 and in a branch llvm-svn: 217041	2014-09-03 15:22:41 +00:00
Matt Arsenault	4c24d73709	R600/SI: Relax some ordering in tests. This will help with enabling misched llvm-svn: 216971	2014-09-02 21:45:50 +00:00
Matt Arsenault	b78875e979	R600/SI: Fix hardcoded register numbers in test llvm-svn: 216944	2014-09-02 20:43:07 +00:00
Matt Arsenault	d1649db2fc	R600/SI: Add failing testcase. This is broken when 64-bit add is only partially moved to the VALU. llvm-svn: 216933	2014-09-02 19:12:31 +00:00
Matt Arsenault	c1a71217b3	Fix interference caused by fmul 2, x -> fadd x, x If an fmul was introduced by lowering, it wouldn't be folded into a multiply by a constant since the earlier combine would have replaced the fmul with the fadd. llvm-svn: 216932	2014-09-02 19:02:53 +00:00
Matt Arsenault	8675db15da	R600/SI: Use mad for fsub + fmul We can use a negate source modifier to match this for fsub. llvm-svn: 216735	2014-08-29 16:01:14 +00:00
Tom Stellard	f3fc555e3b	R600/SI: Use READ2/WRITE2 instructions for 64-bit mem ops with 32-bit alignment llvm-svn: 216279	2014-08-22 18:49:35 +00:00
Tom Stellard	85e8b6d5f9	R600/SI: Use a ComplexPattern for DS loads and stores llvm-svn: 216278	2014-08-22 18:49:33 +00:00
Tom Stellard	745f2eddef	R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions llvm-svn: 216220	2014-08-21 20:41:00 +00:00
Tom Stellard	162a947160	R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function This fixes a crash in an ocl conformance test. llvm-svn: 216219	2014-08-21 20:40:58 +00:00
Matt Arsenault	fabf545299	R600/SI: Move all fabs / fneg handling to patterns llvm-svn: 215749	2014-08-15 18:42:22 +00:00
Matt Arsenault	13623d0e28	R600/SI: Use source modifiers for f64 fneg llvm-svn: 215748	2014-08-15 18:42:18 +00:00
Matt Arsenault	a147438e37	R600/SI: Use source modifier for f64 fabs llvm-svn: 215747	2014-08-15 18:42:15 +00:00
Matt Arsenault	b2baffaffd	R600/SI: Fix offset folding in some cases with shifted pointers. Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) is only done if the add has one use. If the resulting constant add can be folded into an addressing mode, force this to happen for the pointer operand. This ends up happening a lot because of how LDS objects are allocated. Since the globals are allocated next to each other, accessing the first element of the second object is directly indexed by a shifted pointer. llvm-svn: 215739	2014-08-15 17:49:05 +00:00
Matt Arsenault	2e7cc48baa	R600/SI: Add intrinsic for ldexp llvm-svn: 215734	2014-08-15 17:30:25 +00:00
Matt Arsenault	5015a89aa5	R600/SI: Implement isLegalAddressingMode The default assumes that a 16-bit signed offset is used. LDS instructions use a 16-bit unsigned offset, so it wasn't being used in some cases where it was assumed a negative offset could be used. More should be done here, but first isLegalAddressingMode needs to gain an addressing mode argument. For now, copy most of the rest of the default implementation with the immediate offset change. llvm-svn: 215732	2014-08-15 17:17:07 +00:00
Matt Arsenault	74ef277774	R600: Correctly set the src value offset for scalarized kernel args This for some reason fixes v1i64 kernel arguments on pre-SI. This currently breaks some other cases in the kernel-args.ll test for R600, but I'm not particularly confident in the new output. VTX_READ_* are not used for some of the scalarized cases, and the code reading from the constant buffer doesn't make much sense to me. llvm-svn: 215564	2014-08-13 18:14:11 +00:00
Hal Finkel	415e344f29	Fix classof for ISD::INTRINSIC_W_CHAIN and INTRINSIC_VOID Unfortunately, our use of the SDNode class hierarchy for INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes is somewhat broken right now. These nodes sometimes are used for memory intrinsics (those with MachineMemOperands), and sometimes not. When not, the nodes are not created as instances of MemIntrinsicSDNode, but rather created as some other subclass of SDNode using DAG::getNode. When they are memory intrinsics, they are created using DAG::getMemIntrinsicNode as instances of MemIntrinsicSDNode. MemIntrinsicSDNode is a subclass of MemSDNode, but prior to r214452, we had a non-self-consistent setup whereby MemIntrinsicSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return true but MemSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return false. In r214452, MemSDNode::classof was changed to return true for INTRINSIC_W_CHAIN and INTRINSIC_VOID, which is now self-consistent. The problem is that neither the pre-r214452 logic nor the post-r214452 logic is really right. The truth is that not all INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes are instances of MemIntrinsicSDNode (or MemSDNode for that matter), and the return value from classof needs to reflect that. This was broken before r214452 (because MemIntrinsicSDNode::classof always returned true), and was broken afterward (because MemSDNode::classof also always returned true), and will now be correct. The minimal solution is to grab one of the SubclassData bits (there is one left for MemIntrinsicSDNode nodes) and use it to store whether or not a particular INTRINSIC_W_CHAIN or INTRINSIC_VOID is really an instance of MemIntrinsicSDNode or not. Doing this allows both MemIntrinsicSDNode::classof and MemSDNode::classof to return the correct answer for the underlying object for both the memory-intrinsic and non-memory-intrinsic cases. This fixes the problem that r214452 created in the SelectionDAGDumper (thanks to Matt Arsenault for pointing it out). Because PowerPC does not implement getTgtMemIntrinsic, this change breaks test/CodeGen/PowerPC/unal-altivec-wint.ll. I've XFAILed it for now, and will fix it in a follow-up commit. llvm-svn: 215511	2014-08-13 01:15:37 +00:00
Jan Vesely	e5ca27d716	R600: Use optimized 24bit path in udivrem v2: drop enum keyword, use correct extension mode, don't bother computing the sign in unsigned case Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215462	2014-08-12 17:31:20 +00:00
Jan Vesely	4a33bc6206	R600: Use i24 optimized path for SREM v2: add tests rename LowerSDIV24 to LowerSDIVREM24 handle the rem part in this function Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215460	2014-08-12 17:31:17 +00:00
Tom Stellard	155bbb7713	R600/SI: Add a ComplexPattern for selecting MUBUF _OFFSET variant This saves us from having to copy a 64-bit 0 value into VGPRs for BUFFER_* instructions which only have a 12-bit immediate offset. llvm-svn: 215399	2014-08-11 22:18:17 +00:00
Tom Stellard	48194cdd81	R600/SI: Add check for low 32 bits of encoding to mubuf tests There are no variable values like registers encoded in the low 32 bits of MUBUF instructions, so it is relatively easy to check these bits, and it will help prevent us from introducing encoding bugs. llvm-svn: 215397	2014-08-11 22:18:11 +00:00
Tom Stellard	93ba12f163	R600/SI: Clear lds bit on MUBUF instructions used for private stores This bit was left uninitialized, which was causing some random failures of piglit tests. NOTE: This is a candidate for the 3.5 branch. llvm-svn: 215396	2014-08-11 22:18:09 +00:00
Tom Stellard	e04fd9d430	R600/SI: Fix broken test llvm-svn: 215395	2014-08-11 22:18:05 +00:00
Tom Stellard	c0503db9e2	R600/SI: Custom lower CONCAT_VECTORS This will lower them using register copies rather than loads and stores to the stack. llvm-svn: 215270	2014-08-09 01:06:56 +00:00
Tom Stellard	4f575f7aaf	R600/SI: Update concat_vectors.ll to check for scratch usage These tests were using SI-NOT: MOVREL to make sure concat vectors weren't being lowered to stack loads and stores, but we are using scratch buffers for the stack now instead of registers, so we need to add an additional SI-NOT check for scratch buffers. With this change I was able to uncover one broken test which will be fixed in a future commit. llvm-svn: 215269	2014-08-09 01:06:53 +00:00
Matt Arsenault	a6dc6c281c	R600: Cleanup fadd and fsub tests llvm-svn: 214991	2014-08-06 20:27:55 +00:00
Matt Arsenault	d5f4de27b6	R600: Increase nearby load scheduling threshold. This partially fixes weird looking load scheduling in memcpy test. The load clustering doesn't seem particularly smart, but this method seems to be partially deprecated so it might not be worth trying to fix. llvm-svn: 214943	2014-08-06 00:29:49 +00:00
Matt Arsenault	c10853f29f	R600/SI: Implement areLoadsFromSameBasePtr This currently has a noticeable effect on the kernel argument loads. LDS and global loads are more problematic, I think because of how copies are currently inserted to ensure that the address is a VGPR. llvm-svn: 214942	2014-08-06 00:29:43 +00:00
Tom Stellard	229d5e669b	R600/SI: Update MUBUF assembly string to match AMD proprietary compiler llvm-svn: 214866	2014-08-05 14:48:12 +00:00
Tom Stellard	b37f797678	R600/SI: Avoid generating REGISTER_LOAD instructions. SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214865	2014-08-05 14:40:52 +00:00
Matt Arsenault	9215b17eb7	R600/SI: Fix extra whitespace in asm str This slipped in in r214467, so something like V_MOV_B32_e32 v0, ... is now printed with 2 spaces between the instruction name and first operand. llvm-svn: 214660	2014-08-03 05:27:14 +00:00
Matt Arsenault	4de324442b	R600: Cleanup fneg tests llvm-svn: 214612	2014-08-02 02:26:51 +00:00
Tom Stellard	4973a13680	Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp" This reverts commit r214566. I did not mean to commit this yet. llvm-svn: 214572	2014-08-01 21:55:50 +00:00
Tom Stellard	c16f73d7c5	R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214566	2014-08-01 21:50:47 +00:00
Matt Arsenault	06bd3933ba	R600: Cleanup test Remove -CHECKs, use multiple prefixes, name values, also test the @llvm.fabs version llvm-svn: 214525	2014-08-01 17:00:29 +00:00
Tom Stellard	b4a313a76f	R600/SI: Do abs/neg folding with ComplexPatterns Abs/neg folding has moved out of foldOperands and into the instruction selection phase using complex patterns. As a consequence of this change, we now prefer to select the 64-bit encoding for most instructions and the modifier operands have been dropped from integer VOP3 instructions. llvm-svn: 214467	2014-08-01 00:32:39 +00:00
Tom Stellard	6407e1e632	R600/SI: Fold immediates when shrinking instructions This will prevent us from using extra MOV instructions once we prefer selecting 64-bit instructions. llvm-svn: 214464	2014-08-01 00:32:33 +00:00
Tom Stellard	86d12ebdbd	R600/SI: Fix incorrect commute operation in shrink instructions pass We were commuting the instruction but still shrinking it using the original opcode. NOTE: This is a candidate for the 3.5 branch. llvm-svn: 214463	2014-08-01 00:32:28 +00:00
Jan Vesely	3047950964	R600: Modernize work item intrinsics test Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 214451	2014-07-31 22:11:03 +00:00
Matt Arsenault	2b252ecf6b	R600: Modernize test llvm-svn: 214108	2014-07-28 18:06:08 +00:00
Matt Arsenault	46645fa102	R600/SI: Implement getOptimalMemOpType The default guess uses i32. This needs an address space argument to really do the right thing in all cases. llvm-svn: 214104	2014-07-28 17:49:26 +00:00
Matt Arsenault	6f2a526101	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055	2014-07-27 17:46:40 +00:00
Matt Arsenault	24aa028cfa	R600/SI: Fix broken test. There was no check prefix for the instruction lines. Match what is emitted though, although I'm pretty sure it is incorrect. llvm-svn: 214035	2014-07-26 21:21:42 +00:00
Chandler Carruth	411fb407f8	[SDAG] When performing post-legalize DAG combining, run the legalizer over each node in the worklist prior to combining. This allows the combiner to produce new nodes which need to go back through legalization. This is particularly useful when generating operands to target specific nodes in a post-legalize DAG combine where the operands are significantly easier to express as pre-legalized operations. My immediate use case will be PSHUFB formation where we need to build a constant shuffle mask with a build_vector node. This also refactors the relevant functionality in the legalizer to support this, and updates relevant tests. I've spoken to the R600 folks and these changes look like improvements to them. The avx512 change needs to be investigated, I suspect there is a disagreement between the legalizer and the DAG combiner there, but it seems a minor issue so leaving it to be re-evaluated after this patch. Differential Revision: http://reviews.llvm.org/D4564 llvm-svn: 214020	2014-07-26 05:49:40 +00:00
Chandler Carruth	9f4530b95d	[SDAG] Introduce a combined set to the DAG combiner which tracks nodes which have successfully round-tripped through the combine phase, and use this to ensure all operands to DAG nodes are visited by the combiner, even if they are only added during the combine phase. This is critical to have the combiner reach nodes that are introduced during combining. Previously these would sometimes be visited and sometimes not be visited based on whether they happened to end up on the worklist or not. Now we always run them through the combiner. This fixes quite a few bad codegen test cases lurking in the suite while also being more principled. Among these, the TLS codegeneration is particularly exciting for programs that have this in the critical path like TSan-instrumented binaries (although I think they engineer to use a different TLS that is faster anyways). I've tried to check for compile-time regressions here by running llc over a merged (but not LTO-ed) clang bitcode file and observed at most a 3% slowdown in llc. Given that this is essentially a worst case (none of opt or clang are running at this phase) I think this is tolerable. The actual LTO case should be even less costly, and the cost in normal compilation should be negligible. With this combining logic, it is possible to re-legalize as we combine which is necessary to implement PSHUFB formation on x86 as a post-legalize DAG combine (my ultimate goal). Differential Revision: http://reviews.llvm.org/D4638 llvm-svn: 213898	2014-07-24 22:15:28 +00:00
Matt Arsenault	83592a2d32	R600: Add FMA instructions for Evergreen llvm-svn: 213882	2014-07-24 17:41:01 +00:00
Matt Arsenault	9acb978105	R600: Match rcp node on pre-SI llvm-svn: 213844	2014-07-24 06:59:24 +00:00
Matt Arsenault	0daeb63f03	R600: Fix LowerSDIV24 Use ComputeNumSignBits instead of checking for i8 / i16 which only worked when AMDIL was lying about having legal i8 / i16. If an integer is known to fit in 24-bits, we can do division faster with float ops. llvm-svn: 213843	2014-07-24 06:59:20 +00:00
Chandler Carruth	9a0051cd59	[SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions. The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist where we null-out removed entries and only add new entries to the worklist. My measurements of codegen time shows slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself I suspect). This change results in subtle, frustrating churn in the particular order in which DAG combines are applied which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better. A major change required to make this work is to significantly harden the way in which the DAG combiner handles nodes which become dead (zero-uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner. I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor. The consensus in review is to fix any regressions that show up after the fact here. Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit. Differential Revision: http://reviews.llvm.org/D4616 llvm-svn: 213727	2014-07-23 07:08:53 +00:00
Tom Stellard	1aaad6970c	R600/SI: Add instruction shrinking pass This pass converts 64-bit instructions to 32-bit when possible. llvm-svn: 213561	2014-07-21 16:55:33 +00:00
Tom Stellard	e812f2fdd8	R600/SI: Clean up some of the unused REGISTER_{LOAD,STORE} code There are a few more cleanups to do, but I ran into some problems with ext loads and trunc stores, when I tried to change some of the vector loads and stores from custom to legal, so I wasn't able to get rid of everything. llvm-svn: 213552	2014-07-21 15:45:06 +00:00
Tom Stellard	b02094e115	R600/SI: Use scratch memory for large private arrays llvm-svn: 213551	2014-07-21 15:45:01 +00:00
Tom Stellard	067c81567b	R600/SI: Store constant initializer data in constant memory This implements a solution for constant initializers suggested by Vadim Girlin, where we store the data after the shader code and then use the S_GETPC instruction to compute its address. This saves us the trouble of creating a new buffer for constant data and then having to pass the pointer to the kernel via user SGPRs or the input buffer. llvm-svn: 213530	2014-07-21 14:01:14 +00:00
Tom Stellard	54a3b65bb9	R600/SI: Use VALU for i1 XOR llvm-svn: 213528	2014-07-21 14:01:10 +00:00
Matt Arsenault	4100ebd67b	R600: Add missing test for concat_vectors llvm-svn: 213473	2014-07-20 07:13:17 +00:00
Matt Arsenault	e261b6e853	R600/SI: Remove dead code and add missing tests. This probably was killed by some generic DAGCombiner improvements in checking the TargetBooleanContents instead of just 1. llvm-svn: 213471	2014-07-20 06:11:02 +00:00
Matt Arsenault	ad14ce84b7	R600/SI: implement range reduction for sin/cos These instructions can only take a limited input range, and return the constant value 1 out of range. We should do range reduction to be able to process arbitrary values. Use a FRACT instruction after normalization to achieve this. Also add a test for constant folding with the lowered code with unsafe-fp-math enabled. v2: use DAG lowering instead of intrinsic, adapt test v3: calculate constant, fold pattern into instruction definition v4: misc style fixes, add sin-fold testcase, cosmetics Patch by Grigori Goronzy llvm-svn: 213458	2014-07-19 18:44:39 +00:00
Tim Northover	00fdbbbf60	R600: support fpext/fptrunc operations to and from f16. llvm-svn: 213376	2014-07-18 13:01:37 +00:00
Tim Northover	20bd0ced30	CodeGen: soften f16 type by default instead of marking legal. Actual support for softening f16 operations is still limited, and can be added when it's needed. But Soften is much closer to being a useful thing to try than keeping it Legal when no registers can actually hold such values. Longer term, we probably want something between Soften and Promote semantics for most targets, it'll be more efficient to promote the 4 basic operations to f32 than libcall them. llvm-svn: 213372	2014-07-18 12:41:46 +00:00
Tim Northover	12817862f1	R600: rename misleading fp16 test. This test is actually going in the opposite direction to what the filename and function name suggested. llvm-svn: 213358	2014-07-18 08:43:30 +00:00
Tim Northover	f861de3d7b	R600: support f16 -> f64 conversion intrinsic. Unfortunately, we don't seem to have a direct truncation, but the extension can be legally split into two operations so we should support that. llvm-svn: 213357	2014-07-18 08:43:24 +00:00
Tim Northover	fd7e424935	CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248	2014-07-17 10:51:23 +00:00
Matt Arsenault	22ca3f8860	R600/SI: Allow using f32 rcp / rsq when denormals not handled. These are precise enough to use for OpenCL unless denormals are handled. llvm-svn: 213107	2014-07-15 23:50:10 +00:00
Matt Arsenault	0d89e849bd	R600/SI: Fix select on i1 llvm-svn: 213096	2014-07-15 21:44:37 +00:00
Matt Arsenault	e9fa3b8e6b	R600/SI: Implement less wrong f32 fdiv Assuming single precision denormals and accurate sqrt/div are not reported, this passes the OpenCL conformance test. llvm-svn: 213089	2014-07-15 20:18:31 +00:00
Jan Vesely	6ddb8dd442	R600: Implement zero undef variants of ctlz/cttz v2: use ffbh/l if available v3: Rebase on top of Matt's SI patches Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 213072	2014-07-15 15:51:09 +00:00
Matt Arsenault	ca3976f7ae	R600: Add dag combine for copy of an illegal type. This helps avoid redundant instructions to unpack, and repack the vectors. Ideally we could recognize that pattern and eliminate it. Currently v4i8 and other small element type vectors are scalarized, so this has the added bonus of avoiding that. llvm-svn: 213031	2014-07-15 02:06:31 +00:00
Matt Arsenault	f171cf23b8	R600: Add denormal handling subtarget features. llvm-svn: 213018	2014-07-14 23:40:49 +00:00
Matt Arsenault	c6ae7b4763	R600/SI: Default to no single precision denormals. llvm-svn: 213017	2014-07-14 23:40:43 +00:00
Matt Arsenault	7d5e2cb09f	R600: Run more tests with promote alloca disabled. Re-run tests changed in r211110 to test both paths. Also fix broken check line. llvm-svn: 212895	2014-07-13 02:46:17 +00:00
Matt Arsenault	d0b6f3e173	R600: Run private-memory test with and without alloca promote The unpromoted path still needs to be tested since we can't always promote to using LDS. llvm-svn: 212894	2014-07-13 02:18:06 +00:00
Matt Arsenault	6026858158	R600: Add missing tests for some intrinsics llvm-svn: 212870	2014-07-12 00:36:19 +00:00
Marek Olsak	eac5062cc0	R600/SI: Use i32 vectors for resources and samplers This affects new intrinsics only. What surprises me is that v32i8 still works. llvm-svn: 212831	2014-07-11 17:11:52 +00:00
Marek Olsak	d8ecaeec02	R600/SI: add sample and image intrinsics exposing all instruction fields We need the intrinsics with offsets, so why not just add them all. The R128 parameter will also be useful for reducing SGPR usage. GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent", so Mesa will probably translate those to slc, glc, etc. When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics. llvm-svn: 212830	2014-07-11 17:11:46 +00:00
Jan Vesely	2cb62ce2a0	R600: Implement float to long/ulong Use alg. from LegalizeDAG.cpp Move Expand setting to SIISelLowering v2: Extend existing tests instead of creating new ones v3: use separate LowerFPTOSINT function v4: use TargetLowering::expandFP_TO_SINT add comment about using FP_TO_SINT for uints Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 212773	2014-07-10 22:40:21 +00:00
Matt Arsenault	3332b70627	Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine."" Don't try to convert the select condition type. llvm-svn: 212750	2014-07-10 18:21:04 +00:00
NAKAMURA Takumi	f862ce8908	Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine." This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations. llvm-svn: 212708	2014-07-10 11:37:28 +00:00
Matt Arsenault	b0df92577d	R600/SI: Add support for llvm.convert.{to|from}.fp16 llvm-svn: 212676	2014-07-10 03:22:20 +00:00
Matt Arsenault	658c5576d1	Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine. Do this if the truncate is free and the select is legal. llvm-svn: 212640	2014-07-09 19:12:07 +00:00
Matt Arsenault	d2c9e08b63	R600: Fix mishandling of load / store chains. Fixes various bugs with reordering loads and stores. Scalarized vector loads weren't collecting the chains at all. llvm-svn: 212473	2014-07-07 18:34:45 +00:00
Tom Stellard	10ae6a0e6a	R600: Promote i64 loads to v2i32 llvm-svn: 212216	2014-07-02 20:53:54 +00:00
Matt Arsenault	018e91f808	Revert "Temporary hack to try cleaning extra .s file from bots." llvm-svn: 211967	2014-06-27 23:11:26 +00:00
Matt Arsenault	c9c44d682c	Temporary hack to try cleaning extra .s file from bots. llvm-svn: 211963	2014-06-27 21:43:50 +00:00
David Blaikie	6a21e14d53	Fix test so it doesn't try to write out temporary files into the test tree. llvm-svn: 211916	2014-06-27 17:45:43 +00:00
Matt Arsenault	642d2e78b3	R600: Don't crash on unhandled instruction in promote alloca llvm-svn: 211906	2014-06-27 16:52:49 +00:00
Matt Arsenault	6995dd90c0	R600: Add some testcases for promote alloca pass. More complicated GEPs are skipped. Add some tests to actually stress this skipping. llvm-svn: 211859	2014-06-27 03:55:55 +00:00
Matt Arsenault	0989d51520	R600/SI: Add FP mode bits to binary. The default rounding mode to initialize the mode register needs to be reported to the runtime. Fill in other bits a kernel may be interested in setting for future use. llvm-svn: 211791	2014-06-26 17:22:30 +00:00
Matt Arsenault	c6f8fdb4e5	R600: Fix vector FMA llvm-svn: 211757	2014-06-26 01:28:05 +00:00
Tom Stellard	9b3816b5ee	R600: Promote i64 stores to v2i32 Now we need only one 64-bit pattern for stores. llvm-svn: 211643	2014-06-24 23:33:04 +00:00
Matt Arsenault	257d48d22c	R600: Fix inconsistency in rsq instructions. R600 was using a clamped version of rsq, but SI was not. Add a new rsq_clamped intrinsic and use them consistently. It's unclear to me from the documentation what behavior the R600 instructions have, so I assume they have the legacy behavior described by the SI documents. For R600, use RECIPSQRT_IEEE for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also has RECIPSQRT_FF, which I'm not sure how it fits in here. llvm-svn: 211637	2014-06-24 22:13:39 +00:00
Matt Arsenault	f2b0aebb8a	R600/SI: Fix div_scale intrinsic. The operand that must match one of the others does matter, and implement selecting for it. llvm-svn: 211523	2014-06-23 18:28:28 +00:00
Matt Arsenault	c4d3d3a16e	R600: Move add/sub with overflow out of AMDILISelLowering Add more tests for these. llvm-svn: 211517	2014-06-23 18:00:49 +00:00
Matt Arsenault	b8b5153935	R600/SI: Handle i64 sub. We can handle it the same way as add llvm-svn: 211514	2014-06-23 18:00:38 +00:00
Jan Vesely	b32714054a	R600: Add udivrem test v2: move < %s to the end of the line space after ; add v4i32 test Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 211476	2014-06-22 21:42:58 +00:00
Tom Stellard	ae4c9e7bc3	R600/SI: Add patterns for ctpop inside a branch llvm-svn: 211378	2014-06-20 17:06:11 +00:00
Tom Stellard	9c603ebca4	R600/SI: Add a pattern for f32 ftrunc llvm-svn: 211377	2014-06-20 17:06:09 +00:00
Tom Stellard	a79e9f0f6d	R600: Expand vector flog2 llvm-svn: 211376	2014-06-20 17:06:07 +00:00
Tom Stellard	5222a88653	R600: Expand vector fexp2 llvm-svn: 211375	2014-06-20 17:06:05 +00:00
Tom Stellard	c9dedb8e29	R600/SI: Add a VALU pattern for i64 xor llvm-svn: 211373	2014-06-20 17:05:57 +00:00
Matt Arsenault	8e34ecb797	R600: Add a few tests I forgot to add. These belong with r210827 llvm-svn: 211253	2014-06-19 04:24:43 +00:00
Matt Arsenault	a0050b0961	R600/SI: Add intrinsics for various math instructions. These will be used for custom lowering and for library implementations of various math functions, so it's useful to expose these as builtins. llvm-svn: 211247	2014-06-19 01:19:19 +00:00
Matt Arsenault	692bd5ec2f	R600: Handle fnearbyint The difference from rint isn't really relevant here, so treat them as equivalent. OpenCL doesn't have nearbyint, so this is sort of pointless other than for completeness. llvm-svn: 211229	2014-06-18 22:03:45 +00:00
Marek Olsak	51b8e7b2e7	R600/SI: add gather4 and getlod intrinsics (v3) This contains all the previous patches + getlod support on top of it. It doesn't use SDNodes anymore, so it's quite small. It also adds v16i8 to SReg_128, which is used for the sampler descriptor. Reviewed-by: Tom Stellard llvm-svn: 211228	2014-06-18 22:00:29 +00:00
Jan Vesely	85f0dbce5c	R600: Expand vector fceil Move fp64 fceil tests to fceil64.ll v2: rebase Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 211194	2014-06-18 17:57:29 +00:00
Matt Arsenault	43160e7af2	R600/SI: Add intrinsics for brev instructions llvm-svn: 211187	2014-06-18 17:13:57 +00:00
Matt Arsenault	dbc9aae1fb	R600/SI: Prettier operand printing for 64-bit ops. Copy what is done for 32-bit already so the order is about the same. llvm-svn: 211186	2014-06-18 17:13:51 +00:00
Matt Arsenault	4601093267	R600: Implement f64 ftrunc, ffloor and fceil. CI has instructions for these, so this fixes them for older hardware. llvm-svn: 211183	2014-06-18 17:05:30 +00:00
Matt Arsenault	e8208ec95b	R600: Custom lower f64 frint for pre-CI llvm-svn: 211182	2014-06-18 17:05:26 +00:00
Jan Vesely	ecf5133a2b	R600: Implement 64bit SRA v2: Use capitalized variable name Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 211159	2014-06-18 12:27:17 +00:00
Jan Vesely	900ff2e74b	R600: Implement 64bit SRL v2: use C++ style comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 211158	2014-06-18 12:27:15 +00:00
Jan Vesely	25f362766e	R600: Implement 64bit SHL v2: Use c++ style comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 211157	2014-06-18 12:27:13 +00:00
Matt Arsenault	295b86e81d	R600/SI: Match cttz_zero_undef llvm-svn: 211116	2014-06-17 17:36:27 +00:00
Matt Arsenault	8579601050	R600/SI: Match ctlz_zero_undef llvm-svn: 211115	2014-06-17 17:36:24 +00:00

714 Commits