llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	94812216ef	R600/SI: Use S_BFE_I64 for 64-bit sext_inreg llvm-svn: 222012	2014-11-14 18:18:16 +00:00
Matt Arsenault	da59f3de45	R600/SI: Fix fmin_legacy / fmax_legacy matching for SI select_cc is expanded on SI, so this was never matched. llvm-svn: 221941	2014-11-13 23:03:09 +00:00
Matt Arsenault	7784992999	R600/SI: Use s_movk_i32 llvm-svn: 221922	2014-11-13 20:44:23 +00:00
Matt Arsenault	6ef66144f3	R600: Fix assert on empty function If a function is just an unreachable, this would hit a "this is not a MachO target" assertion because of setting HasSubsectionViaSymbols. llvm-svn: 221920	2014-11-13 20:07:40 +00:00
Matt Arsenault	cc8d3b8774	R600: Error on initializer for LDS. Also give a proper error for other address spaces. llvm-svn: 221917	2014-11-13 19:56:13 +00:00
Matt Arsenault	1cffa4c191	R600/SI: Get rid of FCLAMP_SI pseudo It's not necessary. Also use complex patterns to allow src modifier usage. llvm-svn: 221916	2014-11-13 19:49:04 +00:00
Matt Arsenault	581a7a6933	R600/SI: Allow commuting with src2_modifiers llvm-svn: 221911	2014-11-13 19:26:50 +00:00
Matt Arsenault	95e48668b6	R600/SI: Allow commuting some 3 op instructions e.g. v_mad_f32 a, b, c -> v_mad_f32 b, a, c This simplifies matching v_madmk_f32. This looks somewhat surprising, but it appears to be OK to do this. We can commute src0 and src1 in all of these instructions, and that's all that appears to matter. llvm-svn: 221910	2014-11-13 19:26:47 +00:00
Matt Arsenault	afbf21f15c	R600/SI: Fix broken check prefixes in test llvm-svn: 221565	2014-11-08 00:02:57 +00:00
Matt Arsenault	b6e51ff1e7	R600/SI: Add testcase I forgot to commit from months ago llvm-svn: 221384	2014-11-05 19:01:22 +00:00
Tom Stellard	326d6ece94	R600/SI: Change all instruction assembly names to lowercase. This matches the format produced by the AMD proprietary driver. //==================================================================// // Shell script for converting .ll test cases: (Pass the .ll files you want to convert to this script as arguments). //==================================================================// ; This was necessary on my system so that A-Z in sed would match only ; upper case. I'm not sure why. export LC_ALL='C' TEST_FILES="$" MATCHES=`grep -v Patterns SIInstructions.td | grep -o '"[A-Z0-9_]\+["e]' | grep -o '[A-Z0-9_]\+' | sort -r` for f in $TEST_FILES; do # Check that there are SI tests: grep -q -e 'verde' -e 'bonaire' -e 'SI' -e 'tahiti' $f if [ $? -eq 0 ]; then for match in $MATCHES; do sed -i -e "s/\([ :]$match\)/\L\1/" $f done # Try to get check lines with partial instruction names sed -i 's/\(;[ ]SI[A-Z\\-]: \)\([A-Z_0-9]\+\)/\1\L\2/' $f fi done sed -i -e 's/bb0_1/BB0_1/g' ../../../test/CodeGen/R600/infinite-loop.ll sed -i -e 's/SI-NOT: bfe/SI-NOT: {{[^@]}}bfe/g'../../../test/CodeGen/R600/llvm.AMDGPU.bfe.32.ll ../../../test/CodeGen/R600/sext-in-reg.ll sed -i -e 's/exp_IEEE/EXP_IEEE/g' ../../../test/CodeGen/R600/llvm.exp2.ll sed -i -e 's/numVgprs/NumVgprs/g' ../../../test/CodeGen/R600/register-count-comments.ll sed -i 's/\(; CHECK[-NOT]*: \)\([A-Z_0-9]\+\)/\1\L\2/' ../../../test/CodeGen/R600/select64.ll ../../../test/CodeGen/R600/sgpr-copy.ll //==================================================================// // Shell script for converting .td files (run this last) //==================================================================// export LC_ALL='C' sed -i -e '/Patterns/!s/\("[A-Z0-9_]\+[ "e]\)/\L\1/g' SIInstructions.td sed -i -e 's/"EXP/"exp/g' SIInstrInfo.td llvm-svn: 221350	2014-11-05 14:50:53 +00:00
Tom Stellard	bd59920616	R600/SI: Add an extra check line to make test more strict llvm-svn: 221349	2014-11-05 14:50:34 +00:00
Tom Stellard	5cbb53c41e	Reapply: R600: Make sure to inline all internal functions Function calls aren't supported yet. This was reverted due to build breakages, which should be fixed now. llvm-svn: 221173	2014-11-03 19:49:05 +00:00
Reid Kleckner	9abe268adb	Revert "R600: Make sure to inline all internal functions" This reverts commit r220996. It introduced layering violations causing link errors in many configurations. llvm-svn: 221020	2014-10-31 23:35:26 +00:00
Tom Stellard	5b2927fe83	R600: Don't promote allocas when one of the users is a ptrtoint instruction We need to figure out how to track ptrtoint values all the way until result is converted back to a pointer in order to correctly rewrite the pointer type. llvm-svn: 220997	2014-10-31 20:52:04 +00:00
Tom Stellard	aa73831757	R600: Make sure to inline all internal functions Function calls aren't supported yet. llvm-svn: 220996	2014-10-31 20:52:02 +00:00
Matt Arsenault	0cf39569bf	R600/SI: Add another failing testcase for i1 copies It's not handling phis. llvm-svn: 220371	2014-10-22 05:30:42 +00:00
Matt Arsenault	59102d38fb	R600/SI: Add failing testcase reduced from OpenCV This fails the verifier with: "Expected a VCSrc_32 register, but got a VReg_1 register" llvm-svn: 220368	2014-10-22 04:26:10 +00:00
Matt Arsenault	7c93690be0	Add minnum / maxnum codegen llvm-svn: 220342	2014-10-21 23:01:01 +00:00
Matt Arsenault	75c658e2cc	R600/SI: Add missing parameter to div_fmas intrinsic llvm-svn: 220338	2014-10-21 22:20:55 +00:00
Matt Arsenault	8c4fb7cae0	R600: Use default GlobalDirective The overridden one wasn't inserting a space, so you would end up with .globalfoo llvm-svn: 220329	2014-10-21 21:08:36 +00:00
Matt Arsenault	e306a32325	R600/SI: Add pattern for bswap llvm-svn: 220304	2014-10-21 16:25:08 +00:00
Aaron Watry	8114437a8f	R600/SI: Add global atomicrmw xchg v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220110	2014-10-17 23:33:03 +00:00
Aaron Watry	d672ee2a47	R600/SI: Add global atomicrmw xor v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220109	2014-10-17 23:33:01 +00:00
Aaron Watry	8a911e6926	R600/SI: Add global atomicrmw or v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220108	2014-10-17 23:32:59 +00:00
Aaron Watry	58c9992f15	R600/SI: Add global atomicrmw min/umin v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220107	2014-10-17 23:32:57 +00:00
Aaron Watry	29f295d7a5	R600/SI: Add global atomicrmw max/umax v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220106	2014-10-17 23:32:56 +00:00
Aaron Watry	621278034c	R600/SI: Add global atomicrmw and v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220105	2014-10-17 23:32:54 +00:00
Aaron Watry	328f1bae8e	R600/SI: Add global atomicrmw sub v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220104	2014-10-17 23:32:52 +00:00
Aaron Watry	28682cf205	R600/SI: Fix/add tests for atomicrmw add The previous tests claimed to test constant offsets in the function name, but the tests weren't actually testing them. Clone the tests, and do testing of all combinations of the following: 1) with/without constant pointer offset 2) 32/64-bit addressing modes 3) Usage and non-usage of the return value from the atomicrmw Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220103	2014-10-17 23:32:50 +00:00
Aaron Watry	1d13d36520	R600: Rename atomic_load global tests to atomic_add The function name now matches what it's actually testing. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220102	2014-10-17 23:32:49 +00:00
Matt Arsenault	d282ada508	R600/SI: Allow commuting with source modifiers llvm-svn: 220066	2014-10-17 18:00:48 +00:00
Matt Arsenault	6d3cd544bb	R600/SI: Allow commuting fp immediates llvm-svn: 220062	2014-10-17 18:00:39 +00:00
Matt Arsenault	83a535ff6b	R600/SI: Remove SI_BUFFER_RSRC pseudo Just use REG_SEQUENCE directly, so there are fewer instructions to need to deal with later. llvm-svn: 220056	2014-10-17 17:42:56 +00:00
Jan Vesely	b535c902e6	R600: Add EG to FMA test Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220045	2014-10-17 14:45:27 +00:00
Jan Vesely	af62cf4db0	SelectionDAG: Add sext_inreg optimizations v2: use dyn_cast fixup comments v3: use cast Reviewed-by: Matt Arsenault <arsenm2@gmail.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220044	2014-10-17 14:45:25 +00:00
Matt Arsenault	a3fe7c62d1	R600: Fix nonsensical implementation of computeKnownBits for BFE This was resulting in invalid simplifications of sdiv llvm-svn: 219953	2014-10-16 20:07:40 +00:00
Tom Stellard	c8d7920ad9	R600/SI: Fix bug where immediates were being used in DS addr operands The SelectDS1Addr1Offset complex pattern always tries to store constant lds pointers in the offset operand and store a zero value in the addr operand. Since the addr operand does not accept immediates, the zero value needs to first be copied to a register. This newly created zero value will not go through normal instruction selection, so we need to manually insert a V_MOV_B32_e32 in the complex pattern. This bug was hidden by the fact that if there was another zero value in the DAG that had not been selected yet, then the CSE done by the DAG would use the unselected node for the addr operand rather than the one that was just created. This would lead to the zero value being selected and the DAG automatically inserting a V_MOV_B32_e32 instruction. llvm-svn: 219848	2014-10-15 21:08:59 +00:00
Matt Arsenault	1a74aff846	R600/SI: Also try to use 0 base for misaligned 8-byte DS loads. llvm-svn: 219823	2014-10-15 18:06:43 +00:00
Matt Arsenault	7b68fdf3c0	R600: Fix miscompiles when BFE has multiple uses SimplifyDemandedBits would break the other uses of the operand. llvm-svn: 219819	2014-10-15 17:58:34 +00:00
Jan Vesely	e5121f3c10	Reapply "R600: Add new intrinsic to read work dimensions" This effectively reverts revert 219707. After fixing the test to work with new function name format and renamed intrinsic. Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219710	2014-10-14 20:05:26 +00:00
Rafael Espindola	db3f0a24ec	Revert "R600: Add new intrinsic to read work dimensions" This reverts commit r219705. CodeGen/R600/work-item-intrinsics.ll was failing on linux. llvm-svn: 219707	2014-10-14 18:58:04 +00:00
Jan Vesely	86187d231a	R600: Add new intrinsic to read work dimensions v2: Add SI lowering Add test v3: Place work dimensions after the kernel arguments. v4: Calculate offset while lowering arguments v5: rebase v6: change prefix to AMDGPU Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219705	2014-10-14 18:52:07 +00:00
Matt Arsenault	e775f5fe76	R600/SI: Use DS offsets for constant addresses Use 0 as the base address for a constant address, so if we have a constant address we can save moves and form read2/write2s. llvm-svn: 219698	2014-10-14 17:21:19 +00:00
NAKAMURA Takumi	6fd86f1678	llvm/test/CodeGen: Some tests don't REQUIRE asserts any more. Remove them. llvm-svn: 219581	2014-10-12 06:47:47 +00:00
Matt Arsenault	61cc9083d0	R600/SI: Change how DS offsets are printed Match SC by using offset/offset0/offset1 and printing in decimal. llvm-svn: 219537	2014-10-10 22:16:07 +00:00
Matt Arsenault	fe0a2e677b	R600/SI: Match read2/write2 stride 64 versions llvm-svn: 219536	2014-10-10 22:12:32 +00:00
Matt Arsenault	410332860d	R600/SI: Add load / store machine optimizer pass. Currently this only functions to match simple cases where ds_read2_* / ds_write2_* instructions can be used. In the future it might match some of the other weird load patterns, such as direct to LDS loads. Currently enabled only with a subtarget feature to enable easier testing. llvm-svn: 219533	2014-10-10 22:01:59 +00:00
Tom Stellard	3457a8495a	R600/SI: Legalize CopyToReg during instruction selection The instruction emitter will crash if it encounters a CopyToReg node with a non-register operand like FrameIndex. llvm-svn: 219428	2014-10-09 19:06:00 +00:00
Tom Stellard	8dd392e135	R600/SI: Legalize INSERT_SUBREG instructions during PostISelFolding LLVM assumes INSERT_SUBREG will always have register operands, so we need to legalize non-register operands, like FrameIndexes, to avoid random assertion failures. llvm-svn: 219420	2014-10-09 18:09:15 +00:00
Tom Stellard	20fa0be97f	R600/SI: Remove assertion in SIInstrInfo::areLoadsFromSameBasePtr() Added a FIXME comment instead, we need to handle the case where the two DS instructions being compared have different numbers of operands. llvm-svn: 219236	2014-10-07 21:09:20 +00:00
Matt Arsenault	c996175b57	R600/SI: Custom lower f64 -> i64 conversions llvm-svn: 219038	2014-10-03 23:54:56 +00:00
Matt Arsenault	f7c95e3eda	R600: Custom lower [s|u]int_to_fp for i64 -> f64 llvm-svn: 219037	2014-10-03 23:54:41 +00:00
Matt Arsenault	6cda887776	R600/SI: Fix ftrunc f64 conformance failures. Re-add the tests since they were deleted at some point llvm-svn: 219036	2014-10-03 23:54:27 +00:00
Tom Stellard	fae1dc8a12	R600: Align functions to 256 bytes llvm-svn: 219002	2014-10-03 19:02:02 +00:00
Tom Stellard	79243d9664	R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table llvm-svn: 218776	2014-10-01 17:15:17 +00:00
Matt Arsenault	9706978077	R600/SI: Fix printing of clamp and omod No tests for omod since nothing uses it yet, but this should get rid of the remaining annoying trailing zeros after some instructions. llvm-svn: 218692	2014-09-30 19:49:48 +00:00
Matt Arsenault	1c4571e0fd	R600: Fix broken check lines, missing scalar case. llvm-svn: 218655	2014-09-30 01:05:29 +00:00
Matt Arsenault	3d4233fe48	R600/SI: Also fix fsub + fadd a, a to mad combines llvm-svn: 218609	2014-09-29 14:59:38 +00:00
Matt Arsenault	02cb0ff7db	R600/SI: Fix using mad with multiplies by 2 These turn into fadds, so combine them into the target mad node. fadd (fadd (a, a), b) -> mad 2.0, a, b llvm-svn: 218608	2014-09-29 14:59:34 +00:00
Matt Arsenault	ed8a3e0a08	R600/SI: Add strict check lines to div_scale tests. This has weird operand requirements so it's worthwhile to have very strict checks for its operands. Add different combinations of SGPR operands. llvm-svn: 218535	2014-09-26 17:55:11 +00:00
Matt Arsenault	6a0919fb9b	R600/SI: Allow same SGPR to be used for multiple operands Instead of moving the first SGPR that is different than the first, legalize the operand that requires the fewest moves if one SGPR is used for multiple operands. This saves extra moves and is also required for some instructions which require that the same operand be used for multiple operands. llvm-svn: 218532	2014-09-26 17:55:03 +00:00
Matt Arsenault	cb0ac3d1fb	R600/SI: Partially move operand legalization to post-isel hook. Disable the SGPR usage restriction parts of the DAG legalizeOperands. It now should only be doing immediate folding until it can be replaced later. The real legalization work is now done by the other SIInstrInfo::legalizeOperands llvm-svn: 218531	2014-09-26 17:54:59 +00:00
Matt Arsenault	5885bef6cf	R600/SI: Don't move operands that are required to be SGPRs e.g. v_cndmask_b32 requires the condition operand be an SGPR. If one of the source operands were an SGPR, that would be considered the one SGPR use and the condition operand would be illegally moved. llvm-svn: 218529	2014-09-26 17:54:52 +00:00
Matt Arsenault	aff65fbca5	R600/SI: Fix using wrong operand indices when commuting No test since the current SIISelLowering::legalizeOperands effectively hides this, and the general uses seem to only fire on SALU instructions which don't have modifiers between the operands. When trying to use legalizeOperands immediately after instruction selection, it now sees a lot more patterns it did not see before which break on this. llvm-svn: 218527	2014-09-26 17:54:43 +00:00
Matt Arsenault	0c652c3fbc	R600: Avoid repeated check lines llvm-svn: 218487	2014-09-26 01:12:36 +00:00
Matt Arsenault	3a99759498	R600/SI: Fix emitting trailing whitespace after s_waitcnt llvm-svn: 218486	2014-09-26 01:09:46 +00:00
Matt Arsenault	42d1565844	R600: Fix some missing conversion testcases llvm-svn: 218474	2014-09-25 23:16:18 +00:00
Matt Arsenault	c16fafb24d	Remove duplicated RUN lines in middle of test llvm-svn: 218473	2014-09-25 23:16:14 +00:00
Tom Stellard	7980fc8562	R600/SI: Add support for global atomic add llvm-svn: 218457	2014-09-25 18:30:26 +00:00
Matt Arsenault	3e0effa223	R600/SI: Fix weird CHECK-DAG usage This prevents these from failing in a future commit. llvm-svn: 218356	2014-09-24 02:14:26 +00:00
Tom Stellard	744b99b476	R600/SI: Enable selecting SALU inside branches We can do this now that the FixSGPRLiveRanges pass is working. llvm-svn: 218353	2014-09-24 01:33:28 +00:00
Tom Stellard	9f73851e39	Revert "R600/SI: Add support for global atomic add" This reverts commit r218254. The global_atomics.ll test fails with asserts disabled. For some reason, the compiler fails to produce the atomic no return variants. llvm-svn: 218257	2014-09-22 16:44:04 +00:00
Tom Stellard	2355a77e74	R600/SI: Add support for global atomic add llvm-svn: 218254	2014-09-22 15:35:35 +00:00
Matt Arsenault	de0253791c	R600: Un-xfail a test which passes with pass disabled llvm-svn: 218165	2014-09-19 23:02:20 +00:00
Matt Arsenault	5e5b242946	R600/SI: Un-xfail tests which work now llvm-svn: 218164	2014-09-19 23:02:18 +00:00
Matt Arsenault	a986554377	R600/SI: Un xfail a test that works now llvm-svn: 218162	2014-09-19 22:42:40 +00:00
Matt Arsenault	4505f3a73d	R600/SI: Fix test to prepare for scheduler llvm-svn: 218131	2014-09-19 18:11:16 +00:00
Matt Arsenault	46cbc4367b	R600: Better fix for bug 20982 Just do the left shift as unsigned to avoid the UB. llvm-svn: 218092	2014-09-19 00:42:06 +00:00
Matt Arsenault	6462f94884	R600: Bug 20982 - Avoid undefined left shift of negative value I'm not sure what the hardware actually does, so don't bother trying to fold it for now. llvm-svn: 218057	2014-09-18 15:52:26 +00:00
Alexey Samsonov	7bddb0a56a	Exclude known and bugzilled failures from UBSan bootstrap llvm-svn: 217979	2014-09-17 20:17:52 +00:00
Matt Arsenault	02dc26529e	R600/SI: Change formatting of printed FP immediates Only 1 decimal place should be printed for inline immediates. Other constants should be hex constants. Does not include f64 tests because folding those inline immediates currently does not work. llvm-svn: 217964	2014-09-17 17:32:13 +00:00
Matt Arsenault	49dd4283ed	R600/SI: Prefer selecting more e64 instruction forms. Add some more tests to make sure better operand choices are still made. Leave some cases that seem to have no reason to ever be e64 alone. llvm-svn: 217789	2014-09-15 17:15:02 +00:00
Matt Arsenault	0fd0a316ed	R600/SI: Make sure double vector fmul is tested llvm-svn: 217787	2014-09-15 17:04:54 +00:00
Matt Arsenault	72aafd0689	R600/SI: Add some mubuf testcases. I noticed some odd looking cases where addr64 wasn't set when storing to a pointer in an SGPR. This seems to be intentional, and partially tested already. The documentation seems to describe addr64 in terms of which registers addressing modifiers come from, but I would expect to always need addr64 when using 64-bit pointers. If no offset is applied, it makes sense to not need to worry about doing a 64-bit add for the final address. A small immediate offset can be applied, so is it OK to not have addr64 set if a carry is necessary when adding the base pointer in the resource to the offset? llvm-svn: 217785	2014-09-15 16:48:01 +00:00
Matt Arsenault	3f98140c87	R600/SI: Add preliminary support for flat address space llvm-svn: 217777	2014-09-15 15:41:53 +00:00
Matt Arsenault	f620a575bf	R600/SI: Fix broken check lines llvm-svn: 217736	2014-09-14 18:32:05 +00:00
Matt Arsenault	362f345bab	R600/SI: Fix off by 1 error in used register count The register numbers start at 0, so if only 1 register was used, this was reported as 0. llvm-svn: 217636	2014-09-11 22:51:37 +00:00
Matt Arsenault	8239eaab99	Add DAG combine for shl + add of constants. Do (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) This is already done for multiplies, but since multiplies by powers of two are turned into shifts, we also need to handle it here. This might want checks for isLegalAddImmediate to avoid transforming an add of a legal immediate with one that isn't. llvm-svn: 217610	2014-09-11 17:34:19 +00:00
Aaron Watry	3ffc560094	R600: Test local atomics for evergreen Now that the operations are all implemented, we can test this sub-arch here. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217595	2014-09-11 15:02:52 +00:00
Matt Arsenault	61a528adc7	R600/SI: Fix losing chain when fixing reg class of loads. The lost chain resulted in earlier side effecting nodes being deleted. llvm-svn: 217561	2014-09-10 23:26:19 +00:00
Matt Arsenault	16e313343d	R600: Custom lower frem llvm-svn: 217553	2014-09-10 21:44:27 +00:00
Matt Arsenault	7ac9c4a074	R600/SI: Replace LDS atomics with no return versions llvm-svn: 217379	2014-09-08 15:07:31 +00:00
Matt Arsenault	7b46a59b5a	R600/SI: Relax a few tests to help enable scheduler llvm-svn: 217320	2014-09-06 20:44:41 +00:00
Matt Arsenault	a9fcf62a9c	R600/SI: Fix broken check lines. Fix missing check, and hardcoded register numbers. llvm-svn: 217318	2014-09-06 20:37:56 +00:00
Matt Arsenault	8ae5961065	R600/SI: Use same complex patterns for DS atomics This fixes hitting the same negative base offset problem that was already fixed for regular loads and stores. llvm-svn: 217256	2014-09-05 16:24:58 +00:00
Jan Vesely	d1d1334064	R600: Fix FROUND round halfway cases away from zero Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 217250	2014-09-05 14:26:54 +00:00
Tom Stellard	80942a1b50	R600/SI: Use S_ADD_U32 and S_SUB_U32 for low half of 64-bit operations https://bugs.freedesktop.org/show_bug.cgi?id=83416 llvm-svn: 217248	2014-09-05 14:07:59 +00:00
Matt Arsenault	869cd07158	R600/SI: Try to keep i32 mul on SALU Also fix bug this exposed where when legalizing an immediate operand, a v_mov_b32 would be created with a VSrc dest register. llvm-svn: 217108	2014-09-03 23:24:35 +00:00
Tom Stellard	102c68786c	R600/SI: Add a pattern for i64 and in a branch llvm-svn: 217041	2014-09-03 15:22:41 +00:00
Matt Arsenault	4c24d73709	R600/SI: Relax some ordering in tests. This will help with enabling misched llvm-svn: 216971	2014-09-02 21:45:50 +00:00
Matt Arsenault	b78875e979	R600/SI: Fix hardcoded register numbers in test llvm-svn: 216944	2014-09-02 20:43:07 +00:00
Matt Arsenault	d1649db2fc	R600/SI: Add failing testcase. This is broken when 64-bit add is only partially moved to the VALU. llvm-svn: 216933	2014-09-02 19:12:31 +00:00
Matt Arsenault	c1a71217b3	Fix interference caused by fmul 2, x -> fadd x, x If an fmul was introduced by lowering, it wouldn't be folded into a multiply by a constant since the earlier combine would have replaced the fmul with the fadd. llvm-svn: 216932	2014-09-02 19:02:53 +00:00
Matt Arsenault	8675db15da	R600/SI: Use mad for fsub + fmul We can use a negate source modifier to match this for fsub. llvm-svn: 216735	2014-08-29 16:01:14 +00:00
Tom Stellard	f3fc555e3b	R600/SI: Use READ2/WRITE2 instructions for 64-bit mem ops with 32-bit alignment llvm-svn: 216279	2014-08-22 18:49:35 +00:00
Tom Stellard	85e8b6d5f9	R600/SI: Use a ComplexPattern for DS loads and stores llvm-svn: 216278	2014-08-22 18:49:33 +00:00
Tom Stellard	745f2eddef	R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions llvm-svn: 216220	2014-08-21 20:41:00 +00:00
Tom Stellard	162a947160	R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function This fixes a crash in an ocl conformance test. llvm-svn: 216219	2014-08-21 20:40:58 +00:00
Matt Arsenault	fabf545299	R600/SI: Move all fabs / fneg handling to patterns llvm-svn: 215749	2014-08-15 18:42:22 +00:00
Matt Arsenault	13623d0e28	R600/SI: Use source modifiers for f64 fneg llvm-svn: 215748	2014-08-15 18:42:18 +00:00
Matt Arsenault	a147438e37	R600/SI: Use source modifier for f64 fabs llvm-svn: 215747	2014-08-15 18:42:15 +00:00
Matt Arsenault	b2baffaffd	R600/SI: Fix offset folding in some cases with shifted pointers. Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) is only done if the add has one use. If the resulting constant add can be folded into an addressing mode, force this to happen for the pointer operand. This ends up happening a lot because of how LDS objects are allocated. Since the globals are allocated next to each other, accessing the first element of the second object is directly indexed by a shifted pointer. llvm-svn: 215739	2014-08-15 17:49:05 +00:00
Matt Arsenault	2e7cc48baa	R600/SI: Add intrinsic for ldexp llvm-svn: 215734	2014-08-15 17:30:25 +00:00
Matt Arsenault	5015a89aa5	R600/SI: Implement isLegalAddressingMode The default assumes that a 16-bit signed offset is used. LDS instructions use a 16-bit unsigned offset, so it wasn't being used in some cases where it was assumed a negative offset could be used. More should be done here, but first isLegalAddressingMode needs to gain an addressing mode argument. For now, copy most of the rest of the default implementation with the immediate offset change. llvm-svn: 215732	2014-08-15 17:17:07 +00:00
Matt Arsenault	74ef277774	R600: Correctly set the src value offset for scalarized kernel args This for some reason fixes v1i64 kernel arguments on pre-SI. This currently breaks some other cases in the kernel-args.ll test for R600, but I'm not particularly confident in the new output. VTX_READ_* are not used for some of the scalarized cases, and the code reading from the constant buffer doesn't make much sense to me. llvm-svn: 215564	2014-08-13 18:14:11 +00:00
Hal Finkel	415e344f29	Fix classof for ISD::INTRINSIC_W_CHAIN and INTRINSIC_VOID Unfortunately, our use of the SDNode class hierarchy for INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes is somewhat broken right now. These nodes sometimes are used for memory intrinsics (those with MachineMemOperands), and sometimes not. When not, the nodes are not created as instances of MemIntrinsicSDNode, but rather created as some other subclass of SDNode using DAG::getNode. When they are memory intrinsics, they are created using DAG::getMemIntrinsicNode as instances of MemIntrinsicSDNode. MemIntrinsicSDNode is a subclass of MemSDNode, but prior to r214452, we had a non-self-consistent setup whereby MemIntrinsicSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return true but MemSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return false. In r214452, MemSDNode::classof was changed to return true for INTRINSIC_W_CHAIN and INTRINSIC_VOID, which is now self-consistent. The problem is that neither the pre-r214452 logic and the post-r214452 logic are really right. The truth is that not all INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes are instances of MemIntrinsicSDNode (or MemSDNode for that matter), and the return value from classof needs to reflect that. This was broken before r214452 (because MemIntrinsicSDNode::classof always returned true), and was broken afterward (because MemSDNode::classof also always returned true), and will now be correct. The minimal solution is to grab one of the SubclassData bits (there is one left for MemIntrinsicSDNode nodes) and use it to store whether or not a particular INTRINSIC_W_CHAIN or INTRINSIC_VOID is really an instance of MemIntrinsicSDNode or not. Doing this allows both MemIntrinsicSDNode::classof and MemSDNode::classof to return the correct answer for the underlying object for both the memory-intrinsic and non-memory-intrinsic cases. This fixes the problem that r214452 created in the SelectionDAGDumper (thanks to Matt Arsenault for pointing it out). Because PowerPC does not implement getTgtMemIntrinsic, this change breaks test/CodeGen/PowerPC/unal-altivec-wint.ll. I've XFAILed it for now, and will fix it in a follow-up commit. llvm-svn: 215511	2014-08-13 01:15:37 +00:00
Jan Vesely	e5ca27d716	R600: Use optimized 24bit path in udivrem v2: drop enum keyword use correct extension mode don't bother computing the sign in unsigned case Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215462	2014-08-12 17:31:20 +00:00
Jan Vesely	4a33bc6206	R600: Use i24 optimized path for SREM v2: add tests rename LowerSDIV24 to LowerSDIVREM24 handle the rem part in this function Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215460	2014-08-12 17:31:17 +00:00
Tom Stellard	155bbb7713	R600/SI: Add a ComplexPattern for selecting MUBUF _OFFSET variant This saves us from having to copy a 64-bit 0 value into VGPRs for BUFFER_* instruction which only have a 12-bit immediate offset. llvm-svn: 215399	2014-08-11 22:18:17 +00:00
Tom Stellard	48194cdd81	R600/SI: Add check for low 32 bits of encoding to mubuf tests There are no variable values like registers encoded in the low 32 bits of MUBUF instructions, so it is relatively easy to check these bits, and it will help prevent us from introducing encoding bugs. llvm-svn: 215397	2014-08-11 22:18:11 +00:00
Tom Stellard	93ba12f163	R600/SI: Clear lds bit on MUBUF instructions used for private stores This bit was left uninitialized, which was causing some random failures of piglit tests. NOTE: This is a candidate for the 3.5 branch. llvm-svn: 215396	2014-08-11 22:18:09 +00:00
Tom Stellard	e04fd9d430	R600/SI: Fix broken test llvm-svn: 215395	2014-08-11 22:18:05 +00:00
Tom Stellard	c0503db9e2	R600/SI: Custom lower CONCAT_VECTORS This will lower them using register copies rather than loads and stores to the stack. llvm-svn: 215270	2014-08-09 01:06:56 +00:00
Tom Stellard	4f575f7aaf	R600/SI: Update concat_vectors.ll to check for scratch usage These tests were using SI-NOT: MOVREL to make sure concat vectors weren't being lowered to stack loads and stores, but we are using scratch buffers for the stack now instead of registers, so we need to add an additional SI-NOT check for scratch buffers. With this change I was able to uncover one broken test which will be fixed in a future commit. llvm-svn: 215269	2014-08-09 01:06:53 +00:00
Matt Arsenault	a6dc6c281c	R600: Cleanup fadd and fsub tests llvm-svn: 214991	2014-08-06 20:27:55 +00:00
Matt Arsenault	d5f4de27b6	R600: Increase nearby load scheduling threshold. This partially fixes weird looking load scheduling in memcpy test. The load clustering doesn't seem particularly smart, but this method seems to be partially deprecated so it might not be worth trying to fix. llvm-svn: 214943	2014-08-06 00:29:49 +00:00
Matt Arsenault	c10853f29f	R600/SI: Implement areLoadsFromSameBasePtr This currently has a noticeable effect on the kernel argument loads. LDS and global loads are more problematic, I think because of how copies are currently inserted to ensure that the address is a VGPR. llvm-svn: 214942	2014-08-06 00:29:43 +00:00
Tom Stellard	229d5e669b	R600/SI: Update MUBUF assembly string to match AMD proprietary compiler llvm-svn: 214866	2014-08-05 14:48:12 +00:00
Tom Stellard	b37f797678	R600/SI: Avoid generating REGISTER_LOAD instructions. SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214865	2014-08-05 14:40:52 +00:00
Matt Arsenault	9215b17eb7	R600/SI: Fix extra whitespace in asm str This slipped in in r214467, so something like V_MOV_B32_e32 v0, ... is now printed with 2 spaces between the instruction name and first operand. llvm-svn: 214660	2014-08-03 05:27:14 +00:00
Matt Arsenault	4de324442b	R600: Cleanup fneg tests llvm-svn: 214612	2014-08-02 02:26:51 +00:00
Tom Stellard	4973a13680	Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp" This reverts commit r214566. I did not mean to commit this yet. llvm-svn: 214572	2014-08-01 21:55:50 +00:00
Tom Stellard	c16f73d7c5	R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214566	2014-08-01 21:50:47 +00:00
Matt Arsenault	06bd3933ba	R600: Cleanup test Remove -CHECKs, use multiple prefixes, name values, also test the @llvm.fabs version llvm-svn: 214525	2014-08-01 17:00:29 +00:00
Tom Stellard	b4a313a76f	R600/SI: Do abs/neg folding with ComplexPatterns Abs/neg folding has moved out of foldOperands and into the instruction selection phase using complex patterns. As a consequence of this change, we now prefer to select the 64-bit encoding for most instructions and the modifier operands have been dropped from integer VOP3 instructions. llvm-svn: 214467	2014-08-01 00:32:39 +00:00
Tom Stellard	6407e1e632	R600/SI: Fold immediates when shrinking instructions This will prevent us from using extra MOV instructions once we prefer selecting 64-bit instructions. llvm-svn: 214464	2014-08-01 00:32:33 +00:00
Tom Stellard	86d12ebdbd	R600/SI: Fix incorrect commute operation in shrink instructions pass We were commuting the instruction but still shrinking it using the original opcode. NOTE: This is a candidate for the 3.5 branch. llvm-svn: 214463	2014-08-01 00:32:28 +00:00
Jan Vesely	3047950964	R600: Modernize work item intrinsics test Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 214451	2014-07-31 22:11:03 +00:00
Matt Arsenault	2b252ecf6b	R600: Modernize test llvm-svn: 214108	2014-07-28 18:06:08 +00:00
Matt Arsenault	46645fa102	R600/SI: Implement getOptimalMemOpType The default guess uses i32. This needs an address space argument to really do the right thing in all cases. llvm-svn: 214104	2014-07-28 17:49:26 +00:00
Matt Arsenault	6f2a526101	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055	2014-07-27 17:46:40 +00:00
Matt Arsenault	24aa028cfa	R600/SI: Fix broken test. There was no check prefix for the instruction lines. Match what is emitted though, although I'm pretty sure it is incorrect. llvm-svn: 214035	2014-07-26 21:21:42 +00:00
Chandler Carruth	411fb407f8	[SDAG] When performing post-legalize DAG combining, run the legalizer over each node in the worklist prior to combining. This allows the combiner to produce new nodes which need to go back through legalization. This is particularly useful when generating operands to target specific nodes in a post-legalize DAG combine where the operands are significantly easier to express as pre-legalized operations. My immediate use case will be PSHUFB formation where we need to build a constant shuffle mask with a build_vector node. This also refactors the relevant functionality in the legalizer to support this, and updates relevant tests. I've spoken to the R600 folks and these changes look like improvements to them. The avx512 change needs to be investigated, I suspect there is a disagreement between the legalizer and the DAG combiner there, but it seems a minor issue so leaving it to be re-evaluated after this patch. Differential Revision: http://reviews.llvm.org/D4564 llvm-svn: 214020	2014-07-26 05:49:40 +00:00
Chandler Carruth	9f4530b95d	[SDAG] Introduce a combined set to the DAG combiner which tracks nodes which have successfully round-tripped through the combine phase, and use this to ensure all operands to DAG nodes are visited by the combiner, even if they are only added during the combine phase. This is critical to have the combiner reach nodes that are introduced during combining. Previously these would sometimes be visited and sometimes not be visited based on whether they happened to end up on the worklist or not. Now we always run them through the combiner. This fixes quite a few bad codegen test cases lurking in the suite while also being more principled. Among these, the TLS codegeneration is particularly exciting for programs that have this in the critical path like TSan-instrumented binaries (although I think they engineer to use a different TLS that is faster anyways). I've tried to check for compile-time regressions here by running llc over a merged (but not LTO-ed) clang bitcode file and observed at most a 3% slowdown in llc. Given that this is essentially a worst case (none of opt or clang are running at this phase) I think this is tolerable. The actual LTO case should be even less costly, and the cost in normal compilation should be negligible. With this combining logic, it is possible to re-legalize as we combine which is necessary to implement PSHUFB formation on x86 as a post-legalize DAG combine (my ultimate goal). Differential Revision: http://reviews.llvm.org/D4638 llvm-svn: 213898	2014-07-24 22:15:28 +00:00
Matt Arsenault	83592a2d32	R600: Add FMA instructions for Evergreen llvm-svn: 213882	2014-07-24 17:41:01 +00:00
Matt Arsenault	9acb978105	R600: Match rcp node on pre-SI llvm-svn: 213844	2014-07-24 06:59:24 +00:00
Matt Arsenault	0daeb63f03	R600: Fix LowerSDIV24 Use ComputeNumSignBits instead of checking for i8 / i16 which only worked when AMDIL was lying about having legal i8 / i16. If an integer is known to fit in 24-bits, we can do division faster with float ops. llvm-svn: 213843	2014-07-24 06:59:20 +00:00
Chandler Carruth	9a0051cd59	[SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions. The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist where we null-out removed entries and only add new entries to the worklist. My measurements of codegen time show slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself I suspect). This change results in subtle, frustrating churn in the particular order in which DAG combines are applied which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better. A major change required to make this work is to significantly harden the way in which the DAG combiner handles nodes which become dead (zero-uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner. I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor. The consensus in review is to fix any regressions that show up after the fact here. Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit. Differential Revision: http://reviews.llvm.org/D4616 llvm-svn: 213727	2014-07-23 07:08:53 +00:00
Tom Stellard	1aaad6970c	R600/SI: Add instruction shrinking pass This pass converts 64-bit instructions to 32-bit when possible. llvm-svn: 213561	2014-07-21 16:55:33 +00:00

777 Commits