llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	127dbdbb02	[AMDGPU] Cleanup in memory legalizer tests. NFC. llvm-svn: 325042	2018-02-13 20:03:32 +00:00
Yaxun Liu	0124b5484c	[AMDGPU] Change constant addr space to 4 Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030	2018-02-13 18:00:25 +00:00
Matt Arsenault	923712b6b5	Reapply "AMDGPU: Add 32-bit constant address space" This reverts r324494 and reapplies r324487. llvm-svn: 324747	2018-02-09 16:57:57 +00:00
Stefan Maksimovic	dc66ae78c6	[SelectionDAG] Provide adequate register class for RegisterSDNode When adding operands to machine instructions in case of RegisterSDNodes, generate a COPY node in case the register class does not match the one in the instruction definition. Differental Revision: https://reviews.llvm.org/D35561 llvm-svn: 324733	2018-02-09 13:55:25 +00:00
Matt Arsenault	c24d5e2819	AMDGPU: Minor cleanups Column limit, typo, unnecessary reference llvm-svn: 324666	2018-02-08 22:46:38 +00:00
Matt Arsenault	b02cebf552	AMDGPU: Fix incorrect reordering when inline asm defines LDS address Defs of operands outside of the instruction's explicit defs need to be checked. llvm-svn: 324554	2018-02-08 01:56:14 +00:00
Matt Arsenault	c908e3f77a	AMDGPU: Don't crash when trying to fold implicit operands llvm-svn: 324550	2018-02-08 01:12:46 +00:00
Rafael Espindola	f4e3f3e31c	Revert "AMDGPU: Add 32-bit constant address space" This reverts commit r324487. It broke clang tests. llvm-svn: 324494	2018-02-07 18:09:35 +00:00
Marek Olsak	871c30e540	AMDGPU: Add 32-bit constant address space Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period. Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string. Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden. Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D41651 llvm-svn: 324487	2018-02-07 16:01:00 +00:00
Marek Olsak	b2cc77985b	AMDGPU: Remove the s_buffer workaround for GFX9 chips Summary: I checked the AMD closed source compiler and the workaround is only needed when x3 is emulated as x4, which we don't do in LLVM. SMEM x3 opcodes don't exist, and instead there is a possibility to use x4 with the last component being unused. If the last component is out of buffer bounds and falls on the next 4K page, the hw hangs. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42756 llvm-svn: 324486	2018-02-07 16:00:40 +00:00
Tom Stellard	33445765dd	AMDGPU/GlobalISel: Mark 32-bit G_FPTOUI as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42152 llvm-svn: 324446	2018-02-07 04:47:59 +00:00
Mark Searles	24c92eeb83	[AMDGPU] Suppress redundant waitcnt instrs. 1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR. 2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing. 3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production. Differential Revision: https://reviews.llvm.org/D42854 llvm-svn: 324440	2018-02-07 02:21:21 +00:00
Matt Arsenault	a18b3bcf51	AMDGPU: Select BFI patterns with 64-bit ints llvm-svn: 324431	2018-02-07 00:21:34 +00:00
Craig Topper	58ecffd857	[DAGCombiner][AMDGPU][X86] Turn cttz/ctlz into cttz_zero_undef/ctlz_zero_undef if we can prove the input is never zero X86 currently has a late DAG combine after cttz/ctlz are turned into BSR+BSF+CMOV to detect this and remove the CMOV. But we should be able to do this much earlier and avoid creating the cmov all together. For the changed AMDGPU test case it appears that previously the i8 cttz was type legalized to i16 which introduced an OR with 256 in order to limit the result to 8 on the widened type. At this point the result is known to never be zero, but nothing checked that. Then operation legalization is told to promote all i16 cttz to i32. This introduces an extend and a truncate and another OR with 65536 to limit the result to 16. With the DAG combiner change we are able to prevent the creation of the second OR since the opcode will have been changed to cttz_zero_undef after the first OR. I the lack of the OR caused the instruction to change to v_ffbl_b32_sdwa Differential Revision: https://reviews.llvm.org/D42985 llvm-svn: 324427	2018-02-06 23:54:37 +00:00
Marek Olsak	7d92b7e23a	AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU Author: Bas Nieuwenhuizen https://reviews.llvm.org/D42881 llvm-svn: 324353	2018-02-06 15:17:55 +00:00
Tim Renouf	807ecc3d66	[AMDGPU] do not generate .AMDGPU.config for amdpal os type Summary: Now we generate PAL metadata for the amdpal os type, there is no need to generate the .AMDGPU.config section. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37760 Change-Id: I303c5fad66656ce97293da60621afac6595b4c18 llvm-svn: 324346	2018-02-06 13:39:38 +00:00
Konstantin Zhuravlyov	8818d13ed2	AMDGPU/MemoryModel: Fix monotonic atomic loads Those should have glc bit set for system and agent synchronization scopes llvm-svn: 324314	2018-02-06 04:06:04 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Geoff Berry	94503c7bc3	[MachineCopyPropagation] Extend pass to do COPY source forwarding Summary: This change extends MachineCopyPropagation to do COPY source forwarding and adds an additional run of the pass to the default pass pipeline just after register allocation. This version of this patch uses the newly added MachineOperand::isRenamable bit to avoid forwarding registers is such a way as to violate constraints that aren't captured in the Machine IR (e.g. ABI or ISA constraints). This change is a continuation of the work started in D30751. Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits Differential Revision: https://reviews.llvm.org/D41835 llvm-svn: 323991	2018-02-01 18:54:01 +00:00
Changpeng Fang	29fcf883fb	AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature. Reviewers: Matt and Brian Differential Revision: https://reviews.llvm.org/D42548 llvm-svn: 323988	2018-02-01 18:41:33 +00:00
Matt Arsenault	df0f25070c	DAG: Fix not truncating when promoting bswap/bitreverse These need to convert back to the original type, like any other promotion. llvm-svn: 323932	2018-01-31 23:54:16 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Marek Olsak	d4bb329d0e	AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9 Summary: This enables load merging into x2, x4, which is driven by inline offsets. 6500 shaders are affected: Code Size in affected shaders: -15.14 % Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42078 llvm-svn: 323909	2018-01-31 20:18:11 +00:00
Marek Olsak	13e4741275	AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908	2018-01-31 20:18:04 +00:00
Yaxun Liu	c00d81e697	LLParser: add an argument for overriding data layout and do not check alloca addr space Sometimes users do not specify data layout in LLVM assembly and let llc set the data layout by target triple after loading the LLVM assembly. Currently the parser checks alloca address space no matter whether the LLVM assembly contains data layout definition, which causes false alarm since the default data layout does not contain the correct alloca address space. The parser also calls verifier to check debug info and updating invalid debug info. Currently there is no way to let the verifier to check debug info only. If the verifier finds non-debug-info issues the parser will fail. For llc, the fix is to remove the check of alloca addr space in the parser and disable updating debug info, and defer the updating of debug info and verification to be after setting data layout of the IR by target. For other llvm tools, since they do not override data layout by target but instead can override data layout by a command line option, an argument for overriding data layout is added to the parser. In cases where data layout overriding is necessary for the parser, the data layout can be provided by command line. Differential Revision: https://reviews.llvm.org/D41832 llvm-svn: 323826	2018-01-30 22:32:39 +00:00
Geoff Berry	1d53101387	[AMDGPU] isRenamable fixes to support copy forwarding Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will be marked as not renamable, to avoid copy forwarding violating the constraint that only one operand may use the constant bus. These changes fix a few mis-compiles when copy forwarding is enabled in MachineCopyPropagation by D41835 (and were reviewed as part of that change). llvm-svn: 323794	2018-01-30 17:37:39 +00:00
Mark Searles	94ae3b2f9b	[AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output." Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/s\ teps/build_Lld/logs/stdio : /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable] static int32_t InstCnt = 0; " This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc. llvm-svn: 323791	2018-01-30 17:17:06 +00:00
Mark Searles	d6d5a2571f	[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output. -amdgpu-waitcnt-forcezero={1\|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) -amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs -amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch). Differential Revision: https://reviews.llvm.org/D40091 llvm-svn: 323788	2018-01-30 16:49:38 +00:00
Marek Olsak	48057b554c	AMDGPU: Allow a SGPR for the conditional KILL operand Patch by: Bas Nieuwenhuizen Just use the _e64 variant if needed. This should be possible as per def : Pat < (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))), (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond)) > ; I don't think we can get an immediate for the other operand for which we need the second 32-bit word. https://reviews.llvm.org/D42302 llvm-svn: 323706	2018-01-29 23:19:10 +00:00
Hiroshi Inoue	c8e9245816	[NFC] fix trivial typos in comments and documents "to to" -> "to" llvm-svn: 323628	2018-01-29 05:17:03 +00:00
Francis Visoiu Mistrih	e4718e84e8	[MIR] Add support for addrspace in MIR Add support for printing / parsing the addrspace of a MachineMemOperand. Fixes PR35970. Differential Revision: https://reviews.llvm.org/D42502 llvm-svn: 323521	2018-01-26 11:47:28 +00:00
Daniil Fukalov	6e1dc68117	[AMDGPU] fix LDS f32 intrinsics - using qualified pointer addrspace in intrinsics class to avoid .f32 mangling - changed too common atomic mangling to ds - added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic Reviewed by: b-sumner Differential Revision: https://reviews.llvm.org/D42383 llvm-svn: 323516	2018-01-26 11:09:38 +00:00
Nicolai Haehnle	4afb64e4c6	Revert r321751, "StructurizeCFG: Fix broken backedge detection" It causes regressions in various OpenGL test suites. Keep the test cases introduced by r321751 as XFAIL, and add a test case for the regression. Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514 Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015 llvm-svn: 323355	2018-01-24 18:02:05 +00:00
Sven van Haastregt	e8404780c3	[DAGCombiner] Bail out if vector size is not a multiple For the included test case, the DAG transformation concat_vectors(scalar, undef) -> scalar_to_vector(sclr) would attempt to create a v2i32 vector for a v9i8 concat_vector. Bail out to avoid creating a bitcast with mismatching sizes later on. Differential Revision: https://reviews.llvm.org/D42379 llvm-svn: 323312	2018-01-24 09:53:47 +00:00
Hiroshi Inoue	501931b117	[NFC] fix trivial typos in comments "the the" -> "the" llvm-svn: 323302	2018-01-24 05:04:35 +00:00
Yaxun Liu	8b7454a8dd	CodeGen: Fix assertion in ScheduleDAGMILive::scheduleMI due to llvm.dbg.value Fix a bug in ScheduleDAGMILive::scheduleMI which causes BotRPTracker not tracking CurrentBottom in some rare cases involving llvm.dbg.value. This issues causes amdgcn target to assert when compiling some user codes with -g. Differential Revision: https://reviews.llvm.org/D42394 llvm-svn: 323214	2018-01-23 16:04:53 +00:00
Mark Searles	7687d42052	[AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I\|U}32_e64 - Change inserted add ( V_ADD_{I\|U}32_e32 ) to _e64 version ( V_ADD_{I\|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register. - Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts. Differential Revision: https://reviews.llvm.org/D42124 llvm-svn: 323153	2018-01-22 21:46:43 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Matthias Braun	8bb5228db9	Move tests to the correct place test/CodeGen/MIR is for testing the MIR parser/printer. Tests for passes and targets belong to test/CodeGen/TARGETNAME. llvm-svn: 322925	2018-01-19 06:08:15 +00:00
Changpeng Fang	4737e892de	AMDGPU/SI: Add d16 support for image intrinsics. Summary: This patch implements d16 support for image load, image store and image sample intrinsics. Reviewers: Matt, Brian. Differential Revision: https://reviews.llvm.org/D3991 llvm-svn: 322903	2018-01-18 22:08:53 +00:00
Daniil Fukalov	d5fca554e2	[AMDGPU] add LDS f32 intrinsics added llvm.amdgcn.atomic.{add\|min\|max}.f32 intrinsics to allow generate ds_{add\|min\|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656	2018-01-17 14:05:05 +00:00
Stanislav Mekhanoshin	62875fcd6c	[AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32 Differential Revision: https://reviews.llvm.org/D41617 llvm-svn: 322500	2018-01-15 18:49:15 +00:00
Tim Renouf	75ced9d5b8	[AMDGPU] stop image_store being moved illegally Summary: A recent change 321556: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores can allow the machine instruction scheduler to move an image store past an image load using the same descriptor. V2: Fixed by marking image ops as mayAlias and isAliased. This may be overly conservative, and we may need to revisit. V3: Reverted test change done on 321556. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: llvm-commits, t-tye, yaxunl, wdng, kzhuravl Differential Revision: https://reviews.llvm.org/D41969 llvm-svn: 322419	2018-01-12 22:57:24 +00:00
Changpeng Fang	44dfa1de3b	AMDGPU/SI: Add d16 support for buffer intrinsics. Differential Revision: https://reviews.llvm.org/D38906 Reviewers: Matt and Brian. llvm-svn: 322402	2018-01-12 21:12:19 +00:00
Rafael Espindola	e4b0231c63	Make internal/private GVs implicitly dso_local. While updating clang tests for having clang set dso_local I noticed that: - There are a lot of tests to update. - Many of the updates are redundant. They are redundant because a GV is "obviously dso_local". This patch starts formalizing that a bit by requiring that internal and private GVs be dso_local too. Since they all are, we don't have to print dso_local to the textual representation, making it a bit more compact and easier to read. llvm-svn: 322317	2018-01-11 22:15:05 +00:00
Puyan Lotfi	fe6c9cbb24	[MIR] Repurposing '$' sigil used by external symbols. Replacing with '&'. Planning to add support for named vregs. This puts is in a conundrum since physregs are named as well. To rectify this we need to use a sigil other than '%' for physregs in MIR. We've settled on using '$' for physregs but first we must repurpose it from external symbols using it, which is what this commit is all about. We think '&' will have familiar semantics for C/C++ users. llvm-svn: 322146	2018-01-10 00:56:48 +00:00
Tim Renouf	d68fa1be57	[SelectionDAG] Fixed f16-from-vector promotion problem Summary: In the case of an fp_extend of v1f16 to v1f32 where the v1f16 is the result of a bitcast from i16, avoid creating an illegal fp16_to_fp where the input is not a vector and the result is a v1f32. V2: The fix is now to avoid vector scalarization creating a v1->scalar bitcast. Reviewers: srhines, t.p.northover Subscribers: nhaehnle, llvm-commits, dstuttard, t-tye, yaxunl, wdng, kzhuravl, arsenm Differential Revision: https://reviews.llvm.org/D41126 llvm-svn: 322120	2018-01-09 21:36:25 +00:00
Tim Renouf	6eaad1e539	[AMDGPU] Fixed incorrect uniform branch condition Summary: I had a case where multiple nested uniform ifs resulted in code that did v_cmp comparisons, combining the results with s_and_b64, s_or_b64 and s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without first ensuring that bits for inactive lanes were clear. There was already code for inserting an "s_and_b64 vcc, exec, vcc" to clear bits for inactive lanes in the case that the branch is instruction selected as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in SIFixSGPRCopies. I have added the same code into SILowerControlFlow for the case that the branch is instruction selected as s_cbranch_vccnz. This de-optimizes the code in some cases where the s_and is not needed, because vcc is the result of a v_cmp, or multiple v_cmp instructions combined by s_and/s_or. We should add a pass to re-optimize those cases. Reviewers: arsenm, kzhuravl Subscribers: wdng, yaxunl, t-tye, llvm-commits, dstuttard, timcorringham, nhaehnle Differential Revision: https://reviews.llvm.org/D41292 llvm-svn: 322119	2018-01-09 21:34:43 +00:00
Francis Visoiu Mistrih	2b3bd30637	[CodeGen] Don't print register classes in -debug output Since register classes and banks are already printed with the register definition, don't print it at the end of every instruction anymore. This follows MIR in this regard and is another step to the unification of the two formats. llvm-svn: 322086	2018-01-09 15:39:44 +00:00
Matt Arsenault	8070882b4e	StructurizeCFG: Fix broken backedge detection The work order was changed in r228186 from SCC order to RPO with an arbitrary sorting function. The sorting function attempted to move inner loop nodes earlier. This was was apparently relying on an assumption that every block in a given loop / the same loop depth would be seen before visiting another loop. In the broken testcase, a block outside of the loop was encountered before moving onto another block in the same loop. The testcase would then structurize such that one blocks unconditional successor could never be reached. Revert to plain RPO for the analysis phase. This fixes detecting edges as backedges that aren't really. The processing phase does use another visited set, and I'm unclear on whether the order there is as important. An arbitrary order doesn't work, and triggers some infinite loops. The reversed RPO list seems to work and is closer to the order that was used before, minus the arbitary custom sorting. A few of the changed tests now produce smaller code, and a few are slightly worse looking. llvm-svn: 321751	2018-01-03 18:45:37 +00:00

1 2 3 4 5 ...

1416 Commits