llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	d768737454	AMDGPU: Fix not encoding src2 of VOP3b instructions Broken by r247074. Should include an assembler test, but the assembler is currently broken for VOP3b apparently. llvm-svn: 247123	2015-09-09 08:39:49 +00:00
Matt Arsenault	acd68b58ae	SelectionDAG: Support Expand of f16 extloads Currently this hits an assert that extload should always be supported, which assumes integer extloads. This moves a hack out of SI's argument lowering and is covered by existing tests. llvm-svn: 247113	2015-09-09 01:12:27 +00:00
Matt Arsenault	86d336e91b	AMDGPU/SI: Fix input vcc operand for VOP2b instructions Adds vcc to output string input for e32. Allows option of using e64 encoding with assembler. Also fixes these instructions not implicitly reading exec. llvm-svn: 247074	2015-09-08 21:15:00 +00:00
Matt Arsenault	8ac35cd031	AMDGPU: Mark s_barrier as a high latency instruction These were marked as WriteSALU, which is low latency. I'm guessing at the value to use, but it should probably be considered the highest latency instruction. I'm not sure this has any actual effect since hasSideEffects probably is preventing any moving of these. llvm-svn: 247060	2015-09-08 19:54:32 +00:00
Matt Arsenault	8fb810a1d2	AMDGPU: Fix s_barrier flags This should be convergent. This is not a barrier in the isBarrier sense, nor hasCtrlDep. llvm-svn: 247059	2015-09-08 19:54:25 +00:00
Matt Arsenault	966a94f861	AMDGPU: Handle sub of constant for DS offset folding sub C, x - > add (sub 0, x), C for DS offsets. This is mostly to fix regressions that show up when SeparateConstOffsetFromGEP is enabled. llvm-svn: 247054	2015-09-08 19:34:22 +00:00
Sanjay Patel	ce74db9d8d	check for fastness before merging in DAGCombiner::MergeConsecutiveStores() Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time we have a merged access candidate. Without this patch, we were generating unaligned 16-byte (SSE) memops for x86 targets where those accesses are slow. This change was mentioned in: http://reviews.llvm.org/D10662 and http://reviews.llvm.org/D10905 and will help solve PR21711. Differential Revision: http://reviews.llvm.org/D12573 llvm-svn: 246771	2015-09-03 15:03:19 +00:00
Matt Arsenault	51d2d0f668	AMDGPU: Fix adding redundant implicit operands These are already added during the MachineInstr construction, so this was adding the implicit registers twice. llvm-svn: 246525	2015-09-01 02:02:21 +00:00
Matt Arsenault	e4d0c142e8	AMDGPU: Add sdst operand to VOP2b instructions The VOP3 encoding of these allows any SGPR pair for the i1 output, but this was forced before to always use vcc. This doesn't yet try to use this, but does add the operand to the definitions so the main change is adding vcc to the output of the VOP2 encoding. llvm-svn: 246358	2015-08-29 07:16:50 +00:00
Matt Arsenault	9a32cd3d3b	AMDGPU: Set mem operands for spill instructions llvm-svn: 246357	2015-08-29 06:48:57 +00:00
Matt Arsenault	5c004a7c61	AMDGPU: Fix dropping mem operands when moving to VALU Without a memory operand, mayLoad or mayStore instructions are treated as hasUnorderedMemRef, which results in much worse scheduling. We really should have a verifier check that any non-side effecting mayLoad or mayStore has a memory operand. There are a few instructions (interp and images) which I'm not sure what / where to add these. llvm-svn: 246356	2015-08-29 06:48:46 +00:00
Tom Stellard	eea72ccbf2	AMDGPU/SI: Fix some invaild assumptions when folding 64-bit immediates Summary: We were assuming tha if the use operand had a sub-register that the immediate was 64-bits, but this was breaking the case of folding a 64-bit immediate into another 64-bit instruction. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12255 llvm-svn: 246354	2015-08-29 01:58:21 +00:00
Tom Stellard	b8ce14c4c3	AMDGPU/SI: Factor operand folding code into its own function Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12254 llvm-svn: 246353	2015-08-28 23:45:19 +00:00
Matt Arsenault	8a067121f8	AMDGPU: Delete dead code There is no context where s_mov_b64 is emitted and could potentially be moved to the VALU. It is currently only emitted for materializing immediates, which can't be dependent on vector sources. The immediate splitting is already done when selecting constants. I'm not sure what contexts if any the register splitting would have been used before. Also clean up using s_mov_b64 in place of v_mov_b64_pseudo, although this isn't required and just skips the extra step of eliminating the copy from the SReg_64. llvm-svn: 246080	2015-08-26 20:48:08 +00:00
Matt Arsenault	5e7f95e567	AMDGPU: Don't reprocess instructions when splitting i64 bcnt llvm-svn: 246079	2015-08-26 20:48:04 +00:00
Matt Arsenault	445833cc91	AMDGPU: Fix not moving users of s_bfe_i64 to VALU This wouldn't propagate to users of the original BFE and would hit a verifier error. llvm-svn: 246078	2015-08-26 20:47:58 +00:00
Matt Arsenault	f003c38e1e	AMDGPU: Don't create intermediate SALU instructions When splitting 64-bit operations, create the correct VALU instructions immediately. This was splitting things like s_or_b64 into the two s_or_b32s and then pushing the new instructions onto the worklist. There's no reason we need to do this intermediate step. llvm-svn: 246077	2015-08-26 20:47:50 +00:00
Matt Arsenault	602a16d3db	AMDGPU/SI: Report SIFixSGPRLiveRanges changed function llvm-svn: 246056	2015-08-26 19:12:03 +00:00
Matt Arsenault	bd66061db7	AMDGPU: Make sure to reserve super registers I think this could potentially have broken if one of the super registers were allocated that contain v254/v255. llvm-svn: 246051	2015-08-26 18:54:50 +00:00
Matt Arsenault	19c5488015	AMDGPU: Produce error on dynamic_stackalloc llvm-svn: 246048	2015-08-26 18:37:13 +00:00
Matt Arsenault	0a3ac1be43	AMDGPU: Allow specifying different opcode on VI for SMRD/SMEM Although the basic s_load_* instructions happen to use the same opcode, some of the special case SMRD instructions have different opcodes. llvm-svn: 245775	2015-08-22 00:54:31 +00:00
Matt Arsenault	e8df879948	AMDGPU: Improve accuracy of instruction rates for some FP instructions llvm-svn: 245774	2015-08-22 00:50:41 +00:00
Matt Arsenault	33010103b7	AMDGPU: Use DFS to avoid second loop over function llvm-svn: 245772	2015-08-22 00:43:38 +00:00
Matt Arsenault	c8d8e4ed76	AMDGPU: Make sure to run verifier after SIFixSGPRLiveRanges llvm-svn: 245769	2015-08-22 00:19:34 +00:00
Matt Arsenault	aba29d6ab1	AMDGPU: Improve debug printing in SIFixSGPRLiveRanges llvm-svn: 245768	2015-08-22 00:19:25 +00:00
Matt Arsenault	6adf07a92e	AMDGPU: Move CI instructions into CIInstructions.td There are still a couple of CI patterns left in SIInstructions. llvm-svn: 245767	2015-08-22 00:16:34 +00:00
Matt Arsenault	f56872dc30	AMDGPU: Minor cleanups to help with f16 support The main change is inverting the condition for the operand class classes so that VT.Size == 16 uses VGPR_32 instead of 64. llvm-svn: 245764	2015-08-21 23:49:51 +00:00
Tom Stellard	bd8a0856e2	AMDGPU/SI: Better handle s_wait insertion We can wait on either VM, EXP or LGKM. The waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction. Here's an example of subtle perf reduction this patch solves: This is without the patch: buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen s_load_dwordx4 s[44:47], s[8:9], 0xc s_waitcnt lgkmcnt(0) buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen s_load_dwordx4 s[48:51], s[8:9], 0x10 s_waitcnt vmcnt(1) buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen The s_waitcnt vmcnt(1) is useless. The reason it is added is because the last buffer_load_format_xyzw needs s[44:47], which was issued by the first s_load_dwordx4. It waits for all VM before that call to have finished. Internally after every instruction, 3 counters (for VM, EXP and LGTM) are updated after every instruction. For example buffer_load_format_xyzw will increase the VM counter, and s_load_dwordx4 the LGKM one. Without the patch, for every defined register, the current 3 counters are stored, and are used to know how long to wait when an instruction needs the register. Because of that, the s[44:47] counter includes that to use the register you need to wait for the previous buffer_load_format_xyzw. Instead this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them. Patch by: Axel Davy Differential Revision: http://reviews.llvm.org/D11883 llvm-svn: 245755	2015-08-21 22:47:27 +00:00
Michael Kuperstein	dcdab4cd3a	[TLI] Refactor "is integer division cheap" queries. This removes the isPow2SDivCheap() query, as it is not currently used in any meaningful way. isIntDivCheap() no longer relies on a state variable (as all in-tree target set it to false), but the interface allows querying based on the type optimization level. NFC. Differential Revision: http://reviews.llvm.org/D12082 llvm-svn: 245430	2015-08-19 11:17:59 +00:00
Matthias Braun	d55bcf2646	MachineRegisterInfo: Introduce isPhysRegUsed() This method checks whether a physical regiser or any of its aliases are used in the function. Using this function in SIRegisterInfo::findUnusedReg() should also fix this reported failure: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html http://reviews.llvm.org/rL242173#inline-533 The report doesn't come with a testcase and I don't know enough about AMDGPU to create one myself. llvm-svn: 245329	2015-08-18 18:54:27 +00:00
Yaron Keren	178c465223	Add missing include guard. llvm-svn: 245173	2015-08-16 07:55:08 +00:00
Matt Arsenault	588732bd6e	AMDGPU/SI: Only look at live out SGPR defs When trying to fix SGPR live ranges, skip defs that are killed in the same block as the def. I don't think we need to worry about these cases as long as the live ranges of the SGPRs in dominating blocks are correct. This reduces the number of elements the second loop over the function needs to look at, and makes it generally easier to understand. The second loop also only considers if the live range is live in to a block, which logically means it must have been live out from another. llvm-svn: 245150	2015-08-15 02:58:49 +00:00
James Y Knight	5567bafe93	Remove redundant TargetFrameLowering::getFrameIndexOffset virtual function. This was the same as getFrameIndexReference, but without the FrameReg output. Differential Revision: http://reviews.llvm.org/D12042 llvm-svn: 245148	2015-08-15 02:32:35 +00:00
Matt Arsenault	297ae311ce	AMDGPU/SI: Fix printing useless info with amdhsa The comments at the bottom would all report 0 if amdhsa was used. llvm-svn: 245135	2015-08-15 00:12:39 +00:00
Matt Arsenault	0259a7aa41	AMDGPU/SI: Update LiveVariables This is simple but won't work if/when this pass is moved to be post-SSA. llvm-svn: 245134	2015-08-15 00:12:37 +00:00
Matt Arsenault	670ba46efe	AMDGPU/SI: Update LiveIntervals during SIFixSGPRLiveRanges Does not mark SlotIndexes as reserved, although I think that might be OK. LiveVariables still need to be handled. llvm-svn: 245133	2015-08-15 00:12:35 +00:00
Matt Arsenault	b75233235c	AMDGPU: Remove unnecessary assert These shouldn't ever be null. The number of successors was already asserted to be 2. llvm-svn: 245132	2015-08-15 00:12:32 +00:00
Matt Arsenault	4275c29a02	AMDGPU/SI: Make comments more precise. True branch instructions do behave as expected with liveness. Avoid the phrasing "branch decision is based on a value in an SGPR" because this could be misleading. A VALU compare instruction's result is still based on an SGPR, even though that condition may be divergent. llvm-svn: 245131	2015-08-15 00:12:30 +00:00
Tom Stellard	bef1094ee7	AMDGPU/SI: Add missing spill class The compiler was failing to spill for some shaders. Patch By: Axel Davy llvm-svn: 245087	2015-08-14 19:46:05 +00:00
Simon Pilgrim	7218251861	[AMDGPU] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the AMDGPU implementation D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect. Differential Revision: http://reviews.llvm.org/D12007 llvm-svn: 244960	2015-08-13 21:40:02 +00:00
Yaron Keren	556b21aa10	Remove and forbid raw_svector_ostream::flush() calls. After r244870 flush() will only compare two null pointers and return, doing nothing but wasting run time. The call is not required any more as the stream and its SmallString are always in sync. Thanks to David Blaikie for reviewing. llvm-svn: 244928	2015-08-13 18:12:56 +00:00
Matt Arsenault	c574686529	AMDGPU: Fix assert on dbg_value instructions llvm-svn: 244728	2015-08-12 09:04:44 +00:00
Alex Lorenz	e40c8a2b26	PseudoSourceValue: Replace global manager with a manager in a machine function. This commit removes the global manager variable which is responsible for storing and allocating pseudo source values and instead it introduces a new manager class named 'PseudoSourceValueManager'. Machine functions now own an instance of the pseudo source value manager class. This commit also modifies the 'get...' methods in the 'MachinePointerInfo' class to construct pseudo source values using the instance of the pseudo source value manager object from the machine function. This commit updates calls to the 'get...' methods from the 'MachinePointerInfo' class in a lot of different files because those calls now need to pass in a reference to a machine function to those methods. This change will make it easier to serialize pseudo source values as it will enable me to transform the mips specific MipsCallEntry PseudoSourceValue subclass into two target independent subclasses. Reviewers: Akira Hatanaka llvm-svn: 244693	2015-08-11 23:09:45 +00:00
Benjamin Kramer	df005cbe19	Fix some comment typos. llvm-svn: 244402	2015-08-08 18:27:36 +00:00
Tom Stellard	30cf77457d	AMDGPU/SI: Another attempt to fix Windows bots broken by r244372 llvm-svn: 244383	2015-08-08 01:11:07 +00:00
Matt Arsenault	cbd753761a	AMDGPU: Implement AMDGPUOperand::print() llvm-svn: 244381	2015-08-08 00:41:51 +00:00
Matt Arsenault	4635915504	AMDGPU/SI: Remove VCCReg llvm-svn: 244380	2015-08-08 00:41:48 +00:00
Matt Arsenault	6942d1a034	AMDGPU/SI: Remove source uses of VCCReg llvm-svn: 244379	2015-08-08 00:41:45 +00:00
Tom Stellard	fc70950bf2	AMDGPU/SI: Attempt to fix Windows bots broken by r244372 llvm-svn: 244376	2015-08-08 00:17:59 +00:00
Tom Stellard	fd25395c72	AMDGPU: Add pass to lower OpenCL image and sampler arguments. The pass adds new kernel arguments for image attributes, and resolves calls to dummy attribute and resource id getter functions. Patch by: Zoltan Gilian llvm-svn: 244372	2015-08-07 23:19:30 +00:00

1 2 3

146 Commits