llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	dc4ebad6d4	AMDGPU: Add kernarg.segment.ptr intrinsic llvm-svn: 268105	2016-04-29 21:16:52 +00:00
Matt Arsenault	ab2232cf73	DAGCombiner: Reduce truncated shl width llvm-svn: 268094	2016-04-29 19:53:16 +00:00
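A minimal sketch of why narrowing a truncated shl is safe, assuming an i64 value truncated to i32 and a shift amount below 32 (the helper names are hypothetical, not DAGCombiner code): the low 32 bits of a left shift depend only on the low 32 bits of the input.

```c
#include <stdint.h>

/* trunc (shl x, c) : shift in 64 bits, then keep the low 32 bits. */
static uint32_t trunc_of_wide_shl(uint64_t x, unsigned c) {
    return (uint32_t)(x << c);
}

/* shl (trunc x), c : truncate first, then shift in 32 bits (valid for c < 32). */
static uint32_t narrow_shl_of_trunc(uint64_t x, unsigned c) {
    return (uint32_t)x << c;
}
```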
Tom Stellard	92b24f324b	AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043	2016-04-29 14:34:26 +00:00
Nikolay Haustov	4f672a34ed	AMDGPU/SI: Assembler: Unify parsing/printing of operands. Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc). Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested having a class hierarchy for operand types instead of a table. This can be done in a separate change. Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps. Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class. Rename AsmMatcher/InstPrinter methods accordingly. Print immediate type when printing parsed immediate operand. Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3). Update tests. Reviewers: tstellarAMD, SamWot, artem.tamazov Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19584 llvm-svn: 268015	2016-04-29 09:02:30 +00:00
Matthias Braun	f3619b8212	RegisterPressure: Fix default lanemask for missing regunit intervals In case of missing live intervals for physical registers getLanesWithProperty() would report 0 which was not a safe default in all situations. Add a parameter to pass in a safe default. No testcase because in-tree targets do not skip computing register unit live intervals. Also clean up the getXXX() functions to not perform the RequireLiveIntervals checks anymore so we do not even need to return safe defaults. llvm-svn: 267977	2016-04-29 02:44:54 +00:00
Matt Arsenault	1c4d0efe56	AMDGPU: Emit error if too much LDS is used llvm-svn: 267922	2016-04-28 19:37:35 +00:00
Matt Arsenault	c5fce69031	AMDGPU: Fix mishandling array allocations when promoting alloca The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916	2016-04-28 18:38:48 +00:00
Matthias Braun	fbe85ae12e	CodeGen: Add DetectDeadLanes pass. The DetectDeadLanes pass performs a dataflow analysis of used/defined subregister lanes across COPY instructions and instructions that will get lowered to copies. It detects dead definitions and uses reading undefined values which are obscured by COPY and subregister usage. These dead definitions cause trouble in the register coalescer which cannot deal with definitions suddenly becoming dead after coalescing COPY instructions. For now the pass only adds dead and undef flags to machine operands. It should be possible to extend it in the future to remove the dead instructions and redo the analysis for the affected virtual registers. Differential Revision: http://reviews.llvm.org/D18427 llvm-svn: 267851	2016-04-28 03:07:16 +00:00
Matt Arsenault	0547b016b1	AMDGPU: Account for globals in AMDGPUPromoteAlloca pass Patch by Bas Nieuwenhuizen llvm-svn: 267791	2016-04-27 21:05:08 +00:00
Nicolai Haehnle	f66bdb5ea8	AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic Summary: So it appears that to guarantee some of the ordering requirements of a GLSL memoryBarrier() executed in the shader, we need to emit an s_waitcnt. (We can't use an s_barrier, because memoryBarrier() may appear anywhere in the shader, in particular it may appear in non-uniform control flow.) Reviewers: arsenm, mareko, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19203 llvm-svn: 267729	2016-04-27 15:46:01 +00:00
Artem Tamazov	5cd55b1784	[AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware registers. Possibility to specify code of hardware register kept. Disassemble to symbolic name, if name is known. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19335 llvm-svn: 267724	2016-04-27 15:17:03 +00:00
Konstantin Zhuravlyov	1d99c4d03c	[AMDGPU] Reserve VGPRs for trap handler usage if instructed Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563	2016-04-26 15:43:14 +00:00
Matt Arsenault	99c14524ec	AMDGPU: Implement addrspacecast llvm-svn: 267452	2016-04-25 19:27:24 +00:00
Matt Arsenault	48ab526f12	AMDGPU: Add queue ptr intrinsic llvm-svn: 267451	2016-04-25 19:27:18 +00:00
Artem Tamazov	d6468666b5	[AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax. Added hwreg(reg[,offset,width]) syntax. Default offset = 0, default width = 32. Possibility to specify 16-bit immediate kept. Added out-of-range checks. Disassembling is always to hwreg(...) format. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19329 llvm-svn: 267410	2016-04-25 14:13:51 +00:00
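The hwreg(reg[,offset,width]) form is shorthand for the 16-bit immediate of s_getreg/s_setreg. A sketch of the packing, assuming the GCN layout of register id in bits [5:0], offset in bits [10:6] and (width - 1) in bits [15:11]; the function names and the exact set of range checks are illustrative, not the llvm-mc implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack hwreg(id, offset, width) into simm16; returns false if out of range. */
static bool encode_hwreg(unsigned id, unsigned offset, unsigned width,
                         uint16_t *simm16) {
    if (id > 63 || offset > 31 || width < 1 || width > 32)
        return false;
    *simm16 = (uint16_t)(id | (offset << 6) | ((width - 1) << 11));
    return true;
}

/* hwreg(id) with the defaults offset = 0, width = 32. */
static bool encode_hwreg_default(unsigned id, uint16_t *simm16) {
    return encode_hwreg(id, 0, 32, simm16);
}
```

For example, hwreg(1) packs to 0xF801 under this layout, since width 32 is stored as 31 in the top five bits.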
Matt Arsenault	7e8de01f84	AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.Size llvm-svn: 267244	2016-04-22 22:59:16 +00:00
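A reference sketch of the two sides of this combine on i32, assuming K + vt.Size <= 32; the helpers are hypothetical models written to avoid implementation-defined shifts, not the backend's code. Both sides sign-extend the vt.Size-bit field that starts at bit K.

```c
#include <stdint.h>

/* Sign-extend the low w bits of v (1 <= w <= 32), like SIGN_EXTEND_INREG. */
static int64_t sext_low_bits(uint32_t v, unsigned w) {
    uint64_t field = (uint64_t)v & ((UINT64_C(1) << w) - 1);
    return (field >> (w - 1)) ? (int64_t)field - (INT64_C(1) << w)
                              : (int64_t)field;
}

/* Left-hand side: sext_inreg (srl x, k), w  (k < 32). */
static int64_t sext_inreg_of_srl(uint32_t x, unsigned k, unsigned w) {
    return sext_low_bits(x >> k, w);
}

/* Right-hand side: signed bitfield extract of w bits at offset k. */
static int64_t signed_bfe(uint32_t x, unsigned k, unsigned w) {
    uint64_t field = ((uint64_t)x >> k) & ((UINT64_C(1) << w) - 1);
    return (field >> (w - 1)) ? (int64_t)field - (INT64_C(1) << w)
                              : (int64_t)field;
}
```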
Matt Arsenault	efa3fe14d1	AMDGPU: Re-visit nodes in performAndCombine This fixes test regressions when i64 loads/stores are made promote. llvm-svn: 267240	2016-04-22 22:48:38 +00:00
Matt Arsenault	3b748d76f6	DAGCombiner: Relax alignment restriction when changing store type If the target allows the alignment, this should be OK. llvm-svn: 267217	2016-04-22 21:01:41 +00:00
Matt Arsenault	629d12de70	DAGCombiner: Relax alignment restriction when changing load type If the target allows the alignment, this should still be OK. llvm-svn: 267209	2016-04-22 20:21:36 +00:00
Konstantin Zhuravlyov	a40d8358e7	[AMDGPU] Insert nop pass: take care of outstanding feedback - Switch a few loops to range-based for loops - Fix nop insertion at the end of BB - Fix formatting - Check for endpgm Differential Revision: http://reviews.llvm.org/D19380 llvm-svn: 267167	2016-04-22 17:04:51 +00:00
Nicolai Haehnle	b0c9748709	AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102	2016-04-22 04:04:08 +00:00
Matt Arsenault	8d1052f55c	DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit component If the extracted bits are restricted to the upper half or lower half, this can be truncated. llvm-svn: 267024	2016-04-21 18:03:06 +00:00
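A sketch of the narrowing for the unsigned case only (hypothetical helpers, not the DAGCombiner code): when the extracted field lies entirely in one 32-bit half, a 32-bit extract on that half gives the same result.

```c
#include <stdint.h>

/* Unsigned extract of w bits at off (1 <= w <= 32, off + w <= 64). */
static uint64_t ubfe64(uint64_t x, unsigned off, unsigned w) {
    return (x >> off) & ((UINT64_C(1) << w) - 1);
}

/* 32-bit variant (off < 32). */
static uint32_t ubfe32(uint32_t x, unsigned off, unsigned w) {
    return (uint32_t)(((uint64_t)x >> off) & ((UINT64_C(1) << w) - 1));
}

static uint64_t ubfe64_narrowed(uint64_t x, unsigned off, unsigned w) {
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);
    if (off + w <= 32)            /* field entirely in the low half */
        return ubfe32(lo, off, w);
    if (off >= 32)                /* field entirely in the high half */
        return ubfe32(hi, off - 32, w);
    return ubfe64(x, off, w);     /* field straddles the halves: keep 64-bit */
}
```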
Mandeep Singh Grang	029a0567fa	[LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC. Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests. Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier Subscribers: mcrosier, dsanders Differential Revision: http://reviews.llvm.org/D19279 llvm-svn: 266834	2016-04-19 23:51:52 +00:00
Nicolai Haehnle	b48275f134	Add IntrWrite[Arg]Mem intrinsic property Summary: This property is used to mark an intrinsic that only writes to memory, but neither reads from memory nor has other side effects. An example where this is useful is the llvm.amdgcn.buffer.store.format.* intrinsic, which corresponds to a store instruction that goes through a special buffer descriptor rather than through a plain pointer. With this property, the intrinsic should still be handled as having side effects at the LLVM IR level, but machine scheduling can make smarter decisions. Reviewers: tstellarAMD, arsenm, joker.eph, reames Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18291 llvm-svn: 266826	2016-04-19 21:58:33 +00:00
Nicolai Haehnle	e2dda4f750	AMDGPU: Guard VOPC instructions against incorrect commute Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825	2016-04-19 21:58:22 +00:00
Konstantin Zhuravlyov	8c273ad719	[AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt Also, - Skip pass if machine module does not have debug info - Minor comment changes - Added test Differential Revision: http://reviews.llvm.org/D19079 llvm-svn: 266626	2016-04-18 16:28:23 +00:00
Matt Arsenault	c10783c42d	AMDGPU: Enable LocalStackSlotAllocation pass This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508	2016-04-16 02:13:37 +00:00
Matt Arsenault	b6be202779	AMDGPU: Use s_addk_i32 / s_mulk_i32 llvm-svn: 266506	2016-04-16 01:46:49 +00:00
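s_addk_i32 and s_mulk_i32 are SOPK forms that compute sdst = sdst + simm16 and sdst = sdst * simm16 with a sign-extended 16-bit immediate, so they avoid a trailing 32-bit literal. The predicate below is only a guess at when that pays off, assuming the rule is "destination equals the first source, and the constant fits in signed 16 bits but is not already an integer inline immediate (-16..64)"; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Integer inline immediates need no literal at all. */
static bool is_inline_int(int64_t v) { return v >= -16 && v <= 64; }

/* Hypothetical: prefer the SOPK form when it saves a 32-bit literal. */
static bool prefer_sopk(int64_t imm, bool dst_is_src0) {
    return dst_is_src0 && !is_inline_int(imm) &&
           imm >= INT16_MIN && imm <= INT16_MAX;
}
```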
Adrian Prantl	75819aedf6	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram. Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference to the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446	2016-04-15 15:57:41 +00:00
Nicolai Haehnle	750082d1fe	AMDGPU/SI: Fix regression with no-return atomics Summary: In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19043 llvm-svn: 266433	2016-04-15 14:42:36 +00:00
Matt Arsenault	fd8ab09c0e	AMDGPU: Include LDS size in printed comment llvm-svn: 266382	2016-04-14 22:11:51 +00:00
Matt Arsenault	3d1c1deb04	AMDGPU: Run SIFoldOperands after PeepholeOptimizer PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats: Totals: SGPRS: 34200 -> 34336 (0.40 %) VGPRS: 22118 -> 21655 (-2.09 %) Code Size: 632144 -> 633460 (0.21 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 10240 -> 11264 (10.00 %) bytes per wave Max Waves: 8822 -> 8918 (1.09 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 7704 -> 7840 (1.77 %) VGPRS: 5169 -> 4706 (-8.96 %) Code Size: 234444 -> 235760 (0.56 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 0 -> 1024 (0.00 %) bytes per wave Max Waves: 1188 -> 1284 (8.08 %) Wait states: 0 -> 0 (0.00 %) Increases: SGPRS: 35 (0.01 %) VGPRS: 1 (0.00 %) Code Size: 59 (0.02 %) LDS: 0 (0.00 %) Scratch: 1 (0.00 %) Max Waves: 48 (0.02 %) Wait states: 0 (0.00 %) Decreases: SGPRS: 26 (0.01 %) VGPRS: 54 (0.02 %) Code Size: 68 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Max Waves: 4 (0.00 %) Wait states: 0 (0.00 %) llvm-svn: 266378	2016-04-14 21:58:24 +00:00
Matt Arsenault	7900334dd5	AMDGPU: Fold bitcasts of scalar constants to vectors This cleans up some messes since the individual scalar components can be CSEed. llvm-svn: 266376	2016-04-14 21:58:07 +00:00
Tom Stellard	000c5af3e6	AMDGPU: Add skeleton GlobalIsel implementation Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356	2016-04-14 19:09:28 +00:00
Nicolai Haehnle	13d90f324c	[DivergenceAnalysis] Treat PHI with incoming undef as constant Summary: If a PHI has an incoming undef, we can pretend that it is equal to one non-undef, non-self incoming value. This is particularly relevant in combination with the StructurizeCFG pass, which introduces PHI nodes with undefs. Previously, this caused branch conditions that were uniform before StructurizeCFG to become non-uniform afterwards, which confused the SIAnnotateControlFlow pass. This fixes a crash when Mesa radeonsi compiles a shader from dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex Reviewers: arsenm, tstellarAMD, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19013 llvm-svn: 266347	2016-04-14 17:42:47 +00:00
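A sketch of the rule, using plain int ids for IR values (the ids and the VAL_UNDEF/VAL_NONE markers are hypothetical): undef and self-references are ignored, and if every remaining incoming value is the same V, the PHI can be treated as V, so it is uniform whenever V is.

```c
#include <stddef.h>

#define VAL_UNDEF (-1) /* hypothetical id for an undef incoming value */
#define VAL_NONE  (-2) /* incoming values disagree */

static int phi_effective_value(int self, const int *incoming, size_t n) {
    int common = VAL_UNDEF;
    for (size_t i = 0; i < n; ++i) {
        int v = incoming[i];
        if (v == VAL_UNDEF || v == self)
            continue;                 /* undef or self-loop: no constraint */
        if (common != VAL_UNDEF && common != v)
            return VAL_NONE;          /* two distinct real values */
        common = v;
    }
    return common;                    /* VAL_UNDEF if all inputs were undef */
}
```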
Nicolai Haehnle	723b73b4eb	AMDGPU: Remove SIFixSGPRLiveRanges pass Summary: This pass is unnecessary and overly conservative. It was motivated by situations like def %vreg0:SGPR_32 ... if-block: .. def %vreg1:SGPR_32 ... else-block: ... use %vreg0:SGPR_32 ... and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap. However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass. In addition to proving this change correct, I have tested it with Piglit and a small number of other tests. Reviewers: arsenm, tstellarAMD Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19041 llvm-svn: 266345	2016-04-14 17:42:29 +00:00
Tom Stellard	79a1fd718c	AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337	2016-04-14 16:27:07 +00:00
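The register arithmetic behind "more waves per SIMD", as a rough sketch assuming GCN wave64, 4 SIMDs per compute unit and a 256-entry VGPR file per lane on each SIMD; it ignores allocation granularity, SGPR limits and LDS, and the function name is hypothetical. A 256-item workgroup is 4 waves (one per SIMD), while 1024 items is 16 waves (four per SIMD), which caps each wave at 64 VGPRs.

```c
/* Upper bound on VGPRs per wave so a whole workgroup fits on one CU. */
static unsigned max_vgprs_for_workgroup(unsigned workgroup_size) {
    unsigned waves = (workgroup_size + 63) / 64;   /* wave64 */
    unsigned waves_per_simd = (waves + 3) / 4;     /* 4 SIMDs per CU */
    if (waves_per_simd == 0)
        waves_per_simd = 1;
    return 256 / waves_per_simd;
}
```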
Tom Stellard	f110f8f9f7	AMDGPU/SI: Use the correct scratch wave offset register for shaders. Summary: The code previously always used s1 as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders, though. The register should be the next SGPR after all user and system SGPRs. We rely on Mesa adding arguments for all input and system SGPRs and take the next available SGPR for the scratch wave offset register. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewers: mareko, arsenm, nhaehnle, tstellarAMD Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18941 Patch By: Bas Nieuwenhuizen llvm-svn: 266336	2016-04-14 16:27:03 +00:00
Matt Arsenault	9cd90712f0	AMDGPU: Implement canonicalize Also add generic DAG node for it. llvm-svn: 266272	2016-04-14 01:42:16 +00:00
Artem Tamazov	eb4d5a9b0b	[AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA) Tests added along with the implemented feature. Note that a small unnecessary MI scheduling issue remains (more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression. TODO: Support for TTMP quads, comma-separated syntax in "[]" and more. Differential Revision: http://reviews.llvm.org/D17825 llvm-svn: 266205	2016-04-13 16:18:41 +00:00
Matt Arsenault	887d4767b7	AMDGPU: Add test for m0 initialization in basic loop Initialization of m0 is emitted for each LDS operation, so every block with LDS usage ends up with one. MachineLICM used to fail to hoist this out of the loop, so every loop iteration with LDS usage in it would re-initialize it. This seems to be fixed now, so add a test to make sure that it stays this way. llvm-svn: 266156	2016-04-13 00:39:52 +00:00
Nicolai Haehnle	df77c9ada4	AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics Summary: They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and atomic counters at least when robust buffer access behavior is desired. (These instructions perform no format conversion and do buffer range checking per component.) As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format, it has become trivial to add support for the f32 and v2f32 variants of that intrinsic, so the patch does so. Also DAG-ify (and fix) some tests that I noticed intermittent failures in while developing this patch. Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects changes to the BUFFER_STORE_DWORD* instructions. See also http://reviews.llvm.org/D18291. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18292 llvm-svn: 266126	2016-04-12 21:18:10 +00:00
Tom Stellard	ab1d3a9d50	AMDGPU/SI: Insert wait states required after v_readfirstlane on SI Summary: We will be able to handle this case much better once the hazard recognizer is finished, but this conservative implementation fixes a hang with the piglit test: spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra Reviewers: arsenm, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18988 llvm-svn: 266105	2016-04-12 18:40:43 +00:00
Matt Arsenault	3b08238f78	AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32 This helps clean up some of the mess when expanding unaligned 64-bit loads when changed to be promote to v2i32, and fixes situations where or x, 0 was emitted after splitting 64-bit ors during moveToVALU. I think this could be a generic combine but I'm not sure. llvm-svn: 266104	2016-04-12 18:24:38 +00:00
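Why half of the i64 or can be dropped, as a small sketch (hypothetical helper): or-ing with a zero-extended i32 leaves the high half of x unchanged, so only a 32-bit or on the low half remains.

```c
#include <stdint.h>

/* or x, (zext y) computed as a 32-bit or plus an untouched high half. */
static uint64_t or_zext_split(uint64_t x, uint32_t y) {
    uint32_t lo = (uint32_t)x | y;        /* the only real work */
    uint32_t hi = (uint32_t)(x >> 32);    /* passes through unchanged */
    return ((uint64_t)hi << 32) | lo;
}
```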
Nicolai Haehnle	279970c0dc	AMDGPU/SI: Fix a mis-compilation of multi-level breaks Summary: Under certain circumstances, multi-level breaks (or what is understood by the control flow passes as such) could be miscompiled in a way that causes infinite loops, by emitting incorrect control flow intrinsics. This fixes a hang in dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18967 llvm-svn: 266088	2016-04-12 16:10:38 +00:00
Matt Arsenault	64fa2f4513	AMDGPU: Implement i64 global atomics llvm-svn: 266075	2016-04-12 14:05:11 +00:00
Matt Arsenault	a9dbdcae04	AMDGPU: Add atomic_inc + atomic_dec intrinsics These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074	2016-04-12 14:05:04 +00:00
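A non-atomic reference model of the clamping behaviour, assuming the GCN ISA definitions of ATOMIC_INC and ATOMIC_DEC (the function names are hypothetical; both return the old value). With limit = UINT32_MAX the increment model reduces to a plain wrapping +1, which is the only case where it matches atomicrmw add 1.

```c
#include <stdint.h>

/* *p = (old >= limit) ? 0 : old + 1 */
static uint32_t atomic_inc_model(uint32_t *p, uint32_t limit) {
    uint32_t old = *p;
    *p = (old >= limit) ? 0 : old + 1;
    return old;
}

/* *p = (old == 0 || old > limit) ? limit : old - 1 */
static uint32_t atomic_dec_model(uint32_t *p, uint32_t limit) {
    uint32_t old = *p;
    *p = (old == 0 || old > limit) ? limit : old - 1;
    return old;
}
```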
Matt Arsenault	44e5483ada	AMDGPU: Add volatile to test loads and stores When the memory vectorizer is enabled, these tests break. These tests don't really care about the memory instructions, and it's easier to write check lines with the unmerged loads. llvm-svn: 266071	2016-04-12 13:38:18 +00:00
Tom Stellard	0ffdf65eaa	Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute" This reverts commit r263720. Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute. llvm-svn: 265992	2016-04-11 20:38:40 +00:00
Jan Vesely	43b7b5b846	AMDGPU/SI: Implement atomic load/store for i32 and i64 Standard load/store instructions with GLC bit set. Reviewers: tstellardAMD, arsenm Differential Revision: http://reviews.llvm.org/D18760 llvm-svn: 265709	2016-04-07 19:23:11 +00:00
Tom Stellard	9112758077	AMDGPU/SI: Add latency for export instructions Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18599 llvm-svn: 265708	2016-04-07 18:30:05 +00:00
Nicolai Haehnle	df3a20cd80	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589	2016-04-06 19:40:20 +00:00
Konstantin Zhuravlyov	e63e02cb0c	[AMDGPU] Emit linkonce and linkonce_odr symbols Differential Revision: http://reviews.llvm.org/D18726 llvm-svn: 265408	2016-04-05 16:00:58 +00:00
Tom Stellard	354a43c7bc	AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2} Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+. 32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý. Patch by: Vedran Miletić Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: jvesely, scchan, kanarayan, arsenm Differential Revision: http://reviews.llvm.org/D17280 llvm-svn: 265170	2016-04-01 18:27:37 +00:00
Adrian Prantl	b8089516a5	testcase gardening: update the emissionKind enum to the new syntax. (NFC) llvm-svn: 265081	2016-04-01 00:16:49 +00:00
Matt Arsenault	2fe4fbc184	AMDGPU: Add frexp_exp intrinsic llvm-svn: 264944	2016-03-30 22:28:52 +00:00
Tom Stellard	1d5e6d4bdc	AMDGPU/SI: Improve MachineSchedModel definition This patch contains a few improvements to the model, including: - Using a single resource with a defined buffer size for each memory unit. - Setting the IssueWidth correctly. - Fixing latency values for memory instructions. shader-db stats: 16429 shaders in 3231 tests Totals: SGPRS: 318232 -> 312328 (-1.86 %) VGPRS: 208996 -> 209346 (0.17 %) Code Size: 7147044 -> 7166440 (0.27 %) bytes LDS: 83 -> 83 (0.00 %) blocks Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave Max Waves: 49182 -> 49243 (0.12 %) Wait states: 0 -> 0 (0.00 %) Differential Revision: http://reviews.llvm.org/D18453 llvm-svn: 264877	2016-03-30 16:35:13 +00:00
Tom Stellard	0bc954e3bc	AMDGPU/SI: Enable lanemask tracking in misched Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increased register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876	2016-03-30 16:35:09 +00:00
Sanjay Patel	1f867c6f9c	fix checks: _DAG -> -DAG llvm-svn: 264676	2016-03-28 22:11:06 +00:00
Matthias Braun	2bd8eeb6b7	CodeGen: Correct specification of PHI nodes They do have a def machine operand. Fixing the definition is necessary for an upcoming patch. Differential Revision: http://reviews.llvm.org/D18384 llvm-svn: 264607	2016-03-28 18:18:41 +00:00
Tom Stellard	a76bcc2ea1	AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fix yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589	2016-03-28 16:10:13 +00:00
Matthias Braun	ae81c29352	LiveInterval: Fix Distribute() failing on liveranges with unused VNInfos This fixes http://llvm.org/PR26991 llvm-svn: 264345	2016-03-24 21:41:38 +00:00
Matt Arsenault	30d37a74da	AMDGPU: Remove atomic inc/dec patterns There is no benefit to these since materializing the constant 1 requires the same number of instructions as materializing uint_max llvm-svn: 264215	2016-03-23 23:23:38 +00:00
Matt Arsenault	0a30e456b4	AMDGPU: Promote alloca should skip volatiles llvm-svn: 264214	2016-03-23 23:17:29 +00:00
Matt Arsenault	f43c2a0b49	AMDGPU: Insert moves of frame index to value operands Strengthen tests of storing frame indices. Right now this just creates irrelevant scheduling changes. We don't want to have multiple frame index operands on an instruction. There seem to be various assumptions that at least the same frame index will not appear twice in the LocalStackSlotAllocation pass. There's no reason to have this happen, and it just makes it easy to introduce bugs where the immediate offset is applied to the storing instruction when it should really be applied to the value being stored as a separate add. This might not be sufficient. It might still be problematic to have an add fi, fi situation, but that's even less likely to happen in real code. llvm-svn: 264200	2016-03-23 21:49:25 +00:00
Nicolai Haehnle	213e87f2ee	AMDGPU: Add SIWholeQuadMode pass Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982	2016-03-21 20:28:33 +00:00
Tom Stellard	92339e888f	AMDGPU/SI: Fix threshold calculation for branching when exec is zero Summary: When control flow is implemented using the exec mask, the compiler will insert branch instructions to skip over the masked section when exec is zero if the section contains more than a certain number of instructions. The previous code would only count instructions in successor blocks, and this patch modifies the code to start counting instructions in all blocks between the start and end of the branch. Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18282 llvm-svn: 263969	2016-03-21 18:56:58 +00:00
Matt Arsenault	cb38a6bd35	AMDGPU: Remove SignBitIsZero for mubuf scratch offsets These instructions do not have the same negative base address problem that DS instructions do on SI. llvm-svn: 263964	2016-03-21 18:02:18 +00:00
Matt Arsenault	b96b57347a	AMDGPU: Add frexp_mant intrinsic llvm-svn: 263948	2016-03-21 16:11:05 +00:00
Nicolai Haehnle	95e8ffd398	AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format Summary: Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before the frontend patches land in Mesa. Eventually, we may want to automatically reduce the size of loads at the LLVM IR level, which requires such overloads, and in some cases Mesa can generate them directly. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18255 llvm-svn: 263792	2016-03-18 16:24:40 +00:00
Nicolai Haehnle	ad63638f6d	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 llvm-svn: 263791	2016-03-18 16:24:31 +00:00
Nicolai Haehnle	3003ba00a3	AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format Summary: We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend cannot easily make use of an explicit soffset parameter either. Furthermore, it is likely that in the future, LLVM will be in a better position than the frontend to choose an SGPR offset if possible. Since there aren't any frontend uses of these intrinsics in upstream repositories yet, I would like to take this opportunity to change the intrinsic signatures to a single offset parameter, which is then selected to immediate offsets or voffsets using a ComplexPattern. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18218 llvm-svn: 263790	2016-03-18 16:24:20 +00:00
Sam Kolton	a74cd526e9	[AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3 Review: http://reviews.llvm.org/D18267 llvm-svn: 263789	2016-03-18 15:35:51 +00:00
Changpeng Fang	234fcb81d3	AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute Summary: ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed. Reviewers: arsenm, tstellarAMD Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18197 llvm-svn: 263720	2016-03-17 16:43:50 +00:00
Nicolai Haehnle	ef160de3e5	AMDGPU: Prevent uniform loops from becoming infinite Summary: Uniform loops where the branch leaving the loop is predicated on VCCNZ must be skipped if EXEC = 0, otherwise they will be infinite. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18137 llvm-svn: 263658	2016-03-16 20:14:33 +00:00
Changpeng Fang	01f6062227	AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS Summary: Static LDS size is saved in MachineFunctionInfo::LDSSize. We define a pseudo instruction with the usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter, we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize. Reviewers: arsenm tstellarAMD Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18064 llvm-svn: 263563	2016-03-15 17:28:44 +00:00
Tom Stellard	331f981cc9	AMDGPU/SI: Handle wait states required for DPP instructions Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17543 llvm-svn: 263447	2016-03-14 17:05:56 +00:00
Marek Olsak	ed2213e6ef	AMDGPU/SI: Incomplete shader binaries need to finish execution at the end Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D18058 llvm-svn: 263441	2016-03-14 15:57:14 +00:00
Chad Rosier	e6a5231628	Update test case to appease bots after 263255. I'll follow up with Matt to confirm this is the correct fix. llvm-svn: 263268	2016-03-11 17:33:36 +00:00
Nikolay Haustov	6560781c4f	[AMDGPU] Assembler: change v_madmk operands to have same order as mad. The constant is now at source operand 1 (previously at 2). This is also how it is in legacy AMD sp3 assembler. Update tests. Differential Revision: http://reviews.llvm.org/D17984 llvm-svn: 263212	2016-03-11 09:27:25 +00:00
Matt Arsenault	9a19c240c0	AMDGPU: Materialize sign bits with bfrev If a constant is the same as the reverse of an inline immediate, this is 4 bytes smaller than having to embed a 32-bit literal. llvm-svn: 263201	2016-03-11 07:42:49 +00:00
Nicolai Haehnle	b142770bfe	AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 variant is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 llvm-svn: 263140	2016-03-10 18:43:50 +00:00
Changpeng Fang	278a5b31a5	AMDGPU/SI: Define S_GETREG Intrinsic Summary: Define s_getreg intrinsic to generate s_getreg instruction to read hardware registers. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17892 llvm-svn: 263124	2016-03-10 16:47:15 +00:00
Tom Stellard	9f2e00de7b	SelectionDAG: Fix a crash on inline asm when output register supports multiple types Summary: The code in SelectionDAG did not handle the case where the register type and output types were different, but had the same size. Reviewers: arsenm, echristo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17940 llvm-svn: 263022	2016-03-09 16:02:52 +00:00
Sam Kolton	dfa29f7c5b	[AMDGPU] Assembler: Support DPP instructions. Support DPP syntax as used in SP3 (except the syntax of several operands). Added dpp-specific operands in td-files. Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter. Support for VOP2 DPP instructions in td-files. Some tests for DPP instructions. ToDo: - VOP2bInst: - vcc is considered as operand - AsmMatcher doesn't apply mnemonic aliases when parsing operands - v_mac_f32 - v_nop - disable instructions with 64-bit operands - change dpp_ctrl assembler representation to conform to sp3 Review: http://reviews.llvm.org/D17804 llvm-svn: 263008	2016-03-09 12:29:31 +00:00
Matt Arsenault	c89f2919a4	AMDGPU: Match more med3 integer patterns llvm-svn: 262864	2016-03-07 21:54:48 +00:00
Matthias Braun	4797ec95e4	RegisterCoalescer: Remap subregister lanemasks before exchanging operands Rematerializing and merging into a bigger register class at the same time requires the subregister range lanemasks to be remapped to the new register class. This fixes http://llvm.org/PR26805 llvm-svn: 262768	2016-03-05 04:36:13 +00:00
Tom Stellard	649b5db557	AMDGPU/SI: Add support for spilling SGPRs to scratch buffer Summary: This is necessary for when we run out of VGPRs and can no longer use v_{read,write}_lane for spilling SGPRs. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17592 llvm-svn: 262732	2016-03-04 18:31:18 +00:00
Nikolay Haustov	5bf46ac150	AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. Initial change by Nicolai Hähnle Differential Revision: http://reviews.llvm.org/D17401 llvm-svn: 262701	2016-03-04 10:39:50 +00:00
Matt Arsenault	7d0a77b979	DAGCombiner: Make sure an integer is being truncated llvm-svn: 262446	2016-03-02 01:36:51 +00:00
Matt Arsenault	b36d462fac	DAGCombiner: Turn truncate of a bitcasted vector to an extract On AMDGPU, where i64 operations are often bitcast to v2i32 and back, this pattern shows up regularly and breaks some expected combines on i64, such as load width reducing. This fixes some test failures in a future commit when i64 loads are changed to promote. llvm-svn: 262397	2016-03-01 21:31:53 +00:00
Matt Arsenault	03dac8d8e4	DAGCombiner: Turn extract of bitcasted integer into truncate This reduces the number of bitcast nodes and generally cleans up the DAG when bitcasting between integers and vectors everywhere. llvm-svn: 262358	2016-03-01 18:01:37 +00:00
Changpeng Fang	24f035af32	AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics Summary: This patch implements DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics, which are new since VI. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17614 llvm-svn: 262356	2016-03-01 17:51:23 +00:00
Matt Arsenault	59b8b77405	AMDGPU: Set HasExtractBitInsn This currently does not have the control over the bitwidth, and there are missing optimizations to reduce the integer to 32-bit if it can be. But in most situations we do want the sinking to occur. llvm-svn: 262296	2016-03-01 04:58:17 +00:00
Matt Arsenault	3a61985b2f	AMDGPU: More bits of frame index are known to be zero The maximum private allocation for the whole GPU is 4G, so the maximum possible index for a single workitem is the maximum size divided by the smallest granularity for a dispatch. This increases the number of known zero high bits, which enables more offset folding. The maximum private size per workitem with this is 128M but may be smaller still. llvm-svn: 262153	2016-02-27 20:26:57 +00:00
Matt Arsenault	982224cfb8	DAGCombiner: Don't unnecessarily swap operands in ReassociateOps In the case where op = add, y = base_ptr, and x = offset, this transform: (op y, (op x, c1)) -> (op (op x, y), c1) breaks the canonical form of add by putting the base pointer in the second operand and the offset in the first. This fix is important for the R600 target, because for some address spaces the base pointer and the offset are stored in separate register classes. The old pattern caused the ISel code for matching addressing modes to put the base pointer and offset in the wrong register classes, which required non-trivial code transformations to fix. llvm-svn: 262148	2016-02-27 19:57:45 +00:00
Matt Arsenault	360d244d5b	DAGCombiner: Relax sqrt NaN folding check This is OK for +0 since compares to +/-0 give the same result. llvm-svn: 262125	2016-02-27 09:38:05 +00:00
Matt Arsenault	274d34e725	AMDGPU: Add s_sleep intrinsic llvm-svn: 262120	2016-02-27 08:53:52 +00:00
Matt Arsenault	61738cbcb6	AMDGPU: Implement readcyclecounter This matches the behavior of the HSAIL clock instruction. s_memrealtime is used if the subtarget supports it, and falls back to s_memtime if not. Also introduces new intrinsics for each of s_memtime / s_memrealtime. llvm-svn: 262119	2016-02-27 08:53:46 +00:00
Nikolay Haustov	2f684f1347	[AMDGPU] Assembler: Basic support for MIMG Add parsing and printing of image operands. Matches legacy sp3 assembler. Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last. Update SITargetLowering for new order. Add basic MC test. Update CodeGen tests. Review: http://reviews.llvm.org/D17574 llvm-svn: 261995	2016-02-26 09:51:05 +00:00
Matthias Braun	9dcd65f478	MachineCopyPropagation: Catch copies of the form A<-B;A<-B Differential Revision: http://reviews.llvm.org/D17475 llvm-svn: 261966	2016-02-26 03:18:55 +00:00
Matt Arsenault	d38272b214	AMDGPU: Add failing testcase for register coalescer llvm-svn: 261592	2016-02-22 23:45:42 +00:00
Matt Arsenault	d85e5a1303	AMDGPU: Fix alignments in test I don't think this test was intending to test unaligned load/store. Change it to use the natural alignment to avoid regressing. Also adds missing SI checks. llvm-svn: 261571	2016-02-22 21:04:23 +00:00
Matt Arsenault	fa67bdbde0	AMDGPU/R600: Implement allowsMisalignedMemoryAccess This avoids some test regressions in a future commit when unaligned operations are expanded when they have custom lowering. llvm-svn: 261570	2016-02-22 21:04:16 +00:00
Tom Stellard	467b5b9024	AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer Summary: Instead of trying to replace SMRD instructions with a VGPR base pointer with an equivalent MUBUF instruction, we now copy the base pointer to SGPRs using v_readfirstlane. This is safe to do, because any load selected as an SMRD instruction has been proven to have a uniform base pointer, so each thread in the wave will have the same pointer value in VGPRs. This will fix some errors on VI from trying to replace SMRD instructions with addr64-enabled MUBUF instructions that don't exist. Reviewers: arsenm, cfang, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17305 llvm-svn: 261385	2016-02-20 00:37:25 +00:00
Tom Stellard	2d26fe7aa6	AMDGPU/SI: Fix s_waitcnt insertion for flat instructions Summary: This was broken in r260694 which swapped the address and data operands for flat store instructions. The code in SIInsertWaits assumes that the data operand always comes before the address operand, so we need to add a special case for flat. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17366 llvm-svn: 261330	2016-02-19 15:33:13 +00:00
Nicolai Haehnle	f2c64db55a	AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics Summary: These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has a legacy name and pretends not to read memory. Differential Revision: http://reviews.llvm.org/D17276 llvm-svn: 261224	2016-02-18 16:44:18 +00:00
Matt Arsenault	f2ddbf00ed	AMDGPU: Prepare for reducing private element size. Tests for the new scalarize all private access options will be included with a future commit. The only functional change is to make the split/scalarize behavior for private access of > 4 element vectors consistent with the flat/global handling. This makes the spilling worse in the two changed tests. llvm-svn: 260804	2016-02-13 04:18:53 +00:00
Tom Stellard	4409051d00	AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic This intrinsic will be used to expose dpp functionality to higher-level languages. It will map to the dpp version of v_mov_b32. llvm-svn: 260792	2016-02-13 02:09:49 +00:00
Matt Arsenault	ce56a0ef54	AMDGPU: Add intrinsics for sin/cos These provide direct access to the hardware instruction without the conversion to the unit-scaled input that the llvm.sin/llvm.cos lowering requires. llvm-svn: 260782	2016-02-13 01:19:56 +00:00
Matt Arsenault	79963e80b8	AMDGPU: Rename intrinsic to better match instruction name Also fixes missing f32 test. llvm-svn: 260780	2016-02-13 01:03:00 +00:00
Tom Stellard	bc4497b13c	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765	2016-02-12 23:45:29 +00:00
Tom Stellard	46937ca4e7	[AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler Historically, AMD internal sp3 assembler has flat_store* addr, data format. To match existing code and to enable reuse, change LLVM definitions to match. Also update MC and CodeGen tests. Differential Revision: http://reviews.llvm.org/D16927 Patch by: Nikolay Haustov llvm-svn: 260694	2016-02-12 17:57:54 +00:00
Changpeng Fang	e07f1aa8fa	AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass. Summary: It is possible that the loop condition can be a boolean constant (an infinite loop, for example), so we should handle a constant condition when annotating a loop. This patch adds this functionality to support annotating constant condition. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D15093 llvm-svn: 260692	2016-02-12 17:11:04 +00:00
Matt Arsenault	296b849163	AMDGPU: Set flat_scratch from flat_scratch_init reg This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658	2016-02-12 06:31:30 +00:00
Matt Arsenault	24ee0785dd	AMDGPU: Set element_size in private resource descriptor Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651	2016-02-12 02:40:47 +00:00
Nicolai Haehnle	b80a5811ce	AMDGPU: Quick fix for extreme slowness in spill-scavenge-offset.ll test Summary: Also, some cosmetic fixes. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, llvm-commits Differential Revision: http://reviews.llvm.org/D17161 llvm-svn: 260625	2016-02-12 00:05:34 +00:00
Tom Stellard	1397d49ef5	AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599	2016-02-11 21:45:07 +00:00
Matt Arsenault	fcb345f172	AMDGPU: Fix constant bus use check with subregisters If the two operands to an instruction were both subregisters of the same super register, it would incorrectly think this counted as the same constant bus use. This fixes the verifier error in fmin_legacy.ll which was missing -verify-machineinstrs. llvm-svn: 260495	2016-02-11 06:15:39 +00:00
Matt Arsenault	9c47dd583a	AMDGPU: Remove some old intrinsic uses from tests llvm-svn: 260493	2016-02-11 06:02:01 +00:00
Nicolai Haehnle	d791bd07c7	AMDGPU: Release the scavenged offset register during VGPR spill Summary: This fixes a crash where subsequent spills would be unable to scavenge a register. In particular, it fixes a crash in piglit's spec@glsl-1.50@execution@geometry@max-input-components (the test still has a shader that fails to compile because of too many SGPR spills, but at least it doesn't crash any more). This is a candidate for the release branch. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, arsenm Differential Revision: http://reviews.llvm.org/D16558 llvm-svn: 260427	2016-02-10 20:13:58 +00:00
Matt Arsenault	92edab2df9	AMDGPU: Remove bfi and bfm intrinsics Nothing is using them. llvm-svn: 260123	2016-02-08 19:06:01 +00:00
Matt Arsenault	2bba779272	SelectionDAG: Lower some range metadata to AssertZext If a range has a lower bound of 0, add an AssertZext from the nearest floor power of two. This allows operations with some workitem intrinsics with known maximum ranges to use fast 24-bit multiplies. llvm-svn: 260109	2016-02-08 16:28:19 +00:00
Matt Arsenault	7f83397d72	AMDGPU: Account for LDS alignment The current situation isn't great, because the amount of padding required is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912	2016-02-05 19:47:29 +00:00
Matt Arsenault	cf84e26fb6	AMDGPU: Preserve alignments on new created globals Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911	2016-02-05 19:47:23 +00:00
Jonas Paulsson	ac29f01788	[ScheduleDAGInstrs::buildSchedGraph()] Handling of memory dependencies rewritten. Recommitted, after some fixing with test cases. Updated test cases: test/CodeGen/AArch64/arm64-misched-memdep-bug.ll test/CodeGen/AArch64/tailcall_misched_graph.ll Temporarily disabled test cases: test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated) test/CodeGen/PowerPC/vsx-fma-m.ll test/CodeGen/PowerPC/vsx-fma-sp.ll http://reviews.llvm.org/D8705 Reviewers: Hal Finkel, Andy Trick. llvm-svn: 259673	2016-02-03 17:52:29 +00:00
Matt Arsenault	de4208122b	AMDGPU: Do not promote allocas with non-inbounds GEPs If we can't assume the pointer value is within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573	2016-02-02 21:16:12 +00:00
Matt Arsenault	7e747f1a38	AMDGPU: Handle promoting memmove Also add missing tests for the others. llvm-svn: 259558	2016-02-02 20:28:10 +00:00
Matt Arsenault	8b175672cb	AMDGPU: Skip promote alloca with no optimizations llvm-svn: 259551	2016-02-02 19:32:42 +00:00
Matt Arsenault	ad1348459f	AMDGPU: Whitelist handled intrinsics We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546	2016-02-02 19:18:53 +00:00
Matt Arsenault	853a1fc6d9	AMDGPU: Use inbounds when calculating workitem offset When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545	2016-02-02 19:18:48 +00:00
Oliver Stannard	7e7d983a87	Refactor backend diagnostics for unsupported features Re-commit of r258951 after fixing layering violation. The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. llvm-svn: 259498	2016-02-02 13:52:43 +00:00
Matt Arsenault	e013246462	AMDGPU: Fix emitting invalid workitem intrinsics for HSA The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA was incorrectly selected to reading from the offset mesa uses off of the kernarg pointer. Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer. Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument. Start emitting errors for the intrinsics not handled with HSA. llvm-svn: 259297	2016-01-30 05:19:45 +00:00
Matt Arsenault	d0799df707	AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr Only the dispatch.ptr intrinsic is supposed to be used now to get the workgroup size, and the read.local.size intrinsics do not work correctly. llvm-svn: 259296	2016-01-30 05:10:59 +00:00
Matt Arsenault	43976df0da	AMDGPU: Add new amdgcn workitem intrinsics These use the correct prefix and follow the HSA naming convention rather than the config register option names. llvm-svn: 259293	2016-01-30 04:25:19 +00:00
Matt Arsenault	295875efda	AMDGPU: Remove 24-bit intrinsics The known bit matching code seems to work reasonably well, so these shouldn't really be needed. llvm-svn: 259180	2016-01-29 10:05:16 +00:00
Matt Arsenault	5b39b34ca5	AMDGPU: Match fmed3 patterns with legacy fmin/fmax llvm-svn: 259090	2016-01-28 20:53:48 +00:00
Matt Arsenault	f639c32739	AMDGPU: Match some med3 patterns llvm-svn: 259089	2016-01-28 20:53:42 +00:00
Matt Arsenault	7293f9895e	AMDGPU: Set DX10Clamp bit llvm-svn: 259088	2016-01-28 20:53:35 +00:00
Oliver Stannard	02fa1c80c4	Revert r259035, it introduces a cyclic library dependency llvm-svn: 259045	2016-01-28 13:19:47 +00:00
Oliver Stannard	b4b092ea1b	Add backend diagnostic printer for unsupported features Re-commit of r258951 after fixing layering violation. The related LLVM patch adds a backend diagnostic type for reporting unsupported features, this adds a printer for them to clang. In the case where debug location information is not available, I've changed the printer to report the location as the first line of the function, rather than the closing brace, as the latter does not give the user any information. This also affects optimisation remarks. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 259035	2016-01-28 10:07:27 +00:00
NAKAMURA Takumi	628a7a0aef	Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported features" It introduced a layering violation in LLVMIR. clang r258950 "Add backend diagnostic printer for unsupported features" llvm r258951 "Refactor backend diagnostics for unsupported features" llvm-svn: 259016	2016-01-28 04:41:32 +00:00
Oliver Stannard	1e67a9f196	Refactor backend diagnostics for unsupported features The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. The implementation of DiagnosticInfoUnsupported::print must be in lib/CodeGen rather than in the existing file in lib/IR/ to avoid introducing a dependency from IR to CodeGen. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 258951	2016-01-27 17:30:33 +00:00
Marek Olsak	e86f252209	AMDGPU/SI: Stoney has only 16 LDS banks Summary: This is a candidate for stable, along with all patches that add the "stoney" processor. Reviewers: tstellarAMD Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16485 llvm-svn: 258922	2016-01-27 11:19:45 +00:00
Matt Arsenault	b22828f2fb	AMDGPU: Fix default device handling When no device name is specified, default to kaveri for HSA since SI is not supported and it would fail. Default to "tahiti" instead of "SI" since these are effectively the same, and tahiti is an actual device. Move default device handling to the TargetMachine rather than the AMDGPUSubtarget. The module ISA version is computed from the device name provided with the target machine, so the attributes printed by the AsmPrinter were inconsistent with those computed in the subtarget. Also remove DevName field from subtarget since it's redundant with getCPU() in the superclass. llvm-svn: 258901	2016-01-27 02:17:49 +00:00
Matt Arsenault	c5f6152911	AMDGPU: Make v32i8/v64i8 illegal types Old intrinsics were forcing these, but they have now all been removed. This fixes large i8 vector operations generally being broken. llvm-svn: 258788	2016-01-26 04:43:48 +00:00
Matt Arsenault	018179fc46	AMDGPU: Remove old sample intrinsics I did my best to try to update all the uses in tests that just happened to use the old ones to the newer intrinsics. I'm not sure I got all of the immediate operand conversions correct, since the value seems to have been ignored by the old pattern but I don't think it really matters. llvm-svn: 258787	2016-01-26 04:38:08 +00:00
Matt Arsenault	051d6f9fde	AMDGPU: Add new amdgcn intrinsics for cube instructions More cleanup to try to get all intrinsics using the correct amdgcn prefix that are as close to the instruction as possible. llvm-svn: 258786	2016-01-26 04:29:56 +00:00
Matt Arsenault	9a10cea7fb	AMDGPU: Implement read_register and write_register intrinsics Some of the special intrinsics that now correspond to an instruction also have special setting of some registers, e.g. llvm.SI.sendmsg sets m0 as well as using s_sendmsg. Using these explicit register intrinsics may be a better option. Reading the exec mask and others may be useful for debugging. For this I'm not sure this is entirely correct because we would want this to be convergent, although it's possible this is already treated sufficiently conservatively. llvm-svn: 258785	2016-01-26 04:29:24 +00:00
Matt Arsenault	0c3e2338fe	AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now Also move into backend intrinsics to discourage use of the old name. llvm-svn: 258783	2016-01-26 04:14:16 +00:00
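The size saving described in commit 9a19c240c0 ("Materialize sign bits with bfrev") can be illustrated with a small model. This is a minimal Python sketch, not the LLVM implementation: it assumes that only the SI integer inline constants (-16..64) count as inline immediates and ignores the floating-point inline constants, and the `materialize` helper and the strings it returns are hypothetical. A constant such as 0x80000000 (the sign bit) is not an inline immediate, but its bit reverse, 1, is, so a bit-reverse instruction (v_bfrev_b32 / s_brev_b32) can produce it without embedding a 4-byte literal.

```python
# Assumption: only the integer inline constants -16..64 are modeled;
# float inline constants (0.5, 1.0, ...) are ignored in this sketch.
INLINE_INT_MIN = -16
INLINE_INT_MAX = 64


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def reverse_bits32(x: int) -> int:
    """Reverse the order of the 32 low bits of x (what a bfrev computes)."""
    x &= 0xFFFFFFFF
    return int(f"{x:032b}"[::-1], 2)


def is_inline_int(x: int) -> bool:
    """True if the 32-bit value is encodable as an integer inline constant."""
    return INLINE_INT_MIN <= to_signed32(x) <= INLINE_INT_MAX


def materialize(value: int) -> str:
    """Hypothetical helper: describe the cheapest way to build a constant."""
    value &= 0xFFFFFFFF
    if is_inline_int(value):
        # Directly encodable; no literal needed.
        return f"mov {to_signed32(value)}"
    rev = reverse_bits32(value)
    if is_inline_int(rev):
        # Bit-reversing an inline constant reproduces the value,
        # avoiding the extra 32-bit literal dword.
        return f"bfrev {to_signed32(rev)}"
    # Fall back to embedding a 32-bit literal.
    return f"mov literal 0x{value:08x}"


if __name__ == "__main__":
    for v in (5, 0x80000000, 0x7FFFFFFF, 0x12345678):
        print(hex(v), "->", materialize(v))
```

For example, the sign-bit mask 0x80000000 maps to `bfrev 1`, and 0x7FFFFFFF maps to `bfrev -2` because its reversal is 0xFFFFFFFE.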


424 Commits